The present invention generally relates to design optimization and, more specifically, to training generative models for optimizing designs.
Metasurfaces are subwavelength-structured artificial media that can shape and localize electromagnetic waves in unique ways. Photonic technologies serve to manipulate, guide, and filter electromagnetic waves propagating in free space and in waveguides. Due to the strong dependence between geometry and function, much emphasis in the field has been placed on identifying geometric designs for these devices given a desired optical response. The vast majority of these design concepts utilize relatively simple shapes that can be described using physical intuition.
As examples, silicon photonic devices typically utilize adiabatic tapers and ring resonators to route and filter guided waves, and metasurfaces, which are diffractive optical components used for wavefront engineering, typically utilize arrays of nanowaveguides or nanoresonators comprising simple shapes. While these design concepts work well for certain applications, they possess limitations, such as narrow bandwidths and sensitivity to temperature, which prevent the further advancement of these technologies.
Systems and methods for generating designs in accordance with embodiments of the invention are illustrated. One embodiment includes a method for training a generator to generate designs. The method includes steps for generating a plurality of candidate designs using a generator, evaluating a performance of each candidate design of the plurality of candidate designs, computing a global loss for the plurality of candidate designs based on the evaluated performances, and updating the generator based on the computed global loss.
In a further embodiment, the method further includes steps for receiving an input element of features representing the plurality of candidate designs, wherein the input element includes a random noise vector.
In still another embodiment, the input element further includes a set of one or more target parameters, and the set of target parameters includes at least one of a wavelength, a deflection angle, device thickness, device dielectric, polarization, phase response, and incidence angle.
In a still further embodiment, evaluating the performance includes performing a simulation of each candidate design.
In yet another embodiment, the simulation is performed using a physics-based engine.
In a yet further embodiment, computing the global loss includes weighting a gradient for each candidate design based on a value of a performance metric for the candidate design.
In another additional embodiment, the performance metric is efficiency.
In a further additional embodiment, computing the global loss comprises computing forward electromagnetic simulations of the plurality of candidate designs, computing adjoint electromagnetic simulations of the plurality of candidate designs, and computing an efficiency gradient with respect to refractive indices for each candidate design by integrating the overlap of the forward electromagnetic simulations and the adjoint electromagnetic simulations.
In another embodiment again, the global loss includes a regularization term to ensure binarization of the generated patterns.
In a further embodiment again, the generator includes a set of one or more differentiable filter layers.
In still yet another embodiment, the filter layers include at least one of a Gaussian filter layer and a set of one or more binarization layers to ensure binarization of the generated patterns.
In a still yet further embodiment, the method further includes steps for receiving a second input element that represents a second plurality of candidate designs, generating the second plurality of candidate designs using the generator, wherein the generator is trained to generate high-efficiency designs, evaluating each candidate design of the second plurality of candidate designs based on simulated performance of each of the second plurality of candidate designs, and selecting a set of one or more highest-performing candidate designs from the second plurality of candidate designs based on the evaluation.
In still another additional embodiment, each design of the plurality of candidate designs is a metasurface.
One embodiment includes a non-transitory machine readable medium containing processor instructions for training a generator to generate designs, where execution of the instructions by a processor causes the processor to perform a process that generates a plurality of candidate designs using a generator, evaluates a performance of each candidate design of the plurality of candidate designs, computes a global loss for the plurality of candidate designs based on the evaluated performances, and updates the generator based on the computed global loss.
One embodiment includes a system comprising a processor and a memory, wherein the memory comprises a training application for training a generator to generate designs, where execution of the instructions by a processor causes the processor to perform a process that generates a plurality of candidate designs using a generator, evaluates a performance of each candidate design of the plurality of candidate designs, computes a global loss for the plurality of candidate designs based on the evaluated performances, and updates the generator based on the computed global loss.
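As a non-limiting illustration of the summarized method, a minimal training-step sketch is provided below. The sketch assumes a PyTorch-style generator; simulate_performance is a hypothetical placeholder for the physics-based evaluation described in the detailed description, and the exponential performance weighting and σ value are illustrative choices rather than required elements.

```python
import torch

def train_step(generator, optimizer, simulate_performance,
               batch_size=128, noise_dim=128, sigma=0.5):
    # Generate a plurality of candidate designs from random noise.
    z = torch.rand(batch_size, noise_dim) * 2 - 1
    designs = generator(z)
    # Evaluate the performance of each candidate with a (non-differentiable)
    # physics-based engine, which also returns adjoint gradients.
    with torch.no_grad():
        eff, grad = simulate_performance(designs)
    # Compute a global loss that weights each design's gradient by a function
    # of its evaluated performance.
    weights = torch.exp(eff / sigma)
    global_loss = -(weights[:, None] * grad * designs).sum(dim=1).mean()
    # Update the generator based on the computed global loss.
    optimizer.zero_grad()
    global_loss.backward()
    optimizer.step()
    return global_loss.item()
```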
Additional embodiments and features are set forth in part in the description that follows, and in part will become apparent to those skilled in the art upon examination of the specification or may be learned by the practice of the invention. A further understanding of the nature and advantages of the present invention may be realized by reference to the remaining portions of the specification and the drawings, which form a part of this disclosure.
The description and claims will be more fully understood with reference to the following figures and data graphs, which are presented as exemplary embodiments of the invention and should not be construed as a complete recitation of the scope of the invention.
Turning now to the drawings, systems and methods for training and utilizing generative networks for global optimization (also referred to as optimization networks or generative design networks) in a design process are described. In a variety of embodiments, global optimization networks can be trained to generate multiple designs and to use a simulator (e.g., a physics-based simulation) to determine various parameters of the designs. Parameters in accordance with various embodiments of the invention can be used for various purposes including (but not limited to) identifying high-performing designs, identifying adjoint gradients, computing a loss function, etc. In numerous embodiments, rather than training data, a simulator is used to perform forward and adjoint electromagnetic simulations to train a generative design network. In several embodiments, generative design networks can be trained using training data, as well as augmented training data that is generated using a generative design network. Augmented training data can include (but is not limited to) designs that are generated for parameter values outside the range of the values found in the training data, as well as high-performing designs discovered during the generation process. Generative design networks in accordance with numerous embodiments of the invention can be conditional networks, allowing for the specification of specific target parameters for the designs to be generated.
Processes in accordance with various embodiments of the invention can be used for generating and discovering metasurface designs. Metasurfaces are subwavelength-structured artificial media that can shape and localize electromagnetic waves in unique ways, and they are foundational devices for wavefront engineering because their electromagnetic properties are tailored by subwavelength-scale structuring. Metasurfaces can focus and steer an incident wave and manipulate its polarization in nearly arbitrary ways, surpassing the limits set by conventional optics. They can also shape and filter spectral features, which has practical applications in sensing. These technologies are useful in imaging, sensing, and optical information processing applications, amongst others, and can operate at wavelengths spanning the ultraviolet to radio frequencies. Metasurfaces have been implemented in frameworks as diverse as holography and transformation optics, and they can be used to perform mathematical operations with light.
Conventional metasurfaces utilize discrete phased array elements, such as plasmonic antennas, nanowaveguides, and Mie-resonant nanostructures. These devices produce high efficiency responses when designed for modest deflection angles and single functions. However, when designed for more advanced capabilities, they suffer from reduced efficiencies due to their restrictive design space. Iterative topology optimization, including adjoint-based and objective-first methods, is an alternative design approach that can extend the capabilities of metasurfaces beyond those utilizing phased array elements. Devices based on this concept have non-intuitive, freeform layouts, and they can support high efficiency, large angle, multi-wavelength operation. However, the design process is computationally intensive and requires many simulations per device, preventing its scaling to large aperiodic regions.
In many embodiments, generative networks for global optimization use conditional generative adversarial networks (GANs), which can serve as an effective and computationally efficient tool to produce high performance designs with advanced functionalities. As a model system, conditional GANs in accordance with a number of embodiments of the invention can generate silicon metagratings with various target characteristics (such as, but not limited to, deflection angle, wavelength, etc.), which are periodic metasurfaces designed to deflect electromagnetic waves to a desired diffraction order.
In several embodiments, conditional GANs can be trained on high-resolution images of high efficiency, topology-optimized devices. After training, conditional GANs in accordance with several embodiments of the invention can generate high performance metagratings operating across a broad range of wavelengths and angles. While many of the examples described herein focus on the variation of two device parameters (i.e., wavelength and deflection angle), one can imagine generalizing the generative design approach to other device parameters as well as different combinations of such parameters. Device parameters in accordance with several embodiments of the invention can include (but are not limited to) device thickness, device dielectric, polarization, phase response, and incidence angle. Compared to devices designed using only iterative optimization, processes in accordance with various embodiments of the invention can produce and refine devices on time scales more than an order of magnitude faster. Optimization networks trained in accordance with a number of embodiments of the invention are capable of learning features in topologically complex metasurfaces, and can produce high performance, large area devices with tractable computational resources.
Many approaches based on feedforward neural networks attempt to explicitly learn the relationship between device geometry and electromagnetic response. In prior studies, neural networks were applied to the inverse design of relatively simple shapes, described by a small number of geometric parameters. These studies successfully demonstrated the potential of deep neural networks for electromagnetics design. However, they required tens of thousands of training data points and are limited to the design of simple geometries, making the scaling of these concepts to complicated shapes extremely data intensive.
With conditional GANs, which are deep generative models, processes in accordance with certain embodiments of the invention directly sample the space of high efficiency designs without the need to accurately predict the performance of every device along an optimization trajectory. This focused sampling concentrates learning on important topological features harvested from high-performance metasurfaces, rather than attempting to predict the behavior of every possible device, most of which are very far from optimal. In this manner, optimization networks can produce high efficiency, topologically-intricate metasurfaces with substantially less training data.
Various methods based on local optimization have been proposed. Among the most successful of these concepts is the adjoint variables method, which uses gradient descent to iteratively adjust the dielectric composition of the devices and improve device functionality. This design method has enabled the realization of high performance, robust devices with nonintuitive layouts, including new classes of on-chip photonic devices with ultrasmall footprints, non-linear photonic switches, and diffractive optical components that can deflect and focus electromagnetic waves with high efficiencies. While adjoint optimization has great potential, it is a local optimizer and depends strongly on the initial distribution of dielectric material in the devices. As such, identifying a high performance device typically requires the optimization of many devices with random initial dielectric distributions and selecting the best device. This approach is very computationally expensive, preventing the scaling of these concepts to large, multi-functional devices.
Systems and methods in accordance with many embodiments of the invention present a new type of global optimization, based on the training of a generative neural network without a training set, which can produce high-performance metasurfaces. Instead of directly optimizing devices one at a time, optimization can be reframed as the training of a generator that iteratively enhances the probability of generating high-performance devices. In many embodiments, loss functions used for backpropagation can be defined as a function of generated patterns and their performance gradients. Performance gradients in accordance with numerous embodiments of the invention can include efficiency gradients which can be calculated by the adjoint variable method using physics based simulations, such as (but not limited to) forward and adjoint electromagnetic simulations. Performance gradients in accordance with a variety of embodiments of the invention can include (but are not limited to) heat conductivity in a thermal conductor used as a heat sink, generated power in a thermoelectric, speed and power in an integrated circuit, and/or power generated in a solar collection device.
Distributions of devices generated by the network continuously shift towards high-performance design space regions over the course of optimization. Upon training completion, the best-generated devices have efficiencies comparable to or exceeding the best devices designed using standard topology optimization. Similar processes in accordance with several embodiments of the invention can generally be applied to gradient-based optimization problems in various fields, such as (but not limited to) optics, mechanics and electronics. Systems and methods in accordance with a number of embodiments of the invention train and employ generative neural networks to produce high performance, topologically complex metasurfaces in a computationally efficient manner.
Systems and methods in accordance with many embodiments of the invention provide a global optimizer, based on a generative neural network, which can output highly efficient topology-optimized metasurfaces operating across a range of parameters. A key feature of the network in accordance with a number of embodiments of the invention is the presence of a noise vector at the network input, which enables the full design parameter space to be sampled by many device instances at once. In a variety of embodiments, training can be performed by calculating the forward and adjoint electromagnetic simulations of outputted devices and using the subsequent efficiency gradients for backpropagation. With metagratings operating across a range of wavelengths and angles as a model system, devices produced from trained global optimization networks in accordance with certain embodiments of the invention can have efficiencies comparable to the best devices produced by brute force topology optimization. This reframing of adjoint-based optimization as the training of a generative neural network can apply generally to physical systems that support performance improvements by gradient descent.
Systems and methods in accordance with numerous embodiments of the invention introduce a new concept in electromagnetic device design by incorporating adjoint variable calculations directly into generative neural networks. Systems in accordance with some embodiments of the invention are capable of generating high performance topology-optimized devices spanning a range of operating parameters with modest computational cost. In numerous embodiments, a global search can be performed in the design space by sampling many device instances, which cumulatively span the design space, and optimize the responses of the device instances using physics-based calculations over the course of network training. As a model system, an ensemble of silicon metagratings can be designed that operate across a range of wavelengths and deflection angles. Although many of the examples described herein are specified for silicon metagratings, one skilled in the art will recognize that similar systems and methods can be used in a variety of applications, including (but not limited to) aperiodic, broadband devices, without departing from this invention.
A system that can be used for generative design in accordance with some embodiments of the invention is shown in
Users may use personal devices 180 and 120 that connect to the network 160 to perform processes for receiving, performing and/or interacting with a deep learning network that uses systems and methods for training models and/or generating designs in accordance with various embodiments of the invention. In the illustrated embodiment, the personal devices 180 are shown as desktop computers that are connected via a conventional “wired” connection to the network 160. However, the personal device 180 may be a desktop computer, a laptop computer, a smart television, an entertainment gaming console, or any other device that connects to the network 160 via a “wired” and/or “wireless” connection. The mobile device 120 connects to network 160 using a wireless connection. A wireless connection is a connection that uses Radio Frequency (RF) signals, Infrared signals, or any other form of wireless signaling to connect to the network 160. In
Although a specific example of a system for generative design is illustrated in
An example of a generative design element for training and/or utilizing a generative model in accordance with a number of embodiments is illustrated in
Peripherals 225 can include any of a variety of components for capturing data, such as (but not limited to) cameras, displays, and/or sensors. In a variety of embodiments, peripherals can be used to gather inputs and/or provide outputs. Peripherals and/or communications interfaces in accordance with many embodiments of the invention can be used to gather inputs that can be used to train and/or design various generative elements.
In several embodiments, memory 230 is any form of storage configured to store a variety of data, including, but not limited to, a generative design application 232, training data 234, and model data 236. Generative design application 232 in accordance with some embodiments of the invention directs the processor 210 to perform any of a variety of processes, such as (but not limited to) using data from training data 234 to update model parameters 236 in order to train and utilize generative models to generate outputs. In a variety of embodiments, generative design applications can perform any of a number of different functions including (but not limited to) simulating performance of generated outputs, computing global losses, global optimization, and/or retraining generative models based on generated, simulated, and/or optimized outputs. In some embodiments, training data can include “true” data which a conditional generator is being trained to imitate. For example, true data can include examples of highly efficient metagrating designs, which can be used to train a generator to generate new samples of metagrating designs. Alternatively, or conjunctively, generative design models in accordance with many embodiments of the invention can be trained using no training data at all. Model data or parameters can include data for generator models, discriminator models, and/or other models that can be used in the generative design process.
Although a specific example of a generative design element is illustrated in
A generative design application for training a model for generative design in accordance with an embodiment of the invention is conceptually illustrated in
Although a specific example of a training and generation application is illustrated in
A process for training a model to generate candidate device designs in accordance with an embodiment of the invention is conceptually illustrated in
In many GAN implementations, such as the image generation of faces or furniture, there are no quantitative labels that can be used to evaluate the quality of training or generated datasets. Training in accordance with several embodiments of the invention can quantify the quality of the training and generated data by evaluating device efficiency using an electromagnetics solver (e.g., the Reticolo RCWA electromagnetic solver in MATLAB). In numerous embodiments, for a fraction of network training iterations, network loss is directly calculated as a function of the efficiencies of the training and generated devices, as evaluated with an electromagnetics solver. This value can then be back-propagated to improve the networks in a manner that matches the desired physical output in accordance with some embodiments of the invention. The discriminator can directly learn to differentiate between low and high efficiency devices, while the generator can directly learn to generate high efficiency devices.
Process 400 uses the trained model to generate (410) candidate device designs. Trained models in accordance with certain embodiments of the invention are conditional GANs that can be trained to produce outputs with a specified set of parameters. In a number of embodiments, candidate devices include devices with extrapolated parameters that are specified to lie outside of the range of parameters used to train the model. Process 400 filters (415) the candidate device designs to identify the best candidate device designs. The best candidate device designs in accordance with a variety of embodiments of the invention are determined based on a simulation of the performance or efficiency of a candidate design. In a variety of embodiments, filtering the candidate devices can include (but is not limited to) one or more of selecting a certain number of candidate devices, selecting a percentage of the candidate devices, and sampling a diverse sample of candidate devices from a range of simulated performance.
Process 400 optimizes (420) the filtered candidate device designs. Optimizing the filtered candidate device designs in accordance with numerous embodiments of the invention can improve the device efficiencies, incorporate robustness to fabrication imperfections, and enforce other experimental constraints within the candidate device designs. Process 400 determines (425) whether to retrain the generator model. In a number of embodiments, processes can determine to retrain a generator model based on any of a number of criteria, including (but not limited to) after a predetermined period of time, after a number of candidate designs are optimized, after a threshold simulated performance is reached, etc. When process 400 determines (425) to retrain the generator model, the process returns to step 405 to retrain the model using the generated optimized designs. In certain embodiments, retraining models based on generated, optimized, and extrapolated designs can allow a retrained model to generate better candidates for further extrapolated designs. Retraining in accordance with various embodiments of the invention uses the generated designs as new ground truth images to retrain the generator based on new features identified in the generated designs. When process 400 determines (425) that the model is not to be retrained, the process outputs (430) the generated optimized designs. In certain embodiments, processes can use generated optimized designs both for retraining and as output of the generation process. The generated optimized designs in accordance with various embodiments of the invention can be used to fabricate and characterize high efficiency beam deflectors and lenses.
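For illustration only, the generation (410), filtering (415), and optimization (420) steps of process 400 might be sketched as follows, assuming a trained conditional generator and hypothetical simulate and refine routines standing in for the electromagnetic solver and the iterative topology optimizer:

```python
import torch

def generate_and_select(generator, simulate, refine, wavelength, angle,
                        n_candidates=5000, n_keep=50):
    z = torch.randn(n_candidates, 128)              # noise provides layout diversity
    lam = torch.full((n_candidates,), wavelength)
    theta = torch.full((n_candidates,), angle)
    with torch.no_grad():
        candidates = generator(z, lam, theta)       # (410) generate candidate designs
    # (415) filter candidates based on simulated performance
    efficiencies = torch.tensor([float(simulate(c)) for c in candidates])
    top = torch.topk(efficiencies, n_keep).indices
    # (420) refine the best candidates, e.g., with ~30 adjoint iterations each
    refined = [refine(candidates[i]) for i in top]
    return refined  # output (430) and/or additional training data for retraining
```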
Although various processes for training models and generating device designs are discussed above with reference to
Processes in accordance with a variety of embodiments of the invention can produce a high-quality training dataset consisting of high-resolution images of topology-optimized metagratings. Example silicon metagratings are illustrated in
Representative images from a training dataset in accordance with many embodiments of the invention are shown in
Conditional GANs in accordance with a variety of embodiments of the invention consist of two separate networks, a generator and a discriminator. An example of a conditional GAN in accordance with an embodiment of the invention is illustrated in
The network structure of the elements of a specific example of a conditional GAN in accordance with a variety of embodiments of the invention is described in the tables below.
The input to the generator is a 128×1 vector of Gaussian random variables, the operating wavelength λ, and the output deflection angle θ. In a variety of embodiments, these input values can be normalized to numbers between −1 and 1. In a number of embodiments, the output of the generator, as well as the input to the discriminator, can include binary images on a 64×256 grid, which is half of one unit cell. Mirror symmetry along the y-axis can be enforced by using reflecting padding in the convolution and deconvolution layers in accordance with many embodiments of the invention. In many embodiments, periodic padding can be used to capture the periodic nature of the metagratings. In some embodiments, the training dataset can be augmented by including multiple copies of the same devices in the training dataset, with each copy randomly translated along the x-axis.
Generators in accordance with a number of embodiments of the invention can be trained to produce images of new devices. Inputs for generators can include (but are not limited to) one or more of the metagrating deflection angle θ, operating wavelength λ, and/or an array of normally-distributed random numbers, which can provide diversity to the generated device layouts. In a number of embodiments, discriminators can be trained to distinguish between actual devices from the training dataset and those from the generator.
The training process can be described as a two-player game in which the generator tries to fool the discriminator by generating real-looking devices, while the discriminator tries to identify and reject generated devices from a pool of generated and real devices. In this manner, the discriminator serves as a simulator that evaluates the performance of the generator and learns based on this information. In numerous embodiments, a generator and a discriminator are alternately trained over many iterations, and each network improves after each iteration. Upon completion, generators in accordance with numerous embodiments of the invention will have learned the underlying topological features from optimized metagratings, and will be able to produce new, topologically complex devices for a desired deflection angle and wavelength input. The diversity of devices produced by the generator reflects the use of a random noise input in a probabilistic model.
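A sketch of how such a generator might be assembled is given below, using the input and output dimensions described above (a 128×1 noise vector plus normalized λ and θ mapped to a 64×256 half-cell image). The intermediate layer shapes are illustrative assumptions, and the reflecting and periodic padding discussed above are omitted for brevity.

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    """Illustrative conditional generator: (noise, λ, θ) -> 64x256 half-cell image."""
    def __init__(self, noise_dim=128):
        super().__init__()
        self.fc = nn.Linear(noise_dim + 2, 256 * 4 * 16)  # +2 conditioning inputs: λ, θ
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh(),
        )

    def forward(self, z, lam, theta):
        # λ and θ are assumed pre-normalized to values between -1 and 1.
        x = torch.cat([z, lam[:, None], theta[:, None]], dim=1)
        x = self.fc(x).view(-1, 256, 4, 16)              # 4x16 seed, upsampled 16x
        return self.deconv(x)                            # (batch, 1, 64, 256)
```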
A specific example of an implementation of a conditional generative network, with specific hyperparameters and other details, in accordance with a number of embodiments of the invention is described below. However, one skilled in the art will recognize that many different hyperparameters and models can be used without departing from the essence of the invention.
In this example, during the training process, both the generator and discriminator use an optimizer (e.g., the Adam optimizer, gradient descent, etc.) with a batch size of 128, a learning rate of 0.001, a beta1 of 0, and a beta2 of 0.99. The improved Wasserstein loss is used with a gradient penalty, with lambda=10 (31, 32). In this example, the network was trained on one Tesla K80 GPU for 1000 iterations, which takes about 5 minutes.
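The improved Wasserstein loss with gradient penalty referenced above can be sketched as follows. This is the generic WGAN-GP penalty with λ = 10, not a reproduction of the exact implementation used in this example:

```python
import torch

def gradient_penalty(discriminator, real, fake, cond, lam=10.0):
    """Penalize discriminator gradient norms away from 1 on interpolated samples."""
    eps = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    d_out = discriminator(interp, cond)
    grads = torch.autograd.grad(d_out.sum(), interp, create_graph=True)[0]
    return lam * ((grads.flatten(1).norm(2, dim=1) - 1) ** 2).mean()

# Optimizer settings from the text (both networks):
# opt = torch.optim.Adam(net.parameters(), lr=1e-3, betas=(0.0, 0.99))
```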
Generators in accordance with many embodiments of the invention can be trained to produce different layouts of devices operating at a given deflection angle (e.g., 70 degrees) and a given wavelength (e.g., 1200 nm). At 1200 nm, the operating wavelength is red-shifted beyond those of all devices used for training. Device generation, even for thousands of devices, is computationally efficient and takes only a few seconds using a standard computer processing unit. In several embodiments, device efficiencies can be calculated using electromagnetic solvers (e.g., a rigorous coupled-wave analysis solver or full-wave Maxwell solvers).
Device efficiency distributions for devices generated in accordance with an embodiment of the invention are illustrated in
In various embodiments, high efficiency devices produced by the generative design system can be further refined with iterative topology optimization. This additional refinement serves multiple purposes. First, it can further improve the device efficiencies. Second, it can incorporate robustness to fabrication imperfections into the metagrating designs, which makes experimentally fabricated devices more tolerant to processing defects. Third, it can enforce other experimental constraints, such as grid snapping or minimum feature size. In several embodiments, relatively few iterations (e.g., ˜30 iterations) of topology optimization can be used at this stage, because the devices from the generative design system are already highly efficient and near a local optimum in the design space.
In this example, the performance of devices produced by the generative design system is quantified by simulating the diffraction efficiencies of the generated devices with the RCWA solver Reticolo. A test dataset consisting of 935,000 generated devices was used. The wavelengths of these devices range from 500 nm to 1300 nm with a step size of 50 nm, and the target deflection angles range from 35 degrees to 85 degrees with a step size of 5 degrees. There are 5000 device instances of each wavelength and deflection angle combination. The simulations were run in parallel on the Stanford computing cluster Sherlock, and the computation time was 15 seconds per device. The 50 most efficient devices for each wavelength and deflection angle combination (indicated by the dashed box) were then iteratively refined with adjoint-based topology optimization. Because the GAN output patterns are quite close to optimal patterns, relatively few iterations are required to refine them.
The final device efficiency distributions are plotted in the second part 810 of
With topology refinement in accordance with a number of embodiments of the invention, devices can be optimized to be robust to geometric erosion and dilation. To enforce physical robustness constraints in the generated designs, modifications to the GAN can be made at the network architecture level and in the training process in accordance with numerous embodiments of the invention. Robustness constraints can be essential to generating devices that are tolerant to random experimental fabrication perturbations. Devices defined by an “intermediate” pattern are robust to both global and local perturbations if their geometrically “eroded” and “dilated” forms are also high efficiency. At an architectural level, these robustness criteria can be mimicked in the GAN discriminator by using image sets of the intermediate, eroded, and dilated devices as inputs. By enforcing low network loss for these sets of devices, the robustness properties of the training set devices can be learned by the generator.
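One way to construct the intermediate, eroded, and dilated device copies described above is with standard morphological operations. The sketch below uses scipy, and the erosion/dilation radius is an assumed stand-in for the fabrication tolerance:

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def robustness_set(pattern, radius=2):
    """Return (eroded, intermediate, dilated) copies of a binary device pattern.

    pattern: 2D boolean array (True = silicon); radius: assumed over-/under-etch
    in pixels. All three copies can be evaluated (or discriminated) together.
    """
    structure = np.ones((2 * radius + 1, 2 * radius + 1), dtype=bool)
    eroded = binary_erosion(pattern, structure=structure)
    dilated = binary_dilation(pattern, structure=structure)
    return eroded, pattern, dilated
```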
Representative images of high efficiency metagratings from the generator are shown in
Designing robust, high-efficiency metagratings with the GAN generator and iterative optimizer can be applied to a broad range of desired deflection angles and wavelengths. With the same training data from before, robust metagratings can be designed with operating wavelengths ranging from 500 nm to 1300 nm, in increments of 50 nm, and angles ranging from 35 to 85 degrees, in increments of 5 degrees. In a number of embodiments, models can be trained to generate devices with parameters beyond the parameters found in a training dataset. Processes in accordance with numerous embodiments of the invention train a model by iteratively generating extended training samples (i.e., training samples with parameters incrementally beyond the current training dataset) and training the conditional generator on the extended training samples. A plot of device efficiencies for metagratings produced by an example GAN generator is illustrated in the first chart 1205 of
Chart 1210 shows a plot of the device efficiencies of the best generated devices after topology refinement. Nearly all the metagratings with wavelengths in the 600-1300 nm range and angles in the 35-75 degree range have efficiencies near or over 80%. These data indicate that a conditional GAN in accordance with a number of embodiments of the invention can broadly generalize to wavelengths and angles beyond those specified in the training dataset and effectively produce high performance devices.
Not all the devices produced with methods in accordance with some embodiments of the invention exhibit high efficiencies, as chart 1210 shows clear drop-offs in efficiencies for devices designed for shorter wavelengths and ultra-large deflection angles. One source for this observed drop-off is that these devices are in a parameter space that requires topologically distinctive features not found in the training dataset. As such, the conditional GAN can have difficulties learning the proper patterns required to generate high performance devices. In addition, there are device operating regimes for which efficient beam deflection is not physically possible with 325 nm-thick silicon metagratings. For example, device efficiency will drop off as the operating wavelength becomes substantially larger than the device thickness.
An important feature of conditional GANs in accordance with various embodiments of the invention is that the scope of their capabilities can be enhanced by network retraining with additional data. In many embodiments, the data for retraining a conditional GAN can originate from two sources. The first is from the iterative optimization of initial random dielectric distributions, which is how the initial metagrating training dataset is produced in accordance with a variety of embodiments of the invention. The second is from the GAN generator and iterative optimizer themselves, which yield high efficiency devices. This second source of training data suggests a pathway to expanding the efficacy of a conditional GAN with high computational efficiency.
As a proof-of-concept, the generator and iterative optimizer are used to produce 6000 additional high efficiency (70%+) robust metagratings with wavelengths and angles spanning the full parameter space. A plot of device efficiencies for metagratings produced by a GAN generator retrained on the generated metagratings is illustrated in chart 1215 of
Chart 1220 illustrates a plot of differences in device efficiencies between those produced by the retrained GAN generator in 1215 and those produced by the initial GAN generator in 1205. Quantitatively, over 80% of the devices in the parameter space have improved efficiencies after retraining as illustrated in chart 1220. For all plots in
A comparison between the generated output devices and the training dataset is described below. Chart 1305 of
In a variety of embodiments, conditional GANs can provide clear advantages in computational efficiency compared to brute force topology optimization when producing many thousands of high performance devices, including many devices for each wavelength and angle pair. In particular, with a higher dimensional parameter space, brute force optimization methods simply cannot scale, making data-driven methods a necessary route to the design of topologically-complex devices. Further, the identification of large numbers of high performance devices, as can be attained using methods in accordance with certain embodiments of the invention, can be important because it enables the use of statistical, large data analyses to deepen the understanding of the high-dimensional phase space for metasurface design. Having a diversity of device layouts for a given optical function can also be practically useful in experimental implementation to account for any constraints in the fabrication process.
A comparison of GAN-based computation cost and network retraining efficacy results for generating devices is illustrated in
The data used for this analysis are taken from a broad range of wavelength and angle pairs and are summarized in
The table above illustrates the computational cost of device generation and refinement using a conditional GAN in accordance with a variety of embodiments of the invention. The average percentage of refined GAN-generated devices that are “above threshold” is 12% (as shown in
An illustration of the overall design platform is illustrated in
In summary, generative neural networks can facilitate the computationally efficient design of high performance, topologically-complex metasurfaces. Neural networks are a powerful and appropriate tool for this design problem for two reasons. First, there exists a strong interdependence between device topology and optical response, particularly for high performance devices. Second, the combination of iterative optimizers and accurate electromagnetic solvers allows for the generation of high quality training data and the validation of device performance. Data-driven design processes in accordance with various embodiments of the invention can apply to the design and characterization of other complex nanophotonic devices, ranging from dielectric and plasmonic antennas to photonic crystals. One skilled in the art will recognize that methods in accordance with several embodiments of the invention can be similarly applied to the design of devices and structured materials in other fields, such as (but not limited to) acoustics, mechanics, and electronics, where there exist strong relationships between structure and response.
In various embodiments, generative neural networks can produce high efficiency, topologically complex metasurfaces in a highly computationally efficient manner. As a model system, conditional generative adversarial networks can be utilized to produce highly-efficient metagratings over a broad range of deflection angles and operating wavelengths. Generated device designs in accordance with a number of embodiments of the invention can be further locally optimized and/or serve as additional training data for network refinement. Data-driven design tools in accordance with numerous embodiments of the invention can be broadly utilized in other domains of optics, acoustics, mechanics, and electronics.
Systems and methods in accordance with various embodiments of the invention present a novel global optimization method that can optimize the generation of various elements, such as (but not limited to) metagratings, grating couplers, on-chip photonic devices (splitters, mode converters, etc.), scalar diffractive optics, optical antennas, and/or solar cells. Global optimization methods in accordance with various embodiments of the invention can also be used to optimize other types of systems such as (but not limited to) acoustic, mechanical, thermal, electronic, and geological systems. The inverse design of metasurfaces is a non-convex optimization problem in a high dimensional space, making global optimization a huge challenge. In various embodiments, processes can combine adjoint variables electromagnetic calculations with a generative neural network to realize high performance photonic structures.
While approaches in accordance with some embodiments of the invention can use adjoint-based gradients to optimize metagrating generation, it is qualitatively different from adjoint-based topology optimization. Adjoint-based topology optimization, as applied to a single device, is a local optimizer. The algorithm takes an initial dielectric distribution and enhances its efficiency by adjusting its refractive indices at each segment using gradient descent. This method is performed iteratively until the device reaches a local maximum in the design space. The performance of the final device strongly depends on the choice of initial dielectric distribution. These local optimizers can be used in a global optimization scheme by performing topology optimization on many devices, each with different initial dielectric distributions that span the design space. Devices that happen to have initial dielectric distributions near favorable regions of the design space will locally optimize in those regions and become high performing.
A comparison between adjoint-based topology optimization and global optimization is illustrated in
This global approach with topology optimization is an effective route to designing a wide range of photonic devices. However, its usage is accompanied by a number of caveats. First, it requires significant computational resources. Hundreds of electromagnetic simulations are required to topology optimize a single device, and for many devices, this number of simulations can scale to very large numbers. Second, the sampling of the design space is limited to the number of devices being optimized. For complex devices described by a very high dimensional design space, this sampling may be insufficient. Third, the devices locally optimize independently of one another, such that gradient information from one device does not impact other devices. As a result, it is not possible for the optimizer to explore beyond the local design spaces demarcated by the initial device distributions.
Approaches in accordance with various embodiments of the invention are qualitatively different in that they can optimize an entire distribution of device instances, as represented by the noise vector. In a variety of embodiments, the starting point of each iteration is similar to adjoint optimization and involves the calculation of efficiency gradients for individual devices using the adjoint method. However, the difference arises when these gradients are backpropagated into the network. When considering the backpropagation of the efficiency gradient from even a single device, all the weights in the network get updated, thereby modifying the mapping of the entire distribution of device instances to device layouts. This points to the presence of crosstalk, in which the gradients from one device instance influence other device instances. Crosstalk is useful because devices in promising parts of the design space exhibit particularly large gradients and can more strongly bias the overall distribution of device instances to these regions. Devices stuck in sub-optimal local maxima of the design space can be biased away from these regions. Regulation of the amount of crosstalk between devices, which is important to stabilizing the optimization method, can be achieved through the non-linearity intrinsic to the neural network itself.
Approaches in accordance with numerous embodiments of the invention are effective at broadly surveying the design space, enhancing the probability that optimal regions of the design space are sampled and exploited. Such global surveying is made possible in part because the input noise in accordance with several embodiments of the invention represents a continuum of device instances spanning the high dimensional design space, and in part because different subsets of devices can be sampled in each iteration, leading to the cumulative sampling of different regions of the design space. Further, systems and methods in accordance with certain embodiments of the invention can enable the simultaneous optimization of devices designed across a continuum of operating parameters in a single network training session. In the case of metagratings, these parameters can include the outgoing angle and wavelength, each spanning a broad range of values. This co-design can lead to a substantial reduction in computation time per device and is made possible because these devices operate with related physics and strongly benefit from crosstalk from the network training process.
An example schematic of a silicon metagrating that deflects normally-incident transverse magnetic (TM)-polarized light of wavelength λ to an outgoing angle θ is illustrated in
The objective of optimization is to search for the metagrating pattern that maximizes deflection efficiency. In this example, the metagratings consist of silicon nanoridges and deflect normally-incident light to the +1 diffraction order. The thickness of the gratings is fixed at 325 nm and the incident light is TM-polarized. For each period, the metagrating is subdivided into N=256 segments, each possessing a refractive index value between silicon and air during the optimization process. These refractive index values are the design variables in this problem and are specified as x (a 1×N vector).
The deflection efficiency is defined as the power of light going into the desired direction of deflection angle θ, normalized to the power of the incident light. The deflection efficiency is a nonlinear function of the index profile, Eff = Eff(x), governed by Maxwell's equations. This quantity, together with the electric field profiles within a device, can be accurately solved using a wide range of electromagnetic solvers.
In numerous embodiments, an optimization objective can be to maximize the deflection efficiency of the metagrating at a specific operating wavelength λ and outgoing angle θ:

max_x Eff(x; λ, θ)

Here, physical devices that possess binary index values in the vector, x ∈ {−1, 1}^N, are of particular interest, where −1 represents air and +1 represents silicon.
A schematic of a generative neural network-based optimization in accordance with an embodiment of the invention is illustrated in
Instead of directly optimizing a single device, which is the case for the adjoint variables method, processes in accordance with several embodiments of the invention can optimize a distribution of devices by training a generative neural network. In many embodiments, processes do not require any pre-prepared training data. In a variety of embodiments, the input of the generator can be a random noise vector z ∈ (−a, a)^N, which has the same dimension as the output device index profile x ∈ [−1, 1]^N, where a is the noise amplitude. The generator can be parameterized by ϕ, which relates z to x through a nonlinear mapping: x = Gϕ(z). In other words, the generator maps a uniform distribution of noise vectors to a device distribution, Gϕ: (−a, a)^N → Pϕ, where Pϕ(x) defines the probability of x in the device space S = [−1, 1]^N.
In a number of embodiments, objectives of the optimization can be framed as maximizing the probability of the highest efficiency device in S:

L̂ = ∫S Pϕ(x) δ(Eff(x) − Effmax) dx   (Equation 2)

While such an objective function is rigorous, it cannot be directly used for network training for two reasons. The first is that the derivative of the δ function is nearly always zero. To circumvent this issue, the δ function can be rewritten in Gaussian form:

δ(Eff(x) − Effmax) ∝ lim σ→0 exp(−(Eff(x) − Effmax)²/σ²)

By substituting the δ function with this Gaussian form and leaving σ as a tunable parameter, Equation 2 can be relaxed to become:

L̂ = ∫S Pϕ(x) exp(−(Eff(x) − Effmax)²/σ²) dx   (Equation 4)

The second reason is that the objective function depends on the maximum efficiency Effmax, which is unknown. To address this problem, Equation 4 can be approximated with a different function, namely the exponential function exp((Eff(x) − Effmax)/σ). This approximation works because Pϕ(x) = 0 for any device with Eff(x) > Effmax, and the new function only needs to approximate that in Equation 4 for efficiency values less than Effmax. With this approximation, Effmax can be removed from the integral:

L̂ = A ∫S Pϕ(x) exp(Eff(x)/σ) dx

A = exp(−Effmax/σ) is a normalization factor and does not affect the optimization. In a number of embodiments, the precise form of the approximation function can vary and be tailored depending on the specific optimization problem. In practice, a batch of devices {x(m)}, m = 1, …, M, can be sampled from Pϕ, and the objective function can be further approximated as:

L̂ ≈ (1/M) Σm exp(Eff(m)/σ)
In many cases, the deflection efficiency of device x can be calculated using an electromagnetic solver, such that Eff(x) is not directly differentiable for backpropagation. To bypass this problem, the adjoint variables method can be used to compute an efficiency gradient with respect to the refractive indices for device x:

g = ∂Eff/∂x

To summarize, in various embodiments, the electric field terms from the forward simulation Efwd can be calculated by propagating a normally-incident electromagnetic wave from the substrate to the device. The electric fields from the adjoint simulation Eadj can be calculated by propagating an electromagnetic wave in the direction opposite of the desired outgoing direction from the forward simulation. The efficiency gradient g in accordance with many embodiments of the invention can be calculated by integrating the overlap of those electric field terms over each device segment:

g ∝ Re(Efwd · Eadj)
Finally, the adjoint gradients and objective function can be used to define the loss function L = L(x, g). In some embodiments, L can be defined such that minimizing L is equivalent to maximizing the objective function L̂ during generator training. With this definition, L must satisfy ∂L/∂x(m) = −(1/(Mσ)) exp(Eff(m)/σ) g(m), and one suitable definition is:

L = −(1/(Mσ)) Σm exp(Eff(m)/σ) (x(m) · g(m))

Eff(m) and g(m) are independent variables calculated from the electromagnetic solver, which are detached from x(m). In a variety of embodiments, a regularization term −|x|·(2−|x|) can be added to L to ensure binarization of the generated patterns. This term reaches a minimum when generated patterns are fully binarized. In certain embodiments, a coefficient γ can be introduced to balance binarization with efficiency enhancement in the final loss function:

L = −(1/M) Σm [ (1/σ) exp(Eff(m)/σ) (x(m) · g(m)) + γ |x(m)|·(2−|x(m)|) ]
In numerous embodiments, the loss can then be backpropagated through the generator to update the weights of the model.
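Collecting the preceding expressions, a minimal sketch of this loss is given below. The solver outputs Eff(m) and g(m) are detached from x(m), as described above; the σ and γ values are illustrative:

```python
import torch

def global_optimization_loss(x, eff, g, sigma=0.5, gamma=0.05):
    """Loss whose gradient w.r.t. x(m) is -(1/(M*sigma))*exp(Eff(m)/sigma)*g(m),
    plus a binarization regularizer weighted by gamma.

    x:   (M, N) generated index profiles in [-1, 1] (differentiable)
    eff: (M,)   efficiencies from forward simulations (detached)
    g:   (M, N) adjoint gradients dEff/dx (detached)
    """
    weight = torch.exp(eff / sigma) / sigma                  # efficiency bias
    efficiency_term = (weight[:, None] * g * x).sum(dim=1)
    binarization_term = gamma * (x.abs() * (2 - x.abs())).sum(dim=1)
    return -(efficiency_term + binarization_term).mean()

# loss = global_optimization_loss(x, eff.detach(), g.detach())
# loss.backward()  # gradients flow only through x into the generator weights
```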
In numerous embodiments, global optimization networks can be conditional networks that can generate outputs according to particular input parameters. A schematic of a global optimization network for conditional metagrating generation in accordance with an embodiment of the invention is illustrated in
The output is the refractive index values of the device, n. The weights of the neurons are parameterized as w. Initially, the weights in the network are randomly assigned and different z map onto different device instances: n = Gw(z; λ, θ). In this initial network state, the ensemble of noise vectors {z} maps onto an ensemble of device instances {n} that span the device design space. The ensembles of all possible z and corresponding n, given (λ, θ) as inputs, are denoted {z} and {n|λ, θ}, respectively.
An important feature of neural networks in accordance with a number of embodiments of the invention is the ability to incorporate layers of neurons at the output of a network. Layers in accordance with some embodiments of the invention can perform mathematical operations on the output device. In some embodiments, the last layer of the generator is a Gaussian filter, which eliminates small, pixel-level features that are impractical to fabricate. Output neuron layers in accordance with a variety of embodiments of the invention can include (but are not limited to) Gaussian filters, binarization filters, etc. The only constraint with these mathematical operations is that they need to be differentiable, so that they support backpropagation during network training.
In numerous embodiments, optimization networks can include differentiable filters or operators for specific purposes. Optimization networks in accordance with several embodiments of the invention can use a Gaussian filter to remove small features, which performs convolution between input images and a Gaussian kernel. In several embodiments, optimization networks can use binarization functions (e.g., a tanh function) to binarize the images. Gradients of the loss function are able to backpropagate through those filters to neurons, so that the generated images are improved within the constraint of those filters. Filters and operators in accordance with a number of embodiments of the invention can include (but are not limited to) Fourier transform, Butterworth filter, Elliptic filter, Chebyshev filter, Elastic deformation, Projective transformation, etc. Examples of the effects of filter layers in accordance with a variety of embodiments of the invention are illustrated in
In several embodiments, proper network initialization is used to ensure that a network at the start of training maps noise vectors {z} to the full design space. Processes in accordance with a number of embodiments of the invention can randomly assign the weights in the network with small values (e.g., using Xavier initialization), which sets the outputs of the last deconvolution layer to be close to 0. In certain embodiments, processes can directly add the noise vector z to the output of the last deconvolution layer using an "identity shortcut." In some such embodiments, the dimensionality of z is matched with n. In a number of embodiments, by combining the random weight assignments with the identity shortcut, the initial ensemble of all possible generated device instances {n|λ, θ} can have approximately the same distribution as the ensemble of noise vectors {z}, and it therefore spans the full device design space.
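The differentiable output stage and identity-shortcut initialization described above might be expressed as follows; the kernel size, smoothing width, and one-dimensional layout are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeneratorOutput(nn.Module):
    """Output stage: identity shortcut, Gaussian filter, tanh binarization."""
    def __init__(self, kernel_size=5, sigma=1.5):
        super().__init__()
        r = torch.arange(kernel_size, dtype=torch.float32) - kernel_size // 2
        kernel = torch.exp(-r ** 2 / (2 * sigma ** 2))
        self.register_buffer("kernel", (kernel / kernel.sum()).view(1, 1, -1))

    def forward(self, deconv_out, z):
        # With small (e.g., Xavier) initial weights, deconv_out ≈ 0, so the
        # identity shortcut makes initial outputs approximately equal to z.
        x = deconv_out + z                                   # shapes: (batch, 1, N)
        pad = self.kernel.shape[-1] // 2
        x = F.pad(x, (pad, pad), mode="circular")            # periodic structure
        x = F.conv1d(x, self.kernel)                         # removes pixel-scale features
        return torch.tanh(x)                                 # pushes values toward -1/+1
```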
During network training, the goal in accordance with certain embodiments of the invention is to iteratively optimize the weights w to maximize an objective function of the form:

L̂ = (1/M) Σm exp((Eff(m) − Effmax(λ(m), θ(m)))/σ)

The term Effmax(λ(m), θ(m)) is the theoretical maximum efficiency for each wavelength and angle pair. In practice, Effmax(λ(m), θ(m)) is unknown, as it represents the efficiencies of the globally optimal devices. In several embodiments, over the course of network training, Effmax(λ(m), θ(m)) can be estimated as the highest cumulative efficiency calculated from the batches of generated devices. Eff(m) is the efficiency of the mth device and can be directly calculated (e.g., with a forward electromagnetic simulation). The expression exp((Eff(m) − Effmax(λ(m), θ(m)))/σ) represents a bias term that preferentially weighs higher efficiency devices during network training and reduces the impact of low efficiency devices that are potentially trapped in undesirable local optima. In a variety of embodiments, the magnitude of this efficiency biasing term can be tuned with the hyperparameter σ.
In numerous embodiments, the gradient of the loss function with respect to the indices, for the mth device, is:

∂L/∂n(m) = −(1/M) exp((Eff(m) − Effmax(λ(m), θ(m)))/σ) g(m)

In this form, minimizing the loss function L is equivalent to maximizing the device efficiencies in each batch. To train the network and update w in accordance with some embodiments of the invention, backpropagation can be used to calculate ∂L/∂w each iteration.
To ensure that the generated devices are binary, a regularization term in accordance with certain embodiments of the invention can be added to the loss function. Regularization terms in accordance with some embodiments of the invention can be −|n(m)|·(2−|n(m)|). This term reaches a minimum when |n(m)| = 1 and the device segments are either silicon or air. Binarization conditions in accordance with many embodiments of the invention can serve as a design constraint that limits metagrating efficiency, as the efficiency enhancement term (Equation 12) favors grayscale patterns. To balance binarization with efficiency enhancement in the loss function, processes in accordance with many embodiments of the invention can include a tunable hyperparameter β. The final expression for the loss function in accordance with certain embodiments of the invention is:

L = −(1/M) Σm [ exp((Eff(m) − Effmax(λ(m), θ(m)))/σ) (n(m) · g(m)) + β |n(m)|·(2−|n(m)|) ]
In many embodiments, the gradients of efficiency with respect to n, which specify how the device indices can be modified to improve the objective function, can be calculated for each device. For the ith segment of the mth device, which has the refractive index n_i^(m), this gradient, weighted by the efficiency bias of Equation (12) and normalized to M, is defined as

$$g_i^{(m)} = \frac{1}{M\sigma}\exp\!\left(\frac{\mathrm{Eff}^{(m)} - \mathrm{Eff}_{\max}(\lambda^{(m)}, \theta^{(m)})}{\sigma}\right)\frac{\partial\,\mathrm{Eff}^{(m)}}{\partial n_i^{(m)}}$$
To ensure that the gradient for backpropagation for each device has this form, the objective function can be defined to be:

$$\tilde{L} = -\sum_{m=1}^{M}\sum_{i} n_i^{(m)}\, g_i^{(m)},$$

where each g_i^(m) is treated as a constant with respect to the network weights.
The gradient of this objective function with respect to the index, at the ith output neuron for the mth device, is

$$\frac{\partial \tilde{L}}{\partial n_i^{(m)}} = -g_i^{(m)},$$

matching the desired expression in Equation (13). To calculate the gradients applied to w each iteration, the efficiency gradients can be backpropagated for each of the M devices and the resulting gradients averaged on w.
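This dot-product surrogate can be written compactly. The following is a minimal sketch under the assumption that the simulator supplies efficiencies and adjoint gradients as plain tensors; `surrogate_loss` and its argument names are illustrative:

```python
import torch

def surrogate_loss(patterns: torch.Tensor, efficiencies: torch.Tensor,
                   adjoint_grads: torch.Tensor, eff_max: torch.Tensor,
                   sigma: float = 0.6) -> torch.Tensor:
    # patterns:      (M, N) generator outputs n, attached to the autograd graph
    # efficiencies:  (M,)   Eff^(m) from forward simulations (no graph)
    # adjoint_grads: (M, N) dEff/dn from forward/adjoint simulations (no graph)
    # eff_max:       (M,)   running estimate of the maximum efficiency
    M = patterns.shape[0]
    bias = torch.exp((efficiencies - eff_max) / sigma)        # Eq. (12)
    g = (bias / (M * sigma)).unsqueeze(1) * adjoint_grads     # weighted gradients
    # Autograd then yields dL/dn = -g for each segment, matching Eq. (13).
    return -(patterns * g.detach()).sum()
```

Calling `surrogate_loss(...).backward()` deposits exactly the desired per-segment gradients on the generator outputs, which backpropagation then carries to the weights w.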
In various embodiments, efficiency gradients can be calculated using the adjoint variables method, which is used in adjoint-based topology optimization. These gradients are calculated from electric and magnetic field values taken from forward and adjoint electromagnetic simulations. In a number of embodiments, neural networks, in which the non-linear mapping between (λ, θ) and device layout are iteratively improved using physics-driven gradients, can be viewed as a reframing of the adjoint-based optimization process. Unlike other manifestations of machine learning-enabled photonics design, approaches in accordance with various embodiments of the invention do not use or require a training set of known devices but instead can learn the physical relationship between device geometry and response directly through electromagnetic simulations.
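For illustration, the core of the adjoint overlap calculation can be sketched as follows; the proportionality constants, field normalization, and the chain rule from permittivity to refractive index are omitted, and the field arrays are assumed to come from an external electromagnetic solver:

```python
import torch

def efficiency_gradient(e_forward: torch.Tensor, e_adjoint: torch.Tensor,
                        cell_volume: float) -> torch.Tensor:
    # e_forward, e_adjoint: (N, 3) complex electric fields sampled in each
    # device cell from the forward and adjoint simulations, respectively.
    # The adjoint variables method gives dEff/d(permittivity) proportional
    # to the real part of the overlap of the two fields in each cell.
    overlap = (e_forward * e_adjoint).sum(dim=-1)  # dot product over x, y, z
    return cell_volume * overlap.real
```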
Although many of the examples herein are described with reference to efficiency gradients, one skilled in the art will recognize that similar performance gradients can be used in a variety of applications, including (but not limited to) other types of efficiency gradients and/or other types of performance gradients, without departing from this invention. Performance gradients in accordance with a variety of embodiments of the invention can include heat conductivity in a thermal conductor used as a heat sink, generated power in a thermoelectric, speed and power in an integrated circuit, and/or power generated in a solar collection device. In the case of aperiodic broadband devices, an efficiency gradient can include the weighted summation of efficiency gradients at different wavelengths.
In examples described herein, the architecture of the generative neural network is adapted from DCGAN and comprises two fully connected layers, four transposed-convolution layers, and a Gaussian filter at the end to eliminate small features. One skilled in the art will recognize that similar systems and methods can be used in a variety of applications, without departing from this invention. Examples described herein use LeakyReLU activation functions, except for the last layer, which uses a tanh, but one skilled in the art will recognize that various activation functions can be used in a variety of applications, without departing from this invention. Architectures in accordance with some embodiments of the invention can include dropout layers and/or batchnorm layers to enhance the diversity of the generated patterns. In a number of embodiments, periodic paddings can be used to account for the fact that the devices are periodic structures.
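A sketch of such a generator in PyTorch follows; the layer widths, noise dimensionality, and output length are illustrative assumptions, and the Gaussian filter from the earlier sketch would be appended after the final tanh:

```python
import torch
import torch.nn as nn

class MetagratingGenerator(nn.Module):
    # Two fully connected layers followed by four transposed-convolution
    # layers; LeakyReLU activations throughout, tanh on the output.
    def __init__(self, noise_dim: int = 64, cond_dim: int = 2):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(noise_dim + cond_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 64 * 16), nn.LeakyReLU(0.2),
        )

        def up(c_in: int, c_out: int) -> nn.Sequential:
            # Each stage doubles the spatial resolution; batchnorm and
            # dropout help diversify the generated patterns.
            return nn.Sequential(
                nn.ConvTranspose1d(c_in, c_out, kernel_size=4, stride=2, padding=1),
                nn.BatchNorm1d(c_out), nn.LeakyReLU(0.2), nn.Dropout(0.1),
            )

        self.deconv = nn.Sequential(
            up(64, 32), up(32, 16), up(16, 8),
            nn.ConvTranspose1d(8, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        # cond carries target parameters such as (wavelength, angle).
        h = self.fc(torch.cat([z, cond], dim=1)).view(-1, 64, 16)
        return torch.tanh(self.deconv(h))  # (batch, 1, 256) patterns in [-1, 1]
```

In a fuller implementation, circular (periodic) padding would replace the zero padding inside the convolutional stages to respect the device periodicity.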
An example network architecture of a conditional global optimization network in accordance with an embodiment of the invention is illustrated in
During the training process in accordance with a number of embodiments of the invention, generators can use the Adam optimizer with a batch size of 1250, a learning rate of 0.001, β1 of 0.9, β2 of 0.99, and σ of 0.6. In a variety of embodiments, conditional global optimization networks can be trained for a number of iterations (e.g., 1000). β is 0 for a first portion (e.g., 500) of the iterations and is increased (e.g., to 0.2) for the remaining iterations. In certain embodiments, Eff_max can be updated multiple times during training.
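These settings can be collected into a small configuration sketch; `MetagratingGenerator` is the illustrative class from the earlier sketch, and the schedule values are those stated above:

```python
import torch

generator = MetagratingGenerator()
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3, betas=(0.9, 0.99))
BATCH_SIZE, ITERATIONS, SIGMA = 1250, 1000, 0.6

def beta_schedule(iteration: int) -> float:
    # Binarization is disabled for the first half of training, then enabled.
    return 0.0 if iteration < 500 else 0.2
```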
A process for training a global optimization network is illustrated in
Process 2300 generates (2305) a plurality of designs. Designs in accordance with some embodiments of the invention can be any of a number of different types of designs that can be simulated by a physics-based engine. In many embodiments, in order to generate the designs, the generator is provided with inputs to direct the generation. Inputs in accordance with numerous embodiments of the invention can include, but are not limited to, random noise vectors and target design parameters (e.g., target wavelength). In a variety of embodiments, random noise vectors are sampled from a latent space. In order to fully sample the space in the early stage of training, batch sizes in accordance with certain embodiments of the invention can initially be relatively large and then gradually reduce to a small number when design samples start to cluster. The parameter a should be a relatively large number (~10-40) for Xavier initialization. By conditioning global optimization networks with a continuum of operating parameters, ensembles of devices can be simultaneously optimized, further reducing overall computation cost.
Process 2300 simulates (2310) a performance of each design. Simulations in accordance with many embodiments of the invention can be used to determine various characteristics of each generated design. In various embodiments, simulations are performed using an electromagnetics solver that can perform forward and adjoint simulations. Simulations in accordance with a variety of embodiments of the invention can be performed in parallel across multiple processors and/or machines in existing cloud and/or server computing infrastructures.
Process 2300 computes (2315) a global loss for the plurality of designs. In some embodiments, global losses allow each candidate design to contribute to the global loss that will be backpropagated through the generator. Global losses in accordance with a number of embodiments of the invention can be weighted based on a performance metric (e.g., efficiency) to bias the generator to generate high-performance designs.
Process 2300 updates (2320) the generator based on the computed global loss. In several embodiments, updating the generator comprises backpropagating the global loss through the generator.
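Steps 2305-2320 can be combined into a single training iteration. The following minimal sketch assumes the illustrative helpers defined earlier (`surrogate_loss`) plus two hypothetical functions: `sample_targets`, which draws target (wavelength, angle) pairs, and `simulate`, which stands in for the parallel electromagnetic solver:

```python
import torch

def train_step(generator, optimizer, simulate, eff_max,
               batch_size: int = 100, noise_dim: int = 64, sigma: float = 0.6):
    # (2305) Generate a batch of candidate designs from noise + target params.
    z = torch.randn(batch_size, noise_dim)
    cond = sample_targets(batch_size)
    patterns = generator(z, cond).squeeze(1)

    # (2310) Simulate each design; the solver returns efficiencies and
    # adjoint gradients dEff/dn, typically computed in parallel.
    with torch.no_grad():
        eff, grads = simulate(patterns, cond)
    eff_max = torch.maximum(eff_max, eff)  # refine the Eff_max estimate

    # (2315) Compute the global, efficiency-weighted loss.
    loss = surrogate_loss(patterns, eff, grads, eff_max, sigma)

    # (2320) Update the generator by backpropagating the global loss.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return eff_max
```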
In a variety of embodiments, once a generator has been trained, it can be used to generate candidate designs, where a number of the candidate designs are selected for further processing, such as (but not limited to) optimization, fabrication, implementation, etc. In certain embodiments, the number is a pre-selected number (e.g., 1, 5, etc.). Alternatively, or conjunctively, all elements with characteristics exceeding a threshold value (e.g., an efficiency value) are selected. By taking the best device from the optimized device batch $\{\hat{x}^{(m)} \mid x^{(m)} \sim P_{\phi^*}\}_{m=1}^{M}$, there is a possibility for the optimizer to reach the global optimum.
Results of global optimization processes in accordance with a number of embodiments of the invention using a simple testing case are illustrated in
$$\mathrm{Eff}(x_1, x_2) = \exp(-2x_1^2)\cos(9x_1) + \exp(-2x_2^2)\cos(9x_2) \tag{15}$$
which is a non-convex function with plenty of local optima and one global optimum at (0, 0). Algorithm 1 is used to search for the global optimum, with hyperparameters α=1e−3, β1=0.9, β2=0.999, a=30, and σ=0.5, and a constant batch size of M=100. The generator is trained for 150 iterations, and the generated samples over the course of training are shown as red dots in stages 2405-2420. Initially, the samples spread out over the x space and then gradually converge to a cluster located at the global optimum. No samples are trapped in local optima. This experiment was repeated 100 times, and 96 of the runs successfully found the global optimum.
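The test function and a check of its global optimum, as a short sketch:

```python
import torch

def test_efficiency(x: torch.Tensor) -> torch.Tensor:
    # Eq. (15): non-convex, with many local optima and one global optimum.
    x1, x2 = x[:, 0], x[:, 1]
    return (torch.exp(-2 * x1 ** 2) * torch.cos(9 * x1)
            + torch.exp(-2 * x2 ** 2) * torch.cos(9 * x2))

# Each term is at most 1, attained only at 0, so the global maximum
# Eff(0, 0) = 2 confirms the optimum at the origin.
print(test_efficiency(torch.zeros(1, 2)))  # tensor([2.])
```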
In another example, processes in accordance with several embodiments of the invention are applied to the inverse design of 63 different types of metagratings, each with differing operating wavelengths and deflection angles. The wavelengths λ range from 800 nm to 1200 nm, in increments of 50 nm, and the deflection angles θ range from 40 degrees to 70 degrees, in increments of 5 degrees.
Processes in accordance with numerous embodiments of the invention are compared with brute-force topology optimization. For each design target (λ, θ), 500 random gray-scale vectors are each iteratively optimized using efficiency gradients with respect to the device patterns. Efficiency gradients are calculated from forward and adjoint simulations. In this example, a threshold filter is used to binarize the device patterns. Each starting point is also optimized for 200 iterations, and the highest efficiency device among the 500 candidates is taken as the final design.
In many inverse design approaches, brute-force searching with local optimizers is used to find the global optimum. With brute-force searching, a large number of device patterns are randomly initialized and then optimized individually using gradient descent. The highest efficiency device among those optimized devices is taken as the final design. With this approach, many devices typically become trapped in local optima. Additionally, finding the global optimum in a very high dimensional space is more challenging with this method.
In several embodiments, a distribution of devices can be collectively optimized. As indicated in Equation 11, higher efficiency devices bias the generator more than low-efficiency devices, which can help avoid low-efficiency local optima. The device distribution dynamically changes during the training process, and over the course of optimization, more calculations are performed to explore more promising parts of the design space and to move away from low-efficiency local optima.
Comparative results of brute-force strategies and global optimization processes in accordance with numerous embodiments of the invention are illustrated in
Efficiency histograms, for select wavelength and angle pairs, of devices designed using brute-force topology optimization (top row) and generative neural network-based optimization (bottom row) are illustrated in
In this example, the hyperparameters are set to α=0.05, β1=0.9, β2=0.99, a=40, β=0.2, and γ=0.05. The initial batch size is 500 and gradually decreases to 20. To prevent vanishing gradients when the generated patterns are binarized as x∈{−1,1}^N, the last activation function tanh is replaced with 1.02·tanh. For each combination of wavelength and angle, the generator is trained for 200 iterations. When training is complete, 500 device samples are produced by the generator and the highest efficiency device is taken as the final design.
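The modified output activation is a one-line change; the factor 1.02 is the value stated above:

```python
import torch

def scaled_tanh(x: torch.Tensor) -> torch.Tensor:
    # Plain tanh saturates exactly at +/-1, so fully binarized targets would
    # sit where the gradient vanishes; scaling by 1.02 keeps {-1, 1}
    # reachable at points of nonzero slope.
    return 1.02 * torch.tanh(x)
```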
Comparison with Adjoint-Based Topology Optimizer
To benchmark devices designed from processes in accordance with various embodiments of the invention, results from adjoint-based topology optimization and global optimization networks (or generative design networks) are illustrated in
The efficiency values of plots 2705 and 2710 indicate that the best devices from global optimization networks compare well with the best devices from adjoint-based optimization. Statistically, 57% of devices from global optimization networks have efficiencies higher than those from adjoint-based optimization, and 87% have efficiencies within 5% of, or higher than, those from adjoint-based optimization. While global optimization performs well for most wavelength and angle values, it does not perform optimally in certain regimes, such as wavelengths of 1200 nm to 1300 nm and deflection angles of 50 degrees to 60 degrees. In several embodiments, these nonidealities can be improved with further refinement of the network architecture and training process.
The efficiency histograms from adjoint-based topology optimization and global optimization, for select wavelength and angle pairs, are illustrated in
A visualization of the evolution of device patterns and efficiency histograms as a function of unconditional global optimization training is illustrated in
An examination of total computation time indicates that global optimization is computationally efficient when simultaneously optimizing a broad range of devices operating at different wavelengths and angles. In this example, the total number of simulations to train the global optimization network is 1,200,000: the network trains over 30,000 iterations, uses batch sizes of M=20 device instances per iteration, and uses a forward and adjoint simulation per device to compute its efficiency gradient. When divided by the 150 unique wavelength and angle combinations, the number of simulations per wavelength and angle pair is 8,000, which amounts to 20 adjoint-based topology optimization runs (one run has 200 iterations and 2 simulations/iteration). As a point of comparison, 500 adjoint-based topology optimization runs were required to produce the adjoint-based optimizations.
Efficiency histograms of generated devices for unconditional and conditional global optimization at various iterations are illustrated in
In a variety of embodiments, generated devices can be further refined using adjoint-based boundary optimization. Example results of adjoint-based boundary optimization in accordance with a number of embodiments of the invention are illustrated in
In several embodiments, instead of optimizing many devices individually, global optimization for non-convex problems can be reframed as the training of a generator to generate high performing devices with high probability. Efficiency gradients of multiple device samples can collectively improve the performance of the generator, which helps explore the whole device space and avoid low-efficiency local optima. Systems and methods in accordance with numerous embodiments of the invention can be applied to other complex systems, such as (but not limited to) 2D or 3D metasurfaces, multi-function metasurfaces, and other photonics design problems. Multi-function metasurface design can require optimizing multiple objectives simultaneously.
Certain issues can arise in the generative design of higher-dimension metasurfaces. First, upon scaling, the design space becomes exponentially larger, making a search through this space highly computationally expensive and potentially intractable. Consider, as an example, a two-layer metasurface, where each layer is 128 by 256 pixels: the total number of possible device configurations is 2^65,536, which is an immense number. The global optimization problem amounts to searching for a needle in a haystack the size of many universes. Systems and methods in accordance with various embodiments of the invention can initially train a global optimization network on a problem with much coarser spatial resolution, which is a much more tractable problem in a lower dimension design space. The spatial resolution of the network can then be progressively increased, through the addition of deconvolution layers at the output of the global optimization network, and the network can be retrained after each addition. In the example of a two-layer metasurface, each device layer can be specified to have a spatial resolution of 8 by 16 pixels. The total number of possible device configurations is then 2^256, which is tractable and similar to the 1D metagrating device space described above. The resolution can then be increased (e.g., to 16 by 32 pixels, 32 by 64 pixels, 64 by 128 pixels, and then 128 by 256 pixels).
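A sketch of one growth step follows; `GrownGenerator` and its layer sizes are illustrative assumptions, and in practice the grown network is retrained before the next growth step:

```python
import torch
import torch.nn as nn

class GrownGenerator(nn.Module):
    # Wraps a trained low-resolution generator and appends a
    # transposed-convolution stage that doubles the output resolution.
    def __init__(self, base: nn.Module, channels: int = 8):
        super().__init__()
        self.base = base
        self.upsample = nn.Sequential(
            nn.ConvTranspose1d(1, channels, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv1d(channels, 1, kernel_size=3, padding=1),
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        low_res = self.base(z, cond)   # (batch, 1, N) coarse pattern
        return self.upsample(low_res)  # (batch, 1, 2N) refined pattern
```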
Progressive growth optimization networks in accordance with numerous embodiments of the invention assume that the design space for high quality, low spatial resolution devices puts the search in some proximity of the desired region of the overall design space. As spatial resolution increases and the dimensionality of the design space increases, global optimization networks can function more as a local optimizer at these more limited regions of the design space. In many embodiments, a low resolution optimization is performed with a global optimization network. High-performing designs are selected, and a higher-resolution optimization is performed in the "space" of the high-performing designs. As such, instead of searching for a needle in a giant haystack, the process can first start with a smaller haystack that has the main qualitative features of the giant haystack, find a low resolution needle, grow the haystack, and repeat. In a variety of embodiments, the distribution of generated designs is tuned to be broader or narrower at different levels of the search. For example, processes in accordance with various embodiments of the invention can generate broad distributions early in the training process, with increasingly narrower distributions as the process continues.
Figuring out network architectures and hyperparameters for a specific design problem can be difficult. Typically, these parameters are hand-tuned by a data scientist using a combination of experience, intuition, and heuristics. In many cases, the network parameters tend to depend strongly on the specific problem of interest, meaning that they need to be constantly modified as the design problem changes. Also, there are many candidate architectures and hyperparameters to draw from, making it unclear which to try. In addition, global optimization networks are an entirely new type of neural network concept for which there is no preexisting experience or intuition.
Systems and methods in accordance with several embodiments of the invention can utilize concepts in meta-learning to discover and refine network architectures and hyperparameters suitable for global optimization systems. An example of a meta-learning system is illustrated in
Approaches in accordance with a number of embodiments of the invention can provide an effective and computationally-efficient global topology optimizer for metagratings. In some embodiments, a global search through the design space is possible because the generative neural network can optimize the efficiencies of device distributions that initially span the design space. The best devices generated by global optimization compare well with the best devices generated by adjoint-based topology optimization. Although specific examples of generative design networks or global optimization networks are described herein, one skilled in the art will recognize that networks with various parameters, such as (but not limited to) network architecture, input noise characteristics, and training parameters, can be used as appropriate to a variety of applications, without departing from this invention. Adjustment of parameters in accordance with some embodiments of the invention can lead to higher performance and more robustness to stochastic variations in training. Systems and methods in accordance with many embodiments of the invention can be applied to various other metasurface systems, including (but not limited to) aperiodic, broadband devices. In various embodiments, systems and methods can apply to the design of other classes of photonic devices and more broadly to other physical systems in which device performance can be improved by gradient descent.
While specific processes for global optimization are described above, any of a variety of processes can be utilized for global optimization as appropriate to the requirements of specific applications. In certain embodiments, steps may be executed or performed in any order or sequence not limited to the order and sequence shown and described. In a number of embodiments, some of the above steps may be executed or performed substantially simultaneously where appropriate or in parallel to reduce latency and processing times. In some embodiments, one or more of the above steps may be omitted. Although the above embodiments of the invention are described in reference to metasurface design, the techniques disclosed herein may be used in any of several types of gradient-based generative processes, including (but not limited to) optics design, and/or optimizations of acoustic, mechanical, thermal, electronic, and geological systems.
Although the present invention has been described in certain specific aspects, many additional modifications and variations would be apparent to those skilled in the art. It is therefore to be understood that the present invention may be practiced otherwise than specifically described, including with any of a variety of models and machine learning techniques to train generators and generate metagratings, without departing from the scope and spirit of the present invention. Thus, embodiments of the present invention should be considered in all respects as illustrative and not restrictive.
The current application is a U.S. national phase of PCT Application No. PCT/US2019/041414 entitled, “Systems and Methods for Generative Models for Design”, filed Jul. 11, 2019, which claims the benefit of and priority under 35 U.S.C. § 119(e) to U.S. Provisional Patent Application No. 62/696,700 entitled “Metamaterial Discovery Based on Generative Neural Networks” filed Jul. 11, 2018, U.S. Provisional Patent Application No. 62/772,570 entitled “Systems and Methods for Data-Driven Metasurface Discovery” filed Nov. 28, 2018, and U.S. Provisional Patent Application No. 62/843,186 entitled “Global Optimization of Dielectric Metasurfaces Using a Physics-driven Neural Network” filed May 3, 2019. The disclosures of PCT Application No. PCT/US2019/041414 and U.S. Provisional Patent Application Nos. 62/696,700, 62/772,570, and 62/843,186 are hereby incorporated by reference in their entirety for all purposes.
This work was supported by the U.S. Air Force under Award Number FA9550-18-1-0070 and the Office of Naval Research under Award Number N00014-16-1-2630.