A Monte Carlo simulation is a simulation in which a probability distribution is estimated by generating random samples and categorizing those random samples to generate the estimate. Some forms of Monte Carlo simulations are subject to a burn-in phenomenon, in which a large number of initial samples are generated and discarded. Burn-in represents a large portion of simulation time.
A more detailed understanding can be had from the following description, given by way of example in conjunction with the accompanying drawings wherein:
Techniques are disclosed for performing a Monte Carlo simulation. The techniques include obtaining an initial Monte Carlo simulation sample from a trained machine learning model, and including the initial Monte Carlo simulation sample in a sample distribution; generating a subsequent Monte Carlo simulation sample from the Monte Carlo simulation sample most recently included into the sample distribution; determining whether to include the subsequent Monte Carlo simulation sample into the sample distribution based on an inclusion criterion; and repeating the generating and determining steps until a termination criterion is met.
In various alternatives, the processor 102 includes a central processing unit (CPU), a graphics processing unit (GPU), a CPU and GPU located on the same die, or one or more processor cores, wherein each processor core can be a CPU or a GPU. In various alternatives, the memory 104 is located on the same die as the processor 102, or is located separately from the processor 102. The memory 104 includes a volatile or non-volatile memory, for example, random access memory (RAM), dynamic RAM, or a cache.
The storage 106 includes a fixed or removable storage, for example, without limitation, a hard disk drive, a solid state drive, an optical disk, or a flash drive. The input devices 108 include, without limitation, a keyboard, a keypad, a touch screen, a touch pad, a detector, a microphone, an accelerometer, a gyroscope, a biometric scanner, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals). The output devices 110 include, without limitation, a display, a speaker, a printer, a haptic feedback device, one or more lights, an antenna, or a network connection (e.g., a wireless local area network card for transmission and/or reception of wireless IEEE 802 signals).
The input driver 112 and output driver 114 include one or more hardware, software, and/or firmware components that are configured to interface with and drive input devices 108 and output devices 110, respectively. The input driver 112 communicates with the processor 102 and the input devices 108, and permits the processor 102 to receive input from the input devices 108. The output driver 114 communicates with the processor 102 and the output devices 110, and permits the processor 102 to send output to the output devices 110.
In a Markov Chain Monte Carlo (“MCMC”) simulation, a simulator performs a “walk” to generate samples for a sample distribution in sequence. The simulator generates any given sample by modifying an immediately prior sample by a random amount and determining whether to include the sample in the sample distribution based on some inclusion criteria. When this process is terminated, the sample distribution is considered to be an estimate of the probability distribution being estimated.
There are a wide variety of possible inclusion criteria. One example is dictated by the Metropolis-Hastings algorithm. To use this algorithm, it must be possible to calculate the ratio of densities of any two values in the true distribution (that is, the distribution being estimated). A “density” or probability density function of a continuous random variable is a function whose value for any given sample in the sample space (the set of possible values for the continuous random variable) provides a relative likelihood that the value of the random variable would equal that sample.
According to the Metropolis-Hastings algorithm, the simulator generates a candidate sample for inclusion into the sample distribution by modifying a sample already included. The simulator calculates the ratio of the probability density of the candidate sample to the probability density of the sample from which the candidate was generated. If this ratio is greater than one, then the simulator includes the candidate sample into the sample distribution. If the ratio is not greater than one, then the simulator generates a random number between 0 and 1. If this random number is greater than the ratio, then the simulator rejects the candidate sample, and if the random number is less than or equal to the ratio, then the simulator includes the candidate sample into the sample distribution. The simulator continues performing the above operations, generating new candidate samples and including or not including those samples into the sample distribution as described. The resultant sample distribution should converge to the true probability distribution given enough samples. Although the Metropolis-Hastings algorithm has been described as an example inclusion criterion, it should be understood that any technically feasible inclusion criteria could be used.
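The acceptance rule described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a definitive implementation: the function name `metropolis_hastings`, the Gaussian random-walk proposal, and the `step_size` and `seed` parameters are all hypothetical. The sketch works with log densities so that only the ratio of two densities is ever needed, and it follows the textbook convention of recording the current sample again when a candidate is rejected, a detail the description above does not spell out.

```python
import math
import random


def metropolis_hastings(log_density, initial, n_steps, step_size=1.0, seed=0):
    """Random-walk sampler for a 1-D target (illustrative sketch).

    log_density is a hypothetical callable returning the log of a density
    that may be unnormalized; only density ratios are used.
    """
    rng = random.Random(seed)
    current = initial
    samples = [current]  # the initial sample joins the sample distribution
    for _ in range(n_steps):
        # Candidate = prior sample modified by a random amount.
        candidate = current + rng.gauss(0.0, step_size)
        # Log of (candidate density / prior-sample density).
        log_ratio = log_density(candidate) - log_density(current)
        # Ratio >= 1: always include. Otherwise include when a uniform
        # random number in [0, 1) is less than or equal to the ratio.
        if log_ratio >= 0 or rng.random() <= math.exp(log_ratio):
            current = candidate
        # Assumption: on rejection the current sample is recorded again.
        samples.append(current)
    return samples


def standard_normal_log_density(x):
    # Unnormalized log density of a standard Normal distribution.
    return -0.5 * x * x


samples = metropolis_hastings(standard_normal_log_density, 0.0, 20000)
sample_mean = sum(samples) / len(samples)
sample_var = sum((s - sample_mean) ** 2 for s in samples) / len(samples)
print(f"mean={sample_mean:.3f} variance={sample_var:.3f}")
```

Because the Gaussian proposal is symmetric, the ratio of target densities alone decides acceptance; an asymmetric proposal would need an additional correction factor in the ratio.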
Although the sample distribution converges to the true distribution given enough samples, it is possible that such convergence would take an extremely large number of samples. This is because, if the initial sample is far from a location of “high probability,” and is thus in a location of “low probability,” then the simulator will have to generate a large number of samples before generating samples of relatively high probability. The samples generated in these areas of low probability will skew the sample distribution unless an extremely large number of samples are generated.
To counteract the above effect, a technique referred to as burn-in is frequently used.
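As a rough illustration of burn-in, the following Python sketch starts a chain far from the high-probability region of a standard Normal target and compares the sample mean with and without the first samples discarded. The helper names, the starting point of 50.0, and the burn-in length of 1000 samples are assumptions chosen for the example; in practice the burn-in length depends on the problem.

```python
import math
import random


def random_walk_chain(log_density, initial, n_steps, step_size, rng):
    """Random-walk Metropolis chain; records one value per step."""
    current = initial
    chain = [current]
    for _ in range(n_steps):
        candidate = current + rng.gauss(0.0, step_size)
        log_ratio = log_density(candidate) - log_density(current)
        if log_ratio >= 0 or rng.random() <= math.exp(log_ratio):
            current = candidate
        chain.append(current)
    return chain


def discard_burn_in(chain, n_burn_in):
    """Drop the first n_burn_in samples (the burn-in period)."""
    return chain[n_burn_in:]


def standard_normal_log_density(x):
    return -0.5 * x * x


rng = random.Random(1)
# Assumed poor starting point, far from the mode at 0.
chain = random_walk_chain(standard_normal_log_density, 50.0, 5000, 1.0, rng)
kept = discard_burn_in(chain, 1000)

raw_mean = sum(chain) / len(chain)
kept_mean = sum(kept) / len(kept)
print(f"mean with early samples: {raw_mean:.3f}")
print(f"mean after burn-in:      {kept_mean:.3f}")
```

The early samples pull the raw mean away from the true mean of 0, while the mean of the retained samples sits much closer to it. The cost is that every discarded sample still had to be generated.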
In
To train the model, a model generator 502 accepts the training data items 506 and trains the machine learning model 504 based on those training data items 506. Each training data item 506 is associated with a particular probability distribution. Specifically, the distribution characterizing data 510 is data that characterizes the probability distribution in some way. In some examples, the distribution characterizing data 510 is data that characterizes a mathematical description of the probability distribution. In an example, the distribution characterizing data 510 includes coefficients for a function associated with the probability distribution, such as the density function or a different function. In some examples, the distribution characterizing data 510 also or alternatively includes numerical values for one or more parameters for a mathematical function that mathematically describes the probability distribution. In various examples, the distribution characterizing data 510 includes statistical parameters, such as a distribution type (e.g., Normal, Weibull), mean, standard deviation, and scale parameter. In various examples, the distribution characterizing data 510 includes a parametric description of a physical model that is being modeled statistically with the distribution. In an example, the Monte Carlo simulation is performed to determine an electron density distribution for a configuration of atoms. In this example, the distribution characterizing data 510 includes parameters such as the types of the atoms (e.g., element number and isotope number) and the positions of the atoms. In other examples, the Monte Carlo simulation is performed to determine other physical characteristics of other systems, and the distribution characterizing data 510 includes one or more physical parameters of those systems.
The high-density sample 508 is a sample for the probability distribution associated with the training data item 506. The notion that the sample 508 is “high density” means that the sample is in an area of high probability for a particular probability distribution. There are many possible ways to characterize a “high density” sample. In an example, the high-density sample is the mean of a probability distribution. (For a vector, in some examples, the mean is a vector including the mean of each element in the vectors of the probability distribution.) In other examples, the high-density sample is the median, mode, or other value that is found within a part of the probability distribution that has “high probability” within that distribution. In some examples, the high-density sample is the sample having the highest value for the probability density function. In an example, the high-density sample is a point that nearly satisfies the governing equations in integral form.
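A few of the characterizations listed above can be computed directly, as in the following Python sketch. The sample values, the Normal density with mean 3.0 and standard deviation 1.0, and the helper names `normal_pdf` and `mode_on_grid` are hypothetical and serve only to illustrate the idea; the grid search is one simple, assumed way to locate the highest value of a density function.

```python
import math
import statistics


def normal_pdf(x, mu, sigma):
    """Probability density function of a Normal distribution."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def mode_on_grid(pdf, lo, hi, n=1001):
    """Return the grid point in [lo, hi] with the highest density value."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return max(xs, key=pdf)


# Hypothetical samples previously drawn from some distribution.
previous_samples = [1.0, 2.0, 2.5, 3.0, 6.5]

candidates = {
    "mean": statistics.fmean(previous_samples),
    "median": statistics.median(previous_samples),
    "highest_density": mode_on_grid(lambda x: normal_pdf(x, 3.0, 1.0), -2.0, 8.0),
}
print(candidates)
```

Any of these values could serve as the label of a training data item, depending on which notion of “high density” suits the distribution at hand.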
In other words, the training data items 506 are items with which the model generator 502 trains the initial sample machine learning model 504 to generate a high-density sample (label) for a probability distribution when provided with data characterizing that probability distribution. The training data items 506 provide labels in the form of high-density samples 508, and input data in the form of distribution-characterizing data 510. The model generator 502 trains the model 504 to produce a high-density sample 508 in response to input data that is analogous to the distribution-characterizing data 510.
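The training relationship described above, with distribution-characterizing data as input and a high-density sample as label, can be illustrated with a deliberately small Python sketch. Everything here is an assumption made for illustration: the training items describe Normal distributions by (mean, standard deviation), the label is the mode of each distribution, and the model is a linear regressor fitted by gradient descent. The disclosure does not prescribe a model type, and a practical model would likely be far more expressive.

```python
import random


def make_training_items(n_items, rng):
    """Build hypothetical training items: ((mean, std), high_density_sample)."""
    items = []
    for _ in range(n_items):
        mean = rng.uniform(-5.0, 5.0)
        std = rng.uniform(0.5, 2.0)
        # For a Normal distribution the highest-density point is the mean.
        items.append(((mean, std), mean))
    return items


def predict(model, features):
    """Apply the linear model to distribution-characterizing features."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, features)) + bias


def train_linear_model(items, epochs=1000, learning_rate=0.05):
    """Fit weights and bias by full-batch gradient descent on squared error."""
    weights, bias = [0.0, 0.0], 0.0
    n = len(items)
    for _ in range(epochs):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for features, label in items:
            error = predict((weights, bias), features) - label
            for j, x in enumerate(features):
                grad_w[j] += 2.0 * error * x / n
            grad_b += 2.0 * error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


model = train_linear_model(make_training_items(100, random.Random(0)))
# Inference: characterize a new distribution and ask for an initial sample.
initial_sample = predict(model, (3.0, 1.0))
print(f"predicted initial sample: {initial_sample:.3f}")
```

For this toy data the model should learn to return roughly the supplied mean, which is the high-density point of the described distribution.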
At step 702, the simulator system 600 accepts subject-characterizing data which characterizes a probability distribution that the simulator system 600 is trying to generate a sample distribution for. The subject-characterizing data is similar to the distribution-characterizing data in that the subject-characterizing data is associated with and characterizes a particular probability distribution that the simulator system 600 is attempting to determine through simulation. In various examples, the simulator system 600 obtains this subject-characterizing data automatically from a computer system or from input provided by a human operator. The simulator system 600 applies the subject-characterizing data to the inference system 604. The inference system 604 applies the subject-characterizing data to the initial sample machine learning model 504, which outputs an initial sample. The inference system 604 provides this initial sample to the Monte Carlo simulator 602, which performs a Monte Carlo simulation starting with the initial sample.
At step 704, the Monte Carlo simulator 602 performs a Markov Chain Monte Carlo simulation using the generated initial sample. In various examples, the Monte Carlo simulator 602 performs the simulation as described elsewhere herein. The Monte Carlo simulator 602 includes the initial sample into the sample distribution. At step 706, the Monte Carlo simulator 602 generates a new sample based on that initial sample, by modifying the initial sample by a random amount. The Monte Carlo simulator 602 determines whether to include the generated sample into the sample distribution or to discard the sample based on inclusion criteria. Some examples of inclusion criteria, such as the Metropolis-Hastings algorithm, are described elsewhere herein. The Monte Carlo simulator 602 includes the sample into the sample distribution if the inclusion criteria indicate that the sample should be included and does not include the sample if the inclusion criteria indicate that the sample should not be included. The Monte Carlo simulator 602 generates another sample in a similar manner from the most recently added sample, and determines whether to add that sample to the sample distribution based on inclusion criteria as described above. The Monte Carlo simulator 602 continues generating samples and adding accepted samples to the sample distribution until a termination criterion is met. In examples, the termination criterion includes that a certain number of samples have been generated or that the Monte Carlo simulator 602 receives a termination signal from, for example, a user. At step 708, the Monte Carlo simulator 602 outputs the generated sample distribution as the resulting sample distribution.
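Steps 702 through 708 can be strung together as in the Python sketch below. It is a simplified illustration rather than the disclosed implementation: `predict_initial_sample` is a hypothetical stand-in for the trained model 504, the subject-characterizing data is assumed to be a dictionary holding the mean and standard deviation of a Normal distribution, the inclusion criterion is the Metropolis-Hastings rule with a symmetric random-walk proposal, and the termination criterion is a fixed count of generated samples. As in common practice, the current sample is recorded again when a candidate is rejected, which is an assumption of this sketch.

```python
import math
import random


def predict_initial_sample(subject_data):
    """Hypothetical stand-in for the trained model: returns a high-density point."""
    return subject_data["mean"]


def run_simulation(subject_data, log_density, max_generated=10000,
                   step_size=1.0, seed=0):
    rng = random.Random(seed)
    # Step 702: obtain the initial sample from the (stand-in) model.
    current = predict_initial_sample(subject_data)
    # Step 704: the initial sample is included in the sample distribution.
    sample_distribution = [current]
    generated = 0
    # Termination criterion (assumed): a fixed number of generated samples.
    while generated < max_generated:
        # Step 706: modify the most recently included sample by a random amount.
        candidate = current + rng.gauss(0.0, step_size)
        generated += 1
        log_ratio = log_density(candidate) - log_density(current)
        if log_ratio >= 0 or rng.random() <= math.exp(log_ratio):
            current = candidate
        sample_distribution.append(current)
    # Step 708: output the resulting sample distribution.
    return sample_distribution


subject = {"mean": 10.0, "std": 2.0}


def subject_log_density(x):
    z = (x - subject["mean"]) / subject["std"]
    return -0.5 * z * z


result = run_simulation(subject, subject_log_density)
result_mean = sum(result) / len(result)
print(f"samples={len(result)} mean={result_mean:.3f}")
```

Because the walk begins at a high-density point, no samples are set aside as burn-in in this sketch; every generated sample contributes to the output.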
Use of the initial sample that is in a “high-probability” area of the probability distribution that is being estimated helps to reduce or eliminate the burn-in period. In the example of
In various implementations, the inference system 604, Monte Carlo simulator 602, and model generator 502 are located within a computer system such as the computer system 100 of
It should be understood that many variations are possible based on the disclosure herein. Although features and elements are described above in particular combinations, each feature or element can be used alone without the other features and elements or in various combinations with or without other features and elements.
The methods provided can be implemented in a general purpose computer, a processor, or a processor core. Suitable processors include, by way of example, a general purpose processor, a special purpose processor, a conventional processor, a graphics processor, a machine learning processor, a digital signal processor (DSP), a plurality of microprocessors, one or more microprocessors in association with a DSP core, a controller, a microcontroller, Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) circuits, any other type of integrated circuit (IC), and/or a state machine. Such processors can be manufactured by configuring a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediary data including netlists (such instructions capable of being stored on a computer readable media). The results of such processing can be maskworks that are then used in a semiconductor manufacturing process to manufacture a processor which implements features of the disclosure.
The methods or flow charts provided herein can be implemented in a computer program, software, or firmware incorporated in a non-transitory computer-readable storage medium for execution by a general purpose computer or a processor. Examples of non-transitory computer-readable storage mediums include a read only memory (ROM), a random access memory (RAM), a register, cache memory, semiconductor memory devices, magnetic media such as internal hard disks and removable disks, magneto-optical media, and optical media such as CD-ROM disks, and digital versatile disks (DVDs).