The field of machine learning may be described as a subfield of computer science that gives computers the ability to learn without being explicitly programmed. Machine learning typically involves generating computer code from a specification that includes a set of examples that define a desired computational task that the code executes.
Implementations generally relate to generating circuits. In one implementation, a method includes generating accumulated statistics, where the generating of the accumulated statistics is repeated based on one or more generation criteria. In some implementations, the generating of the accumulated statistics includes: generating one or more changed versions of a current circuit where the current circuit includes logic cells that perform Boolean operations, generating statistics associated with performance of the one or more changed versions based on the application of a training set to each changed version, and adding the generated statistics associated with each changed version to the accumulated statistics. The method further includes selecting a changed version from the one or more changed versions based on the accumulated statistics, updating the current circuit based on the selected changed version, and repeating the method multiple times based on the accumulated statistics.
Other aspects and advantages of the described implementations will become apparent from the following detailed description, taken in conjunction with the accompanying drawings, illustrating by way of example the principles of the described implementations.
Implementations generally relate to generating circuits. A system generates networks of simple computing elements that perform a desired computing task that is specified by a training set. Unlike the conventional methods for creating artificial neural networks (ANNs), implementations described herein generate a network including logic cells that perform Boolean logic operations, which are more efficient than the computing elements used in ANNs.
Implementations enable the automated generation of circuits to perform a desired computing task that is described by a collection of examples in a training set. In some implementations, the system generates a circuit that includes simple computing elements called logic cells, which calculate their outputs by means of Boolean logic operations. In various implementations, a system may use hardware, software, or a combination thereof to generate circuits. Various implementations described herein may be applied to various technologies such as handwriting recognition, speech recognition, self-driving cars, and even, as described below, the automatic generation of circuits.
In an example implementation, the system may generate a circuit for categorization. In this example implementation, the system may generate a circuit to perform the categorization of handwritten digits “0” through “9” by performing handwriting recognition. The training set for such categorization may contain examples including an image of a digit along with the category to which it belongs (e.g., an image of a handwritten number “7” with the category “7”). In an example implementation, the system may generate a circuit with an output that asserts the Boolean value “True” when the circuit infers that the image applied to the inputs of the circuit belongs to category “7”.
At block 204, the system 100 decides whether to select a changed version of a current circuit from one or more changed versions based on the accumulated statistics. In various implementations, a task of block 204 is to select a changed version that is “better” than the current circuit. In various implementations, the term “better” indicates that the changed circuit is an improvement over the current circuit. For example, a “better” version may produce fewer errors than the current circuit. By consistently selecting the “better” version each time through the loop created by block 208, the current circuit undergoes a process of incremental improvement. In various implementations, the accumulated statistics are the means for identifying the “better” version. Example measures for selecting the “better” version are described in more detail herein.
Some measures used to determine “better” versions of a network may include counting and reducing categorization errors. Categorization errors may be divided into two types. A first type of categorization error is a false positive. A false positive occurs when an example not belonging to a category is wrongly inferred to belong to it. In the example application, this would occur when the output for category “7” generates a True value when an image of category “1” is applied to the circuit. The second type of categorization error is a false negative. A false negative occurs when an example belonging to a category is inferred to not belong to it. In the example application, a false negative would occur when the output for category “7” is false when the image of a “7” is applied to the inputs of the circuit. As such, when a circuit is generated to perform a categorization and the measure for the “better” version reduces an error rate, the accumulated statistics will contain a measure of the error rate. In some implementations, the error rate of an output is calculated from stored counts of the false positive inferences in a training set, and stored counts of the false negative inferences in a training set. These counts, along with the count of the total number of examples in the training set, define the true positive and true negative inferences. In various implementations, the generating of the statistics may include counting false positive inferences, and counting false negative inferences.
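The counting of false positives and false negatives described above may be sketched in Python as follows; the function and variable names (`circuit_output`, `training_set`) are illustrative assumptions, not part of the specification:

```python
def count_errors(circuit_output, training_set, category):
    """Return (false_positives, false_negatives) for one circuit output.

    `circuit_output` maps an example's inputs to a Boolean inference;
    each training example is an (inputs, label) pair.
    """
    false_pos = false_neg = 0
    for inputs, label in training_set:
        inferred = circuit_output(inputs)
        if inferred and label != category:
            false_pos += 1      # wrongly inferred to belong to the category
        elif not inferred and label == category:
            false_neg += 1      # wrongly inferred to not belong to it
    return false_pos, false_neg
```

The total example count, together with these two counts, determines the true positive and true negative counts as well.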
In some implementations, a measure for the “better” version may be a measurement of a reduction of the cost to create a circuit. In some implementations, the accumulated statistics 108 store numeric measures of the cost of the changed version. Numeric measures of the cost of a circuit may include, but are not limited to, the count of each of the different types of logic cells, the count of connections in the circuit, and the count of the number of layers in the circuit. The measure of cost may be intended to measure the cost to simulate the circuit in software or the cost to implement the circuit in hardware. For example, the time required to simulate a circuit goes up with the number of logic cells in the circuit, and the number of transistors required to implement a circuit in hardware also goes up with the number of logic cells. Because of that, one numeric measure of the cost of a circuit may be the number of logic cells in the circuit. In some implementations, different logic cells may require different amounts of time to simulate, and have different requirements when being implemented in hardware. As such, to calculate a numeric measure for the cost of a circuit, the counts of each type of logic cell would be individual measures of cost. In various implementations, the selecting of the changed version is based on a numeric value that varies with numeric measures of the cost of the changed version.
Multiple measures for the “better” option may be combined to cause the circuit being generated to be optimized for multiple criteria at once. For example, multiplying the number of connections times the number of errors would result in a measure that could cause a circuit to be optimized for reduced cost and reduced errors at the same time.
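A minimal sketch of these cost measures and the combined measure follows; the specific fields and default weights are assumptions for illustration, not from the specification:

```python
def circuit_cost(cell_counts, num_connections, num_layers, cell_weights=None):
    """Numeric cost: weighted per-type cell counts plus connections plus layers.

    `cell_weights` lets different logic cell types carry different costs;
    unlisted types default to a weight of 1.
    """
    cell_weights = cell_weights or {}
    cell_cost = sum(count * cell_weights.get(kind, 1)
                    for kind, count in cell_counts.items())
    return cell_cost + num_connections + num_layers

def combined_measure(num_connections, num_errors):
    """Connections times errors: minimizing this optimizes both at once."""
    return num_connections * num_errors
```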
The supervisor program 102 is the computer process that evaluates the accumulated statistics and decides to select a changed version in block 204. The evaluation of the accumulated statistics may be a sophisticated process that significantly expedites the method of implementations described herein. Generating accumulated statistics in block 202 takes time and incurs computing costs. In some implementations, the decision in block 204 to continue to generate accumulated statistics is a decision that weighs the potential benefit of discovering a better changed version with the compute cost of continuing to look for it. This evaluation may be done poorly by a simple algorithm, or it may be done more efficiently by a more complex computation that is adaptive and embodies the experience of having trained many different circuits. In an example application, the system may utilize one or more circuits generated by the circuit generation methods described herein, within the supervisor program to make the decision of block 204. Just as computers are used in the process of designing other computers, circuits that are generated by embodiments described herein may be used in the process that generates other circuits. In various implementations, one or more of the generation criteria is generated by applying accumulated statistics to a decision circuit, where the decision circuit is generated by the method of
At block 206, the system 100 updates the current circuit 104 based on the selected changed version. There are various possible ways to achieve this, and they depend on how the description of the selected changed version is stored. If a description of the selected changed version consists of a change that was applied to the current circuit, the updating of the current circuit may include changing the current circuit to be equal to the selected changed version. If the description of the selected changed version is a complete copy of the current circuit with a change applied to it, the updating of the current circuit may include replacing the current circuit with the selected changed version.
At block 208, the system repeats the method multiple times based on the accumulated statistics 108 until finished. In some implementations, the decision to continue generating statistics is made based on the accumulated statistics 108. The compute cost to improve the circuit may be calculated by multiplying the cost per hour of the compute resources used by the number of hours used to generate the improved circuit. The cost to generate each changed version may be computed in this way. The improvement that a changed version makes may be measured as the error rate reduction that the changed version achieves. In various implementations, the compute cost is a measure of effort, and the improvement each changed version makes is a measure of benefit. In some implementations, the decision to finish may be made when the benefit of a changed version is no longer worth the effort. This decision may be made by dividing the measure of benefit by the measure of effort, and finishing when that number falls below a predetermined threshold.
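The benefit-per-effort stopping rule above may be sketched as follows; the names and the default threshold are illustrative assumptions:

```python
def should_finish(error_rate_reduction, cost_per_hour, hours_used,
                  threshold=0.01):
    """Return True when the benefit/effort ratio drops below `threshold`.

    Benefit is the error rate reduction a changed version achieved;
    effort is the compute cost spent generating it.
    """
    effort = cost_per_hour * hours_used   # compute cost of this version
    if effort == 0:
        return False                      # no effort spent yet; keep going
    return (error_rate_reduction / effort) < threshold
```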
In various implementations, the generated accumulated statistics step in block 202 has a variety of changes that the system knows how to apply to a circuit. In some implementations, the decision to finish in block 208 may be made when all known changes have been tried, or when all changes of a certain type have been tried. In some implementations, the decision to finish at block 208 may be made when the current circuit 104 has arrived at a state of making no errors on the training set 106.
At block 302, the system generates one or more changed versions of a current circuit. In some implementations, the current circuit includes logic cells that perform Boolean operations. For example, in some implementations, the generating of one or more of the changed versions of the current circuit includes establishing a new logic cell interconnection. In another example, in some implementations, the generating of one or more of the changed versions of the current circuit includes removing a logic cell interconnection. In yet another example, in some implementations, the generating of one or more of the changed versions of the current circuit includes adding a new logic cell. In yet another example, in some implementations, the generating of one or more of the changed versions of the current circuit includes changing the value of a variable used by a logic cell.
As described in more detail herein,
In some implementations, generated changed versions of a current circuit may include multiple changes of the types described above. For example, a changed version may be one that is generated by adding one cell, adding an input to that cell, adding an output from that cell, and changing an internal variable of the cell connected to the output of the cell that was just added.
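The four kinds of changes described above may be sketched as follows, against a hypothetical circuit representation (a dict of cells and a set of connection pairs); the representation and names are assumptions for illustration only, and if the randomly chosen edit is not applicable, the copy is returned unchanged:

```python
import copy
import random

def generate_changed_version(circuit, rng):
    """Return a changed copy of `circuit` with one random edit applied.

    `circuit` is assumed to be {"cells": {id: {"op": ..., "var": ...}},
    "connections": set of (src, dst) pairs}.
    """
    new = copy.deepcopy(circuit)
    kind = rng.choice(["add_connection", "remove_connection",
                       "add_cell", "change_variable"])
    cells = list(new["cells"])
    if kind == "add_connection" and len(cells) >= 2:
        src, dst = rng.sample(cells, 2)
        new["connections"].add((src, dst))           # new interconnection
    elif kind == "remove_connection" and new["connections"]:
        victim = rng.choice(sorted(new["connections"]))
        new["connections"].discard(victim)           # remove interconnection
    elif kind == "add_cell":
        new_id = max(cells) + 1
        new["cells"][new_id] = {"op": "AND", "var": 2}  # add a logic cell
    elif kind == "change_variable" and cells:
        cell = new["cells"][rng.choice(cells)]
        cell["var"] += 1                             # change internal variable
    return new
```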
An example of an internal variable to a logic cell is shown in
Another example of an internal variable is shown in
Referring again to
In some implementations, the statistics generated from the application of a training set to a circuit include the counts of the number of times each circuit output correctly indicates the category it is designated to indicate. In some implementations, calculating an error rate may be performed by counting the total number of examples presented to the network, counting the False Positives indicated by each output, and counting the False Negatives indicated by each output. In some implementations, this calculated measure is the percentage of true positive examples (correctly inferred examples that belong to the output's category) minus the percentage of false positive examples (incorrectly inferred examples that do not belong to the output's category).
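The per-output measure described above may be sketched as follows; variable names are illustrative assumptions:

```python
def output_score(true_pos, false_pos, total_examples):
    """Percentage of true positives minus percentage of false positives."""
    tp_pct = 100.0 * true_pos / total_examples
    fp_pct = 100.0 * false_pos / total_examples
    return tp_pct - fp_pct
```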
There are two extremes for how a generated circuit may function to correctly categorize examples in a training set. In the example of a circuit that performs handwriting recognition, the circuit may correctly categorize an image of a “3” by having extracted the essence of what it is to be a “3”. The circuit may identify the image as including two continuous semicircular strokes one on top of the other, with openings that face to the left. This method of categorizing an example is based on performing computations that have been generalized from the examples in the training set in order to identify intermediate features. This method may then use those features to correctly classify the examples in the training set.
The other extreme for how a circuit may correctly categorize examples in a training set is to simply memorize each example exactly, and then indicate when one of the memorized examples is present. This extreme is simply memorizing, and is undesirable. It is the opposite of categorizing examples by means of generalizations and features. The functioning of a generated network by means of memorizing examples in a training set instead of properly generalizing is an observed phenomenon that is called “overfitting” in the field of artificial neural networks.
In various implementations, the system prevents the problem of overfitting when circuits are generated from training sets. For example, in some implementations, one way to prevent overfitting is to recognize that overfitting takes place when the generated circuit performs its computation by means of identifying individual or small groups of examples in the training set. In some implementations, a way to prevent the circuit from identifying individual or small groups of examples is to not make any changes to the circuit that only affect individual or small groups of examples. In some implementations, overfitting may be prevented by requiring that each change applied to the circuit results in the circuit functioning differently for at least a minimum number of examples.
In this way, in some implementations, changes are made if they operate on groups of examples, and not individual examples. In some implementations, the system selects the changed version based on a selection criterion that requires that the circuit generates outputs that are different when compared to a previous version for at least a minimum number of examples.
In some implementations, the selection criterion may be that one or more generated signals assume Boolean values that are different compared to a previous version for at least a predetermined number of training set examples in a specific category. This prevents overfitting by requiring that when a change results in selecting additional examples as belonging to the category of a network output, those additional examples form a group of at least a minimum predetermined size. Overfitting can also occur when individual examples are excluded from a specific category.
In some implementations, the selecting of the changed version may be based on a selection criterion that one or more generated signals assume Boolean values that are different compared to a previous version for at least a predetermined number of training set examples that are not in a specific category. This prevents overfitting by requiring that when a change results in excluding additional examples that do not belong to the category of a network output, those additional examples form a group of at least a minimum size.
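The minimum-group-size guard described above may be sketched as follows; all names are illustrative assumptions:

```python
def change_is_acceptable(current_outputs, changed_outputs, min_group):
    """Accept a change only if it affects at least `min_group` examples.

    Each argument is a list of Boolean output values, one per training
    set example; a change touching fewer examples risks overfitting.
    """
    differing = sum(a != b for a, b in zip(current_outputs, changed_outputs))
    return differing >= min_group
```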
Referring again to
Although the steps, operations, or computations may be presented in a specific order, the order may be changed in particular implementations. Other orderings of the steps are possible, depending on the particular implementation. In some particular implementations, multiple steps shown as sequential in this specification may be performed at the same time. Also, some implementations may not have all of the steps shown and/or may have other steps instead of, or in addition to, those shown herein.
Another type of logic cell would be the logic cell shown in
Logic cell 700 is an example of a logic cell having internal variables, one variable indicating the number of internal logical AND operations, and other variables indicating which logic cell input is connected to each of the internal logical AND operations. As shown in
Logic cells of a wide variety of functionality may be trained with the method of implementations described herein. Another type of useful logic cell is one that has numeric inputs and Boolean outputs. This functionality is useful because many kinds of data that may be applied to machine learning are numeric, while Boolean logic is preferable for computation. One possible numeric-to-Boolean function could be “approximately equal,” where a Boolean output signals that multiple numeric inputs are equal to within a predetermined measure. Another numeric-to-Boolean function may be “approximately zero,” where an input is below a threshold. Another may be “large,” where an input exceeds a threshold. Another may be “ascending,” where a group of inputs is ordered and increasing. Another may be “descending,” where a group of inputs is ordered and decreasing. The numeric values that are used for thresholds, or for “approximately equal” operations, may be the values of other inputs, or the values of variables that are set outside of the cell. In various implementations, the generating of the statistics includes setting at least one logic cell output to a Boolean value based on multiple logic cell inputs having numeric values.
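Minimal sketches of the numeric-to-Boolean cell functions named above follow; threshold values are passed as parameters, and the function names are assumptions for illustration:

```python
def approximately_equal(values, tolerance):
    """True when all numeric inputs are equal within `tolerance`."""
    return max(values) - min(values) <= tolerance

def approximately_zero(value, threshold):
    """True when the input's magnitude is below `threshold`."""
    return abs(value) < threshold

def large(value, threshold):
    """True when the input exceeds `threshold`."""
    return value > threshold

def ascending(values):
    """True when an ordered group of inputs is strictly increasing."""
    return all(a < b for a, b in zip(values, values[1:]))

def descending(values):
    """True when an ordered group of inputs is strictly decreasing."""
    return all(a > b for a, b in zip(values, values[1:]))
```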
In various implementations, logic cells of different types may be arranged in different layers of a circuit. For example, the circuit shown in
In various implementations, a normal training set includes training examples, where each example is a collection of data. Part of that collection of data is applied to the circuit inputs when the example is applied to a circuit. Part of that collection of data is additional information that may include a category to which the example belongs. In the application example of the training set of handwritten digits, each training example includes an image and a category. The image is an array of values corresponding to the pixels in the image. When one of these examples is applied to the circuit, each pixel corresponds to one of the circuit inputs, and its value is set to that circuit input. When stored on a computer, the data for an example is frequently stored as a vector, which is an ordered sequence of data. A typical training vector in a training set contains the values of all the circuit inputs for one training example.
In some implementations, the data representation used in generating statistics associated with performance of each changed version may be a different type of vector, called a signal vector. This alternate vector representation is valuable, because it speeds the computation of the statistics associated with a changed version in a network of Boolean logic cells. Whereas a normal example vector is a collection of the values of a number of different signals (circuit input signals) associated with one example, the signal vector is a collection of the values that a single signal has for every different example in the training set. In the application example of the training set of handwriting samples, where each sample is a bitmap of a handwritten digit, the first bit in the top left corner is almost always white. This is because the digits tend to be drawn in the center of a bitmap. In the example vector representation, this means that the first bit in every example vector is almost always zero (for white). In contrast, the signal vector representation for the signal representing the first bit in the top left corner is a vector that includes almost all zeros.
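Converting from example vectors to signal vectors is a transpose of the training data, which may be sketched as follows; the representation is an assumption for illustration:

```python
def to_signal_vectors(example_vectors):
    """Transpose example vectors into signal vectors.

    Input: one vector per training example (one value per input signal).
    Output: one vector per input signal (one value per training example),
    so signal_vectors[i][j] is signal i's value in example j.
    """
    return [list(signal) for signal in zip(*example_vectors)]
```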
The signal vector representation is a very fast structure for Boolean signals, because the value of a signal may be represented in a single bit. A normal data word in a modern computer processor is 64 bits long. As such, the signal vector representation of a signal for 64 Boolean examples may be efficiently packed into one data word. In some implementations, the application of the training set to at least one logic cell is performed in parallel, where the logic cell inputs of the one logic cell have values that are Boolean vectors. In some implementations, each bit of a Boolean vector represents a value of the logic cell input for one training set example. Internal logic of the at least one logic cell performs vector operations to generate a vector value for at least one logic cell output.
When the operations in a Boolean logic cell are either logical AND or logical OR, modern computer processors can perform these operations on a signal vector by using bitwise AND and OR instructions which are therefore able to process 64 examples in a single instruction step. This offers a significant speed advantage for the generation of Boolean logic circuits. For an even greater speed advantage, custom electronic hardware could be used to perform bitwise logical operations of two signal vectors. In some implementations, the system may perform a bitwise AND operation of two input vectors having variable lengths greater than a word. In some implementations, the system may perform a bitwise OR operation of two input vectors having variable lengths greater than a word.
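The packing and bitwise evaluation described above may be sketched as follows; Python integers serve here as variable-length words, so a single `&` or `|` processes every training example at once (the function names are illustrative assumptions):

```python
def pack(bits):
    """Pack a list of 0/1 values into an integer; bit i holds example i."""
    word = 0
    for i, b in enumerate(bits):
        word |= b << i
    return word

def and_cell(a, b):
    """Bitwise AND of two packed signal vectors: AND over all examples."""
    return a & b

def or_cell(a, b):
    """Bitwise OR of two packed signal vectors: OR over all examples."""
    return a | b
```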
Other generated statistics calculations are also sped up by the signal vector representation. Counting the number of examples for which a given signal has a value of “1”, or True, in the training set amounts to counting the “1”s in the signal vector for the given signal. Conversely, counting the number of times an output signal is False for the examples in a training set amounts to counting the “0”s in the signal vector for the given signal.
Many modern computer processors offer a population count (POPCNT) instruction, which counts the number of “1” bits in a data word. This instruction makes counting the “1”s in a signal vector very fast and efficient. In some implementations, this computation may be sped up with custom circuitry operable to count a number of bits set to “1” or “0” in a Boolean input vector having a variable length greater than a word. In some implementations, the system counts a number of bits set to “1” in a Boolean input vector having a variable length greater than a word. In some implementations, the system counts a number of bits set to “0” in a Boolean input vector having a variable length greater than a word. The counting of the number of bits set to “1” or “0” is frequently performed after the logical AND or OR operation of two vectors having variable lengths, and custom hardware can be created to speed the calculation of both operations simultaneously. In this situation, the custom hardware can be constructed to perform a logical bitwise AND or OR operation of two input vectors having variable lengths greater than a word, and to count the number of bits set to “1” or “0” in the resulting Boolean output vector at the same time.
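The population-count statistics above may be sketched as follows; `bin(word).count("1")` plays the role of the POPCNT instruction (on Python 3.10+, `int.bit_count()` does this directly), and the function names are illustrative assumptions:

```python
def count_ones(word):
    """Number of examples for which the packed signal is True."""
    return bin(word).count("1")

def count_zeros(word, num_examples):
    """Number of examples for which the packed signal is False."""
    return num_examples - bin(word).count("1")
```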
Computing system 900 also includes a software application 910, which may be stored on memory 906 or on any other suitable storage location or computer-readable medium. Software application 910 provides instructions that enable processor 902 to perform the implementations described herein and other functions. Software application 910 may also include an engine such as a network engine for performing various functions associated with one or more networks and network communications. The components of computing system 900 may be implemented by one or more processors or any combination of hardware devices, as well as any combination of hardware, software, firmware, etc.
For ease of illustration,
In various implementations, computing system 900 includes logic encoded in one or more non-transitory computer-readable storage media for execution by the one or more processors. When executed, the logic is operable to perform operations associated with implementations described herein.
Although the description has been described with respect to particular embodiments thereof, these particular embodiments are merely illustrative, and not restrictive. Concepts illustrated in the examples may be applied to other examples and implementations.
In various implementations, software is encoded in one or more non-transitory computer-readable media for execution by one or more processors. The software when executed by one or more processors is operable to perform the implementations described herein and other functions.
Any suitable programming language can be used to implement the routines of particular embodiments including C, C++, Java, assembly language, etc. Different programming techniques can be employed such as procedural or object oriented. The routines can execute on a single processing device or multiple processors. Although the steps, operations, or computations may be presented in a specific order, this order may be changed in different particular embodiments. In some particular embodiments, multiple steps shown as sequential in this specification can be performed at the same time.
Particular embodiments may be implemented in a non-transitory computer-readable storage medium (also referred to as a machine-readable storage medium) for use by or in connection with the instruction execution system, apparatus, or device. Particular embodiments can be implemented in the form of control logic in software or hardware or a combination of both. The control logic when executed by one or more processors is operable to perform the implementations described herein and other functions. For example, a tangible medium such as a hardware storage device can be used to store the control logic, which can include executable instructions.
Particular embodiments may be implemented by using a programmable general purpose digital computer, and/or by using application specific integrated circuits, programmable logic devices, field programmable gate arrays, optical, chemical, biological, quantum or nanoengineered systems, components and mechanisms. In general, the functions of particular embodiments can be achieved by any means as is known in the art. Distributed, networked systems, components, and/or circuits can be used. Communication, or transfer, of data may be wired, wireless, or by any other means.
A “processor” may include any suitable hardware and/or software system, mechanism, or component that processes data, signals or other information. A processor may include a system with a general-purpose central processing unit, multiple processing units, dedicated circuitry for achieving functionality, or other systems. Processing need not be limited to a geographic location, or have temporal limitations. For example, a processor may perform its functions in “real-time,” “offline,” in a “batch mode,” etc. Portions of processing may be performed at different times and at different locations, by different (or the same) processing systems. A computer may be any processor in communication with a memory. The memory may be any suitable data storage, memory and/or non-transitory computer-readable storage medium, including electronic storage devices such as random-access memory (RAM), read-only memory (ROM), magnetic storage device (hard disk drive or the like), flash, optical storage device (CD, DVD or the like), magnetic or optical disk, or other tangible media suitable for storing instructions (e.g., program or software instructions) for execution by the processor. For example, a tangible medium such as a hardware storage device can be used to store the control logic, which can include executable instructions. The instructions can also be contained in, and provided as, an electronic signal, for example in the form of software as a service (SaaS) delivered from a server (e.g., a distributed system and/or a cloud computing system).
It will also be appreciated that one or more of the elements depicted in the drawings/figures can also be implemented in a more separated or integrated manner, or even removed or rendered as inoperable in certain cases, as is useful in accordance with a particular application. It is also within the spirit and scope to implement a program or code that can be stored in a machine-readable medium to permit a computer to perform any of the methods described above.
As used in the description herein and throughout the claims that follow, “a”, “an”, and “the” includes plural references unless the context clearly dictates otherwise. Also, as used in the description herein and throughout the claims that follow, the meaning of “in” includes “in” and “on” unless the context clearly dictates otherwise.
Thus, while particular embodiments have been described herein, latitudes of modification, various changes, and substitutions are intended in the foregoing disclosures, and it will be appreciated that in some instances some features of particular embodiments will be employed without a corresponding use of other features without departing from the scope and spirit as set forth. Therefore, many modifications may be made to adapt a particular situation or material to the essential scope and spirit.
Related Provisional Application: No. 62519743, filed Jun. 2017 (US).