The present disclosure relates to neural networks and, more specifically, to heterogeneous neural networks.
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description section. This summary does not identify required or essential features of the claimed subject matter. The innovation is defined by the claims, and to the extent this summary conflicts with the claims, the claims should prevail.
Embodiments disclosed herein provide systems and methods for creation and use of a heterogeneous neural network that uses unrelated functions as activation functions in neurons of an artificial neural network.
In embodiments, a method is disclosed to create a neural network that solves a linked network of equations, implemented in a computing system comprising one or more processors and one or more memories coupled to the one or more processors, the one or more memories comprising computer-executable instructions for causing the computing system to perform operations comprising: creating object neurons for functions in the linked network of functions, the functions having: respective external variables that are inputs into the respective functions, and respective internal properties of the respective functions; arranging object neurons in order of the linked functions such that a function is associated with a corresponding object neuron; and assigning the associated function to the activation function of each respective object neuron.
In some embodiments, object neurons are connected wherein each respective function external variable is an edge of the corresponding object neuron and wherein a value of the variable is a weight for the edge.
In some embodiments, at least two activation functions represent unrelated functions.
In some embodiments, respective functions have respective internal properties.
In some embodiments, an input associated with the corresponding object neuron is created, with the input having an edge that connects to the corresponding object neuron.
In some embodiments, a first object neuron has multiple edges connected to a second object neuron.
In some embodiments, a first object neuron has multiple edges connected to a downstream neuron, and a different number of edges connected to an upstream neuron.
In some embodiments, an activation function is comprised of multiple equations.
In some embodiments, at least two functions in the linked network of functions are unrelated.
In some embodiments, the derivative of the neural network is computed to minimize a cost function.
In some embodiments, the neural net has inputs into the neural net and computing the derivative of the neural network applies to a subset of inputs into the neural net.
In some embodiments, computing the derivative of the neural network applies to permanent neuron inputs or to temporary neuron inputs.
In some embodiments, computing the derivative of the neural network comprises using backpropagation or automatic differentiation.
In some embodiments, the cost function determines the distance between neural network output and real-world data associated with a system associated with the linked network of equations.
In some embodiments, a system is disclosed that comprises: at least one processor; and a memory in operable communication with the processor, with computing code associated with the processor configured to create a neural network corresponding to a series of linked functions, the functions having input variables and output variables, at least one function having an upstream function, which passes at least one variable to the function, and a downstream function, to which the function passes at least one variable, by performing a process that includes: associating a neuron with each function; arranging the associated neurons in order of the linked functions; creating, for each function input variable, an edge for the neuron corresponding to the function, the edge having an upstream end and a downstream end, connecting the downstream end to the neuron, and connecting the upstream end to the neuron associated with the upstream function; creating, for each function output variable, an edge for the neuron corresponding to the function, the edge having an upstream end and a downstream end, connecting the upstream end to the neuron, and connecting the downstream end to the neuron associated with the downstream function; and associating each function with an activation function in its associated neuron.
In some embodiments, a permanent value is associated with at least one function; and a neural net input is created for the permanent value.
In some embodiments, there are two permanent values associated with the at least one function, a neural net input is created for each of the permanent values, and a downstream edge from the neural net input to the neuron associated with the at least one function is created.
In embodiments, input variables for a most-upstream function correspond to neural network input variables.
In embodiments, a computer-readable storage medium is disclosed which is configured with instructions which, upon execution by one or more processors, perform a method for creating a neural network that solves a linked network of equations, the method comprising: creating object neurons for equations in the linked network of functions, the functions having: respective external variables that are inputs into the respective functions, and respective internal properties of the respective functions; arranging object neurons in order of the linked functions such that a function is associated with a corresponding object neuron; and assigning the associated function to the activation function of each respective object neuron.
In embodiments, at least two activation functions represent different functions.
These, and other, aspects of the invention will be better appreciated and understood when considered in conjunction with the following description and the accompanying drawings. The following description, while indicating various embodiments and numerous specific details thereof, is given by way of illustration and not of limitation. Many substitutions, modifications, additions or rearrangements may be made within the scope of the embodiments, and the embodiments include all such substitutions, modifications, additions or rearrangements.
Non-limiting and non-exhaustive embodiments of the present embodiments are described with reference to the following FIGURES, wherein like reference numerals refer to like parts throughout the various views unless otherwise specified.
Corresponding reference characters indicate corresponding components throughout the several views of the drawings. Skilled artisans will appreciate that elements in the FIGURES are illustrated for simplicity and clarity and have not necessarily been drawn to scale. For example, the dimensions of some of the elements in the figures may be exaggerated relative to other elements to help to improve understanding of various embodiments. Also, common but well-understood elements that are useful or necessary in a commercially feasible embodiment are often not depicted in order to facilitate a less obstructed view of these various embodiments.
Disclosed below are representative embodiments of methods, computer-readable media, and systems having particular applicability to heterogeneous neural networks.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present embodiments. It will be apparent, however, to one having ordinary skill in the art that these specific details need not be employed to practice the present embodiments. In other instances, well-known materials or methods have not been described in detail in order to avoid obscuring the present embodiments.
Reference throughout this specification to “one embodiment”, “an embodiment”, “one example” or “an example” means that a particular feature, structure or characteristic described in connection with the embodiment or example is included in at least one embodiment of the present embodiments. Thus, appearances of the phrases “in one embodiment”, “in an embodiment”, “one example” or “an example” in various places throughout this specification are not necessarily all referring to the same embodiment or example. Furthermore, the particular features, structures or characteristics may be combined in any suitable combinations and/or sub-combinations in one or more embodiments or examples.
Embodiments in accordance with the present disclosure may be implemented as an apparatus, method, or computer program product. Accordingly, the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.), or an embodiment combining software and hardware aspects. Furthermore, the present embodiments may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
Any combination of one or more computer-usable or computer-readable media may be utilized. For example, a computer-readable medium may include one or more of a portable computer diskette, a hard disk, a random access memory (RAM) device, a read-only memory (ROM) device, an erasable programmable read-only memory (EPROM or Flash memory) device, a portable compact disc read-only memory (CDROM), an optical storage device, and a magnetic storage device. Computer program code for carrying out operations of the present embodiments may be written in any combination of one or more programming languages.
Embodiments may be implemented in edge computing environments where the computing is done within a network which, in some implementations, may not be connected to an outside internet, although the edge computing environment may be connected with an internal internet. This internet may be wired, wireless, or a combination of both. Embodiments may also be implemented in cloud computing environments. A cloud model can be composed of various characteristics (e.g., on-demand self-service, broad network access, resource pooling, rapid elasticity, measured service, etc.), service models (e.g., Software as a Service (“SaaS”), Platform as a Service (“PaaS”), Infrastructure as a Service (“IaaS”)), and deployment models (e.g., private cloud, community cloud, public cloud, hybrid cloud, etc.).
The flowcharts and block diagrams in the FIGURES illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It will also be noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, may be implemented by general or special purpose hardware-based systems that perform the specified functions or acts, or combinations of general and special purpose hardware and computer instructions. These computer program instructions may also be stored in a computer-readable medium that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instruction means which implement the function/act specified in the flowchart and/or block diagram block or blocks.
As used herein, the terms “comprises,” “comprising,” “includes,” “including,” “has,” “having,” or any other variation thereof, are intended to cover a non-exclusive inclusion. For example, a process, article, or apparatus that comprises a list of elements is not necessarily limited to only those elements but may include other elements not expressly listed or inherent to such process, article, or apparatus.
Further, unless expressly stated to the contrary, “or” refers to an inclusive or and not to an exclusive or. For example, a condition A or B is satisfied by any one of the following: A is true (or present) and B is false (or not present), A is false (or not present) and B is true (or present), and both A and B are true (or present).
Additionally, any examples or illustrations given herein are not to be regarded in any way as restrictions on, limits to, or express definitions of any term or terms with which they are utilized. Instead, these examples or illustrations are to be regarded as being described with respect to one particular embodiment and as being illustrative only. Those of ordinary skill in the art will appreciate that any term or terms with which these examples or illustrations are utilized will encompass other embodiments which may or may not be given therewith or elsewhere in the specification and all such embodiments are intended to be included within the scope of that term or terms. Language designating such non-limiting examples and illustrations includes, but is not limited to: “for example,” “for instance,” “e.g.,” and “in one embodiment.”
“Program” is used broadly herein, to include applications, kernels, drivers, interrupt handlers, firmware, state machines, libraries, and other code written by programmers (who are also referred to as developers) and/or automatically generated. “Optimize” means to improve, not necessarily to perfect. For example, it may be possible to make further improvements in a program or an algorithm which has been optimized.
Artificial neural networks are powerful tools that have changed the nature of the world around us, leading to breakthroughs in classification problems, such as image and object recognition, voice generation and recognition, autonomous vehicle creation, and new medical technologies, to name just a few. However, neural networks start from ground zero with no training. Training itself can be very onerous, both in that an appropriate training set must be assembled and in that the training often takes a very long time. For example, a neural net can be trained for human faces, but if the training set is not perfectly balanced between the many types of faces that exist, even after extensive training it may still fail for a specific subset; at best, the answer is probabilistic, with the highest probability being considered the answer.
Existing approaches offer three steps to develop a deep learning AI model. The first step builds the structure of a neural network by defining the number of layers and the number of neurons in each layer, and by determining the activation function that will be used for the neural network. The second step determines what training data will work for the given problem and locates such training data. The third step attempts to optimize the structure of the model, using the training data, by checking the difference between the output of the neural network and the desired output. The network then uses an iterative procedure to determine how to adjust the weights to more closely approach the desired output. Exploiting this methodology is cumbersome, at least because training the model is laborious.
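For illustration only, the following is a minimal sketch of this conventional workflow in Python; the layer sizes, synthetic data, and every name in it are assumptions made for the example, not part of the disclosure. It shows the three steps: a fixed structure with one shared activation, assembled training data, and an iterative weight-adjustment loop.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1: fix the structure -- layer sizes and a single shared activation (tanh).
n_in, n_hidden, n_out = 4, 8, 1
W1 = rng.normal(0, 0.1, (n_in, n_hidden))
W2 = rng.normal(0, 0.1, (n_hidden, n_out))

# Step 2: locate training data for the problem (a synthetic stand-in here).
X = rng.normal(size=(100, n_in))
y = np.sin(X.sum(axis=1, keepdims=True))

# Step 3: iteratively adjust the weights toward the desired output.
lr = 0.05
for _ in range(2000):
    h = np.tanh(X @ W1)               # hidden layer, shared activation
    y_hat = h @ W2                    # network output
    err = y_hat - y                   # difference from the desired output
    gW2 = (h.T @ err) / len(X)        # backpropagated gradients
    gW1 = (X.T @ ((err @ W2.T) * (1 - h**2))) / len(X)
    W1 -= lr * gW1
    W2 -= lr * gW2
```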
Once the neural net is trained, it is basically a black box, composed of input, output, and hidden layers. The hidden layers are well and truly hidden, with no information that can be gleaned from them outside of the neural net itself. Thus, to answer a slightly different question, a new neural net, with a new training set, must be developed, and all the computing power and time that is required to train a neural net must be employed.
We describe herein a heterogeneous neural net. A typical neural net comprises inputs, outputs, and hidden layers connected by edges which have weights associated with them. The neural net sums the weights of all the incoming edges, applies a bias, and then uses an activation function to introduce non-linear effects, which basically squashes or expands the weight/bias value into a useful range, often deciding whether the neuron will, in essence, fire or not. This new value then becomes a weight used for connections to the next hidden layer of the network. The activation function does not do separate calculations.
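As a concrete illustration of this conventional neuron, consider the following sketch; the numbers and names are illustrative assumptions. The weighted inputs are summed, a bias is applied, and a sigmoid squashes the result into a useful range.

```python
import math

def conventional_neuron(inputs, weights, bias):
    """Standard neuron: sum the weighted inputs, apply a bias, then squash."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))     # sigmoid squashes z into (0, 1)

# The result becomes a weight on the edges into the next hidden layer;
# the activation itself performs no separate calculation about the object.
out = conventional_neuron([0.5, -1.2, 3.0], [0.4, 0.1, -0.7], bias=0.2)
```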
In embodiments described herein, the fundamentals of physics are utilized to model single components or pieces of equipment on a one-to-one basis with neural net neurons. When multiple components are linked to each other in a schematic diagram, a neural net is created that models the components as neurons. The values between the objects flow between the neurons as weights of connected edges. These digital analog neural nets model not only the real complexities of systems but also their emergent behavior and the system semantics. This approach therefore bypasses two major steps of conventional AI modeling: determining the shape of the neural net, and training the neural net from scratch. Because the neurons are arranged in the order of an actual system (or set of equations), and because the neurons themselves comprise an equation or a series of equations that describe the function of their associated object, certain relationships between them are determined by their location in the neural net. Therefore, a huge portion of training is no longer necessary, as the neural net itself comprises location information, behavior information, and interaction information between the different objects represented by the neurons. Further, the values held by neurons in the neural net at given times represent real-world behavior of the objects so represented. The neural net is no longer a black box but itself contains important information. This neural net structure also provides much deeper information about the systems and objects being described. Since the neural network is physics- and location-based, unlike conventional AI structures, it is not limited to a specific model, but can run multiple models for the system that the neural network represents without requiring separate creation or training.
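A minimal sketch of this idea follows; the ObjectNeuron class, the two component equations, and all values are illustrative assumptions, not the disclosed implementation. Each neuron's activation function is the equation of the object it represents, and variable values flow along edges as weights.

```python
# Illustrative sketch: each neuron carries its own equation as its
# activation function; outputs flow to downstream neurons as edge weights.
class ObjectNeuron:
    def __init__(self, name, equation, properties):
        self.name = name
        self.equation = equation      # activation function = object physics
        self.properties = properties  # internal properties of the object

    def activate(self, external_vars):
        # Unlike a shared sigmoid, each neuron evaluates its own equation(s).
        return self.equation(self.properties, external_vars)

# Two neurons with unrelated activation functions in one network:
pump = ObjectNeuron("pump",
                    lambda p, v: {"flow": p["efficiency"] * v["power"]},
                    {"efficiency": 0.8})
pipe = ObjectNeuron("pipe",
                    lambda p, v: {"pressure_drop": p["k"] * v["flow"] ** 2},
                    {"k": 0.05})

edge = pump.activate({"power": 10.0})   # variable value as an edge weight
drop = pipe.activate(edge)              # the linked neuron consumes it
```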
In one embodiment, the neural network described herein shapes the locations of the neurons to convey information about the physical nature of the system, and places actual equations into the activation functions. The weights that move between neurons are equation variables. Different neurons may have unrelated activation functions, depending on the nature of the model being represented. In an exemplary embodiment, each activation function in a neural network may be different.
As an exemplary embodiment, a pump could be represented in a neural network as a series of network neurons, some of which represent efficiency, energy consumption, pressure, and so on. The neurons are placed such that one set of weights (variables) feeds into the next neuron (e.g., with an equation as its activation function) that uses those weights (variables). Two previously required steps, shaping the neural net and training the model, may thus already be performed, at least in large part. Using embodiments discussed here, the neural net model need not be trained on information that is already known.
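For illustration, a pump decomposed this way might look like the following sketch; the three equations and their properties are simplified stand-ins chosen for the example, not actual pump physics.

```python
# Illustrative stand-in equations for sub-aspects of a pump; each would be
# the activation function of one neuron, in the order of the physical system.
def efficiency_eq(props, v):
    return {"eff": props["rated_eff"] * v["speed_frac"]}

def power_eq(props, v):
    return {"power": props["rated_power"] * v["eff"]}

def pressure_eq(props, v):
    return {"head": props["k"] * v["power"]}

# Because the neurons mirror the system's layout, the net's shape is already
# determined; each neuron's output variables feed the next neuron as weights.
state = {"speed_frac": 0.9}
for eq, props in [(efficiency_eq, {"rated_eff": 0.8}),
                  (power_eq,      {"rated_power": 5.0}),
                  (pressure_eq,   {"k": 2.0})]:
    state.update(eq(props, state))
# state now holds eff, power, and head for this operating point
```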
In some embodiments, the individual neurons represent physical objects. These individual neurons may hold parameter values that help define the physical representation. As such, when the neural net is run, the parameters helping define the physical representation can be tweaked to represent the given physical object more accurately.
This has the effect of pre-training the model with a qualitative set of guarantees, as the physics equations that describe the objects being modeled are true; this saves having to find training sets and spend huge amounts of computational time running the training sets through the models to train them. A model does not need to be trained with information about the world that is already known. With objects connected in the neural net as they are connected in the real world, emergent behavior arises in the model that maps to the real world. Such model behavior would otherwise be too computationally complex to determine. Further, the neurons represent actual objects, not just black boxes. The behavior of the neurons themselves can be examined to determine the behavior of the objects, and can also be used to refine the understanding of object behavior.
With reference to
A computing environment may have additional features. For example, the computing environment 100 includes storage 140, one or more input devices 150, one or more output devices 160, and one or more communication connections 170. An interconnection mechanism (not shown) such as a bus, controller, or network interconnects the components of the computing environment 100. Typically, operating system software (not shown) provides an operating environment for other software executing in the computing environment 100, and coordinates activities of the components of the computing environment 100. The computing system may also be distributed, running portions of the software 185 on different CPUs.
The storage 140 may be removable or non-removable, and includes magnetic disks, magnetic tapes or cassettes, CD-ROMs, CD-RWs, DVDs, flash drives, or any other medium which can be used to store information and which can be accessed within the computing environment 100. The storage 140 stores instructions for the software 185 to implement methods of neuron discretization and creation.
The input device(s) 150 may be a device that allows a user or another device to communicate with the computing environment 100, such as a touch input device (e.g., a keyboard, mouse, pen, trackball, or touchscreen), a video camera, a microphone, a scanning device, or another device that provides input to the computing environment 100. For audio, the input device(s) 150 may be a sound card or similar device that accepts audio input in analog or digital form, or a CD-ROM reader that provides audio samples to the computing environment. The output device(s) 160 may be a display, printer, speaker, CD-writer, or another device that provides output from the computing environment 100.
The communication connection(s) 170 enable communication over a communication medium to another computing entity. The communication medium conveys information such as computer-executable instructions, compressed graphics information, or other data in a modulated data signal. Communication connections 170 may comprise a device 144 that allows a client device to communicate with another device over network 170. A communication device may include one or more wireless transceivers for performing wireless communication and/or one or more communication ports for performing wired communication. In embodiments, communication device 144 may be configured to transmit data associated [[describe data transferred]] to an information server. These connections may include network connections, which may be a wired or wireless network such as the Internet, an intranet, a LAN, a WAN, a cellular network, or another type of network. It will be understood that network 170 may be a combination of multiple different kinds of wired or wireless networks. The network 170 may be a distributed network, with multiple computers acting in tandem.
A communication connection 170 may comprise a portable communications device such as a wireless handheld device, a cell phone device, and so on.
Computer-readable media are any available non-transient tangible media that can be accessed within a computing environment. By way of example, and not limitation, with the computing environment 100, computer-readable media include memory 120, storage 140, communication media, and combinations of any of the above. Configurable media 170 which may be used to store computer-readable media comprise instructions 175 and data 180. Data sources 190 may be computing devices, such as general hardware platform servers configured to receive and transmit information over the communications connections 170. Data sources 190 may be configured to communicate through a direct connection to an electrical controller. The computing environment 100 may be an electrical controller that is directly connected to various resources, such as HVAC resources, and which has a CPU 110, a GPU 115, memory 120, input devices 150, communication connections 170, and/or other features shown in the computing environment 100. The computing environment 100 may be a series of distributed computers. These distributed computers may comprise a series of connected electrical controllers.
Moreover, any of the methods, apparatus, and systems described herein can be used in conjunction with one another in a wide variety of contexts.
Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, operations described sequentially can be rearranged or performed concurrently. Moreover, for the sake of simplicity, the attached figures may not show the various ways in which the disclosed methods, apparatus, and systems can be used in conjunction with other methods, apparatus, and systems. Additionally, the description sometimes uses terms like “determine,” “build,” and “identify” to describe the disclosed technology. These terms are high-level abstractions of the actual operations that are performed. The actual operations that correspond to these terms will vary depending on the particular implementation and are readily discernible by one of ordinary skill in the art.
Further, data produced from any of the disclosed methods can be created, updated, or stored on tangible computer-readable media (e.g., one or more CDs, volatile memory components (such as DRAM or SRAM), or nonvolatile memory components (such as hard drives)) using a variety of different data structures or formats. Such data can be created or updated at a local computer or over a network (e.g., by a server computer), or stored and accessed in a cloud computing environment.
Notice that a neuron may have multiple edges connected to, and inputting into, the same downstream neuron. Similarly, a neuron may have multiple output edges connected to the same upstream neuron.
Activation functions in a neuron transform the weights on the upstream edges, and then send none, some, or all of the transformed weights to the next neuron(s). Not every activation function 420, 440, 475 transforms every weight. Some activation functions may not transform any weights.
Neurons have activation functions. Rather than being a simple equation used over most or all of a neural net to introduce non-linearity into the system, with the effect of moving any given neuron's output into a desired range, activation functions in some embodiments disclosed here are one or more equations that determine actual physical behavior of the object that the neuron represents. In some embodiments, the activation functions represent functions in a system to be solved. These equation(s) have both input variables that are represented in the neural net as edges with weights, and variables that are properties 710A of the object itself. A representative set of equations to model boiler behavior is shown at 715A. The properties may be represented as input neurons into the neural network with edges connected to the boiler neuron.
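The actual boiler equations at 715A are not reproduced here; the following sketch uses simplified stand-in equations to show the two kinds of inputs such an activation function consumes: external variables arriving as edge weights, and internal properties fed in as property inputs.

```python
# Simplified stand-in for a boiler neuron's activation function.
def boiler_activation(props, v):
    # props: internal properties, supplied through property input neurons
    # v: external variables arriving as weights on upstream edges
    heat_in = v["fuel_rate"] * props["heating_value"] * props["efficiency"]
    t_out = v["t_in"] + heat_in / (v["water_flow"] * props["cp"])
    return {"t_out": t_out, "steam_rate": v["water_flow"]}

out = boiler_activation(
    {"heating_value": 42.0e6, "efficiency": 0.85, "cp": 4186.0},  # properties
    {"fuel_rate": 0.01, "water_flow": 2.0, "t_in": 300.0})        # edge weights
```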
As the properties are inputs, backpropagation to the properties allows the neural network system to be tested at the output(s) against real system data. The cost function can measure the difference between the output of the neural network and the output of the actual system under similar starting conditions. The starting conditions can be provided by inputs, which may be temporary inputs or a different sort of input. The backpropagation minimizes the cost function. This process can be used to fine-tune the neural network to more closely match the real-world system. Temporary variables, in some embodiments, describe properties of the state of the system. Modifying the inputs of the temporary variables will modify the state of the system being modeled by the neural network, such that inputting a state will change the state of the system throughout as the new state works its way through the system. Inputs into the variables, such as the temporary variables, may be time curves. Inputs into the permanent variables may also be time curves, whose values do not change over time. Unlike traditional neural nets, whose hidden variables are well and truly hidden such that their intermediate values are indecipherable to users, the values of the neurons while a neural net runs (e.g., midway through a time curve, at the end of a run, etc.) can provide valuable information about the state of the objects represented by the neurons. For example, the boiler at a given moment has values in all its activation function equations that describe the nature of the boiler at that given time.
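A minimal sketch of this fine-tuning loop follows, assuming a one-property model and a single synthetic measurement in place of real system data; the cost function measures the distance between the model output and the measured output, and gradient descent on the property input minimizes it.

```python
# Sketch: tune one property input so the net's output matches measured data.
def model_output(efficiency, fuel_rate):
    return efficiency * fuel_rate * 42.0e6   # simplified stand-in equation

measured = 0.8 * 0.01 * 42.0e6               # synthetic "real-world" datum

def cost(eff):
    return (model_output(eff, 0.01) - measured) ** 2

eff, lr, h = 0.5, 1e-13, 1e-6
for _ in range(200):
    grad = (cost(eff + h) - cost(eff - h)) / (2 * h)  # numerical derivative
    eff -= lr * grad                                  # minimize the cost
# eff converges toward the efficiency that reproduces the measurement (0.8)
```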
When a fully constituted neural network runs forward, it changes weights as per the calculations at the individual neurons. Input, e.g., into the relay over time (e.g., in the form of a time curve) can modify the workings of the neural network by switching objects on and off, or by modifying the amount a given object is on. Other modifications that change what parts of a neural network are running at a particular time are also included within the purview of this specification. Unlike standard neural nets, at a given time, neurons that represent physical objects can switch on and off, such as a relay 205 turning on at a certain time and sending electricity 235 to a boiler, to give a single example, changing the flow of the neural net. Similarly, a portion of the neural net can turn off at a given time, stopping the flow of that portion of the neural net. If the relay 205 were to turn off, then the boiler 225 would cease to run.
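The following sketch illustrates this gating behavior with an assumed on/off time curve and stand-in relay and boiler equations; none of the values are from the disclosure.

```python
# Illustrative: a relay neuron switching part of the net on and off over time.
relay_on = [False, True, True, False]        # assumed input time curve

def relay(t, power_in):
    return power_in if relay_on[t] else 0.0  # on/off gates the edge weight

def boiler(power):
    return 300.0 + 0.01 * power              # stand-in boiler equation

for t in range(4):
    electricity = relay(t, power_in=5000.0)  # weight on the edge to the boiler
    temp = boiler(electricity)
    # while the relay is off, no electricity flows and the boiler idles
```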
In some embodiments, method 1000 may be implemented in one or more processing devices (e.g., a digital processor, an analog processor, a digital circuit designed to process information, an analog circuit designed to process information, a state machine, and/or other mechanisms for electronically processing information). The one or more processing devices may include one or more devices executing some or all of the operations of method 1000 in response to instructions stored electronically on an electronic storage medium. The one or more processing devices may include one or more devices configured through hardware, firmware, and/or software to be specifically designed for execution of one or more of the operations of method 1000.
In some embodiments, a neural network method solves a linked network of equations. This linked network of equations may be equations representing a physical system, such as the one described in
At operation 1010, object neurons are arranged in order of the linked functions such that a function is associated with a corresponding object neuron. With reference to
At operation 1015, the associated function is assigned to the activation function of each respective object neuron. Each object has a function that represents an equation or a series of equations. Examples of this can be seen with reference to
At operation 1020, object neurons are connected such that each respective function external variable is an edge of the corresponding object neuron and a value of the variable is a weight of the edge. With reference to
At operation 1023, inputs are created for internal properties. Respective functions have respective internal properties, as seen with reference to properties 710A and 710B in
The neural net runs forward first, from the inputs to the outputs. With the results, a cost function is calculated. At operation 1025, the derivative of the neural network is calculated. In prior neural networks, each activation function in the neural network is the same. This has the result that the same gradient calculation can be used for each neuron. In embodiments disclosed here, each neuron potentially has different equations, and therefore different gradient calculations are required to calculate the derivative of each neuron. This makes using standard backpropagation techniques slower, though certainly still possible. However, when the equations are differentiable, autodifferentiation may be used to compute the derivative of the neural network. Autodifferentiation allows the gradient of a function to be calculated, at worst, at a cost that is a constant factor times the cost of calculating the original function. This allows the complex functions involved in heterogeneous neural networks to be calculated within a reasonable time.
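To make the idea concrete, the following sketch implements a tiny forward-mode automatic differentiation using dual numbers; the two neuron equations are illustrative assumptions. The value and its derivative are carried together, so the gradient costs roughly a constant factor over the forward pass.

```python
# Minimal forward-mode automatic differentiation via dual numbers.
class Dual:
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot        # value and derivative together
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.val * o.dot + self.dot * o.val)
    __rmul__ = __mul__

def neuron_a(x):                  # one neuron's (assumed) equation
    return x * x + 3.0 * x

def neuron_b(y):                  # an unrelated downstream equation
    return 2.0 * y * y

x = Dual(1.5, 1.0)                # seed d/dx = 1 at the input
out = neuron_b(neuron_a(x))
# out.val is the network output; out.dot is its exact derivative through
# both heterogeneous activation functions, computed in a single pass.
```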
At operation 1030, automatic differentiation is used to compute the derivative of the neural network. Other methods of gradient computation are envisioned as well. For example, as shown at operation 1035, in some embodiments, backpropagation is used to compute the derivative of the neural network. This may be used, for example, when the equations are not all differentiable. When the neural network is modeling the real world, such as shown in
At operation 1040, the derivative is computed with respect to only some of the inputs. For example, the derivative may only be computed for the permanent/property inputs of the neurons, marked with a “P” in
In view of the many possible embodiments to which the principles of the disclosed invention may be applied, it should be recognized that the illustrated embodiments are only examples of the invention and should not be taken as limiting the scope of the invention. Rather, the scope of the invention is defined by the following claims. We therefore claim as our invention all that comes within the scope and spirit of these claims.