The invention relates to deep learning processes, and to electronic circuits used to simulate and implement deep learning rules.
The principle of minimum energy is well known in physics, and is based on the second law of thermodynamics. The principle states that in a closed system having constant external parameters (such as the volume and any outside forces or influences) and constant entropy, the internal energy will decrease and approach a minimum value at equilibrium. A closed system is not isolated, but rather is connected to another system, and can exchange energy, such as heat, with the other system, but cannot exchange matter. Thus, in a closed system, entropy remains constant, and therefore the energy of that system decreases to a minimum value at equilibrium, in the process transferring its energy to the other system.
The Minimum Free Energy (MFE) of the brain is $\Delta H_{brain} \equiv \Delta E_{brain} - T_o \Delta S_{brain} \le 0$, consistent with the irreversible thermodynamics of a total system including the brain and its isothermal environment at $T_o$: $\Delta S_{tot} > 0$. The gradient force of MFE is biologically known as the chemical affinity. It is known that the four major forces of physics, namely, gravitational, electromagnetic, strong (Yukawa potential), and weak (Fermi neutron decay), have a divergent singularity due to a small denominator, which defines the existence of the gravitational inertial mass, the Coulomb charge, the meson particle, and the neutrino particle, respectively. Likewise, the singularity of the MFE gradient force defines the existence of glial cells.
Unsupervised Deep Learning (UDL) requires a house-keeping servant glial cell per thresholding logic node, namely a brain-style neuron. The linear synaptic weight matrix adjustment has been derived in the manner of D.O. Hebb circa 1950, as well as nonlinear morphology learning of hidden layer architecture, known as deep learning (DL). Recently, leading Internet companies have invested in major efforts in this field, resulting in a number of accomplishments. For example, Andrew Ng of the Google Brain deep learning project developed a neural network trained using deep learning algorithms to learn to recognize higher-level concepts, such as cats, after watching only YouTube videos and without first having been told what a cat is. Yann LeCun performed artificial intelligence (AI) research for Facebook and developed arbitrary facial expression recognition. George Dahl did deep learning AI research at Microsoft and is currently a research scientist at Google for speech recognition. Such a unified capability may be referred to as a Caianiello-Hebb-Ng-LeCun-Dahl (CHNLD) Deep Learning Rule (DLR).
It has also been observed that, with respect to survival, nature seems to prefer pluralism regarding endothermic vs. ectothermic intelligence, as cephalopods like the octopus and cuttlefish have demonstrated stunning levels of intelligence, including puzzle solving and even learning through observation. Nonetheless, we can derive unsupervised deep learning rules by observing the necessary and sufficient conditions of warm-blooded animals roaming the Earth, which have an average brain temperature $\langle T_{brain}\rangle \approx T_o$ kept at a mean value controlled by the brain's hypothalamus center, although the mean and variance of this temperature differ from species to species. For example, chicken and other bird brains are kept at a temperature of 40° C., perhaps in order to have a body temperature capable of hatching eggs, whereas Homo sapiens brains and bodies are kept at around 37° C. to provide optimum elasticity of red blood hemoglobin cells, allowing them to squeeze through capillaries, emulating an ocean environment. Further, all animals receive sensory inputs in pairs, thus giving rise to a linear vector time series.
Knowledge representation is a field of AI dedicated to representing information about the world in a form that a computer system can utilize to solve complex tasks. With isothermal equilibrium and $\langle T_{brain}\rangle \approx T_o$, and a linear vector time series $\vec{X}(t) = [A]\vec{S}(t)$ in 10D including an unknown mixing matrix $[A]$, the Internal Knowledge Representation (IKR) is an ensemble of the degrees of different uniformity $\{\vec{S}\}$, known as entropy in thermodynamics. One can apply the Ludwig Boltzmann definition of the total entropy to the brain and the brain's surrounding environment as a measure of the total degree of uniformity, and also his assertion of irreversible thermodynamic heat death, $\Delta S_{tot} > 0$, due to the incessant molecular collisional mixing that produces greater uniformity. The inverse turns out to be the Maxwell-Boltzmann Canonical Probability
$$W_{MB} = \exp(-H_{brain}/k_B T_o), \qquad (2)$$
where $H_{brain} \equiv E_{brain} - T_o S_{brain}$ turns out to be the Helmholtz free energy, that is, the energy that is available to do work after subtracting the exhaust thermal waste energy $T_o S_{brain}$. This derivation makes use of the conservation of energy, $\Delta E_{tot} = T_o \Delta S_{environment} + \Delta E_{brain} = 0$, such that the internal brain energy can exchange with the environmental thermal energy.
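As a numerical illustration only (a minimal Python sketch; the energy and entropy values below are arbitrary assumptions, not measured brain quantities), Eq. (2) can be evaluated directly from the free energy definition:

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def helmholtz_free_energy(e_brain, t_o, s_brain):
    """H_brain = E_brain - T_o * S_brain: energy available to do work."""
    return e_brain - t_o * s_brain

def canonical_probability(h_brain, t_o):
    """Maxwell-Boltzmann weight of Eq. (2): W_MB = exp(-H / (k_B * T_o))."""
    return math.exp(-h_brain / (K_B * t_o))

# Illustrative (hypothetical) values: a state whose free energy sits a few
# k_B*T_o above the equilibrium reference.
t_o = 310.0                        # isothermal environment, ~37 C in kelvin
e_brain = 5.0 * K_B * t_o          # assumed internal energy
s_brain = 3.0 * K_B                # assumed entropy
h = helmholtz_free_energy(e_brain, t_o, s_brain)  # equals 2 k_B T_o
print(canonical_probability(h, t_o))              # exp(-2) ~= 0.135
```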
Note the use of a scalar S in Eq. (1); the Boltzmann thermodynamic entropy denoted by the scalar S is used as the measure of the degree of uniformity, which helps define the important isothermal Helmholtz MFE in the brain-style computation of Eq. (2). Intuitively, this is similar to a principle of paleontology in which mountain-top rocks have less uniformity, and therefore more paleontological information, when compared to eroded beach sand, which has much more uniformity and a larger entropy value without having much paleontological information (except the emotional sensation e-IQ). However, the vector $\vec{S}$ denotes the ordered set of components of different uniformity of the computational thresholding "nodes", emulating brain-style "neurons" that have different MFE states.
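The uniformity intuition can be made concrete with a small sketch (the two discrete distributions below are hypothetical stand-ins for ordered rock strata versus well-mixed sand): the more uniform the distribution, the larger its entropy and the less structural information it carries.

```python
import math

def entropy(p):
    """Boltzmann/Shannon-style entropy S = -sum(p * ln p) of a distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)

# Hypothetical stand-ins: ordered mountain-top strata vs. well-mixed beach sand.
rock_strata = [0.7, 0.2, 0.1]        # non-uniform: lower entropy, more structure
beach_sand = [1 / 3, 1 / 3, 1 / 3]   # uniform: maximal entropy, less information
print(entropy(rock_strata))  # ~= 0.80
print(entropy(beach_sand))   # ~= 1.10 (= ln 3)
```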
Dropping the explicit function of time of the incoming exemplar data $\vec{X}(t)$, the data can be collected as an ensemble of exemplars with stable mean, variance, and kurtosis $\{\vec{X}\}$. The input layer of sensor vector data time series can be given as an ensemble denoted by the angular brackets.
$$\{\vec{X}\} = [A]\{\vec{S}_n\} \qquad (3)$$
The IKR has multiple hidden layers connecting layer to layer, $\vec{S}_n$, n = 1, 2, 3, …. Furthermore, the n-th layer of deep learning is connected through the dendritic trees to the adjacent (n+1)-st layer of neurons in the hidden layers of the IKR. The change in the MFE, $\Delta H_{brain}$, can help determine the synaptic weight per exemplar realization
$$\Delta[W] = \vec{S}_{n+1}\,\vec{S}_n^T. \qquad (4)$$
According to the invention, nonlinear morphology UDL rules are derived that can automatically change the architecture of hidden layers, that is, recruiting nodes or not (namely pruning) into functional units among layers of the brain hidden from the outside world. This is not unlike the anecdotal biological fact that male finch birds sing a new song every spring to attract female finch birds in order to lay more fertilized eggs of brainy offspring.
$$\Delta\vec{S} \equiv \vec{S}_{n+1} - \vec{S}_n = \vec{S}_{n+1} - [W]\vec{S}_n > 0. \qquad (5)$$
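A minimal sketch of Eqs. (4) and (5) under assumed layer sizes with random placeholder states (all values hypothetical): the next layer's state is compared against the weighted projection of the current layer, and a positive difference flags a recruiting candidate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical adjacent hidden layers: S_n (4 nodes) -> S_{n+1} (3 nodes).
s_n = rng.random(4)      # current-layer node states
w = rng.random((3, 4))   # synaptic weight matrix [W]
s_n1 = rng.random(3)     # observed next-layer node states

# Eq. (5): delta_s = S_{n+1} - [W] S_n; positive components flag recruiting.
delta_s = s_n1 - w @ s_n
recruit = delta_s > 0
print(delta_s, recruit)

# Eq. (4): Hebb-style outer-product weight update per exemplar realization,
# scaled by an assumed learning rate.
eta = 0.01
w += eta * np.outer(s_n1, s_n)
```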
For further background, see the references cited infra, the contents of which are incorporated herein in their entireties.
According to an aspect of the invention, a deep learning neuromorphic system includes an electronic circuit having input ports and an output port. The input ports are configured to receive differential photon detector outputs as circuit inputs. The electronic circuit is configured to apply unsupervised deep learning rules to the circuit inputs to provide a current mirror output. The output port is configured to provide the current mirror output to a plotter.
The differential photon detector outputs can be associated with a minimum free energy of a subject brain. The unsupervised deep learning rules can predict the glial cell force voltage of the subject brain. The current mirror output can relate to the glial cell force voltage.
The system can include the photon detector. The photon detector can be configured to receive a video input and provide a corresponding differential output.
The electronic circuit can include three-port semiconductor devices.
The system can include the plotter.
The electronic circuit can be configured as a system-on-chip.
According to another aspect of the invention, a method of deep learning neuromorphic application includes receiving differential photon detector outputs as inputs to an electronic circuit. The electronic circuit applies unsupervised deep learning rules to the inputs to provide a current mirror output. The current mirror output is provided to a plotter.
The method can include associating a minimum free energy of a subject brain with the differential photon detector outputs. The method can include using the unsupervised deep learning rules to predict the glial cell force voltage of the subject brain. The method can include relating the current mirror output to the glial cell force voltage.
The method can include using a photon detector to receive a video input and provide a corresponding differential output.
The electronic circuit can include three-port semiconductor devices.
The electronic circuit can be configured as a system-on-chip.
The method can also include making a pruning decision based on the current mirror output.
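For orientation, the claimed signal path can be mimicked in a schematic software sketch (all function names, the sigmoid stand-in for the UDL rules, and the pruning threshold are hypothetical; the actual circuit operates on analog currents rather than arrays):

```python
import numpy as np

def photon_detector_differential(frame_a, frame_b):
    """Differential photon-detector output: change between sequential frames."""
    return frame_b.astype(float) - frame_a.astype(float)

def apply_udl_rules(diff, k_bt=1.0):
    """Unsupervised deep learning rule: soft two-state (recruit/prune) weight."""
    return 1.0 / (1.0 + np.exp(-diff / k_bt))  # sigmoid of the MFE-like change

def current_mirror_output(weighted):
    """Mirror the weighted signal as an output current (identity copy here)."""
    return weighted.copy()

# Hypothetical 4x4 video frames in, pruning decision out (sent to a plotter).
rng = np.random.default_rng(1)
frame_a, frame_b = rng.random((4, 4)), rng.random((4, 4))
diff = photon_detector_differential(frame_a, frame_b)
i_out = current_mirror_output(apply_udl_rules(diff))
prune_mask = i_out < 0.5  # assumed pruning threshold on the mirrored current
print(prune_mask)
```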
An artificial neural network (ANN) weighted matrix [W] is illustrated in the accompanying drawing.
Ad hoc DL might take place when more traffic flows through, as the impedance value would be reduced proportionally according to the definition of SO. The fault tolerance (FT) of massive parallel and distributed (MPD) processing is a result of the geometry in taking the threshold of the nearest neighbor, as indicated in the accompanying drawing.
Difference sensing directly on a photonic detector at different sequential timings should be adaptive to an incoming moving film for Automatic Pattern Recognition (APR), the speed of which is estimated from its neighborhood's projected changes. These neighborhood current fluctuations of Photon Detectors (PDs) can be measured using a Complementary Metal-Oxide Semiconductor (CMOS) Transconductance Amplifier (TA), which converts the voltage potential to current, as shown in the accompanying drawing. The circuit of the drawing implements this difference sensing.
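A software analogue of this sequential difference sensing may look as follows (a sketch only; the frame count, frame sizes, and data are placeholders for what the CMOS TA senses in hardware):

```python
import numpy as np

def neighborhood_fluctuation(frames):
    """Estimate per-pixel current fluctuations from sequential PD frames.

    frames: array of shape (T, H, W). Returns the variance of the temporal
    differences, a software stand-in for the fluctuation strength that the
    transconductance amplifier converts from voltage to current.
    """
    diffs = np.diff(frames, axis=0)  # sequential difference sensing
    return diffs.var(axis=0)         # fluctuation strength per detector

rng = np.random.default_rng(2)
frames = rng.random((10, 8, 8))      # hypothetical 10-frame moving film
print(neighborhood_fluctuation(frames).shape)  # (8, 8)
```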
The Boltzmann entropy, Eq. (1), $S = k_B \ln W$, defines the Maxwell-Boltzmann probability, Eq. (2), $W_{MB} = \exp(-\Delta H_{brain}/k_B T_o)$.
Then, the Boltzmann irreversible heat death $\Delta S_{tot} > 0$, due to incessant inter-molecular collisions smoothing the degree of uniformity measured by the entropy, yields
$$\Delta H_{brain} = \Delta E_{brain} - T_o \Delta S_{brain} \le 0.$$
That is, the minimum $\Delta H_{brain} \equiv \Delta E_{brain} - T_o \Delta S_{brain} \le 0$ is free to do the work, similar to how, in a gasoline engine's internal chemical energy reaction, the exhaust waste heat energy must be extracted in order for the mechanical expansion to take place. Q.E.D.
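For completeness, the derivation just sketched can be collected in a few lines, combining the conservation-of-energy statement above with Boltzmann's $\Delta S_{tot} > 0$ (equality holding only at equilibrium):

```latex
\begin{align*}
\Delta S_{tot} &= \Delta S_{brain} + \Delta S_{environment} \ge 0
  && \text{(Boltzmann; equality at equilibrium)} \\
\Delta E_{tot} &= \Delta E_{brain} + T_o\,\Delta S_{environment} = 0
  && \Rightarrow\; \Delta S_{environment} = -\Delta E_{brain}/T_o \\
\Delta S_{brain} - \Delta E_{brain}/T_o &\ge 0
  && \Rightarrow\; \Delta H_{brain} \equiv \Delta E_{brain} - T_o\,\Delta S_{brain} \le 0.
\end{align*}
```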
Higher homeostasis temperatures do not imply higher intelligence.
It is difficult to conduct test and evaluation on this corollary. However, based on the fact that Homo sapiens eat chickens, but not vice versa, it is assumed that we are smarter in most real-world activities. Q.E.D.
Use has been made of the chain rule and partial differentiation of the layered morphology defined by Eq. (5).
There exists an implicit threshold at the exponential-folding length $e^{-1} \approx 0.37$ for passing over from the weakly linear Hebb rule to the binary states controlled by the thermal energy brittleness $k_B T_o$.
The divergence of the MFE gradient force $\vec{F}_{glia} \equiv -\Delta H/\Delta\vec{S}$, relative to the difference of the MFE, $\Delta H = \Delta H_{recruiting} - \Delta H_{pruning}$, is significant. This gradient force is biologically known as the chemical affinity, the physical space exclusion of which proves the existence of a finite-size house-keeping servant glial cell, one per neuron, about 1/10th the size of the neuron.
$$\text{Weak: } \Delta H_{brain} \sim o(|\Delta\vec{S}|^1); \qquad (9)$$
On the other hand, the regularity of the divergence sustains a constant force
$$\text{Strong: } \Delta H_{brain} \sim o(\text{const.}); \qquad (10)$$
which implies the node activity is significant enough for the house-keeping nodal glial cells to recruit the node into functional units. The mathematical regularization of the divergence is accomplished because the physical exclusion size is prevented from reaching the zero-distance origin by the geometric size of the house-keeping servant glial cell, which, as noted above, is about 1/10th the size of a neuron and defines the importance of the specific axon connectivity of the neighboring neuron nodes. For example, a reduction of the MFE force for insignificant nodes calls for the pruning mechanism.
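A sketch of this regularization follows (the floor value standing in for the roughly 1/10-neuron glial exclusion size is an assumption for illustration): the magnitude of $\Delta\vec{S}$ is floored so the gradient force stays finite instead of diverging at the origin.

```python
import numpy as np

GLIAL_SIZE = 0.1  # hypothetical exclusion scale, ~1/10 the neuron size

def glial_force(delta_h, delta_s):
    """MFE gradient force F = -dH/dS, regularized by the glial exclusion size.

    |delta_s| is floored at GLIAL_SIZE so the force remains finite (strong,
    roughly constant near the origin) instead of diverging as delta_s -> 0.
    """
    s = np.where(delta_s == 0, GLIAL_SIZE,
                 np.sign(delta_s) * np.maximum(np.abs(delta_s), GLIAL_SIZE))
    return -delta_h / s

# Weak activity: small dH over a moderate dS gives a small (prunable) force.
print(glial_force(delta_h=0.01, delta_s=0.5))   # -> -0.02
# Strong activity: the floor keeps the force large but bounded.
print(glial_force(delta_h=0.5, delta_s=1e-6))   # -> -5.0, not -5e5
```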
Remarks are in Order
The unified field theory of all four major physical forces, namely, gravitational, electromagnetic, strong nucleon, and weak neutron, requires nature to define the mass, the charge, the meson, and the neutrino. Likewise, the singularity of the brain MFE gradient force defines the existence of glial cells; see Eqs. (9, 10).
Note that a male finch bird can recruit its auditory neural nodes to sing a new song every spring to attract a female finch bird (Paton, J. A.; Nottebohm, F., "Neurons generated in the adult brain are recruited into functional circuits," Science 1984; 225(4666):1046-1048). UDL may be defined, like in the zebra finch, as follows: not only can the brain's UDL synaptic junctions among neurons have minor synaptic weight adjustments according to the Hebb linear product of the input and output rule (intuitively, a pipe of efficient I/O capability has a larger axon cross section with a lower impedance), but UDL can also have a dynamic architecture, recruiting or not, in the morphology, by allowing a top layer to decide where to prune nodes and where to recruit a lower layer's node at those layers hidden from the outside input world. This is the UDL of the IKR mechanism.
The MFE gradient force $\vec{F}_{glia} \equiv -\Delta H/\Delta\vec{S}$ can determine the architecture of morphological learning with UDL, known as the CHNLD Rule.
Given the growing (recruiting) or pruning of neurons and layers into functional units, the Maxwell-Boltzmann probability is applied to the recruit-or-not two-state normalization for the Nonlinear Sigmoidal Morphologic Rule
$$\sigma(\Delta H_{brain}) = \frac{\exp(-\Delta H_{recruiting}/k_B T_o)}{\exp(-\Delta H_{recruiting}/k_B T_o) + \exp(-\Delta H_{pruning}/k_B T_o)} = \frac{1}{1 + \exp(\Delta H_{brain}/k_B T_o)},$$
where the MFE gradient force defines the biological house-keeping servant glial cell per neuron:
$$\vec{F}_{glial} \equiv -\Delta H/\Delta\vec{S}, \qquad (11)$$
given the MFE determined by the ensemble input data $\{\vec{X}\}$, as well as the UDL Rule of the synaptic weight matrix [W]. Q.E.D.
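A numerical sketch of the recruit-or-not two-state normalization above (the temperature and free-energy values are illustrative assumptions):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K

def recruit_probability(dh_recruit, dh_prune, t_o):
    """Two-state Maxwell-Boltzmann normalization, i.e. a sigmoid of dH.

    P(recruit) = exp(-dHr/kT) / (exp(-dHr/kT) + exp(-dHp/kT))
               = 1 / (1 + exp((dHr - dHp)/kT))
    """
    delta_h = dh_recruit - dh_prune
    return 1.0 / (1.0 + math.exp(delta_h / (K_B * t_o)))

t_o = 310.0        # ~37 C in kelvin
kt = K_B * t_o
print(recruit_probability(-2 * kt, 0.0, t_o))  # dH < 0 favors recruiting, ~0.88
print(recruit_probability(+2 * kt, 0.0, t_o))  # dH > 0 favors pruning,   ~0.12
```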
The isothermal free energy $H_{brain} = E_{brain} - \langle T_o\rangle S_{brain}$ gradient force is the House-Keeping Servant (HKS) Glial Cell (GC) force, $\vec{F}_{glial} \equiv -\Delta H_{brain}/\Delta\vec{S}$,
where $E_{brain}$ is the internal energy and $S_{brain}$ is the measure of the degree of uniformity, called entropy. The change in free energy toward the minimum is responsible for the self-organizational shift of the Hidden Deep Layer Architecture shape, from a beer-belly shape to the hour-glass of an ANN. Thus, the "learning machines" prosaically emulate the biological "brains".
The Supervised Alan Turing (SAT) test from the original definition of AI, which took days to evaluate using a supercomputer, is updated as an Unsupervised Turing Test (UTT), taking mere hours to evaluate using a PC. Such a system is not only undifferentiated from a human at another computer terminal, but can also defeat a human most of the time in certain tasks, generalized as Unsupervised Natural Intelligence (NI).
The unsupervised learning rule of the hidden deep layers architecture is $\Delta\vec{S} \equiv \vec{S}_{n+1} - [W]\vec{S}_n$, n = 1, 2, 3, …. The vector $\vec{S}_{n+1}$ denotes each computing threshold logic node of different degrees of uniformity, this measure being called entropy S, driven by the ensemble of exemplar input data $\{\vec{X}\}$. While the strength increases and passes a threshold $\Delta\vec{S}$ for the recruiting node for a growing architecture, the strength for pruning diminishes. The weighted Maxwell-Boltzmann probability $W_{MB} = \exp(-\Delta H_{brain}/k_B T_o)$, with $\Delta H_{brain} = \Delta H_{recruiting} - \Delta H_{pruning}$, is applied by a soft sigmoid threshold derived by the two-state normalization,
where the unknown equilibrium constant drops out in the changing $\Delta H_{brain}$ slope for the glial cells, except for the connectivity matrix and data ensemble, $[W]\{\vec{X}\}$.
The efficiency of the new unsupervised capability is driven by the natural relaxation toward less energy and more entropy, namely the MFE inequality $\Delta H = \Delta E_{brain} - \langle T_o\rangle \Delta S_{brain} \le 0$. The unsupervised learning capability is derived from the Boltzmann assertion of irreversible heat death $\Delta S_{brain} > 0$ of a closed system, due to increasing uniformity generated by incessant inter-molecular collisions, as a consequence of the Nernst thermodynamic 3rd law $T_o \ge 0$.
The UDL Rule of the synaptic weight matrix [W], driven by the incoming image ensemble $\{\vec{X}\}$, determines the dynamic interconnectivity of the ANN, whether recruiting or not (namely pruning), with the threshold to be empirically obtained from the stable mean, variance, and kurtosis of the exemplar ensemble of images, or voice, or video, or gaming $\{\vec{X}\}$, without human supervision,
where the superscript T is the transpose operation that turns a row into a column and vice versa.
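An illustrative sketch of this ensemble-driven weight rule (the ensemble, the Hebb-style outer-product update, and the mean-plus-variance recruit threshold are assumed placeholders; the threshold is left to be obtained empirically, as stated above):

```python
import numpy as np

def ensemble_stats(x):
    """Stable mean, variance, and excess kurtosis of the exemplar ensemble."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    kurt = ((x - mean) ** 4).mean(axis=0) / np.maximum(var, 1e-12) ** 2 - 3.0
    return mean, var, kurt  # kurtosis available as a stability check

def udl_weight_update(w, s_in, s_out, eta=0.01):
    """Hebb-style update: d[W] = eta * s_out s_in^T (T = transpose)."""
    return w + eta * np.outer(s_out, s_in)

rng = np.random.default_rng(3)
ensemble = rng.random((100, 4))   # hypothetical ensemble {X} of 100 exemplars
mean, var, kurt = ensemble_stats(ensemble)
threshold = mean + var            # assumed empirical recruit-or-not threshold
w = rng.random((3, 4))
for x in ensemble:
    s_out = (w @ x > w @ threshold).astype(float)  # thresholding logic nodes
    w = udl_weight_update(w, x, s_out)
print(w.round(2))
```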
As described above, the morphology changes may be described as "Netted Sensing" in real-world surveillance applications of Computational Intelligence (CI) training with the UDL rule.
While the MFE will cause glial-force pruning of insignificant next-layer neurons, the MFE can also significantly grow the IKR connections of next-layer neurons, with the help of the hypothalamus brain center command secreting immunoreactive hormones and dopamine (cf. Paul Greengard, et al.).
Of course, the devil of ANN training is in the details, and the time history of exposure makes a difference when the learning process alters the morphology. Such a dual capability of adaptive weights and morphology changes is the hallmark of modern deep-layer learning. Short-term memory (STM) is located at the brain's frontal lobe. An average of the synaptic weights over time, $\langle [W(t)]\rangle$, can become long-term memory (LTM) storage at the hippocampus, of which the left hemisphere is known to be logically rational and the right hemisphere mostly emotional, which fMRI hemodynamic imaging seems to confirm.
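A small sketch of the STM-to-LTM averaging $\langle [W(t)]\rangle$ (the window length and weight histories are hypothetical): the long-term store is simply the running time average of the short-term weights.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical time history of a 3x4 synaptic weight matrix [W(t)] (STM).
w_history = [rng.random((3, 4)) for _ in range(50)]

# LTM as the running time average <[W(t)]>, updated incrementally per exposure.
ltm = np.zeros((3, 4))
for t, w in enumerate(w_history, start=1):
    ltm += (w - ltm) / t  # incremental update of the long-term average
print(np.allclose(ltm, np.mean(w_history, axis=0)))  # True
```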
This application is related to, and claims priority from, U.S. Provisional Application for Patent No. 62/341,478, filed on May 25, 2016, the entire disclosure of which is incorporated herein by reference.