The present disclosure generally relates to machine learning methods. In particular, the present disclosure relates to an energy-efficient learning framework which exploits structural and functional similarities between a machine learning network and a general electrical network satisfying Tellegen's theorem.
From an energy point of view, the dynamics of an electrical network are similar to those of a machine learning network. Both networks evolve over a conservation manifold to self-optimize toward a low-energy state. In the literature, this analogy has served as the basis for energy-based learning models, where the network energy landscape is designed to match the learning objective function. The network variables then evolve according to physical principles, subject to network constraints, to seek out an energy-optimal state. Notable examples of energy-based learning models include Ising models, the Hopfield network, and the Boltzmann machine and its variants. However, most of these formulations assume that the energy being minimized in the network is dissipative in nature, whereas in a physical electrical network the total power $S_N$ (also known as the apparent power) comprises not only a dissipative component (also referred to as the active-power) but also a latent or stored non-dissipative component (also referred to as the reactive-power). This is expressed in Eqn. (1):
$$\underbrace{S_N}_{\text{Total Network Power}} = \underbrace{P_N}_{\text{Active Power}} + j\times\underbrace{Q_N}_{\text{Reactive Power}} \qquad \text{Eqn. (1)}$$

where $j=\sqrt{-1}$ denotes the imaginary unit.
While the active-power $P_N$ represents the rate of energy loss in the network, the reactive-power $Q_N$ represents the rate of change of the energy stored in the network's electric and magnetic fields (typically modeled as lumped capacitive and inductive elements). In the design of electrical networks, reactive-power is generally considered a nuisance, since it represents latent power that does not perform any useful work. From the point of view of learning, however, the reactive-power can be useful not only for storing the network or learned parameters, but also for improving the dynamics of the network during learning.
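As a point of reference, the decomposition in Eqn. (1) can be computed directly from nodal phasors. The sketch below is a minimal illustration in Python, with arbitrary placeholder phasor values; it is not part of the disclosed framework itself.

```python
import numpy as np

# Illustrative sketch of Eqn. (1): apparent power S_N split into the
# active (dissipative) and reactive (stored) components.
# The phasor values below are arbitrary placeholders.
V = np.array([1.0 + 0.5j, 0.8 - 0.2j])   # nodal voltage phasors
I = np.array([0.4 - 0.1j, 0.3 + 0.6j])   # nodal current phasors

S_N = np.sum(V * np.conj(I))  # apparent power: sum_i V_i * conj(I_i)
P_N = S_N.real                # active power (rate of energy loss)
Q_N = S_N.imag                # reactive power (rate of change of stored energy)
print(f"S_N = {S_N:.3f}, P_N = {P_N:.3f}, Q_N = {Q_N:.3f}")
```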
Sonification, or the encoding of information in meaningful audio signatures, has emerged as a potential candidate for augmenting or replacing its visual counterpart because of the various advantages it presents. Standard sonification methods in the literature involve solving a learning or predictive task on the data and subsequently mapping the output to an audio signature, which is then used by the end user to make a decision. In this paper, we present a novel framework for sonification of high-dimensional data using the complex growth transform dynamical system model introduced in our previous work. In contrast to traditional sonification methods, the proposed technique integrates both the learning (or, more generally, optimization) and sonification stages into the same module. The growth transform based sonification algorithm takes as input the data and optimization parameters underlying the learning or prediction task on one hand, and psychoacoustic parameters defined by the user on the other. It outputs a single-channel, human-recognizable audio signature that encodes the high-dimensional space of the data as well as the complexity of the optimization problem. We also show how different sonification strategies can be adopted, based on whether we want to use an existing audio clip or customize the framework to create our own audio signal for sonification. Experiments on synthetic and real-world datasets of both static and time-varying nature demonstrate how the model reduces high-dimensional data into a low-bandwidth audio signature, which could potentially aid the end user in the decision-making process.
Other objects and features will be in part apparent and in part pointed out hereinafter.
In various aspects, a resonant machine learning processor is described that includes a network of internal nodes. Each internal node includes an LC tank with a variable capacitor with capacitance Ci connected electrically in parallel with a variable inductor with inductance Li. Each internal node is electrically connected to a ground node and to each remaining internal node of the network of internal nodes. Each node further includes a normalized voltage phasor Vi, a normalized current phasor Ii, and a phase angle ϕi defined across that node. During learning of a plurality of learning parameters, the capacitance Ci and the inductance Li of each node are modulated to adjust the relative magnitude of the normalized voltage phasor Vi and the normalized current phasor Ii, to optimize the total active network power, and to maintain the total reactive network power at essentially zero until a steady-state network resonance is achieved. After completion of learning, the steady-state network resonance is maintained, and the plurality of learned network parameters are stored and sustained using the resonant electric and magnetic fields produced within the LC tanks of the network of internal nodes. The steady-state network resonance is maintained without dissipating power. Optimizing the total active network power and maintaining the total reactive network power at essentially zero may include modulating the capacitance Ci and the inductance Li of each internal node according to a system objective function subject to at least one constraint:
$$\min \; \mathcal{L}(\{\alpha_i\}) + h\,\Psi(\{\alpha_i\}) + \beta\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$$

subject to $\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1$, $|\phi_i|\le\pi$, $\beta\ge 0$, where $\mathcal{L}$ is a loss function defined over the plurality of learning parameters $\alpha_i=|V_i|^2+|I_i|^2$; $\Psi$ is a regularization or penalty function; $h$ is a hyperparameter that acts as a tradeoff between $\mathcal{L}(\cdot)$ and $\Psi(\cdot)$; $\beta$ is a hyperparameter that acts as a tradeoff between convergence of the network of internal nodes and dissipation of active power $\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$; and $N$ is the number of internal nodes.
In some aspects, the resonant machine learning processor may be a resonant support vector machine processor. Optimizing the total active network power and maintaining the total reactive network power at essentially zero comprises modulating the capacitance Ci and the inductance Li of each internal node according to a system objective function $\mathcal{H}_{\mathrm{SVM}}$ subject to at least one constraint, comprising:
$$\mathcal{H}_{\mathrm{SVM}} = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) + \sum_{i=1}^{N}\Psi(\alpha_i) + \beta\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$$

subject to $\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1$, $|\phi_i|\le\pi$, $\beta\ge 0$, where the system objective function $\mathcal{H}_{\mathrm{SVM}}$ is defined over a set of learning variables $\alpha_i=|V_i|^2+|I_i|^2$; $K(\mathbf{x}_i,\mathbf{x}_j)$ is a kernel function defined in terms of $\mathbf{X}=[\mathbf{x}_1,\ldots,\mathbf{x}_i,\ldots,\mathbf{x}_N]$; $\mathbf{X}$ is a $D$-dimensional input dataset of size $N$; $\nu\in(0,1)$ is a parameter specifying a size of a control surface; $\Psi$ is a penalty function; $h$ defines the steepness of the penalty function; $\beta$ is a hyperparameter that acts as a tradeoff between convergence of the plurality of nodes and dissipation of active power $\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$; and $N$ is the number of internal nodes.
In various other aspects, a resonant machine-learning network system is described that includes a computing device with at least one processor and a memory storing a plurality of modules. Each module includes instructions executable on the at least one processor. The plurality of modules include a resonant machine learning network module to define a plurality of interconnected nodes. Each node includes a normalized voltage phasor Vi, a normalized current phasor Ii, and a phase angle ϕi defined across that node. The plurality of modules further include a complex growth transform module to update the plurality of nodes according to a complex growth transform model, and a resonant network convergence module to converge the plurality of nodes to a steady-state solution by optimizing a system objective function subject to at least one constraint. The system objective function subject to at least one constraint may be given by:
$$\min \; \mathcal{H}\left(\{|V_i|,|I_i|\}\right) + \beta\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$$

subject to: $\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1$, $|\phi_i|\le\pi$, $\beta\ge 0$.
In some aspects, the complex growth transform model may include a system of update equations of the form given by Equations (15)-(18) in Table 2 below, where ω is an angular frequency of the plurality of nodes, τi is a time constant associated with the evolution of ϕi, and (·)* denotes the complex conjugate of a phasor.
The complex growth transform model may also include an annealing procedure that includes providing a value of β according to an annealing schedule defining a plurality of values of β at a plurality of times during optimization of the system objective function subject to the at least one constraint. The annealing schedule may be selected from one of: a constant schedule, wherein the value of β remains constant; a switching schedule, wherein the value of β switches from a first value to a second value at a switching time; and a logistic schedule, wherein the value of β changes according to a logistic curve.
In some additional aspects, the plurality of interconnected nodes may be divided into M subgroups, where each node within the kth subgroup includes a voltage phasor Vik, a current phasor Iik, and a phase angle ϕik defined across that node, for k=1, . . . , M. In these additional aspects, the system objective function subject to at least one constraint may be given by:
$$\min \; \mathcal{H}\left(\{|V_{ik}|,|I_{ik}|\}\right) + \beta\sum_{k=1}^{M}\sum_{i=1}^{N_k}|V_{ik}||I_{ik}|\cos\phi_{ik}$$

subject to $\sum_{i=1}^{N_k}\left(|V_{ik}|^2+|I_{ik}|^2\right)=1$ for each $k=1,\ldots,M$, $|\phi_{ik}|\le\pi$, and $\beta\ge 0$.
Also in these additional aspects, the complex growth transform model may include a system of update equations analogous to Equations (15)-(18), where ωk is an angular frequency of the nodes within the kth subgroup, τik is a time constant associated with the evolution of ϕik, and (·)* denotes the complex conjugate of a phasor.
In other additional aspects, the resonant machine-learning network system may be a resonant SVM system with an SVM system objective function $\mathcal{H}_{\mathrm{SVM}}$ subject to at least one constraint given by:
$$\mathcal{H}_{\mathrm{SVM}} = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) + \sum_{i=1}^{N}\Psi(\alpha_i) + \beta\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$$

subject to $\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1$, $|\phi_i|\le\pi$, $\beta\ge 0$, where the SVM system objective function $\mathcal{H}_{\mathrm{SVM}}$ is defined over a set of learning variables $\alpha_i=|V_i|^2+|I_i|^2$; $K(\mathbf{x}_i,\mathbf{x}_j)$ is a kernel function defined in terms of $\mathbf{X}=[\mathbf{x}_1,\ldots,\mathbf{x}_i,\ldots,\mathbf{x}_N]$; $\mathbf{X}$ is a $D$-dimensional input dataset of size $N$; $\nu\in(0,1)$ is a parameter specifying a size of a control surface; $\Psi$ is a penalty function; $h$ defines the steepness of the penalty function; $\beta$ is a hyperparameter that acts as a tradeoff between convergence of the plurality of nodes and dissipation of active power $\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i$; and $N$ is the number of interconnected nodes.
Other objects and features will be in part apparent and in part pointed out hereinafter.
The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.
Those of skill in the art will understand that the drawings, described below, are for illustrative purposes only. The drawings are not intended to limit the scope of the present teachings in any way.
The disclosed framework accepts as inputs the data and optimization parameters underlying the learning or prediction task, as well as acoustic parameters like sampling frequency, allowable frequency range of operation, and maximum and minimum loudness values. The disclosed framework outputs a single-channel audio waveform that encodes the underlying optimization process.
There are shown in the drawings arrangements which are presently discussed, it being understood, however, that the present embodiments are not limited to the precise arrangements and instrumentalities shown. While multiple embodiments are disclosed, still other embodiments of the present disclosure will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative aspects of the disclosure. As will be realized, the invention is capable of modifications in various aspects, all without departing from the spirit and scope of the present disclosure. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive.
In various aspects, an energy-efficient learning framework is described which exploits structural and functional similarities between a machine learning network and a general electrical network satisfying Tellegen's theorem. The described formulation ensures that the network's active-power is dissipated only during the process of learning, whereas the network's reactive-power is maintained at zero at all times. As a result, in steady-state, the learned parameters are stored and self-sustained by electrical resonance determined by the network's nodal inductances and capacitances. Based on this approach, three novel concepts are introduced: (a) a learning framework where the network's active-power dissipation is used as a regularization for a learning objective function that is subjected to a zero total reactive-power constraint; (b) a dynamical system based on complex-domain, continuous-time growth transforms which optimizes the learning objective function and drives the network towards electrical resonance under steady-state operation; and (c) an annealing procedure that controls the trade-off between active-power dissipation and the speed of convergence. As a representative example, the proposed framework may be used for designing resonant support vector machines (SVMs), and it is demonstrated below that the support-vectors correspond to an LC network with self-sustained oscillations. It is also shown that this resonant network dissipates less active-power compared to its non-resonant counterpart.
In various aspects, a framework which exploits both active and reactive network power for learning and memory is disclosed. The objective of this framework in various aspects is to achieve a network power profile, as illustrated in the accompanying drawings, in which active power is dissipated only during learning and the total reactive power is maintained at zero at all times.
In various aspects, this steady-state condition corresponds to a state of electrical resonance, and this concept is generalized to a framework of resonant machine learning below. To reach this steady-state condition, as described in additional detail below, a dynamical system based on complex-domain continuous-time growth-transforms is presented that extends previous work on growth transform networks using real-variables. The complex-domain formulation allows manipulation of the relative phase between the voltage and current variables associated with the network nodes and thus is used to optimize the active-power dissipation during learning.
In general, the disclosed approach may be applied to different learning networks. By way of non-limiting example, and as described in detail below, the disclosed framework is used to design resonant one-class support vector machines (SVMs). In this context, the performance of the resonant SVM model is compared with its non-resonant variant. A brief discussion is also provided below of additional applications of the disclosed framework to other network-based models that do not involve learning, for instance coupled oscillator networks.
In the various aspects described below, various mathematical expressions may include variables as defined in Table 1 below.
Optimization and Electrical Resonance
Consider an electrical network as shown in the accompanying drawings. By Tellegen's theorem, the apparent power summed over all branches of the network is zero, as expressed in Eqn. (2):
$$\sum_{i,j} V_{ij}\,I_{ij}^{*} = 0 \qquad \text{Eqn. (2)}$$
Isolating the apparent power flowing from the nodes to the ground terminal from that flowing between the internal nodes results in the expression of Eqn. (3):

$$S_T + S_N = \left(P_T + jQ_T\right) + \left(P_N + jQ_N\right) = 0 \qquad \text{Eqn. (3)}$$

where $S_T=\sum_i V_{i0} I_{i0}^{*}$ is the nodal apparent power, and $P_T=\sum_i \mathrm{Re}\{V_{i0} I_{i0}^{*}\}$ and $Q_T=\sum_i \mathrm{Im}\{V_{i0} I_{i0}^{*}\}$ are the total active and reactive power consumed at the nodes. Similarly, $S_N$, $P_N$ and $Q_N$ represent the apparent, active and reactive power consumed due to current flow between the network nodes (other than the ground terminal). Note that this result holds even if active-power sources are embedded in the network, as shown in the accompanying drawings.
Thus, Equation (3) implies that if the active-power at the nodes of the network $P_T$ is minimized subject to the constraint that the nodal reactive power $Q_T=0$, then this formulation is equivalent to minimizing the network active-power $P_N$ while ensuring that the network reactive power $Q_N=0$. This result can be expressed as:

$$\min_{\{V_i, I_i\}} P_T \;\;\text{s.t.}\;\; Q_T = 0 \qquad \text{Eqn. (4)}$$
where $V_{i0}=V_i$ and $I_{i0}=I_i$. If it is assumed that the $i$th node is associated with a lumped capacitance $C_i$ and a lumped inductance $L_i$, ensuring zero nodal reactive-power implies

$$\mathrm{Im}\left\{\sum_{i} V_i\,I_{C_i}^{*}\right\} + \mathrm{Im}\left\{\sum_{i} V_{L_i}\,I_i^{*}\right\} = 0 \qquad \text{Eqn. (5)}$$

where $I_{C_i}=C_i\,\frac{dV_i}{dt}$ and $V_{L_i}=L_i\,\frac{dI_i}{dt}$ represent the current flowing through $C_i$ and the voltage across $L_i$, respectively. Equation (5) is equivalent to:
$$\sum_{i=1}^{N}\left(\tfrac{1}{2}C_i|V_i|^2+\tfrac{1}{2}L_i|I_i|^2\right)=E_0 \qquad \text{Eqn. (6)}$$
where $|V_i|^2=V_iV_i^*$ and $|I_i|^2=I_iI_i^*$, implying that the total network reactive energy is conserved and equal to some constant value $E_0$.
Satisfying the constraint described above is equivalent to sustaining a condition of electrical resonance. The optimization problem as expressed in Eqn. (4) can be transformed as:

$$\min_{\{|V_i|,|I_i|,\phi_i\}} \; \sum_{i=1}^{N}|V_i||I_i|\cos\phi_i \qquad \text{Eqn. (7)}$$

$$\text{subject to}\;\;\sum_{i=1}^{N}\left(\tfrac{1}{2}C_i|V_i|^2+\tfrac{1}{2}L_i|I_i|^2\right)=E_0,\;\;|\phi_i|\le\pi \qquad \text{Eqn. (8)}$$
where ϕi denotes the phase-angle between the voltage and current phasors at the ith node.
Note that the optimization in Equation (7) may result in three types of solutions in steady-state: (a) $(|I_i|\ne 0, |V_i|\ne 0, |\phi_i|=\pi/2)$, which corresponds to a resonant LC tank; (b) $(|I_i|=0, |V_i|\ne 0)$, which corresponds to an electrically isolated or floating node; and (c) $(|I_i|\ne 0, |V_i|=0)$, which corresponds to a short-circuit condition. In various examples described below, these steady-state resonance conditions are illustrated using a simple LC tank, as in the sketch that follows. Note that in all cases, active power is dissipated only during the learning phase, where $C_i$ and $L_i$ are modulated to change the relative magnitude of the voltage and current variables.
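The following minimal sketch, with arbitrary component values and assuming the standard phasor relations for a parallel LC tank, illustrates solution type (a): at the resonant frequency $\omega=1/\sqrt{L_iC_i}$, the voltage and current phasors are in quadrature, so the active power $|V_i||I_i|\cos\phi_i$ vanishes.

```python
import numpy as np

# Minimal sketch of solution type (a): a resonant LC tank.
# Component values are arbitrary placeholders.
L_i, C_i = 1e-3, 1e-6              # inductance (H) and capacitance (F)
w = 1.0 / np.sqrt(L_i * C_i)       # resonant angular frequency

V = 1.0 + 0.0j                     # voltage phasor across the tank
I = V / (1j * w * L_i)             # inductor current lags V by pi/2

phi = np.angle(V) - np.angle(I)    # phase angle between V and I
P = abs(V) * abs(I) * np.cos(phi)  # active power ~ 0 at resonance
print(f"|phi| = {abs(phi):.4f} rad (pi/2 = {np.pi/2:.4f}), P = {P:.2e}")
```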
The constraint as expressed in Equation (8) can be simplified by normalizing with respect to $E_0$ such that:

$$\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1 \qquad \text{Eqn. (9)}$$

where $V_i \leftarrow \sqrt{\frac{C_i}{2E_0}}\,V_i$ and $I_i \leftarrow \sqrt{\frac{L_i}{2E_0}}\,I_i$ represent the dimensionless voltages and currents.
Henceforth, unless stated otherwise, subsequent expressions are derived in terms of dimensionless quantities. The optimization framework in Equation (7) is extended to include a general optimization function $\mathcal{H}$, expressed as:

$$\min \; \mathcal{H}\left(\{|V_i|,|I_i|\}\right) + \beta\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i \quad \text{s.t.}\;\;\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1,\;\;|\phi_i|\le\pi \qquad \text{Eqn. (10)}$$
In this formulation, the active-power dissipation term in Equation (10) acts as a regularization function with $\beta\ge 0$ being a hyper-parameter. Note that the objective function $\mathcal{H}(\{|V_i|,|I_i|\})$ is only a function of the magnitudes of the voltages and currents and is independent of the phase-angle $\phi_i$. This ensures independent control of the magnitudes and the phase to achieve the desired objective of optimizing the active-power dissipation, as illustrated in the accompanying drawings.
The two important properties of the optimization framework in Equation (10) are as follows:
First, for a convex cost function $\mathcal{H}$, each node converges in steady state to one of the three solution types described above; this result follows from the three possible solutions of the optimization framework of Eqn. (7).
Second, if β is slowly annealed to a sufficiently high value, then $\phi_i\to\pi/2$ in steady state for all $i$ with $|V_i||I_i|\ne 0$. This implies that the network achieves zero active-power dissipation in steady state. Note that the method holds for non-convex objective functions as well; in this case, however, the network might show resonant behavior at a locally optimal solution.
By way of non-limiting example, consider a single-variable quadratic optimization problem of the form $\mathcal{H}_1(x)=x^2$, subject to the constraint $|x|\ge 1$, $x\in\mathbb{R}$. Substituting $x=|V|^2-|I|^2$, the problem can be mapped, as described in additional detail below, into a form equivalent to Equation (10).
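A minimal numerical sketch of this substitution is shown below, assuming the normalization $|V|^2+|I|^2=1$ of Eqn. (9); the parameter sweep is purely illustrative. It shows that the magnitude constraint $|x|\ge 1$ is met only at the corners $(|V|,|I|)=(1,0)$ or $(0,1)$.

```python
import numpy as np

# Minimal sketch of the substitution x = |V|^2 - |I|^2 under the
# normalization |V|^2 + |I|^2 = 1 (Eqn. (9)); purely illustrative.
theta = np.linspace(0.0, np.pi / 2, 181)     # |V| = cos(t), |I| = sin(t)
V_mag, I_mag = np.cos(theta), np.sin(theta)  # satisfies |V|^2 + |I|^2 = 1

x = V_mag**2 - I_mag**2                      # mapped variable, x in [-1, 1]
feasible = np.abs(x) >= 1.0 - 1e-9           # constraint |x| >= 1
print("feasible x values:", np.unique(np.round(x[feasible], 6)))
```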
In various aspects, the optimization framework described in Equation (10) involves complex phasors and hence entails the use of learning models operating in the complex domain for reaching the optimal solution. To this end, a dynamical system may be used to solve this optimization problem. The main result is summarized in Table 2, and details of the proof are provided below.
Theorem 1: The system of nonlinear dynamical equations given by Equations (15)-(18) in Table 2 converges to the optimal point of Equation (14) in steady state, with zero energy dissipation, i.e., $\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i=0$. A formal proof of Theorem 1 is provided below.
In some aspects, the optimization framework given by Equations (12)-(13) may be extended to a multi-variable space.
The optimal solution is reached when $\phi_i=\pm\pi/2\;\forall i$, which implies $P_N=0$. For the sake of comparison, we will consider two variants: (a) the non-resonant model ($\mathcal{M}_{nr}$), where $\beta=0$, and (b) the resonant model ($\mathcal{M}_{r}$), where $\beta\ne 0$.
In various aspects, under steady-state conditions, the model $\mathcal{M}_{nr}$ dissipates power. However, for the model $\mathcal{M}_{r}$, the steady-state active-power goes to zero. This is illustrated in the accompanying drawings.
By way of non-limiting example, the hyperparameter β was annealed and its impact on the active-power dissipation metric $P_N$ and the convergence of the objective function $\mathcal{H}$ for the model $\mathcal{M}_{r}$ was evaluated.
In one example, β=logistic corresponds to the case when β is annealed according to a sigmoid schedule beginning from t=0.1 s, and takes on a maximum value of βmax=1 (k and t0 are hyperparameters determining the steepness and mid-point of the sigmoid, respectively); β=switching corresponds to the case when β switches from a minimum value (βmin=0) to a maximum value (βmax=1) at t=0.3 s, after the system has converged to the optimal solution. In all of the cases, the model is observed to converge to the optimal solution, irrespective of the choice of β. However, different annealing schedules for β lead to different active-power dissipation profiles. For example, a constant value of β throughout the duration of the experiment leads to faster minimization of the active-power dissipation metric, but at the cost of slower convergence. The opposite trend is seen when β is slowly annealed to a sufficiently high value over the course of the optimization. The annealing schedule thus acts as a trade-off between the speed of convergence and the rate of minimization of the active power.
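For concreteness, the three annealing schedules described above can be written as simple functions of time. The sketch below is one possible parameterization, with the steepness k, mid-point t0, and switching time chosen to match the example values mentioned in this section; these are assumptions, not prescribed values.

```python
import numpy as np

# Sketch of the three annealing schedules for the hyperparameter beta.
def beta_constant(t, beta_max=1.0):
    return beta_max * np.ones_like(t)

def beta_switching(t, t_switch=0.3, beta_min=0.0, beta_max=1.0):
    # beta jumps from beta_min to beta_max at the switching time
    return np.where(t < t_switch, beta_min, beta_max)

def beta_logistic(t, k=50.0, t0=0.1, beta_max=1.0):
    # sigmoid ramp: k sets the steepness, t0 the mid-point
    return beta_max / (1.0 + np.exp(-k * (t - t0)))

t = np.linspace(0.0, 0.5, 6)
for schedule in (beta_constant, beta_switching, beta_logistic):
    print(schedule.__name__, np.round(schedule(t), 3))
```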
Table 2: Complex growth transform dynamical system
and

$$\tau_i\,\omega\,\dot{\phi}_i(t)+\phi_i(t)=g_{\phi_i}(t),\;\;\forall i=1,\ldots,N \qquad \text{Eqn. (17)}$$
ω is an arbitrary angular frequency, and τi is the time-constant associated with the evolution of ϕi.
A. Model Properties and Extensions
In various aspects, the dynamical system represented by Equations (15)-(18) and the resonant optimization framework also exhibit the following properties and extensions.
In one aspect, energy constraints can be imposed over subgroups of nodes in the network. In this aspect, the reactive energy conservation constraint is imposed between subgroups of nodes, instead of on the network as a whole, i.e., $\sum_{i=1}^{N_k}\left(|V_{ik}|^2+|I_{ik}|^2\right)=1$ for each subgroup $k=1,\ldots,M$.
The update equations in this case are analogous to Equations (15)-(18), where ωk is the constant angular frequency of the kth subgroup of nodes and τik is the time constant associated with the evolution of ϕik.
In various other aspects, the system dynamics and reactive-energy constraints remain invariant under the imposition of a global phase. The network dynamics remain invariant to the imposition of a global phase component on all the network variables, and the conservation constraint is also satisfied in this case. The governing equations are of the same form as above, with each phasor multiplied by a common factor $e^{j\phi_g}$, where $\phi_g$ is the global phase.
In various additional aspects, the reactive-energy constraints remain invariant, with varying network dynamics, under the imposition of a relative phase: the conservation constraints are satisfied on the introduction of a relative phase component between the voltage and current phasors of each node, even though the overall network dynamics change. The governing equations are of the same form as above, where $\phi_i$ denotes the relative phase introduced between the current and voltage phasors of the $i$th node.
In another additional aspect, the model is dissipative and converges to limit cycle oscillations in steady state. Taking the second-order time derivatives of Equations (15) and (16) leads to Equations (24) and (25), described below.
The first terms in the RHS of Equations (24) and (25) correspond to stable limit cycle oscillations of all the phasors, while the other terms correspond to the dissipative effects in the network. This demonstrates that the network as a whole is essentially a coupled dissipative system that is capable of self-sustained oscillations under steady-state. Each individual state variable describing the network thus returns to the same position in its respective limit cycle at regular intervals of time, even when subjected to small perturbations.
In various aspects, the frameworks described above can be applied for constructing resonant machine learning networks. In general, the framework can be applied to any learning network that optimizes a cost-function defined over a set of learning variables $\alpha_i$ as:

$$\min_{\{\alpha_i\}} \; \mathcal{L}(\{\alpha_i\}) + h\,\Psi(\{\alpha_i\}) \quad \text{s.t.}\;\;\sum_{i=1}^{N}\alpha_i=1,\;\;\alpha_i\ge 0 \qquad \text{Eqn. (26)}$$
Here $\mathcal{L}(\{\alpha_i\})$ represents a loss-function which depends on the learning problem (e.g., supervised, unsupervised or semi-supervised) and the dataset under consideration (e.g., training data). The second term $\Psi(\cdot)$ in the objective function is any linear or nonlinear function which represents either (a) a regularization function, or (b) a penalty function used to satisfy optimization constraints. $h$ is a hyperparameter which acts as a trade-off between $\mathcal{L}(\cdot)$ and $\Psi(\cdot)$. Because $\alpha_i$ can be viewed as a probability measure, the optimization framework in Equation (26) naturally lends itself to probabilistic learning models.
The above problem can be mapped to the resonant learning framework described above by substituting $\alpha_i=|V_i|^2+|I_i|^2$, to arrive at the following problem:

$$\min \; \mathcal{L}\left(\{|V_i|^2+|I_i|^2\}\right) + h\,\Psi\left(\{|V_i|^2+|I_i|^2\}\right) \quad \text{s.t.}\;\;\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1 \qquad \text{Eqn. (27)}$$
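A minimal sketch of the substitution in Eqn. (27) is given below; the loss ℒ, penalty Ψ, hyperparameter values, and phasors are illustrative placeholders standing in for a concrete learning problem.

```python
import numpy as np

# Sketch of Eqns. (26)-(27): a generic cost over learning variables
# alpha_i, evaluated after the substitution alpha_i = |V_i|^2 + |I_i|^2.
# The loss, penalty, and phasor values are illustrative placeholders.
def loss(alpha):                 # example convex loss L({alpha_i})
    return np.sum(alpha ** 2)

def psi(alpha):                  # example regularization/penalty Psi(.)
    return np.sum(np.abs(alpha))

h, beta = 0.1, 1.0
V = np.array([0.5 + 0.2j, 0.1 - 0.4j])
I = np.array([0.3 - 0.1j, 0.6 + 0.2j])
phi = np.angle(V) - np.angle(I)

alpha = np.abs(V) ** 2 + np.abs(I) ** 2      # alpha_i = |V_i|^2 + |I_i|^2
cost = (loss(alpha) + h * psi(alpha)
        + beta * np.sum(np.abs(V) * np.abs(I) * np.cos(phi)))
print(f"regularized objective = {cost:.4f}")
```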
Note that non-probabilistic learning problems can also be mapped to the probabilistic framework by imposing an additional constraint, as discussed in additional detail below.
A. One-Class Resonant SVM
In various aspects, the framework in Equation (27) can be used to design a resonant one-class SVM. The solution of a generic one-class SVM is obtained by solving the following optimization problem:

$$\min_{\{\alpha_i\}} \; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{s.t.}\;\;\sum_{i=1}^{N}\alpha_i=1,\;\;0\le\alpha_i\le\frac{1}{\nu N}$$

where $\mathbf{X}=[\mathbf{x}_1,\ldots,\mathbf{x}_i,\ldots,\mathbf{x}_N]\in\mathbb{R}^{N\times D}$ is the $D$-dimensional input dataset of size $N$, $\nu\in(0,1)$ is a parameter which controls the size of the decision surface, $K(\cdot,\cdot)$ is a positive definite kernel function satisfying Mercer's conditions, and the $\alpha_i$'s are the learning parameters.
The optimization problem above can be reformulated by replacing the inequality constraint with a smooth penalty or loss function $\Psi(\cdot)$ including, but not limited to, a logarithmic barrier, e.g., $\Psi(\alpha_i)=-\frac{1}{h}\log\left(\frac{1}{\nu N}-\alpha_i\right)$, yielding:

$$\min_{\{\alpha_i\}} \; \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) + \sum_{i=1}^{N}\Psi(\alpha_i) \quad \text{s.t.}\;\;\sum_{i=1}^{N}\alpha_i=1$$
The parameter $h$ determines the steepness of the penalty function, where $h\to\infty$ implies an almost-exact inequality constraint. An equivalent complex-domain representation in terms of voltages and currents in an LC network can be arrived at by considering $\alpha_i=|V_i|^2+|I_i|^2\;\forall i$. In this case, the network is considered to be globally energy constrained, and all the individual units in the network have the same frequency ω. The redefined learning problem is defined as follows:

$$\min \; \mathcal{H}_{\mathrm{SVM}} = \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j K(\mathbf{x}_i,\mathbf{x}_j) + \sum_{i=1}^{N}\Psi(\alpha_i), \quad \alpha_i=|V_i|^2+|I_i|^2, \quad \text{s.t.}\;\;\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1$$
Introducing the active-power dissipation regularization results in the following problem:

$$\min \; \mathcal{H}_{\mathrm{SVM}} + \beta\sum_{i=1}^{N}|V_i||I_i|\cos\phi_i \quad \text{s.t.}\;\;\sum_{i=1}^{N}\left(|V_i|^2+|I_i|^2\right)=1,\;\;|\phi_i|\le\pi,\;\;\beta\ge 0$$
The update equations in this case are of the form shown in Equations (15)-(18) as described above.
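The sketch below illustrates the one-class SVM cost with a logarithmic-barrier penalty standing in for the inequality constraint $\alpha_i\le 1/(\nu N)$. The dataset, RBF kernel, and hyperparameter values are assumptions for illustration, not the patent's exact formulation.

```python
import numpy as np

# Sketch of the one-class SVM objective with a log-barrier penalty.
# Dataset, kernel, and hyperparameters are illustrative placeholders.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))                 # input dataset: N=20, D=2
N, nu, h = X.shape[0], 0.5, 10.0

sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-0.5 * sq)                        # RBF kernel matrix K(x_i, x_j)

def h_svm(alpha):
    quad = 0.5 * alpha @ K @ alpha           # (1/2) sum_ij a_i a_j K_ij
    # Log barrier: diverges as alpha_i -> 1/(nu*N) from below; larger h
    # makes the barrier steeper (a more accurate inequality constraint).
    barrier = -(1.0 / h) * np.sum(np.log(1.0 / (nu * N) - alpha))
    return quad + barrier

alpha0 = np.full(N, 1.0 / N)                 # feasible start: sum(alpha) = 1
print(f"H_SVM(alpha0) = {h_svm(alpha0):.4f}")
```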
In one example, β is annealed according to logistic schedules that take on maximum values of βmax=1 and βmax=10, respectively, starting from βmin=0. Finally, β=switching corresponds to the case when β switches from a minimum value (βmin=0) to a maximum value (βmax=10) at t=2 s, after the system has converged to the optimal solution.
In various aspects, a complex-domain formulation of a machine learning network is disclosed that ensures that the network's active power is dissipated only during the process of learning, whereas the network's reactive power is maintained at zero at all times. It was demonstrated that the active power dissipation during learning can be controlled using a phase regularization parameter. The framework is also robust to variations in the initial conditions and to the choice of the input/driving frequency ω. The proposed approach thus provides a physical interpretation of machine learning algorithms, where the energy required for storing learned parameters is sustained by electrical resonances due to nodal inductances and nodal capacitances.
For instance, the solution of the one-class SVM, described in detail above, can be interpreted in terms of the resonant network illustrated in the accompanying drawings.
In various additional aspects, network dynamics may be incorporated into the free frequency variable, and the phase information associated with each node may be utilized in the learning process. Although the methods and experimental results described herein are based on an unsupervised learning setting, the methods disclosed herein are suitable for use in developing an energy-efficient framework for supervised learning problems. Although the dynamical system models described herein are based on complex-domain growth transforms, the formulation is general enough to be applicable to other complex-domain learning models, where the disclosed approach may demonstrate a richer set of dynamics, robustness with respect to noise, and better convergence properties in the case of classification problems. Moreover, the phase information aspect of the disclosed methods may provide additional insights in the context of many complex valued physical signals (or complex domain transforms of physical signals, e.g., Fourier or wavelet transforms). Because the framework described herein preserves both the magnitude and phase information, the disclosed methods enable enhanced flexibility compared to other complex domain learning models in terms of phase manipulation/cancellation.
In addition to implementing classic machine learning algorithms, the complex growth transform dynamical system disclosed herein may also be used for designing synchronized networks of coupled oscillators. Such networks can be potentially used for solving different computing tasks like optimization, pattern matching etc. as is achievable using coupled oscillatory computing models. An oscillator network designed in this fashion is capable of demonstrating stable, self-sustaining oscillatory dynamics, whereby the network can return to its initial stable limit cycle configuration following small perturbations, while simultaneously minimizing some underlying system-level objective function. The framework disclosed herein may also be used to study connections between learning and synchronization and/or the emergence of a rhythmic periodic pattern exhibited by a group of coupled oscillators, which may enable enhanced understanding of a variety of periodic processes pervading complex networks of different biological, physical, social and quantum ensembles. In this regard, the existing mathematical models for such collective behavior are mostly phenomenological or bottom-up, and in general do not provide a network-level perspective of the underlying physical process. The disclosed growth-transform formulation, thus, could provide new network-level insights into the emergence of phase synchronization, phase cohesiveness and frequency synchronization in coupled-oscillator networks.
Data Sonification Using Complex Growth Transform Dynamical Systems
With the multitude of interdisciplinary data available in recent times, the search for better techniques for perceptualizing data before applying it to any learning or predictive task has remained an important research area over the past few decades. While visualization remains the modality of choice for most applications, research on sonification, or the representation of data using human-recognizable audio signatures, is gaining momentum as a complementary or alternative modality to visual perception.
Some of the advantages of data sonification/audification, which make them ideal candidates for potentially augmenting, and in some specific applications, replacing visualization for analyzing the properties of the dataset, are as follows:
Temporal resolution for auditory perception is better than that for visual perception. This makes audio suitable for time-varying data with complex patterns, which might otherwise be missed by visual displays.
Audio is orientation-agnostic and has a wider spatial range, since the user is not required to be oriented towards a particular direction. In contrast, objects need to be within the field of vision for the visual system. Humans typically respond faster to auditory feedback when compared to visual feedback.
In scenarios where the visual scene is crowded with multiple displays, or the visual system is otherwise occupied, auditory signals can be used as a means of drawing the user's attention to a particular segment of the visual field.
Auditory perception provides a natural alternative to shrinking display sizes, especially for monitoring and alerting applications.
In this paper, we present a novel framework for sonifying high-dimensional data using a class of complex dynamical systems based on Baum-Eagon growth transforms. Given a continuously differentiable optimization problem defined on a conservation manifold, and sonification parameters like the sampling frequency, spectral range, maximum allowable loudness, and maximum power content of the output signal, the proposed algorithm produces an audio signature which encodes both the complexity of the optimization problem and the complexity of the dataset. The scope of the paper is illustrated in the accompanying drawings.
The disclosed sonification method inherently maps high-dimensional data into a lower-dimensional, single-channel audio signal. This is particularly suitable when dealing with low-throughput systems having limited capacity/bandwidth. The algorithm automatically combines the learning and sonification stages into a single module. This is unlike the sonification-based decision-making algorithms existing in the literature, which typically accomplish learning the decision parameters and mapping them to different sound parameters in subsequent stages. However, similar to other sonification techniques, this method too involves a human-in-the-loop component, since the decision is ultimately made by the end user.
In addition, the disclosed sonification method is versatile and can be tailored to accommodate a variety of learning models, problem dimensionality and dataset sizes. Additionally, the complex growth transform dynamical system provides a wide range of tunable parameters which can be customized for different applications.
A. Visualization of High-Dimensional Data
A variety of methods for visualization of high-dimensional data have been previously described. While elementary visualization schemes make use of different visual attributes like color, shape, size, and spatial location for representing information, almost all the standard visualization techniques typically use these after mapping the data to a lower dimensional (usually two or three dimensional) space. Some of the most commonly used techniques in practice include linear techniques like PCA, LDA, and their variants, which only preserve the global structure of the data, while nonlinear methods like Locally Linear Embedding, Isomap, Laplacian Eigenmap, etc., have been designed to preserve both global and local information. However, they are usually not very adept at preserving both global and neighborhood information in a single map. Stochastic neighborhood-preserving approaches like tSNE and its variants have been proposed to alleviate these problems; however, they suffer from scalability issues and do not perform well on noisy data. A more recent technique called PHATE, proposed in the context of high-dimensional biological data, was shown to preserve both local and global information while simultaneously being scalable and noise-robust.
However, these techniques are suitable for time-invariant or static data, and analysis of time-varying high-dimensional data using these methods would involve visualizing the data frame-wise in a lower-dimensional space. Though time-varying methods based on variants of the visualization techniques for static data have been proposed, e.g., m-tSNE, they lack interpretability and suffer from the same limitation of being restricted to only three dimensions for representing the data. Though other visual attributes like color, size, and shape can be used in conjunction with the three spatial dimensions, these may not be enough for complex datasets with very high dimensionality and can cause information overload of the visual system. The sheer magnitude and dimensionality of data available in recent times are slowly pushing users to the limits of comprehension and interpretability of visual information. Additionally, complex, time-varying patterns in the data are sometimes difficult to capture using the visual faculties alone. Thus, employing sonification as an alternative or complementary modality would help users interpret the data more effectively, or draw their attention to important aspects of the data which can then be monitored/analyzed by visual inspection.
B. Data Sonification and its Applications
Sonification, similar to its visual counterpart, provides a number of attributes which can be used for representing information, such as pitch, loudness, timbre, rhythm, duration, and harmonic content. At least two different sonification methods exist in the literature, depending on how the data is mapped into an audio signal. Of these, the parameter-mapping method, where different variables are mapped into different attributes of the sound signal to create a sonified signature, is the most commonly used. Parameter mapping-based sonification has been applied to a wide range of applications, ranging from sonifying astronomical data like gravitational waves and photons emitted by the Higgs boson, to synthesizing novel protein structures and detecting anomalies in medical data (e.g., CT scans of Alzheimer's patients, EEG and ECG signals, skin cancer detection, epileptic seizure detection), and detecting anomalous events. Sonification has also been used for detecting network intrusions, analyzing stock market data, and in the therapeutic treatment of freezing of gait in Parkinson's patients by mapping kinematic data to the frequency of the sonified signal. More recently, sonification has been applied to the analysis of RNA sequences in different strains of the COVID-19 virus. Unlike visualization schemes, sonification offers a much wider parameter space for the variables to be mapped into. For example, humans can perceptualize sound signals anywhere from 20 Hz to 20 kHz, and frequency differences as low as 3 Hz are easily discernible by the human auditory system. However, in tasks that require the user to make a decision based on the data properties, usually the data is first passed through a machine learning or optimization module which learns the manifold the data resides in, and then maps the output to different properties of the sound signal to be sonified.
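As a baseline for comparison with the integrated framework described next, a conventional parameter-mapping sonifier is sketched below: two scalar attributes of a data point are mapped to the pitch and loudness of a tone. The ranges and the mapping are illustrative choices, not a prescribed standard.

```python
import numpy as np

# Sketch of conventional parameter-mapping sonification: map one data
# attribute to pitch and another to loudness. Ranges are assumptions.
fs = 8000                                   # sampling frequency (Hz)
t = np.arange(0, 0.5, 1.0 / fs)             # 0.5 s tone

def sonify(value, spread, f_lo=220.0, f_hi=880.0):
    """Map `value` in [0, 1] to pitch and `spread` in [0, 1] to loudness."""
    freq = f_lo + value * (f_hi - f_lo)
    amp = 0.2 + 0.8 * spread
    return amp * np.sin(2 * np.pi * freq * t)

audio = sonify(value=0.75, spread=0.3)      # one sonified data point
print(audio.shape, float(audio.min()), float(audio.max()))
```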
III. Sonification of an Optimization Problem Using Complex Growth Transforms
In this section, we introduce a novel framework for sonifying high-dimensional data while simultaneously solving an underlying optimization problem, using a variant of the complex-domain dynamical system model proposed in our previous work. The sonification algorithm is summarized in Table 3, and the proof is outlined below. The method is not restricted to optimization problems defined over a probabilistic simplex, and can be extended to problems defined by constraints of the form $|p_{ik}|\le\gamma$, $\gamma>0$, by the mapping procedure shown below.
Note that the disclosed method is a type of parameter mapping-based sonification, since different attributes of the data or optimization process are mapped to different parameters of the output sound signal. In particular, each variable or data point can be mapped to a complex growth transform limit cycle oscillator, subgroups of which are globally coupled together by a conservation constraint, as illustrated in the accompanying drawings.
By way of non-limiting example, consider a simple one-dimensional quadratic optimization problem. Taking $p=|\psi_1|^2-|\psi_2|^2$, where $\psi_1,\psi_2\in\mathbb{C}$, the problem can be solved by using the complex growth transform dynamical system model.
By way of another non-limiting example, consider a quadratic optimization problem of the same form defined over $N$ pairs of variables. Taking $p_i=|\psi_{i1}|^2-|\psi_{i2}|^2$, where $\psi_{i1},\psi_{i2}\in\mathbb{C}$, the problem can be solved by using the complex growth transform dynamical system model. The final sonified output is then given by:

$$\psi_{\mathrm{sum}}(t)=\sum_{i=1}^{N}\left[\psi_{i1}(t)+\psi_{i2}(t)\right]$$
Case 1: For the first set of simulations, the relative frequencies are chosen to be $\xi_{ik}=\xi=500$ Hz for all the oscillators, and no amplitude or frequency modulations were added to the relative frequency trajectories. The plots of $\exp(j\xi_{ik}t)$, which in this case is a unit-amplitude sinusoid with a constant frequency for all the oscillators, are shown in the accompanying drawings.
Case 2: In this case, frequency modulation was added to the relative frequency trajectories of all the oscillators, with no amplitude modulation, i.e., $\exp(j\xi_{ik}t)\leftarrow\exp(j\xi\sigma_{ik}(t)\,t)$.
Case 3: In this case, a varying amplitude term, constant for each oscillator, was added to the relative frequency term, and no frequency modulation was added. The multiplicative term in the update equation is thus replaced by $\exp(j\xi_{ik}t)\leftarrow\exp((-\rho+j\xi)t)$, where ρ=1 was chosen for the experiment.
Case 4: In the last set of experiments, both the amplitude and frequency of the relative frequency trajectories were modulated for all the oscillators, i.e., $\exp(j\xi_{ik}t)\leftarrow\exp((-\rho+j\xi\sigma_{ik}(t))t)$. In both Cases 3 and 4, the relative frequencies were applied only during the transient phase of the simulation, and were set to zero during the initial stage as well as after convergence to a steady state. It can be seen that the frequency shift during the transient phase in Case 4 is significantly higher than in all the other stages.
Based on the above experiments, we can thus encode information about the complexity of the optimization problem and the total energy available to the network. Also, different encoding strategies can be employed by exploiting the relative frequencies of the oscillators. Additionally, frequency selectors/tuners can be implemented by assigning sufficiently low levels of energy to the oscillator network, such that only the oscillator with the maximum energy content is sustained.
Sonification Strategies
Different sonification strategies can be adopted by considering different mapping schemes for both the baseline frequencies ωi and the relative frequencies ξik. In a sense, the sonification process can be thought of as projecting high-dimensional data into a low-dimensional basis space created by predetermined frequency trajectories. We will present here three different strategies of sonifying data: (a) using frequencies equally spaced on the Bark scale, (b) creating chords based on a musical scale of choice, and (c) extracting dominant frequency trajectories from a chosen musical piece and mapping the basis set of frequencies to these trajectories. For generating human-recognizable auditory signatures, all the baseline and relative frequencies should lie within the range of human perception, i.e., 20 Hz-20 kHz. Furthermore, the largest frequency assigned to a variable should be less than the Nyquist frequency to avoid aliasing.
Some of the desirable characteristics of a candidate sonification strategy are as follows: 1) Different data distributions should be encoded by different sound signatures, depending on the underlying optimization task; 2) The complexity of the dataset or the underlying optimization problem should affect the output sound signature; and 3) For time-varying data, drift in the data distribution over time should lead to a drift in the sonified signal as well.
For example, if we consider clustering as the underlying optimization problem, where the number of allowable clusters (K) is fixed a priori, then we would ideally want the sonified output to give an indication of (a) the instantaneous cluster densities, (b) instantaneous separation between the clusters and (c) the time it takes for the optimization problem to converge to the optimal cluster assignments.
A. Bark Scale-Based Sonification
This method involves mapping the relative/baseline frequencies to equally spaced frequencies on the Bark scale. Any number of frequency trajectories can be selected if masking is acceptable in the end application. However, if we want to eliminate masking effects from the output sonified signal, the Bark scale frequencies should be chosen such that the critical bands around these frequencies do not overlap with each other. Since the Bark scale has 24 critical bands, the number of frequencies in the basis set is limited to a maximum of 24 if we want to avoid masking. Additionally, the sonification module should be designed so that each frequency trajectory remains within its critical band on the Bark scale, even during the transient phase. The advantage of this method is that changes in each frequency trajectory (i.e., each sonified variable) can be discerned unambiguously, since there is no mixing of trajectories throughout the duration of sonification.
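One way to generate such a basis set is sketched below, using the Traunmüller approximation to convert equally spaced Bark values back to Hz; the number of frequencies K and the Bark range are assumed user choices (at most 24 if one frequency per critical band is desired).

```python
import numpy as np

# Sketch of Bark scale-based frequency selection using the Traunmueller
# approximation; K and the Bark range are assumed user parameters.
def bark_to_hz(z):
    return 1960.0 * (z + 0.53) / (26.28 - z)

K = 8
barks = np.linspace(1.0, 23.0, K)      # equally spaced on the Bark scale
freqs = bark_to_hz(barks)              # basis set of baseline frequencies (Hz)
print(np.round(freqs, 1))
```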
B. Musical Chord-Based Sonification
This approach is similar to the Bark scale-based approach, with the relative (or baseline) frequencies being mapped to a predetermined musical scale, e.g., the equally-tempered Western scale. Depending on the user requirements, we may create a chord by choosing notes in a single octave as the basis set. We can also select notes over multiple octaves, associating a different timbre with each, giving the impression of multiple different instruments being played. For example, we can map the frequencies to every other note in a diatonic scale such that they form a triad (e.g., the notes C, E and G form the C major triad in the equally-tempered scale around A4=440 Hz). Similarly, four or more notes from the same octave may be selected to create a generic chord. The method also allows for the creation of arpeggios (or a broken chord being played multiple times in succession) by suitably defining periodic repetitions of the set of chords. Overlapping of the frequency trajectories corresponding to different sonification variables may or may not occur, depending on (a) the pairwise distances between the frequency trajectories, and (b) the maximum extent of frequency perturbation caused by the sonification module. Additionally, since humans are more adept at recognizing time-varying audio signatures compared to static tones, a small, slowly varying sinusoidal variation may be added on top of the original frequency trajectories. The amplitudes and frequencies of these sinusoidal variations may be either (a) kept constant, or made to vary based on (b) the convergence properties of the optimization problem, or on (c) the instantaneous statistical properties of the sonification variables.
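A minimal sketch of the triad construction mentioned above is shown below, assuming the equally-tempered scale referenced to A4 = 440 Hz.

```python
# Sketch of chord-based frequency selection on the equally-tempered scale.
A4 = 440.0

def note_hz(semitones_from_a4):
    # each semitone is a factor of 2**(1/12) on the equally-tempered scale
    return A4 * 2.0 ** (semitones_from_a4 / 12.0)

# C4, E4, and G4 lie 9, 5, and 2 semitones below A4: the C major triad
c_major_triad = [note_hz(-9), note_hz(-5), note_hz(-2)]
print([round(f, 2) for f in c_major_triad])   # ~[261.63, 329.63, 392.0]
```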
C. Sonification Using an Existing Musical Piece
This method of sonification involves extracting a certain number of dominant frequency trajectories (depending on the desired size K of the basis set) from an existing musical piece. Additionally, the following steps need to be taken for extracting the dominant frequency trajectories from the chosen musical piece.
Typically, a musical piece may be much longer in duration than the simulation duration, and has a much higher sampling rate (e.g., 44.1 kHz). Thus we need to extract a segment of the entire composition and compress its time scale to ensure a proper mapping. Next, we analyze the spectrogram of the time-scaled musical sample and extract the K dominant frequency components in each time window of the spectrogram, based on the power content of each frequency component in that window. Finally, we carry out upsampling and interpolation to form continuous frequency trajectories that represent the K most dominant components of the musical composition.
The sonification variables are then mapped to these frequency trajectories. During the sonification process, these trajectories undergo perturbations from their original time signature in the transient phase, depending on the dataset complexity and the optimization problem being solved. The baseline frequency perturbations may either remain unmodified, or can be made a function of the convergence properties only. They might also be made to depend on certain statistical properties of the dataset or the optimization variables. Finally, for better interpretability, the output of the sonification module can be treated as a “noise” signal and superimposed on the original musical composition. The degree of deviation of this super-imposed signal from the original composition thus encodes the complexity of the dataset and the optimization process.
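The extraction pipeline described above (segment, spectrogram, per-window dominant bins, interpolation) could be sketched as follows, assuming SciPy is available; the audio array is a random placeholder standing in for a real musical segment, and the window length is an assumed choice.

```python
import numpy as np
from scipy import signal

# Sketch of extracting K dominant frequency trajectories from a musical
# segment. The audio array is a placeholder for a real recording.
fs = 44100
audio = np.random.randn(2 * fs)               # placeholder 2 s "music" segment
K = 3

f, t_spec, Sxx = signal.spectrogram(audio, fs=fs, nperseg=2048)
idx = np.argsort(Sxx, axis=0)[-K:, :]         # K strongest bins per window
trajectories = f[idx]                         # shape (K, n_windows), in Hz

# interpolate each trajectory onto a dense simulation time base
t_sim = np.linspace(t_spec[0], t_spec[-1], 1000)
dense = np.stack([np.interp(t_sim, t_spec, traj) for traj in trajectories])
print(dense.shape)                            # (K, 1000)
```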
Experiments on Synthetic Datasets
A. Sonification of a Clustering Problem
As a case study, we will consider a data clustering problem, where the goal is to sonify the data by mapping each data cluster to a particular tone or frequency trajectory, with the amplitude of each trajectory encoding the cluster density. Since the focus of the paper is on advocating a new sonification framework and not on improving the efficiency of the particular clustering algorithm chosen, for the sake of simplicity we will consider a similarity-based probabilistic clustering approach. This involves solving a non-negative matrix factorization problem that minimizes the distance between a similarity matrix, computed pairwise between the data points, and the actual likelihoods of the different data points being clustered together in space. Considering a $D$-dimensional dataset $\mathbf{X}\in\mathbb{R}^{N\times D}$ and a similarity matrix $\mathbf{W}\in\mathbb{R}^{N\times N}$ encoding some pairwise distance between the data points, the following optimization problem assigns each of the data points to each of $K$ possible clusters:

$$\min_{\{p_{ik}\},\beta}\;\sum_{i=1}^{N}\sum_{j=1}^{N}\left(w_{ij}-\beta\sum_{k=1}^{K}p_{ik}p_{jk}\right)^{2} \quad \text{s.t.}\;\;\sum_{k=1}^{K}p_{ik}=1,\;\;p_{ik}\ge 0$$
Here, $p_{ik}$ denotes the probability of the $i$th data point belonging to the $k$th cluster, and β denotes a scaling factor such that $\beta\sum_k p_{ik}p_{jk}$ represents the true likelihood of the $i$th and $j$th data points being clustered together. In this approach, each $w_{ij}$ is assumed to be normally distributed about its corresponding true likelihood with mean μ and variance σ². Note that the similarity matrix can be chosen to encode any pairwise distance metric between the data points; e.g., W can be the RBF kernel computed pairwise between the data points by mapping them to a high-dimensional space. The sonification approach is generic enough to be applied to other types of clustering algorithms like k-means, Gaussian mixture models, etc. Following the procedure outlined in Table 3 above, we can apply the mapping $p_{ik}=|\psi_{ik}|^2$, $\psi_{ik}\in\mathbb{C}$. Sonification of the clustering problem can then be achieved in the following manner.
We assign the same baseline frequency to all the sub-groups (individual data points in this case), i.e., ωi=ω∀i. Each cluster is assigned to a particular relative frequency trajectory according to the sonification strategy chosen, i.e., ξik=ξk ∀i. Depending on the sonification strategy, the unperturbed version of the relative frequencies ξk (t) may be chosen according to one of the strategies described below.
Bark scale based: Each of the K clusters is mapped to a distinct frequency on the Bark scale (with or without masking effects, depending on the frequency spacing). A slow sinusoidal variation may be added about each original frequency trajectory depending on the instantaneous cluster density, as well as the convergence characteristics of the optimization problem. The relative frequencies are thus modulated over time governed by the following equation:
$$\xi_k(t)\leftarrow\xi_k(t)\left[1+a_k\sin\left(2\pi b_k\,\Delta f_k(t)\,t\right)\right]s_k \qquad \text{Eqn. (40)}$$
where $a_k, b_k$ are real-valued scalar quantities, $\Delta f_k(t)$ is a deviation term that depends on the instantaneous cluster density and the convergence properties of the optimization, $c_k(0)=N/K$ is the initial cluster density (assuming the data points to be uniformly distributed among the clusters), and $c_k(t)=\sum_{i=1}^{N}|\psi_{ik}|^2$ is the instantaneous cluster density of the $k$th cluster.
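A sketch of the modulation rule in Eqn. (40) is given below; the deviation term Δf_k(t) is modeled here as the normalized drift of the cluster density from its initial value, and a_k, b_k, s_k are assumed scalar choices.

```python
import numpy as np

# Sketch of the relative-frequency modulation of Eqn. (40). The form of
# Delta_f_k(t) (normalized cluster-density drift) and the scalars a_k,
# b_k, s_k are assumptions for illustration.
def modulate(xi_k, t, c_k_t, N, K, a_k=0.05, b_k=1.0, s_k=1.0):
    c_k0 = N / K                              # initial cluster density c_k(0)
    delta_f = (c_k_t - c_k0) / c_k0           # assumed normalized drift
    return xi_k * (1.0 + a_k * np.sin(2 * np.pi * b_k * delta_f * t)) * s_k

print(round(modulate(500.0, t=0.2, c_k_t=30.0, N=100, K=4), 3))
```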
Musical chord based: Each of the K clusters is mapped to a distinct frequency on a chosen musical scale. In this case too, a slow sinusoidal variation may be added about each original frequency trajectory depending on the instantaneous cluster density, as well as the convergence characteristics of the optimization problem. The evolution equations for ξk(t) are similar to those for the Bark scale-based method.
Using an existing musical composition: Each of the K clusters is mapped to a distinct frequency trajectory extracted from the musical composition. The instantaneous statistical properties of each cluster, as well as the convergence properties of the optimization variables, may be used as a scaling factor in order to enhance the perturbations caused by the growth transform optimization framework. In this case, the evolution equations may have a sinusoidal variation as in the previous two approaches, or may be of the following form:
$$\xi_k(t)\leftarrow\xi_k(t)\left[1+a_k\,\Delta f_k(t)\right]s_k \qquad \text{Eqn. (41)}$$
This is because the original frequency trajectories vary over time, and hence an additional sinusoidal perturbation for recognizing the audio signature in the steady state might not be necessary.
The update equation for β follows from the same complex growth transform framework, and those of the complex waveforms ψik are obtained using the complex growth transform updates in Table 3. The time evolution of $\xi_k(t)\;\forall k$ occurs according to the update rules above, depending on the chosen sonification strategy.
The final output of the sonification module is obtained by superimposing the waveforms of all the oscillators, i.e., $\psi_{\mathrm{sum}}(t)=\sum_{i=1}^{N}\sum_{k=1}^{K}\psi_{ik}(t)$.
Next, we demonstrate the properties of the sonification technique when applied to the clustering problem for different sonification strategies, actual number of clusters in the dataset, dataset complexity (i.e., cluster alignments) and the number of clusters assigned a priori (i.e., the size of the basis set of frequencies K).
Derivation of the Sonification Algorithm
Theorem 1: Considering $\Psi\in D_{\mathbb{C}}=\left\{\Psi\in\mathbb{C}^{N\times K}:\sum_{k=1}^{K}|\psi_{ik}|^2=1\;\forall i=1,\ldots,N\right\}\subset\mathbb{C}^{N\times K}$, a time evolution of the form given below converges to limit cycles corresponding to the optimal point of a Lipschitz continuous objective function $H(|\Psi|^2)$:

$$\psi_{ik,n}\leftarrow\psi_{ik,n-1}\left[\cos(\theta_{i,n-1})+j\sin(\theta_{i,n-1})\,\sigma_{ik,n-1}(|\Psi_{n-1}|^2)\right] \qquad \text{Eqn. (43)}$$

where $\sigma_{ik}\to 1\;\forall i=1,\ldots,N,\;k=1,\ldots,K$ in steady state.
Proof: Consider a constrained optimization problem of the following form:

$$\min_{P} \; H(P) \qquad \text{Eqn. (44)}$$
$$\text{s.t.}\;\;P\in D=\left\{P\in\mathbb{R}^{N\times K}:\sum_{k=1}^{K}p_{ik}=1\;\forall i,\;p_{ik}\ge 0\right\} \qquad \text{Eqn. (45)}$$

As described above, we used the Baum-Eagon inequality to show that the optimal point of a generic Lipschitz continuous cost function $H(P)$, $P\in D\subset\mathbb{R}^{N\times K}$, corresponds to the steady-state solution of a multiplicative update-based discrete-time growth transform dynamical system model given by:

$$p_{ik,n}\leftarrow(1-\alpha_{i,n-1})\,p_{ik,n-1}+\alpha_{i,n-1}\,p_{ik,n-1}\,g_{ik,n-1}(P_{n-1}),\quad g_{ik,n-1}(P_{n-1})=\frac{-\frac{\partial H}{\partial p_{ik}}+\lambda}{\sum_{k=1}^{K}p_{ik,n-1}\left(-\frac{\partial H}{\partial p_{ik}}+\lambda\right)} \qquad \text{Eqn. (46)}$$

The constant $\lambda\in\mathbb{R}^{+}$ is chosen to ensure that $-\frac{\partial H}{\partial p_{ik}}+\lambda>0\;\forall i,k$.
The convergence also holds if the time-constants $\alpha_i$ are time-varying, since this would still ensure the invariance of the manifold $D$.
Taking $g_{ik,n-1}(P_{n-1})=\sigma_{ik}^{2}(P_{n-1})\;\forall i,k$ and $\alpha_{i,n-1}=\sin^2\theta_{i,n-1}$, we get:

$$p_{ik,n}\leftarrow\gamma\left[(1-\alpha_{i,n-1})p_{ik,n-1}+\alpha_{i,n-1}p_{ik,n-1}g_{ik,n-1}(P_{n-1})\right]=\gamma\left[\cos^2\theta_{i,n-1}\,p_{ik,n-1}+\sin^2\theta_{i,n-1}\,p_{ik,n-1}\,\sigma_{ik,n-1}^{2}(P_{n-1})\right] \qquad \text{Eqn. (47)}$$
Representing $p_{ik,n}=\psi_{ik,n}\psi_{ik,n}^{*}$, $\psi_{ik,n}\in\mathbb{C}$, the update equations become:

$$\psi_{ik,n}\psi_{ik,n}^{*}\leftarrow\psi_{ik,n-1}\left[\cos(\theta_{i,n-1})+j\sin(\theta_{i,n-1})\sigma_{ik,n-1}(P_{n-1})\right]\times\psi_{ik,n-1}^{*}\left[\cos(\theta_{i,n-1})+j\sin(\theta_{i,n-1})\sigma_{ik,n-1}(P_{n-1})\right]^{*} \qquad \text{Eqn. (48)}$$
Considering $H(P)=H(|\Psi|^2)$ to be analytic in $D_{\mathbb{C}}$, by Wirtinger's calculus the gradient with respect to the conjugate variable satisfies $\frac{\partial H}{\partial\psi_{ik}^{*}}=\frac{\partial H}{\partial p_{ik}}\,\psi_{ik}$.
The discrete-time update equation for $\psi_{ik,n}$ is thus given by:
$$\psi_{ik,n}\leftarrow\psi_{ik,n-1}\left[\cos(\theta_{i,n-1})+j\sin(\theta_{i,n-1})\,\sigma_{ik,n-1}(|\Psi_{n-1}|^2)\right] \qquad \text{Eqn. (51)}$$
Theorem 2: A continuous-time variant of the coupled oscillator model given by Equation (43) is given by Equation (52), derived below.
Proof: Note that Eqn. (43) can be rewritten as:
$$\psi_{ik,n}\leftarrow\psi_{ik,n-1}\,\tilde{\sigma}_{ik,n-1}\exp(j\phi_{ik,n-1}) \qquad \text{Eqn. (53)}$$

where $\tilde{\sigma}_{ik,n-1}=\sqrt{1+\sin^2(\theta_{i,n-1})\left(\sigma_{ik,n-1}^{2}-1\right)}$, $\sigma_{ik,n-1}$ has been used to represent $\sigma_{ik}(|\Psi_{n-1}|^2)$, and $\phi_{ik,n-1}=\arctan\{\tan(\theta_{i,n-1})\,\sigma_{ik,n-1}\}$.
Since $\psi_{ik,n}=\psi_{ik,n-1}\tilde{\sigma}_{ik,n-1}\exp(j\phi_{ik,n-1})$, the increment per time step can be written as:

$$\psi_{ik,n}-\psi_{ik,n-1}=\psi_{ik,n-1}\tilde{\sigma}_{ik,n-1}\left[\exp(j\phi_{ik,n-1})-1\right]+\psi_{ik,n-1}\left[\tilde{\sigma}_{ik,n-1}-1\right] \qquad \text{Eqn. (54)}$$
In the limiting case, this reduces to the continuous-time update equation of the complex variable $\psi_{ik}(t)$:

$$\frac{d\psi_{ik}}{dt}=j\,\omega'_{ik}\,\tilde{\sigma}_{ik}\,\psi_{ik}+\frac{\Delta\sigma_{ik}}{\Delta t}\,\psi_{ik} \qquad \text{Eqn. (55)}$$

where $\Delta\sigma_{ik,n}=\tilde{\sigma}_{ik,n-1}-1$ and $\phi_{ik,n-1}=\omega'_{ik,n-1}\Delta t$, where $\Delta t$ is the sampling interval.
In the steady state, since $H$ is Lipschitz continuous in $D_{\mathbb{C}}$, $\sigma_{ik}\to 1$, so that $\tilde{\sigma}_{ik}\to 1$ and $\Delta\sigma_{ik}\to 0$, and the dynamical system thus becomes:

$$\frac{d\psi_{ik}}{dt}=j\,\omega_i\,\psi_{ik}$$

where $\theta_i=\omega_i\Delta t$. This implies that the steady-state response of the complex variable $\psi_{ik}(t)$ corresponds to steady oscillations with a constant angular frequency of $\omega_i$.
Theorem 3: Different oscillation frequencies can be assigned to each element in the dynamical system represented by Equation (43) by adding an instantaneous relative phase term, as follows:
Proof: Considering $\varphi_{ik,n-1}$ to be the instantaneous relative phase of the $ik$th element with respect to an absolute reference at the $n$th time instance, we have:
$$\psi_{ik,n}\leftarrow\psi_{ik,n-1}\,\tilde{\sigma}_{ik,n-1}\exp(j\phi_{ik,n-1})\exp(j\varphi_{ik,n-1}) \qquad \text{Eqn. (58)}$$
This leads to the following system of dynamical equations:
$$\psi_{ik,n}-\psi_{ik,n-1}=\psi_{ik,n-1}\tilde{\sigma}_{ik,n-1}\left[\exp\left(j(\phi_{ik,n-1}+\varphi_{ik,n-1})\right)-1\right]+\psi_{ik,n-1}\left[\tilde{\sigma}_{ik,n-1}-1\right] \qquad \text{Eqn. (59)}$$
which in the limiting case leads to Equation (25), considering φik,n−1=ξik,n−1Δt.
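The discrete-time update of Eqn. (43) can be exercised on a toy problem, as sketched below for a linear cost $H(|\Psi|^2)=\sum_k q_k p_k$ minimized over a single probability simplex; the cost coefficients, θ, and the gradient shift λ are assumed values. At convergence the probability mass concentrates on the minimum-cost index and $\sigma_{ik}\to 1$, consistent with Theorem 1.

```python
import numpy as np

# Toy sketch of the complex growth transform update of Eqn. (43) for
# H(p) = sum_k q_k p_k on the simplex sum_k |psi_k|^2 = 1 (N = 1 row).
# q, theta, and the gradient shift are illustrative assumptions.
rng = np.random.default_rng(1)
q = np.array([0.9, 0.1, 0.5])                  # toy linear cost coefficients
psi = rng.normal(size=3) + 1j * rng.normal(size=3)
psi /= np.linalg.norm(psi)                     # enforce sum_k |psi_k|^2 = 1
theta = 0.1                                    # phase parameter

for _ in range(300):
    p = np.abs(psi) ** 2
    g = -q + (1.0 + q.max())                   # shifted gradient of -H, > 0
    sigma = np.sqrt(g / (p @ g))               # sigma_k -> 1 at convergence
    psi = psi * (np.cos(theta) + 1j * np.sin(theta) * sigma)

print(np.round(np.abs(psi) ** 2, 3))           # mass moves to argmin_k q_k
```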
Mapping a Generic Optimization Problem to the Equivalent Network Model
Let us consider an optimization problem of the following generic form:

$$\min_{P} \; H(P) \quad \text{s.t.}\;\;|p_{ik}|\le\gamma,\;\;\gamma>0$$
Consider $p_{ik}=p_{ik}^{+}-p_{ik}^{-}\;\forall i$, where both $p_{ik}^{+},p_{ik}^{-}\ge 0$. Since by the triangle inequality $|p_{ik}|\le|p_{ik}^{+}|+|p_{ik}^{-}|$, enforcing $p_{ik}^{+}+p_{ik}^{-}=\gamma\;\forall i$ automatically ensures $|p_{ik}|\le\gamma\;\forall i$. Thus we have:
$$\min_{P} \; H(P) \quad \text{s.t.}\;\;p_{ik}^{+}+p_{ik}^{-}=\gamma,\;\;p_{ik}^{+},p_{ik}^{-}\ge 0$$
Finally, we replace $p_{ik}^{+}\leftarrow p_{ik}^{+}/\gamma$ and $p_{ik}^{-}\leftarrow p_{ik}^{-}/\gamma$ to arrive at the following equivalent optimization problem over a probabilistic domain:

$$\min_{P} \; H(\gamma P) \quad \text{s.t.}\;\;p_{ik}^{+}+p_{ik}^{-}=1,\;\;p_{ik}^{+},p_{ik}^{-}\ge 0$$
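The split-variable mapping above can be checked numerically, as in the short sketch below (γ and p are arbitrary illustrative values).

```python
# Sketch of the split p = p_plus - p_minus with p_plus + p_minus = gamma,
# which converts the box constraint |p| <= gamma into a simplex-style one.
gamma = 2.0
p = -1.2                                  # any value with |p| <= gamma

p_plus = (gamma + p) / 2.0                # both components are nonnegative
p_minus = (gamma - p) / 2.0
assert abs((p_plus - p_minus) - p) < 1e-12
assert abs((p_plus + p_minus) - gamma) < 1e-12
print(p_plus / gamma, p_minus / gamma)    # normalized: sums to 1
```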
Computing Systems and Devices
In other aspects, the computing device 302 is configured to perform a plurality of tasks associated with the disclosed method.
In one aspect, database 410 includes various data 412. Non-limiting examples of suitable data 412 include any values of parameters defining the disclosed method, such as any of the parameters from the equations described above.
Computing device 402 also includes a number of components which perform specific tasks. In the example aspect, computing device 402 includes data storage device 430, computing component 440, and communication component 460. Data storage device 430 is configured to store data received or generated by computing device 402, such as any of the data stored in database 410 or any outputs of processes implemented by any component of computing device 402. Computing component 440 is configured to perform the tasks associated with the method described herein in various aspects.
Communication component 460 is configured to enable communications between computing device 402 and other devices (e.g. user computing device 330 shown in
Computing device 502 may also include at least one media output component 515 for presenting information to a user 501. Media output component 515 may be any component capable of conveying information to user 501. In some aspects, media output component 515 may include an output adapter, such as a video adapter and/or an audio adapter. An output adapter may be operatively coupled to processor 505 and operatively coupleable to an output device such as a display device (e.g., a liquid crystal display (LCD), organic light emitting diode (OLED) display, cathode ray tube (CRT), or “electronic ink” display) or an audio output device (e.g., a speaker or headphones). In some aspects, media output component 515 may be configured to present an interactive user interface (e.g., a web browser or client application) to user 501.
In some aspects, computing device 502 may include an input device 520 for receiving input from user 501. Input device 520 may include, for example, a keyboard, a pointing device, a mouse, a stylus, a touch sensitive panel (e.g., a touch pad or a touch screen), a camera, a gyroscope, an accelerometer, a position detector, and/or an audio input device. A single component such as a touch screen may function as both an output device of media output component 515 and input device 520.
Computing device 502 may also include a communication interface 525, which may be communicatively coupleable to a remote device. Communication interface 525 may include, for example, a wired or wireless network adapter or a wireless data transceiver for use with a mobile phone network (e.g., Global System for Mobile communications (GSM), 3G, 4G or Bluetooth) or other mobile data network (e.g., Worldwide Interoperability for Microwave Access (WIMAX)).
Stored in memory area 510 are, for example, computer-readable instructions for providing a user interface to user 501 via media output component 515 and, optionally, receiving and processing input from input device 520. A user interface may include, among other possibilities, a web browser and client application. Web browsers enable users 501 to display and interact with media and other information typically embedded on a web page or a website from a web server. A client application allows users 501 to interact with a server application associated with, for example, a vendor or business.
Processor 605 may be operatively coupled to a communication interface 615 such that server system 602 may be capable of communicating with a remote device such as user computing device 330 (shown in
Processor 605 may also be operatively coupled to a storage device 625. Storage device 625 may be any computer-operated hardware suitable for storing and/or retrieving data. In some aspects, storage device 625 may be integrated in server system 602. For example, server system 602 may include one or more hard disk drives as storage device 625. In other aspects, storage device 625 may be external to server system 602 and may be accessed by a plurality of server systems 602. For example, storage device 625 may include multiple storage units such as hard disks or solid state disks in a redundant array of inexpensive disks (RAID) configuration. Storage device 625 may include a storage area network (SAN) and/or a network attached storage (NAS) system.
In some aspects, processor 605 may be operatively coupled to storage device 625 via a storage interface 620. Storage interface 620 may be any component capable of providing processor 605 with access to storage device 625. Storage interface 620 may include, for example, an Advanced Technology Attachment (ATA) adapter, a Serial ATA (SATA) adapter, a Small Computer System Interface (SCSI) adapter, a RAID controller, a SAN adapter, a network adapter, and/or any component providing processor 605 with access to storage device 625.
Memory areas 510 (shown in
The computer systems and computer-implemented methods discussed herein may include additional, less, or alternate actions and/or functionalities, including those discussed elsewhere herein. The computer systems may include or be implemented via computer-executable instructions stored on non-transitory computer-readable media. The methods may be implemented via one or more local or remote processors, transceivers, servers, and/or sensors (such as processors, transceivers, servers, and/or sensors mounted on vehicle or mobile devices, or associated with smart infrastructure or remote servers), and/or via computer executable instructions stored on non-transitory computer-readable media or medium.
In some aspects, a computing device is configured to implement machine learning, such that the computing device “learns” to analyze, organize, and/or process data without being explicitly programmed. Machine learning may be implemented through machine learning (ML) methods and algorithms. In one aspect, a machine learning (ML) module is configured to implement ML methods and algorithms. In some aspects, ML methods and algorithms are applied to data inputs and generate machine learning (ML) outputs. Data inputs may include but are not limited to: images or frames of a video, object characteristics, and object categorizations. Data inputs may further include: sensor data, image data, video data, telematics data, authentication data, authorization data, security data, mobile device data, geolocation information, transaction data, personal identification data, financial data, usage data, weather pattern data, “big data” sets, and/or user preference data. ML outputs may include but are not limited to: a tracked shape output, categorization of an object, categorization of a type of motion, a diagnosis based on motion of an object, motion analysis of an object, and trained model parameters. ML outputs may further include: speech recognition, image or video recognition, medical diagnoses, statistical or financial models, autonomous vehicle decision-making models, robotics behavior modeling, fraud detection analysis, user recommendations and personalization, game AI, skill acquisition, targeted marketing, big data visualization, weather forecasting, and/or information extracted about a computer device, a user, a home, a vehicle, or a party of a transaction. In some aspects, data inputs may include certain ML outputs.
In some aspects, at least one of a plurality of ML methods and algorithms may be applied, which may include but are not limited to: linear or logistic regression, instance-based algorithms, regularization algorithms, decision trees, Bayesian networks, cluster analysis, association rule learning, artificial neural networks, deep learning, dimensionality reduction, and support vector machines. In various aspects, the implemented ML methods and algorithms are directed toward at least one of a plurality of categorizations of machine learning, such as supervised learning, unsupervised learning, and reinforcement learning.
In one aspect, ML methods and algorithms are directed toward supervised learning, which involves identifying patterns in existing data to make predictions about subsequently received data. Specifically, ML methods and algorithms directed toward supervised learning are “trained” through training data, which includes example inputs and associated example outputs. Based on the training data, the ML methods and algorithms may generate a predictive function which maps inputs to outputs and utilize the predictive function to generate ML outputs based on data inputs. The example inputs and example outputs of the training data may include any of the data inputs or ML outputs described above.
In another aspect, ML methods and algorithms are directed toward unsupervised learning, which involves finding meaningful relationships in unorganized data. Unlike supervised learning, unsupervised learning does not involve user-initiated training based on example inputs with associated outputs. Rather, in unsupervised learning, unlabeled data, which may be any combination of data inputs and/or ML outputs as described above, is organized according to an algorithm-determined relationship.
In yet another aspect, ML methods and algorithms are directed toward reinforcement learning, which involves optimizing outputs based on feedback from a reward signal. Specifically ML methods and algorithms directed toward reinforcement learning may receive a user-defined reward signal definition, receive a data input, utilize a decision-making model to generate a ML output based on the data input, receive a reward signal based on the reward signal definition and the ML output, and alter the decision-making model so as to receive a stronger reward signal for subsequently generated ML outputs. The reward signal definition may be based on any of the data inputs or ML outputs described above. In one aspect, a ML module implements reinforcement learning in a user recommendation application. The ML module may utilize a decision-making model to generate a ranked list of options based on user information received from the user and may further receive selection data based on a user selection of one of the ranked options. A reward signal may be generated based on comparing the selection data to the ranking of the selected option. The ML module may update the decision-making model such that subsequently generated rankings more accurately predict a user selection.
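As a toy illustration of this recommendation loop (all names, values, and the reward definition are hypothetical): the module maintains a score per option, ranks by score, and is nudged toward observed selections, which for a top-1 match reward amounts to tracking selection frequencies.

```python
import random

scores = {"A": 0.5, "B": 0.5, "C": 0.5}        # decision-making model (illustrative)
user_pref = {"A": 1, "B": 3, "C": 2}           # hidden simulated user taste

for _ in range(1000):
    ranked = sorted(scores, key=scores.get, reverse=True)
    choice = random.choices(list(user_pref), weights=user_pref.values())[0]
    reward = 1.0 if ranked[0] == choice else 0.0   # selection matched top rank
    # For this reward, maximizing the expected reward reduces to ranking by
    # selection probability, so a moving-average update toward the observed
    # choice strengthens the reward signal over time.
    for opt in scores:
        target = 1.0 if opt == choice else 0.0
        scores[opt] += 0.02 * (target - scores[opt])

print(sorted(scores, key=scores.get, reverse=True))  # -> typically ['B', 'C', 'A']
```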
As will be appreciated based upon the foregoing specification, the above-described aspects of the disclosure may be implemented using computer programming or engineering techniques including computer software, firmware, hardware or any combination or subset thereof. Any such resulting program, having computer-readable code means, may be embodied or provided within one or more computer-readable media, thereby making a computer program product, i.e., an article of manufacture, according to the discussed aspects of the disclosure. The computer-readable media may be, for example, but is not limited to, a fixed (hard) drive, diskette, optical disk, magnetic tape, semiconductor memory such as read-only memory (ROM), and/or any transmitting/receiving medium, such as the Internet or other communication network or link. The article of manufacture containing the computer code may be made and/or used by executing the code directly from one medium, by copying the code from one medium to another medium, or by transmitting the code over a network.
These computer programs (also known as programs, software, software applications, “apps”, or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The “machine-readable medium” and “computer-readable medium,” however, do not include transitory signals. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
As used herein, a processor may include any programmable system including systems using micro-controllers, reduced instruction set circuits (RISC), application specific integrated circuits (ASICs), logic circuits, and any other circuit or processor capable of executing the functions described herein. The above examples are example only, and are thus not intended to limit in any way the definition and/or meaning of the term “processor.”
As used herein, the terms “software” and “firmware” are interchangeable, and include any computer program stored in memory for execution by a processor, including RAM memory, ROM memory, EPROM memory, EEPROM memory, and non-volatile RAM (NVRAM) memory. The above memory types are example only, and are thus not limiting as to the types of memory usable for storage of a computer program.
In one aspect, a computer program is provided, and the program is embodied on a computer-readable medium. In one aspect, the system is executed on a single computer system, without requiring a connection to a server computer. In a further aspect, the system is run in a Windows® environment (Windows is a registered trademark of Microsoft Corporation, Redmond, Wash.). In yet another aspect, the system is run on a mainframe environment and a UNIX® server environment (UNIX is a registered trademark of X/Open Company Limited located in Reading, Berkshire, United Kingdom). The application is flexible and designed to run in various different environments without compromising any major functionality.
In some aspects, the system includes multiple components distributed among a plurality of computing devices. One or more components may be in the form of computer-executable instructions embodied in a computer-readable medium. The systems and processes are not limited to the specific aspects described herein. In addition, components of each system and each process can be practiced independent and separate from other components and processes described herein. Each component and process can also be used in combination with other assembly packages and processes. The present aspects may enhance the functionality and functioning of computers and/or computer systems.
Definitions and methods described herein are provided to better define the present disclosure and to guide those of ordinary skill in the art in the practice of the present disclosure. Unless otherwise noted, terms are to be understood according to conventional usage by those of ordinary skill in the relevant art.
In some embodiments, numbers expressing quantities of ingredients, properties such as molecular weight, reaction conditions, and so forth, used to describe and claim certain embodiments of the present disclosure are to be understood as being modified in some instances by the term “about.” In some embodiments, the term “about” is used to indicate that a value includes the standard deviation of the mean for the device or method being employed to determine the value. In some embodiments, the numerical parameters set forth in the written description and attached claims are approximations that can vary depending upon the desired properties sought to be obtained by a particular embodiment. In some embodiments, the numerical parameters should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Notwithstanding that the numerical ranges and parameters setting forth the broad scope of some embodiments of the present disclosure are approximations, the numerical values set forth in the specific examples are reported as precisely as practicable. The numerical values presented in some embodiments of the present disclosure may contain certain errors necessarily resulting from the standard deviation found in their respective testing measurements. The recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range. Unless otherwise indicated herein, each individual value is incorporated into the specification as if it were individually recited herein. The recitation of discrete values is understood to include ranges between each value.
In some embodiments, the terms “a” and “an” and “the” and similar references used in the context of describing a particular embodiment (especially in the context of certain of the following claims) can be construed to cover both the singular and the plural, unless specifically noted otherwise. In some embodiments, the term “or” as used herein, including the claims, is used to mean “and/or” unless explicitly indicated to refer to alternatives only or the alternatives are mutually exclusive.
The terms “comprise,” “have” and “include” are open-ended linking verbs. Any forms or tenses of one or more of these verbs, such as “comprises,” “comprising,” “has,” “having,” “includes” and “including,” are also open-ended. For example, any method that “comprises,” “has” or “includes” one or more steps is not limited to possessing only those one or more steps and can also cover other unlisted steps. Similarly, any composition or device that “comprises,” “has” or “includes” one or more features is not limited to possessing only those one or more features and can cover other unlisted features.
All methods described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The use of any and all examples, or exemplary language (e.g. “such as”) provided with respect to certain embodiments herein is intended merely to better illuminate the present disclosure and does not pose a limitation on the scope of the present disclosure otherwise claimed. No language in the specification should be construed as indicating any non-claimed element essential to the practice of the present disclosure.
Groupings of alternative elements or embodiments of the present disclosure disclosed herein are not to be construed as limitations. Each group member can be referred to and claimed individually or in any combination with other members of the group or other elements found herein. One or more members of a group can be included in, or deleted from, a group for reasons of convenience or patentability. When any such inclusion or deletion occurs, the specification is herein deemed to contain the group as modified thus fulfilling the written description of all Markush groups used in the appended claims.
Any publications, patents, patent applications, and other references cited in this application are incorporated herein by reference in their entirety for all purposes to the same extent as if each individual publication, patent, patent application or other reference was specifically and individually indicated to be incorporated by reference in its entirety for all purposes. Citation of a reference herein shall not be construed as an admission that such is prior art to the present disclosure.
Having described the present disclosure in detail, it will be apparent that modifications, variations, and equivalent embodiments are possible without departing from the scope of the present disclosure defined in the appended claims. Furthermore, it should be appreciated that all examples in the present disclosure are provided as non-limiting examples.
The following examples illustrate various aspects of the disclosure.
Consider the parallel LC tank circuit shown in
Resonant condition of the circuit is achieved when the driving angular frequency equals $\omega_{0} = 1/\sqrt{LC}$, at which point the susceptances of the inductive and capacitive branches cancel.
This result implies that the apparent power $S_{N} = P_{N} + jQ_{N} = V_{S}I_{S}^{*} + V_{L}I_{L}^{*} + V_{C}I_{C}^{*}$, where the active power $P_{N} = 0$. Additionally, at resonance, the reactive power $Q_{N} = Q_{C} + Q_{L} = 0$. Here $Q_{C}$ and $Q_{L}$ are the reactive powers associated with the capacitance and inductance, respectively.
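The power bookkeeping in this example can be checked numerically with illustrative component values; the phasor conventions below (passive sign, $S = VI^{*}$) are assumptions consistent with Eqn. (1):

```python
import numpy as np

L, C, V = 1e-3, 1e-6, 1.0 + 0j          # 1 mH, 1 uF, 1 V tank voltage (illustrative)
w = 1.0 / np.sqrt(L * C)                # resonant angular frequency

I_C = 1j * w * C * V                    # capacitor branch current phasor
I_L = V / (1j * w * L)                  # inductor branch current phasor
I_S = I_C + I_L                         # source current, vanishes at resonance

S_N = V * np.conj(I_S) + V * np.conj(I_L) + V * np.conj(I_C)
print(S_N.real)                                           # active power P_N -> 0.0
print((V * np.conj(I_C)).imag + (V * np.conj(I_L)).imag)  # Q_C + Q_L -> 0.0
```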
Consider an optimization problem defined over a probabilistic domain, given by the following generic form:
Eqn. (65) may be mapped to the electrical network-based model described above by replacing $x_{i} = |V_{i}|^{2} + |I_{i}|^{2}$, which leads to the following problem in the $\{|V_{i}|^{2}, |I_{i}|^{2}\}$ domain:
Note that the method also works for optimization problems defined over non-probabilistic domains, of the following form:
This can be done by considering $x_{i} = x_{i}^{+} - x_{i}^{-}\ \forall i$, where both $x_{i}^{+}, x_{i}^{-} \geq 0$. Since, by the triangle inequality, $|x_{i}| \leq |x_{i}^{+}| + |x_{i}^{-}|$, enforcing $x_{i}^{+} + x_{i}^{-} = 1\ \forall i$ automatically ensures $|x_{i}| \leq 1\ \forall i$, resulting in the following expression:
The replacements $x_{i}^{+} = |V_{i}|^{2}$ and $x_{i}^{-} = |I_{i}|^{2}$ may then be performed to obtain the equivalent problem in the $\{|V_{i}|^{2}, |I_{i}|^{2}\}$ domain:
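As a concrete check of this substitution (values illustrative): for $x_{i} = 0.3$, the split gives $x_{i}^{+} = 0.65$ and $x_{i}^{-} = 0.35$, so that $x_{i}^{+} + x_{i}^{-} = 1$ and $x_{i}^{+} - x_{i}^{-} = 0.3$; the network realization then stores these as $|V_{i}|^{2} = 0.65$ and $|I_{i}|^{2} = 0.35$.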
For example, the variables {xi} can represent the Lagrangian multipliers in the primal space of a support vector machine network, or the weights and biases of a generic neural network.
Consider the optimization problem in Equation (65) again. We can use the Baum-Eagon inequality to converge to the optimal point of H in steady state, by using updates of the form:
Here, $H$ is assumed Lipschitz continuous on the domain $\mathcal{D} = \{x_{1}, \ldots, x_{N} : \sum_{i=1}^{N} x_{i} = 1,\ x_{i} \geq 0\ \forall i\} \subset \mathbb{R}_{+}^{N}$. The constant $\lambda \in \mathbb{R}_{+}$ is chosen such that
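Since the exact update expression did not survive here, the following is a hedged sketch of a Baum-Eagon-style growth-transform step over $\mathcal{D}$; the minimization sign convention and the condition $\lambda - \partial H/\partial x_{i} > 0$ are assumptions:

```python
import numpy as np

# Growth-transform step over the simplex D = {x : sum(x) = 1, x >= 0}.
# Assumed form: x_i <- x_i (lam - dH/dx_i) / sum_j x_j (lam - dH/dx_j),
# with lam large enough that every factor (lam - dH/dx_i) is positive.
def growth_transform_step(x, grad, lam):
    g = x * (lam - grad)
    return g / g.sum()                   # renormalize onto the simplex

# Illustrative run: minimize H(x) = ||x - b||^2 for a simplex-feasible target b.
b = np.array([0.7, 0.2, 0.1])
x = np.full(3, 1.0 / 3.0)
for _ in range(200):
    x = growth_transform_step(x, 2.0 * (x - b), lam=10.0)
print(x)                                 # -> approximately b
```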
The optimization problem given by Equation (10) may be solved by using the growth transforms discussed above. The outline of the proof is as follows: (1) starting with a generic magnitude-domain optimization problem without any phase regularizer, derive the form of the growth-transform dynamical system that converges to the optimal point asymptotically; (2) derive a complex-domain counterpart of the above, again without phase constraints; (3) derive the complex-domain dynamical system by incorporating a phase regularizer in the objective function.
Since the time evolutions of Vi and Ii are symmetric because of the conservation constraints, for the rest of the section we will consider only the update equations for the voltages and similar results would also apply to the updates for Ii.
1) Condition 1: Considering $\beta = 0$ in Equation (10) and $H(\{|V_{i}|, |I_{i}|\})$ to be Lipschitz continuous over the domain $\mathcal{D} = \{|V_{i}|, |I_{i}| : \sum_{i=1}^{N}(|V_{i}|^{2} + |I_{i}|^{2}) = 1\}$, we can use the growth transforms to arrive at the following discrete-time update equations in terms of the voltage and current magnitudes:
$|V_{i,n}|^{2} \leftarrow g_{V_{i},n-1}(\cdots)$
where $\Delta t$ is the time increment between successive updates, and $\lambda \in \mathbb{R}_{+}$ is chosen to ensure
Writing $g_{V_{i},n-1}$ in terms of the growth-transform ratio $\sigma_{V_{i},n-1}$, the update takes the form:
$|V_{i,n}|^{2} \leftarrow |V_{i,n-1}|^{2}\,\sigma_{V_{i},n-1}^{2}(\cdots)$
2) Condition 2: Considering $\beta = 0$ in Equation (10) and $V_{i}, I_{i} \in \mathcal{D}_{\mathbb{C}} = \{V_{i}, I_{i} \in \mathbb{C} : \sum_{i=1}^{N}(|V_{i}|^{2} + |I_{i}|^{2}) = 1\}$, a time evolution of the form given below converges to limit cycles corresponding to the optimal point of a Lipschitz-continuous objective function $H(\{|V_{i}|, |I_{i}|\})$:
$V_{i,n} \leftarrow V_{i,n-1}\,\sigma_{V_{i},n-1}(\cdots)$ Eqn. (75)
where $\sigma_{V_{i},n-1}(\cdots)$ denotes the corresponding growth-transform factor.
The expression in Eqn. (75) may be proven as follows. Since $|V_{i,n}|^{2} = V_{i,n}V_{i,n}^{*}$ and $|I_{i,n}|^{2} = I_{i,n}I_{i,n}^{*}$,
$V_{i,n}V_{i,n}^{*} \leftarrow \left[V_{i,n-1}\,\sigma_{V_{i},n-1}(\cdots)\right]\left[V_{i,n-1}\,\sigma_{V_{i},n-1}(\cdots)\right]^{*}$
$I_{i,n}I_{i,n}^{*} \leftarrow \left[I_{i,n-1}\,\sigma_{I_{i},n-1}(\cdots)\right]\left[I_{i,n-1}\,\sigma_{I_{i},n-1}(\cdots)\right]^{*}$
where $\sigma_{V_{i},n-1}$ and $\sigma_{I_{i},n-1}$ denote the respective growth-transform factors evaluated at time step $n-1$.
Considering $H(\{V_{i}V_{i}^{*}, I_{i}I_{i}^{*}\})$ to be analytic in $\mathcal{D}_{\mathbb{C}}$ and applying Wirtinger's calculus, the following expression is obtained:
The discrete-time update equations for the voltage phasors are thus given by:
$V_{i,n} \leftarrow V_{i,n-1}\,\sigma_{V_{i},n-1}(\cdots)$
Similar expressions can also be derived for the current phasors.
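Although the exact forms of $\sigma_{V_{i}}$ and $\sigma_{I_{i}}$ are truncated above, the following sketch shows how coupled voltage and current phasor updates of this type can preserve the conservation manifold $\sum_{i}(|V_{i}|^{2} + |I_{i}|^{2}) = 1$; the shared-normalizer growth-transform ratio is an assumed form:

```python
import numpy as np

# One joint update of the voltage and current phasors. gV and gI are the
# gradients of H with respect to |V_i|^2 and |I_i|^2; lam is assumed large
# enough that every numerator term is positive.
def phasor_step(V, I, gV, gI, theta, lam):
    num = np.concatenate([lam - gV, lam - gI])
    p = np.concatenate([np.abs(V) ** 2, np.abs(I) ** 2])
    sV, sI = np.split(np.sqrt(num / np.dot(p, num)), 2)
    rot = lambda s: np.cos(theta) + 1j * np.sin(theta) * s
    # |V|^2 and |I|^2 evolve by the cos^2/sin^2 interpolation, so the sum
    # sum_i (|V_i|^2 + |I_i|^2) = 1 is preserved exactly at every step.
    return V * rot(sV), I * rot(sI)
```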
Lemma: A continuous-time variant of the model given by Equation (75) is given by:
By way of proof, the difference equations for the voltage phasors are given by:
$V_{i,n} - V_{i,n-1} \leftarrow V_{i,n-1}\,\sigma_{V_{i},n-1}(\cdots)$
This difference equation may be rewritten as:
$V_{i,n} - V_{i,n-1} \leftarrow V_{i,n-1}\,\sigma_{V_{i},n-1}(\cdots)$
where $\Delta\sigma_{V_{i},n} = \cdots$
In the limiting case, when $\Delta t \to 0$, the above expression reduces to the following continuous-time differential equation for the complex variable $V_{i}(t)$:
In the steady state, since H is Lipschitz continuous in D,
the dynamical system above thus reduces to:
which implies that the steady-state response of the complex variable $V_{i}(t)$ corresponds to steady oscillations with a constant angular frequency of $\omega$.
The difference equations in terms of the nodal currents can be similarly written as:
$I_{i,n} - I_{i,n-1} \leftarrow I_{i,n-1}\,\sigma_{I_{i},n-1}(\cdots)$
where the corresponding quantities are defined analogously to the voltage updates.
The equivalent continuous domain differential equation is then given by:
3) Condition 3: Considering $\beta \neq 0$ in Equation (10), additional phase constraints can be incorporated in the dynamical system by using the update rules in Equations (15)-(18). In the steady state, for $|V_{i}|^{2}|I_{i}|^{2} \neq 0$, the system settles to $\phi_{i} = \pm\pi/2\ \forall i$. Additionally, for sufficiently small values of $\beta$ (or if $\beta$ is slowly increased during the optimization process), the system converges to the optimal point of the modified objective function $H(\{|V_{i}|, |I_{i}|\})$.
By way of proof, since $H(\{|V_{i}|, |I_{i}|, \phi_{i}\})$ is Lipschitz continuous in both $|V_{i}|^{2}$ and $|I_{i}|^{2}$, the same form of the update equations proved in Lemma 2 as described above can be applied. To arrive at the updates for the phase angles $\phi_{i}$, an approach similar to that shown above in Equation (37) may be used. Each $\phi_{i}$ may be split as $\phi_{i} = \phi_{i}^{+} - \phi_{i}^{-}$, where $\phi_{i}^{+}, \phi_{i}^{-} \geq 0$, which implies that $\phi_{i}^{+} + \phi_{i}^{-} = \pi$. The growth-transform dynamical system may then be applied to obtain:
Since
the above can be simplified to
This result implies that the voltage and current phasors corresponding to the ith node in the network may be phase shifted by an additional amount ϕi with respect to the absolute reference.
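A hedged sketch of the phase-angle update implied by the split $\phi_{i} = \phi_{i}^{+} - \phi_{i}^{-}$ with $\phi_{i}^{+} + \phi_{i}^{-} = \pi$ follows; the gradient argument dH_dphi and the requirement $\lambda > |dH/d\phi_{i}|$ are assumptions:

```python
import numpy as np

# Growth-transform update of the phases phi in (-pi, pi), via the split
# phi = a - b with a + b = pi and a, b >= 0. dH_dphi is the gradient of the
# phase-regularized objective with respect to phi.
def phase_step(phi, dH_dphi, lam):
    a, b = (np.pi + phi) / 2.0, (np.pi - phi) / 2.0   # a = phi_plus, b = phi_minus
    za = a * (lam - dH_dphi)          # dH/da = +dH/dphi (chain rule)
    zb = b * (lam + dH_dphi)          # dH/db = -dH/dphi
    a = np.pi * za / (za + zb)        # renormalize so that a + b = pi again
    return 2.0 * a - np.pi            # back to phi = a - b
```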
Since for optimality $\phi_{i} = \pm\pi/2$ for $|V_{i}|^{2}|I_{i}|^{2} \neq 0$ in the steady state, the net energy dissipation in the steady state is zero, i.e., $g_{\phi_{i}}(\cdots)$
which implies that the system reaches the optimal solution with zero active power in the post-learning stage.
This application claims priority from U.S. Provisional Application Ser. No. 62/889,489 filed on Aug. 20, 2019, which is incorporated herein by reference in its entirety.
This invention was made with government support under ECCS: 1550096 awarded by the National Science Foundation. The government has certain rights in the invention.