The present invention relates generally to Artificial Intelligence and specifically to reinforcement learning and generative adversarial learning for signal data signature classification. In particular, the present invention is directed to signal data signature event detection and signal data signature classification utilizing generative adversarial networks and reinforcement learning. More particularly, it relates to generalizable reward mechanisms for reinforcement learning such that the reward mechanism is derived from the underlying distribution of signal data signatures used to train a generative adversarial network.
The prior art is limited by software programs that require human input and human decision points, algorithms that fail to capture the underlying distribution of signal data signatures, and algorithms that are brittle and unable to perform well on datasets that were not present during training.
Signal data signature event detection is the task of recognizing signal data signature events and their respective temporal start and end times in a signal data signature recording. Signal data signature event detection aims at processing the continuous acoustic signal and converting it into symbolic descriptions of the corresponding signal data signature events as well as the timing of those events. Signal data signature event detection has many different commercial applications, such as context-based indexing and retrieval in multimedia databases, unobtrusive monitoring in health care, surveillance, and medical diagnostics.
The unmet need is to classify and tag signal data signatures. The unmet need would only be accomplished with a signal data signature detection system that consists of hardware devices (e.g. desktop, laptop, servers, tablet, mobile phones, etc.), storage devices (e.g. hard drive disk, floppy disk, compact disk (CD), secure digital card, solid state drive, cloud storage, etc.), delivery devices (paper, electronic display), a computer program or plurality of computer programs, and a processor or plurality of processors. A signal data signature detection system, when executed on a processor (e.g. CPU, GPU), would be able to identify a signal data signature such as a Covid-19 cough from other types of cough and/or signal data signatures and deliver the result to clinicians and/or end-users through a delivery device (paper, electronic display).
Various embodiments of the present disclosure can be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the present disclosure. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ one or more illustrative embodiments.
This specification describes a signal data signature detection system that includes a reinforcement learning system and a discriminator of a generative adversarial network as computer programs on one or more computers in one or more locations. The signal data signature detection system components include input data, computer hardware, computer software, and output data that can be viewed by a hardware display media or paper. A hardware display media may include a hardware display screen on a device (computer, tablet, mobile phone), projector, and other types of display media.
Generally, the system performs signal data signature event detection on a signal data signature recording using a reinforcement learning system such that an agent learns a policy to identify the onset and offset timings in a signal data signature recording that result in a signal data signature distribution that the discriminator of a generative adversarial network has been trained to detect. An environment that is the signal data signature recording, an agent, a state (e.g. onset and offset tags), an action (e.g. adjusting the onset or the offset tag), and a reward (positive for a net gain in minimizing cross entropy between the target distribution and the test distribution, negative for a net loss) are the components of a reinforcement learning system. The reinforcement learning system is coupled to a real-time oracle, the discriminator of a generative adversarial network, such that each action (e.g. adjustment of onset or offset tags) made by an agent to the signal data signature recording results in a positive reward if the signal data signature recording has a net gain for minimizing the cross entropy between the target and test distributions or a negative reward if the signal data signature recording has a net loss for minimizing the cross entropy between the target and test distributions.
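The reward logic described above can be sketched as a minimal illustration; the `cross_entropy` helper, the example distributions, and the unit reward magnitudes are assumptions of this sketch, not the system's implementation.

```python
import math

def cross_entropy(target, test, eps=1e-12):
    """Cross entropy between a target distribution and a test distribution."""
    return -sum(t * math.log(max(q, eps)) for t, q in zip(target, test))

def reward(target, before, after):
    """Positive reward for a net gain in minimizing cross entropy between
    the target and test distributions, negative reward for a net loss."""
    gain = cross_entropy(target, before) - cross_entropy(target, after)
    return 1.0 if gain > 0 else -1.0

# Example: the adjusted tags move the test distribution toward the target,
# so the agent receives a positive reward.
target = [0.7, 0.2, 0.1]
before = [0.4, 0.4, 0.2]
after = [0.6, 0.3, 0.1]
print(reward(target, before, after))  # 1.0
```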
A reinforcement learning agent learns a policy to optimize total future reward such that actions performed result in strong labeling and signal data signature type matching to the targeted signal data signature type distribution. A signal data signature type distribution has a characteristic temporal profile that the discriminator of a generative adversarial network (GAN) has been optimized to detect. Training GANs from the standpoint of game theory is similar to setting up a minimax two-player game whereby both networks try to beat each other and, in doing so, both become better and better. The goal of the generator is to fool the discriminator, so the generative neural network is trained to maximize the final classification error between true and fake data. The goal of the discriminator is to detect fake generated data, so the discriminative neural network is trained to minimize the final classification error.
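The opposing objectives can be illustrated with the standard binary cross-entropy formulation of the minimax game; the function names and probability values below are illustrative assumptions, not the system's code.

```python
import math

def bce(pred, label, eps=1e-12):
    """Binary cross-entropy for a single prediction in (0, 1)."""
    pred = min(max(pred, eps), 1 - eps)
    return -(label * math.log(pred) + (1 - label) * math.log(1 - pred))

def discriminator_loss(d_real, d_fake):
    """The discriminator minimizes classification error: real samples
    are pushed toward label 1, generated samples toward label 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def generator_loss(d_fake):
    """The generator tries to fool the discriminator, i.e. pushes the
    discriminator's output on generated data toward label 1."""
    return bce(d_fake, 1.0)

# A confident discriminator incurs low loss; an undecided one, high loss.
print(discriminator_loss(0.9, 0.1))  # low
print(discriminator_loss(0.5, 0.5))  # high
```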
The reinforcement learning agent may optimize its policy to tag the onset and offset timings such that they match the strong-label target distribution that the discriminator was trained on as part of a GAN system. For example, the reinforcement-learning agent may be provided with a weakly labeled signal data signature recording of a bronchitis cough to learn a policy to adjust the onset and offset labels of the recording such that it matches closely the strongly labeled distribution of bronchitis cough data that the discriminator of the GAN has been trained with. In contrast, a reinforcement-learning agent provided with a weakly or strongly labeled signal data signature recording of an asthma cough may not be able to converge on a policy because the oracle, or reward mechanism, being the discriminator of a generative adversarial network (GAN), has been trained with a bronchitis cough dataset. The reinforcement learning agent is bound by a maximum number of iterations such that if a match is not detected within the given maximum iteration threshold a negative detection flag will be returned.
In some embodiments, the signal data signature tagging (e.g. placement of onset and offset tags) by the reinforcement learning agent may include strongly labeled signal data signature. Herein, the terms “strongly labeled”, “strong labeled”, “strong label” and/or variations thereof refer to labeling and/or labels that include temporal information. Oftentimes signal data signature recordings are weakly labeled with a presence/absence label, which only states what types of events are present in each recording without any temporal information. A reinforcement-learning agent learns a policy to identify the onset and offset timings of a signal data signature such that the signature matches the targeted strong-labeled (known onset and offset timings) distribution of a known signal data signature.
In some embodiments, the signal data signature tagging (e.g. placement of onset and offset tags) by the reinforcement learning agent may include a minimal distance window. A minimal distance window may be used to constrain an agent to maintain a set minimal distance between the onset and offset tags. A minimal distance window for example can be set by the shortest distance between onset and offset tags observed in a distribution of signal data signature event types or the distribution of a single signal data signature event, as well as other distance metrics. The minimal distance window is advantageous for capturing a temporal profile of the targeted signal data signature event. In contrast, a reinforcement learning agent not constrained by a minimal distance window could learn a policy to minimize the distance between the onset and offset tags such that the signal data signature profile becomes ambiguous and loses specificity while at the same time producing a maximum reward for the agent.
In some embodiments, the signal data signature tagging (e.g. placement of onset and offset tags) by the reinforcement learning agent may include a maximum distance window. A maximum distance window may be used to constrain the search space to a maximal timing based on the signal data signature event type. A maximum distance window for example can be set by the longest distance between onset and offset tags observed in a distribution of signal data signature event types or the distribution of a single signal data signature event, as well as other distance metrics that are captured from the targeted signal data signature event. The maximum distance window is advantageous for reducing computational resources and maximizing performance of the signal data signature detection system.
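The minimal and maximal distance window constraints from the two embodiments above can be sketched together; `clamp_tags` and its parameter names are hypothetical and for illustration only.

```python
def clamp_tags(onset, offset, min_window, max_window, length):
    """Constrain onset/offset tags so their separation stays within
    [min_window, max_window] and both tags stay inside the recording.
    A sketch of the windowing constraints, not the system's code."""
    onset = max(0, min(onset, length - min_window))
    offset = max(onset + min_window, min(offset, length))
    offset = min(offset, onset + max_window)
    return onset, offset

# An agent action that collapses the tags is pushed back out to the
# minimal window, preserving the temporal profile of the event.
print(clamp_tags(onset=50, offset=51, min_window=10, max_window=100, length=500))
# (50, 60)
```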
In some embodiments, the signal data signature tagging (e.g. placement of onset and offset tags) by the reinforcement learning agent may include a generalizable reward mechanism, a real-time discriminator of a GAN. A real-time discriminator of a GAN, when provided with a signal data signature recording, data sources (e.g. training data), computer hardware including a memory and a processor(s), and a computer program or computer programs executed by a processor, outputs one of two values that specifies whether a particular signal data signature recording is a match or not a match with the targeted signal data signature distribution.
In some embodiments, the signal data signature tagging (e.g. placement of onset and offset tags) by the reinforcement learning agent may include leveraging a transfer learning approach that combines both a generative model and a discriminative model. The discriminator of the GAN, which was trained in an adversarial environment to recognize the underlying target distribution and identify fake data, defines the generative model. The functional approximator, which could represent a feed-forward neural network, convolutional neural network, support vector machine, logistic regression, conditional random fields, as well as many others, defines the discriminative model. The reinforcement-learning agent performs actions setting the timings of the onset and offset while trying to match the target distribution that the generative model was trained on. The generative model returns a positive reward for an improvement in the overlap between the suggested vs. targeted distribution. The reinforcement-learning agent, having accumulated experiences that are composed of signal data signature recordings (states), modifications to onset and offset (actions), and net gain or net loss in overlap between distributions (rewards), exploits these experiences with a function approximator. The function approximator, which is a discriminative model, then predicts which action the reinforcement learning agent should take to maximize the future reward. As a consequence of the reinforcement learning agent trying to optimize a policy, the discriminative model is learning from the generative model.
Advantages of a discriminative model learning from the generative model include the following: 1) a generative model requires less training data, 2) a generative model can deal with missing data, and 3) discriminative models may be more accurate when the conditional independence assumption is not satisfied.
The details of one or more embodiments of the subject matter of this specification are set forth in the accompanying drawings and the description below. Other features, aspects, and advantages of the subject matter will become apparent from the description, the drawings, and the claims.
In some embodiments, a signal data signature detection system may identify whether or not the sample signal data signature distribution matches the target signal data signature distribution as well as the location of the onset and offset timings. These embodiments are advantageous for identifying signal data signature events that are captured in the wild.
In order to achieve a software program that is able, either fully or partially, to detect signal data signature events, the program must match the sample distribution to the target distribution that was used to train a generative model. Another goal of the invention is to provide strong labeling of the onset and offset timings of the sample distribution. Another challenge is that such a program must be able to scale and process large datasets.
Embodiments of the invention are directed to a signal data signature detection system whereby a signal data signature recording is provided by an individual, individuals, or system into computer hardware, whereby data sources and the input target distribution are stored on a storage medium, and then the data sources and input target distribution are used as input to a computer program or computer programs which, when executed by a processor or processors, provide the strong labeled signal data signature recording and the signal data signature type, which are provided to an individual or individuals on a display screen or printed paper.
The data sources 108 that are retrieved by a hardware device 102 in one of other possible embodiments include, for example but not limited to: 1) a corpus of strong labeled signal data signature recordings, 2) a corpus of weakly labeled signal data signature recordings, 3) a continuous stream of signal data signature recordings, 4) a sample signal data signature recording, 5) video recordings, 6) text related to signal data signature recordings, and 7) features of signal data signature recordings.
The data sources 108 and the input signal data signature recording 101 are stored in memory or a memory unit 104 and passed to software 109, such as a computer program or computer programs that executes the instruction set on a processor 105. The software 109, being a computer program, executes a signal data signature detector system 110 and a signal data signature classifier system 111. The signal data signature classifier system 111 executes a reinforcement learning system 112 on a processor 105 such that an agent 113 performs actions 114 on an environment 115, which calls a reinforcement learning reward mechanism, a Generative Adversarial Network (GAN) Discriminator 116, which provides a reward 117 to the system. The reinforcement learning system 112 modifies the onset and offset timings of the signal data signature recording while ensuring that the edits result in a match to the target distribution. The output 118 is either a strongly labeled signal data signature recording and signal data signature type or a flag with no signal data signature detected, which can be viewed by a reader on a display screen 119 or printed on paper 120.
In one or more embodiments of the signal data signature detection system 100, the hardware device 102 includes the computer 103 connected to the network 107. The computer 103 is configured with one or more processors 105, a memory or memory unit 104, and one or more network controllers 106. It can be understood that the components of the computer 103 are configured and connected in such a way as to be operational so that an operating system and application programs may reside in a memory or memory unit 104 and may be executed by the processor or processors 105 and data may be transmitted or received via the network controller 106 according to instructions executed by the processor or processors 105. In one embodiment, a data source 108 may be connected directly to the computer 103 and accessible to the processor 105, for example in the case of a signal data signature sensor, imaging sensor, or the like. In one embodiment, a data source 108 may be connected to the reinforcement learning system 112 remotely via the network 107, for example in the case of media data obtained from the Internet. The configuration of the computer 103 may be such that the one or more processors 105, memory unit 104, or network controllers 106 may physically reside on multiple physical components within the computer 103 or may be integrated into fewer physical components within the computer 103, without departing from the scope of the invention. In one embodiment, a plurality of computers 103 may be configured to execute some or all of the steps listed herein, such that the cumulative steps executed by the plurality of computers are in accordance with the invention.
A physical interface is provided for embodiments described in this specification and includes computer hardware and display hardware (e.g. the display screen of a mobile device). Those skilled in the art will appreciate that components described herein include computer hardware and/or executable software which is stored on a computer-readable medium for execution on appropriate computing hardware. The terms “computer-readable medium” or “machine readable medium” should be taken to include a single medium or multiple media that store one or more sets of instructions. The terms “computer-readable medium” or “machine readable medium” shall also be taken to include, but not be limited to, solid-state memories, and optical and magnetic media. For example, “computer-readable medium” or “machine readable medium” may include Compact Disc Read-Only Memory (CD-ROMs), Read-Only Memory (ROMs), Random Access Memory (RAM), and/or Erasable Programmable Read-Only Memory (EPROM). The terms “computer-readable medium” or “machine readable medium” shall also be taken to include any non-transitory storage medium that is capable of storing, encoding or carrying a set of instructions for execution by a machine and that cause a machine to perform any one or more of the methodologies described herein. In other embodiments, some of these operations might be performed by specific hardware components that contain hardwired logic. Those operations might alternatively be performed by any combination of programmable computer components and fixed hardware circuit components.
In one or more embodiments of the signal data signature classifier system 111 software 109 includes the reinforcement learning system 112 which will be described in detail in the following section.
In one or more embodiments of the signal data signature detection system 100, the output 118 includes a strongly labeled signal data signature recording and identification of signal data signature type. An example would be a cough sample from a patient, which may include: 1) onset and offset timings of the signal data signature recording that capture the cough onset and offset, 2) a label of Covid-19 as the identified signal data signature type, or 3) a flag that tells the user that a cough was not detected. The output 118 of a strong labeled signal data signature recording and signal data signature type, or a message that a cough was not detected, will be delivered to an end user via a display medium such as but not limited to a display screen 119 (e.g. tablet, mobile phone, computer screen) and/or paper 120.
In one or more embodiments, a reinforcement learning system performs actions within a minimal distance window of a signal data signature recording such that actions are performed on the onset and offset timings, whereby a real-time oracle reward mechanism returns a reward that is dependent on the net concordance or net discordance between the sample distribution and the target distribution. In one or more embodiments, a reinforcement learning system with a real-time oracle reward mechanism enables actions such as, but not limited to, adding to or reducing the onset timing while keeping the offset timing constant, keeping the onset timing constant while adding to or reducing the offset timing, or adding to or reducing both the onset and offset timings.
In one or more embodiments, a reinforcement learning system 112 with real-time oracle, GAN discriminator, reward mechanism is defined by an input including signal data signature recording 101, hardware device 102, software 109, and output 118.
In one or more embodiments, the reinforcement learning system 112 uses a hardware device 102, which consists of a memory or memory unit 104 and a processor 105, such that software 109, a computer program or computer programs, is executed on a processor 105 and modifies the onset and offset tags of the signal data signature recording, resulting in an output 118 including a strongly labeled signal data signature recording. The output from the reinforcement learning system 112 is reconstructed to produce the output 118 including the strongly labeled signal data signature recording that matches a target distribution. A user is able to view the strongly labeled signal data signature recording and the output 118 including the signal data signature type output on a display screen 119 or printed paper 120.
In one or more embodiments, a pool of states 201 saves the state (e.g. signal data signature recording), action (e.g. adding to onset), and reward (e.g. positive). After exploration and generating a large pool of states 201, a function approximator 202 is used to predict an action that will result in the greatest total reward. The reinforcement learning system 112 is thus learning a policy to perform edits to a signal data signature recording resulting in an exact match with the target distribution. One or more embodiments specify termination once a maximum reward is reached and return the output 118 including a strongly labeled signal data signature recording and signal data signature type. Additional embodiments may have alternative termination criteria, such as termination upon executing a certain number of iterations, among others. Also, for a given input signal data signature recording 200 it may not be possible to produce concordance with the target distribution; in such instances a message will be returned that informs the user that the signal data signature was not detected.
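The pool of states can be sketched as a generic experience replay buffer; `StatePool` and its capacity are assumptions of this sketch, not the system's exact data structure.

```python
import random
from collections import deque

class StatePool:
    """Pool of (state, action, reward, next_state) experiences
    accumulated during exploration, later sampled to train the
    function approximator."""
    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest entries drop out

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        """Random minibatch, capped at the number of stored experiences."""
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))

pool = StatePool()
pool.add((10, 40), "onset+1", 1.0, (11, 40))
pool.add((11, 40), "offset-1", -1.0, (11, 39))
print(len(pool.sample(2)))  # 2
```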
The computer program stored in the memory unit 104 receives the signal data signature recording 101 and executes on a processor 105 such that the signal data signature recording is evaluated against the target distribution 304. The output of the oracle, the GAN discriminator 116, is either the net gain or net loss of concordance between the modified signal data signature recording and the target distribution 304 and the match score, which indicates whether a perfect match was found. A corresponding positive reward 117 is given for a net gain in concordance between the modified signal data signature file and the target distribution, and a negative reward 117 is given for a net loss in concordance between the modified signal data signature file and the target distribution.
The reinforcement-learning agent selects an action 401 to maximize the total future reward. The oracle, the GAN discriminator, decides if the actions made by the RL-agent improve the concordance with the target distribution when assigning the reward. Finally, the neural network function approximator 402 maximizes the total future reward given prior experience, the pool of states, actions, and rewards 201. The neural network function approximator 402 trains on the pool of states, actions, and rewards 201, produces an estimate by forward propagation 404, and adjusts its weights by back propagating the error 403 between the predicted and observed values using stochastic gradient descent.
In some embodiments, an oracle, the GAN discriminator, evaluates the modified signal data signature recording in real-time while the agent performs a set of actions on the onset and offset timings. In this embodiment the signal data signature recording, and thus its attributes (e.g. match score), represents the environment. An agent can interact with a signal data signature recording and receive a reward such that the environment and agent represent a Markov Decision Process (MDP). The MDP is a discrete time stochastic process such that at each time step the MDP represents some state s (e.g. signal data signature recording with position of onset or offset) and the agent may choose any action a that is available in state s. The process responds at the next time step by randomly moving to a new state s′ and passing the new state s′ residing in memory to a real-time oracle that, when executed on a processor, returns a corresponding reward Ra(s, s′) for s′.
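One MDP transition as described can be sketched as follows; the action names, the `toy_oracle` stand-in for the GAN discriminator, and the target width are illustrative assumptions of this sketch.

```python
def mdp_step(state, action, oracle):
    """One MDP transition: apply an action to the (onset, offset) state
    and query the real-time oracle for the reward Ra(s, s')."""
    onset, offset = state
    moves = {"onset+1": (onset + 1, offset), "onset-1": (onset - 1, offset),
             "offset+1": (onset, offset + 1), "offset-1": (onset, offset - 1)}
    next_state = moves[action]
    return next_state, oracle(state, next_state)

def toy_oracle(s, s2, target_width=30):
    """Toy reward: positive if the action moves the tagged window
    closer to a target width, standing in for the discriminator."""
    before = abs((s[1] - s[0]) - target_width)
    after = abs((s2[1] - s2[0]) - target_width)
    return 1.0 if after < before else -1.0

print(mdp_step((10, 20), "offset+1", toy_oracle))  # ((10, 21), 1.0)
```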
The benefits of this and other embodiments include the ability to evaluate and modify the onset and offset timings in real-time. This embodiment has application in many areas of signal data signature event detection in which a signal data signature recording needs to be identified and strongly labeled. These applications may include context-based indexing and retrieval in multimedia databases, unobtrusive monitoring in healthcare and surveillance, noise monitoring solutions, and healthcare diagnostics among others. These and other benefits of one or more aspects will become apparent from consideration of the ensuing description.
One of the embodiments provides an agent with onset and offset positions within a signal data signature recording, the attributes of which include a model and actions which can be taken by the agent. The agent is initialized with a minimum distance window, which defines the minimal distance between the onset and offset timings. The agent is also initialized with a maximum distance window between the onset and offset timings, which is used as an upper limit to constrain the search space. The agent is initialized with a starting index for the onset and offset tags within the signal data signature recording.
The agent is initialized with a set of hyperparameters, which includes epsilon ε (ε=1), epsilon decay ε_decay (ε_decay=0.999), gamma γ (γ=0.99), and a loss rate η (η=0.001). The hyperparameter epsilon ε is used to encourage the agent to explore random actions. The hyperparameter epsilon ε specifies an ε-greedy policy whereby both greedy actions with an estimated greatest action value and non-greedy actions with an unknown action value are sampled. When a selected random number r is less than epsilon ε, a random action a is selected. After each episode epsilon ε is decayed by a factor ε_decay. As time progresses epsilon ε becomes smaller and as a result fewer non-greedy actions are sampled.
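The ε-greedy policy and the ε decay described above can be sketched as follows, using the stated hyperparameter values; the function names are illustrative.

```python
import random

# Hyperparameters as stated: ε=1, ε_decay=0.999, γ=0.99, η=0.001.
EPSILON, EPSILON_DECAY, GAMMA, LOSS_RATE = 1.0, 0.999, 0.99, 0.001

def epsilon_greedy(q_values, epsilon, rng=random.random):
    """ε-greedy policy: with probability ε take a random (non-greedy)
    action, otherwise the greedy action with the greatest q-value."""
    if rng() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decay(epsilon, factor=EPSILON_DECAY):
    """Decay ε after each episode so fewer non-greedy actions are sampled."""
    return epsilon * factor

# With ε = 0 the policy is purely greedy and picks the highest q-value.
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0))  # 1
```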
The hyperparameter gamma γ is the discount factor per future reward. The objective of an agent is to find and exploit (control) an optimal action-value function that provides the greatest return of total reward. The assumption is that future rewards should be discounted by a factor γ per time step.
The final hyperparameter, the loss rate η, is used to reduce the learning rate over time for the stochastic gradient descent optimizer. The stochastic gradient descent optimizer is used to train the convolutional neural network through back propagation. The benefits of the loss rate are increased performance and reduced training time. Using a loss rate, large changes are made at the beginning of the training procedure, when larger learning rate values are used, and the learning rate is decreased such that smaller training updates are made to the weights later in the training procedure.
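One common time-based schedule consistent with this description is sketched below; the 1/(1 + η·t) formula is an assumption of this sketch, since the specification does not state the exact decay formula.

```python
def decayed_learning_rate(initial_lr, loss_rate, step):
    """Time-based decay: larger updates early in training, smaller
    updates later, controlled by the loss rate η."""
    return initial_lr / (1.0 + loss_rate * step)

lr0 = 0.01
for step in (0, 100, 1000):
    print(step, decayed_learning_rate(lr0, loss_rate=0.001, step=step))
# step 0 keeps the full rate; by step 1000 the rate has halved.
```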
The model is used as a function approximator to estimate the action-value function, the q-value. A convolutional neural network is the best mode of use. However, any other model may be substituted for the convolutional neural network (CNN) (e.g. recurrent neural network (RNN), logistic regression model, etc.).
Non-linear function approximators, such as neural networks with weights θ, make up a Q-network, which can be trained by minimizing a sequence of loss functions Li(θi) that change at each iteration i,

Li(θi) = Es,a∼ρ(·)[(yi − Q(s, a; θi))²]

where

yi = Es,a∼ρ(·); s′∼ξ[r + γ maxa′ Q(s′, a′; θi−1) | s, a]

is the target for iteration i, and ρ(s, a) is a probability distribution over states s (in this embodiment, signal data signature recordings with onset and offset indices) and actions a, such that it represents a signal data signature recording-action distribution. The parameters from the previous iteration θi−1 are held fixed when optimizing the loss function Li(θi). Unlike the fixed targets used in supervised learning, the targets of a neural network depend on the network weights.
Taking the derivative of the loss function with respect to the weights yields,

∇θi Li(θi) = Es,a∼ρ(·); s′∼ξ[(r + γ maxa′ Q(s′, a′; θi−1) − Q(s, a; θi)) ∇θi Q(s, a; θi)]
It is computationally prohibitive to compute the full expectation in the above gradient; instead it is best to optimize the loss function by stochastic gradient descent. The Q-learning algorithm is implemented with the weights being updated after an episode, and the expectations are replaced by single samples from the signal data signature recording-action distribution ρ(s, a) and the emulator ξ.
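The single-sample update can be illustrated with a tabular toy stand-in for the neural network; the step size α and the dictionary-based Q-table are assumptions of this sketch, not the system's implementation.

```python
# Single-sample Q-learning update replacing the full expectation above:
# y = r + γ·max_a' Q(s', a'), and Q(s, a) is nudged toward y, i.e. one
# stochastic gradient step on the squared error (y − Q)².
GAMMA, ALPHA = 0.99, 0.1

def q_update(Q, s, a, r, s2, actions):
    target = r + GAMMA * max(Q.get((s2, a2), 0.0) for a2 in actions)
    error = target - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * error
    return error

Q = {}
actions = ("onset+1", "onset-1", "offset+1", "offset-1")
err = q_update(Q, s=(10, 20), a="offset+1", r=1.0, s2=(10, 21), actions=actions)
print(round(Q[((10, 20), "offset+1")], 3))  # 0.1
```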
The algorithm is model-free, which means that it does not construct an estimate of the emulator ξ but rather solves the reinforcement-learning task directly using samples from the emulator ξ. It is also off-policy, meaning that it follows an ε-greedy policy, which ensures adequate exploration of the state space while learning about the greedy policy a = argmaxa Q(s, a; θ). Another embodiment would include on-policy learning.
A CNN may be configured with a convolutional layer whose filters equal the product of the number of features per signal data signature recording and a filter of 2, and a kernel size of 2. The filters specify the dimensionality of the output space. The kernel size specifies the length of the 1D convolutional window. One-dimensional max pooling with a pool size of 2 may be used for the max-pooling layer of the CNN. The model uses the piecewise Huber loss function and the adaptive learning rate optimizer RMSprop with the loss rate η hyperparameter.
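The output-shape arithmetic for a 1D convolution (kernel size 2) followed by 1D max pooling (pool size 2) can be worked through as follows; valid padding, stride 1, and the 64-frame input are assumptions of this illustration.

```python
def conv1d_out_len(n, kernel_size, stride=1):
    """Output length of a 1D 'valid' convolution over n frames."""
    return (n - kernel_size) // stride + 1

def maxpool1d_out_len(n, pool_size):
    """Output length of 1D max pooling with stride equal to pool_size."""
    return n // pool_size

# For a recording with 64 feature frames, kernel size 2, pool size 2:
n = conv1d_out_len(64, kernel_size=2)
print(n, maxpool1d_out_len(n, pool_size=2))  # 63 31
```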
After the model is initialized as an attribute of the agent, a set of actions is defined that could be taken for the boundaries of the signal data signature recording within a minimum distance window. The model is off-policy such that it randomly selects an action when the random number r ∈ [0,1] is less than the hyperparameter epsilon ε. It selects the optimal policy and returns the argmax of the q-value when the random number r ∈ [0,1] is greater than the hyperparameter epsilon ε. Because after each episode epsilon ε is decayed by a factor ε_decay, a module is defined to decay epsilon ε. Finally, a module is defined to take signal data signature features and fit a model to the signal data signature features using a target value.
One of the embodiments provides signal data signature features such as a filter bank system. The filter bank system consists of an analysis stage and a synthesis stage. The analysis stage is a filter bank decomposition whereby the signal is filtered into sub-bands along with a sampling rate decimation. In the second stage, the decimated sub-band signals are interpolated to reconstruct the original signal.
Approaches to generate signal data signature features include constant-Q filter banks, the Fast Fourier Transform (FFT), multiresolution spectrograms, nonuniform filter banks, wavelet filter banks, dyadic filter banks, and cosine-modulated filter banks, among others. The constant-Q filter bank consists of smoothing the output of a Fast Fourier Transform, whereas a multiresolution spectrogram combines FFTs at different lengths and advances the FFTs forward through time. The Goertzel algorithm may be used to construct nonuniform filter banks.
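A minimal FFT-based band-energy feature extractor in the spirit of the approaches above can be sketched as follows; it is a uniform-band stand-in, not a constant-Q or Goertzel implementation, and the band count is an assumption.

```python
import numpy as np

def fft_band_energies(signal, n_bands=8):
    """FFT-based band-energy features: take the magnitude spectrum
    and sum it into uniform sub-bands."""
    spectrum = np.abs(np.fft.rfft(signal))
    bands = np.array_split(spectrum, n_bands)
    return np.array([band.sum() for band in bands])

# A pure 50 Hz tone sampled at 1 kHz concentrates its energy in the
# lowest band of the 8-band decomposition.
sr = 1000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 50 * t)
features = fft_band_energies(tone)
print(int(np.argmax(features)))  # 0
```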
In some embodiments, an environment may include a current state, which is the index of onset and offset timings within the signal data signature recording that may or may not have been modified by the agent. The environment may also be provided with the match score for the current signal data signature recording and a reset state that restores the signal data signature recording to its original version before the agent performed actions. The environment is initialized with a minimum and maximum distance window.
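The environment described above — a current state of onset/offset indices, a match score, and a reset that restores the original recording — might be sketched as follows. The class layout and attribute names are assumptions for illustration, not the actual implementation:

```python
class Environment:
    """Holds the agent-modifiable state, the match score, and a reset.

    The attribute names and the representation of the recording as a
    list of boundary indices are illustrative assumptions."""

    def __init__(self, recording, min_window, max_window):
        self.original = list(recording)   # pristine copy kept for reset
        self.state = list(recording)      # onset/offset indices the agent edits
        self.match_score = 0.0
        self.min_window = min_window      # minimum distance window
        self.max_window = max_window      # maximum distance window

    def reset(self):
        """Restore the recording to its version before any agent actions."""
        self.state = list(self.original)
        self.match_score = 0.0
        return self.state
```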
In some embodiments, a reward module may return a negative reward r− if the modified signal data signature recording's match score shows a net loss from the previous state's match score, and/or return a positive reward r+ if the match score shows a net gain over the previous state's match score. An additional positive reward r+ may be returned to the agent if the modified signal data signature recording is a perfect match with the target distribution.
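A minimal sketch of such a reward module follows; the specific reward magnitudes (`r_pos`, `r_neg`, `r_bonus`) are assumed values, as the text only names the signs:

```python
def reward(prev_score, new_score, perfect=False,
           r_pos=1.0, r_neg=-1.0, r_bonus=5.0):
    """Positive reward for a net gain in match score, negative for a net
    loss, plus an extra bonus when the recording perfectly matches the
    target distribution. Magnitudes are illustrative assumptions."""
    r = r_pos if new_score > prev_score else r_neg
    if perfect:
        r += r_bonus
    return r
```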
At operation, a modified signal data signature recording is provided as input to a reinforcement-learning algorithm, and a match score is generated in real time from the signal data signature recording. The modified signal data signature recording and match score represent an environment. An agent is allowed to interact with the signal data signature recording and receive the reward. In the present embodiment, at operation the agent is incentivized to perform actions on the signal data signature recording that will match the strong-labeled target distribution.
First, a minimum size, batch size, number of episodes, and number of operations are initialized in the algorithm. The algorithm then iterates over each episode from the total number of episodes; for each episode e, the signal data signature recording s is reset by the environment reset module to the original signal data signature recording that was the input to the algorithm. The algorithm then iterates over the k total operations; for each operation the signal data signature recording s is passed to the agent module act. A number r is randomly selected between 0 and 1, such that if r is less than epsilon ε, the total number of actions n_total is defined as n_total = n_a × w_p, where n_a is the number of actions and w_p is the number of positions surrounding the onset and offset in the signal data signature recording s. An action a is randomly selected in the range 0 to n_total, and the action a is returned from the agent module act.
A filter bank may be generated for the modified signal data signature recording s2, creating a representation against which the modified signal data signature recording s2 is evaluated. If the filter bank for the modified signal data signature recording fools the discriminator of the GAN, a positive bonus reward n·r+ is returned; otherwise a match score is calculated for the filter bank of the modified signal data signature recording and compared against the previous match score. If there is a net gain in the match score, a positive reward r+ is returned; otherwise a negative reward r− is returned. If k, which iterates through the number of operations, is less than the total number of operations, a flag terminate is set to False; otherwise the flag terminate is set to True. For each iteration k, the signal data signature recording s before action a, the reward r, the modified signal data signature recording s2 after action a, and the flag terminate are appended to the tuple list pool. If k < number of operations, the previous steps are repeated; otherwise the agent module decay epsilon is called to decay ε by the epsilon decay factor ε_decay.
Epsilon ε is decayed by the epsilon decay factor ε_decay and ε is returned. If the length of the list of tuples pool is less than the minimum size, the previous steps are repeated. Otherwise a batch is randomized from the pool. Then, for each index in the batch, the target is set equal to the reward r for the batch at that index; the filter bank s2_vec is generated for the modified signal data signature recording s2, and the filter bank s_vec for the previous signal data signature recording s. Next, a model prediction X is made using the filter bank vector s_vec. If the terminate flag is set to False, a model prediction X2 is made using the filter bank vector s2_vec; using the model prediction X2, the q-value is computed with the Bellman equation, q-value = r + γ·max(X2), and the target is set to the q-value. If the terminate flag is set to True, the agent module learn is called with s_vec and the target, and the model is fit to the target.
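The batch sampling and Bellman target computation described above can be sketched as follows. The function names and the discount factor default `gamma` are illustrative assumptions:

```python
import random

def bellman_target(r, q_next, gamma=0.95, terminate=False):
    """Fitting target for the model: the raw reward on terminal steps,
    otherwise r + gamma * max(Q(s2)) per the Bellman equation.

    `gamma` is an assumed discount factor; the text does not give one."""
    if terminate:
        return r
    return r + gamma * max(q_next)

def sample_batch(pool, batch_size):
    """Randomize a batch of (s, a, r, s2, terminate) tuples from the pool."""
    return random.sample(pool, min(batch_size, len(pool)))
```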
The CNN is trained with weights θ to minimize a sequence of loss functions L_i(θ_i), using as the target either the reward or the q-value derived from the Bellman equation. A greedy action a is selected when the random number r is greater than epsilon ε. The filter bank s_vec is returned for the signal data signature recording s; the model then predicts X using the filter bank s_vec and sets the q-value to X. An action is then selected as the argmax of the q-value, and the action a is returned.
The benefits of the reinforcement learning system of the software 109, as compared to supervised learning, are that it does not require large paired training datasets (e.g., on the order of 10^9 to 10^10 examples). Reinforcement learning balances exploration against exploitation. Exploration is testing new things that have not been tried before to see whether they lead to an improvement in the total reward. Exploitation is trying the things that have worked best in the past. Supervised learning approaches are purely exploitative and only learn from retrospective paired datasets.
Supervised learning is retrospective machine learning that occurs after a collective set of known outcomes is determined. The collective set of known outcomes is referred to as a paired training dataset, such that a set of features is mapped to a known label. The cost of acquiring paired training datasets is substantial. For example, IBM's Canadian Hansard corpus, with a size of 10^9, cost an estimated $100 million.
In addition, supervised learning approaches are often brittle, such that performance degrades on datasets that were not present in the training data. The only solution is often reacquisition of paired datasets, which can be as costly as acquiring the original paired datasets.
One or more aspects include a real-time oracle, the GAN discriminator, which is trained as an adversarial network. The GAN consists of a generator and a discriminator. A generator deep neural network converts a random seed into a realistic signal data signature recording. Simultaneously, a discriminator is trained to distinguish between the generator's output and real signal data signature recordings, and its gradient feedback is used to improve the generator neural network. The GAN would be trained on recordings belonging to a particular signal data signature event, such as a Covid-19 cough. In a minimax two-player game the discriminator network and the generator network each try to beat the other, and in doing so both become better and better.
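The discriminator-as-oracle idea can be sketched with a toy logistic discriminator: a score above a threshold means the input "fools" the discriminator into judging it real, earning the agent a bonus. Everything here — the logistic form, the weights, the threshold, and the bonus value — is a hypothetical stand-in for the trained GAN discriminator:

```python
import math

def discriminator_oracle(features, weights, bias, bonus=5.0, threshold=0.5):
    """Toy logistic discriminator used as a real-time oracle.

    Returns (reward, score): a bonus reward when the score exceeds the
    threshold (the features pass as 'real'), zero otherwise. A trained
    GAN discriminator would replace this linear model in practice."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    score = 1.0 / (1.0 + math.exp(-z))   # probability the input is real
    return (bonus if score > threshold else 0.0), score
```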
In some embodiments, referring to
In some embodiments, the exemplary network 505 may provide network access, data transport and/or other services to any computing device coupled to it. In some embodiments, the exemplary network 505 may include and implement at least one specialized network architecture that may be based at least in part on one or more standards set by, for example, without limitation, Global System for Mobile communication (GSM) Association, the Internet Engineering Task Force (IETF), and the Worldwide Interoperability for Microwave Access (WiMAX) forum. In some embodiments, the exemplary network 505 may implement one or more of a GSM architecture, a General Packet Radio Service (GPRS) architecture, a Universal Mobile Telecommunications System (UMTS) architecture, and an evolution of UMTS referred to as Long Term Evolution (LTE). In some embodiments, the exemplary network 505 may include and implement, as an alternative or in conjunction with one or more of the above, a WiMAX architecture defined by the WiMAX forum. In some embodiments and, optionally, in combination of any embodiment described above or below, the exemplary network 505 may also include, for instance, at least one of a local area network (LAN), a wide area network (WAN), the Internet, a virtual LAN (VLAN), an enterprise LAN, a layer 3 virtual private network (VPN), an enterprise IP network, or any combination thereof. In some embodiments and, optionally, in combination of any embodiment described above or below, at least one computer network communication over the exemplary network 505 may be transmitted based at least in part on one or more communication modes such as but not limited to: NFC, RFID, Narrow Band Internet of Things (NBIOT), ZigBee, 3G, 4G, 5G, GSM, GPRS, WiFi, WiMax, CDMA, OFDM, OFDMA, LTE, satellite and any combination thereof.
In some embodiments, the exemplary network 505 may also include mass storage, such as network attached storage (NAS), a storage area network (SAN), a content delivery network (CDN) or other forms of computer or machine readable media.
In some embodiments, the exemplary server 506 or the exemplary server 507 may be a web server (or a series of servers) running a network operating system, examples of which may include but are not limited to Apache on Linux or Microsoft IIS (Internet Information Services). In some embodiments, the exemplary server 506 or the exemplary server 507 may be used for and/or provide cloud and/or network computing. Although not shown in
In some embodiments, one or more of the exemplary servers 506 and 507 may be specifically programmed to perform, in non-limiting example, as authentication servers, search servers, email servers, social networking services servers, Short Message Service (SMS) servers, Instant Messaging (IM) servers, Multimedia Messaging Service (MMS) servers, exchange servers, photo-sharing services servers, advertisement providing servers, financial/banking-related services servers, travel services servers, or any similarly suitable service-based servers for users of the member computing devices 501-504.
In some embodiments and, optionally, in combination of any embodiment described above or below, for example, one or more exemplary computing member devices 502-504, the exemplary server 506, and/or the exemplary server 507 may include a specifically programmed software module that may be configured to send, process, and receive information using a scripting language, a remote procedure call, an email, a tweet, Short Message Service (SMS), Multimedia Message Service (MMS), instant messaging (IM), an application programming interface, Simple Object Access Protocol (SOAP) methods, Common Object Request Broker Architecture (CORBA), HTTP (Hypertext Transfer Protocol), REST (Representational State Transfer), MLLP (Minimum Lower Layer Protocol), or any combination thereof.
In some embodiments, member computing devices 602a through 602n may also comprise a number of external or internal devices such as a mouse, a CD-ROM, DVD, a physical or virtual keyboard, a display, or other input or output devices. In some embodiments, examples of member computing devices 602a through 602n (e.g., clients) may be any type of processor-based platforms that are connected to a network 606 such as, without limitation, personal computers, digital assistants, personal digital assistants, smart phones, pagers, digital tablets, laptop computers, Internet appliances, and other processor-based devices. In some embodiments, member computing devices 602a through 602n may be specifically programmed with one or more application programs in accordance with one or more principles/methodologies detailed herein. In some embodiments, member computing devices 602a through 602n may operate on any operating system capable of supporting a browser or browser-enabled application, such as Microsoft™ Windows™ and/or Linux. In some embodiments, member computing devices 602a through 602n shown may include, for example, personal computers executing a browser application program such as Microsoft Corporation's Internet Explorer™, Apple Computer, Inc.'s Safari™, Mozilla Firefox, and/or Opera. In some embodiments, through the member computing devices 602a through 602n, user 612a, user 612b through user 612n, may communicate over the exemplary network 606 with each other and/or with other systems and/or devices coupled to the network 606. As shown in
In some embodiments, at least one database of exemplary databases 607 and 615 may be any type of database, including a database managed by a database management system (DBMS). In some embodiments, an exemplary DBMS-managed database may be specifically programmed as an engine that controls organization, storage, management, and/or retrieval of data in the respective database. In some embodiments, the exemplary DBMS-managed database may be specifically programmed to provide the ability to query, backup and replicate, enforce rules, provide security, compute, perform change and access logging, and/or automate optimization. In some embodiments, the exemplary DBMS-managed database may be chosen from Oracle database, IBM DB2, Adaptive Server Enterprise, FileMaker, Microsoft Access, Microsoft SQL Server, MySQL, PostgreSQL, and a NoSQL implementation. In some embodiments, the exemplary DBMS-managed database may be specifically programmed to define each respective schema of each database in the exemplary DBMS, according to a particular database model of the present disclosure which may include a hierarchical model, network model, relational model, object model, or some other suitable organization that may result in one or more applicable data structures that may include fields, records, files, and/or objects. In some embodiments, the exemplary DBMS-managed database may be specifically programmed to include metadata about the data that is stored.
In some embodiments, the exemplary inventive computer-based systems/platforms, the exemplary inventive computer-based devices, and/or the exemplary inventive computer-based components of the present disclosure may be specifically configured to operate in a cloud computing/architecture 625 such as, but not limited to: infrastructure as a service (IaaS) 810, platform as a service (PaaS) 808, and/or software as a service (SaaS) 806 using a web browser, mobile app, thin client, terminal emulator or other endpoint 804.
It is understood that at least one aspect/functionality of various embodiments described herein can be performed in real-time and/or dynamically. As used herein, the term “real-time” is directed to an event/action that can occur instantaneously or almost instantaneously in time when another event/action has occurred. For example, the “real-time processing,” “real-time computation,” and “real-time execution” all pertain to the performance of a computation during the actual time that the related physical process (e.g., a user interacting with an application on a mobile device) occurs, in order that results of the computation can be used in guiding the physical process.
As used herein, the term “dynamically” and term “automatically,” and their logical and/or linguistic relatives and/or derivatives, mean that certain events and/or actions can be triggered and/or occur without any human intervention. In some embodiments, events and/or actions in accordance with the present disclosure can be in real-time and/or based on a predetermined periodicity of at least one of: nanosecond, several nanoseconds, millisecond, several milliseconds, second, several seconds, minute, several minutes, hourly, several hours, daily, several days, weekly, monthly, etc.
As used herein, the term “runtime” corresponds to any behavior that is dynamically determined during an execution of a software application or at least a portion of software application.
In some embodiments, exemplary inventive, specially programmed computing systems and platforms with associated devices are configured to operate in the distributed network environment, communicating with one another over one or more suitable data communication networks (e.g., the Internet, satellite, etc.) and utilizing one or more suitable data communication protocols/modes such as, without limitation, IPX/SPX, X.25, AX.25, AppleTalk™, TCP/IP (e.g., HTTP), near-field wireless communication (NFC), RFID, Narrow Band Internet of Things (NBIOT), 3G, 4G, 5G, GSM, GPRS, WiFi, WiMax, CDMA, satellite, ZigBee, and other suitable communication modes.
In some embodiments, the NFC can represent a short-range wireless communications technology in which NFC-enabled devices are "swiped," "bumped," "tapped" or otherwise moved in close proximity to communicate. In some embodiments, the NFC could include a set of short-range wireless technologies, typically requiring a distance of 10 cm or less. In some embodiments, the NFC may operate at 13.56 MHz on ISO/IEC 18000-3 air interface and at rates ranging from 106 kbit/s to 424 kbit/s. In some embodiments, the NFC can involve an initiator and a target; the initiator actively generates an RF field that can power a passive target. In some embodiments, this can enable NFC targets to take very simple form factors such as tags, stickers, key fobs, or cards that do not require batteries. In some embodiments, the NFC's peer-to-peer communication can be conducted when a plurality of NFC-enabled devices (e.g., smartphones) are within close proximity of each other.
The material disclosed herein may be implemented in software or firmware or a combination of them or as instructions stored on a machine-readable medium, which may be read and executed by one or more processors. A machine-readable medium may include any medium and/or mechanism for storing or transmitting information in a form readable by a machine (e.g., a computing device). For example, a machine-readable medium may include read only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; electrical, optical, acoustical or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.), and others.
As used herein, the terms “computer engine” and “engine” identify at least one software component and/or a combination of at least one software component and at least one hardware component which are designed/programmed/configured to manage/control other software and/or hardware components (such as the libraries, software development kits (SDKs), objects, etc.).
Examples of hardware elements may include processors, microprocessors, circuits, circuit elements (e.g., transistors, resistors, capacitors, inductors, and so forth), integrated circuits, application specific integrated circuits (ASIC), programmable logic devices (PLD), digital signal processors (DSP), field programmable gate arrays (FPGA), logic gates, registers, semiconductor devices, chips, microchips, chip sets, and so forth. In some embodiments, the one or more processors may be implemented as Complex Instruction Set Computer (CISC) or Reduced Instruction Set Computer (RISC) processors; x86 instruction set compatible processors, multi-core, or any other microprocessor or central processing unit (CPU). In various implementations, the one or more processors may be dual-core processor(s), dual-core mobile processor(s), and so forth.
Computer-related systems, computer systems, and systems, as used herein, include any combination of hardware and software. Examples of software may include software components, programs, applications, operating system software, middleware, firmware, software modules, routines, subroutines, functions, methods, procedures, software interfaces, application program interfaces (API), instruction sets, computer code, computer code segments, words, values, symbols, or any combination thereof. Determining whether an embodiment is implemented using hardware elements and/or software elements may vary in accordance with any number of factors, such as desired computational rate, power levels, heat tolerances, processing cycle budget, input data rates, output data rates, memory resources, data bus speeds and other design or performance constraints.
One or more aspects of at least one embodiment may be implemented by representative instructions stored on a machine-readable medium which represents various logic within the processor, which when read by a machine causes the machine to fabricate logic to perform the techniques described herein. Such representations, known as “IP cores” may be stored on a tangible, machine readable medium and supplied to various customers or manufacturing facilities to load into the fabrication machines that make the logic or processor. Of note, various embodiments described herein may, of course, be implemented using any appropriate hardware and/or computing software languages (e.g., C++, Objective-C, Swift, Java, JavaScript, Python, Perl, QT, etc.).
In some embodiments, one or more of illustrative computer-based systems or platforms of the present disclosure may include or be incorporated, partially or entirely into at least one personal computer (PC), laptop computer, ultra-laptop computer, tablet, touch pad, portable computer, handheld computer, palmtop computer, personal digital assistant (PDA), cellular telephone, combination cellular telephone/PDA, television, smart device (e.g., smart phone, smart tablet or smart television), mobile internet device (MID), messaging device, data communication device, and so forth.
As used herein, term “server” should be understood to refer to a service point which provides processing, database, and communication facilities. By way of example, and not limitation, the term “server” can refer to a single, physical processor with associated communications and data storage and database facilities, or it can refer to a networked or clustered complex of processors and associated network and storage devices, as well as operating software and one or more database systems and application software that support the services provided by the server. Cloud servers are examples.
In some embodiments, as detailed herein, one or more of the computer-based systems of the present disclosure may obtain, manipulate, transfer, store, transform, generate, and/or output any digital object and/or data unit (e.g., from inside and/or outside of a particular application) that can be in any suitable form such as, without limitation, a file, a contact, a task, an email, a message, a map, an entire application (e.g., a calculator), data points, and other suitable data. In some embodiments, as detailed herein, one or more of the computer-based systems of the present disclosure may be implemented across one or more of various computer platforms such as, but not limited to: (1) FreeBSD, NetBSD, OpenBSD; (2) Linux; (3) Microsoft Windows™; (4) OpenVMS™; (5) OS X (MacOS™); (6) UNIX™; (7) Android; (8) iOS™; (9) Embedded Linux; (10) Tizen™; (11) WebOS™; (12) Adobe AIR™; (13) Binary Runtime Environment for Wireless (BREW™); (14) Cocoa™ (API); (15) Cocoa™ Touch; (16) Java™ Platforms; (17) JavaFX™; (18) QNX™; (19) Mono; (20) Google Blink; (21) Apple WebKit; (22) Mozilla Gecko™; (23) Mozilla XUL; (24) .NET Framework; (25) Silverlight™; (26) Open Web Platform; (27) Oracle Database; (28) Qt™; (29) SAP NetWeaver™; (30) Smartface™; (31) Vexi™; (32) Kubernetes™ and (33) Windows Runtime (WinRT™) or other suitable computer platforms or any combination thereof. In some embodiments, illustrative computer-based systems or platforms of the present disclosure may be configured to utilize hardwired circuitry that may be used in place of or in combination with software instructions to implement features consistent with principles of the disclosure. Thus, implementations consistent with principles of the disclosure are not limited to any specific combination of hardware circuitry and software. 
For example, various embodiments may be embodied in many different ways as a software component such as, without limitation, a stand-alone software package, a combination of software packages, or it may be a software package incorporated as a “tool” in a larger software product.
For example, exemplary software specifically programmed in accordance with one or more principles of the present disclosure may be downloadable from a network, for example, a website, as a stand-alone product or as an add-in package for installation in an existing software application. For example, exemplary software specifically programmed in accordance with one or more principles of the present disclosure may also be available as a client-server software application, or as a web-enabled software application. For example, exemplary software specifically programmed in accordance with one or more principles of the present disclosure may also be embodied as a software package installed on a hardware device.
In some embodiments, illustrative computer-based systems or platforms of the present disclosure may be configured to handle numerous concurrent users, the number of which may be, but is not limited to, at least 100 (e.g., but not limited to, 100-999), at least 1,000 (e.g., but not limited to, 1,000-9,999), at least 10,000 (e.g., but not limited to, 10,000-99,999), at least 100,000 (e.g., but not limited to, 100,000-999,999), at least 1,000,000 (e.g., but not limited to, 1,000,000-9,999,999), at least 10,000,000 (e.g., but not limited to, 10,000,000-99,999,999), at least 100,000,000 (e.g., but not limited to, 100,000,000-999,999,999), at least 1,000,000,000 (e.g., but not limited to, 1,000,000,000-999,999,999,999), and so on.
In some embodiments, illustrative computer-based systems or platforms of the present disclosure may be configured to output to distinct, specifically programmed graphical user interface implementations of the present disclosure (e.g., a desktop, a web app., etc.). In various implementations of the present disclosure, a final output may be displayed on a displaying screen which may be, without limitation, a screen of a computer, a screen of a mobile device, or the like. In various implementations, the display may be a holographic display. In various implementations, the display may be a transparent surface that may receive a visual projection. Such projections may convey various forms of information, images, or objects. For example, such projections may be a visual overlay for a mobile augmented reality (MAR) application.
In some embodiments, illustrative computer-based systems or platforms of the present disclosure may be configured to be utilized in various applications which may include, but are not limited to, gaming, mobile-device games, video chats, video conferences, live video streaming, video streaming and/or augmented reality applications, mobile-device messenger applications, and other similarly suitable computer-device applications.
As used herein, the term “mobile electronic device,” or the like, may refer to any portable electronic device that may or may not be enabled with location tracking functionality (e.g., MAC address, Internet Protocol (IP) address, or the like). For example, a mobile electronic device can include, but is not limited to, a mobile phone, Personal Digital Assistant (PDA), Blackberry™, Pager, Smartphone, or any other reasonable mobile electronic device.
As used herein, terms “proximity detection,” “locating,” “location data,” “location information,” and “location tracking” refer to any form of location tracking technology or locating method that can be used to provide a location of, for example, a particular computing device, system or platform of the present disclosure and any associated computing devices, based at least in part on one or more of the following techniques and devices, without limitation: accelerometer(s), gyroscope(s), Global Positioning Systems (GPS); GPS accessed using Bluetooth™; GPS accessed using any reasonable form of wireless and non-wireless communication; WiFi™ server location data; Bluetooth™ based location data; triangulation such as, but not limited to, network based triangulation, WiFi™ server information based triangulation, Bluetooth™ server information based triangulation; Cell Identification based triangulation, Enhanced Cell Identification based triangulation, Uplink-Time difference of arrival (U-TDOA) based triangulation, Time of arrival (TOA) based triangulation, Angle of arrival (AOA) based triangulation; techniques and systems using a geographic coordinate system such as, but not limited to, longitudinal and latitudinal based, geodesic height based, Cartesian coordinates based; Radio Frequency Identification such as, but not limited to, Long range RFID, Short range RFID; using any form of RFID tag such as, but not limited to active RFID tags, passive RFID tags, battery assisted passive RFID tags; or any other reasonable way to determine location. For ease, at times the above variations are not listed or are only partially listed; this is in no way meant to be a limitation.
As used herein, terms "cloud," "Internet cloud," "cloud computing," "cloud architecture," and similar terms correspond to at least one of the following: (1) a large number of computers connected through a real-time communication network (e.g., Internet); (2) providing the ability to run a program or application on many connected computers (e.g., physical machines, virtual machines (VMs)) at the same time; (3) network-based services, which appear to be provided by real server hardware, and are in fact served up by virtual hardware (e.g., virtual servers), simulated by software running on one or more real machines (e.g., allowing them to be moved around and scaled up (or down) on the fly without affecting the end user).
In some embodiments, the illustrative computer-based systems or platforms of the present disclosure may be configured to securely store and/or transmit data by utilizing one or more encryption techniques (e.g., private/public key pairs, Triple Data Encryption Standard (3DES), block cipher algorithms (e.g., IDEA, RC2, RC5, CAST and Skipjack), cryptographic hash algorithms (e.g., MD5, RIPEMD-160, RTR0, SHA-1, SHA-2, Tiger (TTH), WHIRLPOOL), and RNGs).
As used herein, the term "user" shall have a meaning of at least one user. In some embodiments, the terms "user," "subscriber," "consumer," or "customer" should be understood to refer to a user of an application or applications as described herein and/or a consumer of data supplied by a data provider. By way of example, and not limitation, the terms "user" or "subscriber" can refer to a person who receives data provided by the data or service provider over the Internet in a browser session, or can refer to an automated software application which receives the data and stores or processes the data.
The aforementioned examples are, of course, illustrative and not restrictive.
At least some aspects of the present disclosure will now be described with reference to the following numbered clauses.
Publications cited throughout this document are hereby incorporated by reference in their entirety. While one or more embodiments of the present disclosure have been described, it is understood that these embodiments are illustrative only, and not restrictive, and that many modifications may become apparent to those of ordinary skill in the art, including that various embodiments of the inventive methodologies, the illustrative systems and platforms, and the illustrative devices described herein can be utilized in any combination with each other. Further still, the various steps may be carried out in any desired order (and any desired steps may be added and/or any desired steps may be eliminated).
This application claims priority to U.S. Provisional Application 63/073,962, titled “GENERATIVE ADVERSERIAL NETWORKS AS AN ORACLE FOR REINFORCEMENT LEARNING TO CLASSIFY SOUND” and filed Sep. 3, 2020, which is hereby incorporated by reference in its entirety.
Number | Date | Country
---|---|---
63073962 | Sep 2020 | US