The present disclosure relates generally to machine learning and the training of deep learning systems, and more particularly, to an end-to-end adaptive learning training and inference method and tool chain that improves performance and shortens the development cycle time.
Machine learning systems may be employed across a wide range of disciplines and applications involving the use of computers to develop predictive, classification, or decision models, including voice recognition, image recognition, recommendation engines, financial market prediction, medical diagnosis, fraud detection, and so on. In its most basic form, a machine learning algorithm comprises a decision process that evaluates input data and makes some manner of prediction, classification, or decision based upon it, an error function that evaluates the results of the decision process, and a model optimization process. The machine learning system may be trained through supervised learning, unsupervised learning, semi-supervised learning, or reinforcement learning.
In general, training a machine learning system involves providing a set of training data to the algorithm. Depending on the level of supervision, varying extents of correlation between the input data and the desired output of the algorithm may be provided. Generally, generic tools are used to provide a high volume of data collected under different conditions. Current optimization techniques are highly manual, lack embedded adaptations to improve performance, and substantially extend the iterative training process. There is little to no standardization of the data collection, augmentation, or training processes that would improve performance while keeping the training duration short. When developing a customized system for recognizing wake words, commands, sound-based events, and context detection, the lack of such standardization is a significant impediment. Similar concerns apply to autonomous systems relying on input data other than audio/speech.
Accordingly, there is a need in the art for an end-to-end adaptive deep learning training and inference method and tool chain, to improve performance and shorten development cycle times.
The embodiments of the present disclosure are directed to an end-to-end adaptive deep learning training and inference system which improves performance and shortens the duration of development cycles. The standardized tools are understood to achieve a high degree of reproducibility in training, and to standardize the data capture and augmentation process as a final neural network model is developed.
According to one embodiment, there may be a deep learning training and inference system for a primary machine learning system. The system may include an automated data collection tool that is receptive to incoming input data from a sensor data source. The automated data collection tool may also embed one or more sensor data classifications associated with the incoming input data. The system may further include a data augmentation tool that is receptive to the input data from the automated data collection tool. The data augmentation tool may generate an augmented input data set resulting from one or more predefined operations applied to the input data. The system may further include an adaptive training tool that is receptive to the augmented input data set to improve performance. A new set of weight values may be generated for the primary machine learning system. The adaptive training tool may be in communication with one or more training tools for the primary machine learning system to provide the augmented input data set thereto. The system may include an inference tool that is in communication with the adaptive training tool to receive the new set of weight values for an inference model simulator emulating a native hardware environment of the primary machine learning system. The inference tool may selectively invoke one or more of the automated data collection tool, the data augmentation tool, and the adaptive training tool for iteratively improving the primary machine learning system.
Another embodiment of the present disclosure may be a method for training a machine learning system. The method may involve collecting incoming input data from one or more sensor data sources, then assigning one or more sensor data classifications to the input data. An augmented input data set may be generated from the input data based upon an application of an augmentation operation to the input data, and a new set of weight values may be generated for a primary machine learning system based upon the augmented input data set. The method may include transmitting the augmented input data set to one or more training tools for the primary machine learning system. There may also be a step of simulating a native hardware environment of the primary machine learning system with the new set of weight values. This method may also be performed with one or more programs of instructions executable by a computing device, with such programs being tangibly embodied in a non-transitory program storage medium.
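By way of illustration and not of limitation, the following Python sketch shows one possible arrangement of the four tools summarized above. All class, method, and parameter names are hypothetical and chosen for the sketch only; the disclosure does not prescribe any particular software interface.

```python
# Hypothetical interfaces for the tool chain; names are illustrative only.

class DataCollectionTool:
    def collect(self, sensor_source):
        # Read incoming input data and embed the associated sensor data
        # classifications alongside each sample.
        return [{"data": s, "classifications": dict(sensor_source.labels)}
                for s in sensor_source.read()]

class DataAugmentationTool:
    def augment(self, input_data, operations):
        # Apply each predefined operation to every sample, expanding the
        # original set into an augmented input data set.
        augmented = list(input_data)
        for op in operations:
            augmented.extend({"data": op(item["data"]),
                              "classifications": item["classifications"]}
                             for item in input_data)
        return augmented

class AdaptiveTrainingTool:
    def train(self, augmented_set, trainer):
        # Forward the augmented set to the primary system's training tools
        # and return the new set of weight values they produce.
        return trainer.fit(augmented_set)

class InferenceTool:
    def evaluate(self, weights, simulator, test_set):
        # Load the new weights into an inference model simulator emulating
        # the native hardware environment, then measure performance.
        simulator.load_weights(weights)
        return simulator.score(test_set)
```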
The present disclosure will be best understood by reference to the following detailed description when read in conjunction with the accompanying drawings.
These and other features and advantages of the various embodiments disclosed herein will be better understood with respect to the following description and drawings, in which like numbers refer to like parts throughout, and in which:
The detailed description set forth below in connection with the appended drawings is intended as a description of the several presently contemplated embodiments of a deep learning training and inference system and is not intended to represent the only form in which such embodiments may be developed or utilized. The description sets forth the functions and features in connection with the illustrated embodiments. It is to be understood, however, that the same or equivalent functions may be accomplished by different embodiments that are also intended to be encompassed within the scope of the present disclosure. It is further understood that relational terms such as first and second and the like are used solely to distinguish one entity from another without necessarily requiring or implying any actual such relationship or order between such entities.
With reference to the diagram of
With the neural network 12 being implemented with a computer system, according to some embodiments of the present disclosure, the deep learning training and inference system 10 may likewise be implemented with a computer system. The neural network 12 and the deep learning training and inference system 10 may execute on the same computer system, or on different computer systems that are interconnected with a network link. It will be appreciated that the embodiments of the present disclosure are not dependent on the specifics of such a computer system and general hardware environment, so additional details thereof will be omitted.
With reference to the block and flow diagram of
As referenced herein, the sensor data source is understood to be any data storage element that includes information from a sensor device, or generated from a simulation of a sensor device. Further, a sensor device may be any device that captures some physical phenomenon and converts the same to an electrical signal that is further processed. For example, the sensor device may be a microphone/acoustic transducer that captures sound waves and converts the same to analog electrical signals. In another example, the sensor device may be an imaging sensor that captures incoming photons of light from the surrounding environment, and converts those photons to electrical signals that are arranged as an image of the environment. Furthermore, the sensor device may be a motion sensor such as an accelerometer or a gyroscope that generates acceleration/motion/orientation data based upon physical forces applied to it. The embodiments of the present disclosure are not limited to any particular sensor data source, set of sensor data sources, number of sensor data sources, or sensor device type.
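For purposes of illustration only, a sensor data source might be represented as follows; the field names are assumptions made for this sketch, and the disclosure does not prescribe any particular data format.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class SensorDataSource:
    # Illustrative fields only; the disclosure does not prescribe a format.
    device_type: str           # e.g. "microphone", "image sensor", "accelerometer"
    sample_rate_hz: float      # capture rate of the underlying device
    samples: List[Any] = field(default_factory=list)
    simulated: bool = False    # True when generated by a sensor simulation
```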
In addition to collecting the input data from the sensor data sources, the data collection toolkit 16 is understood to embed one or more sensor data classifications/features that are associated with the incoming input data. With reference to
From the gender/age subclassifications, the input data 18 may be separately classified according to the size of the room in which the audio was captured. For example, there may be a first room size subclass 26a, a second room size subclass 26b, and any number of additional room size subclasses, including an indeterminate room size subclass 26n, within a fourth level classification 26. The room size classifications may be further divided into a first distance subclass 28a, a second distance subclass 28b, and any number of additional distance subclasses, including an indeterminate distance subclass 28n. The distance classification, that is, a fifth level classification 28, is understood to specify the distance separating the microphone from the speaker providing the speech for the input data 18.
The foregoing classes and subclasses are presented by way of example only and not of limitation, and the input data 18 may be classified according to any number of additional dimensions. There may be additional classification enumerations at any of the first level classification 20, the second level classification 22, the third level classification 24, the fourth level classification 26, and the fifth level classification 28. There may also be additional levels of classifications not illustrated in the diagram of
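By way of illustration only, such a hierarchy might be recorded as simple key/value metadata attached to each sample. The key names and values below are assumptions for this sketch, reading the second and third levels as the gender/age subclassifications mentioned above; they are not part of the disclosure.

```python
# Hypothetical five-level classification record for one speech sample,
# mirroring the hierarchy described above (reference numerals 20-28).
speech_sample_labels = {
    "category": "speech",      # first level classification (20)
    "gender": "female",        # second level classification (22)
    "age_group": "adult",      # third level classification (24)
    "room_size": "medium",     # fourth level classification (26)
    "mic_distance_m": 1.5,     # fifth level classification (28)
}
```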
Referring back to the diagram of
The example shown expands upon the speech classification 20c, and the augmented input data set 32 is generated from a variety of operations applied to the input data 18. The first operation is the addition of varying levels of reverb 34 applied to speech input data, to result in a first reverb-added data 32-34a, a second reverb-added data 32-34b, and any number of additional reverb-added data, including an indeterminate reverb-added data 32-34n. An exemplary second operation may be the addition of varying levels of noise 36 applied to given ones of the reverb-added data set 32-34, which may yield a first noise and reverb-added data 32-36a, a second noise and reverb-added data 32-36b, and any number of additional data sets of noise and reverb-added data, including an indeterminate noise and reverb-added data 32-36n. The diagram of
The audio operations applied to the input data 18 are presented by way of example only, and not of limitation. Any other operation may be applied to the input data 18 specific to the general category to which it has been classified, and similar expansion operations may be performed on motion data, sound data, and so on. Furthermore, the example illustrates the reverb, noise, and speed adjustment permutations of the resultant augmented input data set 32 being generated in a hierarchical sequence of such operations. However, this is also exemplary only. For instance, there may be an augmented input data set 32 that is generated solely from different speed adjustments, without first being modified by added reverb and/or noise.
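The following Python sketch, assuming NumPy/SciPy and hypothetical function names, illustrates how the reverb, noise, and speed adjustment operations described above might be realized.

```python
import numpy as np
from scipy.signal import fftconvolve, resample

def add_reverb(speech, impulse_response):
    # Convolve the dry speech with a room impulse response (reverb operation 34).
    wet = fftconvolve(speech, impulse_response)[: len(speech)]
    return wet / np.max(np.abs(wet))  # normalize to avoid clipping

def add_noise(speech, noise, snr_db):
    # Mix background noise into the speech at a target SNR (noise operation 36).
    noise = noise[: len(speech)]
    gain = np.sqrt(np.mean(speech ** 2) / (np.mean(noise ** 2) * 10 ** (snr_db / 10)))
    return speech + gain * noise

def adjust_speed(speech, factor):
    # Simple resampling stand-in for the speed adjustment operation; note that
    # plain resampling also shifts pitch, unlike a true time-stretch.
    return resample(speech, int(len(speech) / factor))
```

Under this sketch, a noise-and-reverb permutation such as 32-36a would correspond to add_noise(add_reverb(speech, ir), noise, snr_db), while a speed-only variant applies adjust_speed directly to the unmodified input data, consistent with the non-hierarchical example above.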
As shown in the block diagram of
Continuing again with the example of the audio/speech input data 18 above, and with reference to the diagram of
Returning to the block diagram of
Depending on the measured performance, the process may be repeated from the data collection toolkit 16, the data augmentation toolkit 30, or the adaptive training toolkit 40. To the extent the evaluation determines that additional input data is necessary, the execution of the deep learning training and inference system 10 may return to the data collection toolkit 16. Where the evaluation determines that data augmentation is needed to account for further possible variations, the data augmentation toolkit 30 may be invoked. If an update to the hyperparameters governing the overall operation of the deep learning training and inference system 10 is needed, or if additional training cycles executed by the local neural network training tools are deemed necessary, then the adaptive training toolkit 40 may be invoked. Once the performance of the neural network 12 has been improved to a level suitable for deployment in an end device, a final model 56 is generated.
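A minimal sketch of this iterative control flow, reusing the hypothetical tool interfaces from the earlier listing, might read as follows; the threshold value and attribute names are assumptions made for illustration only.

```python
def refine(system, target_score=0.95, max_iterations=10):
    # Iteratively run the tool chain until the simulated performance of the
    # neural network reaches the deployment threshold (final model 56).
    weights = None
    for _ in range(max_iterations):
        data = system.collection_toolkit.collect(system.source)            # toolkit 16
        augmented = system.augmentation_toolkit.augment(data, system.ops)  # toolkit 30
        weights = system.training_toolkit.train(augmented, system.trainer) # toolkit 40
        score = system.inference_tool.evaluate(weights, system.simulator,
                                               system.test_set)
        if score >= target_score:
            return weights  # performance suffices for deployment
        # In the disclosed system the evaluation decides which stage to
        # revisit (collection, augmentation, or training); this sketch
        # simply re-runs the full chain with updated state.
    return weights
```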
According to various embodiments of the deep learning training and inference system 10, a standardization of the data capture process, as well as of the software libraries and processes used in the augmentation and training of the final model 56, is contemplated. Optimal performance of the neural network 12, and of the training process thereof, is understood to be reliant on the careful selection of these details, so such standardization is an important objective. The processes utilized by the deep learning training and inference system 10 are contemplated to expedite various iterative processes. The need for the user to analyze the impact of the quality and/or quantity of the final data can be eliminated, as the data augmentation toolkit 30 provides robustness to the input data that is fed to the training tools of the neural network 12. Hyperparameter tuning and reinitialized trainings can be minimized because of the high reproducibility of the aforementioned tools in the deep learning training and inference system 10.
The particulars shown herein are by way of example and for purposes of illustrative discussion of the embodiments of a deep learning training and inference system and method, and are presented in the cause of providing what is believed to be the most useful and readily understood description of the principles and conceptual aspects. In this regard, no attempt is made to show details with more particularity than is necessary, the description taken with the drawings making apparent to those skilled in the art how the several forms of the present disclosure may be embodied in practice.
This application relates to and claims the benefit of U.S. Provisional Application No. 63/165,309 filed Mar. 24, 2021 and entitled “An End-To-End Adaptive Learning Training and Inference Method and Tool Chain to Improve Performance and Shorten The Development Cycle Time,” the entire disclosure of which is wholly incorporated by reference herein.