This application generally relates to a gesture recognition system and more particularly to a gesture recognition system having a machine-learning accelerator.
The category of gesture recognition devices includes input interfaces that visually detect a gesture articulated by a user's hand. In general, most gesture recognition systems today lack reliability, flexibility, and speed.
A gesture recognition system having a machine-learning accelerator comprises a frequency-modulated continuous-wave (FMCW) radar system having a transmitter transmitting a predetermined frequency spectrum signal to an object, a first receiver receiving a first channel of the signal reflected by the object, a first signal preprocessing engine serially coupled between the first receiver and a first feature map generator, a second receiver receiving a second channel of the signal reflected by the object, a second signal preprocessing engine serially coupled between the second receiver and a second feature map generator, a clear channel assessment block coupled to receive output from the first and second feature map generators, and a machine-learning accelerator configured to receive output from the first and second feature map generators and form frames fed to a deep neural network realized with a hardware processor array for gesture recognition. The machine-learning accelerator comprises a machine learning hardware accelerator scheduler configured to act as an interface between the hardware processor array and a microcontroller unit, and a memory storing a set of compressed weights fed to the deep neural network.
A method of gesture recognition comprises transmitting a predetermined frequency spectrum signal to an object, receiving a first channel of the signal reflected by the object, sending the first channel of the signal to a first feature map generator via a first signal preprocessing engine, receiving a second channel of the signal reflected by the object, sending the second channel of the signal to a second feature map generator via a second signal preprocessing engine, skipping portions of the spectrum occupied by other devices, a machine-learning accelerator receiving output from the first and second feature map generators and forming frames fed to a deep neural network realized with a hardware processor array for gesture recognition, and utilizing recognized gestures to control an application program.
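The overall two-channel dataflow described above can be summarized with a short Python sketch. This is only an illustrative outline under assumed shapes and processing steps (a range FFT and a range-Doppler style feature map); the actual preprocessing engines, feature map generators, and accelerator interface are not specified here, and all function names are hypothetical.

```python
# Illustrative sketch of the two-channel dataflow; all names are assumptions.
import numpy as np

def preprocess(raw_chirps: np.ndarray) -> np.ndarray:
    """Stand-in for a signal preprocessing engine: range FFT per chirp."""
    return np.fft.fft(raw_chirps, axis=-1)

def feature_map(preprocessed: np.ndarray) -> np.ndarray:
    """Stand-in for a feature map generator: magnitude range-Doppler map."""
    return np.abs(np.fft.fft(preprocessed, axis=0))

def form_frame(fm1: np.ndarray, fm2: np.ndarray) -> np.ndarray:
    """Stack the two receive channels into one frame for the DNN accelerator."""
    return np.stack([fm1, fm2], axis=0)

# Example with synthetic data: 64 chirps x 128 samples per receive channel.
rx1 = np.random.randn(64, 128)
rx2 = np.random.randn(64, 128)
frame = form_frame(feature_map(preprocess(rx1)), feature_map(preprocess(rx2)))
# 'frame' (shape 2 x 64 x 128) would then be fed to the deep neural network
# running on the hardware processor array; the recognized gesture can in turn
# be used to control an application program.
```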
These and other objectives of the present invention will no doubt become obvious to those of ordinary skill in the art after reading the following detailed description of the preferred embodiment that is illustrated in the various figures and drawings.
The entire algorithm for recognition is based on Machine Learning and Deep Neural Network (ML and DNN). The ML/DNN may receive outputs from Feature Map Generators FMG1-FMG2 and form frames for gesture recognition. Because of the computational workload and the real-time, low-latency requirement, the recognition algorithm is realized with a special hardware processor array. A dedicated scheduler (e.g. a machine learning hardware accelerator scheduler 154) may act as an interface between this processor array and the MCU (microcontroller unit). Furthermore, a special compression algorithm may be applied to reduce memory requirements for the weights. The compression algorithm compresses the weights into low-rank matrices and converts them to a fixed-point form. The fixed-point, low-rank matrices can be directly treated as weights during inference. Therefore, weight decompression on the device side is not required.
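The weight compression idea can be illustrated with a minimal Python sketch, assuming a truncated SVD as the low-rank factorization and a simple scaled-integer fixed-point format; the rank, bit width, and factorization choice are illustrative assumptions rather than the actual compression algorithm.

```python
# Hedged sketch: a weight matrix is approximated by low-rank factors which are
# then converted to fixed point and used directly at inference time.
import numpy as np

def compress_weights(w: np.ndarray, rank: int, frac_bits: int = 12):
    """Low-rank factorization (truncated SVD) followed by fixed-point conversion."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]          # left factor scaled by singular values
    b = vt[:rank, :]                    # right factor
    scale = 2 ** frac_bits
    a_fx = np.round(a * scale).astype(np.int32)   # fixed-point left factor
    b_fx = np.round(b * scale).astype(np.int32)   # fixed-point right factor
    return a_fx, b_fx, scale

def apply_compressed(x: np.ndarray, a_fx, b_fx, scale) -> np.ndarray:
    """Inference uses the fixed-point factors directly; no decompression step."""
    # (x @ B^T) @ A^T, rescaled once to undo the two fixed-point scale factors.
    return (x @ b_fx.T @ a_fx.T) / (scale * scale)

w = np.random.randn(256, 128)           # placeholder layer weights
x = np.random.randn(1, 128)             # placeholder layer input
a_fx, b_fx, scale = compress_weights(w, rank=16)
y = apply_compressed(x, a_fx, b_fx, scale)   # approximates x @ w.T
```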
The above-described system 100 is only an example and is not to be considered as limiting. Any FHSS FMCW radar system for hand/finger gesture recognition applications using a hardware DNN accelerator with stored weights and a customizable gesture-training platform is suitable for gesture recognition as described herein.
In the proposed system, a machine-learning accelerator may be dedicated to gesture recognition and may be disposed locally in the proposed system according to an embodiment. The proposed system may be a stand-alone system, which is able to operate for gesture recognition independently. Hence, it is more convenient to integrate the proposed system into another device (e.g. a mobile phone, a tablet, a computer, etc.), and overall efficiency may also be improved. For example, the time and/or power consumption required for gesture recognition may be reduced. The machine learning accelerator (e.g. 150) may be used to reduce the gesture processing time required at the system 100, and the weights used by the machine learning accelerator (e.g. 150) may be obtained from gesture training. Gesture training may be performed by a remote ML server such as a cloud ML server.
As a typical application scenario, a fixed number of gestures may be collected and used for training. Gesture recognition using a plurality of weights may be improved by performing training on a set of collected gestures. For example, a single gesture may be performed by 1000 persons to generate 1000 samples, and a cloud ML server may then process these 1000 samples. The cloud ML server may perform gesture training using these samples to obtain a corresponding result. The result may be a set of weights used in the gesture inference process, so when a user performs a gesture, this set of weights may be employed in the calculation process to enhance recognition performance.
A basic set of gestures may therefore be realized using this trained set of weights. In addition, the proposed system may allow a user to have customized gestures. A user's personal gesture may be recorded and then sent to the Cloud ML server via an external Host processor (e.g. 180) for subsequent gesture training. The external Host processor (e.g. 180) may run a Custom Gesture Collection Application program and may be connected to the Cloud server via the Internet through a wired or wireless connection. The results of training (e.g. a set of weights) may then be downloaded so the user's own gesture may be used as well.
As mentioned above, signals used for gesture sensing may have a frequency in the 60 GHz range. Due to the corresponding millimeter wavelength, the proposed system can detect minute hand/finger movement with millimeter accuracy. Special processing of the phase information of the radar signal may be required. A special Phase Processing Engine (e.g. a phase extractor/unwrapper 120) may be used to extract and unwrap this phase information.
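As a rough illustration of the phase processing involved, the following Python sketch extracts and unwraps the phase of one range bin across chirps and converts it to displacement using the round-trip phase relation at 60 GHz; the signal model and range-bin selection are simplified assumptions, not the actual Phase Processing Engine.

```python
# Minimal sketch of phase extraction and unwrapping for fine-movement sensing.
import numpy as np

C = 3e8                     # speed of light (m/s)
FC = 60e9                   # nominal carrier frequency (Hz)
WAVELENGTH = C / FC         # about 5 mm at 60 GHz

def displacement_from_phase(range_bin_samples: np.ndarray) -> np.ndarray:
    """Complex samples of one range bin across chirps -> relative displacement (m)."""
    phase = np.angle(range_bin_samples)   # wrapped phase, one value per chirp
    phase = np.unwrap(phase)              # remove 2*pi discontinuities
    # Round-trip path: a target displacement d changes the phase by 4*pi*d/lambda.
    return phase * WAVELENGTH / (4 * np.pi)

# Example: a target oscillating by +/- 1 mm produces a matching phase history.
t = np.linspace(0, 1, 200)
d_true = 1e-3 * np.sin(2 * np.pi * 2 * t)
samples = np.exp(1j * 4 * np.pi * d_true / WAVELENGTH)
d_est = displacement_from_phase(samples)  # recovers the millimeter-scale motion
```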
The proposed system is an Anti-Jamming/Collision Avoidance 60 GHz FHSS FMCW Radar System for Hand/Finger Gesture Recognition Applications with a Hardware DNN accelerator, a customizable gesture-training platform, and fine movement sensing capability. The Radar has two channels of Receivers (RX) and one channel of Transmitter (TX).
Anti-jamming/Collision Avoidance may be achieved by turning on the two RX's and sweeping the entire 57-67 GHz spectrum first. After processing the signal through the entire RX chain, the Clear Channel Assessment Block may determine which parts of the spectrum are currently occupied by other users/devices. This knowledge may be used by the FHSS PN Code generator, so the system may skip these portions of the spectrum to avoid collision. On top of avoidance, FHSS may also be used to further reduce such occurrences on a statistical basis. This Anti-jamming/Collision Avoidance algorithm may be performed on a Frame-to-Frame basis.
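The clear channel assessment and frequency-hopping behavior might be sketched as follows in Python; the channel count, energy threshold, and toy PN generator are illustrative assumptions and not the actual FHSS PN Code generator.

```python
# Hedged sketch of per-frame clear channel assessment and frequency hopping.
import numpy as np

NUM_CHANNELS = 20          # assumed: 57-67 GHz split into 500 MHz channels

def assess_channels(energy_per_channel: np.ndarray, threshold: float) -> np.ndarray:
    """Return a boolean mask of channels considered clear for this frame."""
    return energy_per_channel < threshold

def next_hop(clear_mask: np.ndarray, pn_state: int) -> tuple[int, int]:
    """Pick the next hop from the clear channels using a toy PN sequence step."""
    clear = np.flatnonzero(clear_mask)
    if clear.size == 0:
        clear = np.arange(len(clear_mask))        # nothing clear: fall back to all
    pn_state = (pn_state * 1103515245 + 12345) & 0x7FFFFFFF   # toy PN generator
    return int(clear[pn_state % clear.size]), pn_state

# Per frame: sweep, assess, then hop only over unoccupied parts of the spectrum.
energy = np.random.rand(NUM_CHANNELS)
energy[[3, 7]] = 5.0                    # pretend channels 3 and 7 are occupied
mask = assess_channels(energy, threshold=1.0)
channel, state = next_hop(mask, pn_state=1)
```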
The entire algorithm for recognition may be based on Machine Learning and Deep Neural Network (ML and DNN). The ML/DNN takes outputs from the Feature Map Generators and forms Frames for gesture recognition. Because of the computational workload and the real-time, low-latency requirement, the algorithm may be realized with a special hardware processor array. A dedicated Scheduler acts as an interface between the array and the MCU. Furthermore, since a special compression algorithm may be applied to reduce the memory requirement for weights, the fixed-point, low-rank matrices can be directly treated as weights during inference.
As a basic usage scenario, a fixed number of gestures may be collected and trained, and the results (Weights) applied to all devices, so a basic set of gestures for recognition may be realized. In addition, the system allows users to have customized gestures: a user's own gesture may be recorded and sent to an external host processor running a Custom Gesture Collection Application program and, via the Internet, to our Cloud server for training. The results may then be downloaded so the user's own gesture may be used as well.
In general, a Deep Neural Network takes an input frame or frames and, using the weights, may generate a vector trace that falls into one of a plurality of vector spaces determined by the training of the Deep Neural Network. How strongly the vector trace falls within each of the vector spaces is converted into probabilities of the input gesture corresponding to each gesture in a stored set of gestures. Unfortunately, even the best Deep Neural Networks can sometimes determine the input gesture incorrectly. This is often because the respective vector spaces in the Deep Neural Network generated by the “correct” gesture and the “incorrect” gesture are too close together. When the vector spaces are too close together, tiny variations in the input tip the most probable gesture from being “correct” to being “incorrect”.
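The effect of closely spaced vector spaces can be illustrated with a small Python sketch in which class probabilities are modeled as a softmax over negative distances to stored per-gesture centroids; this is a stand-in for the trained network's output, not the actual classifier, and the centroid values are invented for illustration.

```python
# Hedged sketch: how closeness of vector spaces can flip the top prediction.
import numpy as np

def gesture_probabilities(trace: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """trace: (d,) output vector; centroids: (num_gestures, d) learned centers."""
    dists = np.linalg.norm(centroids - trace, axis=1)
    logits = -dists
    logits -= logits.max()                 # numerical stability
    p = np.exp(logits)
    return p / p.sum()

centroids = np.array([[0.0, 0.0], [0.3, 0.0], [5.0, 5.0]])  # first two are close
p_clean = gesture_probabilities(np.array([0.05, 0.0]), centroids)
p_noisy = gesture_probabilities(np.array([0.20, 0.0]), centroids)
# A tiny input variation flips the most probable gesture from 0 to 1 because
# their centroids are close; the well-separated gesture 2 is unaffected.
```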
With this in mind, the inventors have realized that the best way to substantially avoid this problem of incorrect classification is to separate the vector spaces as much as possible. This can be done during Deep Neural Network training by determining specific gestures whose ensuing vector traces are as far apart as possible, design considerations permitting. The following is a list of specific mini-gestures and a list of specific micro-gestures determined to separate the vector spaces in a Deep Neural Network as much as possible. The specific names given to each mini-gesture or micro-gesture are arbitrary and may be changed without altering the definition of the gestures.
No. 1: A Sharp Sign—Traces formed by two extended fingers moved horizontally followed by the two fingers moving vertically to form a sharp sign.
No. 2: A Signal Down—Traces formed by two extended fingers moved horizontally followed by one finger moving down vertically from the lower horizontal trace.
No. 3: A Signal Up—Traces formed by two extended fingers moved horizontally followed by one finger moving up vertically from the lower horizontal trace.
No. 4: Rubbing—Traces formed by rubbing hand over thumb.
No. 5: Double Kick—Traces formed by two fingers extended to form a “V” shape, then brought together while still extended, separated back into the “V” shape, then brought together again. Alternatively, traces formed by two fingers that are extended together, then separated to form a “V” shape, brought together while still extended, and separated back into the “V” shape.
No. 6: Lightning Down—Traces formed by one extended finger drawing a lightning shape (zigzag line) in a downward direction.
No. 7: Lightning Up—Traces formed by one extended finger drawing a lightning shape (zigzag line) in an upward direction.
No. 8: Pat Pat—Traces formed by an open palm being pushed forward twice in succession.
No. 9: Stone to Palm—Traces formed by beginning with a closed fist; the fist opens and the fingers extend and spread, exposing the palm.
No. 10: Kick Climb—Traces formed by a mini-gesture similar to a double kick except the hand is moving upward while executing the double kick.
No. 1: One & Two—Traces formed by extending one finger forward, withdrawing the extended finger, then extending two fingers forward before withdrawing both fingers.
No. 2: Come & Come—Traces formed by an open palm facing away from the body. The fingers are curled in toward the palm and then re-extended, and the motion is repeated.
No. 3: Twist—Traces formed by rotation of a thumb and index finger as if turning a volume knob.
No. 4: Progressive Grab—Traces formed beginning with an open palm with extended fingers and sequentially, from little finger to thumb, curling each finger in to form a fist.
No. 5: Eating—Traces formed by the same motions as a double kick except executed horizontally across the body.
No. 6: Good Good—Traces formed by a closed fist with thumb extended pushed forward twice.
No. 7: Bad Bad—Traces formed by waving an index finger back and forth twice.
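As a hedged illustration of the separation principle motivating the gesture lists above, the following Python sketch scores a candidate gesture set by the minimum pairwise distance between per-gesture centroid vectors; the embeddings here are randomly generated placeholders rather than trained DNN outputs.

```python
# Hedged sketch: checking vector-space separation for a candidate gesture set.
import numpy as np

def separation_score(centroids: np.ndarray) -> float:
    """Smallest pairwise Euclidean distance between gesture centroids."""
    n = centroids.shape[0]
    dists = [np.linalg.norm(centroids[i] - centroids[j])
             for i in range(n) for j in range(i + 1, n)]
    return float(min(dists))

# Example: placeholder centroids for 10 mini-gestures + 7 micro-gestures in 8-d.
rng = np.random.default_rng(0)
centroids = rng.normal(size=(17, 8))
print(f"minimum inter-gesture distance: {separation_score(centroids):.3f}")
# Gesture candidates whose centroids sit too close together would be redesigned
# or dropped so the surviving set keeps the vector spaces far apart.
```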
Those skilled in the art will readily observe that numerous modifications and alterations of the device and method may be made while retaining the teachings of the invention. Accordingly, the above disclosure should be construed as limited only by the metes and bounds of the appended claims.
This application claims priority of U.S. Provisional Patent Application No. 62/684,202, filed Jun. 13, 2018, and incorporated herein by reference in its entirety.