The proposed method and apparatus relates to human activity recognition and to the detection of anomalies in the behavior of monitored individuals living in restrictive environments.
This section is intended to introduce the reader to various aspects of the art that may be related to the present embodiments described below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it should be understood that these statements are to be read in this light.
For many years now, human activity detection and recognition has remained a popular topic. There are numerous application domains for which this is useful, such as health care, elderly care, home security, and other restrictive environments such as prisons, juvenile detention centers, schools, or house arrest. One of the ultimate goals of such a monitoring system is to learn about people's habits and detect abnormal behaviors in order, for example, to notify medical staff or a close relative about the status of the monitored individual. This kind of service could have several advantages, such as anticipating behavioral changes at a very early stage (which would generate hospital cost savings), preventing residential burglaries, or more simply reassuring medical staff and/or relatives about their patients, relatives or real-estate assets.
Many different technologies are currently being tested for monitoring the activity of individuals in restricted environments. The proposed method and apparatus focuses particularly on audio technology that is able to track the daily activity of individuals at home using only microphone recordings. Many papers describe different methods of sound detection and classification, but most of them focus on environmental sound scenes.
The proposed method and apparatus addresses the above-identified issues and, in advantageous implementations, saves CPU resources (especially for portable device usage), extends battery life, adapts to the complexity of the activity recognition, reduces the processing response time for coarse activity detection and recognition, and/or improves the accuracy of fine activity detection and recognition.
A problem that is addressed by the proposed method and apparatus is that audio recognition (classification) needs different processing means depending on the complexity of the activity that needs to be recognized. Typically, CPU consumption and battery life could be real limitations to the deployment of such a service in portable devices. Additionally, many audio events are quite similar (such as the sounds of opening the refrigerator and opening the entrance door), which significantly decreases the accuracy of the detection system.
In this respect, the present invention relates to a method as defined in claim 1 and to an apparatus as defined in claim 9.
An advantage of the proposed method and apparatus is to mitigate the above problems by a multi-level classification technique: a coarse event detection and classification (recognition) can be done first (and very quickly, with low processing requirements) to determine the location of the human activity (e.g., whether he/she is in the kitchen, in the living room, . . . ). Then a more fine-grained classification step is added to detect more specific actions (e.g., whether the monitored individual opens the refrigerator in the kitchen or not, . . . ).
The proposed method and apparatus provides a method of sound event detection and recognition (classification) that takes a multi-resolution approach depending on the level of detail of the activity to be recognized. For example, the method provides the ability to perform, in a first step, coarse audio (acoustic signal) detection and recognition (such as determining in which room of the house the monitored individual is performing the activity to be recognized, or which high-level activity the monitored individual is performing, such as cooking) and, upon request, to perform finer audio (acoustic signal) recognition (such as which type of appliance the monitored individual is using, e.g., the refrigerator, oven, dishwasher, etc.). An advantage of the proposed method and apparatus over other methods is that it consumes CPU resources according to the level of detail of the activity that it is requested to recognize. Thus, the proposed method and apparatus saves CPU processing and battery life and advantageously decreases the response time of the service. Moreover, the first (coarse) detection and recognition step limits the number of activities considered in the second (fine-grained) step, so as to improve the final accuracy. As an example, if the algorithm knows that the action is in the kitchen (given by the first (coarse) recognition step), then it limits the sounds to process to those most likely to originate from the kitchen, such as the microwave, opening/closing the refrigerator door, or running water, rather than trying to distinguish between the sounds of the entrance door opening and the refrigerator door opening (between which it is, in general, very difficult to discriminate).
A method and corresponding apparatus for recognizing an activity of a monitored individual in an environment are described, the method including receiving a first acoustic signal, performing audio feature extraction on the first acoustic signal in a first temporal window, classifying the first acoustic signal by determining a location of the monitored individual in the environment based on the extracted features of the first acoustic signal in the first temporal window, receiving a second acoustic signal, performing audio feature extraction on the second acoustic signal in a second temporal window, and classifying the second acoustic signal by determining an activity of the monitored individual in the location in the environment based on the extracted features of the second acoustic signal in the second temporal window.
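The coarse-then-fine flow summarized above can be sketched in code. The following is an illustrative toy, not the claimed implementation: the room names, sound labels, scalar features, and the nearest-centroid decision rule are all hypothetical placeholders chosen to show how the coarse (location) result restricts the candidate set of the fine classifier.

```python
# Toy two-tier (coarse-then-fine) classification sketch.
# All labels, features, and centroids below are illustrative assumptions.

ROOM_SOUNDS = {
    "kitchen": ["fridge_door", "microwave", "running_water"],
    "entrance": ["entrance_door"],
}

def nearest(centroids, feature):
    """Return the label whose 1-D centroid is closest to the feature."""
    return min(centroids, key=lambda label: abs(centroids[label] - feature))

def classify(coarse_centroids, fine_centroids, long_feature, short_feature):
    # Step 1: coarse pass on the long-window feature yields the room.
    room = nearest(coarse_centroids, long_feature)
    # Step 2: fine pass only considers sounds plausible for that room,
    # so "fridge_door" cannot be confused with "entrance_door".
    candidates = {s: fine_centroids[s] for s in ROOM_SOUNDS[room]}
    return room, nearest(candidates, short_feature)

coarse = {"kitchen": 0.8, "entrance": 0.2}
fine = {"fridge_door": 0.7, "microwave": 0.9, "running_water": 0.5,
        "entrance_door": 0.695}
print(classify(coarse, fine, long_feature=0.75, short_feature=0.69))
# → ('kitchen', 'fridge_door')
```

Note that an unrestricted fine pass on the same feature (0.69) would pick "entrance_door" (centroid 0.695), illustrating the confusion the coarse step avoids.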
The proposed method and apparatus is best understood from the following detailed description when read in conjunction with the accompanying drawings. The drawings include the following figures briefly described below:
It should be understood that the drawings are for purposes of illustrating the concepts of the disclosure and do not necessarily represent the only possible configuration for illustrating the disclosure.
The present description illustrates the principles of the present disclosure. It will thus be appreciated that those skilled in the art will be able to devise various arrangements that, although not explicitly described or shown herein, embody the principles of the disclosure and are included within its scope.
All examples and conditional language recited herein are intended for educational purposes to aid the reader in understanding the principles of the disclosure and the concepts contributed by the inventor to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions.
Moreover, all statements herein reciting principles, aspects, and embodiments of the disclosure, as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents as well as equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
Thus, for example, it will be appreciated by those skilled in the art that the block diagrams presented herein represent conceptual views of illustrative circuitry embodying the principles of the disclosure. Similarly, it will be appreciated that any flow charts, flow diagrams, state transition diagrams, pseudocode, and the like represent various processes which may be substantially represented in computer readable media and so executed by a computer or processor, whether or not such computer or processor is explicitly shown.
The functions of the various elements shown in the figures may be provided through the use of dedicated hardware as well as hardware capable of executing software in association with appropriate software. When provided by a processor, the functions may be provided by a single dedicated processor, by a single shared processor, or by a plurality of individual processors, some of which may be shared. Moreover, explicit use of the term “processor” or “controller” should not be construed to refer exclusively to hardware capable of executing software, and may implicitly include, without limitation, digital signal processor (DSP) hardware, read only memory (ROM) for storing software, random access memory (RAM), and nonvolatile storage.
Other hardware, conventional and/or custom, may also be included. Similarly, any switches shown in the figures are conceptual only. Their function may be carried out through the operation of program logic, through dedicated logic, through the interaction of program control and dedicated logic, or even manually, the particular technique being selectable by the implementer as more specifically understood from the context.
In the claims hereof, any element expressed as a means for performing a specified function is intended to encompass any way of performing that function including, for example, a) a combination of circuit elements that performs that function or b) software in any form, including, therefore, firmware, microcode or the like, combined with appropriate circuitry for executing that software to perform the function. The disclosure as defined by such claims resides in the fact that the functionalities provided by the various recited means are combined and brought together in the manner which the claims call for. It is thus regarded that any appropriate means that can provide those functionalities are covered by the “means for” feature.
For clarity of explanation, the proposed method and apparatus will be described with reference to a use case that could be applied to an elderly care service. The proposed method and apparatus is not limited to an elderly care environment but is, more generally, preferably directed to a restrictive environment.
The elderly care service proposed as an example of usage of the proposed method and apparatus is based on collecting data from different sensors to learn about the habits of an elderly individual (the monitored individual) and to notify dedicated medical staff or a close relative about detected behavior anomalies or a shift in the habits of the monitored individual. To maximize acceptance by the monitored individual, no sensor is required to be worn by the monitored individual. Given that constraint, one of the most relevant ways to monitor the activity is to use acoustical sensors (i.e., microphones). Privacy, which is outside the scope of the proposed method and apparatus, will be preserved by collecting temporal fragments, spectral fragments, or a combination of both, of the audio (acoustic) signals. The signals would be encrypted for even better privacy preservation. An infrastructure can be imagined that would require one microphone per room as presented in
The three microphones could be connected wirelessly to a box connected to the RGW (Residential Gateway) of the home network. Many other ways to connect the microphones to a centralized device can be used, such as Power Line Communication (PLC) technology. Alternatively, the box functionality could be integrated in the RGW, which allows the microphones to have a direct connection to the RGW. In another embodiment, arrays of microphones can also be used in each room so as to be able to additionally take into account information about the spatial location of the sound events (i.e., spatial features such as interchannel intensity differences (IID) and interchannel phase differences (IPD) can be extracted in addition to spectral features such as MFCC (Mel-Frequency Cepstral Coefficients) so as to form a final, more robust audio feature for classification). As an example, the locations of the refrigerator, dishwasher, etc. are usually fixed, so if a recorded sound is detected as coming from that direction (via the spatial feature), it is more likely to have been generated by these devices. Examples of the combination of the spatial information (only available when using a microphone array) and the spectral information of the sound event can be found in the prior art (e.g., "A Microphone Array System for Automatic Fall Detection," IEEE Transactions on Biomedical Engineering, Vol. 59, No. 2, May 2012).
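The IID and IPD features mentioned above can be illustrated for a two-microphone pair. This is a minimal sketch under assumptions not stated in the source: one windowed FFT frame per channel, IID taken as the per-bin level ratio in dB, IPD as the per-bin phase difference, and a placeholder vector standing in for the MFCCs, all concatenated into the combined feature.

```python
# Illustrative IID/IPD extraction for one stereo frame (assumed setup).
import numpy as np

def spatial_spectral_feature(left, right, spectral_feature):
    """left/right: one time frame per channel; returns the combined feature."""
    L = np.fft.rfft(left * np.hanning(len(left)))
    R = np.fft.rfft(right * np.hanning(len(right)))
    eps = 1e-10  # avoid log(0) / divide-by-zero on silent bins
    iid = 20 * np.log10((np.abs(L) + eps) / (np.abs(R) + eps))  # level ratio, dB
    ipd = np.angle(L * np.conj(R))                              # phase difference, rad
    return np.concatenate([spectral_feature, iid, ipd])

# Example: a 512-sample frame; the 13-dim "MFCC" vector is a placeholder.
t = np.arange(512) / 16000.0
left = np.sin(2 * np.pi * 440 * t)
right = 0.5 * left  # right channel 6 dB quieter, as if the source is off-axis
feat = spatial_spectral_feature(left, right, spectral_feature=np.zeros(13))
print(feat.shape)  # 13 + 257 + 257 = (527,)
```

A classifier would then be trained on such concatenated vectors so that direction-of-arrival cues (e.g., the fixed position of the refrigerator) complement the spectral cues.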
Examples of the types of rooms that would be relevant to monitor in the context of the proposed use case are the kitchen, the bedroom, the living room and the bathroom.
Then an exemplary list of activities based on type of sound to be recognized could be the following assuming that the room being monitored is equipped with at least one microphone or an array of microphones:
The above activities can be considered high-level activities because no detail within each of them is exposed. To perform such coarse acoustical activity detection and recognition, an approach that is efficient in terms of processing resources is to use a classifier that runs its algorithm over a long audio (acoustic signal) time (temporal) window, such as 5 seconds to several minutes, instead of over short audio (acoustic signal) frames of 10-40 milliseconds.
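One common way to obtain a single feature for such a long window, an assumption here rather than something the source specifies, is to compute cheap short-frame features and summarize them with statistics over the whole window, so the coarse classifier runs once per long window rather than once per 10-40 ms frame. The frame and hop sizes below are illustrative.

```python
# Long-window summary feature: mean/std of per-frame log-energies (a sketch).
import numpy as np

def long_window_feature(signal, frame_len=400, hop=200):
    """Summarize per-frame log-energies over a long window by mean and std."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    log_energy = [np.log(np.sum(f ** 2) + 1e-10) for f in frames]
    return np.array([np.mean(log_energy), np.std(log_energy)])

rng = np.random.default_rng(0)
five_seconds = rng.standard_normal(5 * 16000)   # 5 s of audio at 16 kHz
print(long_window_feature(five_seconds).shape)  # (2,)
```

The coarse classifier thus evaluates a tiny summary vector per window, which is what keeps its CPU cost low relative to frame-by-frame fine classification.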
If a finer activity-level classification is required, another exemplary list, which can be considered as sub-activities of the high-level activities, could be the following:
The generic audio (acoustic signal) processing tool workflow that would perform such audio (acoustic signal) recognition is presented in
In the training phase, two activity classifiers are trained: a coarse activity classifier with audio features extracted from long time segments (windows), and a detail (fine) activity classifier with audio features extracted from short time windows (segments). The feature extraction pipeline (e.g. as shown in
In the detection and classification phase, given the audio (acoustic signal) recordings from the microphones, coarse activities are first detected and recognized (classified) by the coarse classifier operating on audio (acoustic) signals in a long time window (e.g., during the 5 minutes from 12 PM to 12:05 PM, the user is cooking in the kitchen). Then detection and classification of more detailed activities is performed, limited to the context given as a result of the coarse detection and classification. This is possible since the proposed method and apparatus has already detected and classified the activity location (e.g., bathroom, kitchen, bedroom, living room). Thus, a finer classification can be made by the fine classifier (e.g., whether the fridge door is opened) in a much smaller time window (segment), since those types of activity are usually short. Note that the fine classifier can reduce false detections compared to the case in which it is used alone, since the number of specific activities is now limited by the context (the activity location determined by the coarse detection and classification portion of the proposed method and apparatus). For example, if in the coarse detection and recognition step it is determined that the user is in the kitchen, then the sound of opening the fridge door is not confused with the sound of opening the entrance door. Thus, using a two-tiered detection and classification scheme as proposed herein limits the scope of the search (detection and classification) of the fine classifier, which will thus converge more quickly to the targeted result.
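The training phase described above can be sketched as fitting two separate models from labeled windows. This is a deliberately minimal toy under stated assumptions: scalar features, nearest-centroid models (one mean per class), and invented labels; the source does not specify the classifier type.

```python
# Toy training of the two classifiers: coarse (room) from long-window
# features, fine (activity) from short-window features. All data invented.

def train_centroids(features, labels):
    """Fit a one-centroid-per-class model: mean feature value per label."""
    sums, counts = {}, {}
    for f, lab in zip(features, labels):
        sums[lab] = sums.get(lab, 0.0) + f
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: sums[lab] / counts[lab] for lab in sums}

# Long-window training data -> coarse (room) classifier
coarse = train_centroids([0.82, 0.78, 0.21], ["kitchen", "kitchen", "entrance"])
# Short-window training data -> fine (activity) classifier
fine = train_centroids([0.71, 0.69, 0.66],
                       ["fridge_door", "fridge_door", "entrance_door"])
print(round(coarse["kitchen"], 3), round(fine["fridge_door"], 3))  # → 0.8 0.7
```

At detection time, the coarse model is queried once per long window and the fine model only over the classes consistent with the coarse result.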
According to Wikipedia, in sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC. MFCCs are derived from a type of cepstral representation of the audio clip (a nonlinear "spectrum-of-a-spectrum"). It is possible to raise the log-mel amplitudes to a suitable power (around 2 or 3) before taking the DCT (Discrete Cosine Transform), which reduces the influence of low-energy components.
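The variant described above can be sketched end to end. This is a minimal numpy illustration, not a reference MFCC implementation: the filterbank size, frame length, unnormalized DCT-II, and the sign-preserving power step (log-mel values can be negative) are implementation choices made here for the sketch.

```python
# Sketch of MFCC with log-mel amplitudes raised to a power before the DCT.
import numpy as np

def mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def inv_mel(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(frame, sr=16000, n_mels=26, n_coeffs=13, power=2.0):
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), 1.0 / sr)
    # Triangular mel filterbank with edges equally spaced on the mel scale.
    edges = inv_mel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bank = np.zeros((n_mels, len(freqs)))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        bank[i] = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                     (hi - freqs) / (hi - mid)), 0, None)
    log_mel = np.log(bank @ spectrum + 1e-10)
    # Raise to a power (~2) before the DCT; sign kept since logs can be < 0.
    boosted = np.sign(log_mel) * np.abs(log_mel) ** power
    # Unnormalized DCT-II of the boosted log-mel amplitudes.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), n + 0.5) / n_mels)
    return dct @ boosted

frame = np.sin(2 * np.pi * 440 * np.arange(400) / 16000)
print(mfcc(frame).shape)  # (13,)
```

In practice a library implementation would be used; the point here is only where the power step sits in the pipeline, between the log-mel stage and the DCT.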
As a variant of the infrastructure presented in
It is to be understood that the proposed method and apparatus may be implemented in various forms of hardware, software, firmware, special purpose processors, or a combination thereof. Special purpose processors may include application specific integrated circuits (ASICs), reduced instruction set computers (RISCs) and/or field programmable gate arrays (FPGAs). Preferably, the proposed method and apparatus is implemented as a combination of hardware and software. Moreover, the software is preferably implemented as an application program tangibly embodied on a program storage device. The application program may be uploaded to, and executed by, a machine comprising any suitable architecture. Preferably, the machine is implemented on a computer platform having hardware such as one or more central processing units (CPU), a random access memory (RAM), and input/output (I/O) interface(s). The computer platform also includes an operating system and microinstruction code. The various processes and functions described herein may either be part of the microinstruction code or part of the application program (or a combination thereof), which is executed via the operating system. In addition, various other peripheral devices may be connected to the computer platform such as an additional data storage device and a printing device.
It should be understood that the elements shown in the figures may be implemented in various forms of hardware, software or combinations thereof. Preferably, these elements are implemented in a combination of hardware and software on one or more appropriately programmed general-purpose devices, which may include a processor, memory and input/output interfaces. Herein, the phrase “coupled” is defined to mean directly connected to or indirectly connected with through one or more intermediate components. Such intermediate components may include both hardware and software based components.
It is to be further understood that, because some of the constituent system components and method steps depicted in the accompanying figures are preferably implemented in software, the actual connections between the system components (or the process steps) may differ depending upon the manner in which the proposed method and apparatus is programmed. Given the teachings herein, one of ordinary skill in the related art will be able to contemplate these and similar implementations or configurations of the proposed method and apparatus.
Number | Date | Country | Kind
---|---|---|---
16305325.9 | Mar 2016 | EP | regional

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/EP2017/056923 | 3/23/2017 | WO | 00