The present invention describes techniques, systems, software and devices, which can be used in conjunction with, or as part of, systems which gather information using sensors that leverage disparate channels of information to improve sensor output by, for example, using data generated from one sensor and one channel of information to improve the quality of data output by another sensor for another, disparate channel of information, e.g., using audio information to improve an output of a motion sensor or vice versa.
Technologies associated with the communication of information have evolved rapidly over the last several decades. Television, cellular telephony, the Internet and optical communication techniques (to name just a few things) combine to inundate consumers with available information and entertainment options. Taking television as an example, the last three decades have seen the introduction of cable television service, satellite television service, pay-per-view movies and video-on-demand. Whereas television viewers of the 1960s could typically receive perhaps four or five over-the-air TV channels on their television sets, today's TV watchers have the opportunity to select from hundreds, thousands, and potentially millions of channels of shows and information. Video-on-demand technology, currently used primarily in hotels and the like, provides the potential for in-home entertainment selection from among thousands of movie titles.
Some attempts have also been made to modernize the screen interface between end users and media systems. However, these attempts typically suffer from, among other drawbacks, an inability to easily scale between large collections of media items and small collections of media items. For example, interfaces which rely on lists of items may work well for small collections of media items, but are tedious to browse for large collections of media items. Interfaces which rely on hierarchical navigation (e.g., tree structures) may be speedier to traverse than list interfaces for large collections of media items, but are not readily adaptable to small collections of media items. Additionally, users tend to lose interest in selection processes wherein the user has to move through three or more layers in a tree structure. For all of these cases, current remote units make this selection process even more tedious by forcing the user to repeatedly depress the up and down buttons to navigate the lists or hierarchies. When selection skipping controls, such as page up and page down, are available, the user usually has to look at the remote to find these special buttons or be trained to know that they even exist. Accordingly, organizing frameworks, techniques and systems which simplify the control and screen interface between users and media systems as well as accelerate the selection process, while at the same time permitting service providers to take advantage of the increases in available bandwidth to end user equipment by facilitating the supply of a large number of media items and new services to the user, have been proposed in U.S. patent application Ser. No. 10/768,432, filed on Jan. 30, 2004, entitled “A Control Framework with a Zoomable Graphical User Interface for Organizing, Selecting and Launching Media Items”, the disclosure of which is incorporated here by reference.
To navigate rich user interfaces like that described in the '432 patent application, new types of remote devices have been developed which are usable to interact with such frameworks, as well as other applications and systems. Various different types of remote devices can be used with such frameworks including, for example, trackballs, “mouse”-type pointing devices, light pens, etc. However, another category of remote devices which can be used with such frameworks (and other applications) is 3D pointing devices. The phrase “3D pointing” is used in this specification to refer to the ability of an input device to move in three (or more) dimensions in the air in front of, e.g., a display screen, and the corresponding ability of the user interface to translate those motions directly into user interface commands, e.g., movement of a cursor on the display screen. The transfer of data between the 3D pointing device and another device or system which consumes that data may be performed wirelessly or via a wire connecting the 3D pointing device to the other device or system. Thus “3D pointing” differs from, e.g., conventional computer mouse pointing techniques which use a surface, e.g., a desk surface or mousepad, as a proxy surface from which relative movement of the mouse is translated into cursor movement on the computer display screen. An example of a 3D pointing device can be found in U.S. Pat. No. 7,158,118 to Matthew G. Liberty (hereafter referred to as the '118 patent), the disclosure of which is incorporated here by reference. Note that although 3D pointing devices are used herein as one example of a device which senses motion, the present application is not limited thereto and is intended to encompass all such motion sensing devices, e.g., activity tracking devices which are typically worn on a user's wrist, and indeed devices which sense parameters other than motion as will be described below.
The '118 patent describes 3D pointing devices which include, for example, one or two rotational sensors and an accelerometer. The rotational sensor(s) are used, as described in more detail below, to detect an angular rate at which the 3D pointing device is being rotated by a user. However, the output of the rotational sensor(s) does not perfectly represent the angular rate at which the 3D pointing device is being rotated due to, for example, bias (also sometimes referred to as “offset”) in the sensor(s)' outputs. For example, when the 3D pointing device is motionless, the rotational sensor(s) will typically have a non-zero output due to their bias. If, for example, the 3D pointing device is used as an input to a user interface, e.g., to move a cursor, this will have the undesirable effect of the cursor drifting across the screen when the user intends for the cursor to remain stationary. Thus, in order to provide a 3D pointing device which accurately reflects the user's intended movement, estimating and removing bias from sensor output is highly desirable. Moreover, other devices, in addition to 3D pointing devices, may benefit from being able to estimate and compensate for the bias of inertial sensors. Making this process more challenging is the fact that the bias is different from sensor to sensor and, even for individual sensors, is time-varying, e.g., due to changes in temperature.
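By way of a purely illustrative sketch, and not the technique of the '118 or '850 patents, one simple approach to bias compensation is to update a running bias estimate whenever the device appears stationary and to subtract that estimate from subsequent readings. The thresholds, window length and smoothing factor below are assumptions chosen only for illustration.

```python
import numpy as np

def update_bias_estimate(gyro_window, accel_window, bias,
                         gyro_still_thresh=0.02, accel_still_thresh=0.05,
                         alpha=0.05):
    """Update a running gyroscope bias estimate while the device looks stationary.

    gyro_window, accel_window: recent samples, shape (N, 3), in rad/s and g.
    bias: current bias estimate, shape (3,).
    Thresholds and smoothing factor are illustrative assumptions, not tuned values.
    """
    bias = np.asarray(bias, dtype=float)
    gyro_quiet = np.std(gyro_window, axis=0).max() < gyro_still_thresh
    accel_quiet = np.std(accel_window, axis=0).max() < accel_still_thresh
    if gyro_quiet and accel_quiet:
        # When motionless, the mean angular-rate output approximates the bias.
        bias = (1.0 - alpha) * bias + alpha * np.mean(gyro_window, axis=0)
    return bias

def compensate(gyro_sample, bias):
    """Subtract the estimated offset from a raw angular-rate sample."""
    return np.asarray(gyro_sample, dtype=float) - bias
```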
Bias error associated with rotational sensors is merely one example of the more general paradigm that all sensors are imperfect and, therefore, output data which imperfectly reflects the portion of an environment which they are intended to measure. Thus the aforedescribed 3D pointing device imperfectly measures motion or movement of a user's hand which is holding the 3D pointing device. In the present specification “motion” can be considered to be one data channel that can be measured by a sensor or a number of different sensors.
Each individual sensor in a system typically senses one aspect of reality. Such systems are generally concerned with sensing a local reality, i.e., one within some area surrounding the sensing system's position. In this specification, this area will be referred to as a “scene”. The scene itself is typically quite complex and multi-dimensional, but each sensor in the system only sees one dimension. For example, televisions being manufactured today may come with a number of different sensors including, for example, microphones and cameras which sense local sound and images in the area near the television.
Sensor errors are sometimes direct and sometimes indirect. A direct error is, for example, one where sensor bias, scale, resolution or noise corrupts the reading. An indirect error is, for example, one where the measurement is affected by other aspects of the scene. An example of an indirect error would be one where temperature affects the reading or one of the direct error drivers. For example, in the context of the motion sensors described above, the temperature of a rotational sensor might affect the bias (or offset) of the sensor and thereby affect the output value of the sensor.
Compensation techniques can, for example, be derived to directly address the effects of both direct and indirect sensor errors. For example, in the context of direct sensor bias errors, as well as indirect sensor bias errors caused, e.g., by temperature, attempts have been made to directly compensate for such errors by adjusting the sensor's output as a function of temperature and/or other factors. See, e.g., U.S. Pat. No. 8,683,850, entitled “Real-time dynamic tracking of bias”, the disclosure of which is incorporated here by reference and hereafter referred to as the '850 patent.
While such techniques can be effective, there is still room for improvement in the area of sensor output compensation, generally, and not just in the area of bias compensation for motion sensors which is used purely as an illustrative example above.
Motion and audio data associated with an area or a user are sensed and processed jointly to achieve improved results as compared to utilizing only the motion or the audio data by themselves. Synergies between motion and audio are identified and exploited in devices ranging from cell phones, to wearables such as smart watches and activity trackers, to home entertainment and alarm systems, as well as other Internet of Things (IoT) devices and systems.
According to one exemplary embodiment, a device includes at least one sensor for sensing motion of the device and generating at least one motion output, at least one sensor for sensing sounds in a vicinity of the device and generating at least one audio output, and a processor adapted to determine whether a particular condition associated with the device or a user of the device is met using both the at least one motion output and the at least one audio output.
According to another embodiment, a method includes sensing motion of a user and generating at least one motion output, sensing sounds in a vicinity of the user and generating at least one audio output; and determining whether a particular condition associated with the user is met using both the at least one motion output and the at least one audio output.
According to yet another embodiment, a communication device includes at least one microphone, at least one motion sensor, at least one wireless transceiver; and at least one processor; wherein the at least one processor and the at least one wireless transceiver are configured to transmit voice signals received from the at least one microphone over an air interface; wherein the at least one processor is further configured to receive audio data from the at least one microphone and motion data from the at least one motion sensor and uses both the audio data and the motion data to adapt processing of the voice signals for transmission.
The accompanying drawings illustrate embodiments, wherein:
The following detailed description of the invention refers to the accompanying drawings. The same reference numbers in different drawings identify the same or similar elements. Also, the following detailed description does not limit the invention. Instead, the scope of the invention is defined by the appended claims.
As mentioned above, embodiments presented herein include techniques, systems, software and devices, which can be used in conjunction with, or as part of, systems which gather information using sensors that leverage disparate channels of information to improve sensor output by, for example using data generated from one sensor and one channel of information to improve the quality of data output by another sensor for another, disparate channel of information, e.g., using audio information to improve an output of a motion sensor or vice versa. The following portions of this specification are organized as follows. First an overview of scene analysis, including both audio and video scene analysis, is described. Then specific examples of a motion sensing system and an audio sensing system are provided. Next an example of how to jointly use sensed audio data and motion data to improve a decision is shown in the context of step detection. Finally, various alternatives and generalizations are shown since the embodiments are not limited to audio/motion scene analysis.
Scene analysis is a term used primarily in audio and video processing to refer to identifying the elements of a scene. As mentioned above, a scene can be considered to be an area or volume which surrounds a sensing system's position, and which can be delimited by, for example, the sensing range of one or more of the sensors used by the sensing system, e.g., the sensor having the smallest sensing range. In audio, for example, Albert Bregman is credited with originating the concept of “auditory scene analysis”. Auditory scene analysis involves separating out individual sound sources or components within a scene and then integrating those components that belong with each other, e.g., which originate from a same source, while segregating those components that originate from different sources. See, e.g., Bregman, Auditory Scene Analysis, MIT Press, 1990, and Elyse Sussman, “Integration and Segregation in Auditory Scene Analysis”, Journal of the Acoustical Society of America 117 (3), Pt. 1, March 2005, pp. 1285-1298, the disclosures of which are incorporated here by reference and referred to jointly below as “the auditory scene analysis articles”.
To better understand auditory scene analysis, consider the following example. Imagine sitting outside at a café in a plaza. There are birds making different sounds in the plaza, cars going by, people chatting at tables nearby, the sound of a river flowing nearby, and so on. Those are all sounds that originate from different sources and are perceived together, e.g., by a person's ears or a microphone disposed in the plaza. The process of auditory scene analysis separates those sources and their corresponding sounds out from the composite scene for individual analysis and enables them to be operated upon separately. For example, a sensing system which performs auditory scene analysis on this scene might choose to mute all of the sounds other than the bird noises, or to try to focus on the people's conversations by amplifying the auditory data associated with only those sources. Filtering, amplification, noise reduction, etc., are just a few of the examples of operations which can be performed on audio data after auditory scene analysis has been performed.
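As a hedged illustration of one very simple post-analysis operation, and not auditory scene analysis itself (which separates sources rather than frequency bands), the sketch below mutes spectral content outside a band of interest. The band edges and STFT parameters are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.signal import stft, istft

def keep_band_only(audio, fs, keep_low_hz=300.0, keep_high_hz=3400.0):
    """Crudely 'focus' on one component by muting everything outside a band.

    audio: 1-D array of samples; fs: sample rate in Hz.  The default band
    roughly brackets speech and is an illustrative assumption only.
    """
    f, _, Z = stft(audio, fs=fs, nperseg=1024)
    mask = (f >= keep_low_hz) & (f <= keep_high_hz)
    Z[~mask, :] = 0.0                      # mute bins outside the band
    _, cleaned = istft(Z, fs=fs, nperseg=1024)
    return cleaned
```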
Visual scene analysis is similar to auditory scene analysis but applies to perceived images rather than audio. Imagine the same café scene as described above but this time consider how the video image of that scene is perceived by the eyes of an observer or a camera disposed in the plaza. All (or at least most) of those objects that made sounds plus many that didn't (e.g., a chair, a table, a building) are merged into one composite image that is recorded by the image sensor(s). The process of visual scene analysis separates those individual objects out for separate analysis and operation.
Scene analysis, whether it be auditory or visual, can be quite useful. In the case of auditory scene analysis, this general technique can, in principle, allow for synthesizing an audio signal with only the component of interest, say the voice of the person across the table from you, while removing all the interfering noises of all the other people around, the construction noise down the street and so on. The result is a cleaner audio signal that could then better be transferred over a phone connection or perhaps used for speech recognition and the like. Performing video scene analysis offers its own analogous benefits in the visual domain.
While a conventional auditory scene analysis only concerns itself with audio input from one type of data channel (i.e., an audio channel type, albeit possibly from several different physical channels, i.e., different microphones) and a conventional visual scene analysis only concerns itself with image input from one type of data channel (i.e., an image channel type, albeit possibly from several different physical cameras), embodiments described herein instead perform scene analysis (and other functions) by operating on multiple dimensions/different types of data channels to provide what is referred to herein as Composite Scene Analysis and Fusion (CSAF). CSAF synthesizes a “Composite Scene” containing all the information about the scene from all the available sensors (including data and metadata from the cloud as appropriate). For the café example discussed earlier, this composite scene would ideally include information on the location and audio characteristics from all of the audio sources detectable by the microphones along with information on the acoustic properties like absorption and reflectivity from all the various objects and surfaces in the 3D space around the applicable sensors. The composite scene would also ideally include all the position, orientation and motion information on all the relevant physical objects in the 3D space around the applicable sensors such as the device itself, the person holding the device (if that's the case) and so on. If visual sensors are available, the composite scene would ideally identify all visual elements in the scene such as tables, chairs, people, cars, the river and so on. If there are other information inputs available, the composite scene would include them too. The “Analysis and Fusion” part of CSAF refers to the inferences and linkages formed between the layers of the Composite Scene as well as the work to segregate the original input information so that portions could be attributed to each element separately.
For example, a CSAF system according to an embodiment could possess both auditory and visual sensors to operate on both audio and image data channels in a manner which can improve the overall functionality of each. More specifically, such embodiments can provide (but are not limited to) two different sorts of benefits by virtue of operating on multiple types of data channels. The first benefit is that embodiments can improve the readings from an individual sensor or even enhance a particular type of scene analysis, e.g., for a combined audio/image sensing system improving the output of the audio sensor, or the auditory scene analysis, using the output of the visual sensor. The second benefit is that embodiments can better infer higher level knowledge of what is occurring in the scene, e.g., by combining and merging the information gleaned from the individual observations. As an example of the first benefit, let's consider a scenario where a female user with high heels is walking across a hard wood floor while talking on the phone. Normally, the reverberant heel strikes would interfere with the speech and be tough for an audio-only system to deal with since they are intermittent and from multiple locations (once echo is included). However, once motion scene analysis identifies the heel strikes in time, the audio system can better align and separate out the heel strike echoes from the acoustics and isolate on the desired speech. This is one example of motion processing improving the acoustic scene analysis.
As an example of the second benefit, consider a CSAF system which is connected to a higher level system that would like to understand whether the user is working at the office, just walking or playing tennis. Acoustics alone might help since the background sounds of the office differ from those of a tennis court, but they aren't definitive. Motion analysis helps since tennis movements involve both arms and legs in a different way than just walking. But combining determinations from the Acoustic Scene Analysis and the Motion Scene Analysis together yields a better quality determination overall than either would alone.
The foregoing provides a high level description of CSAF embodiments using audio, video and motion as general non-specific examples of synergistically performing scene analysis using disparate data channels, primarily since audio and video domains have themselves been investigated in depth for many decades, and are frequently used together (albeit not in scene analysis) as input and output data. However the embodiments described herein contemplate CSAF systems which operate on even more disparate types of data channels, some of which have only much more recently come into use in commercial products. For example, and as mentioned earlier, one such data channel is motion and one such CSAF embodiment operates on one or more motion data channels as well as one or more audio data channels.
As a precursor to discussing motion scene analysis and then discussing CSAF embodiments involving both motion and audio channels, a brief example of a motion sensing device/system is provided for context. Remote devices which operate as 3D pointers are examples of motion sensing devices which enable the translation of movement, e.g., gestures, into commands to a user interface. An exemplary 3D pointing device 100 is depicted in
Numerous different types of sensors can be employed within device 100 to sense its motion, e.g., gyroscopes, angular rotation sensors, accelerometers, magnetometers, etc. It will be appreciated by those skilled in the art that one or more of each or some of these sensors can be employed within device 100. According to one purely illustrative example, two rotational sensors 220 and 222 and one accelerometer 224 can be employed as sensors in 3D pointing device 100 as shown in
A handheld motion sensing device is not limited to the industrial design illustrated in
Such motion sensing devices 100, 300 have numerous applications including, for example, usage in the so-called “10 foot” interface between a sofa and a television in the typical living room as shown in
It shall be appreciated that the examples provided above with respect to
The data captured by the sensors which are provided in such systems can then be processed to perform, for example, audio scene analysis and/or motion scene analysis. A high level example of an audio processing function/system 500 is provided as
The other processing block that the clean audio signal is forwarded to is the scene analysis block 510. Here, the clean audio signal is resolved into constituent parts as described previously or in the manner described in the above-incorporated-by-reference auditory scene analysis articles. The processing performed by the scene analysis block 510 can generate data which then drives the various functions in the box 512, which are merely illustrative of various functions which can be used to operate on the audio components depending upon the particular application of interest. In this example, such functions 512 include a block 514 to determine the mood of the speaker (e.g., happy, sad), a block 516 to identify the speaker based on the characteristics of the speech provided by the scene analysis block 510, and a block 518 which provides a speech recognition engine to detect the words, and possibly phonemes, in the detected speech.
A similar processing function system 600 can be provided to process sensed motion data, and to provide motion scene analysis according to an embodiment, an example of which is provided in
The second block to which the calibrated motion signals are sent is the scene analysis block 610. In block 610 the individual elements of the motion are determined. Those motion elements then support a number of different potential applications as shown in block 612. Note that block 612 merely provides an exemplary rather than exhaustive list of applications which can be driven based on motion elements received from the motion scene analysis block 610. One is an application processing block which determines the mood of the device wearer (e.g., happy, sad) based on one or more motion elements. Another block determines the activity the device wearer is engaged in (e.g., walking, swimming, weight lifting). A third block identifies either the device wearer or the environment the device is in. A fourth block recognizes and/or measures the motion elements (e.g., steps, strokes, swings). The final example block measures some of the biomarkers and/or biometrics of a person (e.g., heart rate, weight).
The motion scene analysis block 610, for example, decomposes the calibrated motion signal which it receives into motion measurements for all the rigid body elements in the scene. Each of those individual rigid body element motion measurements fits, for example, a simplified ideal motion model, which gives the total acceleration of a rigid body relative to a selected origin and center of rotation, as described below and illustrated in
A_n = A_L + α × H_n + ω × (ω × H_n) + E_n

where:

A_n = Acceleration at the point n
A_L = Linear acceleration
α = Angular acceleration
ω = Angular velocity
H_n = Vector from the center of rotation to the point n
E_n = Error
Note that the center of rotation and origin may be selected arbitrarily.
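A minimal sketch of the model above follows, assuming all vectors are expressed in a common frame and omitting the error term E_n:

```python
import numpy as np

def point_acceleration(a_linear, alpha, omega, h_n):
    """Total acceleration A_n of a point on a rigid body per the model above.

    a_linear: linear acceleration A_L of the selected origin, shape (3,).
    alpha:    angular acceleration, shape (3,).
    omega:    angular velocity, shape (3,).
    h_n:      vector H_n from the center of rotation to the point, shape (3,).
    The error term E_n is omitted; all vectors are assumed to share one frame.
    """
    return (np.asarray(a_linear, dtype=float)
            + np.cross(alpha, h_n)
            + np.cross(omega, np.cross(omega, h_n)))
```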
When breaking down a set of motion readings from, say, a mobile phone held in someone's hand into a Motion Scene, one ideally includes elements such as those in the following list of Motion Scene Analysis Basics. A particular embodiment may choose to only do some of these or might choose to do more of them depending on the constraints and goals of the embodiment in question. For the purpose of the list below and this example, each body part (such as a forearm or thigh) is considered a separate rigid body, as is the phone itself. Again, the particular way a Motion Scene is broken down depends on the goals and constraints of the embodiment in question (analogous to the Acoustic Scene and Visual Scene). An illustrative sketch of one of these elements, tremor, follows the list.
Tremor
Inverse Kinematics
Contextually Irrelevant Motion Separation
Motion from Other Source
Other Sources Including Motion Mimicry
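As a hedged sketch of the first element in the list above (tremor), intentional motion can be crudely separated from hand tremor by splitting the calibrated angular-rate signal around the physiological tremor band. The band edges and filter order are illustrative assumptions and are not values taken from the '118 patent.

```python
from scipy.signal import butter, sosfiltfilt

def split_tremor(angular_rate, fs, tremor_band=(4.0, 12.0)):
    """Split one calibrated angular-rate channel into intentional motion and tremor.

    angular_rate: 1-D array (rad/s); fs: sample rate (Hz).
    Returns (intentional, tremor).  The 4-12 Hz band is an assumed tremor range.
    """
    sos = butter(4, tremor_band, btype="bandpass", fs=fs, output="sos")
    tremor = sosfiltfilt(sos, angular_rate)
    intentional = angular_rate - tremor    # everything outside the tremor band
    return intentional, tremor
```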
The motion scene analysis system described above with respect to
The value of motion scene analysis (MSA) stretches across many categories since a more complete and correct MSA yields value across the application set, for example:
Pointing—only relevant motion is considered
Natural motion—only body motion is considered
Pedestrian navigation—if MSA was perfect, there would be no need for continual GPS reads
Contextual Sensor Fusion—better MSA is a major input to contextual decisions
Fingerprint mapping—better MSA allows for better mapping as well.
While some embodiments and applications may benefit from using motion scene analysis by itself, other embodiments may benefit by further augmenting MSA using one or more other, disparate data channels. A specific example will now be provided where audio scene analysis and motion scene analysis are both performed to augment a determination regarding whether a person in the scene performed a “step”. For example, activity trackers today are commonly designed to count the number of steps that a person takes, e.g., on a daily basis, as part of a regimen for recording a user's physical activity level. Thus such devices will benefit from more accurate decision making regarding whether or not a particular motion by a user's legs should be counted as a step, or not.
First, consider the audio processing of a step or heel strike. Via the acoustic scene analysis described above, the sound component of a heel striking the floor can be isolated. By analyzing and comparing the sound patterns across, for example, spatially separated microphones (if available), the approximate direction of the heel strike relative to the sensing device(s) can optionally be determined. Furthermore, the precise timing and even sound intensity of the step can be determined. Sound analysis can thus determine when the floor type changes from, say, tile to carpet and can also signal when the striking foot pivots as well as hits the floor (indicating a turn). All of this information can be used separately or jointly to determine within some probability whether, at a given time t, a person in the scene performed a “step” based solely on audio data collected by the microphone(s).
Next consider the motion processing of a step or heel strike. Via the motion scene analysis described above, some parameters related to a step can be determined which are analogous to those determined using audio scene analysis. First, by using gravity, the orientation of the device (e.g., handheld or wearable) can be determined (apart from a yaw rotation). Yaw rotation can, for example, be determined via tracking magnetic fields (e.g., magnetic North direction) and/or by determining the traveling direction of the person and mapping the traveling direction back to the phone orientation. The traveling direction of a person can be analyzed, for example, by isolating and tracking the horizontal acceleration and deceleration perpendicular to gravity that the user induces with each step push off and heel strike.
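A hedged sketch of how some of the quantities above might be derived from raw accelerometer samples follows: estimate gravity with a simple moving average, split the residual acceleration into components along and perpendicular to gravity, and treat prominent vertical spikes as candidate heel strikes. The window length and thresholds are illustrative assumptions rather than tuned values.

```python
import numpy as np
from scipy.signal import find_peaks

def detect_heel_strikes(accel, fs, min_peak_g=0.3, min_step_s=0.3):
    """Candidate heel-strike times from a 3-axis accelerometer stream.

    accel: shape (N, 3) in units of g; fs: sample rate in Hz.
    Returns (strike_times_s, horizontal), where horizontal is the residual
    acceleration perpendicular to the estimated gravity direction.
    """
    accel = np.asarray(accel, dtype=float)

    # A slow moving average approximates gravity in the sensor frame
    # (a roughly one-second window is an illustrative choice).
    win = max(1, int(fs))
    kernel = np.ones(win) / win
    gravity = np.column_stack(
        [np.convolve(accel[:, i], kernel, mode="same") for i in range(3)])
    g_dir = gravity / np.linalg.norm(gravity, axis=1, keepdims=True)

    dynamic = accel - gravity                       # user-induced acceleration
    vertical = np.sum(dynamic * g_dir, axis=1)      # component along gravity
    horizontal = dynamic - vertical[:, None] * g_dir

    # Heel strikes appear as sharp vertical spikes; the spacing constraint
    # keeps at most one detection per plausible step interval.
    peaks, _ = find_peaks(np.abs(vertical), height=min_peak_g,
                          distance=max(1, int(min_step_s * fs)))
    return peaks / fs, horizontal
```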
Each scene analysis can help the other. First, the precise vibration detection of a heel strike available from the motion sensor(s) can help with full isolation of the acoustic signature of that heel strike. This can help improve accuracy of a heel strike detection by the audio scene analysis when the acoustic signal gets weaker, for example, when the user walks on carpet instead of tile. Then, too, the acoustic signal can help improve the motion scene analysis as well. For example, when the user steps more lightly on the floor, the sound may be detected more easily than the vibration and acceleration forces generated by the heel strike, and can be used to improve the determination made by the motion scene analysis.
The foregoing provides some high level examples of how disparate channels of information can be used jointly to provide better scene analysis and/or better decision making associated with information derived from a scene according to various embodiments. A more detailed, and yet purely illustrative, example will now be provided. For example, one way to express the joint usage of disparate channels of information in a CSAF embodiment mathematically for the step/heel strike example given above is with the following equations: Let heel-strike_t^a represent the probability that a heel strike occurred at time t based on audio analysis. Let σ_a^2 represent the variance of that acoustical determination of heel strikes. Let heel-strike_t^m represent the probability that a heel strike occurred at time t based on motion analysis. Let σ_m^2 represent the variance of that motion determination of heel strikes. An example of a CSAF embodiment (i.e., sensor fusion result) can then be expressed as shown below.
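The combining equation itself is not reproduced in this text; a standard inverse-variance weighted blend consistent with the definitions above, offered here as a reconstruction rather than the original equation, would be:

```latex
\text{heel-strike}_t \;=\;
\frac{\sigma_m^2\,\text{heel-strike}_t^{a} \;+\; \sigma_a^2\,\text{heel-strike}_t^{m}}
     {\sigma_a^2 + \sigma_m^2}
```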
Those skilled in the art will appreciate that blending the audio information and the motion information to generate a step or heel strike conclusion can be performed in numerous other ways (e.g., Kalman filtering, Maximum Likelihood, etc.) and that the foregoing is merely one example.
The example of
Similarly, embodiments can be used to adapt for a phone's location in a car or other vehicle (e.g., cupholder, seat). Different locations of the phone in a car can influence the optimal acoustical processing parameters for that phone (e.g., frequency shaping, reverberation control, volume). Since the vibration patterns and orientation angles of a phone in the cupholder or ashtray differ from those of a phone on a seat, the MSA can provide this information to the audio processing system to enable the audio processing system to adapt its audio processing based on the phone's location within the car.
Context decisions can also be improved with MSA and ASA together according to various other embodiments, e.g., using vehicle sounds and movement together, rather than just one or the other, to determine if an authorized user/operator is present in a vehicle. In-vehicle identification can be difficult to infer with just motion sensors. Typically some variation of motion pattern recognition as people get in and out of cars, vibration detection, magnetic field patterns and horizontal plane acceleration are used to detect whether the phone is in a vehicle or not. However, audio processing can aid in this detection quite a bit, at least for the typical car. The sound of the car starting up, the car system's message sounds, road and traffic noise and the like can all be detected and augment the overall classification decision.
CSAF provides for other improvements to current audio and motion features including, but not limited to, improved audio processing through better detection of phone position, improved motion detection through inclusion of sound data, and background context used to adjust vibrate/notification levels. Additionally, CSAF provides for new health and fitness functionality, e.g., improved sleep monitoring and apnea “diagnosis” through combination of actigraphy, research work and breathing pattern analysis. In this context, it will be appreciated that actigraphy refers to inferring sleep state and/or level by analyzing motion signals from an accelerometer. Actigraphy can achieve moderate accuracy but is very far from matching the gold standard in the industry—i.e., PSG or Polysomnography. The problem with PSG is that it is cumbersome with lots of wire probes attached to the body and requires analysis by a trained clinician to interpret the output. An easier to use home monitoring system which gets closer to the PSG answer than actigraphy is desired.
Audio processing is one way to improve accuracy of actigraphy. Via a microphone, the unit can hear the breathing patterns of the patient. By detecting the volume and regularity of those patterns, including snoring, if present, additional information regarding the patient's sleep state is obtained. One illustrative embodiment follows and is based on the fact that conventional snoring is unlikely in REM sleep while sleep apnea snoring is most likely in REM sleep. Sleep apnea snoring involves intermittent “noisy recovery breaths”. REM sleep is also the state where the skeletal body muscles are the most relaxed. Therefore, we can have two detectors, REM_t^b and REM_t^m, which represent the probability that the user is in REM sleep based on breathing and motion determinations, respectively. The breath detector could, for example, be implemented by comparing the deviations in timing and volume between the current breath cycle and the immediate average. The motion detector would be one based on actigraphy and would look for a very small amount of motion. Let σ_b^2 represent the variance of that acoustic breathing-based determination of whether or not the user is in REM sleep. Let σ_m^2 represent the variance of that motion-based determination of whether or not the user is in REM sleep. Then one CSAF embodiment of a sensor fusion result could be expressed as follows:
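As with the heel-strike example, the fusion equation itself is not reproduced in this text; an inverse-variance weighted blend consistent with the definitions above, again offered as a reconstruction rather than the original, would be:

```latex
\text{REM}_t \;=\;
\frac{\sigma_m^2\,\text{REM}_t^{b} \;+\; \sigma_b^2\,\text{REM}_t^{m}}
     {\sigma_b^2 + \sigma_m^2}
```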
Those skilled in the art will appreciate that this blending could also be performed in numerous other ways (e.g., Kalman filtering, Maximum Likelihood, etc.) and that the foregoing is merely one example. That in turn could be further augmented with information like heart rate and heart rate variability and other biometrics. Similar combinations of audio information with existing biometric monitoring could include an exercise monitor enhanced with breath pattern analysis, health monitoring via breathing, motion, temperature and even internal body sounds (e.g., a contact microphone disposed in a wearable for heartbeat), stride and breathing analysis, sound-based detection of under/above water state (useful for swimming, for example) and fall detection for aging in place applications.
Regarding the latter topic of fall detection, motion processing can be used to detect falls by detecting the motion pattern anomalies when the user falls as compared to walking or sitting down normally. However, there are a number of user behaviors which make it difficult to properly detect falls. For example, a slow fall is difficult to distinguish from a normal sitting down movement. Also, lying down purposefully on a bed can appear similar to falling prone on the floor at a moderate speed.
Audio processing can aid in distinguishing those cases. The sound of someone hitting the floor is different than the sound of someone hitting the bed. The “oomph” or involuntary groan of a person who falls can be recognized and lined up with the motion that preceded it to help distinguish a fall from something normal. The sound of a person after a potential fall event is different as well. In the case of lying down, one might hear the sound of a TV or breathing. In the case of falling down, one might hear an occasional moan or a strained breathing sound.
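A hedged sketch of how such audio cues might be combined with a motion-based fall score follows; the function name, weights and threshold are illustrative assumptions rather than a specific product algorithm.

```python
def fall_probability(motion_fall_score, impact_sound_score,
                     distress_sound_score, w_motion=0.5, w_impact=0.3,
                     w_distress=0.2, alarm_threshold=0.7):
    """Blend motion and audio evidence into a single fall decision.

    Each input score is assumed to be a probability-like value in [0, 1]
    produced by its own detector (motion anomaly, floor-impact sound,
    post-event moan/strained breathing).  Weights are illustrative only.
    Returns (combined_score, alarm_flag).
    """
    combined = (w_motion * motion_fall_score
                + w_impact * impact_sound_score
                + w_distress * distress_sound_score)
    return combined, combined >= alarm_threshold
```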
Other new functionality is also possible using other CSAF embodiments including, for example: Body Area GPS for the multi-device case with audio ToA (Time of Arrival); Deep Belief Networks, with sound and motion, to improve a Latency Adjustment feature; Biometric Fingerprint assessment for user authentication and Context State decisions; and extended context detection through band-based analysis.
CSAF thus involves systems, devices, software and methods, among other things. One example of a method embodiment is illustrated by the flowchart of
Systems and methods for processing data according to exemplary embodiments of the present invention can be performed by one or more processors executing sequences of instructions contained in a memory device. Such instructions may be read into the memory device from other computer-readable media such as secondary data storage device(s). Execution of the sequences of instructions contained in the memory device causes the processor to operate, for example, as described above. In alternative embodiments, hard-wired circuitry may be used in place of or in combination with software instructions to implement the present invention. Such software may run on a processor which is housed within the device, e.g., a 3D pointing device, cell phone or other device, which contains the sensors, or the software may run on a processor or computer housed within another device, e.g., a system controller, a game console, a personal computer, etc., which is in communication with the device containing the sensors. In such a case, data may be transferred via wireline or wirelessly between the device containing the sensors and the device containing the processor which runs the software which performs the CSAF methodology as described above. According to other exemplary embodiments, some of the processing described above with respect to bias estimation may be performed in the device containing the sensors, while the remainder of the processing is performed in a second device after receipt of the partially processed data from the device containing the sensors.
Although some of the foregoing exemplary embodiments relate to sensing packages including one or more rotational sensors and an accelerometer, CSAF techniques according to these exemplary embodiments are not limited to only these types of sensors. Instead CSAF techniques as described herein can be applied to devices which include, for example, only accelerometer(s), optical and inertial sensors (e.g., a rotational sensor, a gyroscope or an accelerometer), a magnetometer and an inertial sensor (e.g., a rotational sensor, a gyroscope or an accelerometer), a magnetometer and an optical sensor, or other sensor combinations. Additionally, although exemplary embodiments described herein relate to CSAF techniques in the context of 3D pointing devices, cell phones, activity trackers and related applications, such techniques are not so limited and may be employed in methods and devices associated with other applications, e.g., medical applications, gaming, cameras, military applications, etc.
The above-described exemplary embodiments are intended to be illustrative in all respects, rather than restrictive, of the present invention. Thus the present invention is capable of many variations in detailed implementation that can be derived from the description contained herein by a person skilled in the art. For example, although the foregoing exemplary embodiments describe, among other things, the use of inertial sensors to detect movement of a device, other types of sensors (e.g., ultrasound, magnetic or optical) can be used instead of, or in addition to, inertial sensors in conjunction with the afore-described signal processing. All such variations and modifications are considered to be within the scope and spirit of the present invention as defined by the following claims. No element, act, or instruction used in the description of the present application should be construed as critical or essential to the invention unless explicitly described as such. Also, as used herein, the article “a” is intended to include one or more items.
This application is related to, and claims priority from, U.S. Provisional Patent Application Ser. No. 62/040,579, filed on Aug. 22, 2014, entitled “Audio and Motion Synergies”, the disclosure of which is incorporated here by reference.