This invention relates generally to apparatus such as electronic or mobile devices and, more specifically, relates to user interaction with the apparatus.
This section is intended to provide a background or context to the invention that is recited in the claims. The description herein may include concepts that could be pursued, but are not necessarily ones that have been previously conceived, implemented or described. Therefore, unless otherwise indicated herein, what is described in this section is not prior art to the description and claims in this application and is not admitted to be prior art by inclusion in this section.
Mobile devices are becoming more prevalent, smaller, and more varied. For instance, smart phones and tablets have become ubiquitous. Recently, smart glasses, smart watches, and the like have become popular and continue to grow in popularity. Each of these varied mobile devices must have some interface with which a user can communicate. That is, a user needs the mobile device (or an application on the device) to perform some function, and the interface for the device is the way the user commands the device to perform that function.
Many user interfaces are dominated by touch, such as through gestures on a touch screen or via user interaction with physical buttons or other elements. However, these touch-based interface elements are not always easy to access. As an example, if a user is walking or jogging and using a smart watch or a smart phone, the user has to access the watch or phone, find the touch-based interface element, and interact with the element in a certain way. This sequence of events could interrupt the walk or jog, or could at the least cause some amount of aggravation on the part of the user.
It would be beneficial to improve user interaction with mobile devices such as smart phones, tablets, smart glasses, or smart watches.
The foregoing and other aspects of embodiments of this invention are made more evident in the following Detailed Description of Exemplary Embodiments, when read in conjunction with the attached Drawing Figures, wherein:
In an exemplary embodiment, an apparatus includes one or more location sensors configured to output one or more signals and one or more microphones configured to form corresponding microphone signals. The apparatus also includes one or more processors configured to cause the apparatus to perform at least the following: determination, using the one or more signals from the one or more location sensors, of a direction of at least one object relative to the apparatus; recognition, by the apparatus using a signal from a microphone in the apparatus, of one or more attributes of an acoustic signal made by the at least one object; and causation of an operation to be performed by the apparatus in response to the direction and the recognized one or more attributes being determined to correspond to the operation.
In another exemplary embodiment, an apparatus includes means for sensing a location and for outputting corresponding one or more signals and means for sensing audio configured to form corresponding audio signals. The apparatus also includes the following: means for determining, using the one or more signals from the means for sensing a location, a direction of at least one object relative to the apparatus; means for recognizing, by the apparatus using a signal from the means for sensing audio in the apparatus, one or more attributes of an acoustic signal made by the at least one object; and means for causing an operation to be performed by the apparatus in response to the direction and the recognized one or more attributes being determined to correspond to the operation.
An exemplary computer program product includes a computer-readable storage medium bearing computer program code embodied therein for use with a computer. The computer program code includes: code for determining, using one or more signals from one or more location sensors of an apparatus, a direction of at least one object relative to the apparatus; code for recognizing, by the apparatus using a signal from a microphone in the apparatus, one or more attributes of an acoustic signal made by the at least one object; and code for causing an operation to be performed by the apparatus in response to the direction and the recognized one or more attributes being determined to correspond to the operation.
In a further exemplary embodiment, a method comprises the following: determining, using one or more signals from one or more location sensors of an apparatus, a direction of at least one object relative to the apparatus; recognizing, by the apparatus using a signal from a microphone in the apparatus, one or more attributes of an acoustic signal made by the at least one object; and causing an operation to be performed by the apparatus in response to the direction and the recognized one or more attributes being determined to correspond to the operation.
As stated above, it would be beneficial to improve user interaction with mobile devices such as smart phones, tablets, smart glasses, or smart watches. The number of different ways people interact with their devices grows daily. Some of the reasons are, for example, the following: the growing size of the devices makes normal one-handed operation difficult; the devices are taken along for an increasing amount of activities like jogging or snowboarding; and the emergence of wearable computing devices such as smart glasses and smart watches that are worn and not carried requires mobile device makers to invent new ways to interact with mobile devices.
Such devices include different sensors so that a user can interact with the mobile device. For instance, touch sensors are readily available in most mobile devices (where “mobile” indicates a user can carry and easily move the device). A touch sensor relies on the touch of a user's fingertip to determine the location of that fingertip. Hover sensors are relatively new sensors that allow the locations of a user's fingers (or other objects) to be determined without the user touching the hover sensor or the device. Both touch and hover sensors may be integrated with or placed near a display such as a touch screen. Touch and hover sensors may be considered location sensors, as they can determine the locations of, e.g., a hand or fingers of a user and other objects.
Hover sensors are good at detecting the direction in which a user's hand is located. However, hover sensors are not good at detecting the exact moment when a user makes a sound event. A sound signal from the sound event, detected by a microphone, can easily be used for detecting the exact moment of a user interaction.
There are currently several ways to interact with a device using directional sound commands. Typically, however, extra devices or accessories are required for detecting the direction of audio interactions. The microphones built into a mobile device are seldom used.
Conventional methods use an accelerometer to determine whether a tap has occurred and then use the microphone to determine a direction of the tap. The drawback is the reliance on the accelerometer: accelerometers can be confused by the noisy signals that are typical, for instance, while running.
The inventors have realized that hover sensors are good at detecting the direction in which the user's hand is located. However, hover sensors are not good at detecting the exact moment when a user makes a sound event. Microphones can easily be used for detecting the exact moment of a user interaction, which is helpful in certain situations. In particular, if the intent is to control a game, it is beneficial to be able to time the control precisely (e.g., so as not to drive the game car off the road). Another example is fast forwarding a song or a video: the moment at which to stop fast forwarding and return to normal speed is time critical for a good user experience. A hover sensor can be beneficial in the case of user interactions where the device is not touched.
Consequently, a first exemplary solution herein, which uses a single microphone and is responsive to acoustic signals created by touching of the device by the user, avoids the above problems by detecting the direction of a sound using a hover sensor that detects, e.g., on which side of the device the user's hand is. Also, in exemplary embodiments, the sound events are classified and recognized, which provides the user interaction with extra flexibility and robustness against environmental sounds. An alternative or additional solution to using hover sensors is one that uses one or several touch sensors in place of the hover sensors. Touch sensors, like hover sensors, work better than accelerometers in situations like running.
Another potential drawback in conventional systems is that certain systems require touching the device (e.g., in order to affect the accelerometers), and in addition the accelerometers can be confused by noisy signals that are typical, for example, while running. A second exemplary solution herein, which uses a single microphone and is responsive to acoustic signals created without touching of the device by the user, avoids both of these problems by detecting the moment of a sound from a microphone and the direction of the sound using a hover sensor, which detects, e.g., on which side of the device the user's hand is.
Additional details of exemplary solutions are presented after a description of an apparatus suitable for performing the exemplary embodiments is presented. Turning to
In brief, a combination of a single microphone 145 and location sensor(s) 171 such as hover sensor 175 and/or touch sensor 170 and an acoustic signal classification system (e.g., via the acoustic signal recognition unit 120) is used to detect directional sound/touch commands in a way that the commands should not get confused with environmental sounds or random acceleration of the device. More specifically, the user interaction control unit 135 can access data from location sensors 171 such as the touch sensor(s) 170 and/or the hover sensor(s) 175. The direction analysis unit 115 performs an analysis to determine a direction (relative to the mobile device 100) of an object based on the data from the touch sensor(s) 170 and/or the hover sensor(s) 175 to determine an analyzed direction 198. The user interaction control unit 135 accesses the acoustic signal 140, which in this example is digital data from the microphone 145, and operates the acoustic signal recognition unit 120 to recognize an acoustic signal such as a sound or a vibration or both.
Acoustic signals 127 could be time or frequency domain signals from a particular sound, such as a tap on a side of the mobile device 100. However, the acoustic signal 127 could correspond to, e.g., a snap, tap, clap, or hit, and these may or may not involve contact with the mobile device, depending on the user's actions. That is, the recognized acoustic signal 127 could be a gesture of a “tap” for instance, and this might be used as a simple, shorthand way to characterize the signal. Additionally, the recognized acoustic signal 127 could be some type of representation used to distinguish between different user interactions. For instance, there could be several different recognized acoustic signals 127 such as “signal 1”, “signal 2”, and “signal 3”, and as long as the signals correspond to different user actions and are distinguishable, the mobile device 100 should be able to apply these recognized acoustic signals for user interaction. That is, there is no need for a recognized acoustic signal 127 (or 197) to be characterized as a gesture of a “tap”, even if the recognized signal is a tap.
As stated above, acoustic signals 127 could be time or frequency domain signals from a particular sound, such as a tap on a side of the mobile device 100. Alternatively or additionally, the acoustic signals 127 could be processed into one or more attributes that are representative of the sound. For example, some parameters like LPC (linear predictive coding) coefficients could be used for a set of attributes 146, and these coefficients represent a frequency distribution of the acoustic signals 127. Processing may be performed by the acoustic signal recognition unit 120 to determine a set of attributes 199 that is then compared against the set of attributes 146 in the acoustic signal and direction database 125. As additional examples, the set of attributes 146 could be a sampling of a frequency distribution of the acoustic signal. Other possible attributes are described below. The sets of attributes 199, 146 can be used to determine what the acoustic signal 197, 127 is, for instance the “shorthand” of “tap”. Alternatively, the sets of attributes 199, 146 can be compared directly, such that no “shorthand” is necessary. Thus, instead of recognizing that an acoustic signal 197 is a “tap” and looking for a “tap” in the recognized acoustic signals 127, the mobile device 100 could compare sets of attributes 199 and 146 to determine which operation 129 should be performed. The sets of attributes 146 themselves may never have an applied shorthand for gestures such as taps, hits, and the like, and instead the sets of attributes 146 would be used directly.
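By way of a non-limiting illustration only, the following Python sketch shows one way a set of attributes such as 146 or 199 might be derived from an acoustic signal. It samples the frequency distribution into band energies rather than computing LPC coefficients; the function name, band count, and sample rate are assumptions made for the example and are not part of the described embodiments.

```python
import numpy as np

def frequency_distribution_attributes(acoustic_signal, n_bands=12):
    """Sample the frequency distribution of an acoustic signal into a small
    attribute vector (a stand-in for, e.g., LPC coefficients)."""
    windowed = acoustic_signal * np.hanning(len(acoustic_signal))
    spectrum = np.abs(np.fft.rfft(windowed))
    # Split the magnitude spectrum into n_bands equal-width bands and take the
    # mean energy of each band as one attribute.
    bands = np.array_split(spectrum, n_bands)
    energies = np.array([np.mean(band ** 2) for band in bands])
    # Normalize so the attributes describe the shape of the distribution,
    # making the later comparison less sensitive to overall loudness.
    return energies / (np.sum(energies) + 1e-12)
```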
The recognized acoustic signal (AS) 197 and the analyzed direction (D) 198 form a pair 196. The acoustic signal and direction database 125 has a number of pairs 126 of acoustic signals (AS) 127 and directions (D) 128, where each of the pairs 126 corresponds to an operation (O) 129. In this example, there are N pairs 126 and a corresponding number of operations 129, but this is not a requirement. For instance, there could be multiple pairs 126 assigned to a single operation 129, or multiple operations 129 assigned to a single pair 126, or other options. If there is a match between the acoustic signal 197 recognized by analysis of the acoustic signal 140 and the analyzed direction 198 with acoustic signals 127 and directions 128, respectively, such that a pair 126 is determined, the corresponding operation(s) 129 is performed.
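As a simple, hypothetical illustration of this pairing, the following sketch represents a few pairs 126 mapped to operations 129 and looks up the operation for a recognized acoustic signal 197 and analyzed direction 198. The shorthand labels and operation names are invented for the example and are not drawn from the described embodiments.

```python
# Hypothetical representation of the acoustic signal and direction database 125:
# each entry pairs a recognized acoustic signal and a direction with an operation.
PAIRS = {
    ("tap",  "right"): "next_track",
    ("tap",  "left"):  "previous_track",
    ("snap", "right"): "play_pause",
}

def operation_for(recognized_signal, analyzed_direction):
    """Return the operation 129 corresponding to the pair 126, if any."""
    return PAIRS.get((recognized_signal, analyzed_direction))

# Example: a tap recognized on the right side maps to skipping to the next song.
assert operation_for("tap", "right") == "next_track"
```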
The operations 129 may be performed by one or more of the applications 185. For instance, the operations 129 could be instructions to a music or video player (as an application 185), e.g., to pause, stop, or play the current media. Alternatively or additionally, the operations 129 may be performed by the OS 130. As another example, the operations 129 could be to open (by the OS 130) a certain application, such as texting or email. Many other options are possible.
The video processor 150 processes information to be displayed on the display 160. The display 160 may be integrated with the touch sensor(s) 170 and/or the hover sensor(s) 175, or the touch sensor(s) 170 and/or the hover sensor(s) 175 may be separate from the display 160.
The one or more network interfaces 165 may be wired or wireless or both. Typically, a mobile device 100 operates at least with wireless network interfaces such as Bluetooth (a wireless technology standard for exchanging data over short distances), cellular interfaces, and the like.
A first embodiment is a single microphone solution responsive to touch by a user of the mobile device 100. Turning to
In this first exemplary embodiment, a hover sensor 175/240 is used to detect the direction in which a finger 251 of a user hand 250 is located or moving with respect to the device 100/200. That is, whether the location or movement is, for instance, to the left side 272, right side 270, above 271, or below 273 the device. These locations are relative to an x-y coordinate system 274. Note that the locations can be used to locate a contact by an object on any part of the mobile device 100/200 or to locate a sound by the object where the object does not touch any part of the mobile device 100/200. In the example of
Touching the device 100/200 in different parts may cause significantly different acoustic signals reaching the microphone 145, due to the device 100/200 being made of different materials in different parts. Therefore, it can be beneficial to have a different acoustic signal and direction pair 126 for each touch location used on the device 100/200. When the user touches the device, the acoustic signal 140 picked up by the microphone 145 may be partly touch-induced vibrations 280 and partly sound 290 that has traversed through air. However, from a recognition point of view, this is irrelevant, as sound and vibration signals and their combinations can be recognized similarly.
The user can teach the device 100/200 the acoustic signal and direction pairs and their associated operations so that, when the device recognizes a certain sound/vibration with a direction, the device may perform an operation. For example, if the user slaps the device on its right side, a music player (as an application 185) may skip to the next song in a playlist.
Instead of using a hover sensor 175, it is possible to use a touch sensor 170. Instead of detecting a direction as in the case with a hover sensor, the place where the touch sensor is touched is used. For instance, see
In a second embodiment, a single microphone solution is disclosed that is responsive to a user (or other object) not touching the device.
For instance, the user can teach the device acoustic signal and direction pairs 126 and their associated operations 129 so that, when the device 100/200 recognizes a certain sound with a certain direction, the device 100/200 may perform an operation. For example, if the user snaps his or her fingers to the right of the device, a music player may skip to the next song in a playlist.
Turning to
In block 510, the mobile device 100/200 performs acoustic signal recognition using an acoustic signal 140 from the microphone 145 to determine an acoustic signal AS 197 or a set of attributes 199. Note that an analog-to-digital (A/D) converter 505 may be used to convert the analog signal from the microphone 145 to a digital signal that represents the acoustic signal 140. The acoustic signal 140 may be a sound, vibrations, or both. Acoustic signal recognition could be performed, e.g., using techniques described in, for instance, U.S. Pat. No. 8,195,455 and Heittola et al., “Sound Event Detection in Multisource Environments Using Source Separation”, Workshop on Machine Listening in Multisource Environments (CHiME 2011), Florence, Italy, 2011, pp. 36-40.
Block 510 may include an operation to determine (block 570) an attribute of an acoustic signal 140, such as a sound. For instance, in the Heittola et al. article, feature extraction is performed on sounds in audio signals, and the extracted features are then used for classification. Such features may be attributes of sound that can subsequently be compared with attributes from a previously determined set of sounds (or vibrations or both) in order to perform recognition (block 510) of the sound (or vibrations or both). Another possible example of an attribute of an acoustic signal is the timbre of the signal. Timbre is an attribute of sensation in terms of which a listener can judge that two sounds having the same loudness and pitch are dissimilar. Timbre may be a single number or multiple numbers for an acoustic signal 140. For instance, in Lopes et al., “Augmenting Touch Interaction through Acoustic Sensing”, ACM (2011), a timbre cue is determined, which is a vector of 11 elements formed from magnitudes of discrete Fourier transforms (DFTs) applied to the outputs of 11 corresponding narrow-band filters for a signal. They also determine an intensity cue, which is the peak amplitude of the same signal analyzed through an envelope follower. Such timbre and intensity cues are also possible attributes of sound that might be used herein.
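The following sketch illustrates, under stated assumptions, attribute computations of the kind referred to above: an 11-element timbre-like cue obtained by grouping DFT magnitudes into narrow bands, and an intensity-like cue from a simple envelope follower. The band grouping, smoothing constants, and function names are assumptions for illustration and are not taken from the cited works.

```python
import numpy as np

def timbre_cue(signal, n_bands=11):
    """11-element timbre-like cue: DFT magnitudes summed over 11 narrow bands."""
    magnitudes = np.abs(np.fft.rfft(signal))
    return np.array([band.sum() for band in np.array_split(magnitudes, n_bands)])

def intensity_cue(signal, attack=0.9, release=0.999):
    """Intensity-like cue: peak of the rectified signal seen through a simple
    one-pole envelope follower (the time constants are illustrative)."""
    envelope, peak = 0.0, 0.0
    for x in np.abs(signal):
        coeff = attack if x > envelope else release
        envelope = coeff * envelope + (1.0 - coeff) * x
        peak = max(peak, envelope)
    return peak
```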
In block 520, the mobile device 100/200 performs direction analysis for an object using a signal or signals 518 from location sensor(s) 171 such as the touch sensor 170, the hover sensor 175, or both. The direction analysis determines a direction D 198. Typically, the location sensor 171 directly tells the location. That is, a hover sensor 175 on the back of the device could indicate a signal while a hover sensor 175 on the front of the device does not. Similarly, a grid sensor would have a specific output (corresponding to the grid) and thus a location might be immediately determined. However, it could be that the hover sensor 175 on the back of the device has a stronger signal than a hover sensor 175 on the front of the device, and a decision might have to be made that the location is more likely on the back of the device. Similarly, with a grid sensor, there may be a range of strengths for outputs, and a decision could be made through known techniques as to a location for an object.
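A minimal sketch of such a direction decision, assuming the location sensor(s) 171 expose per-side signal strengths, might look as follows; the dictionary layout, function name, and noise floor are illustrative assumptions.

```python
def analyze_direction(hover_readings):
    """Decide the most likely direction of the user's hand from hover sensor
    signal strengths, e.g. {"left": 0.1, "right": 0.8, "above": 0.2, "below": 0.0}.
    A stronger reading on one side is taken to mean the hand is on that side;
    readings below a noise floor yield no direction."""
    NOISE_FLOOR = 0.05  # assumed threshold below which readings are ignored
    direction, strength = max(hover_readings.items(), key=lambda kv: kv[1])
    return direction if strength > NOISE_FLOOR else None
```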
It is noted there may be some coordination between blocks 510 and 520. This coordination is illustrated by block 517 as “trigger other block”, as one block 510/520 could trigger the other block 520/510. In an exemplary embodiment, for instance, the acoustic signal 140 from the microphone 145 is used to detect the touch or sound command. Once the touch or sound command has been detected, the device checks, e.g., using the hover sensor 175, where the user's hands were. In this example, at least the microphone should be on all the time. Typically, so would be the hover sensor 175 (or touch sensor 170), but in many cases it is enough to quickly check where the user's hands are right after the touch has been detected. This example is merely illustrative and other coordination may be used, such as having block 520 trigger block 510 (e.g., to fix a time when a sound command should start).
As a more specific example of block 517, in block 580, the mobile device 100/200 determines a (e.g., an exact) moment the acoustic signal 140 from the microphone exceeds a threshold. Responsive to this determination, the mobile device 100/200 in block 585 performs analyses around the moment the acoustic signal exceeded the threshold. That is, the mobile device causes (in block 590) the acoustic signal recognition (block 510) and determination of one or more attributes of the acoustic signal (block 570) to be performed. Additionally, the mobile device causes (in block 595) direction analysis for an object (block 520) to be performed. Blocks 590 and 595 may occur one after the other or simultaneously, although performing them simultaneously can be beneficial (depending on the system) to reduce latency. The location is detected using touch or hover sensor(s), and the attribute(s), e.g., timbre, of the sound/vibration signal determined “around” the moment when the threshold was exceeded are compared to the signals in the database. That is, block 530 is also performed, as will be block 540. Typically, “around” the moment means starting 100 ms (milliseconds) before the moment and ending 500 ms after the moment. However, it should be noted that these time limits are related to how quickly a user may repeat a gesture. For example, snapping fingers likely occurs at most about four times a second. Therefore, for an acoustic signal that is a result of snapping, there is a quarter-second time window within which the audio signal belongs rather exclusively to a single gesture. For different gestures, the time window may be of a different length. Also, an optimal centering of the time window may depend on the gesture. Thus, the 100 ms before and 500 ms after the moment are merely exemplary and may change depending on the gesture. For instance, an initial “large” window (e.g., 500 ms before and 500 ms after the moment) may be chosen until a gesture (such as “tap” or “snap”) is determined, then the window size may be adjusted to a smaller window corresponding to the gesture. Thus, a window for a snap could be 250 ms wide and centered 75 ms from the moment, while a window for a tap could be 500 ms wide and centered 150 ms from the moment.
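The following sketch illustrates blocks 580 and 585 under simple assumptions: the moment is taken as the first sample exceeding the threshold, and the analysis segment defaults to the 100 ms before / 500 ms after figures mentioned above. The sample rate and function names are assumptions for illustration.

```python
import numpy as np

def detection_moment(acoustic_signal, threshold):
    """Index of the first sample whose magnitude exceeds the threshold, or None."""
    above = np.flatnonzero(np.abs(acoustic_signal) > threshold)
    return int(above[0]) if above.size else None

def window_around(acoustic_signal, moment, sample_rate=16000,
                  before_ms=100, after_ms=500):
    """Extract the segment 'around' the detection moment (defaults follow the
    100 ms before / 500 ms after figures given above; a gesture-specific
    window could be substituted once the gesture is known)."""
    start = max(0, moment - before_ms * sample_rate // 1000)
    stop = min(len(acoustic_signal), moment + after_ms * sample_rate // 1000)
    return acoustic_signal[start:stop]
```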
The examples provided above use an object of a hand 250 or fingers 251. However, the object could also be a pencil, an eraser, a glove, a stylus, a pen, and the like, e.g., depending on the ability to sense the object by the location sensor(s) 171. For instance, a hover sensor 175 may have a limited number of objects that can be sensed, while the touch sensor could have many additional objects (e.g., any object able to press down a contact).
The acoustic signal and direction database 125 provides input to blocks 510 and 530. In block 530, the mobile device 100/200 compares the recognized acoustic signal 197 and the analyzed direction 198 (or, e.g., the set of attributes 199 and the sets of attributes 146) to entries in the acoustic signal and direction database 125. The entries are, in an exemplary embodiment as illustrated in
The examples above used a single microphone 145. However, multiple microphone examples are also possible.
In terms of the embodiments using touch sensors, the following examples are additionally provided.
A) Instead of detecting events and their time only based on the audio, one could detect events based on both audio and touch input.
B) In large displays, for instance, the possible distance between the touch location and the microphone is large, which may be problematic. This is because (a) the audio may change as it travels and (b) the time windows in which the audio attributes and the touch attributes occur are different. Touch input occurs first, then sound waves travel to the microphone; the audio signal arrives at the system with a delay, and then the audio attribute(s) is/are determined.
A possible implementation that would effectively comprise both touch detection and audio signal determination is illustrated by
In block 705, the mobile device detects the audio event and outputs the detection time, tA, 706. In block 710, the mobile device detects the audio attributes like peak energy and spectrum as non-limiting examples. The audio attributes 711 are communicated to block 760. In block 760, the mobile device, if an audio event is detected, checks if the database 755 has a matching pair of audio and touch attributes. That is, it is checked which Δi, i=1 . . . N, produces the best match between attribute values in the database and the input attribute values. In equations, the values used could be as follows: tA−Δ1, tA−Δ2, . . . , tA−ΔN. That is, block 760 starts at a detection time (e.g., tA) for the audio event and looks for a touch event in the past (times prior to tA). As such, delays are subtracted from the audio detection time, tA. If Δ1≠0, the value at tA may also be stored. Note that the audio attributes are fixed at tA, but the touch attributes are sampled at the times tA−Δ1, tA−Δ2, . . . , tA−ΔN. The database 755 provides at least an input to blocks 760 and 735 and includes N entries 756-1 through 756-N. Each entry 756 includes a set of audio attributes 146 and a set of touch attributes 721. Each “set” is at least one attribute.
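A minimal sketch of the matching performed in block 760, assuming attribute vectors compared by Euclidean distance, is given below. The data structures, the distance measure, and the way past touch attributes are looked up are illustrative assumptions, not the described implementation.

```python
import numpy as np

def best_matching_entry(audio_attrs, touch_history, database, t_audio, deltas):
    """Given audio attributes fixed at the audio detection time t_audio and a
    history of past touch attributes, find which delay (Δi) and which database
    entry 756 give the best joint match. touch_history maps a sample time to a
    touch-attribute vector; database is a list of (audio_attrs, touch_attrs)
    pairs. A Euclidean distance is assumed for the comparison."""
    best = (None, None, np.inf)  # (entry index, delta, distance)
    for delta in deltas:
        touch_attrs = touch_history.get(t_audio - delta)
        if touch_attrs is None:
            continue  # no touch attributes were stored for this delay
        for i, (db_audio, db_touch) in enumerate(database):
            dist = (np.linalg.norm(np.asarray(audio_attrs) - np.asarray(db_audio))
                    + np.linalg.norm(np.asarray(touch_attrs) - np.asarray(db_touch)))
            if dist < best[2]:
                best = (i, delta, dist)
    return best
```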
In block 715, the mobile device detects a touch event and outputs a detection time, tT, 716. In block 720, the mobile device detects touch attributes like shape and location as non-limiting examples. The touch attributes 721 are output to blocks 735 and 740. In block 730, the mobile device calculates the time difference ΔD, which is the time it takes for audio to arrive from the touch location to the microphone. Typically, ΔD is the distance between the touch location and the microphone divided by the speed of sound. In block 725, the audio attributes are delayed by ΔD. The audio attributes are calculated in time windows. The time windows start at different times and the windows may overlap (for example, 20 ms long windows with 50 percent overlap). The time difference ΔD describes the different start times of the windows. That is, the window(s) used may start at least at the time tT+ΔD. Each window typically has its own set of attributes. In particular, recognizing the attribute(s) may be performed using a microphone signal captured based on the time difference ΔD.
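The following sketch illustrates the calculation of the time difference ΔD in block 730 and the placement of 20 ms, 50 percent overlapping analysis windows starting at tT+ΔD. The coordinate units, the number of windows, and the speed-of-sound constant are assumptions made for the example.

```python
SPEED_OF_SOUND_MM_PER_S = 343_000.0  # approximate speed of sound in air

def propagation_delay(touch_xy_mm, mic_xy_mm):
    """Time difference ΔD: distance from the touch location to the microphone
    divided by the speed of sound (coordinates in millimetres, result in seconds)."""
    dx = touch_xy_mm[0] - mic_xy_mm[0]
    dy = touch_xy_mm[1] - mic_xy_mm[1]
    return ((dx * dx + dy * dy) ** 0.5) / SPEED_OF_SOUND_MM_PER_S

def window_starts(t_touch, delta_d, n_windows=5, window_s=0.020, overlap=0.5):
    """Start times of the analysis windows: 20 ms windows with 50 percent overlap,
    beginning no earlier than t_touch + ΔD (figures as given above)."""
    hop = window_s * (1.0 - overlap)
    return [t_touch + delta_d + k * hop for k in range(n_windows)]
```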
In block 735, the mobile device, if a touch event is detected, checks if the database 755 has a matching entry (e.g., pair) 756 of audio and touch attributes. Block 735 can entail comparing the one or more attributes of the touch at the touch detection time and audio attributes determined using a window of microphone signal information delayed from the touch detection time by a time difference (e.g., ΔD). It is also possible to recognize the one or more attributes of the acoustic signal using a microphone signal captured using time windows starting at a point based at least in part on the time difference and progressing from this point. For instance, this could be used if the time difference (e.g., ΔD) is not exactly known, such that multiple windows of microphone information can be used. Each of the combination of the attribute(s) of the touch event and the attribute(s) of the audio event at the different windows can be compared with pairs of touch and audio attributes in the database 755.
In blocks 740, the mobile device stores past values with delays Δ1 (block 740-1) through ΔN (block 740-N). That is, the touch attributes 721 are stored at various delays Δ1 (block 740-1) through ΔN (block 740-N). Note that the touch attributes 721 may change over time. For instance, the touch attributes stored at delay Δ1 may be different from the touch attributes stored at delay ΔN. Similar to audio that is analyzed based on time windows, time windows may also be used to determine touch attributes 721, such that the touch attributes 721 are determined per time window and each time window corresponds to a delay. That is, determining the one or more attributes for a touch occurs using the information from the touch sensor at the touch detection time, and for information from the touch sensor at a plurality of delays from the touch detection time to the touch detection time plus the time difference. In equations the values stored would be as follows: tT+Δ1, tT+Δ2, . . . , tT+ΔN. If Δ1≠0, the value at tT may also be stored. Overlap of windows may or may not be used for the windows used to determine touch attributes.
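A small, hypothetical buffer of past touch attributes, of the kind maintained by blocks 740, might be sketched as follows; the capacity, timestamp tolerance, and class name are assumptions for illustration.

```python
from collections import deque

class TouchAttributeHistory:
    """Keep touch attributes sampled per time window so that, when an audio
    event is later detected at time tA, the values at tA−Δ1 … tA−ΔN can be
    looked up (a small fixed-capacity buffer stands in for blocks 740)."""

    def __init__(self, max_entries=64):
        self._entries = deque(maxlen=max_entries)  # (timestamp, attributes)

    def store(self, timestamp, attributes):
        self._entries.append((timestamp, attributes))

    def at(self, wanted_time, tolerance=0.010):
        """Return the attributes whose timestamp is closest to wanted_time,
        provided the timestamp lies within the tolerance (10 ms assumed)."""
        best = min(self._entries, key=lambda e: abs(e[0] - wanted_time), default=None)
        if best is not None and abs(best[0] - wanted_time) <= tolerance:
            return best[1]
        return None
```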
Blocks 735 and 760 output audio and touch attributes and times 706, 716, respectively, to block 765. Block 765 includes combining the events. If the detected event was an audio event, the event attributes 770 and time 771 are output. The event time 771 is tA−Δi. If the event was a touch event, the event time is tT. Note that only one of the touch or audio events might be detected or both could be detected.
A simple explanation for the embodiment in
If a touch event was detected, touch attributes from the moment of the touch event (tT) are compared to audio attributes starting from after the touch event occurred, i.e., from tT+ΔD.
Different attributes can be stored in the database 755 depending on the touch location because audio changes when it travels far, i.e., the audio signal level attenuates as the audio travels farther. This is probably only important when the display size is around 32 inches or more, but with table-top displays this kind of exemplary embodiment would be significant.
Additionally, sometimes detecting the touch or audio event does not work, because the detection typically has a threshold the event must exceed to be detected and sometimes the event does not exceed the threshold. In such a case, the mobile device may only detect the audio part or the touch part of the event. However, if, for example, the audio part did not exceed the threshold, then when the mobile device detects the touch event, the mobile device determines that there must have been an audio event, and the mobile device may lower the threshold for the audio event and check the audio again.
Also, there may be events for which both audio and touch exceeded the threshold and events where only audio or touch exceeded the threshold. Combining the events in block 765 means that all three types of events are output similarly because for the rest of the system it does not matter how the event was detected. Instead, the event parameters are output regardless of how the parameters were accumulated. If both the audio and the touch are detected, typically, the audio event time is used and the touch event time is discarded. Even in cases when the audio event has not exceeded the threshold, one may calculate from the touch event time when the audio event should have approximately occurred (e.g., using the time it takes sound to travel from the touch event location to the microphone) and then search for the maximum value in the audio signal within a search window (e.g., typically 200 ms before and after the estimated time). Then, one may take the time of the maximum value within the search window to be the audio event time. The achieved audio event time can then be used in place of the touch event time because the audio event time is more accurate.
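The following sketch illustrates the estimation described above: when only the touch part of an event was detected, the audio event time is estimated from the touch event time plus the propagation delay, and the maximum of the audio signal within a ±200 ms search window around that estimate is taken as the audio event time. The sample rate and the fallback behavior are assumptions for illustration.

```python
import numpy as np

def estimated_audio_event_time(t_touch, delta_d, audio, sample_rate=16000,
                               search_ms=200):
    """Estimate when the audio event should have occurred (t_touch + ΔD) and
    take the time of the maximum of the audio signal within a ±200 ms search
    window around that estimate as the audio event time."""
    estimate = t_touch + delta_d
    centre = int(estimate * sample_rate)
    half = search_ms * sample_rate // 1000
    start = max(0, centre - half)
    stop = min(len(audio), centre + half)
    if start >= stop:
        return estimate  # no samples to search; fall back to the estimate
    peak = start + int(np.argmax(np.abs(audio[start:stop])))
    return peak / sample_rate
```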
The output event attributes 770 can be used to select an operation 129 based on the event attributes 770, e.g., and based on the time 771. See block 785. For instance, if a video was being played, and the operation 129 is to pause the video, the video could be “rewound” and paused at the time 771. In block 790, the mobile device causes the operation to be performed, e.g., by an application, operating system, and the like.
The exemplary embodiment in
Turning to
The examples in entries 756-1 through 756-4 are simple and do not show any time dependency. However, time dependency could be added, as is illustrated by entries 756-21, 756-22, 756-26, and 756-27, which correspond to entry 756-2. The time dependency is shown as delays 820: 820-1 of 0 ms (milliseconds); 820-2 of 20 ms; 820-6 of 100 ms; and 820-7 of 120 ms. In this example, the touch attributes 721 and the audio attributes 146 are sampled based on time windows of 20 ms. The peak energy as the audio attribute 146 starts at 41 dB (146-21), increases to 43 dB (146-22) at 20 ms, reaches a highest level of 60 dB (146-26) at 100 ms, and then begins to decrease, becoming 57 dB (146-27) at 120 ms. In this example, the touch attributes 721 do not vary, but this is merely exemplary. For instance, the shape of the touch could vary over time. There could also be additional or fewer entries 756 for such a table.
Without in any way limiting the scope, interpretation, or application of the claims appearing below, a technical effect of one or more of the example embodiments disclosed herein is to allow a user to command a mobile device or applications thereon using sounds or vibrations. Another technical effect is to allow a user the ability to interact with a mobile device using sounds. Advantages and other technical effects include one or more of the following: exemplary embodiments provide more robust detection in the presence of noisy acceleration; exemplary embodiments provide more robust detection in the presence of noisy background sounds; and exemplary embodiments provide more flexibility for user interaction.
Additional exemplary embodiments are as follows.
Example 1. An apparatus, comprising: means for sensing a location and for outputting corresponding one or more signals and means for sensing audio configured to form corresponding audio signals. The apparatus also includes the following: means for determining, using the one or more signals from the means for sensing a location, a direction of at least one object relative to the apparatus; means for recognizing, by the apparatus using a signal from the means for sensing audio in the apparatus, one or more attributes of an acoustic signal made by the at least one object; and means for causing an operation to be performed by the apparatus in response to the direction and the recognized one or more attributes being determined to correspond to the operation.
Example 2. The apparatus of example 1, wherein the acoustic signal comprises a sound and the means for recognizing the one or more attributes of the acoustic signal further comprises means for recognizing one or more attributes of the sound.
Example 3. The apparatus of example 1, wherein the acoustic signal comprises vibrations and the means for recognizing the one or more attributes of the acoustic signal comprises means for recognizing one or more attributes of the vibrations.
Example 4. The apparatus of example 1, wherein the at least one object comprises a finger or hand of a user.

Example 5. The apparatus of example 1, wherein the at least one object comprises at least one of a pencil, an eraser, a glove, a stylus, or a pen.
Example 6. The apparatus of example 1, wherein the means for sensing a location comprises one or more location sensors.
Example 7. The apparatus of example 1, wherein the means for sensing a location comprises one or more touch sensors.
Example 8. The apparatus of example 1, wherein the acoustic signal comprises at least one of vibrations and sound generated by contact of the at least one object on the apparatus.
Example 9. The apparatus of example 1, wherein the acoustic signal comprises sound made by the at least one object, where the at least one object does not contact the apparatus when making the sound.
Example 10. The apparatus of example 1, wherein the apparatus further comprises: means for comparison of the one or more attributes and the determined direction with pairs of entries in a database, each pair comprising one or more attributes and a direction; means for matching the one or more attributes and the determined direction with a pair in the entries; and means for determination of an operation corresponding to the pair.
Example 11. The apparatus of example 1, further comprising: means for determination of a moment a signal from a microphone exceeds a threshold; and means for causing, responsive to the determination of the moment the signal from the microphone exceeds the threshold, performance of the determination of the direction and performance of the recognition of the one or more attributes of the acoustic signal made by the at least one object.
Example 12. The apparatus of example 11, wherein the means for sensing a location comprises at least one hover sensor and the means for determination of the direction uses information from the at least one hover sensor.
Example 13. The apparatus of example 11, further comprising means for performing, by starting a first number of milliseconds before the moment and ending a second number of milliseconds after the moment, the determination of the direction and the recognition of the one or more attributes of the acoustic signal made by the at least one object.
Example 14. The apparatus of example 13, wherein the means for sensing a location comprises at least one hover sensor and the means for determination of the direction uses information from the at least one hover sensor.
Example 15. The apparatus of example 1, wherein:
the means for determination of the direction of the at least one object relative to the apparatus further comprises means for determination, using information from a touch sensor, of one or more attributes of a touch by the at least one object on the apparatus;
the apparatus further comprises: means for comparison of the one or more attributes of the touch and the one or more attributes of the acoustic signal with attributes of touch and attributes of acoustic signals in a database in order to determine a match;
the means for causing an operation to be performed further comprises means for causing the operation to be performed based on the determined match.
Example 16. The apparatus of example 15, wherein:
the means for determination, using information from a touch sensor, of one or more attributes of a touch by the at least one object on the apparatus is performed responsive to a determination a touch event has been detected, and the means for determination of the one or more attributes for a touch is performed using information from the touch sensor at a touch detection time; and
the means for comparison of the one or more attributes of the touch and the one or more attributes of the acoustic signal further comprises means for comparison of the one or more attributes of the touch at the touch detection time and audio attributes determined by processing microphone signal information delayed from the touch detection time by a time difference with attributes of touch and attributes of acoustic signals in the database in order to determine the match.
Example 17. The apparatus of example 16, further comprising:
means for calculation of the time difference based on a difference between a location on the apparatus of the touch and a microphone used to recognize the one or more attributes of the acoustic signal;
means for performance of the recognition of the one or more attributes of the acoustic signal using a microphone signal captured using time windows starting at a point based at least in part on the time difference and progression from this point; and
means for performance of the comparison of the one or more attributes of the touch at the touch detection time and the one or more attributes determined with each window with pairs of touch and audio attributes in the database.
Example 18. The apparatus of example 16, wherein:
the means for determination of the one or more attributes for a touch operates using the information from the touch sensor at the touch detection time and using information from the touch sensor at a plurality of delays from the touch detection time up to the touch detection time plus the time difference.
Example 19. The apparatus of example 15, wherein:
the means for determination, using information from a touch sensor, of one or more attributes of a touch by the at least one object on the apparatus is performed responsive to a determination an audio event has been detected;
the apparatus further comprises:
means for determination, in response to detecting the audio event, of an audio detection time;
wherein the means for comparison of the one or more attributes of the touch and the one or more attributes of the acoustic signal further comprises means for comparison of attributes of touch, determined at a plurality of delays delayed from the audio detection time into past times, and attributes of the acoustic signal with attributes of touch and attributes of acoustic signals in the database in order to determine the match.
Example 20. The apparatus of example 15, wherein:
the apparatus further comprises:
means for determination, in response to a touch event being detected, of a touch detection time;
means for determination, in response to an audio event being detected, of an audio detection time,
wherein only one of a touch event or an audio event is detected and either a touch detection time or an audio detection time is determined; and
the means for causation of the operation to be performed by the apparatus is based on the determined match and based on either the determined touch detection time or the determined audio detection time.
Embodiments of the present invention may be implemented in software, hardware, application logic or a combination of software, hardware and application logic. In an exemplary embodiment, the application logic, software or an instruction set is maintained on any one of various conventional computer-readable media. In the context of this document, a “computer-readable medium” may be any media or means that can contain, store, communicate, propagate or transport the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer, with examples of computers described and depicted. A computer-readable medium may comprise a computer-readable storage medium that may be any media or means that can contain or store the instructions for use by or in connection with an instruction execution system, apparatus, or device, such as a computer. The computer-readable storage medium does not, however, encompass propagating signals.
If desired, the different functions discussed herein may be performed in a different order and/or concurrently with each other. Furthermore, if desired, one or more of the above-described functions may be optional or may be combined.
Although various aspects of the invention are set out in the independent claims, other aspects of the invention comprise other combinations of features from the described embodiments and/or the dependent claims with the features of the independent claims, and not solely the combinations explicitly set out in the claims.
It is also noted herein that while the above describes example embodiments of the invention, these descriptions should not be viewed in a limiting sense. Rather, there are several variations and modifications which may be made without departing from the scope of the present invention as defined in the appended claims.