For a better understanding of the invention with regard to the embodiments thereof, reference is made to the accompanying drawing, in which like numerals designate corresponding sections or elements throughout, and in which:
The present invention is a recording device implemented to store, in a memory, voice messages received from a user upon detecting a similarity between audio data received from the user and corresponding audio data previously stored in the memory.
The audio data herein refers to speech that is transformed into signals recognizable by a machine.
Note that, in accordance with the present invention, detecting a similarity between audio data received from the user and corresponding audio data previously stored in the memory requires pattern recognition methods only. Speech recognition is not required at all in the present invention, since there is no need to recognize what has been recorded by the user.
The recording device of the present invention is programmed to create a practically unlimited number of folders, each folder storing a number of corresponding pending voice messages that are received from the user.
A folder in the present invention represents a situation (e.g. where, when, etc.) a user is likely to want to be reminded of for doing things. Each folder is represented by a respective voice tag, i.e. an audio segment that is associated with this folder. The voice tags, stored in the memory in a table of voice tags for example, are preferably significantly different from one another and are identified according to their respective audio content using pattern recognition methods known in the art.
The audio data spoken by the user is defined herein as a “statement”.
The term “prefix of a statement” is used herein to mean the first syllable or syllables of a recorded audio statement (with length shorter than the full statement). The term “suffix of a statement” is used herein to mean the last syllable or syllables of a recorded audio statement (with length shorter than the full statement).
In accordance with a preferred embodiment (see
Preferably, but without limitation, the first portion of the statement is a bounding portion, such as the prefix or the suffix of the statement, and the second portion of the statement is a remainder portion, such as the suffix or the prefix of the statement, respectively. As an example, the statement “Home Center buy 3 new shelves” includes the voice tag “Home Center” at its prefix and the new pending voice message “buy 3 new shelves” at its suffix. In this example, the first portion including the voice tag of a pre-defined folder previously stored in the memory is the prefix of the statement and the second portion including the new pending voice message is the suffix of the statement.
Alternatively, the first portion and/or the second portion of the statement may be any portion of the statement, whether this portion is the prefix of the statement, the suffix of the statement or the middle of the statement. As an example, the statement “when I go to Home Center buy 3 new shelves” includes the voice tag “Home Center” in its middle portion, and the new pending voice message may include the whole statement “when I go to Home Center buy 3 new shelves”. In this example, the first portion including the voice tag of a pre-defined folder previously stored in the memory is the middle portion of the statement and the second portion including the new pending voice message is the entire statement itself.
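By way of illustration only, locating a voice tag at an arbitrary position within a statement can be sketched as follows. The sketch is a simplification under stated assumptions: each syllable is stood in for by a plain text token, matching is exact rather than an approximate pattern-recognition similarity, and the function name `locate_voice_tag` is hypothetical and not part of the described device.

```python
def locate_voice_tag(syllables, voice_tags):
    """Return (tag, start_index) for the first voice tag that appears as a
    contiguous run inside the statement, or None if no tag is present.

    Assumptions (illustrative only): syllables are plain text tokens and
    voice tags are tuples of such tokens; a real device would compare
    audio features approximately instead of testing for equality.
    """
    for start in range(len(syllables)):
        for tag in voice_tags:
            # Compare the window beginning at `start` against this tag.
            if tuple(syllables[start:start + len(tag)]) == tag:
                return tag, start
    return None


# The tag may sit at the prefix or in the middle of the statement.
tags = [("home", "center")]
prefix_case = locate_voice_tag("home center buy 3 new shelves".split(), tags)
middle_case = locate_voice_tag(
    "when i go to home center buy 3 new shelves".split(), tags
)
```

In the prefix case the returned start index is 0, so the remainder of the statement can be taken as the pending voice message; in the middle case the whole statement may be stored, as in the example above.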
In accordance with another embodiment (see
Note that, while new folders are created separately and independently of any pre-defined folder, new pending voice messages are created in the memory in association with a respective folder.
In accordance with one embodiment, a statement including a voice tag and a new pending voice message to be stored in the memory must be spoken by the user only after at least one “new-folder instruction” is initiated by the user.
In accordance with another embodiment, the recording device of the present invention is implemented with a group of built-in folders, so that a statement including a voice tag and a new pending voice message to be stored in the memory can be spoken at any time, provided that the voice tag represents a folder that is among this group of built-in folders.
Referring to
Preferably, the pending voice messages are stored, in association with the respective voice tags, in chronological order. The term “chronological order” is defined herein to mean that the management technique of the pending voice messages is either one of First In First Out (FIFO), where the order in which the audio data (e.g. pending voice messages) are stored in the memory is the same order in which this data is played by the recorder, Last In First Out (LIFO), where the order in which the audio data (e.g. pending voice messages) are stored in the memory is the opposite of the order in which this data is played by the recorder, or a combination of these techniques.
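The two playback orders defined above can be illustrated with the following minimal sketch. The class name `Folder`, its methods and the use of text strings in place of recorded audio are assumptions made for illustration only; they do not describe the actual memory layout of the device.

```python
from collections import deque


class Folder:
    """Hypothetical folder: one voice tag plus its pending voice messages."""

    def __init__(self, voice_tag):
        self.voice_tag = voice_tag
        self.messages = deque()  # kept in the order the messages were stored

    def store(self, message):
        """Append a new pending voice message (a string stands in for audio)."""
        self.messages.append(message)

    def playback_order(self, mode="FIFO"):
        """Return the messages in the order they would be played."""
        if mode == "FIFO":
            return list(self.messages)            # same order as stored
        if mode == "LIFO":
            return list(reversed(self.messages))  # most recent first
        raise ValueError("mode must be 'FIFO' or 'LIFO'")


folder = Folder("Home Center")
folder.store("buy 3 new shelves")
folder.store("return paint")
fifo = folder.playback_order("FIFO")
lifo = folder.playback_order("LIFO")
```

Here `fifo` lists the messages oldest first and `lifo` lists them newest first; a combination of the two techniques could, for example, select the mode per folder.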
The instruction commands are stored in the memory, in a table of valid instruction commands 13 for example. Typical instruction commands include a “list instruction” instructing to play all the pending voice messages stored in association with a respective folder, a “new-folder instruction” instructing to create a new folder in the memory, a “delete instruction” instructing to delete all or some pending voice messages from a respective folder, etc.
For example, a verbal request to play all the pending voice messages of a respective folder can be made by a user via a statement, such as “Supermarket List” or “Grandma List”. In such case, the clause “supermarket” and the clause “grandma” are voice tags of two different folders and the clause “list” is a recognizable instruction command indicating that all of the pending voice messages previously stored in the respective folder are to be played.
A detector 14 applying pattern recognition methods known in the art, as utilized in “Nokia Shorty™” (sold as a prepaid phone by Virgin Mobile Ltd.) for example, is provided for parsing audio data of a received statement into syllables and detecting an approximate similarity between a string of consecutive syllables (e.g. a prefix, a suffix) and a voice tag associated with a folder pre-recorded in memory 12. A well-known pattern recognition method, for example, is the K-Nearest-Neighbor (KNN) algorithm, which is a method for classifying objects based on closest training examples in a feature space. The KNN algorithm utilizes new and updated examples of various known patterns in order to refine the decision thresholds between different patterns and improve the detection of future voice tags.
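For illustration, a generic K-Nearest-Neighbor classification can be sketched as shown below. This is a textbook form of the algorithm and not necessarily the method used by detector 14; the two-dimensional feature vectors, the labels and the function name `knn_classify` are hypothetical, and the extraction of features from audio is outside the scope of the sketch.

```python
import math
from collections import Counter


def knn_classify(sample, training, k=3):
    """Classify `sample` by majority vote among its k nearest examples.

    `training` is a list of (feature_vector, label) pairs. The feature
    vectors here are made-up numbers standing in for audio features.
    """
    # Sort the training examples by Euclidean distance to the sample.
    nearest = sorted(training, key=lambda ex: math.dist(sample, ex[0]))[:k]
    # The most common label among the k nearest examples wins.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


training = [
    ((0.0, 0.1), "home center"),
    ((0.1, 0.0), "home center"),
    ((0.2, 0.1), "home center"),
    ((1.0, 1.0), "supermarket"),
    ((0.9, 1.1), "supermarket"),
    ((1.1, 0.9), "supermarket"),
]
near_home = knn_classify((0.05, 0.05), training)
near_market = knn_classify((1.0, 0.95), training)
```

Appending newly confirmed examples to `training` is one simple way in which such a classifier can be refined over time, in the spirit of the updating described above.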
A microphone 16 is provided for receiving statements from a user and a built-in speaker 18 for playing the pending voice messages upon request. An earphone/headphone jack 19 and a USB interface 21 providing a PC link, for example, are also included.
In a preferred embodiment, a Speech Recognition unit 20 is provided for converting the pending voice messages into text and displaying the text upon a display 22. The conversion is applied using speech recognition methods known in the art, such as Dragon Dictate™, available from ScanSoft Inc., London, UK. Optionally, display 22 can be configured as a dual display further displaying the status of folders or remaining memory, for example.
Preferably, the Digital Audio Recorder device 10 of the present invention includes a Press-To-Talk (PTT) switch 24 that must be pressed by the user while recording, thereby preventing accidental recording of audio content.
Referring to
At the initial step 30, a user records a statement that is stored within a buffer of the DAR device. At the next step 32, a subsequent syllable is retrieved from the statement and concatenated with the previously retrieved syllables. The first time this step is applied, only the first syllable of the statement is retrieved.
At step 34 it is determined whether the retrieved syllables (e.g. prefix of the statement) match a voice tag of a folder previously programmed to the device. In the affirmative case, the method proceeds to step 40. In the negative case, step 36, it is determined whether all the syllables of the statement are retrieved (i.e. such that the retrieved syllables include the whole statement).
In case not all the syllables are retrieved, the method returns to step 32, thereby retrieving the next syllable of the statement (such that the retrieved syllables include the syllables previously retrieved in earlier stages and the new syllable). However, in case all the syllables are retrieved, an error message is sent to the user (step 38) and the method comes to an end at step 50.
At step 40 it is determined whether the remaining syllables (e.g. suffix of the statement) match a valid instruction command.
In the affirmative case, the instruction command is applied at step 42 (typically with respect to the voice tag received from the user at the prefix of the statement), an acknowledgement message is sent to the user (step 44) and the method comes to an end at step 50. Note that new folders received by a “new-folder instruction” are created separately and independently from any pre-defined folders.
However, in case the remaining syllables (e.g. suffix of the statement) do not match a valid instruction command pre-programmed in the device, the remaining syllables of the statement are stored as a new pending voice message in association with the voice tag (e.g. the prefix of the statement) (step 48), a confirmation signal is sent to the user (step 48) and the method comes to an end at step 50.
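The flow of steps 30 to 50 described above can be sketched, under simplifying assumptions, as follows. Syllables are represented by text tokens, matching is exact rather than approximate, and the names `process_statement` and `list_messages` are hypothetical. The handling of a statement that consists of a voice tag alone (treated here as an error) is likewise an assumption, since that case is not addressed in the description.

```python
def list_messages(folders, tag):
    """Hypothetical 'list instruction': return the folder's pending messages."""
    return list(folders[tag])


def process_statement(syllables, folders, commands):
    """Sketch of steps 32-50 for one buffered statement (step 30).

    folders:  dict mapping a voice tag (tuple of tokens) to its messages.
    commands: dict mapping a command (tuple of tokens) to a function.
    """
    for end in range(1, len(syllables) + 1):      # step 32: add next syllable
        prefix = tuple(syllables[:end])
        if prefix in folders:                     # step 34: voice tag matched?
            rest = tuple(syllables[end:])
            if not rest:                          # assumption: tag alone is invalid
                return ("error", None)
            if rest in commands:                  # step 40: valid command?
                result = commands[rest](folders, prefix)  # step 42
                return ("ack", result)            # step 44
            folders[prefix].append(rest)          # store new pending message
            return ("stored", prefix)             # step 48: confirmation
    return ("error", None)                        # steps 36-38: no tag found


folders = {("home", "center"): [], ("supermarket",): [("buy", "milk")]}
commands = {("list",): list_messages}
r_store = process_statement("home center buy shelves".split(), folders, commands)
r_list = process_statement("supermarket list".split(), folders, commands)
r_error = process_statement("garage list".split(), folders, commands)
```

In this sketch the first call stores a new pending voice message under the “home center” folder, the second applies the list instruction to the “supermarket” folder, and the third ends with an error because no voice tag is found.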
Note that a valid statement is defined herein to include a voice tag (at the prefix) followed by a pending voice message or an instruction command (at the suffix).
However, the method of the present invention in accordance with
Referring to
Referring to
Referring to
According to some embodiments described herein above, a valid statement received from a user includes a voice tag followed by a pending voice message (see
It should be noted that the present invention relates to an audio recording device. Preferably, the method of the present invention is implemented within a mobile phone. Furthermore, it can be understood that other implementations are possible within the scope of the invention. Thus the scope of the present invention includes any recording device capable of selectively storing audio data received from a user in response to detecting a similarity with voice tags previously stored in the recording device.
Having described the invention with regard to certain specific embodiments thereof, it is to be understood that the description is not meant as a limitation, since further modifications will now suggest themselves to those skilled in the art, and it is intended to cover such modifications as fall within the scope of the appended claims.
This patent application claims the benefit of U.S. Provisional Patent Application No. 60/803,372 filed Apr. 29, 2006.
Number | Date | Country
---|---|---
60803372 | May 2006 | US