The present invention relates generally to storage of spoken information for subsequent retrieval.
International Business Machines Corp. (IBM) of Armonk, N.Y. has been at the forefront of new paradigms in business computing. One area of development has been personal assistance devices that aid or supplement a user's memory, for example cell phones, PDAs (personal digital assistants) and other memory devices. A particular focus has been the audio recording of speech in such devices. Improvements in digital audio recording technology include compression that increases the storage capacity of a digital recording device by recognizing silence. Recognizing silence allows the device to ignore that portion of the signal, or otherwise treat it in a manner that decreases the overall size of the audio file. Improvements have also been made in distinguishing background noise from audio that the user desires to have captured. Recognition of silence has also been used to initiate or terminate a recording session.
One major limitation of these prior art devices lies in the inefficiency of retrieving information stored in this manner. Improved storage of audio-recorded information for easier retrieval is desired.
A better understanding of the present invention can be obtained when the following detailed description of the disclosed embodiments is considered in conjunction with the following drawings, in which:
Although described with particular reference to a memory assistance device, the claimed subject matter can be implemented in any electronic system in which it is desired to record speech into more easily accessible formats. Those with skill in the computing arts will recognize that the disclosed embodiments have relevance to a wide variety of computing environments in addition to those described below. In addition, the methods of the disclosed invention can be implemented in software, hardware, or a combination of software and hardware. The hardware portion can be implemented using specialized logic; the software portion can be stored in a memory and executed by a suitable instruction execution system such as a microprocessor, personal computer (PC) or mainframe.
In the context of this document, a “memory” or “recording medium” can be any means that contains, stores, communicates, propagates, or transports the program and/or data for use by or in conjunction with an instruction execution system, apparatus or device. Memory and recording medium can be, but are not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device. Memory and recording medium also include, but are not limited to, for example, the following: a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), and a portable compact disk read-only memory or another suitable medium upon which a program and/or data may be stored.
Turning now to the figures,
In most of the embodiments described herein the speaker system 16 is also employed to cue the user as will be described in greater detail below. The speaker system 16 may also be used to alert the user about system status—such as an alert that the memory is full or near full.
The system illustrated in
Typically an extended audio segment 46 is directly associated with a segment name 44. In practice these segment names 44 serve like a table of contents or index for the extended segments 46. By scanning the segment names 44 the user can more readily identify an extended audio segment that contains information that the user desires to retrieve. Systems and methods for populating the extended segments and segment names are described in greater detail in reference to
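By way of illustration only, the index-like relationship between segment names 44 and extended audio segments 46 can be sketched as a pair of parallel lists. The class name `SegmentStore` and its methods are hypothetical and are not part of the disclosed embodiments; audio is represented as raw bytes purely for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class SegmentStore:
    """Hypothetical store pairing each segment name with one extended segment."""

    names: list = field(default_factory=list)     # short name recordings (44)
    extended: list = field(default_factory=list)  # extended recordings (46)

    def add(self, name_audio: bytes, extended_audio: bytes) -> int:
        """Store a name and its extended segment at the same index."""
        self.names.append(name_audio)
        self.extended.append(extended_audio)
        return len(self.names) - 1

    def scan_names(self):
        """Yield (index, name_audio) so the names can be played back in order."""
        yield from enumerate(self.names)

    def retrieve(self, index: int) -> bytes:
        """Return the extended segment mapped to the chosen name."""
        return self.extended[index]
```

In such a sketch, a user interface would play back the results of `scan_names()` and call `retrieve()` only for the name the user selects, so the longer recordings need not be scanned.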
The unit 12 illustrated in
Meanwhile the trigger detection system 64 continues to assess the information coming into the buffer 62, and the user control interface 68 continues to monitor for input from the user. After the section is done recording, either by instruction from the user or by the firing of a new trigger, the user is prompted by the user control interface 68 via the control I/O to record a segment name 44. While the segment name is being recorded, trigger detection 64 is ignored. In some embodiments the segment name is mapped to the extended segment memory 46 that has just been placed in a memory location. In other embodiments both the segment name and the extended audio signal are recorded in their respective memory locations after the segment name has been recorded and placed in the temporary memory. In any case, it is preferable that the segment name is mapped directly to its corresponding extended audio segment. In some devices the extended memory segments and segment names are stored in the same memory device as illustrated in
If the trigger is identified 96 and the system is already recording 110, then the recording continues to be stored in the temporary memory 80.
Whether or not the trigger is identified, the buffer continues to be read 92 and processed 94 by the audio trigger detection routine(s).
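The read, process and store steps described above can be sketched, purely as an illustration, as a loop over buffered audio chunks. The function name `process_chunks` and the `is_trigger` callback are hypothetical. The sketch also assumes that the first trigger both starts storage and issues the user prompt, which is one reading of the flow rather than a statement of the disclosed implementation.

```python
def process_chunks(chunks, is_trigger):
    """Run every chunk through trigger detection; store chunks once recording.

    `chunks` is an iterable of audio chunks read from the buffer, and
    `is_trigger` is a hypothetical detection routine returning True or False.
    Returns the temporary memory contents and the chunk indices at which a
    prompt to the user would be issued.
    """
    temp_memory = []
    prompts = []
    recording = False
    for i, chunk in enumerate(chunks):   # read buffer (92)
        fired = is_trigger(chunk)        # process (94); trigger identified? (96)
        if fired and not recording:
            recording = True             # assumption: first trigger starts storage
            prompts.append(i)
        if recording:
            temp_memory.append(chunk)    # store in temporary memory (80)
    return temp_memory, prompts
```

Note that detection runs on every chunk regardless of the recording state, and a trigger that fires while recording is already under way simply leaves storage running.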
While the audio signal is being stored 104 in the temporary memory 80, the system waits for the user to reply to the user prompt and confirm whether to continue storing the audio recording. If the user confirms 120, then recording and storage continue 122 until a stop-input command is entered by the user 124. If a stop-input is entered by the user 124, then the user is prompted to record a segment name 126, and the segment name is recorded and stored 128, linked/mapped to the extended audio segment in the system memory. Although not shown in this figure, the preferred embodiment includes a timeout that, after a predetermined time limit, signals the user to prompt the device if the user wants the system to continue recording information in the temporary buffer. If so, the system begins to store the temp file in memory to make more room in the temp file. In other embodiments the user is prompted to record a segment name and forced to start a new segment if he/she wants to continue recording.
If the user does not prompt the device to proceed with recording 130, and a predetermined period of time passes 132, then the system stops recording and the temporary memory is cleared 134.
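The confirm, stop and timeout branches described above can be illustrated with the following minimal sketch. The function names, the representation of the user's reply as a number of seconds, and the dictionary layout of a stored segment are all assumptions made for the example and do not appear in the disclosed embodiments.

```python
def resolve_pending_recording(temp_memory, confirmed_at, timeout_s):
    """Decide what happens to audio held in temporary memory.

    `confirmed_at` is the number of seconds after the prompt at which the
    user confirmed (120), or None if the user never did (130).
    Returns "continue" or "discard".
    """
    if confirmed_at is not None and confirmed_at <= timeout_s:
        return "continue"        # recording and storage continue (122)
    temp_memory.clear()          # timeout passed (132): clear temp memory (134)
    return "discard"


def finish_segment(temp_memory, name_audio, system_memory):
    """On stop-input (124), store the name (126, 128) mapped to its segment."""
    system_memory.append({"name": name_audio, "extended": list(temp_memory)})
    temp_memory.clear()
```

Keeping the segment name and the extended audio in a single record is one simple way to keep the two directly mapped to each other. Separate memories linked by an index would serve equally well.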
If there is no begin record command, the audio trigger detection program applies a routine for detecting a silence transition in speech 152. Routines for detecting silence transitions are well known in the art. It is preferable to use a routine that accounts for background noise in determining such transitions; such routines are also well known in the art. See, for example, U.S. Pat. Nos. 4,130,739 and 6,029,127. If a silence transition is detected, a detection significance flag is set 154 to “low.”
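As a rough illustration only, a silence transition can be found by comparing the average level of each frame against a threshold derived from an estimate of the background noise. This is a generic energy-threshold sketch and is not the routine of either patent cited above. The function name, the `margin` parameter and the assumption that a noise-floor estimate is already available are all hypothetical.

```python
def silence_transitions(samples, frame_len, noise_floor, margin=2.0):
    """Return frame indices where the signal switches between speech and silence.

    A frame counts as silence when its mean absolute amplitude is at most
    `margin * noise_floor`, so steady background noise is not mistaken
    for speech. Trailing samples that do not fill a frame are ignored.
    """
    is_silent = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        level = sum(abs(s) for s in frame) / frame_len
        is_silent.append(level <= margin * noise_floor)
    # A transition is any frame whose state differs from the previous frame.
    return [i for i in range(1, len(is_silent)) if is_silent[i] != is_silent[i - 1]]
```

A practical routine would typically update the noise-floor estimate over time and require several consecutive frames before declaring a transition. Both refinements are omitted here for brevity.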
Then a detection routine is used to detect whether there is a change in speakers 156. Routines for distinguishing between different speakers' audio signatures are well known in the art. Alternative embodiments do not distinguish between speakers.
If there is a change in speakers 156 and the speaker mentions a number 158, a significance flag is set to high 160. Likewise, if there is a change in speakers 156 and the speaker mentions a proper name 162, then a significance flag is set to high 164. Routines for recognizing numbers spoken in a digital audio signal are well known in the art. In alternative embodiments, detection trigger significance flag settings may be raised even if there is no change in speaker preceding the mention of a number or proper name. In yet other alternative embodiments, more complex triggers can be constructed using Grammar/Syntax parsers such as those described in U.S. Pat. No. 6,665,642.
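The flag-setting logic described above can be summarized in a short sketch. The function name and its boolean inputs are hypothetical. The sketch assumes that the individual detectors (silence transition, speaker change, number and proper-name recognition) have already run, and that a "high" setting takes precedence over "low".

```python
def significance(silence_transition, speaker_changed, mentions_number,
                 mentions_proper_name, require_speaker_change=True):
    """Return "high", "low", or None for a stretch of audio.

    With `require_speaker_change=False` the function models the alternative
    embodiment in which a number or proper name raises the flag even without
    a preceding change in speaker.
    """
    keyword = mentions_number or mentions_proper_name
    if keyword and (speaker_changed or not require_speaker_change):
        return "high"    # flag set to high (160, 164)
    if silence_transition:
        return "low"     # flag set to low (154)
    return None          # no trigger condition met
```

For example, a silence transition followed by a new speaker who mentions a number yields "high", while the same number spoken without a change in speaker yields only "low" under the default setting.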
In the embodiment shown in
Although not shown in
In the embodiment shown in
While the invention has been shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention, including but not limited to additional, less or modified elements and/or additional, less or modified blocks performed in the same or a different order.