1. Technical Field
The present disclosure relates to audio recording devices and methods thereof and, particularly, to a voice recording device and a voice recording method.
2. Description of Related Art
Typically, speech in a meeting is received through a microphone and recorded to an electronic audio file without any index. Finding a specific speaker's portion in a recording of many speakers is therefore inconvenient.
The components of the drawings are not necessarily drawn to scale, the emphasis instead being placed upon clearly illustrating the principles of a voice recording device and a method thereof. Moreover, in the drawings, like reference numerals designate corresponding parts throughout several views.
Referring to
The voice receiving unit 10 receives voice signals. In the embodiment, the voice receiving unit 10 is a microphone.
The storage unit 20 stores a number of voice models and personal information associated with each of the voice models. In the embodiment, the personal information associated with one voice model includes a name, an image, and so on.
The processor 30 includes a voice recording module 310, an extracting module 320, an identifying module 330, a document generating module 340, and a registration module 350.
The voice recording module 310 is configured to record voice signals received by the voice receiving unit 10, and store the received voice signals to the storage unit 20.
The extracting module 320 is configured to extract a speaker's voice features from the stored voice signals. In the embodiment, the features are extracted as Mel-Frequency Cepstral Coefficients (MFCCs).
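The following is a minimal sketch of MFCC extraction, not the embodiment's implementation. It uses only the Python standard library and a naive DFT for readability. The function names, frame length, hop size, filter count, and coefficient count are illustrative assumptions that do not appear in the disclosure.

```python
import cmath
import math


def hz_to_mel(hz):
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of one Hamming-windowed frame, via a naive DFT."""
    n = len(frame)
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, s in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))) ** 2 / n
            for k in range(n // 2 + 1)]


def mel_filterbank(num_filters, frame_len, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    bins = [int((frame_len + 1) * hz / sample_rate) for hz in edges_hz]
    bank = []
    for m in range(1, num_filters + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (frame_len // 2 + 1)
        for k in range(left, center):       # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):      # falling edge, peak at center
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc(signal, sample_rate, frame_len=64, hop=32, num_filters=12, num_coeffs=6):
    """Return one list of cepstral coefficients per frame (all sizes are assumed values)."""
    bank = mel_filterbank(num_filters, frame_len, sample_rate)
    features = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        spectrum = power_spectrum(signal[start:start + frame_len])
        # Log of each mel-band energy, floored to avoid log(0).
        log_energies = [math.log(max(sum(w * p for w, p in zip(row, spectrum)), 1e-10))
                        for row in bank]
        # DCT-II of the log energies gives the cepstral coefficients.
        features.append([sum(e * math.cos(math.pi * j * (m + 0.5) / num_filters)
                             for m, e in enumerate(log_energies))
                         for j in range(num_coeffs)])
    return features


# Example: a 440 Hz tone sampled at 8 kHz (synthetic input, for illustration only).
tone = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(256)]
frames = mfcc(tone, 8000)
```

A practical device would likely use an FFT and longer frames (for example 20 to 30 ms); the small sizes here only keep the naive DFT fast.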
The identifying module 330 is configured to compare the extracted features with the voice models to find a match.
The document generating module 340 is configured to obtain, from the storage unit 20, the personal information associated with the matched voice model, obtain a storage path of the voice signals, generate an index document according to the personal information and the storage path of the voice signals, and store the index document in the storage unit 20. The document generating module 340 may be further configured to record the duration over which a speaker's voice signals are received, and generate the index document according to the personal information, the duration, and the storage path of the voice signals. The duration may include the beginning time and the end time of receiving a speaker's voice signals. For example, an index document may include “Ann, 9:00-9:10, D:\\Voice Signal.”
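As an illustration of the index document described above, the sketch below formats one entry from a name, a beginning time, an end time, and a storage path. The class name `IndexEntry`, its field names, and the comma-separated layout are assumptions modeled on the example entry; the disclosure does not specify a file format.

```python
from dataclasses import dataclass
from datetime import time


@dataclass
class IndexEntry:
    """One line of a hypothetical index document."""
    name: str            # from the personal information of the matched voice model
    start: time          # beginning time of receiving the speaker's voice signals
    end: time            # end time of receiving the speaker's voice signals
    storage_path: str    # where the recorded voice signals are stored

    def to_line(self):
        span = (f"{self.start.hour}:{self.start.minute:02d}-"
                f"{self.end.hour}:{self.end.minute:02d}")
        return f"{self.name}, {span}, {self.storage_path}"


# The path is copied verbatim from the example in the text.
entry = IndexEntry("Ann", time(9, 0), time(9, 10), r"D:\\Voice Signal")
print(entry.to_line())
```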
If there is no match, the registration module 350 generates a speaker voice model according to the extracted features, associates input personal information with the generated voice model, and stores the generated voice model and the associated personal information in the storage unit 20. The document generating module 340 then generates an index document as described above. In the embodiment, the voice model is generated as a Gaussian Mixture Model (GMM).
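One way to compare extracted features with stored GMM voice models is sketched below. The disclosure names GMM only as the method for generating a voice model; scoring by average log-likelihood, the diagonal covariances, the acceptance threshold, and all function names are assumptions made for illustration. Fitting the model (for example by expectation-maximization) is omitted.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of x under a Gaussian with diagonal covariance."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, mean, var) for w, mean, var in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify(frames, models, threshold):
    """Return the best-scoring speaker, or None when no model reaches the threshold."""
    best_name, best_score = None, -math.inf
    for name, gmm in models.items():
        score = gmm_log_likelihood(frames, gmm)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


# Toy two-dimensional models with made-up parameters.
models = {
    "Ann": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "Bob": [(1.0, [5.0, 5.0], [1.0, 1.0])],
}
known = identify([[0.1, -0.1], [0.0, 0.2]], models, threshold=-10.0)
unknown = identify([[20.0, 20.0]], models, threshold=-10.0)
```

A result of `None` corresponds to the no-match case, in which the registration module would create a new voice model.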
Referring to
In step S201, the voice recording module 310 records the voice signals received by the voice receiving unit 10, and stores the recorded voice signals to the storage unit 20.
In step S202, the extracting module 320 extracts a speaker's voice features from the stored voice signals.
In step S203, the identifying module 330 compares the extracted features with the voice models to find a match. If no match is found, the procedure goes to step S204. Otherwise, the procedure goes to step S205.
In step S204, the registration module 350 generates a speaker voice model according to the extracted features, associates the generated voice model with input personal information, and stores the generated voice model and the associated personal information in the storage unit 20. The procedure then goes to step S205.
In step S205, the document generating module 340 obtains, from the storage unit 20, the personal information associated with the matched or newly generated voice model, obtains the storage path of the voice signals, generates an index document according to the obtained personal information and the obtained storage path of the voice signals, and stores the index document in the storage unit 20. The document generating module 340 may further record the duration of receiving a speaker's voice signals, and generate the index document according to the obtained personal information, the obtained storage path of the voice signals, and the recorded duration.
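The control flow of steps S203 to S205 can be sketched as follows, assuming steps S201 and S202 have already produced the stored recording and its features. The function `index_recording`, the dictionary layout of `store`, and the injected callables `identify`, `enroll`, and `prompt_info` are hypothetical names introduced for this sketch, not elements of the disclosure.

```python
def index_recording(features, storage_path, store, identify, enroll, prompt_info):
    """Run S203 to S205 for one recording and return the new index entry."""
    # S203: compare the extracted features with the stored voice models.
    speaker = identify(features, store["models"])

    if speaker is None:
        # S204: no match, so register a new voice model with input personal information.
        info = prompt_info()
        speaker = info["name"]
        store["models"][speaker] = enroll(features)
        store["info"][speaker] = info

    # S205: build the index entry from the personal information and the storage path.
    entry = f'{store["info"][speaker]["name"]}, {storage_path}'
    store["index"].append(entry)
    return entry


# Stub collaborators, for illustration only.
store = {"models": {"Ann": "ann-model"}, "info": {"Ann": {"name": "Ann"}}, "index": []}
match_ann = lambda feats, models: "Ann" if feats == "ann-features" else None
first = index_recording("ann-features", "rec1.wav", store,
                        match_ann, lambda f: "new-model", lambda: {"name": "Bob"})
second = index_recording("other-features", "rec2.wav", store,
                         match_ann, lambda f: "new-model", lambda: {"name": "Bob"})
```

Passing the matcher and the enrollment routine as arguments keeps the sketch independent of any particular feature or model type.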
In this way, when searching for a specific speaker's recorded voice in a recording of many speakers, one need only consult the index document and cue playback accordingly, rather than play and fast forward through the recording, which saves time.
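A lookup of this kind might be sketched as below, assuming each index entry is a line of the form "name, begin-end, path" as in the example entry given earlier. The function name `find_segments`, the sample lines, and the file paths are made up for illustration.

```python
def find_segments(index_lines, speaker):
    """Return (begin, end, path) for every index entry belonging to the speaker."""
    hits = []
    for line in index_lines:
        # Split on the first two commas only, so a path may itself contain commas.
        name, span, path = [part.strip() for part in line.split(",", 2)]
        if name == speaker:
            begin, end = span.split("-", 1)
            hits.append((begin, end, path))
    return hits


# Hypothetical index document contents.
index_lines = [
    "Ann, 9:00-9:10, meeting.wav",
    "Bob, 9:10-9:25, meeting.wav",
    "Ann, 9:25-9:30, meeting.wav",
]
ann_segments = find_segments(index_lines, "Ann")
```

Each returned tuple gives the times at which playback of the named file could be cued.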
Although the present disclosure has been specifically described on the basis of the exemplary embodiment thereof, the disclosure is not to be construed as being limited thereto. Various changes or modifications may be made to the embodiment without departing from the scope and spirit of the disclosure.
Number | Date | Country | Kind |
---|---|---|---|
99125821 | Aug 2010 | TW | national |