The present invention generally relates to systems and methods for signal processing and analysis and more particularly to systems and methods for detecting fundamental frequencies of pseudo-periodic signals in noisy environments.
Voice recognition systems require optimal noise estimation and reduction for distinguishing speech related signal characteristics from noise related signals. Noise can result from environmental sources (such as other speakers, background noises etc.) and/or from the detection system itself (e.g. microphone quality, processing methods and equipment, etc.). Speech detection systems use various methods for distinguishing speech related signals from noise based on audio recording/receiving of speech related acoustic signals (e.g. using an acoustic microphone system for detection of sound).
Two such known methods are Log-Spectral Amplitude (LSA) estimation and optimally modified LSA (OMLSA) estimation. LSA estimators minimize the mean square error of the log spectra, based on Gaussian statistical models (see “Speech Enhancement for Non-Stationary Noise Environments”, Israel Cohen and Baruch Berdugo, Signal Processing, vol. 81, pp. 2403-2418, November 2001, referred to hereinafter as Cohen 1, which is incorporated by reference in its entirety into this application). OMLSA is based on the time-frequency distribution of the signal-to-noise ratio (SNR) of the detected audio signal.
The Minima Controlled Recursive Averaging (MCRA) noise estimation approach is a method for noise estimation used for speech enhancement or detection, which combines minimum tracking with recursive averaging, as described in Cohen 1, page 2405. This algorithm uses probability functions for estimating speech presence and for controlling adaptation of the noise spectrum, by determining the ratio between the local energy of the noisy signal and its minimum within a specified time window. An improved MCRA (IMCRA) is described in another paper by Israel Cohen (see “Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging”, Israel Cohen, IEEE Trans. Speech Audio Processing, vol. 11, no. 5, pp. 466-475, September 2003, referred to hereinafter as Cohen 2, which is incorporated by reference in its entirety into this application). “The IMCRA involves averaging past spectral power values, using a time-varying frequency-dependent smoothing parameter that is adjusted by the signal presence probability.” (see Cohen 2, abstract).
The present invention, according to some embodiments thereof, provides a method and a system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise.
According to some embodiments of the present invention, there is provided a method of tracking fundamental frequencies of pseudo-periodic signals in the presence of noise. The method includes receiving a time-frequency representation of signals measured in a predefined environment; estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation by tracking detections of harmonious frequencies in the time-frequency representation over time; and outputting each respective estimated fundamental frequency associated with the pseudo-periodic signal of each respective time frame.
According to some aspects of the present invention, the tracking of detections of fundamental frequencies is a recursive process done in real time or in near real time on a frame-by-frame basis wherein a respective fundamental frequency is tracked and identified in each time frame of the time-frequency representation.
Optionally, the estimation and tracking of the fundamental frequency of each respective time frame includes: identifying harmonious frequencies in each time frame of the time-frequency representation; checking correlations between each identified harmonious frequency and harmonious frequencies identified in preceding time frames; allocating a new tracker to each respective identified uncorrelated harmonious frequency; updating information relating to each tracker including number of identified correlations associated with each tracker; and determining the fundamental frequency of the respective time frame by selecting one of these trackers, according to predefined rules associated with accumulated information of the trackers, including the number of correlations associated with each tracker.
Optionally, updating of the information comprises updating predefined fields of the trackers, said fields including at least one of: a signal power field, indicative of the average signal intensity of each tracker; a detections field, indicative of the number of times the associated tracker has been detected, which is indicative of the number of correlations of the respective tracker; a frequency value field, indicative of the average value of the frequency associated with each respective tracker; a frames field, which is an array field associated with each respective tracker that has been identified as a fundamental frequency, wherein each component in the array is indicative of a time frame number in which the tracker has been tracked as the fundamental frequency; and/or a last update field, indicative of the last time frame number in which the respective tracker has been tracked.
According to some embodiments, each detected fundamental frequency of the respective time frame is determined by selecting a tracker that has an optimal combination of signal power, using the signal power field, and number of detections, using the detections field, with respect to a duration level of the respective tracker calculated according to its frames field, where the duration level is indicative of the number of successive detections of said respective tracker.
The method may optionally further include identifying a durable fundamental frequency (DFF) out of the trackers, using the duration level, and operating a reduced estimation and tracking procedure upon identification of the DFF, for tracking only the identified DFF.
The identification of a respective DFF may optionally be carried out by checking whether the number of detections of each tracker, using its respective detections field, exceeds a predefined threshold number; a tracker that does so is identified as the continuous fundamental frequency tracker, and all other trackers are rejected. The reduced tracking procedure comprises identifying new harmonious frequencies in the respective current time frame and checking their correlation with the continuous fundamental frequency, wherein correlated detections are used for updating the fields associated with the respective DFF. The reduced tracking procedure may be terminated upon identifying a discontinuity of the continuous fundamental frequency, using the associated fields, where the termination allows reverting to the previous procedure.
According to some embodiments, the method further includes: receiving a detected signal input in real time or near real time; and operating a signal transformation, such as a short-time Fourier transform (STFT), over the received signal input in real time, where the transformation converts the respective signal representation into the respective time-frequency representation.
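As an illustration of the transformation step, the following is a minimal Python sketch of a magnitude STFT. It assumes a periodic Hann window, a fixed frame length and hop size, and a naive DFT so that the example stays self-contained; the function names and parameters are hypothetical, and a practical real-time implementation would use an FFT.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n (the window choice is an assumption)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def stft(signal: list[float], frame_len: int, hop: int) -> list[list[float]]:
    """Return a magnitude spectrogram: one list of |X(l, k)| per time frame l.

    Uses a naive O(N^2) DFT per frame for illustration only.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        # Keep only the non-negative frequency bins 0 .. frame_len // 2.
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames
```

For a 1000 Hz tone sampled at 8 kHz with 64-sample frames, the bin spacing is 125 Hz, so every frame of the resulting representation peaks at bin 8.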
Noise Spectrum Evaluation and/or peak detection may further be implemented, in real time or in near real time over the time-frequency representation. The Noise Spectrum Evaluation may include evaluation techniques based on minima controlled recursive averaging (MCRA) or improved MCRA.
According to some embodiments, the trackers may be updated before determining a respective fundamental frequency of the respective time frame, wherein the updating of the trackers includes at least one of: checking for trackers that are harmonious to one another, according to predefined rules, using the frequency value field, and merging such identified harmonious trackers; checking for trackers that have secondary correlations with one another, according to predefined rules, using the frequency value field, and merging such identified correlated trackers; and/or identifying outdated trackers, using last update field, and discarding all trackers that are identified as outdated.
Optionally, the pseudo-periodic signal is an acoustic signal indicative of human speech measured in the noisy environment, wherein the acoustic signal is acquired by using at least one signal measurement system. The fundamental frequency identified for each time frame, and information associated therewith, may be used for enhancing speech detection of the acoustic signal by indicating the pitch of the detected speech in each respective time frame, wherein the respective pitch is proportional to the fundamental frequency of the respective time frame.
The signal measurement system may include at least one optical or acoustic device enabling optical or acoustic measurement and representation of said acoustic signals in said noisy environment. For example, the signal measurement system may include at least one optical microphone, which is based on optical vibrometry detection of sound.
According to some embodiments of the present invention there is provided a system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise. The system includes: a signal measurement system for measuring pseudo-periodic signals in a predefined environment; at least one processing unit, which receives measured pseudo-periodic signals in real time or near real time from the signal measurement system, processes the signal for obtaining a time-frequency representation thereof in real time or near real time and recursively estimates and tracks a respective fundamental frequency of each respective pseudo-periodic signal at each time frame of said time-frequency representation by tracking detections of harmonious frequencies in said time-frequency representation over time. The processing unit can output the respective estimated fundamental frequency associated with the pseudo-periodic signal of the respective time frame.
Optionally, the signal measurement system comprises an optical measurement system for optically detecting the pseudo-periodic signals in the environment. The optical measurement system may include an optical microphone enabling vibrometry-based detection of acoustic signals including speech related signals, where the optical microphone is located in proximity to vibrating surfaces of a respective speaker.
According to some embodiments of the present invention, the system is operatively associated with at least one audio system enabling additional acoustic measurement of the acoustic signals in the environment, wherein fundamental frequencies estimated from the respective optically measured signals are used to improve the corresponding detection of acoustic signals measured and outputted by the audio system, for voice activity detection (VAD) or any other purpose.
According to some embodiments, the estimation and tracking of the fundamental frequency of each respective time frame is carried out by: identifying harmonious frequencies in each time frame of the time-frequency representation; checking correlations between each identified harmonious frequency and harmonious frequencies identified in preceding time frames; allocating a new tracker to each respective identified uncorrelated harmonious frequency; updating information relating to each tracker including number of identified correlations associated with each tracker; and determining said fundamental frequency of the respective time frame by selecting a tracker according to accumulated information including the number of correlations associated therewith.
The system may include one or more designated modules, such as a fundamental frequency detection module for detecting, tracking and outputting the fundamental frequencies, where the fundamental frequency detection module is a software application operated by the processing unit.
In the following detailed description of various embodiments, reference is made to the accompanying drawings that form a part thereof, and in which are shown by way of illustration specific embodiments in which the invention may be practiced. It is understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present invention.
The present invention, in some embodiments thereof, provides methods and systems for robust estimation and tracking of fundamental frequencies of pseudo-periodic signals in non-stationary noisy environments. The methods and systems enable receiving signals measured in a noisy environment and/or a time-frequency representation of those measured signals, and processing these signals to identify, at each given time frame, the respective fundamental frequency of the pseudo-periodic signal within the measured (noisy) signal, thereby reducing and “cleaning out” noises that are unrelated to the pseudo-periodic signal and identifying its fundamental frequency. The pseudo-periodic signal (e.g. a speech related acoustic signal) is measured by one or more signal measurement systems, such as one or more acoustic and/or optical microphones, along with noises of various types and behaviors depending on the type of the pseudo-periodic signal, the measurement system, and the environmental noises and effects. The noise can originate from external environmental sources, such as other sound sources, and/or may be created by the detection devices.
According to some embodiments of the present invention, the measured signals are analyzed and/or processed by the estimation and tracking system for recursively estimating and tracking a fundamental frequency of the respective pseudo-periodic signal at each time frame. Each respective fundamental frequency is identified by tracking detections of harmonious frequencies in a time-frequency representation of the measured signal over time, and an estimated fundamental frequency associated with the pseudo-periodic signal of the respective time frame is outputted. Each tracked fundamental frequency and/or any other associated information may be automatically stored in one or more memory units (e.g. computer data storage) for later use, for example for speech enhancement when the acquired acoustic signal is associated with speech, or for any other purpose.
This process is recursive and carried out on a frame-by-frame basis, so that accumulated information regarding the tracked fundamental frequency and other detected harmonious frequencies of preceding time frames is used for deciding the fundamental frequency of each current time frame, allowing the frequency value of the fundamental frequency to be refined and corrected over time.
These methods and systems are particularly yet not exclusively efficient for speech detection/enhancement and/or voice activity detection (VAD) that can be used for various purposes such as for speech recognition, speech parts recognition (e.g. identification of beginning and ending of each word or phoneme of speech), speaker identification (e.g. by identifying typical speech pitch frequency of each speaker) as well as for noise reduction.
The term “pseudo-periodic signal” refers to any signal that shows cyclic patterns that can be represented by pseudo-periodic functions, such as, for example, speech and/or music related acoustic signals.
The term “fundamental frequency” is defined as the lowest frequency of a periodic and/or pseudo-periodic waveform.
The terms “harmonious frequencies”, “harmonies” and “harmonics” each refer to all frequencies that are integer multiples of the same fundamental frequency.
According to some embodiments of the present invention, the estimation of the fundamental frequency of each time frame includes identifying harmonious frequencies in each time frame of a time-frequency representation of the measured signal; checking correlations between each identified harmonious frequency and harmonious frequencies identified in preceding time frames (using past detected and tracked frequencies); allocating a new tracker to each respective identified uncorrelated harmonious frequency; updating information relating to each tracker, including the number of identified correlations associated with each tracker; and determining the fundamental frequency of the respective time frame according to predefined conditions and rules, for instance by selecting the tracker of a frequency that exceeds a predefined threshold intensity value and that has substantially the maximal number of consecutive correlations up to the respective time frame.
In this way, a previously detected fundamental frequency and other candidate fundamental frequencies are tracked over time, in real time or near real time. This tracking can be used for various purposes, depending, inter alia, on the type of pseudo-periodic signal (speech related acoustic signal, optical signal, digital signal, etc.) and on system requirements.
For example, for processing of acoustic signals acquired in a noisy environment for detection/enhancement of the speech of a single speaker, the methods and systems described in this document can assist in noise reduction as well as in speech recognition, VAD and/or speaker identification. In this example, the fundamental frequency of speech is referred to as the pitch. Pitch detection can enhance speaker identification, by identifying the current typical pitch of the relevant speaker; speech recognition, by identifying speech related pitches (e.g. speech related typical frequencies); and recognition of speech segments (e.g. beginnings and endings of words, syllables, phonemes and the like), since tracking speech related frequencies can indicate where no such frequencies are detected over time, signifying no speech and therefore the end of a speech segment.
According to some embodiments of the present invention, there is provided a software application which carries out most or all of the steps of the method for detection and tracking of the fundamental frequencies. This application can receive signals measured in the non-stationary noisy environment from a signal measurement system, create a time-frequency representation of those signals, e.g. by using one or more mathematical transformation operators (such as one or more Fourier transform operators), and use this time-frequency representation for detecting and tracking the fundamental frequency of the pseudo-periodic signal associated with the measured signal at each time frame. The application is designed, in some embodiments of the present invention, to work frame by frame, where for each time frame the fundamental frequency is detected while keeping a record of information relating to past and present tracked candidate and/or identified fundamental frequencies in a recursive manner, allowing continuous tracking of those identified frequencies by using the accumulated information relating thereto.
According to some embodiments of the present invention, the signal detection system includes an optical and/or an acoustic detector such as an optical and/or acoustic microphone enabling detecting acoustic signals including a speaker's voice signals. According to some embodiments, the optical microphone enables vibrometry-based detection of speech related vibrations of the speaker, where an optical sensor is placed in proximity to vibrating surfaces of the speaker. The optical/acoustic signal (the optical output representation of the detected acoustic signal is illustrated in
The application is optionally operated by a processor (e.g. a computerized system such as a server computer, a PC, a laptop or any other processor system or device known in the art). The processor may be separate from the signal measurement system and connected thereto for receiving the detected signal in real time through one or more communication links and/or devices (e.g. through a wired digital or wireless connection). Data is transmitted from the signal measurement system to the processor in real time or near real time, allowing the application or another transformation module (e.g. using on-chip Fourier transform operators) to convert this signal data into a corresponding time-frequency representation, likewise in real time or near real time. The application may output the resulting estimated fundamental frequency and associated information, also in real time or near real time. The output data may then be stored and/or further processed depending on system definitions and requirements.
Reference is now made to
A time-frequency representation of signals detected 101 in the environment in real time or near real time is received or created by the application on a frame-by-frame basis. The received time-frequency representation is used for recursively estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation 102, by tracking detections of harmonious frequencies in said time-frequency representation over time. The estimated fundamental frequency of each respective time frame is outputted by the application 103, optionally along with information relating thereto, such as its estimated value, an error/probability rate or grade, and the like. The outputted fundamental frequency and, optionally, its related information can be stored and/or used by other algorithms or processes.
For example, in case of using this process for noise reduction of acoustic signals, the fundamental frequencies may be used in real time for noise reduction and outputting of a clearer noise-reduced acoustic signal of the speaker, using output audio devices and systems such as audio speakers. Alternatively or additionally, the output fundamental frequencies may be used for VAD purposes, speech and/or speech segments recognition as will be further explained in this document.
Reference is now made to
The process includes receiving, from a signal measurement system 11, data indicative of an acoustic signal that includes a voice related signal of a speaker (which is the pseudo-periodic signal to be identified). The acoustic signal may be optically acquired using, for example, an optical vibrometer laser system, which includes an optical laser-based sensor located in the speaker's area. Additionally or alternatively, the acoustic signal is acoustically measured using an audio receiver, such as a microphone, for measuring sounds from the environment, including the voice of the speaker, and converting the measured sound into electric/digital signals.
The signal data may include the signal intensity or intensity related value for the respective time frame as acquired in real time by the signal measurement system (which may be for instance an optical microphone such as illustrated in
Optionally, the time-frequency signal representation associated with each time frame $t_l$, where “l” is the frame index, is filtered for initial noise reduction 13 by using one or more “filter operators”, which may be software-based operators.
According to some embodiments, noise spectrum evaluation may be operated for evaluating the noise level of each frequency value of each time frame, thereby excluding frequency measures that are identified as “noise” in the time-frequency representation. For example, if optically acquired signals are used, the SNR value of the optical signal may be compared to an evaluated corresponding SNR value thereof, e.g. by subtracting these values, and the frequency measure excluded if the difference between these values does not exceed a predefined threshold. Known noise spectrum evaluation processes and algorithms, such as MCRA or IMCRA, may be used to calculate each evaluated SNR value.
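A minimal sketch of the exclusion rule described above follows. It assumes that the evaluated noise power of each bin has already been obtained (e.g. by MCRA or IMCRA, neither of which is implemented here) and reads the difference between the two values as a level difference in dB; the function name and the 6 dB default threshold are hypothetical.

```python
import math


def exclude_noise_bins(power: list[float], noise_est: list[float],
                       threshold_db: float = 6.0) -> list[float]:
    """Zero out bins whose level does not exceed the noise estimate by threshold_db.

    power     -- |X(l, k)|^2 for the current frame l
    noise_est -- evaluated noise power per bin (assumed given, e.g. from MCRA/IMCRA)
    """
    out = []
    for p, n in zip(power, noise_est):
        if p <= 0.0:
            snr_db = -math.inf
        elif n <= 0.0:
            snr_db = math.inf
        else:
            # Difference of the two levels in dB, i.e. an a-posteriori SNR estimate.
            snr_db = 10.0 * math.log10(p / n)
        out.append(p if snr_db > threshold_db else 0.0)
    return out
```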
Additionally or alternatively, the time-frequency representation of each time frame is further noise-reduced by using peak detection, which includes detecting the frequency peaks of each time frame and thereby excluding non-peak values from the time-frequency representation of that time frame.
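Peak picking can be sketched as follows, assuming that a bin is a peak when it is strictly larger than its immediate neighbours. The source does not specify the peak criterion, so this rule (which, for instance, drops flat-topped plateaus) and the function name are illustrative only.

```python
def keep_peaks(spectrum: list[float]) -> list[float]:
    """Keep only local maxima of one frame's spectrum; set all other bins to 0.

    Edge bins are compared with their single neighbour.
    """
    n = len(spectrum)
    out = [0.0] * n
    for k in range(n):
        left = spectrum[k - 1] if k > 0 else float("-inf")
        right = spectrum[k + 1] if k < n - 1 else float("-inf")
        if spectrum[k] > left and spectrum[k] > right:
            out[k] = spectrum[k]
    return out
```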
According to some embodiments of the present invention, in each time frame the process identifies harmonious frequencies 14, for example by searching for frequencies that are integer multiples of one another, i.e. $f_{l,i} = I \times f_{l,j}$, where “i” and “j” index two different frequency measures of the same time frame “l” and I is an integer. For example, if in time frame “l” one frequency measure is 151 Hz and another is 300 Hz, the algorithm divides the higher one by the lower one and checks, according to a predefined threshold, how close the ratio is to an integer (in this example 300/151 ≈ 1.99) to decide whether these two frequencies are harmonious to one another. If the time frame is the first time frame, as illustrated in decision box 15, the lowest frequency of each group of harmonious frequencies is allocated a tracker 16 and considered a “candidate fundamental frequency”. Non-harmonious frequencies are not tracked.
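The harmonicity test and the first-frame allocation of candidate fundamental frequencies can be sketched in Python as below. The ratio tolerance `tol` is a hypothetical value standing in for the predefined threshold, and the greedy low-to-high grouping is one possible way of finding the lowest frequency of each harmonious group, not necessarily the one used in the described embodiments.

```python
def are_harmonious(f_a: float, f_b: float, tol: float = 0.05) -> bool:
    """True if the higher frequency is close to an integer multiple of the lower one."""
    lo, hi = sorted((f_a, f_b))
    ratio = hi / lo
    nearest = round(ratio)
    return nearest >= 1 and abs(ratio - nearest) <= tol


def candidate_fundamentals(peaks_hz: list[float], tol: float = 0.05) -> list[float]:
    """Return the lowest member of every group of harmonious peaks.

    Peaks are scanned from low to high; a peak joins the first group whose
    base (lowest) frequency it is a harmonic of, otherwise it starts a new
    group. Single-member groups (non-harmonious peaks) are not tracked.
    """
    groups: list[list[float]] = []
    for f in sorted(peaks_hz):
        for g in groups:
            if are_harmonious(g[0], f, tol):
                g.append(f)
                break
        else:
            groups.append([f])
    return [g[0] for g in groups if len(g) > 1]
```

With these defaults the 151 Hz / 300 Hz pair from the example above is accepted (ratio about 1.99), while 151 Hz and 230 Hz (ratio about 1.52) are not.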
According to some embodiments, each tracker is associated with one or more fields such as: (i) an intensity value related thereto (e.g. the SNR values of all harmonious frequencies of the tracker may be taken from the measured or filtered time-frequency representation of the respective time frame and averaged); (ii) a frequency value (e.g. the frequency values of all harmonious frequencies of the tracker may be taken from the measured or filtered time-frequency representation of the respective time frame and averaged); (iii) a detection number (“N-detect”), indicative of the number of times the respective tracker has been detected (i.e. the number of frames including the respective harmonious frequency); and (iv) a last update frame, indicative of the last time frame “l” in which the respective tracker was identified and updated. These fields may be updated with every iteration, as indicated in box 19.
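One way to hold the tracker fields is a small dataclass, sketched below with hypothetical field names. Averaging raw harmonic frequencies would not yield a fundamental frequency, so this sketch assumes that field (ii) averages the fundamental implied by each harmonic, i.e. each frequency divided by its harmonic number; that reading is an assumption, not something the text states.

```python
from dataclasses import dataclass, field


@dataclass
class Tracker:
    """Hypothetical container for tracker fields (i)-(iv) plus an f0 frames array."""
    freq_hz: float        # (ii) averaged frequency value (implied fundamental)
    snr: float            # (i) averaged intensity / SNR of the tracker's harmonics
    n_detect: int = 1     # (iii) N-detect: frames in which the tracker was detected
    last_update: int = 0  # (iv) last frame index l in which it was updated
    f0_frames: list[int] = field(default_factory=list)  # frames where chosen as f0

    @classmethod
    def from_harmonics(cls, harmonics: list[tuple[float, float]], frame: int) -> "Tracker":
        """Create a tracker from the (frequency_hz, snr) pairs of one harmonious group."""
        base = min(f for f, _ in harmonics)
        # Assumption: divide each harmonic by its harmonic number before averaging.
        implied_f0 = [f / round(f / base) for f, _ in harmonics]
        return cls(
            freq_hz=sum(implied_f0) / len(implied_f0),
            snr=sum(s for _, s in harmonics) / len(harmonics),
            n_detect=1,
            last_update=frame,
        )
```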
If l > 1, correlations between previously tracked harmonious frequencies and currently identified harmonious frequencies are checked 17. For example, the difference between the frequency value of each currently identified harmonious frequency of time frame “l” and those of past identified and tracked harmonious frequencies (referred to hereinafter also as “trackers”) may be calculated, and when the difference is below a predefined threshold the two are considered “correlated”. Currently identified harmonious frequencies for which no correlated tracker is found are allocated new trackers 18, while those that are correlated are used to update the fields of their respective correlated trackers 19: each tracker's SNR and frequency values are averaged between their previous values and the average values of the harmonies associated with the newly identified harmonious frequency, N-detect is increased by one, and the last update frame is set to the current value of “l”.
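The correlation and update steps (boxes 17 to 19) might look as follows. This is a sketch under stated assumptions: each harmonious group in the frame is reduced to one (frequency, SNR) pair, correlation uses a hypothetical fixed threshold `max_diff_hz` on the frequency difference, averaging with the previous value is read as an equal-weight mean of the old and new values, and a tracker is updated at most once per frame.

```python
from dataclasses import dataclass


@dataclass
class Tracker:
    freq_hz: float
    snr: float
    n_detect: int = 1
    last_update: int = 0


def update_trackers(trackers: list[Tracker], detections: list[tuple[float, float]],
                    frame: int, max_diff_hz: float = 10.0) -> list[Tracker]:
    """One iteration of boxes 17-19 for frame l = frame (modifies and returns trackers)."""
    touched: set[int] = set()  # ids of trackers already updated in this frame
    for f, snr in detections:
        eligible = [t for t in trackers if id(t) not in touched]
        best = min(eligible, key=lambda t: abs(t.freq_hz - f), default=None)
        if best is not None and abs(best.freq_hz - f) < max_diff_hz:
            # Correlated (box 19): average with previous values and count the detection.
            best.freq_hz = (best.freq_hz + f) / 2
            best.snr = (best.snr + snr) / 2
            best.n_detect += 1
            best.last_update = frame
            touched.add(id(best))
        else:
            # Uncorrelated (box 18): allocate a new tracker.
            new = Tracker(freq_hz=f, snr=snr, last_update=frame)
            trackers.append(new)
            touched.add(id(new))
    return trackers
```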
According to some embodiments of the present invention, as illustrated in box 21, in each iteration a single fundamental frequency $f_{0,l}$ is estimated and determined according to one or more predefined conditions. For example, the fundamental frequency may be the tracker whose SNR value exceeds a predefined minimum threshold and that has the highest number of detections, namely the highest N-detect value, where its detections are determined as consecutive according to predefined rules. For this purpose, another field, “f0 frames”, indicative of the consecutiveness of the respective tracker's detections, may be added and updated at each frame after a fundamental frequency f0 is determined (also included in the operations of box 19). For example, the f0 frames field may be an array whose number of components equals the number of times the respective tracker was identified (estimated) as a fundamental frequency, where each component holds the time frame “l” in which the tracker was so identified. This can be used to track the consecutiveness level of the fundamental frequency, for determining whether a tracker exceeding the SNR threshold that has the maximal N-detect number can be a valid fundamental frequency. The f0 frames array is empty for trackers that have not yet been identified as a fundamental frequency.
To illustrate the process of selecting a fundamental frequency of each time frame indicated in box 21, let us use table 60 in
According to some embodiments, the consecutiveness level may be determined by checking the gap between the current iteration “l” and the last iteration recorded in the f0 frames array, namely by subtracting the iteration indicated in the last component of the f0 frames array from “l”.
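The selection rule of box 21, together with the consecutiveness gap and the f0 frames update of step 22, can be sketched as below. The minimum SNR, the maximum allowed gap, acceptance of trackers never chosen before, and the fallback to the next-best tracker when the highest N-detect tracker fails the consecutiveness check are hypothetical choices; the text leaves these rules open.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Tracker:
    freq_hz: float
    snr: float
    n_detect: int = 1
    last_update: int = 0
    f0_frames: list[int] = field(default_factory=list)


def f0_gap(t: Tracker, frame: int) -> Optional[int]:
    """Consecutiveness gap: l minus the last frame in which t was chosen as f0."""
    return frame - t.f0_frames[-1] if t.f0_frames else None


def select_f0(trackers: list[Tracker], frame: int,
              min_snr: float = 3.0, max_gap: int = 2) -> Optional[Tracker]:
    """Pick f0 for frame l: SNR above min_snr, highest N-detect, acceptable gap."""
    candidates = sorted((t for t in trackers if t.snr > min_snr),
                        key=lambda t: t.n_detect, reverse=True)
    for t in candidates:
        gap = f0_gap(t, frame)
        # A tracker never chosen before has no gap and is accepted (assumption).
        if gap is None or gap <= max_gap:
            t.f0_frames.append(frame)  # step 22: record frame l in the f0 frames array
            return t
    return None
```

In the test scenario below, the tracker with the highest N-detect (230 Hz) is skipped for low SNR, and the 310 Hz tracker fails the gap check (4 - 1 = 3 > 2), so the 150 Hz tracker is chosen.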
According to some embodiments, with each iteration, the f0 frames field is updated once the fundamental frequency of the respective time frame “l” is determined 22.
According to some embodiments of the present invention, another process of updating the trackers may be carried out by the algorithm 20 after updating the trackers' fields. This process may include any one or more of the following exemplary steps: (1) checking for trackers that are harmonious to one another (e.g. by checking whether the frequency value of one tracker is an integer multiple of that of another tracker), in which case the two harmonious trackers may be merged into a single tracker, with all its respective fields updated correspondingly; (2) checking for “second degree correlations” between trackers, where the difference between the frequency values of each pair of trackers is checked to see whether they can be considered correlated; in this operation the predefined threshold difference may be calculated according to the frequency values of all trackers; and/or (3) checking for outdated trackers according to the last update field, indicative of the last time the respective tracker was updated (i.e. detected).
The process of checking for secondary correlations, as mentioned above, may include calculating a threshold in each iteration with respect to the frequency values of all trackers. This means that if the trackers all lie within a narrow frequency band (i.e. the difference between the highest and lowest frequencies is small), the threshold will consequently be low, and vice versa: if the frequency band is wide, the threshold will be higher. For example, the threshold frequency value for identifying secondary correlations may be set to a predefined percentage of the frequency band (e.g. 30% of the bandwidth).
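The two consolidation steps can be sketched as below. The harmonic tolerance, the merge rule (keep the lower frequency, the larger N-detect and the later update, and average the SNR) and the comparison of adjacent trackers only are assumptions; the 30% band fraction follows the example in the text.

```python
from dataclasses import dataclass


@dataclass
class Tracker:
    freq_hz: float
    snr: float
    n_detect: int = 1
    last_update: int = 0


def merge(a: Tracker, b: Tracker, freq_hz: float) -> Tracker:
    """Hypothetical merge rule for two trackers judged to be the same source."""
    return Tracker(freq_hz=freq_hz, snr=(a.snr + b.snr) / 2,
                   n_detect=max(a.n_detect, b.n_detect),
                   last_update=max(a.last_update, b.last_update))


def is_multiple(lo: float, hi: float, tol: float = 0.05) -> bool:
    """True if hi is close to an integer multiple (2, 3, ...) of lo."""
    r = hi / lo
    return round(r) >= 2 and abs(r - round(r)) <= tol


def consolidate(trackers: list[Tracker], band_fraction: float = 0.3) -> list[Tracker]:
    ts = sorted(trackers, key=lambda t: t.freq_hz)
    # (1) Fold each tracker into a lower tracker of which it is a harmonic.
    merged: list[Tracker] = []
    for t in ts:
        for i, m in enumerate(merged):
            if is_multiple(m.freq_hz, t.freq_hz):
                merged[i] = merge(m, t, m.freq_hz)
                break
        else:
            merged.append(t)
    if len(merged) < 2:
        return merged
    # (2) Secondary correlations: threshold is a fraction of the band spanned by all trackers.
    threshold = band_fraction * (merged[-1].freq_hz - merged[0].freq_hz)
    out = [merged[0]]
    for t in merged[1:]:
        if t.freq_hz - out[-1].freq_hz < threshold:
            out[-1] = merge(out[-1], t, (out[-1].freq_hz + t.freq_hz) / 2)
        else:
            out.append(t)
    return out
```

Because the threshold scales with the band spanned by all trackers, in the test below the 150 Hz and 160 Hz trackers are merged (band 250 Hz, threshold 75 Hz), whereas two trackers 10 Hz apart with no others present would give a 3 Hz threshold and stay separate.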
According to some embodiments of the present invention, outdated trackers are eliminated and not tracked in future iterations. In this way only relevant frequencies are tracked, reducing the time and complexity of the process. To identify outdated trackers, a predefined iterations threshold value Δ1 (e.g. 4 iterations) may be set, such that if the difference between the current iteration number or time frame “l” and the last update frame number exceeds Δ1, the tracker is defined as “outdated”.
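The outdated-tracker rule reduces to a one-line filter. This sketch uses a hypothetical minimal `Tracker` with only the fields it needs and the example value Δ1 = 4.

```python
from dataclasses import dataclass


@dataclass
class Tracker:
    freq_hz: float
    last_update: int


def drop_outdated(trackers: list[Tracker], frame: int, delta1: int = 4) -> list[Tracker]:
    """Keep trackers with l - last_update <= delta1; the rest are outdated and dropped."""
    return [t for t in trackers if frame - t.last_update <= delta1]
```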
According to some embodiments of the present invention, as illustrated in
The frequency value and optional SNR value can be used for further analysis of the detected signal, e.g. for VAD purposes and/or for detection of speech segments in real time or near real time. The process illustrated in
According to some embodiments of the present invention, as indicated in boxes 25-27, the algorithm checks a durability factor of the fundamental frequency of the respective time frame; for example, if its N-detect value exceeds a predefined threshold Δ2 (e.g. Δ2 = 30), the respective fundamental frequency is considered a “durable fundamental frequency” (DFF). Once such a DFF is identified 25, all other trackers (those not associated with the DFF) are rejected 26 and a different, predefined reduced detection process is initiated 27. This reduced process reduces the time and complexity of the algorithm by assuming (especially when the method is used for voice detection) that if a fundamental frequency is continuous, it is probably related to the pseudo-periodic signal to be detected (e.g. the pitch frequency characterizing a speaker and the respective word/syllable/phoneme), and therefore that the other trackers are associated with irrelevant sources (noise). If no DFF is identified, the process recursively repeats steps 13-25.
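Boxes 25 to 27 can be sketched as follows, using the example threshold Δ2 = 30 and a hypothetical function that reports whether the reduced process should start, together with the trackers that remain.

```python
from dataclasses import dataclass


@dataclass
class Tracker:
    freq_hz: float
    n_detect: int
    last_update: int


def check_dff(f0: Tracker, trackers: list[Tracker],
              delta2: int = 30) -> tuple[bool, list[Tracker]]:
    """If f0's N-detect exceeds delta2, keep only the DFF (box 26) and signal the
    reduced process (box 27); otherwise leave the trackers unchanged."""
    if f0.n_detect > delta2:
        return True, [f0]
    return False, trackers
```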
One embodiment of the reduced tracking process is schematically illustrated in
The last calculated average value of the fundamental frequency of the DFF is outputted 32, optionally along with associated information taken from its corresponding one or more fields. Next, a continuity level of the DFF is checked 33, mainly to determine whether the current DFF is still durable or whether another fundamental frequency should be estimated and tracked. The continuity check may include, for example, subtracting the last updated value in the update frame field of the DFF from the current “1” value, and determining that the DFF tracker is no longer “valid” if this difference exceeds a predefined threshold number (e.g. more than 3 iterations during which the fields were not updated). If the DFF is valid (see decision box 34) and “1” is not final (see decision box 35), the reduced process is recursively repeated. If the DFF is found to be invalid (see decision box 34) and “1” is not final, the algorithm reverts to the unreduced process described in
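The continuity check and the decisions of boxes 34-35 can be sketched as a small state selector. This is an illustrative sketch only: it assumes frames are numbered with integers, that the final frame index is known, and that a gap of more than 3 frames without an update invalidates the DFF; `dff_is_valid` and `next_step` are hypothetical names.

```python
def dff_is_valid(current_frame: int, last_update: int, max_gap: int = 3) -> bool:
    """The DFF stays valid while it has gone at most max_gap frames without an update."""
    return current_frame - last_update <= max_gap


def next_step(current_frame: int, last_update: int, final_frame: int,
              max_gap: int = 3) -> str:
    """Decide how to continue after outputting the DFF for this frame.

    Returns "stop" at the final frame, "reduced" to repeat the reduced
    process while the DFF is valid, or "full" to revert to the unreduced
    tracking process once the DFF becomes invalid.
    """
    if current_frame >= final_frame:
        return "stop"
    if dff_is_valid(current_frame, last_update, max_gap):
        return "reduced"
    return "full"


print(next_step(10, 8, 100), next_step(10, 6, 100), next_step(100, 99, 100))
```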
Reference is now made to
The optical signal 91 outputted by the optical microphone 100, schematically illustrated in
According to some embodiments of the present invention, the environment 70 includes the speaker 55 as the sound source to be measured and at least one noise source, such as another speaker 56, background noises and other noises, all of which are picked up by the optical microphone 100. Optical vibrometry-based microphones are substantially immune to background noise and to other speakers' voices, inter alia because they are located near the vibrating surfaces of the relevant speaker and detect these vibrations optically. Optical microphones typically have a low-pass filter, which means that they can be “blind” to the lower frequencies; it may therefore be recommended to use a combination of audio and optical microphone systems when detecting speech related fundamental frequencies.
Audio microphones, even when positioned close to the speaker's mouth, are likely to output acoustic signals that are much noisier than the optically acquired signals. In this example, where optical devices are used for sound detection, the optical signal alone can be used to detect pitches in real/near real time. The detected pitches can then be used for further processing of the speech related pseudo-periodic signal and/or for reducing noise and improving the analysis of corresponding acoustically acquired signals, for one or more of the purposes discussed above, such as VAD, speech detection or enhancement, detection of speech segments, or simply reducing the noise of acoustic signals acquired in parallel.
For example, an additional acoustic receiver such as an acoustic microphone 300 may be used, where the optical and acoustic microphones 100 and 300, respectively, simultaneously measure the same acoustic signals in the same environment 70, and the optical signal is used for real time pitch detection to improve, in real time, the analysis of the acoustic signal outputted by the acoustic microphone 300. A second signal processing unit 600, or the same first signal processing unit 200, may receive the output pitch frequency in real time from the fundamental frequency detection module 210, together with the acoustic signal data from the acoustic microphone 300, and combine them to perform one or more analysis techniques for one or more purposes, for example using a designated speech detection module 610 for speech detection (e.g. VAD) based on the fundamental frequencies identified by the optically based pitch detection system 200 and the respective acquired acoustic signal.
For example, the pitch frequency outputted in real/near real time by the fundamental frequency detection module 210 may be used to identify the pitches of the measured optical signals, which may optionally be stored in predefined data storage 201. The identified pitches may be used to perform VAD over the acoustically acquired signal: the characteristic pitches of the speaker's speech help identify which parts of the signal over time are associated with the speaker's voice, indicating when the speaker speaks, and which parts can be defined as “noise”.
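A minimal sketch of pitch-driven VAD over the acoustic channel, under stated assumptions: the optical pitch estimates and the acoustic frames are assumed to be time-aligned one-to-one, a frame without a detected pitch is represented by `None`, and `f_min`/`f_max` are a hypothetical plausible pitch range for the speaker (not values from the text). The names `pitch_vad` and `speech_frames` are illustrative.

```python
from typing import List, Optional, Sequence, TypeVar

F = TypeVar("F")


def pitch_vad(pitches: Sequence[Optional[float]],
              f_min: float = 60.0, f_max: float = 400.0) -> List[bool]:
    """Label each frame as speech (True) or noise (False).

    pitches[i] is the fundamental frequency (Hz) detected for frame i from
    the optical channel, or None when no pitch was found. f_min and f_max
    are a hypothetical range of plausible pitch values for the speaker.
    """
    return [p is not None and f_min <= p <= f_max for p in pitches]


def speech_frames(acoustic_frames: Sequence[F],
                  pitches: Sequence[Optional[float]]) -> List[F]:
    """Keep only the acoustic frames that the optical pitch marks as speech."""
    labels = pitch_vad(pitches)
    return [frame for frame, is_speech in zip(acoustic_frames, labels) if is_speech]


print(pitch_vad([None, 120.0, 1000.0, 95.5]))
```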
Another additional or alternative use of the pitch detection is to identify speech segments (e.g. identifying beginnings and endings of speech parts such as words, syllables or phonemes) in order to enhance processes for identifying the actual content of detected speech related sound. This can be done, for example, by using the pitch detection to mark the beginning and the ending of a speech part whenever a dominant durable fundamental frequency (DFF) begins and ends, as illustrated in
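Marking speech parts wherever a DFF begins and ends amounts to finding runs of consecutive frames in which a DFF is present. The sketch below assumes a per-frame boolean flag indicating DFF presence; `dff_segments` is a hypothetical name, and the returned frame ranges are inclusive.

```python
from typing import List, Optional, Sequence, Tuple


def dff_segments(dff_present: Sequence[bool]) -> List[Tuple[int, int]]:
    """Return (start_frame, end_frame) pairs, end inclusive, for each run of DFF frames."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, present in enumerate(dff_present):
        if present and start is None:
            start = i                         # a DFF begins: a speech part starts
        elif not present and start is not None:
            segments.append((start, i - 1))   # the DFF ended on the previous frame
            start = None
    if start is not None:                     # a DFF is still present at the last frame
        segments.append((start, len(dff_present) - 1))
    return segments


print(dff_segments([False, True, True, False, True]))  # prints [(1, 2), (4, 4)]
```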
Reference is now made to
After processing these signals using the above described tracking of fundamental frequencies method, as illustrated in
According to some embodiments of the invention, the application that detects and tracks fundamental frequencies of pseudo-periodic signals as described above can be operated by any number of processing units through one or more computerized systems.
The application can be adapted to receive detected signals as frame-by-frame input and/or to receive an entire stored recording of signals detected over time and recursively process the detection data on a frame-by-frame basis.
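For the stored-recording case, a simple framing loop lets the same per-frame processing serve both live and offline input. This is a sketch under assumptions: the frame length and hop size are free parameters not given in the text, an incomplete trailing frame is dropped, and `frames`, `process_stored` and `process_frame` are hypothetical names standing in for the per-frame tracking step.

```python
from typing import Callable, Iterator, List, Sequence


def frames(signal: Sequence[float], frame_len: int, hop: int) -> Iterator[Sequence[float]]:
    """Yield consecutive frames of frame_len samples, advancing hop samples each time.

    Samples that do not fill a complete final frame are dropped (assumption).
    """
    if frame_len <= 0 or hop <= 0:
        raise ValueError("frame_len and hop must be positive")
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]


def process_stored(signal: Sequence[float], frame_len: int, hop: int,
                   process_frame: Callable[[Sequence[float]], object]) -> List[object]:
    """Apply process_frame to each frame in order, as if the frames arrived live."""
    return [process_frame(frame) for frame in frames(signal, frame_len, hop)]


print(process_stored(list(range(10)), frame_len=4, hop=2, process_frame=sum))
```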
According to some embodiments of the present invention, the fundamental frequency identification method and/or system can be used to enhance LSA or OMLSA speech detection applications/operators by providing them with the fundamental frequency of the respective frames. The respective fundamental frequency of each time frame, estimated by the application (e.g. by the fundamental frequency detection module 210), may be fed as an input parameter to the LSA/OMLSA operator, where the operator may require a few modifications to allow it to improve its speech detection abilities using the input from the fundamental frequency detection module 210.
Many alterations and modifications may be made by those having ordinary skill in the art without departing from the spirit and scope of the invention. Therefore, it must be understood that the illustrated embodiment has been set forth only for the purposes of example and that it should not be taken as limiting the invention as defined by the following invention and its various embodiments and/or by the following claims. For example, notwithstanding the fact that the elements of a claim are set forth below in a certain combination, it must be expressly understood that the invention includes other combinations of fewer, more or different elements, which are disclosed above even when not initially claimed in such combinations. A teaching that two elements are combined in a claimed combination is further to be understood as also allowing for a claimed combination in which the two elements are not combined with each other, but may be used alone or combined in other combinations. The excision of any disclosed element of the invention is explicitly contemplated as within the scope of the invention.
The words used in this specification to describe the invention and its various embodiments are to be understood not only in the sense of their commonly defined meanings, but to include by special definition in this specification structure, material or acts beyond the scope of the commonly defined meanings. Thus if an element can be understood in the context of this specification as including more than one meaning, then its use in a claim must be understood as being generic to all possible meanings supported by the specification and by the word itself.
The definitions of the words or elements of the following claims are, therefore, defined in this specification to include not only the combination of elements which are literally set forth, but all equivalent structure, material or acts for performing substantially the same function in substantially the same way to obtain substantially the same result. In this sense it is therefore contemplated that an equivalent substitution of two or more elements may be made for any one of the elements in the claims below or that a single element may be substituted for two or more elements in a claim. Although elements may be described above as acting in certain combinations and even initially claimed as such, it is to be expressly understood that one or more elements from a claimed combination can in some cases be excised from the combination and that the claimed combination may be directed to a sub-combination or variation of a sub-combination.
Insubstantial changes from the claimed subject matter as viewed by a person with ordinary skill in the art, now known or later devised, are expressly contemplated as being equivalently within the scope of the claims. Therefore, obvious substitutions now or later known to one with ordinary skill in the art are defined to be within the scope of the defined elements.
The claims are thus to be understood to include what is specifically illustrated and described above, what is conceptually equivalent, what can be obviously substituted and also what essentially incorporates the essential idea of the invention.
Although the invention has been described in detail, nevertheless changes and modifications, which do not depart from the teachings of the present invention, will be evident to those skilled in the art. Such changes and modifications are deemed to come within the purview of the present invention and the appended claims.
Entry |
---|
Israel Cohen and Baruch Berdugo; “Speech Enhancement for Non-Stationary Noise Environments”, Signal Processing, vol. 81, pp. 2403-2418, (2001). |
Israel Cohen; “Noise Spectrum Estimation in Adverse Environments: Improved Minima Controlled Recursive Averaging”, IEEE Transactions on Speech and Audio Processing, vol. 11, No. 5, pp. 466-475, (2003). |
Number | Date | Country |
---|---|---|
20130246062 A1 | Sep 2013 | US |