The present application is a US national phase application of International Application No. PCT/CN2021/079004, filed Mar. 4, 2021, which, in turn, claims the right of priority to Chinese Application No. 202010278634.9, filed Apr. 10, 2020, the disclosures of both of which are hereby incorporated by reference herein in their entirety for all purposes.
The present invention relates to a band-pass filter and, more specifically, to a time division interleaving band-pass filter for use in voice activity detection.
In battery-powered internet of things (IoT) systems, low power consumption is a key factor in how long a terminal can operate. A voice-enabled IoT system first distinguishes human voice from background noise by means of voice activity detection (VAD) and only then activates the other, more power-hungry modules in the system, which calls for an always-on voice detection stage with low standby power consumption.
A task of VAD is to determine whether an input sound signal is human voice or background noise. Embedded voice recognition systems all employ pattern matching-based input signal preprocessing that involves feature extraction, a process for extracting, from the waveform of an input signal, one or more sets of parameters that describe the signal's features. Feature extraction is crucial to the success of VAD and often requires using a bank of band-pass filters whose central frequencies form a geometric sequence.
Traditional VAD is therefore based on digital signal processing: an analog signal from a microphone sensor is first converted into a digital representation by an analog-to-digital converter (ADC), and the band-pass filters are implemented digitally as an algorithm. However, this digital implementation requires a power-hungry ADC, and the digital band-pass filters themselves also consume considerable power. On the other hand, as an always-on detector, a VAD must provide sufficient classification accuracy at moderate power consumption.
Compared with traditional VAD, recent implementations based on analog techniques, as described in “Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided With a Customized Software Model for Analog Feature Extraction”, DOI:10.1109/JSSC.2019.2894360 and “A 1 μW voice activity detector using analog feature extraction and digital deep neural network”, DOI: 10.1109/ISSCC.2018.8310326, provide a high recognition rate while consuming only 1 microwatt (μW) and dispense with the “power hungry” ADC. This technique achieves low power consumption mainly by processing the microphone sensor's analog output directly with 16 parallel analog band-pass filters. These form a bank of band-pass filters based on the super source follower (SSF) architecture, whose central frequencies form a geometric sequence in the range of 100 Hz to 5000 Hz and from which information about the features of the input signal is obtained.
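The geometric spacing mentioned above can be made concrete with a short calculation. The sketch below is an illustration only: it assumes the 16 central frequencies run exactly from 100 Hz to 5000 Hz with a constant ratio between neighbouring filters, which may differ from the spacing actually chosen in the cited work. The function name is hypothetical.

```python
def geometric_center_frequencies(f_low, f_high, n):
    """Return n central frequencies spaced geometrically from f_low to f_high.

    Assumption for illustration: both endpoints are included and the ratio
    between neighbouring filters is constant.
    """
    if n < 2 or f_low <= 0 or f_high <= f_low:
        raise ValueError("need n >= 2 and 0 < f_low < f_high")
    ratio = (f_high / f_low) ** (1.0 / (n - 1))  # constant neighbour-to-neighbour ratio
    return [f_low * ratio ** k for k in range(n)]


freqs = geometric_center_frequencies(100.0, 5000.0, 16)
for k, f in enumerate(freqs):
    print(f"filter {k:2d}: {f:7.1f} Hz")
```

Under these assumptions, neighbouring filters differ by a factor of 50^(1/15), roughly 1.30.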
Although feature extraction through SSF-based band-pass filtering enables low-power VAD, it has two deficiencies. First, each central frequency requires a separate band-pass filter circuit, which increases chip area and cost. Second, when band-pass filters with multiple central frequencies are placed within a specified frequency range, each central frequency must be accurate to avoid aliasing. In the conventional design, however, the central frequency of each band-pass filter depends on the transconductance of its own transistor and the capacitance of its own metal capacitor. Matching the transconductances across filters requires accurate current mirror circuits and identical transistor parameters, yet the matching of transistor parameters and current mirror circuits varies considerably with manufacturing and operating environment variations. This limits both the number of band-pass filters that can be implemented and the accuracy of their central frequencies.
Since voice-enabled IoT devices such as Bluetooth headsets and smart watches impose strict chip area requirements, it is important for VAD implementations to occupy little chip area. Moreover, analog feature extraction is more susceptible to manufacturing and operating environment variations than digital feature extraction. The problem worsens when the band-pass filtering channels are built from multiple separate transistor circuits: the central frequency of each band-pass filter becomes insufficiently accurate, feature extraction accuracy suffers, and the recognition rate consequently varies from chip to chip.
In view of the above, in order to achieve a reduced circuit area for multiple band-pass filters and improved central frequency accuracy of the band-pass filters against manufacturing and operating environment variations, the present invention proposes a time-division band-pass filter incorporating analog band-pass filtering channels sharing a common transistor circuit, and the sharing is enabled by activating the channels having different central frequencies in respective different intervals of a given period of time.
In a first aspect, there is proposed a band-pass filter comprising a coupling capacitor, a first transistor, a first filtering channel array, a first current source and a second current source, the coupling capacitor connected to a gate of the first transistor, the first transistor comprising a source connected to an output of the first current source, the first transistor comprising a drain output connected to both the first filtering channel array and the second current source and grounded via the second current source. The band-pass filter further comprises a second transistor and a second filtering channel array, the second transistor comprising a gate input connected to the drain output of the first transistor. A source of the second transistor is grounded, and a drain thereof is connected to both the output of the first current source and an input of the second filtering channel array. Each of the first and second filtering channel arrays comprises multiple filtering channels each comprising a switch and a capacitor. The switch is connected to the capacitor, and the capacitor is grounded. The switch is controlled by a pulse signal of a phase φi, where i is an integer in the range of 0-N.
This design is based on a super source follower (SSF) architecture, which uses shunt feedback between the first and second transistors to reduce their output impedance and thus improve their voltage-following performance. The first and second transistors each act as a voltage-controlled current converter whose control capability is expressed by its transconductance. The converter turns voltage information into current information, which is then accumulated and stored on the capacitors of the filtering channels in the first and second filtering channel arrays. Notably, each filtering channel in the first filtering channel array is paired with the filtering channel of the same phase φi in the second filtering channel array to form a band-pass filtering channel. The band-pass filter therefore contains only one common transistor circuit together with multiple band-pass filtering channels having different central frequencies, which reduces chip area. For example, in the phase φ0 of the filter, the first transistor converts an input voltage signal into a corresponding current, which accumulates as charge on the capacitor C1,0 in an amount that depends on the first transistor's transconductance gm1 and on the pulse duration t0 of the phase φ0. The second transistor operates in the same manner as the first transistor. Within a given period of time T, the pulse durations of the phases φ0, φ1, φ2, . . . , φN are t0, t1, t2, . . . , tN, respectively. The central frequencies of the N+1 filtering channels formed by the first and second filtering channel arrays depend on the transconductances gm1 and gm2 of the first and second transistors, on the pulse durations ti of the phases φi, and on the capacitances of the capacitors C1,i and C2,i in the channels, but not on the matching of transistor transconductances and current mirror circuits. The influence of manufacturing and operating environment variations on the accuracy of the central frequencies is thus mitigated.
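The dependence described above can be illustrated with a first-order numerical model. This is a sketch under two loudly stated assumptions, not the patent's own equation: (1) the textbook continuous-time SSF band-pass result f0 = sqrt(gm1·gm2/(C1·C2))/(2π), and (2) because channel i is connected to the shared transistors only for ti out of every period T, each transconductance is scaled by the duty cycle ti/T. All component values and names are hypothetical.

```python
import math


def center_frequency(gm1, gm2, c1, c2, t_i, period):
    """First-order estimate of the central frequency (Hz) of channel i.

    Assumed model, not the patent's equation:
      * continuous-time SSF band-pass: f0 = sqrt(gm1*gm2 / (c1*c2)) / (2*pi)
      * channel i sees the shared transistors only for t_i out of every
        period, so each transconductance is scaled by the duty cycle t_i/period.
    """
    duty = t_i / period
    return duty * math.sqrt(gm1 * gm2 / (c1 * c2)) / (2 * math.pi)


PERIOD = 1e-3  # hypothetical period T in seconds
# Hypothetical channels: (t_i, C1_i, C2_i)
CHANNELS = [(0.2e-3, 10e-12, 10e-12), (0.1e-3, 20e-12, 20e-12)]


def frequency_ratio(gm1, gm2):
    """Ratio f_0 / f_1 of the two hypothetical channels for given shared gm values."""
    f0, f1 = (center_frequency(gm1, gm2, c1, c2, t, PERIOD) for t, c1, c2 in CHANNELS)
    return f0 / f1


nominal = frequency_ratio(10e-9, 10e-9)  # nominal transconductances (hypothetical)
drifted = frequency_ratio(12e-9, 11e-9)  # both shared gm values drift together
print(f"nominal ratio: {nominal:.6f}, drifted ratio: {drifted:.6f}")
```

Both ratios come out as 4: a factor of 2 from t0/t1 and a factor of 2 from the capacitor products. In this model the absolute frequencies still move with gm1 and gm2, but because every channel shares the same two transistors, the spacing between channels is set only by the ti and the capacitor ratios.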
The central frequency fi of the i-th of the N+1 band-pass filtering channels formed by the first and second filtering channel arrays is given by:
Additionally, within a predetermined period of time, the multiple band-pass filtering channels may operate in a time division interleaving manner in which the band-pass filtering channels having different central frequencies occupy different intervals of the predetermined period of time, and for each band-pass filtering channel, the pulse duration ti of the phase φi corresponds to its interval. This is what allows the channels to share the common transistor circuit. In some embodiments, a ring oscillator consisting of three inverters may produce a clock signal with a period Tvco and provide it to a phase generator, which may then generate the pulse signals φ0-φN whose pulse durations are integer multiples of the ring oscillator's period.
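The phase generation described above can be sketched in software. The model below is hypothetical: each phase φi stays high for a chosen integer number of ring-oscillator cycles, the phases follow one another back to back, and one period T therefore spans a whole number of cycles of Tvco. The cycle counts and the 1 μs oscillator period are illustrative values, not taken from the embodiment.

```python
from itertools import accumulate


def phase_schedule(cycles_per_phase):
    """Return the (start, end) oscillator-cycle window of each phase phi_i.

    Hypothetical sketch: phase i stays high for cycles_per_phase[i]
    ring-oscillator cycles, the phases follow one another back to back,
    and one period T spans sum(cycles_per_phase) cycles.
    """
    if not cycles_per_phase or any(k <= 0 for k in cycles_per_phase):
        raise ValueError("every phase needs a positive number of cycles")
    ends = list(accumulate(cycles_per_phase))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def phase_levels(schedule, cycle):
    """Logic levels of phi_0..phi_N during a given oscillator cycle."""
    cycles_per_period = schedule[-1][1]
    c = cycle % cycles_per_period  # the pattern repeats every period T
    return [start <= c < end for start, end in schedule]


T_VCO = 1e-6  # hypothetical ring-oscillator period in seconds
schedule = phase_schedule([8, 4, 2, 1])  # hypothetical cycle counts per phase
for i, (start, end) in enumerate(schedule):
    print(f"phi_{i}: cycles {start}-{end}, t_{i} = {(end - start) * T_VCO:.1e} s")
print(f"T = {schedule[-1][1] * T_VCO:.1e} s")
```

Because exactly one entry of `phase_levels` is true in any cycle, only one band-pass filtering channel is connected to the shared transistor circuit at a time. A practical phase generator may also need short non-overlap gaps between adjacent phases, which this sketch omits.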
Additionally, the band-pass filter may further comprise a bias circuit including a bias voltage terminal and at least one resistor. The bias circuit may be connected between the coupling capacitor and the first transistor, and an output of the bias circuit may be connected together with the coupling capacitor to the gate of the first transistor.
In particular, a bias voltage may be applied through the resistor in the bias circuit to the gate of the first transistor, biasing the first transistor in the saturation region so that its drain current varies with the input voltage.
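Why saturation-region biasing makes the drain current follow the input can be seen from a simple device model. The sketch assumes the long-channel square-law MOSFET model with hypothetical values for the threshold voltage, device constant and bias point; ultra-low-power analog front ends are often biased in weak inversion instead, where the current is exponential in the gate voltage and a different model applies. It only illustrates the small-signal relation ΔId ≈ gm·ΔVgs.

```python
def nmos_operating_point(vgs, vds, vth, k):
    """Long-channel square-law NMOS model (illustrative assumption).

    Returns (region, drain_current, gm) with k = mu_n * Cox * W / L.
    """
    vov = vgs - vth  # overdrive voltage
    if vov <= 0:
        return "cutoff", 0.0, 0.0
    if vds >= vov:  # saturation: current set by vgs, largely independent of vds
        return "saturation", 0.5 * k * vov ** 2, k * vov
    i_d = k * (vov * vds - 0.5 * vds ** 2)  # triode region
    return "triode", i_d, k * vds


# Hypothetical bias point and device constant
VTH, K, VDS = 0.4, 1e-4, 0.5
region, id_bias, gm = nmos_operating_point(0.6, VDS, VTH, K)
_, id_step, _ = nmos_operating_point(0.601, VDS, VTH, K)  # 1 mV input step
print(region, f"measured dId = {id_step - id_bias:.3e} A, gm*dV = {gm * 1e-3:.3e} A")
```

At this bias the transistor is in saturation, and the current change caused by a 1 mV input step agrees with gm times 1 mV to within a fraction of a percent.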
Additionally, the predetermined period of time T may also be an integer multiple of the ring oscillator's period Tvco, so that T is closely matched to the pulse durations ti of the phases φi.
The present invention offers the following benefits.
1. The multiple band-pass filtering channels require only one transistor circuit of SSF architecture. The area of the filter bank is therefore reduced and does not increase in proportion to the number of band-pass filtering channels. As a result, more band-pass filtering channels with different central frequencies can be provided within a given chip area for a particular frequency range; that is, a denser sequence of central frequencies can be achieved, enabling more feature information to be extracted from a voice signal.
2. The pulse durations ti of the phases φi are integer multiples of the period Tvco of the ring oscillator, so the ti are closely matched and their ratios are unaffected by manufacturing and operating environment variations. As a result, matching of current mirror circuits and transistor parameters is no longer required, and the accuracy of the central frequencies is improved.
3. The analog band-pass filter can directly process an analog signal without using a power hungry ADC, resulting in reduced power consumption.
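Benefit 2 can be checked numerically. The sketch assumes hypothetical cycle counts ki and three hypothetical oscillator periods standing in for slow, typical and fast conditions. Since every ti = ki·Tvco and T is their sum, the duty cycles ti/T reduce to ki/Σk and do not depend on Tvco at all; only the absolute durations scale with the oscillator.

```python
def duty_cycles(cycles_per_phase, t_vco):
    """Fractions t_i / T when t_i = k_i * t_vco and T = sum(t_i)."""
    durations = [k * t_vco for k in cycles_per_phase]
    period = sum(durations)
    return [t / period for t in durations]


CYCLES = [8, 4, 2, 1]  # hypothetical integer multiples k_i
# Hypothetical oscillator periods standing in for slow / typical / fast conditions
corners = {name: duty_cycles(CYCLES, t_vco)
           for name, t_vco in (("slow", 1.25e-6), ("typical", 1.0e-6), ("fast", 0.8e-6))}
for name, duties in corners.items():
    print(name, [round(d, 6) for d in duties])
```

All three conditions print the same fractions ki/15, even though the oscillator period differs by more than 50% between the slow and fast cases.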
The objects, principles, features and advantages of the present invention will become more apparent from the following detailed description of embodiments thereof, which is to be read in connection with the accompanying drawings. It will be appreciated that the particular embodiments disclosed herein are illustrative and not intended to limit the present invention, as explained elsewhere herein.
It is noted that, for brevity of illustration, some connections or positional relationships that can be inferred from the text of this specification or the teachings disclosed herein are omitted from the figures, and not all positional changes are depicted. Positional changes that are not explicitly described or illustrated should not be taken as not having occurred. This general clarification applies throughout and, for conciseness, is not repeated in the following detailed description.
Voice activity detection (VAD) aims to identify and classify an input signal, and a key enabler for this is feature extraction, a process that extracts, from the waveform of an input signal, one or more sets of parameters describing the signal's features. Feature extraction, however, must trade off low power consumption, a small footprint and a high recognition rate. When a bank of band-pass filters whose central frequencies form a geometric sequence is used to extract features from the waveform of an input signal, the recognition rate depends on the accuracy of the band-pass filters' central frequencies.
Referring to
It is to be understood that the first transistor M1 acts as a voltage-controlled current converter whose control capability is expressed by its transconductance, i.e., the ratio of the resulting change in its output current to a given change in its input voltage. The first transistor M1 converts voltage information into current information, which is accumulated and stored as charge on the capacitor of the corresponding filtering channel in the first filtering channel array. For example, in the phase φ0 of the filter, the current converted by the first transistor M1 from the input voltage signal is accumulated on the capacitor C1,0, and the amount of accumulated charge depends on the transconductance of the first transistor M1 and on the pulse duration t0 of the phase φ0: in general, the longer the switch stays closed, the more charge accumulates. The transconductances of the first and second transistors M1, M2 are denoted gm1 and gm2, respectively. Within a given period of time T, the pulse durations of the phases φ0-φN are denoted t0-tN; for example, the pulse duration of the phase φ0 is t0 and that of the phase φN is tN. The second transistor M2 operates in the same way as the first transistor M1: the voltage signal output by the first transistor M1 is applied to the gate of the second transistor M2 and converted into a current signal, which passes through the switch of the corresponding filtering channel in the second filtering channel array and is accumulated and stored as charge on the capacitor of that filtering channel.
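The routing of one shared current into several capacitors can be sketched as a discrete-time simulation. This is an idealised, hypothetical model: constant input voltage, ideal switches, no leakage and no discharge path, with the shared transistor delivering a small-signal current gm·vin that flows into whichever capacitor's phase is high. All values are illustrative.

```python
def accumulate_charge(gm, v_in, steps_per_phase, dt):
    """Charge collected by each channel capacitor over one period T.

    Idealised, hypothetical sketch: constant input voltage, ideal switches,
    no leakage. The shared transistor delivers a current gm * v_in, and in
    each time step it flows into the capacitor whose phase is high.
    """
    charges = [0.0] * len(steps_per_phase)
    # Back-to-back phases: steps_per_phase[0] steps of phi_0, then phi_1, ...
    active_channel = [i for i, n in enumerate(steps_per_phase) for _ in range(n)]
    for i in active_channel:
        charges[i] += gm * v_in * dt  # only the switch of channel i is closed
    return charges


GM1 = 10e-9             # hypothetical transconductance of M1 (S)
V_IN = 1e-3             # hypothetical input voltage (V)
DT = 1e-6               # time step (s)
STEPS = [200, 100, 50]  # hypothetical t_0, t_1, t_2 in time steps
CAPS = [10e-12, 10e-12, 10e-12]  # hypothetical C_{1,i} (F)

charges = accumulate_charge(GM1, V_IN, STEPS, DT)
for i, (q, c) in enumerate(zip(charges, CAPS)):
    print(f"C1,{i}: Q = {q:.2e} C, dV = {q / c:.2e} V")
```

The charge on each capacitor is proportional to the pulse duration ti of its phase, consistent with the statement that a longer switch-closed time accumulates more charge.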
It is to be understood that the transistor circuit in the super source follower (SSF) architecture of this application utilizes the shunt feedback between the first and second transistors M1, M2 to reduce the transistors' output impedances, thus improving their output voltage following performance and ensuring consistency of the output voltage. It is to be noted that the filtering channel of the phase φi in the first filtering channel array corresponds to the filtering channel of the phase φi in the second filtering channel array. That is, when the first filtering channel array includes N+1 filtering channels and the second filtering channel array includes N+1 filtering channels, N+1 band-pass filtering channels can be formed. For example, the filtering channel of the phase φi in the first filtering channel array and the corresponding filtering channel of the same phase φi in the second filtering channel array may constitute a band-pass filtering channel of the phase φi. In a given period of time T, as the filters in different band-pass filtering channels share the same transistor circuit, the variation of their central frequencies depends on the transconductances gm1 and gm2 of the first and second transistors and on the pulse durations ti of the phases φi and the capacitances of the capacitors C1,i and C2,i in the channels. Thus, the central frequency fi of the i-th band-pass filter in the N+1 band-pass filtering channels is given by:
It is to be understood that the N+1 band-pass filtering channels share the common transistor circuit, more specifically, the common coupling capacitor, bias circuit, first current source, second current source, first transistor and second transistor. Compared with parallel band-pass filters, each band-pass filtering channel no longer requires its own SSF transistor circuit, bias circuit and current mirror circuit, which reduces the number of transistor circuit components and thus the VAD chip area. Additionally, for the i-th band-pass filter, the central frequency depends not only on the transconductances of the first and second transistors and the capacitances of the capacitors in the corresponding filtering channels of the first and second filtering channel arrays, but also on the pulse duration ti of the phase φi. The above central frequency expression may thus be regarded as a modified form of the conventional one, and this modification mitigates the influence of transistor manufacturing and operating environment variations on the central frequency accuracy of the band-pass filters. Further, because the multiple band-pass filtering channels share a single transistor circuit, more filtering channels can be provided per unit chip area.
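The area argument can be expressed as a back-of-the-envelope model. The unit areas below are hypothetical and chosen only to show how the two approaches scale; the model also charges the shared design for per-channel switches and for the ring oscillator and phase generator, which the parallel design does not need.

```python
def bank_area(n_channels, a_core, a_caps, a_switches=0, a_clocking=0, shared=False):
    """Rough filter-bank area in arbitrary units (hypothetical model).

    a_core:     one SSF transistor circuit with its bias and current mirrors
    a_caps:     the capacitors of one channel
    a_switches: the switches of one channel (shared design only)
    a_clocking: ring oscillator plus phase generator (shared design only)
    """
    if shared:
        return a_core + a_clocking + n_channels * (a_caps + a_switches)
    return n_channels * (a_core + a_caps)


# Hypothetical unit areas chosen only to illustrate the scaling
for n in (4, 16, 32):
    parallel = bank_area(n, a_core=10, a_caps=4)
    shared = bank_area(n, a_core=10, a_caps=4, a_switches=1, a_clocking=5, shared=True)
    print(f"{n:2d} channels: parallel {parallel}, shared {shared}")
```

Each extra channel adds a_core + a_caps to the parallel bank but only a_caps + a_switches to the shared bank, so the saving is largest when the transistor circuit, rather than the capacitors, dominates the area.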
Further, in some embodiments, the N+1 band-pass filtering channels may operate in a time division interleaving manner in which, within a given period of time, each filtering channel in the first filtering channel array is paired with the filtering channel of the same phase φi in the second filtering channel array to form a band-pass filtering channel. The first and second filtering channel arrays thus together form N+1 band-pass filtering channels. These band-pass filtering channels occupy respective intervals of a predetermined period of time T, which correspond to the pulse durations ti of the respective phases φi in the embodiment of
Further, referring to
It is worth noting that the boundaries of the various blocks and modules included in the foregoing embodiments have been defined only based on their functional logic, and the present invention is not so limited, as alternate boundaries can be defined as long as the specified functions are appropriately performed. Also, specific names of the various functional components are intended to distinguish between these components rather than limit the scope of the present invention in any way.
The foregoing description presents merely preferred embodiments of the present invention and is not intended to limit the scope of the present invention in any way. Any and all changes, equivalent substitutions, modifications and the like made within the spirit and principles of the present invention are intended to be embraced in the scope thereof.
Number | Date | Country | Kind |
---|---|---|---|
202010278634.9 | Apr 2020 | CN | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/CN2021/079004 | 3/4/2021 | WO |
Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2021/203877 | 10/14/2021 | WO | A |
Number | Name | Date | Kind |
---|---|---|---|
3526858 | Heinlein et al. | Sep 1970 | A |
5187445 | Jackson | Feb 1993 | A |
9374063 | Mak et al. | Jun 2016 | B1 |
9584164 | Sheikh et al. | Feb 2017 | B1 |
20130322215 | Du et al. | Dec 2013 | A1 |
20180175827 | Chen et al. | Jun 2018 | A1 |
Number | Date | Country |
---|---|---|
1624763 | Jun 2005 | CN |
107147408 | Sep 2017 | CN |
108134592 | Jun 2018 | CN |
111490750 | Aug 2020 | CN |
201824747 | Jul 2018 | TW |
Entry |
---|
PCT International Search Report (with English translation) for corresponding Application No. PCT/CN2021/079004, dated Jun. 7, 2021, 6 pages. |
Croce et al., “A 760 nW, 180 nm CMOS Analog Voice Activity Detection System”, 2020 IEEE Custom Integrated Circuits Conference (CICC), Mar. 22-25, 2020, Virtual, 4 pages. |
Yang et al., “A 1 μW Voice Activity Detector Using Analog Feature Extraction and Digital Deep Neural Network”, 2018 IEEE International Solid-State Circuits Conference, Feb. 11-15, 2018, San Francisco, CA, 3 pages. |
Yang et al., “Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided With a Customized Software Model for Analog Feature Extraction”, IEEE Journal of Solid-State Circuits, Jun. 2019, vol. 54, No. 6, pp. 1764-1777. |
Almutairi et al., “Fully-Differential Second-Order Tunable Bandstop Filter Based on Source Follower”, Electronics Letters, vol. 55, No. 3, Feb. 7, 2019, pp. 122-124. |
Supplementary European Search Report for corresponding Application No. EP 21784927, dated Sep. 20, 2022, 11 pages. |
Yang et al., “Design of an Always-On Deep Neural Network-Based 1-μW Voice Activity Detector Aided with a Customized Software Model for Analog Feature Extraction”, IEEE Journal of Solid-State Circuits, vol. 54, No. 6, Jun. 2019, 14 pages. |
Chinese Search Report for corresponding Chinese Application No. 202010278634.9, dated Feb. 11, 2023, 10 pages. |
Number | Date | Country | |
---|---|---|---|
20220294424 A1 | Sep 2022 | US |