The present invention relates to a speech separating apparatus, a speech synthesizing apparatus, and a voice quality conversion apparatus that separate an input speech signal into voicing source information and vocal tract information.
In recent years, the development of speech synthesis techniques has enabled generation of very high-quality synthesized speech.
However, the conventional use of such synthesized speech is still centered on uniform purposes, such as reading off news texts in announcer style.
Meanwhile, speech having distinctive features (synthesized speech that strongly reflects a speaker's individuality, or synthesized speech having a distinctive prosody and voice quality, such as the speech style of a high-school girl or speech with the distinct intonation of the Kansai region in Japan) has started to be distributed as a kind of content. Thus, in pursuit of greater enjoyment in interpersonal communication, demand for creating distinctive speech to be heard by the other party is expected to grow.
Meanwhile, speech synthesis methods are classified into two major types. The first is a waveform concatenation speech synthesis method, in which appropriate speech elements are selected from a speech element database (DB) prepared in advance and concatenated. The second is an analysis-synthesis speech synthesis method, in which speech is analyzed and synthesized speech is generated from the analyzed parameters.
In terms of converting the voice quality of the above-mentioned synthesized speech in many different ways, the waveform concatenation speech synthesis method requires preparing as many speech element DBs as there are required voice quality types and switching between the speech element DBs. Thus, it requires enormous costs to generate synthesized speech having various voice qualities.
On the other hand, in the speech analysis-synthesis method, the analyzed speech parameters are transformed. This allows conversion of the voice quality of the synthesized speech. Generally, a model known as a vocal tract model is used for the analysis. It is difficult, however, to completely separate speech information into voicing source information and vocal tract information. This causes a problem of sound quality degradation as a result of the transformation of incompletely-separated voicing source information (voicing source information including vocal tract information) or incompletely-separated vocal tract information (vocal tract information including voicing source information).
The conventional speech analysis-synthesis method is mainly used for compression coding of speech. In such an application, the incomplete separation described above is not a serious problem. More specifically, it is possible to obtain synthesized speech close to the original speech by re-synthesizing the speech without transforming the parameters. In typical linear predictive coding (LPC), white noise or an impulse train, each having a flat spectrum, is assumed for the voicing source, and an all-pole transfer function, whose numerator is a constant, is assumed for the vocal tract. In practice, however, the voicing source spectrum is not flat, and the transfer function of the vocal tract is not all-pole, owing to the complex concavo-convex shape of the vocal tract and its branching into the nasal cavity. Therefore, the LPC analysis-synthesis method causes a certain level of sound quality degradation due to this model inconsistency. It is typically known that the synthesized speech sounds stuffy-nosed or like a buzzer.
To reduce such model inconsistency, the following measures are separately taken for the voicing source and the vocal tract.
Specifically, for the voicing source, preemphasis processing is performed on the speech waveform to be analyzed. A typical voicing source (glottal) spectrum has a tilt of −12 dB/oct., and a tilt of +6 dB/oct. is added when the speech is radiated into the air from the lips. Therefore, the spectral tilt of the effective voicing source obtained by combining the two is generally regarded as −6 dB/oct. Thus, it is possible to compensate the voicing-source spectral tilt by adding a further tilt of +6 dB/oct., that is, by differentiating the speech waveform.
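For illustration, a minimal sketch of such a preemphasis step is shown below; the function name and the coefficient value 0.97 (a commonly used choice) are assumptions made for this sketch and are not specified by the description above.

```python
import numpy as np

def preemphasize(speech: np.ndarray, coef: float = 0.97) -> np.ndarray:
    """First-order pre-emphasis: y[n] = x[n] - coef * x[n-1].

    Differencing (coef close to 1.0) adds roughly a +6 dB/oct. tilt,
    which compensates the -6 dB/oct. tilt of the effective voicing source.
    """
    out = np.copy(speech)
    out[1:] -= coef * speech[:-1]
    return out
```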
In addition, a method used for the vocal tract is to extract a component inconsistent with the all-pole model as a prediction residual and convolve the extracted prediction residual into the voicing source information, that is, to apply a residual waveform to a driving voicing source for the synthesis. This causes the waveform of the synthesized speech to completely match the original speech. A code excited linear prediction (CELP) is a technique in which the residual waveform is vector-quantized and transmitted as a code number.
According to the technique, the re-synthesized speech has a satisfactory voice quality even when the voicing source information and the vocal tract information are not completely separated due to inaccuracy of analysis attributed to low consistency of the linear prediction model.
However, in an application where voice quality is converted by varying the parameters, it is important to separate the voicing source information and the vocal tract information as accurately as possible. If the separation is incomplete, even when only parameters attributable to the vocal tract (for example, a formant center frequency) are intended to be changed, the characteristics of the voicing source change at the same time. Therefore, in order to allow the vocal tract and the voicing source to be controlled separately, it is necessary to accurately separate the information regarding the two.
In the speech analysis-synthesis method, one technique for separating the voicing source information and the vocal tract information more accurately is, for example, to obtain the vocal tract information that cannot be sufficiently obtained in a single LPC analysis through plural LPC analyses, so as to flatten the spectral information of the voicing source (for example, see Patent Reference 1).
Hereinafter, an operation of the conventional speech analyzing apparatus shown in
By thus configuring the speech analyzing apparatus, spectral envelope characteristics, which conventionally cannot be removed by the first spectrum analysis unit 2a alone, are extracted by the second spectrum analysis unit 5a. This allows flattening of the frequency characteristics of the voicing source information outputted from the voicing source coding unit 7a.
In addition, another related technique is embodied as a speech enhancement apparatus which separates the input speech into voicing source information and vocal tract information, enhances the separated voicing source and vocal tract information individually, and generates synthesized speech using the enhanced voicing source information and vocal tract information (for example, see Patent Reference 2).
The speech enhancement apparatus calculates, when separating the input speech, an autocorrelation-function value of the input speech of a current frame. The speech enhancement apparatus also calculates an average autocorrelation-function value through weight-averaging of the autocorrelation-function value of the input speech of the current frame and the autocorrelation-function value of the input speech of a previous frame. This offsets rapid changes in the shape of the vocal tract between frames. Thus, it is possible to prevent rapid gain changes at the time of enhancement. Accordingly, this makes abnormal sounds less likely to occur.
[Patent Reference 1] Japanese Unexamined Patent Application Publication No. 5-257498 (pages 3 to 4, FIG. 1)
[Patent Reference 2] International Application Published under the Patent Cooperation Treaty No. 2004/040555
However, in the conventional LPC analysis, a phenomenon is observed in which the LPC coefficient (linear predictive coefficient) that is the result of the analysis temporally fluctuates under the influence of the pitch period of the speech. This phenomenon is also observed in a PARCOR coefficient that is mathematically equivalent to the LPC coefficient shown in
In the conventional LPC analysis, the fluctuations inherent to speech, or temporal fluctuations of speech attributed to the position of the analysis window, are inevitably extracted as part of the vocal tract information. As a result, quick movements that are not inherent to the vocal tract are captured as part of the vocal tract information, while quick movements that are inherent to the voicing source are removed from the voicing source information. Consequently, when converting the voice quality by transforming a vocal tract parameter, the vocal tract parameter is transformed with such fine fluctuations still retained, which makes it difficult to obtain smooth speech. That is, there is a problem that the voicing source and the vocal tract cannot be separated properly.
Thus, when transforming the vocal tract information or the voicing source information, each of them includes information other than its inherent information. This results in transforming the vocal tract information or voicing source information that is deformed under the influence of such non-inherent information. Eventually, a problem remains that the sound quality of the synthesized speech is caused to degrade when voice quality is transformed.
For example, original fluctuation components derived from the original pitch and included in the vocal tract information still remain even when the pitch is changed. This causes sound quality to degrade.
Furthermore, in the speech enhancement apparatus described in Patent Reference 2, the voicing source information that can be obtained is waveform information. Conversion to an arbitrary voice quality requires a transformable parameter representation that concurrently holds the vocal tract information and the voicing source information of the source speech. However, there is a problem that the waveform information as described in Patent Reference 2 does not allow such conversion with a high degree of freedom.
In addition, Patent Reference 1 discloses that the voicing source is approximated to the impulse voicing source assumed in LPC by flattening the frequency characteristics of the voicing source. However, real voicing source information does not consist of impulses. Thus, when simply performing analysis and synthesis without transforming the vocal tract information and the voicing source information, it is possible to obtain high-quality synthesized speech using the conventional technique. However, when converting the voice quality, there is a problem that the vocal tract information and the voicing source information cannot be controlled independently of each other; for example, it is not possible to control only the vocal tract information or only the voicing source information.
Furthermore, for the speech enhancement apparatus described in Patent Reference 2, obtainable voicing source information is waveform information. Thus, the problem is that it is not possible to arbitrarily convert the voice quality without further processing.
The present invention is conceived in view of the above-described problems, and it is an object of the present invention to provide a speech separating apparatus, a speech synthesizing apparatus, and a voice quality conversion apparatus that separate voicing source information and vocal tract information in a manner more appropriate for voice quality conversion, thereby making it possible to prevent the degradation of voice quality resulting from transforming each of the voicing source information and the vocal tract information.
In addition, the present invention also aims to provide a speech separating apparatus, a speech synthesizing apparatus, and a voice quality conversion apparatus that allow efficient conversion of voicing source information.
In order to achieve the above object, the speech separating apparatus according to the present invention is a speech separating apparatus that analyzes an input speech signal so as to extract vocal tract information and voicing source information, and includes: a vocal tract information extracting unit that extracts vocal tract information from the input speech signal; a filter smoothing unit that smoothes, in a first time constant, the vocal tract information extracted by the vocal tract information extracting unit; an inverse filtering unit that calculates a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by the filter smoothing unit and filters the input speech signal by using the calculated filter; and a voicing source modeling unit that takes, from the input speech signal filtered by the inverse filtering unit, a waveform included in a second time constant shorter than the first time constant and calculates, for each waveform that is taken, voicing source information from each waveform.
According to this configuration, the vocal tract information including voicing source information is smoothed in a time axis direction. This allows extraction of vocal tract information that does not include fluctuations derived from the pitch period of the voicing source.
In addition, a filter coefficient is calculated for a filter having a frequency amplitude response characteristic inverse to the vocal tract information that has been smoothed, so as to filter the input speech signal by using the filter. Furthermore, voicing source information is obtained from the input speech that has been filtered. This allows obtainment of voicing source information including information that is conventionally mixed in the vocal tract information.
Furthermore, the voicing source modeling unit converts the input speech signal into a parameter, with a shorter time constant than a time constant used for the smoothing by the filter smoothing unit. This allows modeling of the voicing source information including fluctuation information that is conventionally lost in the smoothing by the filter smoothing unit.
Accordingly, this allows modeling of vocal tract information that is more stable than before and the voicing source information including temporal fluctuations that are conventionally removed.
In addition, the voicing source information is parameterized. This allows efficient conversion of the voicing source information.
Preferably, the speech separating apparatus described above further includes a synthesis unit that generates synthesized speech by generating a voicing source waveform by using a voicing source information parameter outputted from the voicing source modeling unit, and filtering the generated voicing source waveform by using the vocal tract information smoothed by the filter smoothing unit.
It is possible to generate synthesized speech using the above-described voicing source information and vocal tract information. This makes it possible to generate synthesized speech having fluctuations. With this, it becomes possible to generate highly natural synthesized speech.
Further preferably, the speech separating apparatus described above includes: a target speech information holding unit that holds vocal tract information and the parameterized voicing source information on a target voice quality; a conversion ratio input unit that inputs a conversion ratio for converting the input speech signal into the target voice quality; a filter transformation unit that converts, at the conversion ratio inputted by the conversion ratio input unit, the vocal tract information smoothed by the filter smoothing unit into the vocal tract information on the target voice quality, which is held by the target speech information holding unit; and a voicing source transformation unit that converts, at the conversion ratio inputted by the conversion ratio input unit, the voicing source information parameterized by the voicing source modeling unit into the voicing source information on the target voice quality, which is held by the target speech information holding unit, and the synthesis unit generates synthesized speech by generating a voicing source waveform by using the voicing source information transformed by the voicing source transformation unit, and filtering the generated voicing source waveform by using the vocal tract information transformed by the filter transformation unit.
It is possible to transform the vocal tract information while retaining fluctuation information. This prevents the degradation of sound quality.
Even when voice quality conversion processing is performed on the voicing source information and the vocal tract information independently of each other, it is possible to convert only the information that should originally be converted. This prevents the degradation of sound quality as a result of the voice quality conversion.
Note that the present invention can be realized not only as a speech separating apparatus including these characteristics but also as a speech separation method including, as steps, characteristic units included in the speech separating apparatus, and also as a program causing a computer to execute such characteristic steps included in the speech separation method. Additionally, it goes without saying that such a program can be distributed through a recording medium such as a Compact Disc-Read Only Memory (CD-ROM) and a communication network such as the Internet.
Vocal tract information including voicing source information is smoothed in a time axis direction. This allows extraction of vocal tract information that does not include fluctuations derived from the pitch period of a voicing source.
In addition, a filter coefficient is calculated for a filter having a frequency amplitude response characteristic inverse to the vocal tract information that has been smoothed, so as to filter the input speech signal by using the filter. Furthermore, parameterized voicing source information is obtained from the input signal that has been filtered. This allows obtainment of voicing source information including information that is conventionally mixed in the vocal tract information.
Furthermore, the input speech signal is converted into a parameter, with a shorter time constant than a time constant used for the smoothing. This allows modeling of the voicing source information by including fluctuation information that is conventionally lost in the smoothing.
Accordingly, this allows modeling of the vocal tract information that is more stable than before and the voicing source information including temporal fluctuations that are conventionally removed.
In addition, it is also possible to generate synthesized speech having fluctuations. With this, it becomes possible to generate highly natural synthesized speech.
Even when transforming the vocal tract information, it is possible to transform the vocal tract information while retaining fluctuation information. This prevents the degradation of sound quality.
Even when voice quality conversion processing is performed on the voicing source information and the vocal tract information independently of each other, it is possible to convert only the information that should originally be converted. This prevents the degradation of sound quality as a result of the voice quality conversion.
In addition, the voicing source information is parameterized. This allows efficient conversion of the voicing source information.
Hereinafter, embodiments of the present invention shall be described with reference to the drawings.
The voice quality conversion apparatus is an apparatus that generates synthesized speech by converting the voice quality of inputted speech into a target voice quality and outputs the synthesized speech, and includes a speech separating apparatus 111, a filter transformation unit 106, a target speech information holding unit 107, a voicing source transformation unit 108, a synthesis unit 109, and a conversion ratio input unit 110.
The speech separating apparatus 111 is an apparatus that separates voicing source information and vocal tract information from the input speech, and includes a linear predictive coding (LPC) analysis unit 101, a partial autocorrelation (PARCOR) calculating unit 102, a filter smoothing unit 103, an inverse filtering unit 104, and a voicing source modeling unit 105.
The LPC analysis unit 101 is a processing unit that extracts vocal tract information by performing a linear predictive coding analysis on the inputted speech.
The PARCOR calculating unit 102 is a processing unit that calculates a PARCOR coefficient based on a linear predictive coefficient analyzed by the LPC analysis unit 101. The LPC coefficient and the PARCOR coefficient are mathematically equivalent, and the PARCOR coefficient also represents vocal tract information.
The filter smoothing unit 103 is a processing unit that smoothes the PARCOR coefficient, which is calculated by the PARCOR calculating unit 102, in a time direction with respect to each dimension.
The inverse filtering unit 104 is a processing unit that calculates a coefficient, from the PARCOR coefficient smoothed by the filter smoothing unit 103, for a filter having an inverse frequency amplitude response characteristic and performs inverse filtering on the speech using the calculated inverse filter, to thereby calculate voicing source information.
The voicing source modeling unit 105 is a processing unit that performs modeling on the voicing source information calculated by the inverse filtering unit 104.
The filter transformation unit 106 is a processing unit that converts the PARCOR coefficient smoothed by the filter smoothing unit 103, based on the target filter information held by the target speech information holding unit 107 to be hereinafter described and the conversion ratio inputted by the conversion ratio input unit 110, to thereby convert the vocal tract information.
The target speech information holding unit 107 is a storage apparatus that holds filter information on the target voice quality, and is configured with, for example, a hard disk and so on.
The voicing source transformation unit 108 is a processing unit that transforms the voicing source information parameterized into a model by the voicing source modeling unit 105, based on the voicing source information held by the target speech information holding unit 107 and the conversion ratio inputted by the conversion ratio input unit 110, to thereby convert the voicing source information.
The synthesis unit 109 is a processing unit that generates synthesized speech using the vocal tract information converted by the filter transformation unit 106 and the voicing source information converted by the voicing source transformation unit 108.
The conversion ratio input unit 110 is a processing unit that inputs a ratio indicating a degree to which the input speech can be approximated to the target speech information held by the target speech information holding unit 107.
The voice quality conversion apparatus is thus configured with the constituent elements described above. The respective processing units included in the voice quality conversion apparatus are realized through execution of a program for realizing these processing units on a computer processor as shown in
Next, an operation of each of the constituent elements shall be described in detail.
<LPC Analysis Unit 101>
The LPC analysis unit 101 performs a linear predictive analysis on inputted speech. The linear predictive analysis predicts a sample value yn of a speech waveform from the p sample values (yn−1, yn−2, yn−3, . . . , yn−p) that temporally precede it, and can be represented by Equation 1.
[Expression 1]
\( y_n \cong \alpha_1 y_{n-1} + \alpha_2 y_{n-2} + \alpha_3 y_{n-3} + \cdots + \alpha_p y_{n-p} \)  (Equation 1)
The coefficients αi (i = 1 to p) for the p sample values can be calculated using a correlation method, a covariance method, or the like. When the calculated coefficients αi are used, an inputted speech signal S(z) can be represented by Equation 2.
Here, U(z) represents a signal obtained through inverse filtering of the input speech S(z) using 1/A(z).
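Given the definitions above, the relation expressed by Equation 2 is presumably the standard all-pole form

\[
S(z) = \frac{1}{A(z)}\,U(z), \qquad A(z) = 1 - \sum_{i=1}^{p} \alpha_i z^{-i},
\]

so that the voicing source U(z) = A(z)S(z) is the residual obtained by passing the input speech through the inverse filter A(z).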
<PARCOR Calculating Unit 102>
Generally, in order to transform vocal tract information calculated through LPC analysis or the like, correspondences between feature points (for example, formants) of the spectral envelopes are extracted, and the vocal tract information is then transformed by interpolating between the feature points found to correspond to each other.
However, when obtaining a spectral envelope by LPC analysis or the like, each spectral feature point does not always correspond to the formant, and there is a case where a relatively weak peak value is selected as a feature point (y2). Such a feature point is hereinafter referred to as a pseudo formant.
In extracting the correspondence, there is a case where the formant and the pseudo formant are incorrectly extracted to correspond to each other. The figure shows an example where the correspondence, which should normally be x1 to y1, x2 to y3, and x3 to y4 (shown in solid lines in the figure), results in the incorrect correspondence x1 to y1, x2 to y2, and x3 to y3 (indicated in dashed lines).
As a result, when interpolating the vocal tract information between such feature points incorrectly extracted to correspond to each other, an inappropriate value is calculated for the vocal tract information as a result of the correspondence of x3 to y3, which should not normally correspond to each other.
The PARCOR calculating unit 102 calculates a PARCOR coefficient (partial autocorrelation coefficient) ki, using the linear predictive coefficients αi analyzed by the LPC analysis unit 101. The calculation can be performed by, for example, the Levinson-Durbin-Itakura algorithm. Note that the PARCOR coefficient has the features below.
(1) The fluctuations of a lower-order coefficient have a larger influence on the spectrum, and the higher the order of the coefficient is, the less influence such fluctuations have over the spectrum.
(2) The fluctuations of a higher-order coefficient influence the entire frequency region flatly (uniformly).
Due to these features of the PARCOR coefficient, information that appears as a pseudo formant (a weak peak of the spectral envelope) is represented by higher-order parameters of the PARCOR coefficient. Therefore, interpolating PARCOR coefficients dimension by dimension yields correspondences very close to those between the feature points on the spectrum. A specific example of this shall be given below with the description of the filter smoothing unit 103.
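For illustration, a minimal sketch of how the LPC and PARCOR coefficients could be computed from a windowed speech frame is shown below. It uses the autocorrelation method with the Levinson-Durbin recursion, which yields the reflection (PARCOR) coefficients as a by-product; the function name, array layout, and sign convention are assumptions for this sketch and are not part of the embodiment.

```python
import numpy as np

def lpc_and_parcor(frame: np.ndarray, order: int):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.

    Returns (a, k): 'a' holds the coefficients of the analysis filter
    A(z) = 1 + a[0] z^-1 + ... + a[order-1] z^-order (so a[i] = -alpha_{i+1}
    relative to Equation 1), and 'k' holds the reflection (PARCOR)
    coefficients produced as a by-product of the recursion.
    """
    r = np.array([np.dot(frame[:len(frame) - m], frame[m:]) for m in range(order + 1)])
    a = np.zeros(order)
    k = np.zeros(order)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] + np.dot(a[:m - 1], r[m - 1:0:-1])
        k_m = -acc / err
        k[m - 1] = k_m
        a[:m - 1] = a[:m - 1] + k_m * a[:m - 1][::-1]   # update lower-order terms
        a[m - 1] = k_m
        err *= 1.0 - k_m ** 2
    return a, k
```

For a pre-emphasized, windowed frame, a call such as lpc_and_parcor(frame, 10) would, under these assumptions, return both representations at once.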
<Filter Smoothing Unit 103>
The PARCOR coefficients shown in
The filter smoothing unit 103 performs smoothing in the time direction with respect to each dimension of the PARCOR coefficient calculated by the PARCOR calculating unit 102.
The smoothing method is not particularly limited. For example, it is possible to smooth the PARCOR coefficient by approximating the PARCOR coefficient with respect to each dimension using a polynomial as represented by Equation 3.
Here,
[Expression 4]
\( \hat{y}_a \)
represents the PARCOR coefficient approximated using the polynomial, with ai representing the coefficients of the polynomial and x representing time.
At this time, as a time constant to which the polynomial approximation is applied (corresponding to a first time constant), it is possible to set, for example, a phoneme section as a unit of the approximation. In addition, instead of the phoneme section, it is also applicable to set, as the time constant, a length from the center of a phoneme to the center of the subsequent phoneme. Note that the phoneme section shall hereinafter be described as a unit of smoothing.
In the present embodiment, a fifth-order polynomial is given as an example, but the polynomial need not be quintic. Note that, instead of the polynomial approximation, a regression line for each phoneme may also be used to approximate the PARCOR coefficient.
The figures show the PARCOR coefficients smoothed for each phoneme.
Note that the smoothing method is not limited to this, and smoothing through moving average or the like is also applicable.
On a phoneme boundary, the PARCOR coefficient is discontinuous, but it is possible to prevent such discontinuity by interpolating the PARCOR coefficient by providing an appropriate transitional section. The interpolation method is not particularly limited, but may be linear interpolation, for example.
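A sketch of the smoothing described above is given below: each dimension of the PARCOR track is approximated over one phoneme section by a low-order polynomial in time. The use of numpy.polyfit, the default fifth order, and the array layout (frames by dimensions) are assumptions made for this sketch.

```python
import numpy as np

def smooth_parcor_segment(parcor: np.ndarray, degree: int = 5) -> np.ndarray:
    """Smooth the PARCOR track of one phoneme section in the time direction.

    parcor: array of shape (num_frames, num_dims), one PARCOR vector per
    analysis frame.  Each dimension is replaced by its polynomial
    approximation in time, which removes pitch-derived fluctuations while
    keeping the slow movement of the vocal tract.
    """
    num_frames, num_dims = parcor.shape
    t = np.linspace(0.0, 1.0, num_frames)      # normalized time within the phoneme
    smoothed = np.empty_like(parcor)
    for d in range(num_dims):
        coeffs = np.polyfit(t, parcor[:, d], degree)
        smoothed[:, d] = np.polyval(coeffs, t)
    return smoothed
```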
<Inverse Filtering Unit 104>
The inverse filtering unit 104 forms a filter having an inverse characteristic to the filter parameter by using the PARCOR coefficient smoothed by the filter smoothing unit 103. The inverse filtering unit 104 filters input speech using the formed filter, so as to output a voicing source waveform of the input speech.
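The inverse filtering can be sketched as follows: the smoothed PARCOR coefficients are first converted back to linear prediction coefficients by the step-up recursion, and the resulting FIR filter A(z) is applied to the input speech to obtain the voicing source waveform. The helper names and the use of scipy.signal.lfilter are assumptions; the sign convention follows the lpc_and_parcor sketch above.

```python
import numpy as np
from scipy.signal import lfilter

def parcor_to_lpc(k: np.ndarray) -> np.ndarray:
    """Step-up recursion: reflection (PARCOR) coefficients -> A(z) coefficients."""
    a = np.zeros(0)
    for k_m in k:
        # a_i <- a_i + k_m * a_{m-i} for i < m, then append a_m = k_m
        a = np.concatenate((a + k_m * a[::-1], [k_m]))
    return a

def inverse_filter_frame(frame: np.ndarray, smoothed_parcor: np.ndarray) -> np.ndarray:
    """Apply A(z) = 1 + a_1 z^-1 + ... + a_p z^-p to one frame of speech,
    yielding the voicing source (residual) waveform for that frame."""
    a = parcor_to_lpc(smoothed_parcor)
    return lfilter(np.concatenate(([1.0], a)), [1.0], frame)
```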
<Voicing Source Modeling Unit 105>
In the present invention, the vocal cord voicing source waveform thus estimated (hereinafter referred to as the "voicing source waveform") is modeled by the following method: (1) A glottal closure time of the voicing source waveform is estimated for each pitch period. For this estimation, a method such as the one disclosed in Patent Reference: Japanese Patent No. 3576800 can be used.
(2) The voicing source waveform is taken per pitch period, centering on the glottal closure time. For the taking, the Hanning window function having nearly twice the length of the pitch period is used.
(3) The waveform, which is taken, is converted into a frequency domain representation. The conversion method is not particularly limited. For example, the waveform is converted into the frequency domain representation by using a discrete Fourier transform (hereinafter, DFT) or a discrete cosine transform.
(4) A phase component is removed from each frequency component in DFT, to thereby generate amplitude spectrum information. For removal of the phase component, the frequency component represented by a complex number is replaced by an absolute value in accordance with the following Equation 4.
[Expression 5]
\( z = \sqrt{x^2 + y^2} \)  (Equation 4)
Here, z represents an absolute value, x represents a real part of the frequency component, and y represents an imaginary part of the frequency component.
(5) The amplitude spectrum information is approximated by one or more functions. Parameters (coefficients) of the above approximate functions are extracted as voicing source information.
The voicing source information is thus extracted and modeled with a time constant equivalent to one pitch period (corresponding to a second time constant). The voicing source waveform contains a number of pitch periods that are continuously present in the time direction, and the modeling described above is performed on all of these pitch periods. Since the modeling is performed for each pitch period, the voicing source information is analyzed with a time constant far shorter than that of the vocal tract information.
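A minimal sketch of steps (2) to (4) for a single pitch period is shown below: the voicing source waveform is windowed around an externally estimated glottal closure time with a Hanning window of roughly twice the pitch period, and the amplitude spectrum is taken as the magnitude of the DFT. The glottal closure estimation itself is not reproduced; its result is assumed to be given, and the function name is illustrative.

```python
import numpy as np

def source_pitch_spectrum(source: np.ndarray, gci: int, pitch_period: int) -> np.ndarray:
    """Window one voicing source pitch waveform and take its amplitude spectrum.

    source:       voicing source waveform from the inverse filter
    gci:          sample index of the estimated glottal closure instant
    pitch_period: local pitch period in samples
    """
    start = max(gci - pitch_period, 0)
    stop = min(gci + pitch_period, len(source))
    segment = source[start:stop]                    # ~2 pitch periods around the GCI
    pitch_waveform = segment * np.hanning(len(segment))
    return np.abs(np.fft.rfft(pitch_waveform))      # magnitude only (Equation 4)
```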
Next, the method of approximating voicing-source amplitude spectrum information by functions shall be described in detail.
<Method of Approximating Voicing-Source Amplitude Spectrum Information by Functions>
The method of modeling an output waveform outputted from the inverse filtering unit 104 (
In the description below, the output waveform from the inverse filtering unit 104 is referred to as a voicing source, and the amplitude spectrum is simply referred to as a spectrum.
The output waveform shown in
In the present embodiment, it is assumed that the modeling is performed one by one on each voicing source waveform that is taken with the Hanning window having twice the length of the pitch period (hereinafter, referred to as a “voicing source pitch waveform”).
Considering auditory characteristics and focusing on the tendency that the higher the frequency is, the lower the frequency resolution is and the less decibel difference affects the perception, the inventors have come to consider, as
Table 1 shows a five-level scale and evaluation words in the DMOS test.
Furthermore, the inventors have attempted, as
However, incrementing the order means increasing sensitivity to the quantization of the coefficient, and thereby increasing difficulty in implementation to the hardware. Therefore, as
This test has proved that sufficient sound quality can be obtained by assigning a quadratic function to both parts.
Thus, it has been clarified that it is effective to apply a straight-line approximation to the domain above the boundary frequency, and to divide the domain below the boundary frequency in half and apply a quadratic function to each half.
Meanwhile, it has been found that the lower limit of the boundary frequency described above differs from speaker to speaker. Thus far, an example using the speech of a female speaker has been described; however, when the same boundary frequency was applied to the speech of a male speaker, a phenomenon was observed in which energy in the low frequency area decreased. A possible cause is the low fundamental frequency component of the male voice, which results in a low glottal formant position (glottal formant frequency). In fact, an optimal point is found below the boundary frequency used above.
Based on these results, and with an understanding that the glottal formant position fluctuates in continuous speech even when the speech is uttered by a single speaker, the inventors have conceived a method of dynamically setting the boundary frequency according to the voicing source spectrum. In this method, plural boundary frequencies (276 Hz, 551 Hz, 827 Hz, 1103 Hz, 1378 Hz, and 1654 Hz) are stored in advance in a table as boundary frequency candidates. The spectrum is approximated using each of these candidates in turn, and the candidate giving the minimum square error is selected as the boundary frequency.
Thus, the voicing source modeling unit 105 analyzes an inverse filter waveform on a per-pitch period basis, and stores: linear-function coefficients (a1, b1) for high frequency area; quadratic-function coefficients for area A in the low frequency area (a2, b2, c2); quadratic-function coefficients for area B (a3, b3, c3); information on the boundary frequency Fc; and, additionally, temporal and positional information on the pitch period.
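The approximation and the boundary frequency selection could be sketched as follows: for each candidate, the band below the candidate is split in half and each half is fitted with a quadratic, the band above is fitted with a straight line, and the candidate with the smallest total squared error is kept. The candidate list repeats the values in the text; the function name, array layout, and minimum-point checks are illustrative assumptions.

```python
import numpy as np

CANDIDATE_FC = [276.0, 551.0, 827.0, 1103.0, 1378.0, 1654.0]   # Hz, from the text

def fit_source_spectrum(freqs: np.ndarray, spectrum: np.ndarray):
    """Piecewise approximation of one voicing source amplitude spectrum.

    The band below a candidate boundary frequency Fc is split in half and
    each half is fitted with a quadratic (areas A and B); the band above Fc
    is fitted with a straight line.  The Fc with the smallest error is kept.
    """
    best = None
    for fc in CANDIDATE_FC:
        area_a = freqs <= fc / 2.0
        area_b = (freqs > fc / 2.0) & (freqs <= fc)
        high = freqs > fc
        if area_a.sum() < 3 or area_b.sum() < 3 or high.sum() < 2:
            continue
        ca = np.polyfit(freqs[area_a], spectrum[area_a], 2)
        cb = np.polyfit(freqs[area_b], spectrum[area_b], 2)
        ch = np.polyfit(freqs[high], spectrum[high], 1)
        err = sum(np.sum((np.polyval(c, freqs[m]) - spectrum[m]) ** 2)
                  for c, m in ((ca, area_a), (cb, area_b), (ch, high)))
        if best is None or err < best[0]:
            best = (err, fc, ch, ca, cb)
    _, fc, ch, ca, cb = best
    return fc, ch, ca, cb   # Fc, line (a1, b1), area A (a2, b2, c2), area B (a3, b3, c3)
```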
Note that here the magnitude of the DFT frequency component is used as a voicing source spectrum, but normally the magnitude of each DFT frequency component is logarithmically converted when displaying the amplitude spectrum. Therefore, it is naturally possible to perform the approximation using functions after such processing.
<Conversion Ratio Input Unit 110>
The conversion ratio input unit 110 inputs, as a conversion ratio, the degree to which the inputted speech should be converted into the target speech information held by the target speech information holding unit 107.
<Filter Transformation Unit 106>
The filter transformation unit 106 performs transformation (conversion) of the PARCOR coefficients smoothed by the filter smoothing unit 103.
Although the unit of conversion is not particularly limited, a case of the conversion in units of phoneme shall be described, for example. First, the filter transformation unit 106 obtains, from the target speech information holding unit 107, a target PARCOR coefficient corresponding to a phoneme to be converted. For example, such a target PARCOR coefficient is prepared for each phoneme category.
The filter transformation unit 106 transforms an inputted PARCOR coefficient, based on the information on the target PARCOR coefficient and the conversion ratio inputted by the conversion ratio input unit 110. Specifically, the inputted PARCOR coefficient is represented by the polynomial used for the smoothing by the filter smoothing unit 103.
First, the conversion source parameter (inputted PARCOR coefficient) is represented by Equation 5, and thus the filter transformation unit 106 calculates a coefficient ai of the polynomial. This coefficient ai, when used for generating a PARCOR coefficient, allows generation of a smooth PARCOR coefficient.
Next, the filter transformation unit 106 obtains a target PARCOR coefficient from the target speech information holding unit 107. The filter transformation unit 106 calculates a coefficient bi of polynomial by approximating the obtained PARCOR coefficient by using the polynomial represented by Equation 6. Note that the coefficient bi after the approximation using the polynomial may be previously stored in the target speech information holding unit 107.
Next, the filter transformation unit 106 calculates a coefficient ci of polynomial for the converted PARCOR coefficient in accordance with Equation 7, by using a parameter to be converted ai, a target parameter bi, and a conversion ratio r.
[Expression 8]
\( c_i = a_i + (b_i - a_i) \times r \)  (Equation 7)
Normally, the conversion ratio r is designated within a range of 0≦r≦1. However, even in the case of the conversion ratio r exceeding the range, it is possible to convert the parameter in accordance with Equation 7. In the case of the conversion ratio r exceeding 1, the difference between the parameter to be converted (ai) and the target vowel vocal tract information (bi) is further emphasized in the conversion. On the other hand, in the case of the conversion ratio r assuming a negative value, the difference between the parameter to be converted (ai) and the target vowel vocal tract information (bi) is further emphasized in a reverse direction in the conversion.
The filter transformation unit 106 calculates the filter coefficient after the conversion in accordance with Equation 8, by using the calculated coefficient ci of polynomial after the conversion.
The above conversion processing, when performed in each dimension of the PARCOR coefficient, allows the conversion into the target PARCOR coefficient at a designated conversion ratio.
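A sketch of the conversion of Equation 7, applied to the polynomial coefficients of every PARCOR dimension at once, is shown below; the function name and the array layout (dimensions by polynomial order plus one) are assumptions made for illustration.

```python
import numpy as np

def convert_filter_polynomials(a_src: np.ndarray, b_tgt: np.ndarray, r: float) -> np.ndarray:
    """Equation 7 applied to every dimension of the smoothed PARCOR track.

    a_src, b_tgt: arrays of shape (num_dims, degree + 1) holding, for each
    PARCOR dimension, the polynomial coefficients of the source and target
    voice quality.  r = 0 keeps the source, r = 1 reaches the target;
    values outside that range extrapolate (emphasize) the difference.
    """
    return a_src + (b_tgt - a_src) * r

# The converted PARCOR track of one phoneme can then be regenerated with, e.g.:
#   t = np.linspace(0.0, 1.0, num_frames)
#   track = np.stack([np.polyval(c, t) for c in converted_coeffs], axis=1)
```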
On the phoneme boundary, as in the case of the filter smoothing unit 103, an appropriate transitional section is provided for the interpolation so as to prevent discontinuity of the PARCOR coefficient values.
In order to recognize the appropriateness of such interpolation in PARCOR coefficients,
Here, the left side represents a comparison (the ratio) of the vocal-tract cross-sectional areas of section n and section n+1, and kn represents the PARCOR coefficient at the boundary between the nth and the (n+1)th sections of the vocal tract.
As clearly shown by
In addition, since the vocal tract information is smoothed in a time direction through polynomial approximation, it is possible to convert the vocal tract information through extremely simplified processing.
<Target Speech Information Holding Unit 107>
The target speech information holding unit 107 holds the vocal tract information regarding the target voice quality. The vocal tract information includes a time sequence of target PARCOR coefficients for at least each phonological category. In the case of holding the time sequence of a PARCOR coefficient for each category, the filter transformation unit 106 obtains the time sequence of the PARCOR coefficient corresponding to the category. This allows the filter transformation unit 106 to obtain the function used for the approximation of the target PARCOR coefficient.
In addition, in the case where the target speech information holding unit 107 holds plural PARCOR coefficient time sequences for each category, the filter transformation unit 106 may select a PARCOR coefficient time sequence most adaptable for the source PARCOR parameter. The selection method is not particularly limited, but the selection may be performed using, for example, the function selection method described in Patent Reference: Japanese Patent No. 4025355.
In addition, the target speech information holding unit 107 further holds voicing source information as target speech information. The voicing source information includes, for example, an average fundamental frequency, an average aperiodic component boundary frequency, and an average voiced voicing source amplitude of the target speech.
<Voicing Source Transformation Unit 108>
The voicing source transformation unit 108 transforms the voicing source parameter modeled by the voicing source modeling unit 105, using information related to the voicing source from among the target speech information held by the target speech information holding unit 107.
The transformation method is not particularly limited. For example, it may be realized by conversion processing that converts the average fundamental frequency, the aperiodic component boundary frequency, or the voiced voicing source amplitude of the modeled voicing source parameter into the corresponding information held by the target speech information holding unit 107, in accordance with the conversion ratio inputted by the conversion ratio input unit 110.
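One way such a transformation could look, under the assumption that the modeled source is summarized by a fundamental frequency, an aperiodic component boundary frequency, and a voiced source amplitude, is the sketch below; the parameter names are hypothetical, and the same interpolation rule as Equation 7 is reused.

```python
def convert_source_params(src: dict, tgt: dict, r: float) -> dict:
    """Move each scalar voicing source parameter toward the target value.

    src / tgt hold, e.g., 'fundamental_frequency', 'aperiodic_boundary_hz'
    and 'voiced_amplitude'; each entry is interpolated with the same rule
    as Equation 7: value + (target - value) * r.
    """
    return {key: src[key] + (tgt[key] - src[key]) * r for key in src}
```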
<Synthesis Unit 109>
The synthesis unit 109 drives a filter based on the PARCOR coefficient transformed by the filter transformation unit 106, using the voicing source based on the voicing source parameter transformed by the voicing source transformation unit 108, so as to generate synthesized speech. The specific method of generation, however, is not limited to this. An example of the method of generating a voicing source waveform shall be described with reference to
Part (a) shows that the voicing source parameter modeled by the method described above is obtained through approximation of the amplitude spectrum. That is, the frequency band below the boundary frequency is divided into two parts, the voicing source spectrum in each half of the divided frequency band is approximated using a quadratic function, and the voicing source spectrum in the frequency band above the boundary frequency is approximated using a linear function. The synthesis unit 109 restores the amplitude spectrum based on this information (the coefficients of the respective functions). As a result, a simplified amplitude spectrum as shown in
The synthesis unit 109 converts the amplitude spectrum thus restored in the frequency domain into a temporal waveform by applying the inverse discrete Fourier transform (IDFT). The waveform thus restored is a bilaterally symmetrical waveform having a length of one pitch period as shown in
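A sketch of this restoration step is shown below: the simplified amplitude spectrum is rebuilt from the fitted coefficients and turned back into a one-pitch waveform with a zero-phase inverse DFT (numpy's irfft), which yields a symmetric pulse. The names and the band layout mirror the fit_source_spectrum sketch above and are assumptions, not the exact procedure of the embodiment.

```python
import numpy as np

def restore_pitch_waveform(fc, line_coeffs, area_a_coeffs, area_b_coeffs,
                           num_bins: int, sample_rate: float) -> np.ndarray:
    """Rebuild the simplified amplitude spectrum from the fitted coefficients
    and convert it to a symmetric one-pitch waveform by zero-phase IDFT."""
    freqs = np.linspace(0.0, sample_rate / 2.0, num_bins)
    spectrum = np.where(freqs <= fc / 2.0, np.polyval(area_a_coeffs, freqs),
                        np.where(freqs <= fc, np.polyval(area_b_coeffs, freqs),
                                 np.polyval(line_coeffs, freqs)))
    spectrum = np.maximum(spectrum, 0.0)       # amplitudes cannot be negative
    waveform = np.fft.irfft(spectrum)          # zero phase -> symmetric waveform
    return np.fft.fftshift(waveform)           # center the pulse
```

Such pitch waveforms would then typically be placed at the desired pitch marks and overlap-added to form a continuous voicing source, which is passed through the filter based on the transformed PARCOR coefficients.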
Next, the operation of the voice quality conversion apparatus shall be described with reference to the flowchart shown in
The LPC analysis unit 101 performs an LPC analysis on the inputted speech so as to calculate linear predictive coefficients αi (step S001).
The PARCOR calculating unit 102 calculates a PARCOR coefficient ki from the linear predictive coefficients αi calculated in step S001 (step S002).
The filter smoothing unit 103 smoothes, in a time direction, parameter values in respective dimensions of the PARCOR coefficient ki calculated in step S002 (step S003). This smoothing allows removal of temporal fluctuation components of the voicing source information that remain in the vocal tract information. The description shall be continued below based on the assumption that the smoothing is performed through polynomial approximation at this point in time.
The inverse filtering unit 104 generates an inverse filter representing inverse characteristics of the vocal tract information, using vocal tract information from which the temporal fluctuations of the voicing source information are removed after the smoothing in a time direction performed in step S003. The inverse filtering unit 104 performs inverse filtering on the inputted speech, using the generated inverse filter (step S004). This makes it possible to obtain voicing source information including the temporal fluctuations of the voicing source, which is conventionally included in the vocal tract information.
The voicing source modeling unit 105 performs modeling on the voicing source information obtained in step S004 (step S005).
The filter transformation unit 106 transforms the vocal tract information approximated using the polynomial calculated in step S003, in accordance with the conversion ratio separately inputted from the outside, so that the vocal tract information is approximated to the target vocal tract information (step S006).
The voicing source transformation unit 108 transforms the voicing source parameter modeled in step S005 (step S007).
The synthesis unit 109 generates synthesized speech based on the vocal tract information calculated in step S006 and the voicing source information calculated in step S007 (step S008). Note that the processing of step S006 may be performed immediately after the performance of the processing of step S003.
The processing described above makes it possible to accurately separate, with respect to the inputted speech, the voicing source information and the vocal tract information. Furthermore, when converting voice quality by transforming such accurately-separated vocal tract information and voicing source information, it is possible to perform voice quality conversion resulting in less degradation of the sound quality.
(Effects)
Conventionally, as
Furthermore, it is possible to obtain voicing source information which includes information that is conventionally removed, by performing inverse filtering on the inputted speech by using filter coefficients calculated by the filter smoothing unit 103.
Accordingly, this allows extraction and modeling of the vocal tract information that is more stable than before. At the same time, this allows extraction and modeling of more accurate voicing source information which includes temporal fluctuations that are conventionally removed.
The thus-calculated vocal tract information and voicing source information include, with respect to each other, less unnecessary components than before. This produces an effect that degradation of sound quality is very small even when the vocal tract information and the voicing source information are separately transformed. Accordingly, this allows designing that achieves a higher degree of freedom in voice quality conversion, thus allowing the conversion into various voice qualities.
For example, the vocal tract information separated by a conventional speech separating apparatus contains components essentially derived from the voicing source. Thus, when performing speaker conversion (that is, converting the voice quality of a speaker A into that of a speaker B) or the like, the transformation is performed on the voicing source components of the speaker A as well, although it is only intended to convert the vocal tract information of the speaker A. Since the same transformation applied to the vocal tract information of the speaker A is thus also applied to the voicing source components of the speaker A, a problem such as phonemic ambiguity arises.
On the other hand, the vocal tract information and the voicing source information calculated according to the present invention contain less unnecessary components than before with respect to each other. This produces an effect that the degradation of sound quality is very small even when the vocal tract information and the voicing source information are independently transformed. Thus, this allows designing that achieves a higher degree of freedom in voice quality conversion, thus allowing the conversion into various voice qualities.
In addition, the filter smoothing unit 103 smoothes a PARCOR coefficient by using a polynomial with respect to each phoneme. This produces another effect of making it only necessary to hold, for each phoneme, the vocal tract parameter, which conventionally has to be held for each analysis period.
Note that in the present embodiment, a combination of all the analysis, synthesis, and voice quality conversion of speech has been described, but the configuration may be such that each of them functions independently. For example, a speech synthesizing apparatus may be configured as shown in
In addition, although the voicing source information has been modeled for each pitch period, the modeling need not necessarily be performed with so short a time constant. Even when the modeling is performed by selecting one pitch period out of every few pitch periods, the effect of preserving a certain level of naturalness is maintained, because the resulting time constant is still shorter than that of the vocal tract. The vocal tract information is approximated using a polynomial over the duration of a phoneme. Thus, assuming that the utterance speed in Japanese conversation is approximately 6 morae/second, one mora has a duration of approximately 0.17 second, a large part of which consists of vowels. Accordingly, the time constant for modeling the vocal tract is around 0.17 second. On the other hand, as for the voicing source information, assuming that the pitch frequency of a male speaker's utterance having a relatively low pitch is 80 Hz, one pitch period is 1/80 second = 0.013 second. Accordingly, the time constant is 0.013 second in the case of modeling the voicing source information for each pitch period, and 0.026 second in the case of modeling for every two pitch periods. Thus, even in the modeling for every few pitch periods, the time constant for modeling the voicing source information is sufficiently shorter than the time constant for modeling the vocal tract information.
The external view of the voice quality conversion apparatus according to a second embodiment of the present invention is the same as shown in
The second embodiment of the present invention is different from the first embodiment in that the speech separating apparatus 111 is replaced with a speech separating apparatus 211. The speech separating apparatus 211 is different from the speech separating apparatus in the first embodiment in that the LPC analysis unit 101 is replaced with an ARX analysis unit 201.
Hereinafter, the difference between the ARX analysis unit 201 and the LPC analysis unit 101 shall be described focusing on the effects produced by the ARX analysis unit 201, and the description of the same portions as those described in the first embodiment shall be omitted. The respective processing units included in the voice quality conversion apparatus are realized through execution of a program for realizing these processing units on a computer processor as shown in
<ARX Analysis Unit 201>
The ARX analysis unit 201 separates vocal tract information and voicing source information by using an autoregressive with exogenous input (ARX) analysis. The ARX analysis differs widely from the LPC analysis in that a mathematical voicing source model is applied as the voicing source model. In addition, in the ARX analysis, even when the analysis section includes plural fundamental frequencies, it is possible to separate vocal tract information and voicing source information with higher accuracy than the LPC analysis (Non-Patent Reference: Otsuka et al., "Robust ARX-based Speech Analysis Method Taking Voicing Source Pulse Train into Account", The Journal of the Acoustical Society of Japan, Vol. 58, No. 7 (2002), pp. 386-397).
Assuming that a speech signal is S(z), vocal tract information is A(z), voicing source information is U(z), and an unvoiced noise source is E(z), the speech signal S(z) can be represented by Equation 10. Here, a characteristic point is that voicing source information generated by the Rosenberg-Klatt (RK) model shown in Equation 11 is used as the voicing source information U(z) in the ARX analysis.
Here, S(z), U(z), and E(z) represent the z-transforms of s(n), u(n), and e(n), respectively. In addition, AV represents the voiced voicing source amplitude, Ts the sampling period, T0 the pitch period, and OQ the glottal open quotient. The first term is used for voiced speech, and the second term is used for unvoiced speech.
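Given the definitions above, with the first term accounting for voiced speech and the second for unvoiced speech, Equation 10 presumably takes the standard ARX form

\[
S(z) = \frac{1}{A(z)}\,U(z) + \frac{1}{A(z)}\,E(z),
\]

where the RK model of Equation 11 supplies the voiced source U(z) from AV, Ts, T0, and OQ.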
Here, A(z) has the same format as the system function in the LPC analysis, thus allowing the PARCOR calculating unit 102 to calculate a PARCOR coefficient by the same method as in performing the LPC analysis.
The ARX analysis has the following advantages, compared with the LPC analysis.
(1) A voicing source pulse train corresponding to plural pitch frequencies is provided in the analysis window for performing the analysis. This allows stable extraction of vocal tract information from high-pitched speech of women, children or the like.
(2) Particularly, separation performance of the vocal tract and the voicing source is high for narrow vowels such as /i/ and /u/, in which F0 (fundamental frequency) and F1 (first formant frequency) are close to each other.
However, the ARX analysis has a disadvantage that a greater amount of processing is required than in the LPC analysis.
By comparing
This shows that, compared with the LPC analysis, the ARX analysis is less likely to be influenced by temporally short fluctuations, and that the high separation performance of the vocal tract and the voicing source, which is a characteristic of the ARX analysis, is maintained even after the smoothing.
The other processing is the same as the first embodiment.
(Effects)
Conventionally, as shown in
In the ARX analysis, compared to the LPC analysis, vocal tract information that is more accurate and includes less fluctuation having a short time constant is successfully obtained. This allows further removal of fluctuations having a short time constant while retaining rough movements, thus improving accuracy of vocal tract information.
Furthermore, it is possible to obtain voicing source information which includes information that is conventionally removed, by performing inverse filtering on the inputted speech by using filter coefficients calculated by the filter smoothing unit 103.
Accordingly, this allows extraction and modeling of vocal tract information that is more stable than before. At the same time, this allows extraction and modeling of more accurate voicing source information which includes temporal fluctuations that are conventionally removed.
In addition, the filter smoothing unit 103 smoothes a PARCOR coefficient by using a polynomial with respect to each phoneme. This produces an effect of making it only necessary to hold, for each phoneme, the vocal tract parameter, which conventionally has to be held for each analysis period.
Note that in the present embodiment, a combination of all the analysis, synthesis, and voice quality conversion of speech has been described, but the configuration may be such that each of them functions independently. For example, a speech synthesizing apparatus may be configured as shown in
Note that the description in the present specification assumes, for convenience sake, Japanese language and five vowels /a/, /i/, /u/, /e/, and /o/, but the differentiation between vowels and consonants is a concept independent of language. Thus, the scope of application of the present invention is not limited to the Japanese language, and the present invention is applicable to every language.
Note that the embodiments described thus far include inventions having the following structure.
The speech separating apparatus in an aspect of the present invention is a speech separating apparatus that separates an input speech signal into vocal tract information and voicing source information, and includes: a vocal tract information extracting unit that extracts vocal tract information from the input speech signal; a filter smoothing unit that smoothes, in a first time constant, the vocal tract information extracted by the vocal tract information extracting unit; an inverse filtering unit that calculates a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by the filter smoothing unit and filters the input speech signal by using the calculated filter; and a voicing source modeling unit that takes, from the input speech signal filtered by the inverse filtering unit, a waveform included in a second time constant shorter than the first time constant and calculates, for each waveform that is taken, voicing source information from each waveform.
Here, the voicing source modeling unit may convert each waveform that is taken into a frequency domain representation, may approximate, for each waveform, an amplitude spectrum included in a frequency band above a predetermined boundary frequency by using a first function, and may approximate an amplitude spectrum included in a frequency band not higher than the predetermined boundary frequency by using a second function of a higher order than the first function, so as to output, as parameterized voicing source information, coefficients of the first and second functions.
In addition, the first function may be a linear function.
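A minimal sketch of this two-band approximation follows; the boundary frequency, the order of the second function, and fitting the amplitude spectrum directly (rather than, for example, its logarithm) are all illustrative assumptions, and the sampling rate is assumed high enough that both bands are non-empty.

```python
# Minimal sketch: approximate the amplitude spectrum of one extracted waveform
# with a linear function above an assumed boundary frequency (first function)
# and a higher-order polynomial at and below it (second function).
import numpy as np

def parameterize_source_spectrum(waveform, fs, boundary_hz=2000.0, low_order=3):
    spectrum = np.abs(np.fft.rfft(waveform))
    freqs = np.fft.rfftfreq(len(waveform), d=1.0 / fs)
    low = freqs <= boundary_hz
    high = ~low
    low_coeffs = np.polyfit(freqs[low], spectrum[low], low_order)   # second function
    high_coeffs = np.polyfit(freqs[high], spectrum[high], 1)        # first function
    return low_coeffs, high_coeffs   # parameterized voicing source information
```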
Note that the voicing source modeling unit may approximate the amplitude spectra included in two frequency areas of the frequency band by using functions of second or higher order, respectively, so as to output, as parameterized voicing source information, coefficients of the functions of second or higher order.
In addition, the voicing source modeling unit may take waveforms from the input speech signal filtered by the inverse filtering unit, by gradually shifting, in the time axis direction, a window function having a length of approximately twice the pitch period of the input speech signal, and may convert each waveform that is taken into a parameter.
Here, intervals between adjacent window functions in the taking of the waveform may be synchronous with the pitch period.
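The pitch-synchronous extraction can be sketched as below, assuming pitch marks (one per pitch period) are already available for the inverse-filtered signal; the use of a Hanning window is an assumption.

```python
# Minimal sketch: cut waveforms from the inverse-filtered signal with a window
# of roughly twice the local pitch period, shifting it pitch-synchronously so
# that adjacent windows are one pitch period apart.
import numpy as np

def extract_pitch_waveforms(residual, pitch_marks):
    waveforms = []
    for i in range(1, len(pitch_marks) - 1):
        period = pitch_marks[i + 1] - pitch_marks[i]   # local pitch period (samples)
        start, end = pitch_marks[i] - period, pitch_marks[i] + period
        if start < 0 or end > len(residual):
            continue
        segment = residual[start:end]                  # about two pitch periods long
        waveforms.append(segment * np.hanning(len(segment)))
    return waveforms
```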
The voice quality conversion apparatus in another aspect of the present invention is a voice quality conversion apparatus that converts a voice quality of an input speech signal, and includes: a vocal tract information extracting unit that extracts vocal tract information from the input speech signal; a filter smoothing unit that smoothes, in a first time constant, the vocal tract information extracted by the vocal tract information extracting unit; an inverse filtering unit that calculates a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by the filter smoothing unit and filters the input speech signal by using the calculated filter; a voicing source modeling unit that takes, from the input speech signal filtered by the inverse filtering unit, a waveform included in a second time constant shorter than the first time constant and calculates, for each waveform that is taken, parameterized voicing source information from each waveform; a target speech information holding unit that holds vocal tract information and parameterized voicing source information on a target voice quality; a conversion ratio input unit that inputs a conversion ratio for converting the input speech signal into the target voice quality; a filter transformation unit that converts, at the conversion ratio inputted by the conversion ratio input unit, the vocal tract information smoothed by the filter smoothing unit into the vocal tract information on the target voice quality, which is held by the target speech information holding unit; a voicing source transformation unit that converts, at the conversion ratio inputted by the conversion ratio input unit, the voicing source information parameterized by the voicing source modeling unit into the voicing source information on the target voice quality, which is held by the target speech information holding unit; and a synthesis unit that generates synthesized speech by generating a voicing source waveform by using the parameterized voicing source information transformed by the voicing source transformation unit and filtering the generated voicing source waveform by using the vocal tract information transformed by the filter transformation unit.
The filter smoothing unit may smooth the vocal tract information, through approximation using a polynomial or a regression line, in the time axis direction in a predetermined unit, the vocal tract information being extracted by the vocal tract information extracting unit, and the filter transformation unit may convert, at the conversion ratio inputted by the conversion ratio input unit, a coefficient of the polynomial or the regression line into the vocal tract information on the target voice quality held by the target speech information holding unit, the polynomial or the regression line being used when the vocal tract information is approximated by the filter smoothing unit.
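One way to realize the conversion at the inputted ratio is a simple blend between the source-side and target-side coefficients, sketched below; the linear interpolation is an assumption, and the coefficient vectors stand for the polynomial (or regression line) coefficients held per phoneme.

```python
# Minimal sketch: move the smoothed source-side polynomial coefficients toward
# the target-side coefficients according to the inputted conversion ratio.
import numpy as np

def convert_coefficients(source_coeffs, target_coeffs, ratio):
    source = np.asarray(source_coeffs, dtype=float)
    target = np.asarray(target_coeffs, dtype=float)
    # ratio = 0.0 keeps the original voice quality; ratio = 1.0 reaches the target.
    return (1.0 - ratio) * source + ratio * target
```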
Note that the filter transformation unit may further interpolate, by providing a transitional section having a predetermined time constant around the phoneme boundary, the vocal tract information included in the transitional section, by using the vocal tract information at starting and finishing points.
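The interpolation over the transitional section may be sketched as follows; the length of the section and the use of straight-line interpolation between its starting and finishing points are assumptions.

```python
# Minimal sketch: inside a short transitional section around a phoneme boundary,
# replace the vocal tract parameter track by values interpolated between the
# section's starting and finishing points.
import numpy as np

def interpolate_transition(track, boundary_index, section_len=10):
    track = np.array(track, dtype=float)
    start = max(boundary_index - section_len // 2, 0)
    end = min(boundary_index + section_len // 2, len(track) - 1)
    track[start:end + 1] = np.linspace(track[start], track[end], end - start + 1)
    return track
```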
The voice quality conversion system in another aspect of the present invention is a voice quality conversion system that converts a voice quality of an input speech signal, and includes: a vocal tract information extracting unit that extracts vocal tract information from the input speech signal; a filter smoothing unit that smoothes, in a first time constant, the vocal tract information extracted by the vocal tract information extracting unit, by shifting the first time constant in the time axis direction; an inverse filtering unit that calculates a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed by the filter smoothing unit and filters the input speech signal by using the calculated filter; a voicing source modeling unit that takes, from the input speech signal filtered by the inverse filtering unit, a waveform included in a second time constant shorter than the first time constant and calculates, for each waveform that is taken, parameterized voicing source information from each waveform, by shifting the second time constant in the time axis direction; a target speech information holding unit that holds vocal tract information and parameterized voicing source information on a target voice quality; a conversion ratio input unit that inputs a conversion ratio for converting the input speech signal into the target voice quality; a filter transformation unit that converts, at the conversion ratio inputted by the conversion ratio input unit, the vocal tract information smoothed by the filter smoothing unit into the vocal tract information on the target voice quality, which is held by the target speech information holding unit; a voicing source transformation unit that converts, at the conversion ratio inputted by the conversion ratio input unit, the voicing source information parameterized by the voicing source modeling unit into the voicing source information on the target voice quality, which is held by the target speech information holding unit; and a synthesis unit that generates synthesized speech by generating a voicing source waveform by using the parameterized voicing source information transformed by the voicing source transformation unit, and filtering the generated voicing source waveform by using the vocal tract information transformed by the filter transformation unit, and the filter smoothing unit smoothes the vocal tract information, through approximation using a polynomial or a regression line, in the time axis direction in a predetermined unit, the vocal tract information being extracted by the vocal tract information extracting unit, and the filter transformation unit converts, at the conversion ratio inputted by the conversion ratio input unit, a coefficient of the polynomial or the regression line into the vocal tract information on the target voice quality held by the target speech information holding unit, the polynomial or the regression line being used when the vocal tract information is approximated by the filter smoothing unit, and also interpolates, by providing a transitional section having a predetermined time constant around the phoneme boundary, the vocal tract information included in the transitional section, by using the vocal tract information at starting and finishing points.
The speech separating method in another aspect of the present invention is a speech separating method for separating an input speech signal into vocal tract information and voicing source information, and includes: extracting vocal tract information from the input speech signal; smoothing, in a first time constant, the vocal tract information extracted in the extracting; calculating a filter having an inverse characteristic to a frequency response of the vocal tract information smoothed in the smoothing, and filtering the input speech signal by using the calculated filter; and taking, from the input speech signal filtered in the calculating, a waveform included in a second time constant shorter than the first time constant, and calculating, for each waveform that is taken, parameterized voicing source information from each waveform.
Note that the speech separating method described above may further include generating synthesized speech by generating a voicing source waveform using the parameterized voicing source information outputted in the taking, and filtering the generated voicing source waveform using the vocal tract information smoothed in the smoothing.
In addition, the speech separating method described above may further include: inputting a conversion ratio for converting the input speech signal into a target voice quality; converting, at the conversion ratio inputted in the inputting, the vocal tract information smoothed in the smoothing into vocal tract information on the target voice quality; and converting, at the conversion ratio inputted in the inputting, the voicing source information parameterized in the taking into voicing source information on the target voice quality, and in the generating, synthesized speech may be generated by generating a voicing source waveform by using the parameterized voicing source information transformed in the converting of the voicing source information, and filtering the generated voicing source waveform by using the vocal tract information transformed in the converting of the vocal tract information.
The embodiments disclosed here should be considered illustrative in every respect and not limitative. The scope of the present invention is indicated not by the above description but by the claims, and is intended to include all modifications within the meaning and range of equivalency of the claims.
The speech separating apparatus according to the present invention has a function to perform high-quality voice quality conversion by transforming vocal tract information and voicing source information, and is useful for user interfaces, entertainment, and other applications requiring various voice qualities. It is also applicable to voice changers and the like in speech communication using cellular phones and so on.