System and method for communicating a perceptually encoded speech spectrum signal

Description

FIELD OF THE INVENTION

This invention relates in general to a system for communicating encoded speech, and more specifically, to a system for communicating perceptually encoded speech.

BACKGROUND OF THE INVENTION

Systems for communicating encoded speech at low bit rates commonly include quantizing a vector which represents the shape of the vocal tract for a speaker. Vectors consisting of ten Line Spectral Frequencies (LSFs) are commonly used to represent the vocal tract for each speech frame for the speaker. Commonly, each speech frame is from 10 to 40 ms of sampled speech. A problem with systems using techniques which substitute a codebook vector for a vector representing a speech sample is the excessive time required to search a vector quantizer (VQ) codebook. Typically, a vector including ten LSFs can be adequately characterized by a twenty-four bit VQ without sacrificing perceptual quality. However, another problem is determining which vector from the set of vectors in the VQ codebook represents the best perceptual model for a speech sample. For example, when a twenty-four bit VQ codebook is “searched”, the search includes comparing a ten dimensional input vector which represents the speech sample with 2^24 VQ codebook vectors.

Techniques such as Multi-stage and split VQ can reduce the time to search a VQ codebook. However, a problem with such techniques is that, while typically reducing the time to search a VQ codebook, the vector selected to represent the speech sample fails to be perceptually optimal. So, another problem with existing techniques is that they do not efficiently determine a vector from a VQ codebook which represents the best perceptual model for a speech sample.

Thus, what is needed is a system and method for communicating a perceptually encoded speech spectrum signal in a time efficient manner. What is also needed is a system and method which search a VQ codebook for a vector which perceptually models a speech signal. Also needed is a system and method which improve the speed for searching a VQ codebook. What is also needed is a system and method which efficiently determine a vector to perceptually model a speech signal.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention is pointed out with particularity in the appended claims. However, a more complete understanding of the present invention may be derived by referring to the detailed description and claims when considered in connection with the figures, wherein like reference numbers refer to similar items throughout the figures, and:

FIG. 1 is a simplified block diagram of a system for communicating a perceptually encoded speech spectrum signal in accordance with a preferred embodiment of the present invention;

FIG. 2 is a simplified flow chart for a method for partitioning a plurality of vectors for a codebook in accordance with a preferred embodiment of the present invention; and

FIG. 3 is a simplified flow chart for a method for vector quantizing in accordance with a preferred embodiment of the present invention.

The exemplification set out herein illustrates a preferred embodiment of the invention in one form thereof, and such exemplification is not intended to be construed as limiting in any manner.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

The present invention provides a system and methods for efficiently communicating a perceptually encoded speech spectrum signal from a transmitter to a receiver. The transmitter includes a speech analyzer which accepts a speech signal input and generates a parameterized speech signal. The transmitter also includes a vector quantizer for generating the perceptually encoded speech spectrum signal from the parameterized speech signal. A “perceptually encoded speech spectrum signal” is generally defined to mean an encoded speech spectrum signal which has been quantized from a codebook having vectors grouped perceptually. The receiver decodes the perceptually encoded speech spectrum signal to produce decoded spectral parameters that further produce a synthetic speech output. The vector quantizer performs a method for partitioning a vector quantizer (VQ) codebook to produce perceptually organized sub-codebooks. The vector quantizer performs a second method for quantizing a vector (e.g., the parameterized speech signal) based on the perceptually organized sub-codebooks. The second method identifies a vector, from one of the perceptually organized sub-codebooks, to perceptually model the speech signal.

The present invention also provides a system and method for communicating a perceptually encoded speech spectrum signal in a time efficient manner. The present invention also provides a system and method which search a VQ codebook for a vector which perceptually models a speech signal. The present invention also provides a system and method which improve the speed for searching a VQ codebook. The present invention also provides a system and method which efficiently determine a vector to perceptually model a speech spectrum signal.

FIG. 1 is a simplified block diagram for a system for communicating a perceptually encoded speech spectrum signal in accordance with a preferred embodiment of the present invention. System 100, in FIG. 1, primarily shows a system for communicating a speech spectrum signal. In a preferred embodiment, the speech spectrum signal is encoded, in part, using a novel method for vector quantization. Speech coding at higher bit rates (e.g., above or equal to 4.8 kilobits per second (kb/s)) can be accomplished by directly modeling a speech signal such as speech signal input 101. Speech coding at lower bit rates (e.g., below 4.8 kb/s) is preferably accomplished by modeling frames of speech which are decomposed into perceptually meaningful parameters. These parameters are preferably quantized for communication through a channel or for compact storage of speech information. The number of bits available for quantizing a parameter is generally limited by channel capacity or storage constraints, wherein fewer bits produce a lower quality result. Typically, synthetic speech such as synthetic speech output 104 is reconstructed from these quantized parameters. A typical parameter set for each frame includes: a vector to represent the shape of a vocal tract (e.g., LSFs or spectrum), frame pitch, frame energy and possibly some characterization of an excitation waveform.

An N dimensional (e.g., N=10, 12, 14) Linear Predictive Analysis is generally used to produce a vector of N coefficients to represent the spectrum or shape of the vocal tract. The N dimensional vector may be transformed into one of many domains, such as prediction coefficients, reflection coefficients, autocorrelation coefficients, cepstral coefficients, and line spectral frequencies (LSFs) to determine a domain to quantize the parameters efficiently. A ten dimensional vector of LSFs is most commonly used, and twenty-four bits can adequately quantize a ten dimensional LSF vector when a vector quantizer (VQ) is used. These ten LSFs are preferably transformed to a range from 0-4000 Hertz (Hz). LSFs have a property where closely spaced LSFs indicate the presence of a formant frequency, or resonant frequency for the vocal tract. The first, or lowest frequency, formant is often the “highest energy and peakiest” so the difference between the first LSF (e.g., LSF1) and the second LSF (e.g., LSF2) or between LSF2 and the third LSF (e.g., LSF3) is the smallest. The fine quantization for formant frequencies, and hence for the closely spaced LSFs, is especially important for good perceptual quality.

A VQ is a list, or codebook, of vectors which has been trained to represent a set of vectors to be quantized. Quantization involves comparing an input vector, for example, an input speech spectrum signal, to each of the vectors in the codebook to find the one vector in the codebook which best matches perceptual criteria for the input vector. An index for the vector determined from the codebook is preferably communicated in lieu of the vector.
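
The full search described above can be illustrated with a minimal sketch. The function name full_search, the toy two dimensional codebook, and the use of a plain squared Euclidean distance as the matching criterion are assumptions made only for illustration; they are not taken from the description.

```python
def full_search(codebook, x):
    """Return the index of the codebook vector closest to x.

    Assumption: squared Euclidean distance stands in for the
    perceptual criterion; every codebook vector is compared.
    """
    best_index, best_dist = -1, float("inf")
    for i, candidate in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(candidate, x))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


# Hypothetical toy codebook (2-D for brevity; the text uses ten LSFs).
codebook = [[100.0, 900.0], [400.0, 1200.0], [700.0, 2000.0]]
index = full_search(codebook, [420.0, 1150.0])
print(index)  # the index, not the vector, is what would be communicated
```

Only the index needs to be sent because the receiver is assumed to hold the same codebook.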

Methods for vector quantization which reduce the storage size and search time for a twenty-four bit VQ can be practically implemented. Two methods which reduce storage size and search time for vector quantization are a multi-stage VQ and a split VQ. An N-dimensional twenty-four bit multi-stage VQ may first employ an N-dimensional twelve bit VQ and determine the quantizing error between an input vector and the determined vector from the codebook. The “error vector” could then be quantized with an N-dimensional, twelve bit VQ for error vectors. The storage size and search time for the twelve bit VQs is substantially less than for a “full” twenty-four bit VQ. The storage size and search time would be further reduced for an eight, eight, and eight bit or a ten, eight, and six bit multi-stage VQ.
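
The savings named above can be checked with simple arithmetic. The sketch below counts codebook vectors per configuration, assuming one codebook per stage and a single surviving candidate per stage, so that the same count approximates both storage (in vectors) and comparisons per search; the helper name codebook_vectors is hypothetical.

```python
def codebook_vectors(bits_per_stage):
    """Total vectors stored (and compared) across all stages.

    Assumption: each stage is searched exhaustively and only one
    candidate is passed to the next stage.
    """
    return sum(2 ** bits for bits in bits_per_stage)


full = codebook_vectors([24])            # single 24-bit VQ
two_stage = codebook_vectors([12, 12])   # 12 + 12 bit multi-stage VQ
three_stage_a = codebook_vectors([8, 8, 8])
three_stage_b = codebook_vectors([10, 8, 6])

print(full, two_stage, three_stage_a, three_stage_b)
# 16777216 8192 768 1344
```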

A split VQ for quantizing ten LSFs preferably employs a four dimensional, twelve bit VQ for quantizing a vector for the first four LSFs and a six dimensional, twelve bit VQ for quantizing a vector for the last six LSFs. Multi-stage and split VQs reduce storage size and search time, but have a lower perceptual quality than a full search VQ. Perceptual quality for a multi-stage VQ may be increased by retaining a set of the best vectors at each stage to apply to the next stage; however, search time is also increased.
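
A split VQ of the kind described above might be sketched as follows. The two toy codebooks hold only two vectors each (a twelve bit codebook would hold 4096), and the names nearest and split_quantize, the codebook contents, and the squared Euclidean distance are all illustrative assumptions.

```python
def nearest(codebook, x):
    """Index of the codebook vector with the smallest squared error to x."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def split_quantize(low_codebook, high_codebook, lsf):
    """Quantize the first four LSFs and the last six LSFs independently."""
    return nearest(low_codebook, lsf[:4]), nearest(high_codebook, lsf[4:])


# Hypothetical toy codebooks, values in Hz.
low_codebook = [[300, 700, 1200, 1600], [480, 600, 1000, 1500]]
high_codebook = [
    [1600, 2000, 2400, 2600, 3300, 3500],
    [1900, 2300, 2700, 3000, 3400, 3700],
]
lsf = [478, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540]
indices = split_quantize(low_codebook, high_codebook, lsf)
print(indices)  # one index per sub-vector
```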

In a preferred embodiment of the present invention, VQ search time is reduced without further reducing perceptual quality. The present invention may be applied to, among other things, a full VQ, a multi-stage VQ, and a split VQ. The present invention primarily reduces search time for a VQ by partitioning the codebook in a perceptually meaningful way. An N-dimensional codebook can be searched more quickly when partitioned into a number of smaller N-dimensional sub-codebooks. The present invention partitions a codebook into sub-codebooks by grouping vectors for the codebook which are perceptually most similar. So, when a sub-codebook is determined to be searched, a best perceptual match for the input vector is within the sub-codebook.

In another embodiment of the present invention, VQ search time may be reduced by determining a structure for a codebook so that N-dimensional adjacency relationships between neighboring vectors are determined. In this embodiment, additional memory would be required to store tables in vector quantizer 120 to describe adjacency relationships. This embodiment of the present invention reduces search time by describing a path through a codebook to search such that successive comparisons would determine only a small set of vectors to search which preferably produce less quantization error.

In a preferred embodiment of the present invention, system 100 generally includes transmitter 110 coupled to receiver 150 via channel 130. Preferably, transmitter 110 further includes: speech coder 112, channel coder 114, and modulator 116. Preferably, speech coder 112 further includes: speech analyzer 118 and vector quantizer 120. Channel 130 represents a wireless channel; however, channel 130 may represent, among other things, a “wired” channel such as a fiber optic channel or a twisted pair channel.

Receiver 150 preferably includes: demodulator 156, channel decoder 154, and speech decoder 152.

In a preferred embodiment of the present invention, speech analyzer 118 accepts speech signal input 101 and generates parameterized speech signal 102. Vector quantizer 120 accepts parameterized speech signal 102 and generates perceptually encoded speech spectrum signal 103.

Perceptually encoded speech spectrum signal 103 is received by channel coder 114. Preferably, channel coder 114 adds forward error correction (FEC) bits to perceptually encoded speech spectrum signal 103 to provide channel error protection to signal 103. Modulator 116 preferably accepts the protected signal from channel coder 114 and provides a modulated signal to channel 130. Receiver 150 preferably receives the modulated signal from channel 130 via demodulator 156. Demodulator 156 demodulates the modulated signal and forwards the demodulated signal to channel decoder 154. Channel decoder 154 preferably provides error detection and correction to the demodulated signal and subsequently provides an error corrected signal to speech decoder 152.

Speech decoder 152 decodes the error corrected signal to synthesize a speech output, namely synthetic speech output 104.

In the preferred embodiment of the present invention, vector quantizer 120 generally includes a means for receiving a parameterized signal, and a means for generating a perceptually encoded speech spectrum signal.

A method for generating a perceptually encoded speech spectrum signal is discussed below.

FIG. 2 is a simplified flow chart for a method for partitioning a plurality of vectors for a codebook in accordance with a preferred embodiment of the present invention. In a preferred embodiment, method 200 is a method for partitioning a plurality of vectors for a codebook into a set of sub-codebooks. Preferably, each of the plurality of vectors is assigned to a sub-codebook based on perceptual information determined from the coefficients for the vector associated therewith.

In step 205, subtraction operations for adjacent terms for each of the plurality of vectors are performed. In the preferred embodiment, each vector is represented by a vector having ten coefficients. Preferably, each coefficient represents one line spectral frequency (LSF). For example, assume that the ten coefficients for a vector representing a set of LSFs are as follows: 478, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540, wherein the coefficients represent LSFs between 0 and 4000 Hz. Further assume each of the coefficients is identified by a label, for example, LSF1, LSF2, LSF3, LSF4, LSF5, LSF6, LSF7, LSF8, LSF9, and LSF10, respectively. Step 205 includes performing the following subtraction operations: LSF2−LSF1, LSF3−LSF2, LSF4−LSF3, LSF5−LSF4, LSF6−LSF5, LSF7−LSF6, LSF8−LSF7, LSF9−LSF8, and LSF10−LSF9, each subtraction operation representing at least one sub-codebook (e.g., sub-codebook 1 is represented by LSF2−LSF1). In another embodiment, step 205 includes subtraction operations such as: LSF1−0 (Hz), LSF2−LSF1, LSF10−LSF9, and 4000 (Hz)−LSF10.
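
The subtraction operations of step 205 can be worked through for the example vector. This is a minimal sketch; the function name adjacent_differences and the with_edges flag (standing in for the alternative embodiment that also uses 0 Hz and 4000 Hz) are hypothetical names introduced for illustration.

```python
def adjacent_differences(lsf, with_edges=False, low=0, high=4000):
    """Differences LSF(k+1) - LSF(k) for k = 1..N-1.

    With with_edges=True, LSF1 - low and high - LSFN are included
    as well, as in the alternative embodiment.
    """
    diffs = [upper - lower for lower, upper in zip(lsf, lsf[1:])]
    if with_edges:
        diffs = [lsf[0] - low] + diffs + [high - lsf[-1]]
    return diffs


lsf = [478, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540]
diffs = adjacent_differences(lsf)
print(diffs)
# [100, 462, 447, 117, 439, 316, 263, 694, 224]
```

The first entry, 100, corresponds to LSF2−LSF1 and is the smallest of the nine differences.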

In step 210, results from the subtraction operations for each of the plurality of vectors are compared. In the preferred embodiment, the results from step 205 are compared and ordered from smallest difference to largest difference. For the example in step 205, the smallest difference between coefficients is determined by LSF2−LSF1 (e.g., 578−478=100).

In step 215, each of the plurality of vectors is assigned to at least one of a set of sub-codebooks based on the differences between adjacent terms for each of the plurality of vectors. In the preferred embodiment, the vector shown in the example in steps 205-210 is assigned to sub-codebook 1 because the difference between LSF1 and LSF2 is the smallest.
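
Steps 205 through 215 can be sketched together as a single pass over a codebook. In the sketch below, sub-codebook k is taken to be the one represented by LSF(k+1)−LSF(k); the function names, the three toy codebook vectors (other than the first, which is the example above), and the choice to store vector indices rather than vectors are assumptions for illustration.

```python
from collections import defaultdict


def smallest_gap(vector):
    """1-based position k of the smallest difference LSF(k+1) - LSF(k)."""
    diffs = [upper - lower for lower, upper in zip(vector, vector[1:])]
    return diffs.index(min(diffs)) + 1


def partition(codebook):
    """Map sub-codebook number -> list of codebook indices assigned to it."""
    sub_codebooks = defaultdict(list)
    for index, vector in enumerate(codebook):
        sub_codebooks[smallest_gap(vector)].append(index)
    return dict(sub_codebooks)


codebook = [
    [478, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540],  # example vector
    [300, 700, 800, 1500, 1900, 2300, 2700, 3000, 3400, 3700],   # hypothetical
    [450, 520, 1100, 1500, 1700, 2100, 2400, 2700, 3300, 3600],  # hypothetical
]
sub_codebooks = partition(codebook)
print(sub_codebooks)  # {1: [0, 2], 2: [1]}
```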

In step 230, a check is performed to determine whether any one of the set of sub-codebooks needs additional partitioning. In the preferred embodiment, when any one of the sub-codebooks is assigned more vectors than a predetermined percentage of vectors, for example, more than 25 percent of the entire codebook, the sub-codebook is further partitioned. In a preferred embodiment, an example step for further partitioning the sub-codebook is based on the LSF pair having the second smallest difference. Sub-dividing the sub-codebooks is preferably performed until no sub-codebook contains more than the predetermined percentage of vectors. In other embodiments, other partitioning schemes are possible such as a tree process. Method 200 then ends at 235.
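
One way the check in step 230 might be realized is sketched below. Each vector's gaps are ranked from smallest to largest, and an oversized group is split on the next-ranked gap (the second smallest, then the third, and so on). The names gap_ranking and partition, the tuple keys, the toy four dimensional codebook, and the stopping rule when the ranking is exhausted are assumptions, not details from the description.

```python
def gap_ranking(vector):
    """Gap positions (1-based) ordered from smallest to largest difference."""
    diffs = [upper - lower for lower, upper in zip(vector, vector[1:])]
    return tuple(sorted(range(1, len(diffs) + 1), key=lambda k: diffs[k - 1]))


def partition(codebook, max_fraction=0.25):
    """Split until no group exceeds max_fraction of the codebook.

    Keys are prefixes of the gap ranking: (1,) means "smallest gap is
    LSF2-LSF1"; (1, 3) adds "second smallest gap is LSF4-LSF3".
    """
    rankings = [gap_ranking(v) for v in codebook]
    limit = max_fraction * len(codebook)
    pending = {(): list(range(len(codebook)))}
    done = {}
    while pending:
        key, members = pending.popitem()
        depth = len(key)
        # Assumed stopping rule: no further gaps left to split on.
        exhausted = bool(members) and depth == len(rankings[members[0]])
        if exhausted or (depth >= 1 and len(members) <= limit):
            done[key] = members
            continue
        for i in members:
            pending.setdefault(rankings[i][:depth + 1], []).append(i)
    return done


# Hypothetical 4-D codebook of eight vectors (limit: 2 vectors per group).
codebook = [
    [100, 150, 400, 800], [100, 160, 600, 900],
    [100, 170, 500, 1000], [100, 400, 450, 900],
    [100, 500, 560, 800], [100, 450, 900, 950],
    [100, 180, 700, 1000], [200, 600, 900, 940],
]
sub_codebooks = partition(codebook)
print(sub_codebooks)
```

In this toy case the group whose smallest gap is LSF2−LSF1 initially holds four of eight vectors, so it is divided once more on the second smallest gap.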

FIG. 3 is a simplified flow chart for a method for vector quantizing in accordance with a preferred embodiment of the present invention. In a preferred embodiment, method 300 is a method for quantizing an input vector. Preferably, the input vector is identified as “belonging to” at least one of a predetermined set of sub-codebooks. Then, a search is performed within the “identified” sub-codebook to determine a vector which is to be substituted for the input vector.

In step 305, subtraction operations for adjacent terms for the vector are performed. In a preferred embodiment, step 305 is performed similarly to step 205 (FIG. 2). The vector is preferably represented by ten coefficients. Preferably, each coefficient represents one LSF. For example, assume that the ten coefficients for the vector represent the following LSFs: 479, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, and 3540. Further assume each of the coefficients is identified by a label, for example, LSF1, LSF2, LSF3, LSF4, LSF5, LSF6, LSF7, LSF8, LSF9, and LSF10, respectively. Step 305 includes performing the following subtraction operations: LSF2−LSF1, LSF3−LSF2, LSF4−LSF3, LSF5−LSF4, LSF6−LSF5, LSF7−LSF6, LSF8−LSF7, LSF9−LSF8, and LSF10−LSF9 for the vector.

In step 310, results for each subtraction operation are compared. In the preferred embodiment, the results from step 305 are compared and ordered from smallest difference to largest difference. For the example in step 305, the smallest difference between coefficients is determined by LSF2−LSF1. So, step 310 determines which sub-codebook to search to quantize an LSF vector.

In step 315, the vector is assigned to at least one of a set of sub-codebooks based on step 310. In the preferred embodiment, the vector shown in the example in steps 305-310 is assigned to a sub-codebook where “LSF2−LSF1” is the smallest difference between LSFs.

In step 320, the vector is compared with a plurality of vectors representing the at least one sub-codebook. In the preferred embodiment, the vector is compared to each one of the plurality of vectors in the sub-codebook to determine which one is perceptually closest to the vector. In a preferred embodiment, the comparison between vectors is determined by performing a perceptual distance measure, for example, a Euclidean distance, Itakura's likelihood ratio, or a weighted Euclidean distance where an error between lower order LSFs is given more weight than an error between higher order LSFs.
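
A weighted Euclidean distance of the kind mentioned in step 320 might look like the sketch below. The linearly decreasing weights are purely hypothetical; the description states only that errors in lower order LSFs receive more weight, and the squared (rather than square-rooted) form is chosen because it gives the same ordering of candidates.

```python
def weighted_distance(a, b, weights):
    """Weighted squared Euclidean distance between two LSF vectors."""
    return sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b))


# Hypothetical weights: 1.0 for LSF1, decreasing by 0.05 per coefficient.
weights = [1.0 - 0.05 * k for k in range(10)]

target = [479, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540]
low_error = list(target)
low_error[0] += 50    # 50 Hz error in LSF1
high_error = list(target)
high_error[9] += 50   # 50 Hz error in LSF10

d_low = weighted_distance(target, low_error, weights)
d_high = weighted_distance(target, high_error, weights)
print(d_low > d_high)  # the same error costs more in a lower order LSF
```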

In step 325, the one vector from the sub-codebook is substituted for the vector. In the preferred embodiment, the vector from the sub-codebook having the smallest perceptual distance (i.e., closest match) from the vector is substituted for the vector. Preferably, when a vector is substituted for another vector, an index into the sub-codebook identifies the vector from the sub-codebook. The index is preferably communicated in a system in lieu of communicating the vector from the sub-codebook. Method 300 then ends at 330.
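
Steps 305 through 325 can be sketched end to end for the example input vector. The sub-codebook contents, the function names, and the plain squared Euclidean distance are illustrative assumptions; a real sub-codebook would be produced by a partitioning pass such as method 200.

```python
def smallest_gap(vector):
    """1-based position k of the smallest difference LSF(k+1) - LSF(k)."""
    diffs = [upper - lower for lower, upper in zip(vector, vector[1:])]
    return diffs.index(min(diffs)) + 1


def quantize(sub_codebooks, x):
    """Return (sub-codebook number, index within it) for input vector x.

    Only the selected sub-codebook is searched. Assumption: a
    sub-codebook exists for the selected gap (otherwise KeyError).
    """
    k = smallest_gap(x)
    candidates = sub_codebooks[k]
    best = min(
        range(len(candidates)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(candidates[i], x)),
    )
    return k, best


# Hypothetical sub-codebooks keyed by the gap they represent.
sub_codebooks = {
    1: [
        [478, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540],
        [450, 520, 1100, 1500, 1700, 2100, 2400, 2700, 3300, 3600],
    ],
    2: [[300, 700, 800, 1500, 1900, 2300, 2700, 3000, 3400, 3700]],
}
x = [479, 578, 1040, 1487, 1604, 2043, 2359, 2622, 3316, 3540]
result = quantize(sub_codebooks, x)
print(result)  # (1, 0)
```

Here two comparisons are made instead of three, since the vector in sub-codebook 2 is never examined.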

In a preferred embodiment of the present invention, methods 200 and 300 are applied to a full search VQ, a multi-stage VQ, and a split VQ. Applying methods 200 and 300 to each of these VQs improves the perceptual quality for the vector substituted by the quantizer and reduces the search time for the VQ.

Thus, what has been shown are a system and method for communicating a perceptually encoded speech spectrum signal in a time efficient manner. What has also been shown are a system and method which search a VQ codebook for a vector which perceptually models a speech spectrum signal. What has also been shown are a system and method which improve the speed for searching a VQ codebook. Also shown are a system and method which efficiently determine a vector to perceptually model a speech signal.

Claims

1. A system for communicating an encoded speech signal comprising: a transmitter for generating a perceptually encoded speech spectrum signal; and a receiver for decoding the perceptually encoded speech spectrum signal; wherein the transmitter further includes: a speech analyzer for generating a parameterized speech signal comprised of a plurality of vectors for a codebook; and a vector quantizer for generating the perceptually encoded speech spectrum signal from the parameterized speech signal, wherein said vector quantizer performs a subtraction operation for first adjacent terms for each of the plurality of vectors for the codebook, compares results for the subtraction operation for the first adjacent terms to determine differences between the first adjacent terms for each of the plurality of vectors, assigns each of the plurality of vectors to at least one of a set of sub-codebooks based on the differences between the first adjacent terms for each of the plurality of vectors, assigns a vector to a sub-codebook based on differences between second adjacent terms for the vector, compares the vector with each of a second plurality of vectors representing the sub-codebook to determine which one of the second plurality of vectors is perceptually closest to the vector, and substitutes the one for the vector.
2. A system as claimed in claim 1, wherein the vector quantizer includes: means for receiving the parameterized speech signal; and means for generating the perceptually encoded speech spectrum signal from the parameterized speech signal.
3. A system as claimed in claim 2, wherein the means for generating the perceptually encoded speech spectrum signal is part of a full vector quantizer.
4. A system as in claim 2, wherein the means for generating the perceptually encoded speech spectrum signal is part of at least one stage of a multi-stage vector quantizer.
5. A system as claimed in claim 2, wherein the means for generating the perceptually encoded speech spectrum signal is part of a first stage of a split vector quantizer.
6. A system as claimed in claim 2, wherein the means for generating the perceptually encoded speech spectrum signal is part of a second stage of a split vector quantizer.
7. A method for communicating an encoded speech signal, the method comprising the steps of: performing a subtraction operation for first adjacent terms for each of a plurality of vectors for a codebook; comparing results for the subtraction operation for the first adjacent terms to determine differences between the first adjacent terms for each of the plurality of vectors; assigning each of the plurality of vectors to at least one of a set of sub-codebooks based on the differences between the first adjacent terms for each of the plurality of vectors, assigning a vector to a sub-codebook based on differences between second adjacent terms for the vector; comparing the vector with each of a second plurality of vectors representing the sub-codebook to determine which one of the second plurality of vectors is perceptually closest to the vector; and substituting the one for the vector.
8. A method as claimed in claim 7, further comprising the steps of: performing another subtraction operation for the second adjacent terms for the vector; and comparing results from the subtraction operation to determine differences between the second adjacent terms for the vector.

US Referenced Citations (6)

Number	Name	Date
4896361	Gerson	Jan 1990
5396576	Miki et al.	Mar 1995
5848387	Nishigushi et al.	Dec 1998
5987406	Honkanen et al.	Nov 1999
6018707	Nishigushi et al.	Jan 2000
6073092	Kwon	Jun 2000

Non-Patent Literature Citations (5)

Entry
An article entitled “Efficient Search and Design Procedures for Robust Multi-Stage VQ of LPC Parameters for 4 kb/s Speech Coding” by W.P. LeBlanc, B. Bhattacharya, S.A. Mahmoud and V. Cuperman, from the IEEE Transactions On Speech And Audio Processing, vol. 1, No. 4, Oct. 1993.
An article entitled “Spectral Coding By Fast Vector Quantization” by Erik Agrell, Department of Information Theory, Chalmers University of Technology, S-412 96 Goteborg, Sweden.
An article entitled “Vector/Matrix Quantization For Narrow-Bandwidth Digital Speech Compression” by David Y. Wong, from Signal Technology, Inc. Goleta, CA.
An article entitled “High-Quality 800-B/S Voice Processing Algorithm”, by G.S. Kang and L.J. Fransen from Naval Research Lab., Washington, D.C.
An article entitled “Single Stage Spectral Quantization at 20 bits” by Per Hedelin from the Department of Information Theory, Chalmers University of Technology, S-41296 Goteborg, Sweden.
