1. Field of the Invention
The present invention relates generally to speech and audio signal processing. More particularly, the present invention relates to multiple channel speech and audio signal processing.
2. Related Art
In a conventional voice-over-packet (“VoP”) system or voice over IP (“VoIP”) system, telephone conversations or analog voice may be transported over the local loop or the public switched telephone network (“PSTN”) to the central office (“CO”), where speech is digitized according to an existing protocol, such as G.711. From the CO, the digitized speech is transported to a gateway device at the edge of the packet-based network. The gateway device receives the digital speech and packetizes it. The gateway device can combine G.711 samples into a packet, or use any other compressing scheme. Next, the packetized data is transmitted over the packet network, such as the Internet, for reception by a remote gateway device and conversion back to analog voice in the reverse manner as described above.
For purposes of this application, the terms “speech coder” or “speech processor” will generally be used to describe the operation of a device that is capable of encoding speech for transmission over a packet-based network and/or decoding encoded speech received over the packet-based network. As noted above, the speech coder or speech processor may be implemented in a gateway device for conversion of speech samples into a packetized form that can be transmitted over a packet network and/or conversion of the packetized speech into speech samples.
A speech processor can be configured to handle the speech coding of multiple channels. Thus, input speech signal frames from multiple channels can be processed by the speech processor. With variable-rate codecs (coder-decoder), input speech signal frames are typically processed by adapting the bit-rate to the amount of information carried by the input speech signal frame, and may include a single-rate codec that uses discontinuous transmission (“DTX”). This variable bit-rate is associated with a variable processing complexity or coding algorithm complexity. In general, different bit-rates vary in complexity. Increased complexity corresponds to increased processing requirements. Conventional speech processors, however, inefficiently allocate its processing power. For example, in order to safeguard against exceeding their available computation power, conventional speech processors support a maximum channel density according to a worst-case definition, e.g., by assuming that the input speech signal frame for each channel will be processed with the highest complexity. As a consequence of this inefficient allocation of processing power, the price per port of such speech processors are significantly increased, which is undesirable.
Accordingly, there is a strong need in the art for a signal processing apparatus and method which provides efficient allocation of speech processing power.
In accordance with the purposes of the present invention as broadly described herein, there is provided a multi-channel speech processor and method with increased channel density. The present invention resolves the need in the art for a signal processing apparatus and method which provides efficient allocation of speech processing power.
In one exemplary embodiment of the present invention, a multi-channel speech processor comprises a controller capable of interfacing with a plurality of channels, a memory coupled to the controller configured to store speech signal process time values, and at least one signal processing unit coupled to the controller. Typically, the multi-channel speech processor supports a plurality of bit-rates and has a maximum execution time for processing all frames, one channel at a time, by processing a single frame from each of the plurality of channels.
In accordance with the invention, the signal processing unit is configured to encode each of the single frames from each of the plurality of channels, one channel at a time, to generate encoded frames until the maximum execution time elapses or is about to elapse. The encoded frames are then transmitted by the controller. The controller is further configured to transmit a pre-determined frame for each of the plurality of channels not processed during the encoding step, due to the maximum execution time elapsing or being about to elapse, such that the predetermined frame causes a decoder which receives the predetermined frame to generate a frame erase frame.
The predetermined frame may, for example, be a frame erase packet, an illegal packet or a blank frame, such that the predetermined frame is processed as a frame erasure by the decoder upon receipt.
These and other aspects of the present invention will become apparent with further reference to the drawings and specification, which follow. It is intended that all such additional systems, methods, features and advantages be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.
The features and advantages of the present invention will become more readily apparent to those ordinarily skilled in the art after reviewing the following detailed description and accompanying drawings, wherein:
The present invention may be described herein in terms of functional block components and various processing steps. It should be appreciated that such functional blocks may be realized by any number of hardware components and/or software components configured to perform the specified functions. For example, the present invention may employ various integrated circuit components, e.g., memory elements, digital signal processing elements, logic elements, and the like, which may carry out a variety of functions under the control of one or more microprocessors or other control devices. Further, it should be noted that the present invention may employ any number of conventional techniques for data transmission, signaling, signal processing and conditioning, speech coding and decoding and the like. Such general techniques that may be known to those skilled in the art are not described in detail herein.
It should be appreciated that the particular implementations shown and described herein are merely exemplary and are not intended to limit the scope of the present invention in any way. For example, the present invention may be implemented in a number of communication systems arrangements, including wired and/or wireless system arrangements. For the sake of brevity, conventional data transmission, speech encoding, speech decoding, signaling and signal processing and other functional aspects of the data communication system (and components of the individual operating components of the system) may not be described in detail herein. Furthermore, the connecting lines shown in the various figures contained herein are intended to represent exemplary functional relationships and/or physical couplings between the various elements. It should be noted that many alternative or additional functional relationships or physical connections may be present in a practical communication system.
Speech processor 108 of gateway 106 converts voice information of participants 104 of network 102 into a packetized form that can be transmitted to the other packet networks 110. A gateway is a system which may be placed at the edge of the network in a central office or local switch (e.g., one associated with a public branch exchange), or the like. It is noted that in addition to speech encoding and decoding, the gateway performs various functions of receiving and transmitting information (speech samples) from the network 102, and receiving and transmitting information (speech packets) from the packet network (e.g.,, padding and stripping header information). The gateway also performs data (modem, fax) transmission and receiving functionalities. It will be appreciated that the present invention can be implemented in conjunction with a variety of gateway designs. A corresponding gateway and a speech processor (not shown) might also be associated with each of the other networks 110, and their operation is substantially the same manner as described herein for gateway 106 and speech processor 108 for encoding speech information into packet data for transmission to other packet networks. It is also possible that participants 114 generate packetized speech, where no gateway or additional speech processing is needed for the communication of participants 114 to the networks 110.
Speech processor 108 of the present invention is capable of interfacing with a plurality of communication channels (e.g., 1 through n channels) via communication lines 112 for receiving speech signals as well as control signals in network 102. For example, speech signals from participants 104 are communicated via an appropriate channel for processing by speech processor 108 as described in further detail below. The output of speech processor 108 is then communicated by gateway 106 to the appropriate destination packet network.
Referring now to
Controller 220 comprises a processor, such as an ARM® microprocessor, for example. In certain embodiments, a plurality of controllers 220 may be used to enhance multi-channel speech processor's 208 performance. Similarly, a plurality of SPUs 222 may be used to provide increased performance and/or channel density of multi-channel speech processor 208.
Memory 225 stores information accessed by controller 220. In particular, memory 225 stores speech processing time values which are used to calculate whether a maximum execution time has been reached as described more fully below. An illustration for carrying out this calculation is described more fully below in conjunction with
It is noted that the arrangement of multi-channel speech processor 208, as depicted in
SPU 222 carries out the operation of converting data from input speech signal frames 230a, 230b, 230c and 230n of channels 224 into a packetized format using one of the coding rates of a speech codec. For example, SPU 222 may use one of a variable rate codec to convert input speech signal frames 230a, 230b, 230c and 230n received from controller 220 via line 238 into encoded speech packets 234a, 234b, 234c and 234n, which are transmitted to controller 220 via line 240. Any suitable algorithm may be used for determining which coding rate SPU 222 uses for this encoding process. For example, according to one exemplary implementation, the bit-rate used to code input speech signal frames 230a, 230b, 230c and 230n is related to the amount of information carried by input speech signal frames 230a, 230b, 230c and 230n.
Referring now to
Certain details and features have been left out of flow chart 400 of
Beginning at step 402, a determination is made as to a maximum number of channels a multi-channel speech processor is capable of supporting based on a worst-case definition. As discussed above, the maximum number of channels supported according to a worst-case definition is calculated by dividing the maximum MIPS (million instructions per second) of the speech processor by the maximum algorithm complexity path. By way of illustration, the maximum number of channels according to a worst-case definition for multi-channel speech processor 208 of
At decision step 406, a determination is made as to whether a probability of error based on the potential number of channels supported is greater than a predetermined threshold. This probability of error corresponds to the likelihood that the total complexity of the channels will be higher than the maximum MIPS of the speech processor taking into account that in a multi-channel configuration, the probability that all the channels at a given time require the maximum processing complexity is very low. The predetermined threshold can be set such that the QoS requirements are satisfied for a given application. By way of illustration, a mobile telephone application typically experiences 1-5% frame error rate between a source device and a destination device. In a case where the predetermined threshold is set to less than or equal to the 1-5% frame error rate for a mobile telephone application, customers rarely, if ever, will realize any degradation in QoS. According to another embodiment, the predetermined threshold can be set to a fixed value such as (10−3/(N−M)), where N is maximum number of channels that can be processed and M is the number of channels that cannot be processed.
If, at step 406, it is determined that the probability of error based on the potential number of channels supported is greater than the predetermined threshold, step 408 is carried out. Otherwise, the potential number of channels supported is increased at step 410, and decision step 406 is repeated.
At step 408, the potential number of channels supported is decreased by one channel, and at step 412, the actual number of channels supported is set to the adjusted potential number of channels supported. Referring to multi-channel speech processor 208 of
Thus, a speech processor configured in accordance with flow chart 400 results in significantly improved efficiency, by increasing the channel density supported by the multi-channel speech processor. More particularly, the method for increasing channel density in a multi-channel speech processor as outlined by flow chart 400 takes into account the fact that the probability that all the channels at a given time require the maximum processing complexity is very low. As a result, SPU 222 is “overdriven” by controller 220 such that SPU 222 is able to process additional channels beyond the maximum number of channels supported according to a worst-case definition, thereby allowing SPU 222 to process additional input speech signal frames where otherwise SPU 222 would remain idle. Because the calculation as set forth in flow chart 400 results in a probability of error that is within predetermined thresholds, QoS requirements can be satisfied while supporting a greater number of channels. As a further benefit, the price per port of the multi-channel speech processor configured in this manner is significantly decreased.
Referring next to
Beginning at step 502, the total execution time is reset by CDM 228. Typically the total execution time is reset during startup or reset, and after processing each set of input speech signal frames 230a, 230b, 230c and 230n of channels 224. The total execution time is used to record the amount of time consumed for processing input speech signal frames 230a, 230b, 230c and 230n in the current set of frames.
At step 504, CDM 228 receives the first/next input speech signal frame via input line 232a, 232b, 232c or 232n. At step 506, the input speech signal frame received during step 504 is transmitted to SPU 222 for processing via line 238. CDM 228 receives the encoded speech packet from SPU 222 via line 240. At step 508, CDM 228 measures the time consumed by SPU 222 to process the input speech signal frame, and transmits the encoded speech packet via respective output line 236a, 236b, 236c or 236n.
At step 510, the time to process the input speech signal frame measured during step 508 is added to the total execution time for the current set of frames. At decision step 512, a determination is made as to whether the total execution time for the current set of frames has reached or exceeded the maximum execution time for the multi-channel speech processor. If the total execution time for the current set of frames has reached or exceeded the maximum execution time for the multi-channel speech processor, step 516 is then carried out. Otherwise, decision step 514 is then carried out.
At decision step 514, a determination is made as to whether all input speech signal frames 230a, 230b, 230c and 230d of channels 224 have been processed. If not, steps 504 through 512 are repeated for processing the next input speech signal frame. Otherwise, the next set of frames is processed, and step 502 is repeated.
At step 516, the total execution time for the current set of frames has exceeded the maximum execution time for the multi-channel speech processor. This situation may arise, for example, when a large number of high complexity frames were processed in the current set of frames. As discussed above, because the likelihood of this situation occurring is low and within QoS requirements, a certain number of frame errors is determined to be acceptable. As a result, the remaining input speech signal frames in the current set of frames which have not been processed by SPU 222 are not processed by SPU 222. Instead, CDM 228 processes the remaining input speech frames by transmitting a frame erase packet for each of the remaining input speech frames which have not been processed by SPU 222. This frame erase packet is transmitted via corresponding output lines 236a, 236b, 236c and 236n, and is formatted so that upon receipt by a destination device, the destination device processes the frame erase packet using conventional frame erase processes, e.g., such as when a frame error occurs during conventional operation. The frame erase packet can be formatted in any manner to achieve this result, including formatting the frame erase packet in way which violates encoding rules, such as an illegal packet or a blank frame, for example. Step 502 is then repeated to process the next set of frames.
In processing each set of frames as described above according to flow chart 500, CDM 228 may further employ an algorithm for determining the order in which frames 230a, 230b, 230c and 230n of channels 224 are processed. For examples, CDM 228 may employ a round-robin ordering scheme, e.g., in groups of frames, so that likelihood that the same channel(s) as the previous frame will be processed as a frame erase packet during step 516 is further reduced. In this way, frame erase processing (step 516) can be evenly distributed among channels 224.
The methods and systems presented above may reside in software, hardware, or firmware on the device, which can be implemented on a microprocessor, digital speech processor, application specific IC, or field programmable gate array (“FPGA”), or any combination thereof, without departing from the spirit of the invention. Furthermore, the present invention may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive.
Number | Date | Country | |
---|---|---|---|
Parent | 10464307 | Jun 2003 | US |
Child | 11077809 | Mar 2005 | US |