1. Technical Field
The present invention relates generally to audio content and more specifically to transmission of audio content. Still more particularly, the present invention relates to selection of an optimized codec for audio content.
2. Description of the Related Art
Technology for transmission of audio content in conventional media (e.g., radio and television) is known in the art. In addition to these conventional media, transmission of audio content via the Internet is quickly growing in popularity. With each of these media, audio transmission is constrained by two key variables/parameters: (1) available bandwidth; and (2) quality of encoding/decoding (codec) processing. The choice of codec is often influenced by the desire to minimize the bandwidth required for transmission while maximizing the quality of the transmitted content.
Available bandwidth is typically a static quantity, and codecs are designed to transmit content within pre-set maximum bandwidths. Thus, for the most part, codec quality differentiates the transmission and is the primary consideration influencing the types of devices (encoders, decoders, transmitters, etc.) and processes utilized at the broadcasting and receiving ends of the transmission. Codec performance is itself constrained by the type/makeup of content being transmitted. In audio content, for example, there are two distinct types of data signals: traditional voice signals (human voices) and non-voice signals, such as music/musical instruments. While audio content may consist primarily of one type of signal, it is quite common for audio content to include both types of signals within a single audio stream. Notably, from a codec analysis standpoint, each signal type has distinct characteristics/qualities that respond better to specific codec processing. Also, codec processing utilized for voice signals may not be appropriate (i.e., may be less than ideal) for non-voice signals.
Thus, several different types of audio encoders and decoders have been created, some of which are designed to support a preferred type of audio content. One commonly utilized group of codec devices is the “Ogg” family of encoders and decoders (within the Ogg container). The Ogg container is able to encapsulate arbitrary audio codecs. The “Ogg Vorbis” audio codec performs well and is commonly utilized for general purpose audio content, including both voice and music. Features of the Ogg Vorbis codec are described at world-wide web site “xiph.org/ogg/vorbis/”. “Ogg Speex”, in contrast, is optimized for the human voice alone and does not perform well for general purpose audio content. When encoding only voice content, the Ogg Speex codec provides the best codec processing from a quality and bandwidth standpoint. The Ogg Speex codec is described at world-wide web site “speex.org”.
Conventional audio broadcast solutions constrain the broadcaster to only one codec per broadcast stream, even though different audio codecs perform better for different types of audio content. When selecting an audio codec for content to be streamed over the Internet, radio stations that broadcast both talk programs and music currently select the most general purpose codec (e.g., the Ogg Vorbis codec). As a result, both music and voice content are brought to a lowest common denominator of quality.
Disclosed is a method and system for separating audio content into constituent parts and processing each constituent part via a most optimal codec (among several available codecs) for that constituent part during transmission of the audio stream from a broadcast system to a receiver system. Streaming audio content is divided into its constituent packets at the broadcast system. The broadcast system is configured with multiple codec pairs that each process a copy of the packets within an audio stream and forward the processed (encoded and decoded) packets to a comparator. The comparator first determines the quality of each processed packet and then selects the most optimal codec on a packet-by-packet basis by comparing codec quality against required bandwidth.
A copy of the encoded packet from the encoder of the codec devices determined to provide the most optimal codec is queued for transmission to the receiver system. The receiver system includes a set of decoders similar to the decoders of the codec devices at the broadcast system. Each packet is routed to the decoder corresponding to the encoder of the most optimal codec devices for that packet. The packets are then reassembled into the audio content at the receiver system.
The above as well as additional objectives, features, and advantages of the present invention will become apparent in the following detailed written description.
The invention itself, as well as a preferred mode of use, further objects, and advantages thereof, will best be understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings, wherein:
The present invention provides a method and system for separating audio content into constituent parts and processing each constituent part via a most optimal codec (among several available codecs) for that constituent part during transmission of the audio stream from a broadcast system to a receiver system. Streaming audio content is divided into its constituent packets at the broadcast system. The broadcast system is configured with multiple codec pairs that each process a copy of the packets within an audio stream and forward the processed (encoded and decoded) packets to a comparator. The comparator first determines the quality of each processed packet and then selects the most optimal codec on a packet-by-packet basis by comparing codec quality against required bandwidth.
A copy of the encoded packet from the encoder of the codec devices determined to provide the most optimal codec is queued for transmission to the receiver system. The receiver system includes a set of decoders similar to the decoders of the codec devices at the broadcast system. Each packet is routed to the decoder corresponding to the encoder of the most optimal codec devices for that packet. The packets are then reassembled into the audio content at the receiver system.
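By way of illustration only, the per-packet selection may be sketched as follows. The description does not specify how codec quality is weighed against required bandwidth, so the linear size penalty below is an assumption, and the names `Candidate`, `select_codec`, and `bandwidth_weight` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """One codec's result for a single packet (hypothetical structure)."""
    eid: int            # encoder ID tagged on the packet
    quality: float      # comparator's quality rating, higher is better
    encoded_bytes: int  # size of the encoded packet


def select_codec(candidates, bandwidth_weight=0.01):
    """Return the EID with the best quality-versus-size score.

    The linear penalty is an assumption made for illustration only.
    """
    if not candidates:
        raise ValueError("at least one candidate is required")

    def score(c):
        return c.quality - bandwidth_weight * c.encoded_bytes

    return max(candidates, key=score).eid


candidates = [
    Candidate(eid=1, quality=9.0, encoded_bytes=400),  # general purpose codec
    Candidate(eid=2, quality=8.5, encoded_bytes=120),  # voice codec
]
best = select_codec(candidates)
```

With these sample values, the slightly lower-rated but much smaller voice-codec packet is selected; setting `bandwidth_weight` to zero would select on quality alone.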
With reference now to the figures, and in particular to
One skilled in the art will recognize that the exemplary processing system 260 may also include additional devices, such as network connections, additional memory, additional processors, LANs, input/output lines for transferring information across a hardware channel, the Internet or an intranet, etc. Also, one skilled in the art will recognize that the specific configuration of components is not meant to imply any limitations on the actual system utilized to perform the processes of the invention. Processing system 260 may be provided as a system on a chip (SoC) or as a simple set of logic components that together provide the functionality required for processing the audio content. Notably, in addition to the above described hardware components, processing system 260 also comprises software components that enable the various functions provided by the invention. Thus, in one embodiment, a codec-and-compare module/utility may be provided as program application/code within processing system 260.
Packets from the encoders are then sent through associated decoders 210, where the encoded packets are immediately decoded, as indicated at block 306. These decoded packets from each decoder are then forwarded to packet comparator 220 at block 308. The packets' headers are still tagged with the respective EIDs. Packet comparator 220 receives as input a copy of the original uncompressed audio content (packets) from the audio source 215 and each of the compressed then decompressed (codec) audio streams (packets) from each of the decoders 210. At block 310, packet comparator 220 compares each corresponding packet from the audio decoders 210 against the original packet from the uncompressed audio stream. During this comparison, packet comparator 220 utilizes known comparative analysis to determine which of the streams is closest to the original audio content, and packet comparator 220 calculates a quality rating for each packet/codec at block 312.
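The comparison at blocks 310 and 312 may be sketched as follows. The description leaves the comparative analysis open, so a plain signal-to-noise ratio is substituted here as a stand-in metric; it is not a perceptual model. The function names and the sample values are hypothetical.

```python
import math


def quality_rating(original, decoded):
    """Signal-to-noise ratio in dB between an original and a decoded packet.

    Both arguments are equal-length sequences of PCM sample values.
    SNR is an assumed stand-in for the unspecified comparative analysis.
    """
    if len(original) != len(decoded):
        raise ValueError("packets must contain the same number of samples")
    signal = sum(s * s for s in original)
    noise = sum((s - d) ** 2 for s, d in zip(original, decoded))
    if noise == 0:
        return math.inf   # decoded packet is identical to the original
    if signal == 0:
        return -math.inf  # original is silent but the decoded copy is not
    return 10.0 * math.log10(signal / noise)


def rate_packets(original, decoded_by_eid):
    """Map each EID to the rating of its decoded copy of one packet."""
    return {eid: quality_rating(original, decoded)
            for eid, decoded in decoded_by_eid.items()}


original = [100, -200, 300, -400]
ratings = rate_packets(original, {
    1: [101, -199, 301, -401],  # small reconstruction error
    2: [110, -190, 310, -410],  # larger reconstruction error
})
closest_eid = max(ratings, key=ratings.get)
```

Here the decoded packet tagged with EID 1 is closest to the original and receives the higher rating.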
Various techniques exist that can be employed to quantify the degree to which an encoded stream resembles the original stream. In one implementation, software-coded models of human perception are used to perform objective measurements of audio quality. The techniques involved, as well as the development of these software models, are described at world-wide web (www) site www.psytechnics.com/papers, relevant content of which is incorporated herein by reference.
In another embodiment, the encoded packets are compared not only to the original packets, but also to a predefined, arbitrary model of human audio perception. For example, the model may define audio that is “low/unacceptable quality” as audio having certain characteristics (patterns, data conformancies, spikes, etc.) that are illustrative of audio that may sound metallic, garbled, etc. In one implementation, the present analysis comparing the audio content against the arbitrary model is completed entirely independently of the source audio packets. However, in a second implementation, the present analysis is completed in conjunction with the analysis utilizing the original packets as a way to “calibrate” the two analyses/algorithms.
Returning to
Output signal 222 operates as a select signal for packet MUX 230, which receives as input an encoded packet from each of the audio encoders 205 encapsulated with a particular EID. As shown at block 318, the packet produced by the encoder with the selected EID is chosen from among the other corresponding packets. At block 320, the selected packet is added to an output queue 240 for broadcast over the Internet as a packet within an Internet broadcast stream 250.
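The multiplexer behavior at blocks 318 and 320 may be sketched as follows, assuming the encoded packets are held in a mapping keyed by EID. The `mux_select` function and the placeholder payloads are hypothetical.

```python
from collections import deque


def mux_select(encoded_by_eid, selected_eid, output_queue):
    """Queue the encoded packet whose EID matches the select signal."""
    packet = encoded_by_eid[selected_eid]  # raises KeyError for an unknown EID
    # Keep the EID with the payload so the receiver can pick a decoder.
    output_queue.append((selected_eid, packet))
    return packet


output_queue = deque()
encoded = {1: b"vorbis-bytes", 2: b"speex-bytes"}  # placeholder payloads
mux_select(encoded, selected_eid=2, output_queue=output_queue)
```

Only the selected copy reaches the queue; the corresponding packets from the other encoders are discarded.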
Each client system has a plurality of decoders 410, which, according to the invention, are similar to decoders 210 of the broadcast module. Each decoder is associated (in codec processing terms) with one of the encoders 205 that encodes the audio content. Coupled to the output of each decoder 410 is an output device 415 that outputs the transmitted codec version of the original audio content.
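Routing at the client may be sketched as a lookup from EID to decoder, assuming each received packet arrives as an (EID, payload) pair. The `route_packets` function and the stand-in decoders below are hypothetical; actual decoders would wrap real codec libraries.

```python
def route_packets(stream, decoders):
    """Decode each (eid, payload) pair with the decoder registered for its EID."""
    audio = []
    for eid, payload in stream:
        decode = decoders.get(eid)
        if decode is None:
            raise KeyError(f"no decoder registered for EID {eid}")
        audio.append(decode(payload))
    return audio  # decoded packets in arrival order


# Stand-in decoders that only label the payload (illustration only).
decoders = {
    1: lambda payload: "general:" + payload.decode("ascii"),
    2: lambda payload: "voice:" + payload.decode("ascii"),
}
stream = [(2, b"p0"), (2, b"p1"), (1, b"p2")]
reassembled = route_packets(stream, decoders)
```

Because the output preserves arrival order, packets decoded by different decoders are reassembled into a single sequence.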
According to the illustrative embodiment, the comparison is performed on a packet-by-packet basis, and the broadcast source selects the codec that performs the best for the audio in a given packet. However, while the illustrative embodiment is described on a packet-by-packet basis, other types of analysis are contemplated, particularly for audio codec formats that are not structured as individual packets. For example, when dealing specifically with the Ogg container format, consideration is given to the structure of the packet streams. Logical Ogg streams consist of distinct pages of data.
Since audio streams switch between voice and music relatively infrequently, the broadcaster's comparator operates in a stateful manner most of the time. Notably, in this embodiment, if the optimum codec switches too rapidly, then bandwidth is lost due to the overhead for these codec ID pages.
Beginning at block 602, a first codec is selected for the first page (or group of pages) based on the analysis at the comparator. As shown at block 604, the comparator evaluates entire sequences of pages and compares the overall quality between sequences encoded by different codecs. A determination is made at block 606 whether the currently selected broadcast codec performs worse (within a given time frame) than another identified codec. If no other codec performs better than the currently selected codec, then the current codec continues to be utilized, as shown at block 608. However, if the currently selected broadcast codec performs worse than another codec for over a pre-determined, extended period of time (pre-selected during design/configuration of the system), the broadcast system switches the codec used for the pages to the other, better-performing codec, as indicated at block 610.
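The stateful switching of blocks 602 through 610 may be sketched as follows. The pre-determined period is approximated here by a count of consecutive evaluations, which is an assumption; the class name `StatefulCodecSelector` and the `patience` parameter are hypothetical. The sketch also assumes that every set of ratings includes the currently selected codec.

```python
class StatefulCodecSelector:
    """Switch codecs only after another codec wins `patience` evaluations in a row."""

    def __init__(self, initial_eid, patience):
        self.current = initial_eid   # block 602: initially selected codec
        self.patience = patience     # stand-in for the pre-determined period
        self._challenger = None
        self._streak = 0

    def update(self, ratings):
        """Take EID -> quality for the latest page sequence; return the EID to use."""
        best = max(ratings, key=ratings.get)
        if best == self.current or ratings[best] <= ratings[self.current]:
            # Block 608: nothing performs better, keep the current codec.
            self._challenger, self._streak = None, 0
            return self.current
        if best == self._challenger:
            self._streak += 1
        else:
            self._challenger, self._streak = best, 1
        if self._streak >= self.patience:
            # Block 610: the challenger has won for long enough, so switch.
            self.current = best
            self._challenger, self._streak = None, 0
        return self.current


selector = StatefulCodecSelector(initial_eid=1, patience=3)
history = [selector.update(r) for r in [
    {1: 5.0, 2: 6.0},  # codec 2 better once
    {1: 6.0, 2: 5.0},  # codec 1 recovers, streak resets
    {1: 5.0, 2: 6.0},
    {1: 5.0, 2: 6.0},
    {1: 5.0, 2: 6.0},  # third consecutive win triggers the switch
]]
```

A brief advantage for the other codec does not trigger a switch, which avoids spending bandwidth on frequent codec changes.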
As a final matter, it is important to note that while an illustrative embodiment of the present invention has been, and will continue to be, described in the context of a fully functional computer system with installed management software, those skilled in the art will appreciate that the software aspects of an illustrative embodiment of the present invention are capable of being distributed as a program product in a variety of forms, and that an illustrative embodiment of the present invention applies equally regardless of the particular type of signal bearing media used to actually carry out the distribution. Examples of signal bearing media include recordable type media such as floppy disks, hard disk drives, and CD ROMs, and transmission type media such as digital and analogue communication links.
While the invention has been particularly shown and described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention. For example, while the described embodiments of the invention refer specifically to audio content and audio frames, the present invention finds applicability to any media that is transmitted from a source to a destination and which may utilize different codecs, which each yield different transmission quality and bandwidth.