This invention relates to a system and methods for encoding and decoding video signals or files from a video transport stream or raw video data file, respectively, into a constant bit rate (CBR) high level MPEG-2 ISO/IEC compliant transport stream wherein the CBR is maintained for each processed frame in a video sequence.
Ever evolving video encoding and transport standards force new generations of video equipment that customers must manage, control and continue to invest in. Expensive equipment purchased by video product manufacturers, such as professional HD camera makers, must be removed and replaced by equipment built for new standards. To manage in this environment, advanced but economical video compression techniques are required to store or transmit video. Furthermore, a dynamic platform is required to accommodate the evolving standards in order to reduce equipment churn.
Conventional approaches require complex ASICs or arrays of DSPs to manage the intensive signal processing, which reduces flexibility, compromises quality and adds the non-recurring engineering costs inherent in ASIC production. What is needed is a high performance, high speed, low cost hardware platform in combination with software programmability so that future video signal processing standards may be incorporated into the platform as those standards evolve.
A state-of-the-art constant bit rate (CBR) encoder provides an average constant bit rate over time. The MPEG-2 encoder of the present invention compresses every input frame with MPEG-2 intra coding with a group of pictures (GOP) size equal to 1 (one). The present invention provides not only an average constant bit rate over time, but exactly the same number of output bits for each frame. The output bitstream compressed by the MPEG-2 encoder can easily accept inserted, deleted or replaced content for a given frame at any position in the bitstream without uncompressing the whole bitstream, and the modified bitstream remains MPEG-2 compliant. A strictly frame-CBR-compliant MPEG-2 encoder has wide applicability in professional video program editing and digital television standards, for instance in manipulating D-10 bit streams as described in SMPTE 356M-2001, “SMPTE Standard for Television—Type D-10 Stream Specifications—MPEG-2 4:2:2P@ML for 525/60 and 625/50”, Aug. 23, 2001.
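The practical consequence of a fixed per-frame bit count can be sketched in a few lines: frame k occupies a known byte range, so its content can be replaced in place without touching any other frame. The sketch below assumes a hypothetical fixed frame size; an actual D-10 stream fixes the per-frame size per SMPTE 356M.

```python
# Sketch only: in a strict frame-CBR stream, frame k can be replaced by
# byte arithmetic. FRAME_BYTES is a hypothetical constant, not the D-10 size.
FRAME_BYTES = 1024

def replace_frame(bitstream: bytearray, k: int, new_frame: bytes) -> None:
    """Overwrite frame k in place; no other frame moves, so the rest of
    the stream needs no re-encoding."""
    assert len(new_frame) == FRAME_BYTES  # must match the fixed frame size
    offset = k * FRAME_BYTES
    bitstream[offset:offset + FRAME_BYTES] = new_frame

stream = bytearray(10 * FRAME_BYTES)             # ten placeholder frames
replace_frame(stream, 3, b"\xab" * FRAME_BYTES)  # splice a new frame 3
```

Because the stream length and every other frame's offset are unchanged, the edit is local and the result remains a valid fixed-rate stream.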
U.S. Pat. No. 7,317,839 entitled “Chroma Motion Vector Derivation for Interlaced Forward-Predicted Fields” to Holcomb discloses a computer-implemented method for producing a digital video bitstream by outputting encoded video data and controls that govern post-processing filtering of the video data after decoding.
U.S. Patent Publication No. 2002/0041632 entitled “Picture Decoding Method and Apparatus” to Sato, et al. discloses an MPEG decoder for digital television broadcasting that applies activity compensation to the inverse orthogonal transformation image based on a reference image.
U.S. Pat. No. 6,434,196 entitled “Method and Apparatus for Encoding Video Information” to Sethuraman, et al. discloses a video information encoding system for communication and compression systems that employs macroblock sectioning.
U.S. Pat. No. 5,973,740 entitled “Multi-Format Reduced Memory Video Decoder with Adjustable Polyphase Expansion Filter” to Hrusecky discloses expanding decimated macroblock data in a digital video decoding system supporting both field- and frame-structured coding.
The present invention addresses the need for a programmable video signal processor through a combination of a hardware and dynamic software platform for video compression and image processing suitable for broadcast level high definition (HD) video encoding, decoding and imaging. The dynamic software platform is implemented on a low cost multicore DSP.
The present invention is a programmable energy efficient codec system with sufficient flexibility to provide encoding and decoding functions in a plurality of application environments.
In one application of the present invention, a camera control system for an HD-camera is envisioned wherein a first embodiment hosted codec subsystem encodes raw uncompressed HD-SDI video signals from the camera's optical subsystem into an MPEG-2 transport stream. A host system in the HD-camera stores the MPEG-2 transport stream on storage media onboard the HD-camera. The host system also exchanges status and control with the first embodiment codec subsystem. Raw uncompressed audio and video files may be passed through the codec subsystem and stored by the host system for subsequent processing. The codec subsystem may be programmed to encode or decode a plurality of video and audio formats as required by multiple HD-camera manufacturers.
In a second application of the present invention, a stand alone encoder system and a stand alone decoder system are assembled into a network configuration suitable for a studio production system, allowing for remote display and editing of HD-SDI video. The stand alone encoder and decoder utilize a second embodiment codec subsystem. At least one of a plurality of HD-SDI transport streams generated from a plurality of HD-cameras is encoded into an MPEG-2 transport stream, which the stand alone encoder outputs as a DVB-ASI signal and as a TS over IP packet stream, the latter being suitable for MPEG-2 transport over a routed IP network. The stand alone decoder accepts MPEG-2 TS over IP packet streams from a routed IP network and decodes them into an uncompressed HD-SDI transport stream useful for display. The MPEG-2 transport stream arriving at the stand alone decoder may be generated by a stand alone encoder on site at the studio production. A local workstation may accept DVB-ASI signals from the encoder for local video editing and storage. A remote workstation may accept TS/IP MPEG-2 files for remote video editing and storage. The codec subsystem may be programmed to encode or decode a plurality of video and audio formats as required by multiple studio production houses.
The embodiments described have hardware systems based on a field programmable set of hardware including a DSP, an HD-SDI and SD-SDI multiplexer/demultiplexer, an MPEG-2 compatible transport stream multiplexer/demultiplexer, a boot controller, and a set of external interface controllers. In one embodiment of the codec system, the set of external interface controllers includes a PCI controller for a PCI bus interface. In a second embodiment codec system, the set of external interface controllers includes a panel interface controller for accepting input from a keypad, displaying output on an LCD display screen and communicating alarm information through a digital interface.
The software framework of the many embodiments of the present invention has the capability to intelligently manage system power consumption through a systems energy efficiency manager (SEEM) kernel which is programmed to interact with various software modules, including modules that can adaptively control system voltage. The SEEM kernel monitors required speed and required system voltage in different operational modes to ensure that speed and voltage are maintained at the minimum levels necessary to accomplish required operations. The SEEM kernel enables dramatic power reduction over and above the efficient power designs chosen at the hardware system architecture, algorithmic, chip architecture, transistor and silicon levels.
To accommodate the SEEM kernel and to allow for ease of system update and upgrade, and ease of development of a variety of different systems or encoder/decoder algorithms, the DSP based software framework utilizes a dual operating system environment to run system level operations on a system OS and to run computational encoder/decoder level operations on a DSP OS. A system scheduler manages the operations between the two OS environments. A set of system library interfaces are utilized for external interface functions and communications to peripherals allowing for a set of standard APIs to be available to host systems when the codec is in a hosted environment. A set of DSP library interfaces allow for novel DSP intensive encoder functions relating to operations such as discrete cosine transformations, motion estimation, quantization matrix manipulations, variable length encoding functions and other compression functions.
In another aspect of the invention, algorithms used for encoding in the codec system include at least the function of performing discrete cosine transforms (DCT), the function of applying a quantization matrix (Q) to the DCT signal, the function of applying a variable length encoding (VLC) to the quantized signal, and formatting the output signal into an elementary transport stream (TS).
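The encode chain named above (DCT, quantization matrix, variable length coding) can be illustrated with a simplified sketch. The flat quantization matrix and the run/level stub below are illustrative assumptions, not the actual MPEG-2 tables or VLC codes.

```python
import numpy as np

N = 8
# Orthonormal 8-point DCT-II basis matrix built from its definition
u, x = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
C = np.sqrt(2.0 / N) * np.cos((2 * x + 1) * u * np.pi / (2 * N))
C[0, :] /= np.sqrt(2.0)

Q = np.full((N, N), 16.0)  # hypothetical flat quantization matrix

def dct2(block):
    """2-D DCT of an 8x8 block via the separable matrix form."""
    return C @ block @ C.T

def quantize(coeffs, scale):
    """Divide by the scaled quantization matrix and round to integers."""
    return np.round(coeffs / (Q * scale)).astype(int)

def vlc_stub(qcoeffs):
    """Stand-in for MPEG-2 run/level VLC: emit (zero-run, level) pairs."""
    pairs, run = [], 0
    for v in qcoeffs.flatten():
        if v == 0:
            run += 1
        else:
            pairs.append((run, int(v)))
            run = 0
    return pairs

block = np.full((N, N), 128.0)   # flat luma block: all energy in the DC term
coeffs = dct2(block)
qc = quantize(coeffs, scale=1.0)
symbols = vlc_stub(qc)
```

For a flat block only the DC coefficient survives quantization, so the stub emits a single symbol; a real encoder would additionally apply zigzag scanning and the standard's VLC tables before transport stream formatting.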
A constant bit stream is accomplished through a rate control function that adjusts quantization matrix scale factors on the fly, per image slice. The DCT function includes the ability to predict quantization parameters, which are fed forward to the rate control function. Improvements to the encoding function and rate control function in general may be made over time and incorporated through program updates via the flash memory.
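The feedback character of such a rate control function may be sketched as follows; the toy bit-production model and the unit step size are assumptions for illustration only, not the patented algorithm.

```python
# Sketch of per-slice rate control: after each slice the quantiser scale
# is nudged toward the per-frame bit budget.
def encode_frame(slices, frame_bit_budget, encode_slice):
    """Encode all slices, adapting qscale; returns total bits produced."""
    per_slice_budget = frame_bit_budget / len(slices)
    qscale, total = 8, 0
    for s in slices:
        bits = encode_slice(s, qscale)
        total += bits
        if bits > per_slice_budget:        # over budget: quantise coarser
            qscale = min(31, qscale + 1)
        elif bits < per_slice_budget:      # under budget: quantise finer
            qscale = max(1, qscale - 1)
    return total

def toy_slice(complexity, qscale):         # hypothetical bit-production model
    return int(complexity / qscale)

total = encode_frame([9000] * 10, frame_bit_budget=10000,
                     encode_slice=toy_slice)
```

The first slice overshoots its budget, the scale coarsens, and subsequent slices settle onto the per-slice target; feeding forward predicted quantization parameters, as the text describes, would shrink that initial overshoot.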
These and other inventive aspects will be described in the detailed description below.
The disclosed inventions will be described with reference to the accompanying drawings, which describe important sample embodiments of the invention and which are incorporated in the specification hereof by reference, wherein:
FIGS. 1a and 1b are schematic diagrams of an HD-Camera codec system application in the first embodiment.
The flexible video processor of the present invention may be implemented in a variety of embodiments in different application environments incorporating hardware platforms suitable to the environment. Two relevant application environments, high definition camera hardware and high definition video production, are described along with corresponding embodiments of the flexible video processor. Many other applications and embodiments of the present invention may be conceived, so the inventive ideas disclosed in relation to the given applications are not to be construed as limiting the invention.
A first application of the present invention is in a high definition video production camera as shown in
Codec subsystem 5 interfaces to host subsystem 26 through PCI bus 27 to allow control signals 44 and status signals 45 to flow between the two subsystems. In the encoder mode of operation, the input video/audio stream for codec subsystem 5 is demultiplexed and encoded from uncompressed HD-SDI signal 8. An MPEG-2 transport stream (TS) encoded by codec subsystem 5 is sent to DVB-ASI interface 12 and also forms compressed audio/video files 18 with record headers documenting format information and content metadata if required.
Uncompressed HD-SDI signal 8 may also be demultiplexed and stored as raw YUV video data files 17 and raw audio data files 19 in storage memory 28. Uncompressed raw data files allow for future editing and processing without loss of information that occurs during the encoding and compression processes. Codec subsystem 5 may playback raw video and audio data files to the HD-SDI interface 11 and AES/EBU port 13, respectively.
Codec subsystem 5 is implemented on a digital signal processor and may be programmed to support a variety of encoding, decoding and compression algorithms.
The HD camera application illustrates an example of a first embodiment of the present invention that utilizes a codec system, host system and PCI bus between the two systems to perform video encoding and decoding operations. The host system need not be embedded as in the HD-camera 1, but may be a computer system wherein the codec subsystem may be a physical PCI card connected to the computer. Novel encoding and decoding operations of the present invention will be described in greater detail. A commercial example of the first embodiment codec system is the HCE1601 from Ambrado, Inc.
Moving to the block diagram of
DSP microprocessor 301 is a physical integrated circuit with CPU, I/O and digital signal processing hardware onboard. A suitable component for DSP microprocessor 301 having sufficient processing power to successfully implement the embodiments of the present invention is the SP16 Storm-1 SoC processor from Stream Processors Inc.
MMU 308 provides access to dynamic random access memory (DRAM 318) for implementing video data storage buffers utilized by SDI mux/demux 306 and TS mux/demux 310 for storing input and output video and audio data. SDI mux/demux 306 has external I/O ports HD-SDI port 321a, HD-SDI port 321b and HD-SDI loopback port 321c, and has internal I/O connections to DRAM 318 through MMU 308 including embedded video I/O 321e and embedded metadata I/O 321f. SDI mux/demux 306 may stream digital audio to and from audio crossconnect 312 via digital audio I/O 321d. A set of external AES/EBU audio ports 323a-d is also connected to audio crossconnect 312, which functions to select either the signal on audio ports 323a-d or the signal on digital audio I/O port 321d for streaming to DRAM 318 through MMU 308 on embedded audio connection 323b.
Transport stream mux/demux 310 has DVB-ASI interfaces 322a, 322b and DVB-ASI loopback interface 322c. TS mux/demux 310 may also generate or accept TS over IP data packets via 10/100/1000 Base-Tx Ethernet port 322d. TS mux/demux 310 conveys MPEG-2 transport streams in network or transmission applications. MPEG-2 video data streams may be stored and retrieved by accessing DRAM 318 through MMU 308.
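As context for the TS over IP port, MPEG-2 transport packets are 188 bytes each (sync byte 0x47), and a common framing, though not one the text mandates, carries seven of them in each UDP payload so that 7 x 188 = 1316 bytes fits a standard 1500-byte Ethernet MTU. A minimal packetizer sketch:

```python
TS_PACKET = 188           # MPEG-2 transport packet size, sync byte 0x47
PACKETS_PER_DATAGRAM = 7  # common TS-over-IP choice: 1316 bytes per payload

def packetize_ts(ts_bytes):
    """Group 188-byte TS packets into UDP-sized payloads (an assumed
    framing; the patent does not specify this exact scheme)."""
    assert len(ts_bytes) % TS_PACKET == 0
    chunk = TS_PACKET * PACKETS_PER_DATAGRAM
    return [ts_bytes[i:i + chunk] for i in range(0, len(ts_bytes), chunk)]

stream = bytes([0x47] + [0] * 187) * 14   # fourteen dummy TS packets
datagrams = packetize_ts(stream)          # two 1316-byte UDP payloads
```

Each payload begins on a packet boundary, so a receiver can resynchronize on the 0x47 sync byte at the start of every datagram.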
MMU 308, SDI mux/demux 306, TS mux/demux 310 and audio crossconnect 312 functions are preferably implemented in programmable hardware such as a field programmable gate array (FPGA). Encoder and decoders are implemented in reprogrammable software running on DSP microprocessor 301. Boot controller 320 and PCI controller 325 are implemented as system control programs running on DSP microprocessor 301.
To implement an encoder, DSP microprocessor 301 operates programmed instructions for encoding and compression of an SMPTE 292M standard HD-SDI transport stream into an MPEG-2 transport stream. SDI mux/demux 306 is programmed to operate as an SDI demultiplexer on input transport streams from HD-SDI I/O ports 321a and 321b, with output embedded video and audio streams placed in video and audio data buffers implemented in DRAM 318. TS mux/demux 310 is programmed to operate as a TS multiplexer, taking its input audio and video data stream from DRAM 318 and streaming its multiplexed transport stream selectably to DVB-ASI port 322a, DVB-ASI port 322b or TS over IP port 322d. The video and audio encoder running on DSP microprocessor 301 accesses stored video and audio data streams in DRAM 318 to perform the encoding and compression functions.
To implement a decoder, DSP microprocessor 301 operates programmed instructions for decompression and decoding of an MPEG-2 transport stream into an SMPTE 292M HD-SDI transport stream. SDI mux/demux 306 is programmed to operate as an SDI multiplexer with output transport streams sent to HD-SDI I/O ports 321a and 321b, with input embedded video and audio streams captured from video and audio data buffers implemented in DRAM 318. TS mux/demux 310 is programmed to operate as a TS demultiplexer, sending its output audio and video data stream to DRAM 318 and streaming its input transport stream selectably from DVB-ASI port 322a, DVB-ASI port 322b or TS over IP port 322d. The video and audio decoder running on DSP microprocessor 301 accesses stored video and audio data streams in DRAM 318 to perform the decompression and decoding functions.
In the preferred embodiment DRAM 318 is shared between a host system connected through PCI bus 326 and codec system 300.
The hardware platforms, being centered around a DSP processing engine, are flexible and extendable as to input interfaces and bit rates, video framing formats, compression methods, file storage standards, output interfaces and bit rates, and given user requirements for a given deployed environment, so that many further embodiments are envisioned by adjusting the firmware or software programs residing on either given hardware platform.
The software framework and programmable aspects of the present invention are explained with the help of
Software framework 100 has the capability to intelligently manage system power consumption through systems energy efficiency manager (SEEM) kernel 115 which is programmed to interact with various software modules, including modules that can adaptively control system voltage. SEEM kernel 115 monitors required speed and required system voltage in different operational modes to ensure that speed and voltage are maintained at the minimum levels necessary to accomplish required operations. SEEM kernel 115 enables dramatic power reduction over and above the efficient power designs chosen at the hardware system architecture, algorithmic, chip architecture, transistor and silicon levels.
System OS 106 further interfaces to a set of hardware drivers 103 and a set of hardware control APIs 105 and forms a platform that utilizes systems library module 107 along with the communications and peripheral functions module 109 to handle the system work load. Systems library module 107 contains library interfaces for functions such as video device drivers and audio device drivers while communications and peripheral functions module 109 contains functions such as device drivers for RS232 interfaces and panel control functions if they are required. System OS 106 also handles the system function of servicing the host interface in a hosted environment, the host interface physically being PCI controller 325 controlling PCI bus interface 326 in first embodiment codec system 300.
DSP OS 116 handles the execution of DSP centric tasks and comprises DSP library interfaces 117, DSP intensive computation and data flow 118, and a system scheduler 119. Examples of DSP centric tasks include codec algorithmic functions and video data streaming functions. The system scheduler 119 manages thread and process execution between the two operating systems.
Software framework 100 is realized in the embodiments described herein and is named in corresponding products from Ambrado, Inc as the Energy Efficient Multimedia Processing Platform (EMP).
Codec software system of software framework 100 is organized into a set of modular components which are shown in
Examining
A host system 153 interacts with codec software system 150 via PCI bus interface 159, host system 153 comprising at least a PCI driver 175 for driving data to and from PCI bus interface 159, a user driven control application 190 for controlling codec functions, a record application 196 for recording video and audio in conjunction with codec system 150 and a playback application 197 for playing video and audio files in conjunction with codec system 150. Host system 153 is typically a computer system with attached storage media that operates programs under Microsoft Windows OS. Alternatively, the host operating system may be a Linux OS.
Systems control processor 152 operates principal system components including codec manager 161, PCI manager 171, video device driver VDD 191 and audio device driver ADD 192. Codec manager 161 is packaged as a set of methods programmed in codec control module 160. PCI manager 171 is packaged as a set of methods programmed in codec host interface module 170.
DSP control processor 154 operates a codec algorithmic subsystem CAS 165 which is a principal system component.
Shared memory 157 comprises memory containers including at least a decode FIFO stack 163 and an encode FIFO stack 164 for holding command and status data, a video input buffer 180 for holding ingress video stream data, a video output buffer 181 for holding egress video stream data, an audio input buffer 182 for holding ingress audio stream data and an audio output buffer 183 for holding egress audio stream data.
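The command/status exchange through the encode and decode FIFOs can be sketched as follows; an in-memory deque stands in for the shared-DRAM FIFO, and the command names are illustrative, not the actual command set.

```python
from collections import deque

# Sketch: codec manager posts commands into a FIFO; the codec algorithmic
# subsystem (CAS) services them and posts status back through the same
# structure, as the shared-memory FIFO stacks do in the text.
class EncodeFifo:
    def __init__(self):
        self.commands = deque()
        self.status = deque()

    def post_command(self, cmd, **params):   # codec manager side
        self.commands.append((cmd, params))

    def service(self):                       # CAS side
        while self.commands:
            cmd, params = self.commands.popleft()
            # a real CAS would launch the encoder kernel here
            self.status.append((cmd, "OK"))

fifo = EncodeFifo()
fifo.post_command("start_encode", bitrate=50_000_000)  # hypothetical command
fifo.service()
```

The FIFO decouples the two processors: the system control processor never blocks on the DSP, it only enqueues work and later drains status.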
VDD 191 and ADD 192 principal components are standard in embedded processing systems, being realized by the Linux V4L2 video driver and the Linux I2S audio driver in the preferred embodiment. VDD 191 manages the video data input and output requirements of codec system 150 as required in the course of its operation, operating on video output buffer 181 to create egress video streams for direct video interfaces and operating on ingress video streams from direct video interfaces to store video streams in video input buffer 180. Similarly, ADD 192 handles the codec system's audio input and output requirements, operating on the audio input and output buffers to store and retrieve audio streams, respectively.
PCI manager 171 communicates all codec management and control tasks between host system 153 and codec manager 161 via PCI bus interface 159 using PCI driver 172. More specifically, PCI manager 171 communicates configuration commands 173a and status responses 173b in addition to record/playback commands 174 to and from host system 153.
PCI manager 171 transfers ingress video and audio streaming data generated from host system 153 into video input buffer 180 and audio input buffer 182, respectively. It also transfers egress video and audio streaming data to host system 153 from the video output buffer 181 and audio output buffer 183, respectively.
For configuration programming, PCI manager 171 allows host system 153 to exercise broad or finely tuned control of the codec functions. With the broad control approach, host system 153 configures codec system 150 with stored configuration groupings known as configuration sets 177, of which there are three primary types in the preferred embodiment, (a) factory default configuration, (b) default configuration and (c) current configuration, plus an array of user definable configuration sets. In the preferred embodiment there are sixty-four user definable configuration sets in the array. With the finely tuned control approach, host system 153 may change any of the configuration settings in the current configuration, allowing a flexible model for codec configuration management for a plurality of encoding and decoding requirements.
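The configuration-set model lends itself to a short sketch; the parameter names below are hypothetical stand-ins, not the actual contents of configuration sets 177.

```python
# Sketch of the configuration-set model: factory default, default and
# current sets plus 64 user-definable sets. Field names are illustrative.
FACTORY_DEFAULT = {"mode": "encode", "bitrate": 50_000_000, "gop": 1}

class ConfigStore:
    def __init__(self):
        self.default = dict(FACTORY_DEFAULT)
        self.current = dict(FACTORY_DEFAULT)
        self.user_sets = [None] * 64      # sixty-four user-definable sets

    def save_user_set(self, index, cfg):
        self.user_sets[index] = dict(cfg)

    def load_user_set(self, index):       # broad control path
        self.current = dict(self.user_sets[index])

    def set_param(self, key, value):      # finely tuned control path
        self.current[key] = value

store = ConfigStore()
store.set_param("bitrate", 30_000_000)    # tweak one setting in current
store.save_user_set(0, store.current)     # persist it as user set 0
store.set_param("bitrate", 50_000_000)    # change again...
store.load_user_set(0)                    # ...then restore the saved set
```

The broad path swaps the whole current configuration at once, while the fine path touches a single setting; both leave the factory and default sets untouched for recovery.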
Codec algorithmic subsystem CAS 165 performs encoding and decoding of video and audio data. CAS 165 is made up of kernels implementing MPEG-2 encoding and decoding algorithms for both audio and video which are executed by DSP control processor 154 in conjunction with DSP engine 155 by manipulating and performing computations on the streams in the lane register files 168. CAS 165 receives its commands and responds with status data to decode FIFO stack 163 and encode FIFO stack 164.
Codec manager 161 manages user interfaces and communicates configuration and status data between the user interfaces and the other principal components of the codec system 150. System interfaces are serviced by the codec manager 161 including a command line interface (not shown) and PCI bus interface requests via PCI manager 171. Codec manager 161 is also responsible for configuration data validation such as range checking and dependency checking.
Codec manager 161 also performs hierarchical scheduling of encoding and decoding processes, ensuring that encoding and decoding processes operating on incoming video and audio streams get appropriate CPU cycles. Codec manager 161 also schedules the video and audio streams during the encoding and decoding processes. To perform these scheduling operations, codec manager 161 communicates directly with codec algorithmic subsystem 165. For encoding (and decoding) operations, codec manager 161 accepts configuration data from the host control application 190 (via PCI manager 171) and relays video encoding (decoding) parameters to CAS 165 using encode FIFO 164 (decode FIFO 163). Codec manager 161 also collects status updates on the operational status of CAS 165 during encoding (decoding) process phases, communicating status information to host system 153 as required. Another function of codec manager 161 is to interact with video input buffer 180 to keep the CAS 165 input stream full and with video output buffer 181 to ensure enough output buffer storage for CAS 165 to write processed video data without overrun.
In operation, codec system 150 follows a sequence of operational states according to the state diagram 350 of
Codec system 150 starts from the initialization state 355 while booting without any host system interaction. The system may be put into this state by sending an “INIT” command from PCI manager 171 to codec manager 161. During the initialization state the codec system boots, loading program instructions and operational data from flash memory. Once initialization is complete, codec system 150 transitions automatically to idle state 360, wherein the codec system is operational and ready for host communication. Codec manager 161 keeps the codec system in idle state 360 until a “start encode” or “start decode” command is received from the PCI manager 171. From idle state 360, the codec system may transition to either encode standby state 365 or decode standby state 380 depending upon the operational mode of the codec system being configured to encode or decode, respectively, according to the current configuration set.
Upon entering encode standby state 365, the codec system loads an encoder algorithm and is ready to begin encoding immediately upon receiving a “start encode” command from the host system via the PCI manager. When the “start encode” command is received by the codec manager, the codec system transitions from encode standby state 365 to encode running state 370. Encode standby state 365 may also transition to configuration update state 375 or to shutdown state 390 upon a configuration change request or a shutdown request from the host system, respectively. One other possible transition from encode standby state 365 is to maintenance state 395.
Encode running state 370 is a state in which the codec system, specifically the CAS 165, is actively encoding video and audio data. The only allowed transition from encode running state 370 is to encode standby state 365.
When entering decode standby state 380, the codec system loads a decoder algorithm and is ready to begin decoding immediately upon receiving a “start decode” command from the host system via the PCI manager. When the “start decode” command is received by the codec manager, the codec system transitions from decode standby state 380 to decode running state 385. Decode standby state 380 may also transition to configuration update state 375 or to shutdown state 390 upon a configuration change request or a shutdown request, respectively, from the host system. One other possible transition from decode standby state 380 is to maintenance state 395.
Decode running state 385 is a state in which the codec system, specifically the CAS 165, is actively decoding video and audio data. The only allowed transition from decode running state 385 is to decode standby state 380.
In configuration update state 375 a new configuration set is selected to be the current configuration set or the current configuration set is altered by the PCI manager. The only allowed transitions from the configuration update is to encode standby state 365 or decode standby state 380, depending upon the configuration setting.
Transitions to maintenance state 395 arrive only from encode standby state 365 or decode standby state 380 when a major codec system issue fix or a software update is required. The software update process is managed by the PCI manager. The only possible transition from maintenance state 395 is to initialization state 355.
Transitions to shutdown state 390 arrive from encode standby state 365 or decode standby state 380 upon a power down request from the PCI manager, wherein the codec system promptly powers down.
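The allowed transitions of state diagram 350 can be captured as a transition table; the state names below abbreviate those in the text (355 init, 360 idle, 365/370 encode standby/running, 380/385 decode standby/running, 375 configuration update, 390 shutdown, 395 maintenance), and the table is a reading of the description rather than the diagram itself.

```python
# Transition table distilled from the state description above.
TRANSITIONS = {
    "init":          {"idle"},
    "idle":          {"enc_standby", "dec_standby"},
    "enc_standby":   {"enc_running", "config_update", "shutdown", "maintenance"},
    "enc_running":   {"enc_standby"},
    "dec_standby":   {"dec_running", "config_update", "shutdown", "maintenance"},
    "dec_running":   {"dec_standby"},
    "config_update": {"enc_standby", "dec_standby"},
    "maintenance":   {"init"},
    "shutdown":      set(),
}

def step(state, target):
    """Move to target state, rejecting transitions the diagram forbids."""
    if target not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {target}")
    return target

s = step("idle", "enc_standby")   # "start encode" command received
s = step(s, "enc_running")        # encoding in progress
s = step(s, "enc_standby")        # only exit from running is back to standby
```

Encoding this as data makes the constraint that running states may only return to their standby states directly checkable.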
Energy efficiency of the codec system is managed in relation to the operational states of
SEEM kernel 115 is examined in greater detail with the help of
SEEM_init 840 is a SEEM component that runs when the system is in initialization state 355; it parses all operational parameters passed to the system and, based on impending operational requirements, executes the following tasks:
i. initializes the voltage for the system to commence operation
ii. initializes the requisite clock speed
iii. idles all processor resources not required
iv. powers down and turns off the clocking signals to all peripherals not required
SEEM_encstby 845 is a SEEM component executing tasks similar to SEEM_init, except that it handles these tasks as operational/parametric requirements change during the transition from encode running state 370 to encode standby state 365 and back to encode running state 370. An example of a parametric change that changes operational requirements affecting power is when the encoder mode is changed from I-frame only encoding to long GOP (LGOP) frame encoding. Another relevant example is when the constant bit rate requirement is changed from one output CBR rate to a different output CBR rate.
SEEM_destby 850 is a SEEM component executing tasks similar to SEEM_init, except that it handles these tasks as operational/parametric requirements change during the transition from decode running state 385 to decode standby state 380 and back to decode running state 385.
SEEM_encrun 855 is a SEEM component executing tasks similar to SEEM_init, except that it handles these tasks dynamically as needed while the codec system is in encode running state 370. For example, while a discrete cosine transform (DCT) is being computed, the processor clock speed is increased by SEEM_encrun 855. Upon completion of the DCT, the encoder algorithm moves to a data transfer intensive mode that does not require processor cycles. SEEM_encrun 855 then idles the processor by reducing its clock rate and/or voltage level.
SEEM_decrun 860 is a SEEM component executing tasks similar to SEEM_init and SEEM_encrun, handling the tasks dynamically as needed while the codec system is in decode running state 385. SEEM_shut 865 performs an energy conserving system shutdown by appropriately powering off voltages and shutting down clock domains in sequences that do not compromise the system's ability to either switch back on at a later time or respond to a sudden request to reverse the shut-down process.
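The common pattern across the SEEM components, matching clock and voltage to the current phase of work, can be sketched as follows; the frequency and voltage operating points are invented for illustration and are not silicon data.

```python
# Sketch of SEEM-style scaling: clock and voltage are raised only for
# compute-intensive phases (e.g. the DCT) and dropped during data transfer.
PROFILES = {
    "idle":     (100, 0.8),   # (MHz, volts) - hypothetical operating points
    "transfer": (250, 0.9),
    "compute":  (700, 1.1),
}

class Seem:
    def __init__(self):
        self.freq_mhz, self.volts = PROFILES["idle"]

    def enter_phase(self, phase):
        """Apply the minimum clock/voltage pair sufficient for the phase."""
        self.freq_mhz, self.volts = PROFILES[phase]

seem = Seem()
seem.enter_phase("compute")    # DCT in progress: full clock and voltage
dct_settings = (seem.freq_mhz, seem.volts)
seem.enter_phase("transfer")   # DMA of coefficients: throttle back
```

Because dynamic power scales with frequency and roughly with the square of voltage, dropping both during transfer-bound phases is where such a scheme recovers most of its energy.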
Once the codec system has appropriately been initialized and configured via the PCI manager, there are two essential user modes of operation shared between the host and codec system—the record mode and the playback mode.
A second embodiment of the present invention is a production quality stand-alone codec system suitable for rack mount applications in a studio or video production environment.
SA encoder 60 functions to encode and compress at least one of the HD-SDI signals 71 and 72 into an MPEG-2 transport stream, which may be further packetized into a DVB-ASI output signal 75 or an MPEG-2 TS over IP packet stream sent to IP routed network 65 for transport to other devices such as SA decoder 62 and video workstation 56. SA decoder 62 may be used to monitor the quality of the MPEG-2 encoding process by decoding the MPEG-2 TS over IP packet stream to uncompressed HD-SDI signal 73, which is available for viewing on a second HD video display monitor 53. Video workstation 56 receives routed MPEG-2 TS over IP packet streams and may be used to display, edit, store and perform other video processing functions as is known in the art of video production.
One goal of the present invention is to provide SA encoder and SA decoder devices which are customized for the needs of the specific production environment. As production environment needs vary considerably from company to company and requirements evolve rapidly with standards, a need exists for software programmable SA encoder and decoder devices allowing for rapid development and deployment cycles.
DSP microprocessor 401 is a physical integrated circuit with CPU, I/O and digital signal processing hardware onboard. A suitable component for DSP microprocessor 401 having sufficient processing power to successfully implement the embodiments of the present invention is the SP16 Storm-1 SoC processor from Stream Processors Inc.
MMU 408 provides access to dynamic random access memory (DRAM 418) for implementing video data storage buffers utilized by SDI mux/demux 406 and TS mux/demux 410 for storing input and output video and audio data. SDI mux/demux 406 has external I/O ports HD-SDI port 421a, HD-SDI port 421b, HD-SDI loopback port 421c, and has internal I/O connections to DRAM 418 through MMU 408 including embedded video I/O 421e and embedded metadata I/O 421f. SDI mux/demux 406 may stream digital audio to and from audio crossconnect 412 via digital audio I/O 421d. A set of external AES/EBU audio ports 423a-d is also connected to audio crossconnect 412, which selects either the signals on audio ports 423a-d or the signal on digital audio I/O port 421d for streaming to DRAM 418 through MMU 408 on embedded audio connection 423b.
Transport stream mux/demux 410 has DVB-ASI interfaces 422a, 422b and DVB-ASI loopback interface 422c. TS mux/demux 410 may also generate or accept TS over IP data packets via 10/100/1000 Base-TX Ethernet port 422d. TS mux/demux 410 conveys MPEG-2 transport streams in network or transmission applications. MPEG-2 video data streams may be stored and retrieved by accessing DRAM 418 through MMU 408.
MMU 408, SDI mux/demux 406, TS mux/demux 410 and audio crossconnect 412 functions are preferably implemented in programmable hardware such as a field programmable gate array (FPGA). Encoder and decoders are implemented in reprogrammable software running on DSP microprocessor 401. Boot controller 420 and panel controller 430 are implemented as system control programs running on DSP microprocessor 401.
The encoder and decoder implementations as well as the software framework for second embodiment codec system 400 are similar to the implementations and framework for first embodiment codec system 300. The software framework for the second embodiment replaces the PCI manager with a panel control manager and an extended codec manager for controlling alarm functions and the human interface functions: LCD panel display functions and panel control functions. Buttons on the front display panel are used to change the operational mode of the second embodiment codec system, a codec manager software component being the primary system component responsible for communicating with the front panel display. The software state diagram as described for the first embodiment codec system also applies to the second embodiment codec system.
Pictures of the encoder box 460 front and back panels are shown in
Encoder box 460 supports a real time clock to keep track of its event logs, alarms and warnings; to maintain synchronization, the encoder box has a clock reference input 453. Event log data is saved in onboard flash memory and is available for user access. Ethernet 10/100/1000 Base-TX IP management port 428 is available on rear panel 450 for remote management of encoder functions. Encoder box 460 also has debug port 429 to connect to a local interface such as an EJTAG interface for hardware debugging and has a parallel alarm port 455 for remote monitoring of alarm signals 432. For local monitoring of alarm signals 432, front panel 440 contains alarm light 446 and status light 447. Encoder box 460 and decoder box 560 are half-rack in size so two boxes can be mounted in a single slot in any desired combination, for example one encoder box 460 and one decoder box 560.
Encoder box 460 has two HD/SD SDI I/O ports 421a and 421b for uncompressed video with embedded audio. One of the two HD/SD SDI signals on HD-SDI I/O ports 421a or 421b is selected for video/audio encoding and the selected HD/SD SDI signal is then driven to HD/SD SDI loop back I/O port 421c. Additionally, 4-pairs (8-channel) of external AES/EBU input audio signals 423a are connected via rear panel BNC connectors 452a-452d. Encoder box 460 is programmed to support the generation of color bars and a 1 kHz sine test signal for video and audio processing, respectively.
For output, encoder box 460 has two DVB-ASI I/O ports 422a and 422b providing two identical outputs for transmission of the DVB-ASI compliant MPEG Transport Stream (TS). Encoder box 460 allows for transmission of MPEG-2 TS over IP through dedicated 10/100/1000 Mbps (Gigabit) Base-TX Ethernet port 428. SDI and DVB video and AES/EBU audio ports typically utilize 75-ohm BNC type connectors. The Ethernet ports typically use RJ-45 connectors.
Similar to encoder box 460, decoder box 560 has front panel and rear panel connectors and controls.
The frame based encoder of the preferred embodiment operates separately on the 8×8 luma sub-blocks 602, 603, 604 and 605, applying DCT, quantization matrix and VLC methods thereto.
Preferred encoder modes as supported in the current embodiments are shown in the table 608 of
Upon encoding each complete frame into an MPEG-2 elementary transport stream, each transport stream packet record is augmented with a record header according to the record header format shown in
Timecode 640 comprises 9 fields indicating hours, tens of hours, minutes, tens of minutes, seconds, tens of seconds, frames, tens of frames and a frame drop flag. PTS 650 has two fields containing the presentation time stamp in standard timestamp format. DTS 655 has two fields containing the decoding time stamp in standard timestamp format. Data length 660 indicates the size of the packet in bytes. Video data 670 is the MPEG-2 video transport stream data.
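The record header layout described above may be illustrated with a short sketch. The field widths, byte order, and one-byte-per-digit timecode encoding shown here are assumptions for illustration only, since the record header format figure is not reproduced in this excerpt.

```python
# Illustrative sketch of the record header fields described above.
# Field widths and byte order are assumptions, not the patent's format.
import struct

def pack_timecode(hours, minutes, seconds, frames, drop=False):
    """Encode timecode 640 as nine one-byte fields: units and tens
    digits of hours, minutes, seconds and frames, plus a drop flag."""
    digits = []
    for v in (hours, minutes, seconds, frames):
        digits += [v % 10, v // 10]   # units digit, then tens digit
    digits.append(1 if drop else 0)   # frame drop flag
    return bytes(digits)

def pack_record(timecode, pts, dts, video_data):
    """Prepend an assumed record header to one frame's TS data."""
    header = timecode
    header += struct.pack(">II", pts >> 32, pts & 0xFFFFFFFF)  # PTS 650, two fields
    header += struct.pack(">II", dts >> 32, dts & 0xFFFFFFFF)  # DTS 655, two fields
    header += struct.pack(">I", len(video_data))               # data length 660
    return header + video_data                                  # video data 670
```

A record for a frame at timecode 01:23:45:12 would be built as `pack_record(pack_timecode(1, 23, 45, 12), pts, dts, ts_bytes)`.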
A host software API in the context of the first embodiment codec system is specified for communications between the host and the encoder. Communications occurs by reading and writing commands and other information to specified memory locations (fields) which are shared between host and codec across the PCI bus interface. Table 700 of
The host software API may access or set encoder information. The function of reporting the current hardware and firmware revision is reported by two fields HW_rev and FW_rev as per table 710.
The host software API may read or write the operational configuration, which is accomplished through the operating “Mode” field and operating “Init” field shown in table 712. The operating “Mode” of the MPEG2 video encoder is set to one of four possible operating modes: mode 0 being an “idle” mode in which the encoder hardware is operating and ready for communication from the host; mode 1 being a “record from video capturing” mode wherein the encoder receives signal from an HD-SDI video stream and is capturing and encoding the video stream into the elementary transport stream; mode 2 being a “record from video YUV data file” mode wherein the encoder receives video signal by reading a YUV data file which is buffered in shared memory and encodes the file into an elementary transport stream. The operating “Init” field causes an initialization of the encoder firmware if the field value is set to ‘1’.
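The mode values enumerated above can be captured in a small sketch. The enum and member names are illustrative assumptions, and mode 3 is omitted because it is not enumerated in this passage.

```python
# Hypothetical names for the operating "Mode" field values described above.
from enum import IntEnum

class EncoderMode(IntEnum):
    IDLE = 0              # encoder running and ready for host communication
    RECORD_CAPTURE = 1    # capture and encode a live HD-SDI video stream
    RECORD_YUV_FILE = 2   # encode a YUV data file buffered in shared memory

# Writing '1' to the operating "Init" field reinitializes the encoder firmware.
INIT_FIRMWARE = 1
```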
According to
According to
Turning now to the methods used for encoding in the codec systems of the present invention, the methods are described in the context of four processes as shown in
A constant rate bit stream is accomplished through rate control function 901, which adjusts quantization parameters on-the-fly and per image slice. MB function 902 and DCT function 903 include the ability to perform a prediction of quantization parameters which is fed forward to rate control function 901. Improvements in the encoding function and rate control function in general may be made over time and incorporated through program updates via the flash memory and by downloading via the integrated Ethernet interfaces.
To achieve a constant number of output bits for every frame while maintaining high quality encoding and compression, rate control function 901 is operated by the DSP control processor in conjunction with the encoder processes, consistent with SIMD structured parallel processing. The optimized bit allocation works to minimize stuffing bits. Bit allocation within a frame is controlled by the RC process, which takes as its inputs a computed complexity predictor prior to quantization and the actual bit stream bit rate after variable length encoding. The total output bits per frame are tuned by adjusting the quantization parameter (QP) for each macroblock within the frame according to the inputs, using methodology and algorithms which are described in the methods of
A first embodiment rate control method 1000 of the present invention is shown in
The frame is then split into MBs and complexity measures are calculated in step 1010 for each MB in the frame. The MBs are further categorized into M sets in step 1012 according to the complexity measure of each MB and in step 1013, the target bits range {RT} is subdivided into a set of M target ranges {RS}. M distinct QPs are computed in step 1014, one for each of the M sets in the frame, the distinct QPs forming the initial set of QPs 1021 for the MBs of the frame to be applied during quantization. Method 1000 then continues at step 1016. Complexity measures determine the similarity between the current frame and the previously encoded frame, capturing, for example, scene changes and changes in motion complexity, and are described in more detail below.
If there is no scene change from the previous frame, step 1008 is performed on the current frame wherein a target range of bits, {RT}, is computed for the current frame based on the actual bits generated in the previous frame. The set of QPs 1021 for the previous frame becomes the initial set of QPs 1021 for the current frame to be applied during quantization. Rate control method 1000 then continues at step 1016.
In step 1016, a DCT process is run on each MB to transform the MBs of the frame into the spatial frequency domain.
An algorithm 1020 combining the quantization and VLC processes is run in step 1018 on the previously transformed MBs iterating through all of the MBs in the frame. The quantization utilizes quantization parameters from the set of QPs 1021, each MB mapped to one QP in the set.
After the quantization/VLC process for the current frame is completed, step 1022 stores the set of QPs 1021 for use as an initial set of QPs for the next frame.
A check is performed in step 1024 to determine if the actual number of output bits Ro is within the required target range {RT}. If Ro is not in range, the set of QPs 1021 is updated and adjusted in step 1026, wherein the output bits of each macroblock MB in the encoded frame are further checked against the set of M target ranges {RS}. Also in step 1026, the set of frame complexity measures may be computed again, as in step 1010, to determine how the set of QPs 1021 needs to be adjusted to ensure the required frame bit rate. The set of QPs 1021 is then adjusted accordingly and as needed.
Then the method continues to perform quantization/VLC step 1018 along with steps 1022, 1024 and 1026 repeatedly until the actual output bits are within the required range {RT}.
Once Ro falls in the target range {RT} or the process times out, stuff bits are added to the encoded frame in step 1028 to bring the number of frame bits to RT.
After step 1028 the current frame is completely encoded, and the bit stream is pushed to the video output buffer in step 1030, after which the rate control method repeats at step 1004 with the next frame and continues until the video sequence of frames is completed or stopped.
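The loop of steps 1016 through 1028 can be condensed into a sketch. The helpers encode_frame and adjust_qps are hypothetical stand-ins for the DCT/quantization/VLC steps and the step-1026 QP adjustment, and the loop bound max_loops is a simplifying assumption standing in for the time-out.

```python
def rate_control_1000(mbs, qps, rt_low, rt, encode_frame, adjust_qps,
                      max_loops=8):
    """Repeat quantization/VLC until the output bit count Ro falls in
    {RT}, then stuff to exactly RT bits (steps 1016-1028, simplified)."""
    for _ in range(max_loops):
        ro = encode_frame(mbs, qps)            # steps 1016/1018: DCT + quant/VLC
        if rt_low <= ro <= rt:                 # step 1024: Ro within {RT}?
            break
        qps = adjust_qps(qps, ro, rt_low, rt)  # step 1026: adjust the set of QPs
    # Step 1022: the final QPs seed the next frame; step 1028: stuff to RT.
    return ro + max(0, rt - ro), qps
```

With stub helpers the loop converges and the returned frame size is exactly RT, which is how the encoder maintains a per-frame constant bit rate.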
In relation to first rate control method 1000, rate control function 901 of
A second embodiment rate control method of the present invention is shown in
The frame is then split into slices of MBs, each frame being constructed of a plurality of slices and each slice constructed of a set of MBs. Complexity measures are calculated in step 1050 for each MB in the frame. The MBs are further categorized into M sets in step 1052 according to the complexity measure of each MB. M distinct QPs are computed in step 1054, one for each of the M sets in the frame, the distinct QPs forming the initial set of QPs 1059 for the MBs of the frame to be applied during quantization. Rate control method 1040 then continues at step 1056.
If there is no scene change from the previous frame, step 1048 is performed on the current frame wherein a target range of bits, {RT}, is computed for the current frame based on the actual bits generated in the previous frame. The set of QPs 1059 for the previous frame becomes the initial set of QPs 1059 for the current frame to be applied during quantization. Rate control method 1040 then continues at step 1056.
In step 1056, a DCT process is run on each MB to transform the MBs of the frame into the spatial frequency domain. After the DCT process completes, complexity measures are summed in step 1057 for each slice in the frame. The slices are then prioritized into N groups in step 1058 according to the complexity sum of each group of slices, highest priority groups of slices having the largest complexity sum and lowest priority groups of slices having the smallest complexity sum. Each group of slices is allocated a target range of bits {RG}.
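Step 1058 can be sketched as follows. The even split of the ranked slices into N groups is an assumption, as the passage specifies only that groups are ordered by complexity sum, highest first.

```python
def prioritize_slices(slice_complexities, n_groups):
    """Rank slices by summed complexity (step 1057) and partition them
    into N priority groups, highest complexity sum first (step 1058)."""
    order = sorted(range(len(slice_complexities)),
                   key=lambda i: slice_complexities[i], reverse=True)
    size = -(-len(order) // n_groups)   # ceil division for even-sized groups
    return [order[g * size:(g + 1) * size] for g in range(n_groups)]
```

Each returned group would then be allocated its own target range of bits {RG} before quantization/VLC.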
An algorithm 1060 combining the quantization and VLC processes is run in step 1062 on the previously transformed MBs iterating through all of the MBs in the highest priority group of slices, the quantization utilizing quantization parameters from the set of QPs 1059, each MB mapped to one QP in the set.
After the quantization/VLC process for the current group of slices is completed, step 1064 stores the set of QPs 1059 for use as an initial set of QPs for the corresponding slice of the next frame.
After encoding the current group of slices, a check is performed in step 1066 to determine if the actual number of output bits Ro is consistent with the required target range of bits {RG}. If Ro is not in the range, the set of QPs 1059 is adjusted and updated in step 1068. Also in step 1068, the set of frame complexity measures may be computed again, as in step 1050, to determine how the set of QPs 1059 needs to be adjusted to ensure the required frame bit rate. The set of QPs 1059 is then adjusted accordingly and as needed.
The rate control method 1040 continues to perform quantization/VLC step 1062 along with steps 1064 and 1066 repeatedly for the current group of slices until the actual output bits are within the required range {RG}.
Step 1070 checks if the last group of slices has been processed and the frame is completely encoded. If the last group of slices in the frame has been processed then stuff bits are added to the encoded frame in step 1074 to bring the number of frame bits to RT.
If the frame is not completely processed in step 1070, then the next lower priority group of slices is selected in step 1072 for processing and steps 1062, 1064, 1066 and 1068 are repeated as required until all of the N groups of slices are processed.
After step 1074 the current frame is completely encoded, and the bit stream is pushed to the video output buffer in step 1080, after which the rate control method repeats at step 1044 with the next frame and continues until the video sequence of frames is completed or stopped.
In relation to second rate control method 1040, rate control function 901 of
The deviation of each MB, devMB, is used as the complexity measure in step 1010 of method 1000 and step 1050 of method 1040. MBs are divided into M groups based on the histogram of deviation of MBs in the frame. The group complexity measure in step 1057 for prioritizing the groups of slices in method 1040 may use the sum of devMB for all the MBs in each slice or it may be computed as a sum of the DCT coefficients from step 1056.
Assuming I(x, y) is the value of the luma component of the pixel at (x, y), for one P×P macroblock, which includes four (P/2)×(P/2) blocks, the deviation of this macroblock is calculated according to the following equations:
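The equations referenced above are not reproduced in this excerpt. As an illustration, a common deviation measure consistent with the description, the mean absolute deviation of luma values from each (P/2)×(P/2) block mean, summed over the four blocks, is sketched below; the exact formula in the patent may differ.

```python
# Assumed deviation measure for a P-by-P macroblock; illustrative only.
def block_mean(I, x0, y0, n):
    """Mean luma over an n-by-n block with top-left corner (x0, y0)."""
    return sum(I[y][x] for y in range(y0, y0 + n)
                       for x in range(x0, x0 + n)) / (n * n)

def dev_mb(I, x0, y0, p=16):
    """devMB: for each of the four (P/2)x(P/2) blocks, take the mean
    absolute deviation of luma from the block mean, then sum them."""
    h = p // 2
    total = 0.0
    for by in (y0, y0 + h):
        for bx in (x0, x0 + h):
            m = block_mean(I, bx, by, h)
            total += sum(abs(I[y][x] - m) for y in range(by, by + h)
                                          for x in range(bx, bx + h)) / (h * h)
    return total
```

A flat macroblock yields devMB = 0, while high-detail macroblocks yield large devMB, matching the measure's use for complexity categorization.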
In the embodiments of the present invention, FIFO frame buffers in memory are used to accept incoming frames from a video source. The encoder unloads the FIFO as the frames are encoded leaving empty frames available to accept incoming frames. A repeated encoding loop for quantization and VLC is prescribed within the rate control methods 1000 and 1040. See the steps 1021, 1018, 1022, 1024 and 1026 of method 1000 and the steps 1059, 1062, 1064, 1066 and 1068 of method 1040. The rate control methods with repeated encodings optimize output bitstreams to have minimal stuffing bits for better quality and guarantee a fixed number of output bits. However, the number of encoder loops should be limited; otherwise the input frame buffer queue fills and frames may be dropped, especially in the case where the incoming video source is real-time video capture.
Encoder process 1100 begins by unloading the next frame into encoder memory from a frame buffer queue in step 1105. Once loaded, a target range of bits {RT} is computed for the frame in step 1103 and the frame buffer queue is checked in step 1107 to get the number of empty input frames available for incoming video. Given the number of empty input frames and the current frame rate, the maximum number of loops allowed for repeated encoding, MAX_LOOP, is estimated. In step 1110, MAX_LOOP is compared to a pre-defined first threshold 1101. If MAX_LOOP is greater than or equal to first threshold 1101 then a low stuffing bit flag is enabled in step 1112; otherwise, if MAX_LOOP is less than first threshold 1101, the low stuffing bit flag is disabled in step 1113. Encoder process 1100 continues with the rate control step 1115 and DCT in step 1116 followed by quantization and VLC in step 1117.
At step 1125 the low stuffing bit flag is checked and the number of loops L compared to MAX_LOOP. The number of loops is the number of times the quantization/VLC process in step 1117 has been repeated. L is equal to 1 (one) after the initial execution of the quantization/VLC process in step 1117. If the low stuffing bit flag is enabled and (MAX_LOOP - L) is less than a predefined second threshold 1102, then step 1127 is executed, otherwise step 1129 is executed.
Step 1127 checks the number of stuffing bits: if the number of stuffing bits is less than a pre-defined third threshold 1103 then the low stuffing bit flag is disabled in step 1128, otherwise step 1120 is performed. The number of stuffing bits is the difference between the actual bits generated for the encoded frame and a target number of bits.
Step 1129 checks if the output bits are within a frame target bit range. If the output bits are not in the frame target range then the rate control step 1119 is performed. Rate control step 1119 is essentially the same as rate control step 1115 and executes with the assumption that low stuffing bit optimization is not required. When low stuffing bit optimization is not required, rate control steps 1115 and 1119 allow for more rapid and coarse adjustment of quantization parameters. If, in step 1129, the output bits are within the frame target bit range, then the frame is considered to be encoded and the encoder process moves to the next frame in step 1130.
Rate control step 1120 is essentially the same as rate control step 1115 and executes with the assumption that low stuffing bit optimization is required. When low stuffing bit optimization is required, rate control steps 1115 and 1120 allow for fine adjustment of quantization parameters.
After rate control steps 1119 and 1120 finish, the quantization/VLC process in step 1117 and the steps that follow are repeated and the number of loops L incremented.
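The loop-limiting logic of encoder process 1100 can be sketched under simplifying assumptions: quantize stands in for steps 1115 through 1117 and returns the frame's output bits for a given QP, adjust stands in for rate control steps 1119 (coarse) and 1120 (fine), and the behavior after the flag is disabled in step 1128 (falling through to the step-1129 range check) is an assumption, as the passage does not state it explicitly.

```python
def encode_frame_1100(quantize, adjust, max_loop, t1, t2, t3, rt_low, rt):
    """Sketch of encoder process 1100. t1/t2/t3 correspond to the first,
    second and third thresholds 1101/1102/1103; returns the final frame
    size after stuffing to exactly RT bits."""
    low_stuff = max_loop >= t1                 # steps 1110/1112/1113
    qp, L = 16, 1                              # initial QP is an assumption
    ro = quantize(qp)                          # steps 1115-1117, loop L = 1
    while L < max_loop:
        if low_stuff and (max_loop - L) < t2:  # step 1125: nearly out of loops
            if rt - ro < t3:                   # step 1127: few stuffing bits left
                low_stuff = False              # step 1128: drop fine optimization
            else:
                qp = adjust(qp, ro, rt, fine=True)   # step 1120: fine adjustment
                L += 1
                ro = quantize(qp)              # repeat step 1117
                continue
        if rt_low <= ro <= rt:                 # step 1129: in frame target range
            break                              # frame done (step 1130)
        qp = adjust(qp, ro, rt, fine=False)    # step 1119: coarse adjustment
        L += 1
        ro = quantize(qp)                      # repeat step 1117
    return ro + max(0, rt - ro)                # stuff to exactly RT bits
```

Because L never exceeds MAX_LOOP, the frame buffer queue is protected from overflow even when a frame fails to converge into the target range.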
The specifications and description described herein are not intended to limit the invention, but simply to show a set of embodiments in which the invention may be realized. Other embodiments may be conceived, for example, for current and future studio quality video formats, which may include 3-D image and video content, and for current and future consumer formats for in-home theater such as the MPEG-4 H.264 format.
This application claims priority to U.S. Provisional Application No. 61/070,213 filed Mar. 20, 2008.