The present invention relates to the field of digital video. In particular, but not by way of limitation, the present invention discloses techniques for efficiently scaling down full-motion video.
Video generation systems within computer systems generally use a large amount of memory and a large amount of memory bandwidth. At the very minimum, a video display adapter within a computer system requires a frame buffer that stores a digital representation of the desktop image currently being rendered on the video display screen. The central processing unit (CPU) or graphics processing unit (GPU) of the computer system must access the frame buffer to change the desktop image in response to user inputs and the execution of application programs. Simultaneously, the entire frame buffer is read by the video display adapter at rates of 60 times per second or higher to render the desktop image in the frame buffer on a video display screen. The combined accesses of the CPU (or GPU) updating the image to display and the video display adapter reading out the image in order to render a video output signal use a significant amount of memory bandwidth.
In addition to those minimum requirements, there are other video functions of a computer system that may consume processing cycles, memory capacity, and memory bandwidth. For example, three-dimensional (3D) graphics, full-motion video, and graphical overlays may all need to be handled by the CPU (or GPU), the video memory system, and the video display adapter.
Many computer systems now include special three-dimensional (3D) graphics rendering systems that read information from 3D models and render a two-dimensional (2D) representation in the frame buffer that will be read by the video display adapter for display on the video display system. The reading of the 3D models and rendering of a two-dimensional representation may consume a very large amount of memory bandwidth. Thus, computer systems that will do a significant amount of 3D rendering generally have separate specialized 3D rendering systems that use a separate 3D memory area. Some computer systems use ‘double-buffering’ wherein two frame buffers are used. In double-buffering systems, the CPU generates one image in a frame buffer that is not being displayed while another frame buffer is being displayed on the video display screen. When the CPU completes the new image, the system switches from a frame buffer currently being displayed to the frame buffer that was just completed. This technique eliminates the effect of ‘screen tearing’ wherein an image is changed while being displayed.
Furthermore, the video output systems of modern computer systems generally need to display full-motion video. Full-motion video systems decode and display full-motion video clips such as clips of television programming or film on the computer display screen for the user. (This document will use the term ‘full-motion video’ when referring to such television or film clips to distinguish such full-motion video from the reading of normal desktop graphics for the generation of a video signal to display on a video display monitor.) Full-motion video is generally represented in digital form as computer files containing encoded video or an encoded digital video stream received from an external source.
To display digitally encoded full-motion video, the computer system must first decode the full-motion video to obtain a series of video image frames. Then the computer system needs to merge the full-motion video with desktop image data stored within the computer system's main frame buffer. Due to all of the processing steps required to decode, process, and resize full-motion video for display on a computer desktop, the output of full-motion video generally consumes a significant amount of memory capacity and memory bandwidth. However, since the ability to display full-motion video is now a standard feature expected in all modern computer systems, computer system designers must design their computer systems to handle the display of full-motion video.
In a full personal computer system, there is ample CPU processing power, memory capacity, and memory bandwidth to perform all of the processing steps needed to render a complex composite desktop image that includes a window displaying full-motion video. For example, the CPU may decode a full-motion video stream to create video frames in a memory system, the CPU may render the normal desktop display screen in a frame buffer, and a video display adapter may then read the decoded full-motion video frames and the main frame buffer to create a composite image. Specifically, the video display adapter combines the decoded full-motion video frames with the desktop display image from the main frame buffer to generate a composite video output signal.
In small computer systems wherein the computing resources are much more limited, the task of generating a video output display with advanced features such as handling full-motion video can be much more difficult. For example, mobile telephones, handheld computer systems, netbooks, tablet computer systems, and terminal systems will generally have much less CPU processing power, memory capacity, and video display adapter resources than a typical personal computer system. Thus, in a small computer system the task of combining a full-motion video stream with a desktop display to render a composite video display can be very difficult. It would therefore be very desirable to develop efficient methods of handling complex display tasks such that complex displays can be output by the display systems in small computer systems.
In the drawings, which are not necessarily drawn to scale, like numerals describe substantially similar components throughout the several views. Like numerals having different letter suffixes represent different instances of substantially similar components. The drawings illustrate generally, by way of example, but not by way of limitation, various embodiments discussed in the present document.
The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with example embodiments. These embodiments, which are also referred to herein as “examples,” are described in enough detail to enable those skilled in the art to practice the invention. It will be apparent to one skilled in the art that specific details in the example embodiments are not required in order to practice the present invention. For example, although the example embodiments are mainly disclosed with reference to the True-Color and High-Color video modes, the teachings of the present disclosure can be used with other video modes. Furthermore, the present disclosure describes certain embodiments for use within thin-client terminal systems, but the disclosed technology can be used in many other applications. The example embodiments may be combined, other embodiments may be utilized, or structural, logical and electrical changes may be made without departing from the scope of what is claimed. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
In this document, the terms “a” or “an” are used, as is common in patent documents, to include one or more than one. In this document, the term “or” is used to refer to a nonexclusive or, such that “A or B” includes “A but not B,” “B but not A,” and “A and B,” unless otherwise indicated. Furthermore, all publications, patents, and patent documents referred to in this document are incorporated by reference herein in their entirety, as though individually incorporated by reference. In the event of inconsistent usages between this document and those documents so incorporated by reference, the usage in the incorporated reference(s) should be considered supplementary to that of this document; for irreconcilable inconsistencies, the usage in this document controls.
Computer Systems
The present disclosure concerns computer systems.
The example computer system 100 includes a processor 102 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), or both) and a main memory 104 that communicate with each other via a bus 108. The computer system 100 may further include a video display adapter 110 that drives a video display system 115 such as a Liquid Crystal Display (LCD) or a Cathode Ray Tube (CRT). The computer system 100 also includes one or more input devices 112. The input devices may include an alpha-numeric input device (e.g., a keyboard), a cursor control device (e.g., a mouse or trackball), a touch screen, or any other user input device. Similarly, the computer system may include one or more output devices 118 (e.g., a speaker, LEDs, or a vibration device). A storage unit 116 functions as a non-volatile memory system. The storage unit 116 may be a disk drive unit, flash memory, read-only memory, battery-backed RAM, or any other system of providing non-volatile data storage.
The computer system 100 may also have a network interface device 120. The network interface device may couple to a digital network in a wired or wireless manner. Wireless networks may include WiFi, WiMax, cellular phone networks, Bluetooth, etc. Wired networks may be implemented with Ethernet, a serial bus, a token ring network, or any other suitable wired digital network.
In many computer systems, a section of the main memory 104 is used to store display data 111 that will be accessed by the video display adapter 110 to generate a video signal. A section of memory that contains a digital representation of what the video display adapter 110 is currently outputting on the video display system 115 is generally referred to as a frame buffer. Some video display adapters store display data in a dedicated frame buffer located separate from the main memory. (For example, a frame buffer may reside within the video display adapter 110.) However, the present disclosure will primarily focus on computer systems that store a frame buffer within a shared memory system.
The storage unit 116 generally includes some type of machine-readable medium 122 on which is stored one or more sets of computer instructions and data structures (e.g., instructions 124 also known as ‘software’) embodying or utilized by any one or more of the methodologies or functions described herein. The instructions 124 may also reside, completely or at least partially, within the main memory 104 and/or within the processor 102 during execution thereof by the computer system 100. Thus, the main memory 104 and the processor 102 may also be considered machine-readable media.
The instructions 124 may further be transmitted or received over a computer network 126 via the network interface device 120. Such transmissions may occur utilizing any one of a number of data transfer protocols such as the well-known File Transfer Protocol (FTP), the Hypertext Transfer Protocol (HTTP), or any other data transfer protocol.
Some computer systems may operate in a terminal mode wherein the system receives a full representation of display data 111 to be stored in the frame buffer over the network interface device 120. Such computer systems will decode the received display data and fill the frame buffer with the decoded display data 111. The video display adapter 110 will then render the received data on the video display system 115.
In addition, a computer system may receive a stream of encoded full-motion video for display or open a file with encoded full-motion video data. The computer system must decode the full-motion video data such that the full-motion video can be displayed. The video display adapter 110 must then merge that full-motion video data with display data 111 in the frame buffer to generate a final display signal for the video display system 115.
In
For the purposes of this specification, the term “module” includes an identifiable portion of code, computational or executable instructions, data, or computational object to achieve a particular function, operation, processing, or procedure. A module need not be implemented in software; a module may be implemented in software, hardware/circuitry, or a combination of software and hardware.
Computer Display Systems
The video display data for a computer system is generally made up of a matrix of individual pixels (picture elements). Each pixel is an individual “dot” on the video display system. The resolution of a video display system is generally defined as a two-dimensional rectangular array defined by a number of columns and a number of rows. The rectangular array of pixels is displayed on a video display device. For example, a video display monitor with a resolution of 800 by 600 will display a total of 480,000 pixels. Most modern computer systems have video display adapters that can render video in several different display resolutions such that the computer system can take advantage of the specific resolution capabilities of the particular video display monitor coupled to the computer system.
Most modern computer systems have color display systems. In a computer system with a color display system, each individual pixel can be any different color that can be defined by the pixel data and generated by the display system. Each individual pixel is represented in the frame buffer of the memory system with a digital value that specifies the pixel's color. The number of different colors that may be represented in a frame buffer is limited by the number of bits assigned to each pixel. The number of bits per pixel is often referred to as the color-depth.
A frame buffer with a single bit per pixel is only capable of representing two different colors (generally black and white). A grayscale display requires only a small number of bits per pixel to represent various shades of gray.
With colored display systems, each pixel is generally defined using a number of bits for defining red, green, and blue (RGB) color components that are combined to generate a final output color. In a “High Color” display system, each pixel is defined with 16 bits of color data. The 16 bits of color data generally represent 5 bits of red data, 6 bits of green data, and 5 bits of blue data. With a “True Color” display system, each pixel is defined with 24 bits of data. Specifically, the 24 bits of data represent 8 bits of red data, 8 bits of green data, and 8 bits of blue data. Thus, True Color mode is synonymous with “24-bit” mode, and High Color with “16-bit” mode. Due to reduced memory prices and the ability of 24-bit (True Color) mode to convincingly display any image without much noticeable degradation, most computer systems now use 24-bit “True Color” display systems. Some video systems may also use more than 24 bits per pixel wherein the extra bits are used to denote levels of transparency such that multiple depths of pixels may be combined.
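As a simple illustration of the two pixel formats just described (a minimal C sketch; the function names are hypothetical and not part of any particular display system), packing 8-bit red, green, and blue samples might look like the following:

#include <stdint.h>

/* Pack 8-bit R, G, B samples into a 16-bit "High Color" (5-6-5) pixel. */
static inline uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Pack the same samples into a 24-bit "True Color" pixel stored as 0x00RRGGBB. */
static inline uint32_t pack_rgb888(uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | (uint32_t)b;
}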
To display an image on a video display system, the video display adapter of a computer system fetches pixel data from the frame buffer, interprets the color data, and then generates an appropriate video output signal that is sent to a display device such as a liquid crystal display (LCD) panel. Only a single frame buffer is required to render a video display. However, more than one frame buffer may be present in a computer system memory depending on the application.
In a personal computer system, the video adapter system may have a separate video frame buffer that is in a dedicated video memory system. The video memory system may be designed specifically for the task of handling video display data. Thus, in most personal computers the rendering of a video display can be handled easily. However, in small computer systems such as mobile telephones, handheld computer systems, netbooks, thin-client terminal systems, and other small computer systems the computing resources tend to be much more limited. The computing resources may be limited due to cost, limited battery power, heat dissipation, and other reasons. Thus, the task of generating a high-quality video display in a computer system with limited computing resources can be much more difficult. For example, a small computer system will generally have less CPU power, less memory capacity, less memory bandwidth, no dedicated GPU, and less video display adapter resources than are present in a typical personal computer system.
In a small computer system, there is often no separate memory system for the video display system. Thus, the video generation system must share the same memory resources as the rest of the small computer system. Since the video generation system must continually read the entire frame buffer from the shared memory system at a high rate (generally more than 60 times per second) and all the other memory users share the same memory system, the memory bandwidth (the amount of data that can be read out of the memory system per unit time) can become a very scarce resource that limits the functionality of the small computer system. It is therefore very important to devise methods of reducing the memory bandwidth requirements of the various memory users within the small computer system. Since the video display system may consume the largest amount of memory bandwidth (by constantly reading out data to refresh the video display monitor), it is natural to focus on the video display system when attempting to optimize memory usage.
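For example, using illustrative numbers only, refreshing a 1280 by 1024 True Color display 60 times per second requires reading roughly 1280*1024*3 bytes*60 ≈ 236 megabytes per second from the shared memory system before any other memory traffic is considered.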
Thin-Client Terminal System Overview
As set forth in the preceding sections, many different types of small computer systems can benefit from the methods disclosed in this document that reduce the memory bandwidth requirements in a small computer system. For example, any small computer system that renders full-motion video, such as a mobile telephone, netbook, slate computer, or other small system, may use the teachings of this document. However, the disclosed technology will be described with reference to an implementation within a small computer terminal system known as a thin-client terminal system.
A thin-client terminal system is an inexpensive dedicated computer system that is designed to receive user input, transmit that input to a remote computer system, and receive output information from that remote computer system to present to the user. For example, a thin-client terminal system may transmit mouse movements and alpha-numeric keystrokes received from a user to a remote server system. Similarly, the thin-client system may receive encoded video display output data from the remote server system and display that video display output data on a local video display system. In general, a thin-client terminal system does not execute user application programs on its own processor. Instead, user applications execute on the remote server system, and the thin-client terminal system displays the output data locally.
Modern thin-client terminal systems strive to provide all of the standard user interface features that personal computer users have come to expect from a computer system. For example, modern thin-client terminal systems include the high-resolution graphics capabilities, audio output, and cursor control (mouse, trackpad, trackball, etc.) input that personal computer users have become accustomed to using. To implement all of these user interface features, a modern thin-client terminal system generally includes a small dedicated computer system that implements all of the tasks associated with displaying video output and accepting user input. For example, the thin-client terminal system receives encoded display information, decodes the encoded display information, places the decoded display information in a frame buffer, and then renders a video display based on the information in the frame buffer. Similarly, the thin-client terminal system receives input from the local user, encodes the user input, and then transmits the encoded user input to the remote server system.
In the embodiment of
The goal of thin-client terminal system 240 is to provide most or all of the standard input and output features of a personal computer system to the user of the thin-client terminal system 240. However, this goal should be achieved at the lowest possible cost since, if a thin-client terminal system 240 is too expensive, a personal computer system could simply be purchased instead. Costs can be kept low because the thin-client terminal system 240 does not need the full computing resources or software of a personal computer system; those features are provided by the thin-client server system 220 that interacts with the thin-client terminal system 240.
Referring back to
The thin-client terminal system 240 receives the screen buffer changes and applies the changes to a local frame buffer. Specifically, the graphics update decoder 261 decodes graphical changes made to the associated thin-client screen buffer 215 in the server 220 and applies those same changes to the local screen buffer 260 thus making screen buffer 260 an identical copy of the bit-mapped display information in thin-client screen buffer 215. Video adapter 265 reads the video display information out of screen buffer 260 and generates a video display signal to drive display system 267.
From an input perspective, thin-client terminal system 240 allows a terminal system user to enter both alpha-numeric (keyboard) input and cursor control device (mouse) input that will be transmitted to the thin-client server system 220. The alpha-numeric input is provided by a keyboard 283 coupled to a keyboard connector 282 that supplies signals to a keyboard control system 281. The thin-client control system 250 encodes keyboard input from the keyboard control system 281 and sends that keyboard input as input 225 to the thin-client server system 220. Similarly, the thin-client control system 250 encodes cursor control device input from cursor control system 284 and sends that cursor control input as input 225 to the thin-client server system 220. The cursor control input is received through a mouse connector 285 from a computer mouse 285 or any other suitable cursor control device such as a trackball, trackpad, etc. The keyboard connector 282 and mouse connector 285 may be implemented with a PS/2 type of interface, a USB interface, or any other suitable interface.
The thin-client terminal system 240 may include other input, output, or combined input/output systems in order to provide additional functionality to the user of the thin-client terminal system 240. For example, the thin-client terminal system 240 illustrated in
Thin-client server system 220 is equipped with multi-tasking software for interacting with multiple thin-client terminal systems 240 wherein each thin-client terminal system executes within its own terminal “session”. As illustrated in
The bandwidth required to transmit an entire high-resolution video frame buffer from a server to a terminal at full video display refresh speeds is prohibitively large. Thus video compression systems are used to greatly reduce the amount of information needed to recreate a video display on a terminal system at a remote location. In an environment that uses a shared communication channel to transport the video display information (such as the computer network 230 in the thin-client environment of
When the application programs running on the thin-client server system 220 for the thin-client terminal systems 240 are typical office software applications (such as word processors, databases, spreadsheets, etc.) then there are many simple techniques that can be used to significantly decrease the amount of display information that must be delivered over the computer network 230 to the thin-client terminal systems 240 while maintaining a high quality user experience for each terminal system user. For example, the thin-client server system 220 may only send display information across the computer network 230 to a thin-client terminal system 240 when the display information in the thin-client screen buffer 215 for that specific thin-client terminal system 240 actually changes. In this manner, when the display for a thin-client terminal system is static (no changes are being made to the thin-client screen buffer 215 in the thin-client server system 220), then no display information needs to be transmitted from the thin-client server system 220 to that thin-client terminal system 240. Small changes (such as a few words being added to a document in a word processor or the pointer being moved around the screen) will require only small updates to be transmitted.
As long as the software applications run by the users of thin-client terminal systems 240 do not change the display screen information very frequently, then the thin-client system illustrated in
Referring to
To create a more efficient system for handling full-motion video in a thin-client environment, a related patent application titled “System And Method For Low Bandwidth Display Information Transport” disclosed a system wherein areas of full-motion video information to be displayed on a thin-client are transmitted to the thin-client system in an encoding format specifically designed for encoding full-motion video. (That related U.S. patent application Ser. No. 12/395,152 filed Feb. 27, 2009 is hereby incorporated by reference in its entirety.) A high-level block diagram of this more efficient system is illustrated in
Referring to
The video transmission system in the thin-client server computer system 220 of
The virtual graphics card 331 acts as a control system for creating video displays for each of the thin-client terminal systems 240. In one embodiment, an instance of a virtual graphics card 331 is created for each thin-client terminal system 240 that is supported by the thin-client server system 220. The responsibility of the virtual graphics card 331 is to output either bit-mapped graphics to be placed into the appropriate thin-client screen buffer 215 for a thin-client terminal system 240 or to output an encoded full-motion video stream that is supported by the full-motion video decoder 262 within the thin-client terminal system 240.
The full-motion video decoders 332 and full-motion video transcoders 333 within the thin-client server system 220 may be used to support the virtual graphics card 331 in handling full-motion video streams. Specifically, the full-motion video decoders 332 and full-motion video transcoders 333 help the virtual graphics card 331 handle encoded full-motion video streams that are not natively supported by the digital video decoder 262 in the thin-client terminal system. The full-motion video decoders 332 are used to decode full-motion video streams and place the video data in the thin-client screen buffer 215 (in the same manner as the system of
The full-motion video transcoders 333 may be implemented as the combination of a digital full-motion video decoder for decoding a first digital video stream into individual decoded video frames, a frame buffer memory space for storing decoded video frames, and a digital full-motion video encoder for re-encoding the decoded video frames into a second digital full-motion video format supported by the video decoder 262 in the target thin-client terminal system 240. This enables the transcoders 333 to use existing full-motion video decoders on the personal computer system. Furthermore, the transcoders 333 could share the same full-motion video decoding software used to implement video decoders 332. Sharing code would reduce licensing fees.
The final output of the video system in the thin-client server system 220 of
In the thin-client terminal system 240, the thin-client control system 250 will distribute the received output information (such as audio information, frame buffer graphics, and full-motion video streams) to the appropriate subsystem in the thin-client terminal system 240. Thus, graphical frame buffer update messages will be passed to the graphics frame buffer update decoder 261, and the streaming full-motion video information will be passed to the full-motion video (FMV) decoder 262. The graphics frame buffer update decoder 261 will decode the graphics update and then apply the graphics update to the thin-client terminal's screen frame buffer 260 appropriately. The full-motion video decoder 262 will decode the incoming digital full-motion video stream and write the decoded video frames into a full-motion video buffer 263.
In the embodiment of
Combining Full-Motion Video with Frame Buffer Graphics
The task of combining a typical display frame buffer (such as screen frame buffer 260) with full-motion video information (such as full-motion video buffer 263) may be performed in many different ways. One common method is to place a ‘key color’ in sections of the desktop display frame buffer where the full-motion video is to be displayed on the desktop display. The video output system then reads the desktop display frame buffer and replaces the key color areas of the desktop display frame buffer with full-motion video.
Referring to
In addition to the frame buffer display information, the system also has a full-motion video decoder 562 that decodes full-motion video into a full-motion video buffer 563. In this particular embodiment, the decoded video consists of YUV encoded video frames. A video output system 565 reads both the data in the frame buffer 560 and the YUV video frame data 569 in the FMV buffer 563. The video output system 565 then replaces the key color of the full-motion video window area 579 of the frame buffer with pixel data generated from the YUV video frame data in the FMV buffer 563 to generate a final video output signal.
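As a rough sketch of the key-color substitution just described (an illustration rather than the circuitry of video output system 565; the key color value, pixel layouts, and the yuv_to_rgb helper are assumptions), a per-pixel merge of one scan line might look like the following in C:

#include <stdint.h>

#define KEY_COLOR 0x00FF00FFu   /* assumed RGB value reserved as the key color */

/* Assumed helper that converts one YUV sample to a packed RGB pixel. */
extern uint32_t yuv_to_rgb(uint8_t y, uint8_t cb, uint8_t cr);

/* Produce one output scan line by replacing key-color desktop pixels with
 * the corresponding (already scaled) full-motion video pixels.            */
void merge_scanline(const uint32_t *desktop, const uint8_t *fmv_y,
                    const uint8_t *fmv_cb, const uint8_t *fmv_cr,
                    uint32_t *out, int width)
{
    for (int x = 0; x < width; x++) {
        if (desktop[x] == KEY_COLOR)
            out[x] = yuv_to_rgb(fmv_y[x], fmv_cb[x / 2], fmv_cr[x / 2]);
        else
            out[x] = desktop[x];
    }
}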
The raw full-motion video information output by a full-motion video decoder 562 generally cannot be used to directly generate a video output signal. The raw decoded full-motion video information is not within a format that can easily be merged with the desktop display information in the frame buffer 560.
A first reason that the decoded full-motion video information cannot be used directly is that the native resolution (horizontal pixel size by vertical pixel size) of the raw decoded full-motion video information 563 will probably not match the size of the full-motion video window 579 that the user has created to display the full-motion video. Thus, the full-motion video information may need to be rescaled from an original native resolution to a target resolution that will fit properly within the full-motion video window 579.
A second reason that the raw decoded full-motion video information 563 cannot be used directly is that full-motion video information is generally represented in a compressed YUV color space format. For example, the 4:2:0 YUV color space format is commonly used in many digital full-motion video encoding systems. As set forth earlier, the frame buffer in a typical computer system uses a red, green, and blue (RGB) pixel format. Thus, the raw decoded full-motion video information must be processed with a color conversion function to change the YUV encoded color pixel data into RGB encoded color pixel data.
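One common set of conversion equations is the full-range BT.601 mapping sketched below; the exact coefficients used by any particular decoder or display pipeline may differ, so this is only an illustrative definition of the yuv_to_rgb helper assumed in the earlier sketch.

#include <stdint.h>

static inline uint8_t clamp8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Convert one full-range BT.601 (Y, Cb, Cr) sample to a packed 0x00RRGGBB
 * True Color pixel using 16.16 fixed-point arithmetic.                    */
uint32_t yuv_to_rgb(uint8_t y, uint8_t cb, uint8_t cr)
{
    int d = cb - 128;
    int e = cr - 128;

    uint8_t r = clamp8(y + ((             91881 * e) >> 16)); /* + 1.402*Cr            */
    uint8_t g = clamp8(y - (( 22554 * d + 46802 * e) >> 16)); /* - 0.344*Cb - 0.714*Cr */
    uint8_t b = clamp8(y + ((116130 * d            ) >> 16)); /* + 1.772*Cb            */

    return ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}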
All of this processing of full-motion video information can significantly tax the resources of a small computer system.
Initially, incoming full-motion video (FMV) data 601 is received by a full-motion video decoder 610. The full-motion video decoder 610 decodes the encoded video stream and stores (along line 611) the raw decoded full-motion video information 615 into a memory system 695. This raw decoded full-motion video information 615 generally consists of video image frames stored in some native resolution using a YUV color space encoding format. As set forth above, this raw decoded full-motion video information 615 cannot be displayed using a typical RGB computer display system and thus needs to be processed.
In a computer environment that allows multiple application windows to be displayed simultaneously, the full-motion video information will need to be scaled to fit within the window that the user has created for the full-motion video application. Thus, a video scaling system 620 will read (along line 621) the decoded YUV full-motion video information 615 from the shared memory system 695 at the video source frame rate. If a 4:2:0 YUV encoding system is used, the bandwidth required for this step is Hv*Vv*f*1.5 bytes/sec where Hv is the native horizontal resolution, Vv is the native vertical resolution, and f is the frame rate of the source full-motion video data. (The value of 1.5 bytes represents the number of bytes per pixel in a 4:2:0 YUV encoding.)
The video scaling system 620 then adjusts the resolution of the full-motion video to fit within the boundaries of the full-motion video window created by the user. An inefficient scaling system might perform the scaling in two stages that each require reading and writing from the memory system 695. A first stage would read the full-motion video data and then write back horizontally scaled full-motion video data. A second stage would read the horizontally scaled full-motion video data and then write back full-motion video data 625 that is both horizontally and vertically scaled. This document will assume a video scaling system 620 that scales the video in both dimensions with a single read 621 and a single write 622 of the full-motion video frame.
After completing the scaling, the video scaling system 620 will write (along line 622) the scaled YUV full-motion video information 625 back into the shared memory system 695 at the same (video source) frame rate. The memory bandwidth required for this write-back step is Hw*Vw*f*1.5 bytes/sec where Hw is the horizontal resolution and Vw is the vertical resolution of the full-motion video window, and f is the full-motion video frame rate. (Again, the value of 1.5 bytes represents the number of bytes per pixel in a 4:2:0 YUV encoding.)
To merge the full-motion video with the desktop display graphics and display it with the computer system's RGB-based display system, a color conversion system must convert the full-motion video from its non-RGB format (4:2:0 YUV in this example) into an RGB format. Thus, the color conversion system 630 will read (along line 631) the scaled YUV full-motion video information 625 from the shared memory system 695, convert the pixel colors to RGB format, and write (along line 632) the color converted full-motion video data 635 back into the shared memory system 695. In a True Color video system that uses 3 bytes per pixel, the memory bandwidth requirements for this color conversion are:
Read=Hw*Vw*f*1.5 bytes/sec
Write=Hw*Vw*f*3 bytes/sec
Total=Hw*Vw*f*4.5 bytes/sec
Finally, the scaled and RGB formatted full-motion video 635 must be read by the video output system 650 and merged with the desktop display image from the main frame buffer 660. To perform this merging, the video output system 650 reads both the RGB formatted full-motion video 635 (along line 651) and the desktop display image from the main frame buffer 660 (along line 652) from the memory system 695 at a refresh rate R required by the display monitor. (The refresh rate R will typically be larger than the source video frame rate f.) The video output system 650 may then use a key color system to multiplex together the two data streams and generate a final video output signal 670. For a computer display system with a horizontal resolution of Hd and a vertical resolution of Vd, the bandwidth requirements for this final processing stage are:
Main frame buffer data read=Hd*Vd*R*3 bytes/sec
FMV data read=Hw*Vw*R*3 bytes/sec
Total=(Hw*Vw*R*3 bytes/sec)+(Hd*Vd*R*3 bytes/sec)
In a worst-case scenario where the user has expanded the full-motion video window to fill the entire display screen (a full-motion video window resolution of Hd by Vd), the total bandwidth required will be 2*Hd*Vd*R*3 bytes/sec. Excluding the writing of the full-motion video data into the shared memory system, the total memory bandwidth requirement for the worst-case scenario (a full display screen sized full-motion video window) becomes:
Total Sum=Hv*Vv*f*1.5+Hd*Vd*f*1.5+Hd*Vd*f*4.5+Hd*Vd*R*6
Total Sum=Hv*Vv*f*1.5+Hd*Vd*6*(f+R)
Such a large amount of memory bandwidth usage will stress most memory systems. Within a small computer system with limited resources, such a large amount of memory bandwidth is unacceptable and must be reduced. Other types of systems may experience the same problem. For example, a system that supports multiple display systems from a single shared memory (such as a terminal multiplier) will also have difficulties with memory bandwidth. In a multiple user (or display) system where there are N users (or displays) sharing the same memory and having separate video paths, the total sum of memory bandwidth usage becomes: N*(Hv*Vv*f*1.5+Hd*Vd*6*(f+R)). It would likely be impractical to construct such a memory system.
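To put rough numbers on the worst-case expression above (the resolutions and rates below are purely illustrative assumptions), the following short C program evaluates the single-user total for a 720 by 480 source clip at 30 frames per second shown full screen on a 1280 by 1024 display refreshed at 60 Hz:

#include <stdio.h>

int main(void)
{
    /* Illustrative values only. */
    const double Hv = 720,  Vv = 480;   /* native full-motion video resolution */
    const double Hd = 1280, Vd = 1024;  /* display resolution                  */
    const double f  = 30;               /* source video frame rate (frames/s)  */
    const double R  = 60;               /* display refresh rate (Hz)           */

    /* Worst case from the text: the video window fills the entire display.   */
    double total = Hv * Vv * f * 1.5 + Hd * Vd * 6.0 * (f + R);

    printf("Worst-case memory bandwidth: %.1f MB/sec\n", total / 1e6);
    return 0;
}

With these assumed values the total comes to roughly 723 megabytes per second for a single user, which illustrates why such a load is described above as unacceptable for a small shared-memory system.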
Various different methods may be employed to reduce the memory bandwidth requirements for such a display system. One technique would employ a pipelined video processing system that performs multiple video processing steps with a single pipeline processing unit. Such a pipelined processing system would thus greatly reduce the amount of memory bandwidth required since the intermediate results would not be stored in the main memory system.
The scaling system 720 scales incoming full-motion video data and stores the scaled results in a memory buffer (not shown) before the color conversion stage 730. Note that the results stored in the memory buffer are generally not a full video image frame. The intermediate results may vary from a few pixels to a few rows of video data.
The color conversion stage 730 converts the pixel color space into the RGB used by the video output system and then stores intermediate results (fully processed video data) in a memory buffer (not shown) before a full-motion video and frame buffer merge stage 740. The full-motion video and frame buffer merge stage 740 then reads the fully processed full-motion video data and merges it with the desktop graphics information read from the main frame buffer 760 in the shared memory 795. The merged data is then used to drive a video signal output system 750.
The pipelined video processor 790 illustrated in
In the pipelined video processor 790 disclosed in
As set forth above, the pipelined video processor 790 of
FMV read at a rate of monitor refresh=Hv*Vv*R*1.5 bytes/sec
Frame buffer read from memory=Hd*Vd*R*3 bytes/sec
Grand Total per user=Hv*Vv*R*1.5+Hd*Vd*R*3=1.5R*(Hv*Vv+2*Hd*Vd)
In systems that handle multiple display systems, the amount of memory bandwidth required will become very large.
Grand Total for N displays=N*1.5R*(Hv*Vv+2*Hd*Vd)
Thus, the video memory system in terminal multiplier 781 of
The pipelined video system of
Other windows may be overlaid on top of a full-motion video window. In regions where another window is overlaid on top of a full-motion video window, the frame buffer will not have key color data such that the data from the frame buffer will be used and the full-motion data read from the full-motion video buffer will be discarded. This discarded full-motion video data also represents inefficient memory usage. Between the discarded key color data and discarded full-motion video data, two sets of display data are read for the full-motion video window but data from only one set will be used for each pixel. The other data is discarded.
The reason for the above inefficiency is that the key color data stored within the frame buffer must be read since that key color data is used to select whether data from the frame buffer or data from the full-motion video buffer will be displayed. This is illustrated conceptually in
To eliminate all of this redundant data reading, a technique called On-The-Fly (OTF) key color generation was invented. With On-The-Fly (OTF) key color generation, the video system is informed about the location of all the various windows displayed on a user's desktop display. The On-The-Fly (OTF) key color generation system then calculates the locations where pixels must be read from the frame buffer and where pixels must be read from the full-motion video buffer such that no redundant data reading is required.
On-The-Fly (OTF) key color generation may be implemented in several different manners. The patent application “SYSTEM AND METHOD FOR ON-THE-FLY KEY COLOR GENERATION” with Ser. No. ______ filed on ______ discloses several methods of implementing an On-The-Fly (OTF) key color generation system and is hereby incorporated by reference. In some implementations, the On-The-Fly (OTF) key color generation system maintains the coordinates of where a full-motion video window is located on the desktop display and tables that provide the coordinates of all the windows (if any) that are overlaid on top of the full-motion video window.
Depending on the implementation, the On-The-Fly (OTF) key color generation system 868 may or may not literally generate key color pixels. In some embodiments, the On-The-Fly (OTF) key color generation system 868 will simply control the reading of pixel information with a signal. In other embodiments, the On-The-Fly (OTF) key color generation system 868 synthetically generates actual key color pixels that may be provided to legacy display circuitry that operates using the synthetically generated key color pixels. In such embodiments, the On-The-Fly (OTF) key color generation system 868 may also generate dummy full-motion video pixels that are discarded by the legacy display circuitry.
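As a simplified illustration of the idea (not the implementation of the referenced application; the rectangle representation and names are assumptions), a generator might decide the source of each pixel directly from the window coordinates it has been given, so that no key color data ever needs to be read from the frame buffer:

#include <stdbool.h>

/* Assumed rectangle description of a window on the desktop display. */
typedef struct { int x0, y0, x1, y1; } rect_t;   /* covers [x0,x1) by [y0,y1) */

static bool inside(const rect_t *r, int x, int y)
{
    return x >= r->x0 && x < r->x1 && y >= r->y0 && y < r->y1;
}

/* Decide the pixel source for position (x, y):
 * return true  -> fetch from the full-motion video buffer,
 * return false -> fetch from the desktop frame buffer.
 * fmv is the full-motion video window; overlays are windows on top of it.   */
bool fetch_from_fmv(const rect_t *fmv, const rect_t *overlays, int n_overlays,
                    int x, int y)
{
    if (!inside(fmv, x, y))
        return false;                      /* outside the video window      */
    for (int i = 0; i < n_overlays; i++)
        if (inside(&overlays[i], x, y))
            return false;                  /* covered by an overlaid window */
    return true;                           /* visible full-motion video     */
}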
Referring to the conceptual diagram of
Note that in a system that uses the On-The-Fly (OTF) key color generation, having no full-motion video to display may appear to be the worst case situation for data read-out. Specifically, frame buffer data reads (of 3 bytes of RGB data per pixel) require more bandwidth than full-motion video data reads (of 1.5 bytes of YUV data per pixel). Thus, the maximum possible memory bandwidth required by the system for a single user will be Hd*Vd*R*3 bytes/sec when no full-motion video is displayed. (For an ‘N’ user system the maximum memory bandwidth required by the video display to read out display data is N*Hd*Vd*R*3 bytes/sec.)
When a user has a large full-motion video window open, the On-The-Fly (OTF) key color generation system 868 of
Full Frame buffer read from memory=Hd*Vd*R*3 bytes/sec
Savings by not reading FMV window Key color data=Hw*Vw*R*3 bytes/sec
FMV read at a rate of monitor refresh=Hv*Vv*R*1.5 bytes/sec
Grand Total=(Hd*Vd−Hw*Vw)*R*3 bytes/sec+Hv*Vv*R*1.5 bytes/sec
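Using the same illustrative assumptions as the earlier example, plus an assumed 960 by 540 full-motion video window, the short C program below compares the per-user read-out bandwidth of the pipelined system without key color elimination against the On-The-Fly (OTF) key color total given above:

#include <stdio.h>

int main(void)
{
    /* Illustrative values only. */
    const double Hv = 720,  Vv = 480;    /* native full-motion video resolution */
    const double Hw = 960,  Vw = 540;    /* full-motion video window size       */
    const double Hd = 1280, Vd = 1024;   /* display resolution                  */
    const double R  = 60;                /* display refresh rate (Hz)           */

    /* Pipelined read-out without key color elimination (earlier section).     */
    double pipelined = 1.5 * R * (Hv * Vv + 2.0 * Hd * Vd);

    /* Pipelined read-out with On-The-Fly key color generation.                */
    double otf = (Hd * Vd - Hw * Vw) * R * 3.0 + Hv * Vv * R * 1.5;

    printf("pipelined: %.1f MB/sec, with OTF key color: %.1f MB/sec\n",
           pipelined / 1e6, otf / 1e6);
    return 0;
}

With these assumed numbers the read-out traffic falls from roughly 267 megabytes per second to roughly 174 megabytes per second.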
Referring to
Problems with Scaled Down Full-Motion Video Windows
As set forth in the previous section, the read and write demands of a video display system that employs On-The-Fly (OTF) Key color generation will generally offset each other, with one requiring less memory bandwidth when the other requires more. However, there is one situation wherein this mutual offsetting does not work very well. Specifically, when a user scales a full-motion video window down to a very small size, the memory bandwidth savings from displaying full-motion video will be significantly reduced.
When a user requests the display of a full-motion video but then scales the window used to display the full-motion video down to a small size, the video display system must continue to process the full-motion video, but the savings achieved from displaying the full-motion video are reduced. For example, when the resolution of a window used to display full-motion video is smaller than the native resolution of the full-motion video, then significant amounts of information read out of the full-motion video buffer will be discarded since the full-motion video must be scaled down to fit within the small window created by the user for displaying the full-motion video.
As conceptually illustrated in
As described in the previous section, a user can effectively nullify the advantages of an On-The-Fly (OTF) Key color generation system that eliminates redundant display data reads. Specifically, if a user reduces the window used to display full-motion video down to a single pixel, the video display system will effectively be forced to decode and process full-motion video without achieving any memory bandwidth reductions that would come from not reading the frame buffer in areas where the full-motion video is displayed. This would essentially render the difficult work of creating an efficient On-The-Fly (OTF) Key color generation system moot. To prevent this from occurring, this document discloses a full-motion video pre-processing system that reduces full-motion video information upon entry when necessary. Thus, if a user significantly reduces the size of a desktop window used to display full-motion video, then the pre-processor will similarly reduce the amount of full-motion video information allowed to enter the system.
In the example of
In the embodiment disclosed in
The video pre-processor may be implemented in many different manners. In the embodiment of
A Full-Motion Video Pre-Processing Implementation with Motion-JPEG
There are many different digital video encoding systems that are used to digitally encode video data. This section will focus upon an implementation that uses the motion-JPEG (M-JPEG) digital video encoding system. However, the disclosed video pre-processing system may be implemented with any type of digital video encoding system.
When configured for the YUV 4:2:0 setting, the motion-JPEG (M-JPEG) digital video encoding system divides individual video image frames into multiple 16 by 16 pixel blocks known as Minimum Coded Units (MCUs), and each 16 by 16 pixel MCU consists of a total of six 8 by 8 element macro blocks (MBs). Four of the 8 by 8 element macro blocks are used to store luminance (Y) data in a one-byte-per-pixel mapping such that each pixel has its own luminance value. The other two 8 by 8 element macro blocks are used to store chrominance (color) data: a first 8 by 8 element macro block stores Cr data and a second 8 by 8 element macro block stores Cb data. Each 8 by 8 chrominance element (Cr or Cb) macro block is applied to a 16 by 16 pixel MCU in a manner wherein each byte of chrominance data is applied to a 2 by 2 luminance pixel patch.
Finally,
The data for each 16 by 16 pixel MCU is transmitted as four consecutive 8 by 8 pixel luminance macro blocks (MBY0, MBY1, MBY2, and MBY3) followed by the 8 by 8 Cb chrominance data macro block and 8 by 8 Cr chrominance data macro block as illustrated in
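For concreteness, the following is a minimal C sketch (not the patent's hardware) of how the four luminance macro blocks of one MCU, received in the MBY0 through MBY3 order described above, might be copied into raster-ordered rows of a line buffer; the buffer layout, stride, and function name are assumptions, and the chrominance macro blocks would be handled analogously at half resolution.

#include <stdint.h>

#define MCU_W 16              /* MCU width in pixels                */
#define MB_W   8              /* macro block width/height in pixels */

/* Copy the four 8x8 luminance macro blocks (MBY0..MBY3) of one MCU into a
 * raster-ordered luminance line buffer that is 'stride' pixels wide.
 * 'mcu_x' is the horizontal MCU index within the current row of MCUs.
 * MBY0/MBY1 are assumed to cover the top half of the MCU, MBY2/MBY3 the
 * bottom half.                                                             */
void mcu_luma_to_raster(const uint8_t mby[4][MB_W * MB_W],
                        uint8_t *lines, int stride, int mcu_x)
{
    for (int mb = 0; mb < 4; mb++) {
        int ox = mcu_x * MCU_W + (mb & 1) * MB_W;  /* left or right half  */
        int oy = (mb >> 1) * MB_W;                 /* top or bottom half  */
        for (int row = 0; row < MB_W; row++)
            for (int col = 0; col < MB_W; col++)
                lines[(oy + row) * stride + ox + col] = mby[mb][row * MB_W + col];
    }
}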
As set forth in
Similarly, the data organization depicted in
Proper data formatting is also very important since it allows special features for efficient memory access within memory controllers, memories, and processors to be used. In one embodiment the system uses a 32-bit internal bus structure to the memory controller that has a special 16 cycle burst access feature. Thus, the memory controller can quickly transfer 64 bytes (16 operations of 4 bytes each) in a single efficient burst. Because the Motion-JPEG data is organized in MCUs with sixteen-byte-wide luminance rows, effective use of a 16 cycle burst requires a minimum of four MCUs (4 MCUs*16 bytes wide=64 bytes) to be present in local memory before the transformation can take place. As a result, the minimum local memory required to hold four MCUs is 1 KB for the luminance (Y) data (4*16 bytes*16 rows) and 0.5 KB for Cr and Cb together (each=4*8 bytes*8 rows), or a total of 1.5 KB. In one embodiment, a ping-pong memory structure (with two memory buffers) is used in order to keep the processing pipeline moving smoothly, thus bringing the total internal memory requirement to 3 KB. A ping-pong memory structure can be implemented either with two memory buffers, with a dual-port memory buffer, or with a single memory buffer, depending upon the input/output rates.
With four MCUs in the local internal memory for the pre-processor, the pre-processor can write data to the shared memory efficiently.
The video system must compete with other users of the shared memory system for access to the shared memory system. As the memory controller goes through arbitration between different masters, it is impossible to guarantee a pipeline that is always full. To circumvent this issue, one embodiment implements a stalling mechanism that may be used to stall the incoming data (from the motion-JPEG decoder).
As set forth in the previous sections, the video pre-processor receives decoded full-motion video data from the Motion-JPEG decoder in a macro block format. To prepare the full-motion video data for output by the video output system, the video pre-processor scales down the full-motion video when necessary to fit within a smaller full-motion video window created by a user. The video pre-processor may also convert the full-motion video data into a raster scan format that is better suited for the video output system that will read the pre-processed video data. The video pre-processor performs the scaling down (only when necessary) and rasterization internally. The video pre-processor then outputs the scaled and rasterized full-motion video data to the shared memory system. An example of a possible rasterized data format is illustrated in
After decoding by video decoder 1462, pieces of decoded full-motion video are provided to a video pre-processor system 1442. In one embodiment, the video decoder 1462 provides decoded full-motion video information in decoded MCU-sized chunks to the video pre-processor 1442. The video pre-processor 1442 also receives information about the window that will be used to display the full-motion video. Specifically, a window information source 1407 provides full-motion video window resolution information 1410 to the video pre-processor system 1442 so the video pre-processor system 1442 can determine whether scaling of the full-motion video information is required and what output size is needed. The full-motion video window resolution information 1410 is provided to a horizontal coefficient calculator 1421 and a vertical coefficient calculator 1451 that calculate coefficient values that will be used in the down-scaling process (if down-scaling is necessary).
The chunks of decoded full-motion video information are first provided to a horizontal resize logic block 1420 that uses the coefficients received from the horizontal coefficient calculator 1421 to rescale the video information in the horizontal direction. If the resolution of the full-motion video window is larger than or equal to the native resolution of the full-motion video, then no rescaling needs to be performed. When rescaling is required, the horizontal resize logic block 1420 will rescale the video using the coefficients received from the horizontal coefficient calculator 1421. The rescaling may be performed in various different manners. In one embodiment designed for efficiency, the horizontal resize logic block 1420 will simply drop some pixels from the incoming full-motion image frame to make the frames smaller in the horizontal direction.
The horizontal resize logic block 1420 also changes the format of the data into a rasterized data format. The rasterized data format will simplify the later vertical resizing stage and the eventual read-out of the data by the video output system. The horizontal resize logic block 1420 outputs the horizontally rescaled and rasterized data into a temporary memory buffer 1430.
Since the chrominance (Cr and Cb) data is already subsampled 2 to 1 relative to the luminance (Y) data in 4:2:0 formatted video, the downsizing of chrominance data is performed in a slightly different manner. For example, when the full-motion video information is being downsized 2 to 1 (50% reduction), the data will be the same as when no downsizing occurs since the Cr and Cb data was already downsized relative to the luminance data and they would be upsampled later to 4:4:4 format before display. Thus
After horizontal resizing, a vertical resize logic block 1450 reads the horizontally rescaled and rasterized full-motion video data from memory buffer 1430. The vertical resize logic block 1450 uses the coefficients received from the vertical coefficient calculator 1451 to scale down the full-motion video in the vertical direction when necessary. Again, various different methods may be used to perform this resizing but in one embodiment, the vertical resize logic block 1450 will periodically drop rows of data to down-size the full-motion image frames in the vertical dimension.
After the vertical resizing, the vertical resize logic block 1450 writes the horizontally and vertically rescaled and rasterized full-motion video data 1469 into a full-motion video buffer 1463 in the shared memory system 1464.
A video output system will then read in the rescaled and rasterized full-motion video data 1469 in order to create a video output signal that will drive a video display system. To maximize the throughput to the shared memory system 1464, the output system should output large bursts of data to the shared memory system 1464. In one embodiment, the possible burst choices are 4, 8, 16, or single cycle burst increments. 16 cycle bursts are the most efficient because a larger amount of data is transferred for the same overhead cost.
The output row length (the number of columns) of a resized-down window may be any number since that is controlled by the user. However, in one implementation, this output row length is made to be a multiple of four to increase efficiency. In a system that always outputs a multiple of four, the system will send out 16 cycle bursts since smaller bursts are inefficient. For example, for an output row length of 88, one implementation will output two 16 cycle bursts of 64 bytes rather than 1 burst of 64 bytes, 1 burst of 16 bytes, and 2 bursts of 4 bytes. The extra data bits will be ignored. Thus, in one implementation it was assumed that the output luminance (Y) row or chrominance (Cr and Cb) row length is an integral number of 16 cycle bursts even though the actual valid data could be less. By mapping each image frame row to the RAM as a multiple of 64 bytes, the system also does not have to make adjustments for writing across a 1 KB memory boundary.
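A small C sketch of the padding rule just described, assuming the 32-bit bus and 16 cycle bursts mentioned in the text (the helper name is hypothetical):

/* Round a row length in bytes up to a whole number of 16 cycle bursts on a
 * 32-bit bus (16 transfers * 4 bytes = 64 bytes per burst).                 */
static unsigned int padded_row_bytes(unsigned int row_bytes)
{
    const unsigned int burst_bytes = 16u * 4u;   /* 64 bytes per burst */
    return ((row_bytes + burst_bytes - 1u) / burst_bytes) * burst_bytes;
}

/* Example: padded_row_bytes(88) == 128, i.e. two 16 cycle bursts with the
 * extra 40 bytes ignored by the reader, matching the example above.        */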
In one embodiment, the process of resizing an image down will end up performing some combination of down-sampling and up-sampling of the source data. For example, in an embodiment wherein YUV 4:2:0 image frame data is being resized down to >=½ vertical or >=½ horizontal of the original size, a combination of downsizing and upsizing may be used. The luminance (Y) data will get downsized to the new target size. The chrominance (Cr and Cb) data will not be changed in size significantly since it was already subsampled. The data is treated differently since there is one byte of luminance (Y) data for each pixel but only 1 byte of Cr and 1 byte of Cb chrominance data for every four pixels. When the image frame is scaled down to ½ vertical or ½ horizontal size, luminance (Y) data gets down-sampled while the chrominance (Cr and Cb) data passes through without change. (Since chrominance data is not changed during downsizing, this can be considered as “up-sampled”.) When the image frame is scaled down to <½ vertical or <½ horizontal size, then all three components (Y, Cr, and Cb) will be down-sampled. The following verilog code provides one set of equations that may be used to scale down full-motion image frames in the horizontal direction:
The preceding code calculates resize-down coefficients on a per-pixel basis for a horizontal resize down. In the disclosed embodiment, the output coefficient is a Boolean value ‘keep’ that determines if a particular pixel is kept or dropped. If keep=1 then the pixel is left intact; if keep=0 then the pixel is discarded. The value of the output coefficient keep is calculated using the step size as shown in the pseudo code. The step value is set as step=256*output columns/input columns for an 8-bit granularity step. For example, in a system where the horizontal scaling is downsizing the number of columns in half, then step=256*(½)=128. In one implementation 16 pixels are worked on in parallel, so the coefficients (coef_row16) are calculated in advance for groups of 16.
Although the preceding pseudo-code is for a horizontal rescaling, the same methods may be used to calculate the vertical down size coefficients. The vertical down size coefficients are calculated on a per row basis. For the vertical down sizing, the value of step is set with step=256*output rows/input rows.
To illustrate how the system operates, a simple example is hereby provided. If an image needs to be resized down in half (output rows/columns=½*input rows/columns), then every other row/column should be dropped. The step value is calculated with step=256*output/input=256*(½)=128. The reset value for calc_alpha=256. So, using the pseudocode, the system will calculate the keep values as follows:
For pixel 1: Calc_alpha_reg=256 so keep=1; Next Calc_alpha=256−256+128=128
For pixel 2: calc_alpha_reg=128 so keep=0; Next Calc_alpha=128+128=256
For pixel 3: calc_alpha_reg=256 so keep=1; Next Calc_alpha=256−256+128=128
For pixel 4: calc_alpha_reg=128 so keep=0; Next Calc_alpha=128+128=256
As illustrated in the preceding example, the pattern of dropping every other pixel or row continues, reducing the output to half of the original width or height. The system described above is one possible system for scaling down an image; however, there are many other techniques that may be used for scaling down a digital image.
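Since the Verilog listing itself is not reproduced here, the following C sketch renders the accumulator rule described above; the names calc_alpha, step, and keep follow the text, while everything else (the function signature and the 8-bit granularity handling) is an assumption.

#include <stdint.h>

/* Compute per-pixel (or per-row) keep/drop coefficients for resizing down.
 * step = 256 * output_count / input_count, as described in the text.
 * keep[i] == 1 means input element i is kept, 0 means it is dropped.       */
void calc_resize_coefficients(unsigned int input_count,
                              unsigned int output_count,
                              uint8_t *keep)
{
    unsigned int step = (256u * output_count) / input_count;
    unsigned int calc_alpha = 256u;              /* reset value from the text */

    for (unsigned int i = 0; i < input_count; i++) {
        if (calc_alpha >= 256u) {
            keep[i] = 1;                         /* keep this pixel or row    */
            calc_alpha = calc_alpha - 256u + step;
        } else {
            keep[i] = 0;                         /* drop this pixel or row    */
            calc_alpha = calc_alpha + step;
        }
    }
}

For a 2-to-1 reduction (step=128) this reproduces the keep pattern 1, 0, 1, 0 shown in the worked example above; the same routine can be applied per row with step=256*output rows/input rows for the vertical direction.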
There are many methods of specifically implementing the video pre-processing system. For example, the data path may be implemented in many different ways. The order of the horizontal resizing and vertical resizing stages may be switched. The basic goal is to scale down the full-motion video to a size that is no larger than the full-motion video window that will be used to display the full-motion video.
The internal memory systems used within the video pre-processing system can also be implemented in many different ways. As disclosed earlier, with a motion-JPEG based system, a memory buffer of 1.5 KB allows a 32-bit system that can perform 16 cycle bursts to output data to the shared memory system with efficient 16 cycle bursts. Furthermore, the use of a ping-pong memory buffer with back pressure allows a system to write into one memory buffer while the other memory buffer is being read by a later processing stage. This concept can further be extended to the use of memory buffers in a circular buffer configuration. Specifically, a writer will sequentially write into a set of ordered memory buffers in a circular round-robin pattern. Similarly, a reader will read out of the memory buffers in the same circular round-robin pattern but slightly behind the writer. In this manner, small temporary differences in the read and write speeds can be accommodated.
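A minimal C sketch of the circular round-robin buffering just described, assuming a fixed set of equally sized buffers and single writer and reader indices (an illustration only, not the disclosed hardware):

#include <stdint.h>

#define NUM_BUFS 4          /* assumed number of buffers in the ring */
#define BUF_SIZE 1536       /* 1.5 KB per buffer, as in the text     */

typedef struct {
    uint8_t  data[NUM_BUFS][BUF_SIZE];
    unsigned write_idx;     /* next buffer the writer will fill      */
    unsigned read_idx;      /* next buffer the reader will drain     */
    unsigned count;         /* buffers currently full                */
} ring_t;

/* Writer: claim the next buffer, or stall (return 0) if the ring is full. */
static uint8_t *ring_claim(ring_t *r)
{
    if (r->count == NUM_BUFS)
        return 0;                                 /* back pressure: stall */
    return r->data[r->write_idx];
}

static void ring_commit(ring_t *r)
{
    r->write_idx = (r->write_idx + 1) % NUM_BUFS;
    r->count++;
}

/* Reader: fetch the oldest full buffer, or 0 if nothing is ready yet. */
static const uint8_t *ring_peek(const ring_t *r)
{
    return r->count ? r->data[r->read_idx] : 0;
}

static void ring_release(ring_t *r)
{
    r->read_idx = (r->read_idx + 1) % NUM_BUFS;
    r->count--;
}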
A full-motion video decoder 1462 decodes the encoded full-motion video 1405 and provides the decoded full-motion video to video pre-processor 1442. Using information about the size of the target full-motion video window 1479 from window information source 1407, the video pre-processor 1442 downscales (if necessary) the full-motion video from a native resolution to a resolution that will fit within the target full-motion video window 1479. The video pre-processor 1442 also rasterizes the full-motion video information so that it is in a better form for use by a video output system. The video pre-processor 1442 writes the down-scaled and rasterized decoded full-motion video 1469 into a full-motion video buffer 1463 in the shared memory system 1464.
A pipelined video processor 1490 that incorporates an on-the-fly key color generation system then composites the down-scaled and rasterized decoded full-motion video 1469 and the main frame buffer 1460 to create a video output signal 1470. The pipelined video processor 1490 receives window information 1407 so that the pipelined video processor 1490 knows when full-motion video 1469 needs to be displayed and when normal frame buffer 1460 information needs to be displayed. In the example of
For the areas where full-motion video 1469 needs to be displayed, the pipelined video processor 1490 will read the needed full-motion video 1469 information into a video scaling system 1471 that will upscale the full-motion video 1469 information. Upscaling will occur when the resolution of the full-motion video window 1479 is larger than the native resolution of the full-motion video. The full-motion video then goes through a color space conversion in color convert stage 1473. Finally, the full-motion video is merged with the data from the main frame buffer 1460 and the video output signal 1470 is created.
Note that video display system of
The preceding technical disclosure is intended to be illustrative of the methods and systems, and not restrictive. For example, the above-described embodiments (or one or more aspects thereof) may be used in combination with each other. Other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the claims should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim are still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects.
The Abstract is provided to comply with 37 C.F.R. §1.72(b), which requires that it allow the reader to quickly ascertain the nature of the technical disclosure. The abstract is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. Also, in the above Detailed Description, various features may be grouped together to streamline the disclosure. This should not be interpreted as intending that an unclaimed disclosed feature is essential to any claim. Rather, inventive subject matter may lie in less than all features of a particular disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.