The invention relates generally to data processing and, more particularly, to processing data for visual display.
Almost all desktop systems employ a landscape orientation of their displays. This is characterized by a display that is wider than it is tall. Video monitors and televisions also utilize landscape orientations. However, handheld device orientations vary based on the desired form factors of the products themselves. Often, the device uses a portrait orientation instead, which is characterized by a display that is taller than it is wide.
Due to the prevalence of systems that employ landscape orientations, there is a corresponding prevalence of displays that are designed for landscape orientations. Eventually, as there is more demand for portrait oriented images, portrait oriented displays will become available. But portrait displays are currently more expensive than their landscape counterparts.
In the long run, it is in the best interest of product developers to migrate to a natively portrait display for use with portrait oriented images. This will provide the maximum power efficiency and highest performance for the display. However, the lack of availability and/or higher cost of natively portrait displays can outweigh the power and performance advantages. Moreover, even when natively portrait displays do become available, there will be devices which need to switch between landscape and portrait orientations.
The invention provides a hardware solution that rotates a landscape oriented image to a portrait orientation for display on a landscape display, and vice-versa.
Display systems typically include a section of memory that is dedicated to graphics. Data from this section of memory is repeatedly rastered out to the display as it is refreshed. Applications and the operating system draw their graphics into this region of memory so that they appear on the display. Normally, the operating system and applications assume that this memory is organized the same way as the corresponding memory on desktop systems. This orientation turns the one-dimensional memory (
Since memory was designed to be read sequentially for greatest efficiency, the normal method of refreshing the display also proceeds sequentially, in order to take advantage of this design. If the orientation of this memory matches the orientation of the display, which would be the case for the normal orientations outlined above, then both the software and hardware are working at the most efficient level possible, and there is no need for any rotation.
However, when the software and display must view the memory differently, there must be some rotation to provide each part with its desired orientation (see
In a second hardware approach, the display subsystem accesses memory as seen in the non-rotated view of
If the underlying graphics code can be modified or replaced, the rotation of the display can be accomplished via software. Many operations will suffer no appreciable performance degradation, because only the coordinates of the desired operation are rotated, and the operation proceeds in almost the same manner as its non-rotated counterpart. Nevertheless, there will be many operations that are impacted, because almost all external data (fonts, bitmaps, video, etc.) will need to be explicitly rotated. Unfortunately, in most systems, this level of access to underlying graphics code is either not possible or extremely impractical. For example, one conventional library of graphics functions performs over 128,000 different graphics operations, and replacing it for purposes of rotation would require several man-years of effort in development, debugging, and testing. Also, applications which run on top of such a library routinely make certain assumptions about the orientation of the memory with respect to the display, and unless every application can also be modified to add rotation support, they will not be compatible with this modified graphics code approach.
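The coordinate rotation described above can be illustrated with a minimal sketch. It assumes a 90-degree clockwise rotation and a logical (software-visible) surface whose height in pixels is `logical_height`; the function names and parameters are hypothetical and do not come from any particular graphics library.

```python
def rotate_point_cw(x, y, logical_height):
    """Map a logical pixel (x, y) to physical coordinates, assuming the
    image is rotated 90 degrees clockwise onto the display."""
    return logical_height - 1 - y, x


def rotate_rect_cw(x, y, w, h, logical_height):
    """Map a logical rectangle (origin x, y; size w, h) to the physical
    rectangle that covers the same pixels after the rotation."""
    # Width and height swap; the new left edge comes from the old bottom edge.
    return logical_height - y - h, x, h, w


# A fill of a 30x40 rectangle on a 240x320 logical surface becomes a
# fill of a 40x30 rectangle in physical coordinates.
physical_rect = rotate_rect_cw(10, 20, 30, 40, logical_height=320)
```

Only the rectangle's coordinates change, so an operation such as a solid fill proceeds as usual. Pixel data supplied from outside (fonts, bitmaps, video) would still need to be rotated explicitly.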
When the underlying graphics code cannot be modified or modifying it is impractical, the rotation of the display can often still be accomplished via software which operates outside of the baseline graphics code. In this scheme, an intermediate graphics buffer is allocated. This intermediate graphics buffer is oriented as needed by the operating system and applications. But the separate framebuffer that is actually displayed is oriented as necessary for the efficient feeding of data to the display. Then, once the software has completed a given graphics operation into the intermediate buffer (or at a specified interval) the data in this intermediate buffer (or better still, only the portion that was changed) is copied through a software rotation to the framebuffer. This approach is less efficient than the aforementioned graphics code modification, but it is more realistic in some cases where modifying the baseline operating system is impractical and where access to application source code is not possible.
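A minimal sketch of the intermediate buffer scheme follows. It assumes a 90-degree clockwise rotation, flat row-major buffers holding one value per pixel, and a caller-supplied dirty rectangle; all names are hypothetical.

```python
def copy_rotated_cw(src, src_w, src_h, dst, dirty):
    """Copy the dirty rectangle of the intermediate buffer `src`
    (src_w x src_h, row-major) into the framebuffer `dst`
    (src_h x src_w, row-major), rotating 90 degrees clockwise."""
    x0, y0, w, h = dirty
    dst_w = src_h  # the framebuffer is as wide as the source is tall
    for y in range(y0, y0 + h):
        for x in range(x0, x0 + w):
            dst_x = src_h - 1 - y
            dst_y = x
            dst[dst_y * dst_w + dst_x] = src[y * src_w + x]


# 2x3 intermediate buffer copied in full to a 3x2 framebuffer.
intermediate = [0, 1,
                2, 3,
                4, 5]
framebuffer = [0] * 6
copy_rotated_cw(intermediate, 2, 3, framebuffer, dirty=(0, 0, 2, 3))
```

Each destination write lands on a different row than the previous one, so the copy does not proceed sequentially through the framebuffer. This non-sequential access is one source of the overhead noted above.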
The intermediate buffer approach can also be used with some hardware assistance. The data is copied and rotated to the framebuffer via a BLTer (Block Transfer engine) in place of software. This removes the software overhead of the rotation operation, but still leaves significant overhead.
In exemplary embodiments of the invention, the framebuffer itself is oriented neither for display nor for (OS or application) software, but instead in an intermediate, tiled format that is conducive to efficient software and display accesses simultaneously. Two separate apertures can be provided through which the display and software respectively access the framebuffer. These apertures provide the memory translation necessary to support the rotation.
The framebuffer is broken into tiles that consist of one memory page each. Example memory pages are shown at 0–7 in
Some embodiments provide apertures through which software will access the tiled framebuffer. One aperture represents a non-rotated access, as shown generally in
The memory translation for a given aperture is accomplished via a four-part memory offset equation, examples of which are shown in
For 16-bit and 32-bit accesses, the least significant bits (one for 16-bit and two for 32-bit) are masked off before the equation (one of
For the equations of
Page size—defined by memory architecture (bytes)
Display depth—defined by application (bytes)
Display width—defined by display hardware (pixels*display depth)
Display height—defined by display hardware (lines)
Tile width—normally the width of a burst (bytes)
Tile height=page size/tile width
Horizontal strip=display width*tile height
Vertical strip=display height*tile width
Tile rows=display height/tile height
Tile columns=display width/tile width
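The derived quantities above can be worked through for one configuration. The following sketch assumes, purely for illustration, a 2048-byte memory page, a 32-byte burst, and a 1024×768 display at 2 bytes per pixel; these values and the function name are hypothetical.

```python
def derive_tile_parameters(page_size, display_depth, width_pixels,
                           display_height, tile_width):
    """Compute the derived parameters from the definitions in the list above.
    Sizes are in bytes except display_height and tile_height (lines)."""
    display_width = width_pixels * display_depth  # pixels * display depth
    tile_height = page_size // tile_width
    return {
        "display_width": display_width,
        "tile_height": tile_height,
        "horizontal_strip": display_width * tile_height,
        "vertical_strip": display_height * tile_width,
        "tile_rows": display_height // tile_height,
        "tile_columns": display_width // tile_width,
    }


params = derive_tile_parameters(page_size=2048, display_depth=2,
                                width_pixels=1024, display_height=768,
                                tile_width=32)
# Sanity check: the tiles exactly cover the framebuffer.
total_bytes = params["tile_rows"] * params["tile_columns"] * 2048
```

With these assumed values, each tile is 32 bytes wide by 64 lines tall, and the 12 tile rows by 64 tile columns account for the full 1024×768×2 bytes of the framebuffer.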
In order to implement the memory offset equations of
By performing the above simplifications, each part of each memory offset equation of
((aperture offset >> shift) ^ minuend) & mask
The shift operation (“>>”) is actually bi-directional, where a left shift is indicated by a negative value of the “shift” parameter. The “^” character represents an exclusive or (XOR) operation.
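One such group of operations can be modeled in software as a minimal sketch. The 24-bit working width is an assumption made here for illustration, and the function names are hypothetical.

```python
OFFSET_BITS = 24  # assumed width of the aperture offset
OFFSET_MASK = (1 << OFFSET_BITS) - 1


def shift_bidirectional(value, shift):
    """Right shift for shift >= 0, left shift for shift < 0.
    Bits shifted out are lost; bits shifted in are zero."""
    if shift >= 0:
        return (value & OFFSET_MASK) >> shift
    return (value << -shift) & OFFSET_MASK


def translate_part(aperture_offset, shift, minuend, mask):
    """One shift / XOR / mask group of a memory offset equation."""
    return (shift_bidirectional(aperture_offset, shift) ^ minuend) & mask


# Extract a 4-bit field starting at bit 4 and reverse its direction.
part = translate_part(0b1011_0000, shift=4, minuend=0b1111, mask=0b1111)
```

When the minuend has all of the masked bits set, the XOR computes (2^n − 1) − x for an n-bit field x. A subtraction from a constant, as needed to reverse the direction of traversal, is therefore possible without an adder.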
Using four of these groups of operations for each memory offset equation (see also
When combined with all of the other portions of
The “~” character represents a bitwise complement (NOT) operation.
Such an equation will, in some embodiments, require approximately 10K gates. As mentioned above, at least two apertures are needed, one for the display subsystem, and one for software access.
The values of the programmable parameters in the memory offset and memory address equations shown above are derived from the equations for the different rotations (see
Some embodiments implement the two address translations using two reserved regions of physical memory and two sets of 14 registers. The reserved regions of physical memory are the apertures through which the memory will be accessed. Whenever these regions of memory are accessed by the access address of
In some embodiments, the apertures are of sufficient size to hold any conceivable resolution and color depth, are aligned on a power-of-two boundary, and are a power-of-two in size. An example high-end assumption would be a 2048×2048×32 bpp display. This requires apertures of 16 MB, which means that the aperture offset can be contained in 24 bits. Since there is no actual memory associated with these apertures, the exemplary 16 MB requirement simply represents physical regions of memory space that are reserved.
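The sizing arithmetic above can be checked with a short sketch. The helper names are hypothetical; rounding up to a power of two follows the alignment requirement stated in the preceding paragraph.

```python
def aperture_size_bytes(width_pixels, height_lines, bits_per_pixel):
    """Smallest power-of-two size that holds one full frame."""
    frame_bytes = width_pixels * height_lines * bits_per_pixel // 8
    size = 1
    while size < frame_bytes:
        size <<= 1
    return size


def offset_bits(aperture_size):
    """Number of address bits needed to index every byte of the aperture."""
    return (aperture_size - 1).bit_length()


size = aperture_size_bytes(2048, 2048, 32)  # 16 MB
bits = offset_bits(size)                    # 24
```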
The aforementioned 14 registers are:
Memory Base Address: 32-bits (32-bits populated; unsigned)—The base address of the physical memory area that is actually accessed. This value is added to the result of the address offset translation to obtain “memory address”.
Depth Mask: 32-bits (2-bits populated; unsigned)—The bit mask used to remove the least significant bits of an address before an address translation, and to restore the same bits after the translation. This is done to provide single byte accesses to multi-byte pixel formats. The register will be programmed with a value of 0xFFFFFFFF for 8-bit pixels, 0xFFFFFFFE for 16-bit pixels, and 0xFFFFFFFC for 32-bit pixels.
(Four) Shift: 16-bits (6-bits populated; signed)—The values (shift0, shift1, shift2 and shift3) in these registers specify the right shift for the first step of each of the four portions of the address translation. If the value is negative, the shift is to the left. Bits shifted out of the value are lost. Bits shifted into the value are set to 0.
(Four) Minuend: 32-bits (24-bits populated; unsigned)—The values (minuend0, minuend1, minuend2 and minuend3) in these registers are used to invert selected bits in the second step of each of the four portions of the address translation.
(Four) Mask: 32-bits (24-bits populated; unsigned)—The values (mask0, mask1, mask2 and mask3) in these registers are used to mask off selected bits in the third step of each of the four portions of the address translation.
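The roles of the Depth Mask and Memory Base Address registers can be sketched as follows. The offset translation itself is passed in as a caller-supplied function, because the way the four portions are combined is not modeled here. The sketch also assumes that the translated offset is pixel-aligned, so that the saved low-order bits can be restored with an OR. All names are hypothetical.

```python
# Depth Mask register values, keyed by pixel size in bits.
DEPTH_MASKS = {8: 0xFFFFFFFF, 16: 0xFFFFFFFE, 32: 0xFFFFFFFC}


def memory_address(aperture_offset, depth_mask, memory_base, translate_offset):
    """Strip the byte-within-pixel bits, translate the pixel-aligned
    offset, restore the stripped bits, then add the memory base."""
    low_bits = aperture_offset & ~depth_mask & 0xFFFFFFFF
    pixel_offset = aperture_offset & depth_mask
    translated = translate_offset(pixel_offset)  # assumed pixel-aligned
    return memory_base + (translated | low_bits)


def stand_in(offset):
    """Stand-in translation used only to exercise the masking logic."""
    return offset * 2


# Byte 3 of the 32-bit pixel at aperture offset 0x100.
addr = memory_address(0x103, DEPTH_MASKS[32], 0x80000000, stand_in)
```

Because the low-order bits bypass the translation, a single-byte access to a multi-byte pixel lands on the corresponding byte of the same pixel at its translated location.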
Some embodiments complete the address translations in a single memory access cycle, implementing the translations with combinational logic. The translations in such embodiments can be accomplished through parallelization of the four portions of the memory offset equations so that each translation occurs quickly enough to avoid the addition of any extra cycles to a memory access.
Combinational logic can reduce power efficiency due to unnecessary changes in intermediate states. Some embodiments address power efficiency as follows. First, the address translations are active only when the associated aperture is being accessed. Second, spurious intermediate values within the later stages of the translation can be eliminated while the earlier stages are still processing. A suitable internal propagation compensation can prohibit changes in later stages until the earlier stages have settled.
The aperture logic 13 controls a switch 23 to invoke the translator 15 whenever the access address on the bus 33 falls within the aperture implemented by aperture logic 13. Similarly, the aperture logic 17 controls a switch 27 to invoke translator 19 whenever the access address on bus 37 is within the aperture implemented by aperture logic 17.
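The aperture decision can be modeled in software as a minimal sketch, assuming an aperture that is a power of two in size and aligned on its own size, as described above for some embodiments. The function names, the example base address, and the stand-in translator are hypothetical.

```python
def aperture_hit(access_address, aperture_base, aperture_size):
    """True when the access address falls within the aperture."""
    return (access_address & ~(aperture_size - 1)) == aperture_base


def route_access(access_address, aperture_base, aperture_size, translate):
    """Invoke the translator on the aperture offset for a hit;
    pass the address through unchanged otherwise."""
    if aperture_hit(access_address, aperture_base, aperture_size):
        return translate(access_address & (aperture_size - 1))
    return access_address


APERTURE_BASE = 0x40000000  # assumed reserved region
APERTURE_SIZE = 1 << 24     # 16 MB


def stand_in_translator(offset):
    """Stand-in translator: adds an assumed memory base to the offset."""
    return 0x80000000 + offset


inside = route_access(0x40123456, APERTURE_BASE, APERTURE_SIZE,
                      stand_in_translator)
outside = route_access(0x41000000, APERTURE_BASE, APERTURE_SIZE,
                       stand_in_translator)
```

With power-of-two alignment, the hit test reduces to a comparison of the upper address bits, and the aperture offset is simply the lower 24 bits of the access address.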
Although exemplary embodiments of the invention are described above in detail, this does not limit the scope of the invention, which can be practiced in a variety of embodiments.
Number | Date | Country |
---|---|---|
20050134597 A1 | Jun 2005 | US |