The present invention relates to a memory device that incorporates both a multi-media accelerator and an embedded memory.
Handheld devices such as cell phones are proliferating globally, with most of these handheld devices incorporating multi-media functions. These multi-media functions vary in performance and cost. Additionally, the multi-media functions are implemented by the base band processor or an application processor of a cell phone (or the equivalent in other handheld devices). These multi-media functions require their own memory to achieve adequate performance. Often, a dedicated memory, such as a synchronous dynamic random access memory (SDRAM), a mobile double data rate memory (MDDR) or a synchronous pseudo static random access memory (pSRAM) is used to implement the multi-media functions. This memory represents an additional cost for the handheld device, as the baseband processor already has an associated memory, which is used for operating a wireless network. When the multi-media functionality is not operating, the memory dedicated for the multi-media functions is not used. Additionally, the multi-media memory is implemented using a separate chip. Consequently, power consumption is high when running the multi-media functions, undesirably reducing battery life.
One of the largest performance and power consumption factors in mobile multi-media consumer devices is memory. Generally, mobile devices deploy various power management schemes such as sleep mode, standby mode and active mode. These modes are prevalent in cellular phones, where sleep mode is engaged while the phone is in its cradle waiting to receive a call. In this mode, a minimal set of operations are running and it is desirable for all devices which are not actively running to have the lowest leakage current. One of the largest sources of leakage current in sleep mode is memory. The leakage is dependent on the type of memory and the amount of memory or memory elements, as well as logic elements. Typically in memory devices, there is considerable memory on-chip, making memory leakage a dominant factor while in sleep mode.
In standby mode, several functions may be running and may periodically get interrupted by other functions, such as receiving a call while playing a game. If a call is received while playing a game, there would be a resource conflict for the display so the game could be put in standby mode while servicing the incoming call. In standby mode it may be necessary to retain the context for the game, in order to allow the game to continue after the call is terminated. For this reason, it may be necessary to have certain functions in standby mode by keeping some clocks running, while deactivating other clocks and preserving the memory content.
While it is desirable to have very low leakage current in sleep mode and very low operating current in standby mode, it is also very desirable to have very low power consumption while in active mode. In active mode, the application is running continually and accessing the memory, the display and other devices. Running a battery powered device in active mode imposes a relatively large strain on the battery in a short period of time. In order to reduce power consumption significantly during the active mode, it is not only necessary to extensively gate the clocks, thereby reducing logic power, but it is also necessary to have the most efficient access to memory for frame buffer, Z buffer, texture memory and the display list. It is therefore desirable to have all the required memory and the computing elements in the memory device.
Various papers have described embedding memory on a system on a chip (SoC) architecture (e.g., John Poulton, “An Embedded DRAM For CMOS ASICs” (1997), David Patterson et al., “A Case For Intelligent RAM” (1997), and M. F. Deering et al. “FBRAM: A New Form of Memory Optimized for 3D Graphics” (1994)).
Poulton teaches embedding DRAM for use as a register file between multiple processors which are also on the same chip. The chip is a graphics-enhanced memory chip with low voltage swing buses for full voltage swing multiple small page memories, thereby reducing power consumption and boosting performance with wide on-chip buses.
Patterson et al. address the integration of DRAM and logic on a single SoC. Patterson et al. teach integrated RAM (IRAM), which includes DRAM and a processor integrated on a single chip to overcome the processor-memory performance gap. Patterson et al. incorporate vector processing with DRAM on the same chip, while utilizing wide buses to achieve high bandwidth. The wide on-chip buses exhibit a low capacitance, thereby reducing the power consumption and allowing higher on-chip bus frequencies.
Deering et al. describe the advantages of integrating graphics functions with DRAM in a chip and using several of these chips to produce a frame buffer solution. Deering et al. also teach integrating 3D graphics functions with DRAM on a single chip (FBRAM), wherein performance is enhanced by performing read-modify-write, Z compare and RGBA blending in a single write operation. The DRAM memory and the graphics functions are 4-way interleaved, wherein each DRAM bank has its own page buffers. External devices access the FBRAM via a custom render bus and the DRAM is used for graphics functions only. Multiple FBRAMs are required to compose a full frame buffer for 3D graphics.
U.S. Pat. Nos. 5,650,955, 5,703,806, 6,356,497, 6,771,532, 6,920,077 and 7,106,619 by Puar et al. describe a method to integrate DRAM with a graphics accelerator and video logic for a mobile PC. These patents teach a CPU interfacing to a chip which has no external memory interface. Therefore, instead of using pins for a memory interface, the pins are used to provide a PCMCIA interface. The CPU has access to read the embedded DRAM and writes commands to the graphics accelerator via the CPU bus. The chip in these patents provides a CPU interface.
U.S. Pat. No. 6,101,620 by Ranganathan teaches a PC having a chip incorporating internal DRAM and a video display controller which operates with an external DRAM. The frame buffer is split between the external DRAM and the internal DRAM and is multiplexed out to the display interface. A host interface is used to write and read the internal DRAM and the external DRAM. The host interface is one of the buses present in a PC (i.e., a PCI bus, a VESA bus, an EISA bus, or an ISA bus).
The above-described references do not teach operating a chip as a memory device and efficiently sharing the memory of the memory device with a multi-media accelerator. Furthermore, these references do not teach operating a memory device having an interface that can implement more than one standard memory protocol. It would be desirable to have an interface to the memory device which is compatible with standard memory product protocols (e.g., DRAM, MDDR, pSRAM), so that the memory device can be a simple design-in within a system having standard memory buses, while providing multi-media acceleration functions.
One objective of the current invention is that the memory in the memory device is made usable for processors or external devices for functions other than multimedia when the multimedia accelerator is not operating, thus achieving the best cost optimization. In one embodiment, the memory in the memory device is an embedded memory with logic. Another aspect of the invention is that the embedded memory can be accessed (or made usable) when both the multimedia accelerator and an external device are running concurrently and using the same memory device. Because processors or external devices may have memory controllers that operate with different types of memory, the present invention includes a memory interface that is capable of operating in accordance with different protocols (i.e., different interfaces, timing and voltages). This memory interface enables the memory device to operate as multiple types of memory.
The present invention also provides a memory interface, which implements 3-D graphics and optionally other multimedia functions with reduced power consumption. For 3D graphics, there are four areas to consider for reduction in power consumption: (1) the logic, (2) the frame buffer where an image is composited, (3) the Z-buffer where the depth values for fragments of an image are stored, and (4) the texture memory.
The present invention is a memory device having a memory interface which is configured to operate with one of several standard memory protocols, one or more embedded memory subsystems, a memory mapping circuit, a graphics accelerator for 2D and/or 3D graphics processing, and a display mechanism for updating a display. The memory device can be accessed via a memory interface. Optionally a video interface is provided for display purposes (e.g., MDDI). In one mode, the memory device can be operated such that an external device and the graphics accelerator concurrently access the embedded memory subsystems (by arbitrating accesses). In a second mode, the embedded memory subsystems are accessed by either the graphics accelerator or an external device based on setting an access mode bit. In a third mode, the embedded memory subsystems are accessed by an external device only, wherein the memory device acts as a standard memory device such as SDRAM (synchronous DRAM), DDR (double data rate SDRAM), mobile SDRAM, MDDR (mobile double data rate SDRAM), asynchronous pSRAM (pseudo-static RAM), synchronous pSRAM, or cellularRAM.
In the described embodiments, embedded memory subsystem 114 can be used to store frame/Z-buffer data for a graphics application and/or general data. Similarly, embedded memory subsystem 115 can be used to store texture data for a graphics application and/or general data. Although the described embodiments include two embedded memory subsystems 114-115, it is understood that other numbers of embedded memory subsystems 114-115 can be used in other embodiments. In the described embodiments, embedded memory subsystems 114-115 are implemented using DRAM cells, although this is not necessary for all embodiments.
The efficiency in accessing embedded memory subsystems 114-115 can be optimized by implementing a memory architecture comprising multiple small banks of memory (i.e., multi-bank memory), wherein multiple small banks of memory collectively form each memory subsystem. For example, different groups of multiple small banks may be used to implement the frame buffer, Z-buffer and texture buffer of a graphics application. One example of a multi-bank memory architecture which can be used to implement memory subsystems 114-115 is described in U.S. Pat. No. 6,215,497 by Wingyu Leung, which is hereby incorporated by reference in its entirety. Another example of a multi-bank memory architecture which can be used to implement memory subsystems 114-115 is described in U.S. Pat. No. 6,370,073, also by Wingyu Leung, which is also hereby incorporated by reference in its entirety. Other memory architectures can be used in other embodiments of the present invention. Although both of the above-referenced multi-bank memory architectures are typically implemented using DRAM memory cells, other types of memory cells can be used in other embodiments of the present invention.
In
In one embodiment, memory device 100 can be used by one or more external devices when graphics accelerator 108 is not enabled to execute the graphics commands. These external devices are coupled to memory device 100 via memory bus 150. Multiplexer circuit 110 provides the mechanism which enables these external devices to access memory subsystems 114-115 via memory interface 104. More specifically, multiplexer circuit 110 allows an external device to access memory subsystems 114-115 using a path that includes memory interface 104, memory mapping circuit 106 and internal buses 117, 120, 123 and 124.
The frame/Z-buffer memory subsystem 114 and the texture buffer memory subsystem 115 are shown as two separate memories, each implemented as a multi-bank memory. However, memory mapping circuit 106 enables an external device to access the two embedded memory subsystems 114 and 115 as a single linearly addressable memory. Memory mapping circuit 106 may map the address space of the two embedded memory subsystems 114-115 such that these two memory subsystems appear as one linearly addressable memory to an external device coupled to memory interface 104. In an alternate embodiment, the frame/Z-buffer memory subsystem 114 and the texture memory subsystem 115 are implemented as a single multi-bank memory.
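By way of illustration only, the linear mapping performed by memory mapping circuit 106 can be sketched as follows. The subsystem sizes and the ordering (subsystem 114 occupying the lower addresses, subsystem 115 following it) are assumptions made for the sketch and are not stated in this description; the function name is likewise hypothetical.

```python
# Hypothetical sizes, in 16-bit words; the description does not give the split.
FRAME_Z_WORDS = 1 << 20   # assumed size of frame/Z-buffer subsystem 114
TEXTURE_WORDS = 1 << 20   # assumed size of texture subsystem 115


def map_linear_address(addr):
    """Map a linear external address to (subsystem numeral, local address).

    Assumes subsystem 114 is placed first in the linear space and
    subsystem 115 directly after it.
    """
    if not 0 <= addr < FRAME_Z_WORDS + TEXTURE_WORDS:
        raise ValueError("address outside the embedded memory range")
    if addr < FRAME_Z_WORDS:
        return (114, addr)
    return (115, addr - FRAME_Z_WORDS)
```

Under these assumptions, the last address of subsystem 114 and the first address of subsystem 115 are adjacent in the external address space, so the external device sees no gap between the two physical memories.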
Registers 109 can be accessed by memory interface 104 via internal bus 119. Registers 109 include at least the standard registers available in the standard commercially available cellularRAM, SDRAM and MDDR products. Registers 109 also include memory device specific registers in addition to the registers available in standard commercial memory products. These memory device specific registers include registers to gate various clocks within memory device 100 for power management, to separately reset the graphics and other accelerators, to individually enable or disable graphics accelerator 108, other accelerators (not shown), or a memory interface mode register (see, e.g., memory interface mode register 411 of
In the described embodiment, the memory cells in embedded memory subsystems 114-115 include dynamic cells that require periodic refresh. Power management is incorporated by programming registers 109, which include clock gating registers that gate the clock off from embedded memory subsystems 114 and 115 (or designated banks within these embedded memory subsystems 114-115). The clock gating registers are programmed by an external device. For fine grain power management, each bank, which includes multiple sub-banks, receives a clock which is gated individually or on a sub-bank group basis. When the clock to any bank (or sub-bank) is gated off, the refresh circuitry does not refresh the memory cells of the bank (or sub-bank) and the data in the memory cells is not retained, because the memory cells comprise dynamic cells.
In an alternative embodiment, power management is achieved by keeping the clocks running to maintain the refresh circuitry so data is not lost, but reducing the power consumption by disabling accesses to the individual banks or sub-banks. Clock gating for power management can also be provided for the multimedia functions implemented by logic elements. A power management scheme is also provided where the embedded memory systems 114-115 are combined in a single multi-bank memory where the entire memory can be idled by disabling accesses, but maintaining the refresh mechanism. Other power management levels are incorporated wherein the clock is gated off to the entire single multi-bank memory or to the sub-banks within the single multi-bank memory, either individually or on a sub-bank group basis.
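The difference between the two power management schemes described above (gating the clock to a bank versus disabling accesses while refresh continues) can be summarized in a short behavioral sketch. The state names and helper functions are hypothetical labels introduced for illustration; they do not correspond to named registers in this description.

```python
from enum import Enum


class BankPower(Enum):
    """Illustrative power states for one bank (or sub-bank group)."""
    ACTIVE = "active"                # clock running, accesses enabled
    IDLE_REFRESH = "idle_refresh"    # clock running, accesses disabled
    CLOCK_GATED = "clock_gated"      # clock gated off, no refresh


def retains_data(state):
    # Dynamic cells keep their contents only while refresh is running,
    # which requires the clock to the bank.
    return state is not BankPower.CLOCK_GATED


def accepts_access(state):
    # Only a fully active bank services reads and writes.
    return state is BankPower.ACTIVE
```

In this model, the idle state trades a higher standby current for data retention, while the clock-gated state gives the lowest power at the cost of losing the bank contents.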
As described in more detail below, memory interface 104 is compatible with one or more standard memory devices. That is, memory interface 104 includes logic that allows external devices to access embedded memory subsystems 114-115 using different protocols. Therefore, from the perspective of an external device, memory interface 104 is capable of implementing a plurality of memory interface protocols associated with a plurality of standard memory devices. In another embodiment, memory interface 104 is capable of implementing a superset of a plurality of standard memory device protocols, and will support the protocols over the superset interface. Examples of standard memory device protocols include those used to implement SDRAM, DDR, mobile SDRAM, MDDR, asynchronous pSRAM, synchronous pSRAM, and cellularRAM.
By enabling connection to a plurality of different standard memory devices, memory interface 104 advantageously allows memory device 100 to be used in systems or devices where such standard memory devices are typically used, e.g., cell phones. As described in more detail below, memory interface 104 allows many of the same pins of memory device 100 to be shared between different interfaces having different protocols. For example, the 16-bit data buses associated with both MDDR and cellularRAM protocols would share the same 16 data pins of memory device 100. (Note that the bus width of memory device 100 is not limited to 16 bits and can be of any width.)
Other pins with similarity in function would also be common between protocols. As used herein, the commands associated with a protocol are generally designated as EXCMD signals, the clock signals associated with a protocol are generally designated as EXCLK signals, the data signals associated with a protocol are generally designated as EXDQ signals, and the address signals associated with a protocol are generally designated as EXADR signals.
For a superset of an MDDR interface, an extra select pin (e.g., 3DCS#) is included to distinguish accesses to graphics accelerator 108 from accesses to the embedded memory subsystems 114-115. When accessing the embedded memory subsystems 114-115, a standard memory product protocol, which can include a chip select pin (CS#), is used.
In the described example, external device 300 is a base band processor implementing a MDDR memory controller 301 which is used to access a standard commercial MDDR memory product supporting a MDDR protocol, and having a memory density of 128 Mbits. The standard commercial MDDR memory product may be, for example, Micron part No. MT46H8M16LF. Alternatively, a standard SDRAM memory controller can be implemented in the external device. A standard commercial SDRAM product of this memory density is, for example, Micron part No. MT48LC8M16A2. Both Micron products, MT46H8M16LF and MT48LC8M16A2, are incorporated herein by reference in their entirety. In this example, external device 300 has a 16-bit bus, and accesses are performed by first asserting a row address A[11:0] and bank address BA[1:0], followed by asserting a column address A[8:0] as well as the bank address BA[1:0], with the appropriate access command. The row and column address pins are shared in the protocol of external device 300 (which accounts for the stepwise assertion of the row and column addresses).
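As a check on the addressing described above, the following sketch shows how a 12-bit row address, a 2-bit bank address and a 9-bit column address on a 16-bit bus account for a 128 Mbit density. The bit ordering used to split a linear word address (bank in the upper bits, then row, then column) is an assumption made for illustration; the function names are hypothetical.

```python
ROW_BITS = 12     # row address A[11:0]
BANK_BITS = 2     # bank address BA[1:0]
COL_BITS = 9      # column address A[8:0]
DATA_WIDTH = 16   # bits per addressed location


def split_address(word_addr):
    """Split a linear word address into (bank, row, column).

    Assumed ordering: bank | row | column, from most to least significant.
    """
    col = word_addr & ((1 << COL_BITS) - 1)
    row = (word_addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    bank = (word_addr >> (COL_BITS + ROW_BITS)) & ((1 << BANK_BITS) - 1)
    return bank, row, col


def density_bits():
    """Total capacity implied by the address and data widths."""
    return (1 << (ROW_BITS + BANK_BITS + COL_BITS)) * DATA_WIDTH
```

The 23 address bits select 8,388,608 locations of 16 bits each, which is 134,217,728 bits, i.e., 128 Mbits.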
Memory interface 104 is capable of supporting multiple protocols. Memory interface 104 is capable of deciphering and responding to signals associated with multiple protocols. However, to be able to implement a particular protocol, memory interface 104 must first be instructed which protocol is being presented. Thus, memory device 100 implements mode signals to identify the protocol of external device 300.
Mode signals M1INT and M2INT are provided to the ‘1’ input terminals of multiplexers 403 and 404, respectively. The ‘0’ input terminals of multiplexers 403 and 404 are coupled to receive mode signals HM1 and HM2, respectively, from mode interface register 411. The select terminals of multiplexers 403 and 404 are each coupled to receive a select control signal S from mode interface register 411. Multiplexers 403 and 404 provide the mode signals M1 and M2, respectively, in response to the select control signal S. The select control signal S is initially set to a logic ‘1’ value, such that multiplexers 403 and 404 route the M1INT and M2INT signals as the mode signals M1 and M2, respectively. Memory interface 104 implements a particular memory protocol in response to the mode signals M1 and M2 and the signal on the CLK# pin.
If clock signals are present on both the CLK and CLK# pins of memory device 100, then the M1INT and M2INT signals (and therefore the M1 and M2 signals) are activated to logic ‘1’ values. In response, memory interface 104 is configured to implement an MDDR protocol.
If a clock signal is present on the CLK pin, but the CLK# pin is held at a logic ‘0’ state, then the M1INT signal (and therefore the M1 signal) is activated low, ‘0’ and the M2INT signal (and therefore the M2 signal) is deactivated high, ‘1’. In response, memory interface 104 is configured to implement an SDRAM protocol.
If a clock signal is present on the CLK pin, but the CLK# pin is held at a logic ‘1’ state, then the M1INT signal (and therefore the M1 signal) is activated high and the M2INT signal (and therefore the M2 signal) is deactivated low. In response, memory interface 104 is configured to implement a synchronous pSRAM protocol.
If there are no clock signals present on the CLK and CLK# pins of memory device 100, then the M1INT and M2INT signals (and therefore the M1 and M2 signals) are deactivated to logic ‘0’ values. In response, memory interface 104 is configured to implement an asynchronous protocol.
In this manner, the CLK#, M1 and M2 signals are used to determine the type of memory protocol presented by external device 300. Note that other coding schemes can be used in other embodiments of the present invention.
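The coding scheme described above can be expressed as a small lookup, shown here as a behavioral sketch only. The pin-activity labels and the function name are hypothetical; how clock activity is actually sensed on the pins is not modeled, and a toggling CLK# without a toggling CLK is treated as outside the described scheme.

```python
# Hypothetical labels for the observed activity on a pin.
CLOCK, LOW, HIGH = "clock", "low", "high"


def detect_protocol(clk, clk_n):
    """Return (M1INT, M2INT, protocol) for the activity on CLK and CLK#."""
    if clk == CLOCK and clk_n == CLOCK:
        return 1, 1, "MDDR"
    if clk == CLOCK and clk_n == LOW:
        return 0, 1, "SDRAM"
    if clk == CLOCK and clk_n == HIGH:
        return 1, 0, "synchronous pSRAM"
    if clk != CLOCK and clk_n != CLOCK:
        return 0, 0, "asynchronous"
    # CLK# toggling without CLK is not covered by the described coding.
    raise ValueError("combination not covered by the described coding scheme")
```

For example, a controller that drives a differential clock pair is identified as MDDR, while a controller that drives only CLK and ties CLK# low is identified as SDRAM.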
Although mode signals M1 and M2 are automatically set upon power-on, these mode signals can be overridden by external device 300 to re-configure the memory protocol. The external device 300 can override the M1 and M2 mode signals by programming the memory mode interface register 411. As described above, memory mode interface register 411 provides the three bits HM1, HM2 and S to multiplexers 403 and 404. Upon power-on, select control bit S is defaulted to a logic ‘1’ state to select the M1INT and M2INT signals. However, the external device 300 may subsequently set the mode select bits HM1 and HM2 to a desired state by writing to mode interface register 411. External device 300 may also overwrite the select control bit S to have a logic ‘0’ state. Under these conditions, the mode select bits HM1 and HM2 are provided as the mode select signals M1 and M2, respectively, thereby controlling the protocol implemented by memory interface 104.
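The override path through multiplexers 403 and 404 reduces to a two-way selection controlled by bit S. The sketch below is illustrative only; the function name and argument names are hypothetical, and the power-on default of S is modeled as a default argument.

```python
def select_mode(m1int, m2int, hm1, hm2, s=1):
    """Return (M1, M2) as routed by multiplexers 403 and 404.

    s == 1 (power-on default): pass the automatically detected M1INT/M2INT.
    s == 0: pass the HM1/HM2 bits written by the external device.
    """
    if s not in (0, 1):
        raise ValueError("S is a single bit")
    return (m1int, m2int) if s == 1 else (hm1, hm2)
```

For instance, if the device powers on with M1INT=1 and M2INT=1, an external device can later write HM1=0, HM2=1 and clear S, after which the sketch returns (0, 1) regardless of the detected values.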
Decode/control logic 201 receives control signals from an external device via pin level input/output buffers (not shown). In the embodiment illustrated in
External device 300 requires access to registers 109, graphics accelerator 108 and embedded memory subsystems 114 and 115. In the described embodiments, the address space utilized by graphics accelerator 108 is mapped to the lower range of the available address space.
Memory device 100 appears as a standard commercial product to external device 300. Appropriate software libraries and drivers for the external device 300 and memory device 100 enable the use of graphics accelerator 108, access registers 109 and embedded memory subsystems 114-115 in memory device 100. Note that each of the four banks addressed by bank address BA[1:0] is constructed as a single memory or as multiple memories, each having a multi-bank architecture.
In one embodiment, when access to graphics accelerator 108 is required, external device 300 drives the chip select signal CS# to a logic high state (de-selecting memory sub-systems 210 and 220) and concurrently drives the 3DCS# signal low to access functions within the graphics accelerator 108 or registers 109. At the same time, external device 300 provides the row and bank addresses A[11:0] and BA[1:0] to the external address pins (EXADR) of memory interface 104. Conversely, when access to embedded memory subsystems 114 and 115 is required, external device 300 drives the chip select signal CS# to a logic low state, thereby selecting the memory sub-systems 210 and 220, and concurrently drives the 3DCS# signal to a logic high state (de-selecting the graphics accelerator 108 and registers 109). At the same time, external device 300 provides the row and bank addresses A[11:0] and BA[1:0] to the external address pins (EXADR) of memory interface 104.
In one embodiment, the lower three bank addresses (BA[1:0]=00, 01, 10) are used for addressing registers 109 and other memories in graphics accelerator 108, and the uppermost bank address (BA[1:0]=11) is used for addressing configuration registers, when CS# is high and 3DCS# is low.
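The select-pin decoding described above can be sketched as follows. The target labels and the function name are hypothetical. The behavior when both select signals are high (treated here as deselected) or both are low (treated here as an error) is an assumption, since this description does not address those combinations.

```python
def decode_target(cs_n, cs3d_n, ba):
    """Decode the access target from CS#, 3DCS# and BA[1:0].

    cs_n and cs3d_n are the active-low pin levels (0 = asserted).
    """
    if ba not in range(4):
        raise ValueError("BA[1:0] is a two-bit field")
    if cs_n == 0 and cs3d_n == 1:
        return "embedded_memory"            # subsystems 114-115
    if cs_n == 1 and cs3d_n == 0:
        if ba == 0b11:
            return "configuration_registers"
        return "accelerator_or_registers"   # BA = 00, 01 or 10
    if cs_n == 1 and cs3d_n == 1:
        return "deselected"                 # assumed: no access
    raise ValueError("both selects asserted is not described")
```

Because the same row and bank address pins are driven in both cases, the two select signals are the only information that distinguishes an accelerator access from a memory access in this embodiment.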
In another embodiment, accesses between graphics accelerator 108, registers 109 and embedded memory subsystems 114 and 115 are distinguished without the extra 3DCS# pin by having a larger addressing range at memory interface 104 and decoding different smaller address ranges within memory device 100 to access graphics accelerator 108, registers 109 and embedded memory subsystems 114 and 115. This may be accomplished by having an extra row address or column address. In this embodiment, all accesses are made by asserting the chip select signal, CS# low.
In the described embodiment, decode/control circuit 201 receives an address signal A[x] to further distinguish accesses to the graphics accelerator 108 and registers 109. The address signal A[x] is at least one address bit sourced from the external device 300.
Memory interface 104 includes decode/control circuit 201 to determine external device access to graphics accelerator 108, registers 109 and embedded memory subsystems 114-115. Memory interface 104 also has control circuits for each type of memory protocol available in memory device 100. As described above, the mode bits M1 and M2 along with the CLK# signal determine which control circuits are active at any time.
The protocols presented at memory interface 104 are generally incompatible with the synchronous interface of the multi-bank embedded memory subsystems 114 and 115. Each of the embedded memory subsystems 114 and 115 has an address bus ADR, a data input bus Di, a data output bus Do, and a control signal bus CT.
Decode/control circuit 201 includes multiple finite state machines (FSM) for the different types of memory protocols presented by an external device, and also includes logic to decode the CLK#, M1 and M2 bits to identify the memory protocol of the external device. In one embodiment, the decoded CLK#, M1 and M2 bits enable the appropriate FSM according to
Embedded memory subsystem 114 is accessed as follows. Decode/control circuit 201 generates a set of control signals CTRL_FB/Z, which are provided to multiplexer 211 of embedded memory subsystem 114 (i.e., the embedded frame/Z-buffer memory). Multiplexer 211 also receives a set of control signals F_CTL generated by graphics accelerator 108. Multiplexer 212 receives write data signals W_DATA from address/data latches 202. Multiplexer 212 also receives data signal F/Z_Do provided by graphics accelerator 108. Multiplexer 213 receives address signal Ai from memory mapping circuit 106. Multiplexer 213 also receives address signal AF from graphics accelerator 108. Multiplexers 211-213 are controlled by memory interface 104, thereby allowing memory subsystem 114 to be accessed by either external device 300 or graphics accelerator 108.
Embedded memory subsystem 115 is accessed as follows. Decode/control circuit 201 generates a set of control signals CTRL_TEX, which are provided to multiplexer 221 of embedded memory subsystem 115 (i.e., the embedded texture memory). Multiplexer 221 also receives a set of control signals T_CTL generated by graphics accelerator 108. Multiplexer 222 receives write data signals W_DATA from address/data latches 202. Multiplexer 222 also receives data signal T_Do provided by graphics accelerator 108. Multiplexer 223 receives address signal Ai from memory mapping circuit 106. Multiplexer 223 also receives address signal AT from graphics accelerator 108. Multiplexers 221-223 are controlled by memory interface 104, thereby allowing memory subsystem 115 to be accessed by either external device 300 or graphics accelerator 108. More specifically, multiplexers 211-213 and 221-223 are controlled by control signals CTRL_MISC generated by decode/control circuit 201.
Decode/control circuit 201 also generates control signals CTRL_GFX, which are provided to graphics accelerator 108. Graphics accelerator 108 also receives the write data signals W_DATA from address/data latches 202, and address signal Ai from memory mapping circuit 106. Graphics accelerator 108 also receives output data signals F_Di and T_Di provided by embedded memory subsystems 114 and 115, respectively, for reading frame/Z and texture data. Graphics accelerator 108 provides output data signals D_GFX for reading registers and memories within the graphics accelerator 108. Note that in one embodiment, graphics accelerator 108 can be paused by a STALL signal provided by decode/control circuit 201 in the event that the memory sub-systems 210 and 220 are busy due to an external device accessing the memory sub-systems 210 and 220.
Decode/control circuit 201 also generates control signals CTRL_REGS for controlling access to registers 109. Registers 109 also receive the write data signals W_DATA and the address signals Ai. In response, registers 109 provide output data signals D_REG.
Decode/control circuit 201 also provides control signals CTRL_DP, which are used by address/data latches 202 to latch addresses from external device 300 and data being transferred to and from the external device 300. Address/data latches 202 receive external device addresses and write data, which are latched in response to a subset of the CTRL_DP signals. The set of control signals CTRL_FB/Z, CTRL_TEX and CTRL_DP are asserted due to external device 300 or graphics accelerator 108 requiring access to the embedded memory subsystems 114-115. Decode/control circuit 201 receives the access control signals (e.g., CS#, CAS#, WE#, CS#, 3DCS#, UM# and LM#) from the external device via memory interface 104 and the control signals F_CTL and T_CTL from graphics accelerator 108 to determine whether an access is initiated by the external device 300 or the graphics accelerator 108. One of the bits in the registers 109 indicates which device is allowed access to the memory device 100 at any time (i.e., external device 300 or graphics accelerator 108). This bit is programmed by an external device. In order to avoid a deadlock, the registers 109 can be accessed by the external devices while the graphics accelerator 108 is accessing the embedded memory subsystems 114-115. The status of graphics accelerator 108 can be read concurrently with the graphics accelerator 108 accessing the embedded memory subsystems 114-115. The data mask bits (MSK) found in standard SDRAM/MDDR products are also latched in address/data latches 202 with the aid of the CTRL_DP signals.
Read data from the embedded memory subsystems 114-115, graphics accelerator 108, and registers 109 are selected with multiplexer 230 and latched in address/data latches 202 and provided to the external device 300 using a subset of the CTRL_DP signals.
External devices access the embedded memory subsystems 114-115 as one contiguously mapped memory. Although the memories 114-115 are two physically separate memories, memory map logic 203 maps contiguous linear external device addresses to access the two embedded memories. Multiplexer 204 selects the mapped output of memory map logic 203 when embedded memory subsystems 114-115 are accessed, and selects non-mapped addresses when the graphics accelerator 108 or the registers 109 are accessed by an external device. The selected address Ai from multiplexer 204 is further multiplexed with the addresses from the graphics accelerator 108 to access the embedded memory subsystems 114-115 using multiplexers 213 and 223. Graphics accelerator 108 outputs a frame buffer address AF and a texture address AT. As described above, multiplexer 213 is used to select the mapped external address Ai or the frame buffer address AF to access the frame/Z-buffer memory 114. Similarly, multiplexer 223 is used to select the mapped external address Ai or the texture buffer address AT to access the texture memory 115.
The external device data (EXDQ) to be written into memory device 100 by external device 300 is latched in address/data latches 202 and produced as write data signals W_DATA.
In another embodiment, the control signals CTRL_FB/Z and CTRL_TEX are produced by decode/control circuit 201 to comply with the required control signals (by using F_CTL and T_CTL) of the embedded memory blocks 210 and 220 of
The above embodiments describe external devices accessing the embedded memory subsystems 114-115 when graphics accelerator 108 is disabled. In another embodiment, an arbiter present in decode/control circuit 201 arbitrates memory accesses between the external devices (via the memory interface 104) and the graphics accelerator 108. To resolve the conflict that arises when the external device 300 and the graphics accelerator 108 simultaneously attempt to access the embedded memory subsystems 114-115, decode/control circuit 201 asserts the STALL signal, whereby graphics accelerator 108 waits (i.e., stalls) for the duration of the conflicting memory access. In another embodiment, a WAIT signal is asserted on a pin of the memory device for sampling by an external device, in which case the external device 300 waits until the WAIT signal is de-asserted to complete an access or to start a new access. An example of a memory device that supports a WAIT pin is CellularRAM.
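The two conflict-handling embodiments can be sketched as a single decision function. This is a simplified, hypothetical model: the function name arbitrate, the per-cycle view of requests, and the reading that the accelerator proceeds while WAIT is asserted are assumptions, not details taken from the description.

```python
def arbitrate(external_request, accelerator_request, stall_accelerator=True):
    """Return (granted, stall, wait) for one access attempt.

    stall_accelerator=True models the STALL embodiment (the accelerator waits);
    stall_accelerator=False models the WAIT embodiment (the external device waits).
    """
    if external_request and accelerator_request:
        if stall_accelerator:
            return ("external", True, False)    # STALL asserted to accelerator 108
        return ("accelerator", False, True)     # WAIT asserted to external device 300
    if external_request:
        return ("external", False, False)
    if accelerator_request:
        return ("accelerator", False, False)
    return (None, False, False)                 # no access requested
```

In this sketch neither STALL nor WAIT is asserted unless both requesters contend for the embedded memory at the same time, matching the conflict condition described above.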
The embedded memory subsystems 114-115 can be large enough to allow for the graphics memories as well as additional memory for use by external devices for other functions. In such a case, the embedded memory subsystems can be logically partitioned to accommodate external devices as well as devices such as graphics accelerators to operate concurrently. In one embodiment, one of the logical partitions is used by graphics accelerator 108, and the second logical partition is used by external devices 300. The second logical partition of embedded memory acts as a standard memory device when accessed via the memory interface 104. Such an embodiment allows a cell phone user to suspend any game operation when answering a call and resume once the call is terminated. Alternatively, both the call and game can run concurrently. The call can be a person calling to talk or another device over a wireless network to interactively share the playing of a game.
For a graphics frame buffer serving a VGA display of 640×480 pixels, with each pixel being 2 bytes, the memory for double frame buffering is 1228800 bytes. The Z-buffer for a 2-byte Z range is 614400 bytes. The total frame and Z memory required is 1843200 bytes. For two textures with a base size of 640×480, a texel depth of 2 bytes and associated mip-maps, the texture memory size is 1638400 bytes. The total embedded memory in one embodiment is 64 Mbits (8 MBytes), constructed as 4 banks of 16 Mbits (2 MBytes), wherein each bank is constructed with a multi-bank architecture. Each 2-MByte bank is addressed with a 20-bit address to access 2^20 (1048576) locations on a 16-bit data bus, for a total of 2097152 bytes per bank.
In this embodiment, the first bank is large enough to store the frame and Z buffer. The second bank is large enough to store the textures. The extra storage space left in the first and second banks is utilized for other functions such as stencil planes for the graphics and/or a display list for the graphics. The third and fourth banks are used by external devices for other uses.
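The buffer sizes quoted above can be checked with a short calculation. The variable names are illustrative, and the 4/3 mip-map factor is an assumption: it is the usual approximation for a full mip chain and happens to reproduce the stated texture figure exactly.

```python
WIDTH, HEIGHT = 640, 480          # VGA resolution
BYTES_PER_PIXEL = 2               # pixel, Z value and texel depth are all 2 bytes
BANK_BYTES = 2 * 1024 * 1024      # one 16-Mbit bank

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
double_frame_bytes = 2 * frame_bytes                # two frame buffers
z_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL          # one Z-buffer
frame_z_bytes = double_frame_bytes + z_bytes

# Assumption: a mip chain adds one third to the base texture size.
texture_with_mips = frame_bytes * 4 // 3
texture_bytes = 2 * texture_with_mips               # two textures

spare_bank1 = BANK_BYTES - frame_z_bytes            # left for stencil planes, etc.
spare_bank2 = BANK_BYTES - texture_bytes            # left for display lists, etc.

print(double_frame_bytes, z_bytes, frame_z_bytes, texture_bytes)
print(spare_bank1, spare_bank2)
```

Both totals fit within a single 2-MByte bank, leaving 253952 bytes spare in the first bank and 458752 bytes spare in the second under these assumptions.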
From an external device perspective, the individual banks are addressed using the bank address signals BA[1:0]. To access the third and fourth banks when the graphics accelerator 108 is operational, BA[1:0] are given values of “10” and “11” when asserting the external command signals (EXCMD) for an MDDR protocol. An external device may provide bank address signals BA[1:0] having values of “00” or “01” to access the frame buffer and texture memory or the display lists, by coordinating or synchronizing with graphics accelerator 108. When the graphics accelerator 108 is not operating, the memory device 100 functions as a standard memory device wherein all four banks of memory are accessed as desired by external devices by asserting appropriate access commands and addresses with an MDDR protocol. In this case, the four banks are accessed by asserting appropriate bank address signals BA[1:0] with values of “00”, “01”, “10” or “11” to access the first, second, third and fourth banks, respectively. Additionally, row and column addresses are asserted at the external address pins (EXADR), where in one embodiment the row address range is A[11:0] and the column address range is A[7:0].
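The bank, row and column fields described above can be illustrated by splitting a linear word address. This is a hypothetical sketch: the bank:row:column bit ordering and the names split_address and bank_role are assumptions, since the description only gives the field widths and the bank assignments.

```python
BANK_BITS = 2    # BA[1:0]
ROW_BITS = 12    # row address A[11:0]
COL_BITS = 8     # column address A[7:0]


def split_address(word_addr):
    """Split a linear word address into (BA, row, column), assuming bank:row:column order."""
    if not 0 <= word_addr < (1 << (BANK_BITS + ROW_BITS + COL_BITS)):
        raise ValueError("address outside the four-bank range")
    col = word_addr & ((1 << COL_BITS) - 1)
    row = (word_addr >> COL_BITS) & ((1 << ROW_BITS) - 1)
    ba = word_addr >> (ROW_BITS + COL_BITS)
    return ba, row, col


def bank_role(ba, accelerator_active):
    """Describe how a bank is used, following the embodiment in the text."""
    if not accelerator_active:
        return "external"                 # all four banks act as standard memory
    return {0: "frame/Z", 1: "texture", 2: "external", 3: "external"}[ba]
```

The 12 row bits and 8 column bits together give the 20-bit address within each bank, and the two bank bits select among the four banks.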
In the described embodiment, embedded memory blocks 210 and 220 of embedded memory subsystems 114 and 115 each have a synchronous interface requiring dedicated clock, address and data signals.
As shown in
As shown in
The commands are the same as in the incorporated references. The embedded memory subsystems include multiple banks of memory, each bank having multiple sub-banks of memory.
Data strobes (not shown), which indicate the output of valid data in accordance with the standard MDDR protocol, are also asserted when outputting data on the external data bus EXDQ.
The present invention is not limited to one on-chip accelerator such as graphics accelerator 108. It would be apparent to those skilled in the art how to include multiple accelerators. The address and data multiplexers would have to be expanded to accommodate multiple accelerators. Although the invention has been described in connection with several embodiments, it is understood that this invention is not limited to the embodiments disclosed, but is capable of various modifications, which would be apparent to a person skilled in the art. Accordingly, the present invention is limited only by the following claims.