Embodiments of the invention relate to a memory controller and to computer systems including the memory controller. More particularly, embodiments of the invention relate to a memory controller configured to perform memory block initialization and copy functions and to methods for performing memory block initialization and copy functions with reduced bus traffic.
A conventional computer system typically includes one or more memory modules for storing software applications and program data, and a memory controller that controls access to the memory module under the direction of a microprocessor. Conventional memory modules are typically powered up and initialized based on a predefined sequence of commands in order to operate properly (e.g., during a “boot” operation). Failure to follow the required procedures for power up and initialization may result in undefined operation.
Aside from boot or power-up initialization operations, memory initialization may be performed during normal system operation to reserve or allocate memory to one or more software programs or applications (e.g., a real-time streaming video application) being executed or scheduled to be executed by the computer system. These types of memory initializations are typically micro-managed by the processor. For example, an initialization program may be executed by the processor via a loop, with each iteration of the loop generating initialization commands which are sent to the memory controller instructing the memory controller to initialize one or more designated memory addresses. The initialization commands sent by the processor to the memory controller include memory writes instructing the memory controller to set the designated memory blocks to a given initialization value or logic level (e.g., a higher logic level or logic “1”, a lower logic level or logic “0”, etc.).
For example, referring to
Accordingly, as shown in the programming logic above, in order to initialize data at 10,000 memory addresses, the processor 12 executes a for-loop with 10,000 iterations, with each iteration generating an instruction (e.g., a write command), which is sent to the memory controller 16 via bus 14, for initializing data at one particular memory location or address in memory device(s) 18. Each generated instruction or write command includes one memory address and one initialization value. Accordingly, in the above example, the processor 12 sends 10,000 memory addresses and 10,000 initialization values to the memory controller 16, which executes the initialization instructions. Further, while the programming logic provided above increments the parameter Address by 1 for each iteration of the for-loop, other conventional implementations may increment the parameter Address by a value other than 1 (e.g., a power of 2).
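The cost of the conventional approach described above can be illustrated with a minimal sketch. The `CountingBus` class and the `conventional_init` function are hypothetical names introduced only for illustration; the sketch assumes that each write command carries exactly one address and one initialization value, as stated above, and simply counts the resulting bus transactions.

```python
class CountingBus:
    """Hypothetical model of a bus in front of a memory; counts write commands."""

    def __init__(self, memory_size):
        self.memory = [None] * memory_size
        self.transactions = 0

    def write(self, address, value):
        # Each write command carries one address and one initialization value.
        self.transactions += 1
        self.memory[address] = value


def conventional_init(bus, start, count, value, stride=1):
    """Processor-driven initialization: one bus write per memory address."""
    address = start
    for _ in range(count):
        bus.write(address, value)
        address += stride  # conventional loops may step by a value other than 1


bus = CountingBus(memory_size=10_000)
conventional_init(bus, start=0, count=10_000, value=0)
```

Under these assumptions, the transaction count grows linearly with the number of addresses initialized, which is the bus traffic that the embodiments described below aim to reduce.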
As shown above with respect to the example of conventional initialization programming logic, customized initializations of memory blocks for software applications may include a number of processor-executed write commands, which may consume valuable system resources (e.g., bus bandwidth, processor power, etc.). Further, the processor 12 must wait until after the initialization operation before issuing memory commands (e.g., read commands, write commands, etc.) for the target memory addresses, which may further delay the computer system 10. For example, the processor 12 may monitor the initialization operation in order to wait until the initialization operation is complete before issuing memory commands for the initialized memory addresses in memory 18.
Embodiments of the invention relate to a memory controller configured to perform memory block initialization and copy functions and to methods for performing memory block initialization and copy functions with reduced bus traffic.
Accordingly, an embodiment of the invention can include a memory controller comprising: logic configured to receive a start address of a memory; logic configured to receive an end address of the memory or a length; logic configured to receive a fill value; and logic configured to write the fill value to the memory in a fill range of arbitrary length defined by the start address and end address or length.
Another embodiment of the invention can include a method for initializing or copying data in a memory, performed at a memory controller, the method comprising: receiving a start address of a memory; receiving an end address of the memory or a length; receiving a fill value; and writing the fill value to the memory in a fill range of arbitrary length defined by the start address and end address or length.
Another embodiment of the invention can include a computer system, comprising: a processor configured to send one of a memory initialization instruction or a memory copy instruction including an arbitrary range of memory addresses to initialize or copy; and a memory controller coupled to the processor, wherein the memory controller is configured to receive the memory initialization instruction or memory copy instruction from the processor, and is configured to initialize or copy the range of memory addresses in accordance with the received instruction.
Another embodiment of the invention can include a method for memory initialization performed at a memory controller comprising: receiving a memory initialization command including a start address, an end address and an initialization value; setting a current address to the start address; writing the initialization value to the memory at the current address; incrementing the current address; and repeating the writing and incrementing, if the current address is not greater than the end address.
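The steps of this method can be sketched as a simple loop. This is an illustrative model only: the function name `controller_mem_init` is hypothetical, the memory is modeled as a Python list, and an address increment of 1 is assumed.

```python
def controller_mem_init(memory, start, end, value):
    """Sketch of the controller-side initialization loop (names are hypothetical)."""
    current = start                 # set the current address to the start address
    while True:
        memory[current] = value     # write the initialization value
        current += 1                # increment the current address (step of 1 assumed)
        if current > end:           # repeat only while current is not greater than end
            break
    return memory


memory = controller_mem_init([1] * 8, start=2, end=5, value=0)
```

The end address is inclusive in this sketch, since the loop repeats while the current address is not greater than the end address.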
Another embodiment of the invention can include a method for copying memory performed at a memory controller comprising: receiving a memory copy command including a source address, a destination address and a copy count; copying data from the source address to the destination address; incrementing the source address and the destination address; incrementing a current count; and repeating the copying and incrementing, if the current count is not greater than the copy count.
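The copy method can be sketched in the same way. The function name `controller_mem_copy` is hypothetical, and the sketch assumes that the current count starts at 1, which the text does not state; with that assumption, exactly `copy_count` locations are copied.

```python
def controller_mem_copy(memory, source, destination, copy_count):
    """Sketch of the controller-side copy loop (names are hypothetical)."""
    current_count = 1               # assumption: the count starts at 1
    while True:
        memory[destination] = memory[source]   # copy one location
        source += 1                 # increment the source address
        destination += 1            # increment the destination address
        current_count += 1          # increment the current count
        if current_count > copy_count:
            break                   # stop once the count exceeds the copy count
    return memory


copied = controller_mem_copy(list(range(10)), source=5, destination=0, copy_count=3)
```

A different starting value for the count would change the number of locations copied by one, so an actual implementation would need to fix that convention explicitly.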
The accompanying drawings are included to provide a further understanding of the invention, and are incorporated in and constitute a part of this specification. The drawings illustrate embodiments of the invention and, together with the description, serve to explain principles of the invention.
Aspects of the invention are disclosed in the following description and related drawings directed to specific embodiments of the invention. Alternate embodiments may be devised without departing from the scope of the invention. Additionally, well-known elements of the invention will not be described in detail or will be omitted so as not to obscure the relevant details of the invention.
The words “exemplary” and/or “example” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment described herein as “exemplary” and/or “example” is not necessarily to be construed as preferred or advantageous over other embodiments. Likewise, the term “embodiments of the invention” does not require that all embodiments of the invention include the discussed feature, advantage or mode of operation.
Further, many embodiments are described in terms of sequences of actions to be performed by, for example, elements of a computing device. It will be recognized that various actions described herein can be performed by specific circuits (e.g., application specific integrated circuits (ASICs)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of computer readable storage medium having stored therein a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects of the invention may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the embodiments described herein, the corresponding form of any such embodiments may be described herein as, for example, “logic configured to” perform the described action.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of embodiments of the invention. Also, as used herein, the term “and/or” includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of embodiments of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “includes” and/or “including”, when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
In order to better understand embodiments of the invention, an example computing system will be described, followed by an example of a memory initialization process performed within the example computing system.
In the embodiment of
Further, memory 108 may be representative of any well-known type of memory. For example, memory 108 may include one or more of a Single Inline Memory Module (SIMM), a Dual Inline Memory Module (DIMM), flash memory (e.g., NAND flash memory, NOR flash memory, etc.), random access memory (RAM) such as static RAM (SRAM), magnetic RAM (MRAM), dynamic RAM (DRAM), and electrically erasable programmable read-only memory (EEPROM).
At the heart of complicated modern digital system designs is an interconnect that routes transfer requests on a bus (e.g., 104/204). It will be appreciated that the interconnect is logic that routes transaction requests and write data from masters to slaves and read data and write responses from slaves to masters (e.g., to/from the (master) sending device 202 to (slave) receiving device 206). For example, the interconnect (which can be part of 102/202) can be used in a system with multiple masters (e.g., multiple processors) and/or multiple memory controllers. For example, the bus 104 moves information amongst the various processing functions resident in the system 100. The bus structure can include independent and separate address, read, and write buses. These connections allow for communication of transfer addresses from the sending device to the receiving device, the communication of read data from the receiving device to the sending device, and the communication of write data from sending to receiving device.
As illustrated in
For example, using the illustrated two-channel bus structure 204, the sending device 202 may initiate a read or write transfer, or any combination thereof, by broadcasting the address, or addresses, on the transmit channel 208 during an address tenure. In the case of a read transfer request this is the only information that needs to be broadcast on the transmit channel 208. The receiving device 206 acknowledges this broadcast and subsequently provides the requested data by broadcasting the read data on the read data channel 210. In the case of a write transfer request the master (e.g., sending device 202) can subsequently follow the broadcast of the address on the transmit channel by broadcasting the write data to the receiving device 206 in a write tenure via the transmit channel 208. In the case of a memory initialization command both the “start” address and “end” (e.g., either an end address or length) for the memory initialization are broadcast simultaneously to the receiving device 206 (e.g., memory controller) during an address tenure. Further, an initialization value may be subsequently transmitted in a data tenure as an optional part of the memory initialization bus command according to embodiments of the invention.
The transmit channel may include control/signaling bits for indicating the type of data being broadcast (e.g., write address, read address, data) as part of the bits transmitted on transmit channel 208. Alternatively, a secondary signaling/control connection/bus (not shown) may be provided so the entire width (e.g., 64 bits) of the transmitting channel is available to send address/data information. Likewise, a secondary signaling/control connection/bus (not shown) may be provided from the receiving device to the sending device for control/signaling information. Details regarding secondary connections in the two-channel bus structure 204 can be found in the aforementioned U.S. patent application Ser. No. 10/833,716, so additional details will not be discussed further herein.
The sending device 202 may have control of the transmit channel 208 and may broadcast one or more transfer addresses prior to, during, or after an active write data tenure. Also the transmit channel 208 and the read data channel 210 may be independent. Accordingly, the broadcasting of address and write data by the sending device 202 may coincide with the broadcasting of read data by the receiving device 206 back to the sending device 202, which produces a very compact and efficient bus structure 204.
Another aspect of the two-channel bus 204 is the capability to facilitate the pipelining of multiple data transfer requests in a single broadcast cycle from the sending device 202 to the receiving device 206. The broadcasting of multiple addresses at once increases performance of the bus 204. For example, by presenting a single bus request with the “start” and “end” addresses for a desired memory initialization operation in a single broadcast cycle, the command can be presented in a very efficient fashion.
In one embodiment of the invention, the transmit channel 208 and the read data channel 210 are 64-bits wide each. The transfer addresses presented to the receiving device 206 are 32-bits wide. This allows the sending device 202 to provide two transfer addresses, e.g., address A and address B, on the transmit channel 208 during a single broadcast cycle. In this case a broadcast cycle can be defined as one clock cycle.
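To make the arithmetic concrete, the following sketch packs two 32-bit transfer addresses into one 64-bit word, as might be broadcast in a single cycle. The function names and the bit placement (address A in the upper half, address B in the lower half) are assumptions made for illustration; the text does not specify the layout.

```python
ADDRESS_BITS = 32
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1


def pack_addresses(address_a, address_b):
    """Pack two 32-bit addresses into one 64-bit word (layout is an assumption)."""
    for address in (address_a, address_b):
        if not 0 <= address <= ADDRESS_MASK:
            raise ValueError("address does not fit in 32 bits")
    return (address_a << ADDRESS_BITS) | address_b


def unpack_addresses(word):
    """Recover (address_a, address_b) from a 64-bit word."""
    return (word >> ADDRESS_BITS) & ADDRESS_MASK, word & ADDRESS_MASK


# Example: a "start" and an "end" address carried in a single 64-bit broadcast.
word = pack_addresses(0x1000, 0x1FFF)
start, end = unpack_addresses(word)
```

For a memory initialization command, address A and address B would correspond to the "start" and "end" addresses, so both fit in one broadcast cycle on a 64-bit transmit channel.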
For example, as illustrated in
In another embodiment as illustrated in
In yet another embodiment as illustrated in
As illustrated above, this special bus command of the two-channel bus structure provides an efficient way of implementing the memory initialization function. For example, efficiencies include but are not limited to: a single bus command/transaction to create the memory initialization; the interconnect designates which memory controller to steer the command to based on the “start” address; and selectable formats with and without an “initialization” value. Also, as noted above, the special bus command can define an efficient memory copy function. Further, the end addresses may be substituted with lengths (e.g., bytes of memory) to be initialized or copied.
Embodiments of the invention place the functionality of memory initialization in the memory controller, which saves both time and energy. Time is saved since the processor can continue on with processing instead of looping over addresses (e.g., compare N processes performed in
Generally, as will be described in greater detail below with respect to FIGS. 3 and 4A-B, the processor 102 may issue initialization instructions to the memory controller 106 which instruct the memory controller 106 to initialize a plurality of memory addresses (e.g., memory addresses positioned within a designated memory address range).
In the embodiment of
In 305, the processor 102 generates initialization instructions based on the number of memory addresses determined for initialization from 300. The initialization instructions can include a range of memory addresses to be initialized (e.g., a start and end address as discussed in relation to
In an example, assume that 10,000 memory addresses are determined to be initialized in 300, and that the initialization value is “0”. With these assumptions, in an example, the initialization instructions may be represented with programming logic as follows:
As shown in the example programming logic above, the initialization instructions may be relatively simple as well as relatively short. As will be appreciated, the above example programming logic instructs the memory controller 106 to initialize memory addresses [0000] through [9999] with logic “0”, per the assumptions above. However, it is understood that other examples of programming logic need not initialize the particular addresses given above in order to initialize 10,000 memory addresses, but rather may designate any memory address range available within the memory 108. In an example, only two addresses (e.g., [0000] and [9999]) need be sent from the processor 102 to the memory controller 106 in support of the initialization operation. In another example, a single address (e.g., [0000]) and an offset value or length (e.g., 10000) may be sent in place of two separate memory addresses from the processor 102 to the memory controller 106 in support of the initialization operation.
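The two equivalent ways of designating the range (two addresses, or one address and a length) can be sketched as follows. The `MemInitCommand` class and its field names are hypothetical and serve only to illustrate that either encoding describes the same 10,000 addresses with a single command.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MemInitCommand:
    """Hypothetical encoding of a single initialization command."""
    start: int
    end: Optional[int] = None      # inclusive end address, or
    length: Optional[int] = None   # number of addresses to initialize
    value: int = 0                 # initialization value

    def fill_range(self):
        # Exactly one of end or length must be given.
        if (self.end is None) == (self.length is None):
            raise ValueError("specify exactly one of end or length")
        last = self.end if self.end is not None else self.start + self.length - 1
        return range(self.start, last + 1)


by_end = MemInitCommand(start=0, end=9999, value=0)
by_length = MemInitCommand(start=0, length=10000, value=0)
```

In either form the processor sends a fixed, small amount of information regardless of how many addresses are to be initialized.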
In yet another example, the initialization instructions may instruct the memory controller 106 to initialize a first set of memory addresses with data copied from other memory addresses (see, e.g.,
As shown in the example 2 programming logic above, the memory controller 106 is instructed to initialize memory addresses [0000] through [9999] with data at memory addresses [10000] to [19999]. Again, each of the two respective memory address ranges in this example may either be represented as a set of two memory addresses, or alternatively a single memory address and an offset value or length. The initialization instruction example 2 is slightly more complex from the standpoint of the memory controller 106, as the memory controller 106 reads the data at the memory addresses to be copied as well as writes the read data to the corresponding target memory address.
Returning to the embodiment of
Optionally, a memory copy can be detected using the method 400 illustrated, if the received command is for a memory copy. In decision block 406, a memory copy instruction is detected. If the instruction is for a memory copy, the method can further include receiving a first read address of the memory to copy, in block 412. A read range of the memory is read beginning at the first read address that corresponds to the fill range, in block 414. For example, the read range may be established by the first read address and an ending read address (see, e.g.,
As illustrated, if the end address is reached in block 418, the method ends. If not, in block 420, the start address and the first read address can be updated to the next address to be written and read, respectively. Alternatively, a pointer to the current read and write addresses can be updated and the original start and first read addresses can be maintained. The process can then continue until the memory is copied or initialized over the designated range.
In an alternative example, while not expressly shown in
In another alternative example, if the initialization of block 400 is executed with programming logic similar to the initialization instruction example 2 (e.g., a memory copy type operation which copies memory at a given memory device from one portion to another portion), block 452 may not return the initialization value, but rather may advance directly to block 454. For example, in a memory copy-type operation, the initialization value is not necessarily known at the memory controller 106, but instead may require a read operation of one or more memory addresses in order to ascertain. Accordingly, based on system design preferences, the initialization of block 400 need not be interrupted in order to respond with the read data in this scenario. For example, a read command for a memory address in the target or destination portion of a memory copy operation may either be ignored until it can be serviced, or alternatively may be added to a read command queue which is performed after the initialization 400 completes or after the memory address associated with the read command has been initialized or copied (e.g., based on a pointer indicating a current memory address position of the initialization process).
In block 454, while the initialization process of block 400 is being performed, the memory controller determines whether any subsequent write commands have been received from the processor 102. If the memory controller 106 determines that one or more write commands have been received from the processor 102, the process advances to block 456; otherwise, the process advances to block 458. In block 456, the memory controller 106 adds the one or more write commands to a write buffer/queue. In an example, the write queue may be stored locally at the memory controller 106. In another example, the write command queue may be configured to store up to a threshold number of write commands. In this example, any write commands received in addition to the threshold number (e.g., write commands received after the write queue is full) may be ignored until they can be serviced (i.e., not stored at the memory controller 106) and/or may not be acknowledged (which would allow the memory controller to apply back pressure to the bus/processor).
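The bounded write queue described above can be sketched as follows. The `BoundedWriteQueue` class is a hypothetical model: a command offered while the queue is full is not stored and not acknowledged, which is one way to model back pressure. Draining the queue into memory once initialization has finished is an assumption made for the purposes of the sketch.

```python
from collections import deque


class BoundedWriteQueue:
    """Hypothetical write queue holding up to a threshold number of commands."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.pending = deque()

    def offer(self, address, value):
        """Return True if the write is queued (acknowledged), False otherwise."""
        if len(self.pending) >= self.threshold:
            return False            # queue full: not stored, not acknowledged
        self.pending.append((address, value))
        return True

    def drain(self, memory):
        """Assumption: apply queued writes in arrival order after initialization."""
        while self.pending:
            address, value = self.pending.popleft()
            memory[address] = value


queue = BoundedWriteQueue(threshold=2)
acks = [queue.offer(0, 10), queue.offer(1, 11), queue.offer(2, 12)]
memory = [0, 0, 0]
queue.drain(memory)
```

In this sketch the third write is refused, so the sender would have to present it again later; the first two are applied in order once the queue is drained.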
In block 458, the memory controller 106 determines whether the initialization operation of block 400 is complete (e.g., whether all or a portion containing the designated memory addresses from the initialization instructions has been initialized). If the memory controller determines that the initialization operation of block 400 is complete, the process of
Another embodiment of the invention is illustrated in
It will be appreciated that the embodiments of the invention are not limited to the examples provided in the foregoing. For example, the mem_copy instruction may include a source start and end address and a destination start and end address, instead of a copy count as described above. Likewise, the mem_init instruction may contain a start address and length, instead of a start and end address. However, regardless of the specific format for communicating the instructions, each embodiment substantially reduces the bus bandwidth and processor power used by reducing the number of processor initiated transactions that are communicated to the memory controller for a given function (e.g., mem_init or mem_copy).
For example, in one embodiment, the initialization logic 140 can include logic configured to receive a start address of a memory, logic configured to receive an end address of the memory or a length, logic configured to receive a fill value, and logic configured to write the fill value to the memory in a fill range of arbitrary length defined by the start address and end address or length. The initialization logic 140 can be configured to use a transfer queue 150 of memory controller 106 that is also shared with other masters (e.g., DSP, CPU, etc.). The fill data can be placed on the transfer queue 150 by initialization logic 140 using established protocols for the memory controller 106. Accordingly, the initialization logic 140 can be more easily integrated into existing memory controller designs and work cooperatively with other masters in the system.
For example, as illustrated in
Additionally, since the fill data/value will be the same, a fill register 142 can be included in the memory controller 106 to queue the fill values prior to placing them in transfer queue 150 for writing to the memory 108. One advantage of having a separate fill register 142 is it prevents writing redundant data to the write buffer 152, which would limit the spaces in the write buffer 152 available for the other masters (e.g., DSP, CPU).
As discussed above, the memory controller 106 is configured to process read/write instructions from one or more master devices that communicate with the memory controller over a bus 104 (e.g., AXI bus, two-channel bus, etc.). However, at least one of the start address, end address or length, or fill value can be communicated to the memory controller over an alternative bus 132 (e.g., a configuration bus). Also, as discussed in relation to
In another aspect of the invention, the initialization logic can include logic configured to detect a read request for a memory address within the fill range and logic configured to return the fill value prior to actually writing the fill value to the memory address specified in the read request. Accordingly, reads can be serviced even before the fill value is written to the memory which will improve the responsiveness of the system. Likewise, another aspect of the invention can include logic configured to update the start address to reflect the last address value written to make memory addresses initialized with the fill value available for access. Accordingly, by updating the start address to the next address to be written any memory address already initialized can be available for use by the rest of the system.
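A minimal sketch of this behavior follows. The `FillTracker` class and its method names are hypothetical; the sketch models the updated start address as `next_address`, returns the fill value for any read inside the not-yet-written part of the fill range, and reads memory directly for every other address.

```python
class FillTracker:
    """Hypothetical model of read servicing during an in-progress fill."""

    def __init__(self, memory, start, end, fill_value):
        self.memory = memory
        self.next_address = start   # updated start address: next address to write
        self.end = end              # inclusive end of the fill range
        self.fill_value = fill_value

    def step(self):
        """Write one address of the fill range; return False when finished."""
        if self.next_address > self.end:
            return False
        self.memory[self.next_address] = self.fill_value
        self.next_address += 1
        return True

    def read(self, address):
        # Pending part of the fill range: answer with the fill value
        # before it has actually been written to memory.
        if self.next_address <= address <= self.end:
            return self.fill_value
        # Already initialized or outside the range: normal memory read.
        return self.memory[address]


memory = [7] * 6
tracker = FillTracker(memory, start=1, end=4, fill_value=0)
tracker.step()                      # address 1 is now written
early = tracker.read(3)             # address 3 not yet written in memory
```

Because `next_address` advances with each write, addresses below it are already initialized and can be accessed normally while the remainder of the range is still being filled.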
In another embodiment of the invention, the initialization logic 140 or memory controller 106 can further include logic configured to receive a first read address of the memory to copy, logic configured to read a read range of the memory beginning at the first read address, wherein a length of the read range corresponds to a length of the fill range; and logic configured to update the fill value based on a value read from each address of the read range prior to writing a corresponding address of the fill range with the fill value. Accordingly, the memory controller 106 can perform a local copy (see, e.g., Instruction Example 2) using the initialization logic 140. In this case, instead of a fixed value for the fill value, the fill value can be updated from the data read from the memory locations defined in the copy command and the updated values can be used to initialize the memory space. Accordingly, the initialization logic 140 can place a read request for the first read memory address on the transfer queue 150, capture the read data from memory (e.g., 108) and then place a write command on the transfer queue 150 to write the read value to the start address. The read and write addresses can be updated as the process continues until the end address is reached for both the read range and the fill range. The values read from the read range can be stored in the write buffer 152 or another buffer associated with initialization logic 140 until they are placed on the transfer queue for writing back to the memory. Accordingly, this aspect of the invention can also reduce the traffic on the bus (e.g., 104 or 204/208) and improve memory copy performance as the process is performed locally at the memory controller 106.
As described above the initialization logic may perform both the memory initialization and memory copy instructions, based on the type of instruction received. However, embodiments of the invention can also include individualized logic for each operation, which may be realized as separate state machines for each function. Likewise, embodiments of the invention are not limited to the illustrated configuration of buffers, registers, etc., as these may be shared or separated as desired by the system designer.
It will be appreciated that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, and symbols that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the invention.
The various illustrative logical blocks, modules, and circuits described in connection with the embodiments disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration.
The methods, sequences and/or algorithms described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC. The ASIC may reside in a user terminal (e.g., access terminal). In the alternative, the processor and the storage medium may reside as discrete components in a user terminal.
Embodiments of the invention being thus described, it will be appreciated that the same may be varied in many ways. For example, while the computing system 100 of
While the foregoing disclosure shows illustrative embodiments of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the embodiments of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.