Embodiments of the present invention relate to hardware for supporting source synchronous memory standards. More specifically, embodiments of the present invention relate to a method and apparatus for performing memory interface calibration on integrated circuits such as field programmable gate arrays (FPGAs).
Source synchronous memory standards are used to enable high-speed data transfer between processing devices and memory systems. Some of these standards include reduced latency dynamic random access memory (RLDRAM), quad data rate (QDR), and double data rate (DDR).
Memory interfaces for memory systems compliant with source synchronous memory standards perform double data rate transfers where data is transferred on both the rising and falling edges of the clock. When there is no setup or hold time requirement for the data, a data valid window of a half cycle is available. However, when setup and hold time requirements are present, the data valid window can be much smaller. The presence of board layout, process, voltage, and temperature variations further reduces the size of the data valid window. Consequently, the slightest amount of skew on the data lines could likely result in incorrect data transfers.
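The shrinking of the data valid window described above can be illustrated with a rough timing budget. The following is a minimal sketch only: the subtractive model, the function name `data_valid_window_ps`, and all numeric values are assumptions made for illustration and are not taken from any memory standard.

```python
def data_valid_window_ps(clock_period_ps, setup_ps=0.0, hold_ps=0.0, skew_ps=0.0):
    """Estimate the usable data valid window for a double data rate transfer.

    Assumed model: data changes on both clock edges, so the ideal window is
    half a clock cycle; setup time, hold time, and skew each subtract from it.
    """
    half_cycle = clock_period_ps / 2.0
    window = half_cycle - setup_ps - hold_ps - skew_ps
    return max(window, 0.0)  # a negative budget means no usable window


# Hypothetical 400 MHz clock (2500 ps period).
ideal = data_valid_window_ps(2500.0)
realistic = data_valid_window_ps(2500.0, setup_ps=200.0, hold_ps=200.0, skew_ps=300.0)
print(ideal, realistic)  # 1250.0 550.0
```

Under these assumed numbers, less than half of the ideal window remains, which is why a small amount of additional skew can cause incorrect transfers.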
Memory interfaces implemented on integrated circuits such as FPGAs are operable to perform low level data rate conversions and synchronization of clocks to allow components on the integrated circuits to communicate with a memory system. When designing an external memory interface, designers encounter the challenge of providing a design that supports multiple source synchronous memory standards without requiring a large number of changes. This challenge extends to designing memory interfaces capable of performing the calibration necessary for operating with a specific source synchronous memory standard.
According to an embodiment of the present invention, a universal memory interface includes a sequencer unit operable to calibrate at least one of a data delay and a strobe delay between the universal memory interface and a memory system. The calibration results in center aligning a data signal with a clock that strobes it and expands a valid window for sampling the data. The sequencer unit may implement a processor or a finite state machine to execute a calibration procedure. The sequencer includes additional components to implement lower level primitives associated with adjusting delay chains and implementing low-level read and write commands to the memory system. According to an aspect of the present invention where a processor is implemented to execute the calibration procedure, the calibration procedure is implemented using program code and the calibration procedure may be loaded onto the integrated circuit in response to identifying a type of the memory system.
The features and advantages of embodiments of the present invention are illustrated by way of example and are not intended to limit the scope of the embodiments of the present invention to the particular embodiments shown.
In the following description, for purposes of explanation, specific nomenclature is set forth to provide a thorough understanding of embodiments of the present invention. It will be apparent to one skilled in the art that specific details in the description may not be required to practice the embodiments of the present invention. In other instances, well-known circuits, devices, and components are shown in block diagram form to avoid obscuring embodiments of the present invention unnecessarily.
The universal memory interface 110 includes an external memory interface (EMI) 130, which handles low level data rate conversions and timing specifics of communicating with the memory system. According to an embodiment of the present invention, the external memory interface 130 provides a layer that converts a high speed double data rate interface provided by modern high-speed memory devices such as RLDRAM, QDR, and DDR into a single or half-rate interface suitable for use within the integrated circuit 100.
Referring back to
The sequencer unit 140 uses connections 141-145 to transmit control signals to the external memory interface 130 and selector 150. A control signal may be transmitted on connection 141 to program selector 150. Selector 150 is operable to control whether the memory controller 120 or the sequencer unit 140 has a direct connection to the external memory interface 130. During calibration, selector 150 provides the sequencer unit 140 with a direct connection to the external memory interface 130. During normal operation of the universal memory interface 110, the selector 150 provides the memory controller 120 with a direct connection to the external memory interface 130. A control signal may be transmitted on connection 142 to direct a phase locked loop in the external memory interface 130 to adjust a phase of a clock signal during calibration. Control signals transmitted on connections 143-145 may be used to change input and output delays by controlling delay chains associated with data (DQ), data mask (DM), and data strobe (DQS) signals. Control signals transmitted on connections 143-145 may also be used to set various latencies within the external memory interface 130 and to provide the external memory interface 130 with information as to when the memory initialization protocol has stopped running.
The sequencer unit 400 includes a plurality of managers 431-434 that implement timing, device, and memory protocol specific tasks. By equipping managers with the functionalities to handle lower level timing, memory protocol, and bit manipulation operations, the calibration procedure executed by the processor 410 may be focused on higher level functionalities specific to the memory system type. According to an embodiment of the present invention where the integrated circuit is an FPGA, the managers 431-434 are hardware components with functionalities specified in register transfer level (RTL) code and implemented by logic on the FPGA. The processor 410, memory 420, managers 431-434, and debug interface 440 are connected via a bus 450. It should be appreciated that the bus 450 may be implemented by a single bus or a plurality of buses and allows the components in the sequencer unit 400 to transmit data to one another.
The sequencer unit 400 includes a debug interface unit 440. The debug interface unit 440 is operable to provide a mechanism for interacting with the managers 431-434 and for tracking the progress of the calibration procedure. For example, the debug interface unit 440 may provide a user interface to examine data valid windows, data/strobe delay chain settings, and specified calibration stages. The debug interface 440 may be used as a debugging tool for the managers 431-434, the calibration procedure in memory 420, and the external memory interface. According to an embodiment of the present invention, the debug interface unit 440 provides an interface for a calibration procedure to be loaded into memory 420. The calibration procedure may be loaded into memory 420 in response to identifying a type of memory system that is connected to the integrated circuit. Alternatively, the calibration procedure may be loaded into memory 420 in response to there being a new or modified calibration procedure.
The scan chain (SCC) manager unit 431 is operable to set various delays and/or phase adjustments on the inputs, outputs, and strobes used to latch data on the integrated circuit on which the universal memory interface resides. The setting may be performed in response to the calibration procedure. According to an embodiment of the present invention, dynamic delay chains that are configurable at runtime are present on the input, output, and output enable paths of the universal memory interface. The scan chain manager unit 431 may access these chains to add delay on incoming and outgoing signals.
The read write (RW) manager unit 432 is operable to issue protocol specific low-level read and write commands to the memory system during calibration in response to the calibration procedure. The types of commands that are supported include write configuration, refresh, write guarantee, write/read burst, and write/read back-to-back.
According to an embodiment of the sequencer unit 400, the read write manager 432 includes a finite state machine, a global timer, and independent data paths for address/command, write data, and read data. In response to receiving a request to access the memory system, the finite state machine transmits an appropriate command to the memory system via the appropriate data path. The global timer is also set to run to inform the finite state machine of an appropriate period of time to transmit the command as required by the memory system. In one embodiment, the data paths are activated by the finite state machine during a period of time specified in operation code received from the calibration procedure.
According to an embodiment of the read write manager unit 432, a pattern register is implemented when writing to and reading from a memory system. When writing data to the memory system, write data may be constructed from a pattern register that specifies how data lines vary over a write burst. An inversion bit in the pattern register controls how data changes across bit lanes. Use of the pattern register allows the sequencer unit 400 to conserve use of memory 420.
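The way a pattern register can stand in for stored write data may be sketched as follows. The encoding used here is purely hypothetical: one pattern bit per beat of the burst, and an inversion flag that, when set, inverts the value on every odd-numbered bit lane. The function name `expand_pattern` and its parameters are illustrative and do not describe the actual register layout.

```python
def expand_pattern(pattern_bits, invert_alternate_lanes, num_lanes):
    """Expand a compact per-beat pattern into per-lane write data.

    pattern_bits: list of 0/1 values, one per beat of the write burst.
    invert_alternate_lanes: assumed meaning of the inversion bit; when True,
        odd-numbered lanes carry the inverse of the pattern bit.
    Returns a list of beats, each a list with one value per lane.
    """
    burst = []
    for bit in pattern_bits:
        beat = []
        for lane in range(num_lanes):
            flip = invert_alternate_lanes and lane % 2 == 1
            beat.append(bit ^ 1 if flip else bit)
        burst.append(beat)
    return burst


burst = expand_pattern([0, 1, 0, 1], invert_alternate_lanes=True, num_lanes=4)
for beat in burst:
    print(beat)
```

In this sketch, four pattern bits and one flag describe sixteen lane values, which reflects how a pattern register may conserve memory compared with storing the full burst.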
Referring back to
The PLL manager unit 434 is operable to provide access to the external memory interface's phase locked loop. PLLs are used to generate a number of clocks used by the external memory interface and memory controller. The PLL manager unit 434 provides an interface to change the phases of the clocks during calibration.
Functions of the sequencer unit 400 such as delay chain management, memory system management, external memory interface management, PLL management, and calibration procedure execution are assigned to components on the sequencer unit 400 in a modular fashion to allow for efficient design and implementation. Utilization of a processor 410 and memory 420 to control execution of the calibration instead of a finite state machine requires less space on an integrated circuit. By using the processor 410 and memory 420 to control execution of the calibration procedure, the calibration procedure may also be debugged, expanded, or modified without having to change other components in the sequencer unit 400. Thus, recompilation of a design for components implementing the sequencer unit 400 is not required when the calibration procedure is changed.
At 601, a memory system is initialized. Memory system initialization may involve asserting a reset signal, stopping a clock, loading registers in the memory system, and/or other procedures required for the memory system to be initialized. Memory system initialization configures the memory system to support requested burst lengths, read and write latencies, and other user requested memory parameters. According to an embodiment of the present invention, the calibration procedure prompts a read write manager and external memory interface manager unit on a sequencer unit to issue appropriate instructions to perform initialization.
At 602, the data strobe delay is set to zero. According to an embodiment of the present invention, the calibration procedure prompts the scan chain manager unit to set the data strobe delay to zero.
At 603, VFIFO calibration is performed. According to an embodiment of the present invention, VFIFO calibration involves identifying a cycle in which data is returned from the memory system. According to an embodiment of the present invention, the calibration procedure prompts the external memory interface manager unit to calibrate the VFIFO.
At 604, input path deskew is performed. According to an embodiment of the present invention, input path deskew involves determining a delay that is to be applied on the input path of a data signal so that the input data and input data clock (strobe) are aligned. A plurality of delay settings may be tested to identify reads that can be successfully completed. According to one exemplary embodiment, delay settings that result in successful reads of the input data may be saved and the midpoint of the range of settings may be applied to the input data path at the end of the calibration procedure. According to an embodiment of the present invention, the calibration procedure prompts the scan chain manager unit to adjust delay settings while the read write manager unit issues instructions to write and read test data.
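The sweep-and-midpoint approach described for input path deskew may be sketched as follows. This is a minimal illustration under stated assumptions: `find_passing_window`, `midpoint`, and `fake_read_passes` are hypothetical names, the delay chain is assumed to have 16 settings, and the stand-in test function replaces the write and read of test data that the managers would perform on real hardware. Choosing the longest contiguous run of passing settings is also an assumption.

```python
def find_passing_window(test_passes, num_settings):
    """Return (first, last) for the longest contiguous run of passing
    delay settings, or None if no setting passes."""
    best = None
    start = None
    # Iterate one step past the end so a run that reaches the last
    # setting is still closed and recorded.
    for setting in range(num_settings + 1):
        ok = setting < num_settings and test_passes(setting)
        if ok and start is None:
            start = setting
        elif not ok and start is not None:
            run = (start, setting - 1)
            if best is None or run[1] - run[0] > best[1] - best[0]:
                best = run
            start = None
    return best


def midpoint(window):
    """Delay setting at the center of a passing window."""
    first, last = window
    return (first + last) // 2


def fake_read_passes(setting):
    # Stand-in for a real read test: pretend settings 5..11 read correctly.
    return 5 <= setting <= 11


window = find_passing_window(fake_read_passes, 16)
chosen = midpoint(window)
print(window, chosen)  # (5, 11) 8
```

Applying the midpoint rather than the first passing setting leaves margin on both sides of the sampling point. The same sweep may be reused wherever a range of passing delay settings is searched.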
At 605, LIFO calibration is performed. According to an embodiment of the present invention, LIFO calibration involves lowering the read latency until the minimum latency that is still able to guarantee reliable reads is reached. According to an embodiment of the present invention, the calibration procedure prompts the external memory interface manager unit to calibrate the LIFO. The external memory interface manager unit may increase or decrease a latency value identified by the calibration procedure.
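The latency search at 605 may be sketched as a simple downward walk. The names `calibrate_read_latency` and `fake_reads_reliable`, the starting latency of 12, and the threshold of 6 are assumptions for illustration; on hardware the reliability check would be a read test issued through the managers.

```python
def calibrate_read_latency(reads_reliable, max_latency, min_latency=1):
    """Lower the read latency one step at a time and return the smallest
    value at which reads are still reliable, or None if even the starting
    latency fails."""
    latency = max_latency
    if not reads_reliable(latency):
        return None
    # Step down only while the next lower latency still reads correctly.
    while latency > min_latency and reads_reliable(latency - 1):
        latency -= 1
    return latency


def fake_reads_reliable(latency):
    # Stand-in for a hardware read test: pretend 6 cycles is the minimum.
    return latency >= 6


best_latency = calibrate_read_latency(fake_reads_reliable, max_latency=12)
print(best_latency)  # 6
```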
At 606, output path deskew is performed. According to an embodiment of the present invention, output path deskew involves determining a delay that is to be applied to the output path of a data signal so that the output data and output data clock (strobe) are aligned. A plurality of delay settings may be tested to identify writes that can be successfully completed. According to one exemplary embodiment, delay settings that result in successful writes of the output data may be saved and the midpoint of the range of settings may be applied to the output data path at the end of the calibration procedure. According to an embodiment of the present invention, the calibration procedure prompts the scan chain manager unit to adjust delay settings while the read write manager unit issues instructions to write and read test data.
At 607, it is determined whether the current pass through the calibration procedure is a first pass through the calibration procedure. If the current pass is the first pass, control proceeds to 608. If the current pass is not the first pass, control proceeds to 609.
At 608, the data strobe delay is adjusted to a non-zero value. According to an embodiment of the present invention, the non-zero value selected for adjustment may be based on the range of delay settings that resulted in successful reads of input data and writes of output data in order to maximize the use of the delay setting for a next pass of input path deskew and output path deskew. According to an embodiment of the present invention, the calibration procedure prompts the scan chain manager unit to set the data strobe delay to a value determined by the calibration procedure. Control returns to 603.
At 609, data mask deskew is performed. According to an embodiment of the present invention, data mask deskew involves determining a delay that is to be applied to a data mask signal so that the data mask signal and data mask clock (strobe) are aligned. Similar to the input path deskew and output path deskew procedures, according to one embodiment, a plurality of delay values for a data mask pin may be swept and the midpoint of the range of successful reads and writes may be applied at the end. According to an embodiment of the present invention, the calibration procedure prompts the scan chain manager unit to adjust delay settings while the read write manager unit issues instructions to write and read test data. According to an embodiment of the present invention, writes and reads are tested separately when performing data mask deskew.
At 610, control terminates the procedure.
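The control flow of 601-610, including the second pass taken after the data strobe delay is adjusted, may be sketched as follows. The `ops` object and all of its method names are hypothetical stand-ins for requests the calibration procedure would make to the manager units; `TraceOps` only records the order of calls, and the returned strobe delay of 3 is an arbitrary placeholder.

```python
def run_calibration(ops):
    """Drive the two-pass flow; `ops` is a hypothetical object exposing
    one method per flowchart step."""
    ops.initialize_memory()                 # 601
    ops.set_strobe_delay(0)                 # 602
    first_pass = True
    while True:
        ops.calibrate_vfifo()               # 603
        ops.deskew_input_path()             # 604
        ops.calibrate_lifo()                # 605
        ops.deskew_output_path()            # 606
        if not first_pass:                  # 607: second pass is finished
            break
        # 608: pick a non-zero strobe delay, then return to 603.
        ops.set_strobe_delay(ops.pick_strobe_delay())
        first_pass = False
    ops.deskew_data_mask()                  # 609
    # 610: the procedure terminates.


class TraceOps:
    """Records each requested step instead of touching hardware."""

    def __init__(self):
        self.trace = []

    def __getattr__(self, name):
        # Called only for attributes not found normally, i.e. step names.
        def record(*args):
            self.trace.append((name, args))
            return 3 if name == "pick_strobe_delay" else None
        return record


ops = TraceOps()
run_calibration(ops)
for step in ops.trace:
    print(step)
```

Running the sketch shows steps 603-606 executing twice, once with a zero strobe delay and once with the adjusted value, before data mask deskew is performed.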
According to an alternate embodiment of the sequencer unit illustrated in
At 802, the system is placed. According to an embodiment of the present invention, placement involves placing the mapped logical system design on the target device. Placement works on the technology-mapped netlist to produce a placement for each of the functional blocks. According to an embodiment of the present invention, placement includes fitting the system on the target device by determining which resources on the target device are to be used for the specific logic elements and other function blocks that were determined during synthesis to implement the system. Placement may include clustering, which involves grouping logic elements together to form the logic clusters present on the target device.
At 803, the placed design is routed. During routing, routing resources on the target device are allocated to provide interconnections between logic gates, logic elements, and other components on the target device.
At 804, an assembly procedure is performed. The assembly procedure involves creating a data file that includes information determined by the compilation procedure described by 801-803.
At 805, the target device is programmed. The data file created may be a bit stream that may be used to program the target device. By programming the target with the data file, components on the target device are physically transformed to implement the system.
At 806, a calibration procedure is identified for the universal memory interface. According to an embodiment of the present invention, the calibration procedure is identified in response to identifying a type of memory system that is to be coupled to or supported by the universal memory interface.
At 807, the sequencer unit is configured with the calibration procedure identified. According to an embodiment of the present invention, the calibration procedure is implemented in code and loaded onto a memory of the sequencer unit. A debug interface unit may provide an interface to upload the code onto the memory. The sequencer unit executes the calibration procedure by having a processor execute the code in the memory.
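Steps 806 and 807 may be sketched as a lookup followed by a load. Everything named here is hypothetical: the `CALIBRATION_IMAGES` table, the placeholder byte strings standing in for compiled calibration code, the `SequencerMemory` class, and the 64-byte memory size are illustrative assumptions rather than details of the embodiment.

```python
# Placeholder images; real entries would hold compiled calibration code.
CALIBRATION_IMAGES = {
    "DDR": b"ddr-calibration-code",
    "QDR": b"qdr-calibration-code",
    "RLDRAM": b"rldram-calibration-code",
}


class SequencerMemory:
    """Toy model of the sequencer unit's program memory."""

    def __init__(self, size):
        self.data = bytearray(size)
        self.loaded = 0

    def load(self, image):
        if len(image) > len(self.data):
            raise ValueError("calibration image does not fit in sequencer memory")
        self.data[:len(image)] = image
        self.loaded = len(image)


def configure_sequencer(memory_type, sequencer_memory):
    """Identify the procedure for the memory type (806) and load it (807)."""
    try:
        image = CALIBRATION_IMAGES[memory_type]
    except KeyError:
        raise ValueError(f"no calibration procedure for {memory_type!r}") from None
    sequencer_memory.load(image)
    return image


mem = SequencerMemory(64)
configure_sequencer("QDR", mem)
print(bytes(mem.data[:mem.loaded]))  # b'qdr-calibration-code'
```

Because only the contents of the memory change in this sketch, supporting a different memory type amounts to loading a different image, with no change to the other components.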
Embodiments of the present invention have been described with reference to performing a calibration procedure that center aligns a data signal with a clock that strobes it and expands a valid window for sampling data. It should be appreciated that other types of calibration procedures may also be performed using the embodiments disclosed. For example, calibration procedures that edge align a data signal with a clock that strobes it and other procedures may also be implemented using the sequencer unit and universal memory interface described.
It should be appreciated that embodiments of the present invention such as the procedures illustrated in
The device 900 includes memory blocks. The memory blocks may be, for example, dual port random access memory (RAM) blocks that provide dedicated true dual-port, simple dual-port, or single port memory up to various bits wide at up to various frequencies. The memory blocks may be grouped into columns across the device in between selected LABs or located individually or in pairs within the device 900. Columns of memory blocks are shown as 921-924.
The device 900 includes digital signal processing (DSP) blocks. The DSP blocks may be used to implement multipliers of various configurations with add or subtract features. The DSP blocks include shift registers, multipliers, adders, and accumulators. The DSP blocks may be grouped into columns across the device 900 and are shown as 931.
The device 900 includes an embedded processor 950. The embedded processor is operable to execute program instructions stored in a memory block. In an alternative embodiment of the device 900 where embedded processor 950 is not implemented, it should be appreciated that the programmable resources on the device 900 may be programmed to implement a processor operable to execute program instructions stored in a memory block.
The device 900 includes a plurality of input/output elements (IOEs) 940. Each IOE feeds an IO pin (not shown) on the device 900. The IOEs 940 are located at the end of LAB rows and columns around the periphery of the device 900. Each IOE may include a bidirectional IO buffer and a plurality of registers for registering input, output, and output-enable signals.
The device 900 may include routing resources such as LAB local interconnect lines, row interconnect lines (“H-type wires”), and column interconnect lines (“V-type wires”) (not shown) to route signals between components on the target device.
In the foregoing specification, embodiments of the invention have been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the embodiments of the invention. The specification and drawings are, accordingly, to be regarded in an illustrative rather than restrictive sense.
This application claims priority to provisional U.S. patent application Ser. No. 61/409,113 filed Nov. 1, 2010, entitled “Method and Apparatus for Performing Memory Interface Calibration”, and U.S. patent application Ser. No. 61/456,186 filed Nov. 2, 2010, entitled “Method and Apparatus for Performing Memory Interface Calibration”, the full and complete subject matter of which is hereby expressly incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61409113 | Nov 2010 | US
61456186 | Nov 2010 | US