Demands placed on computerized systems are continuously increasing. Such demands may test the limits of system capacities in terms of, for example, central processing unit (CPU) speed, memory capacity and/or memory and memory interfaces speed. Internal clock frequencies of microprocessors and/or CPUs have now crossed the GHz boundary. Moreover, multiple CPU systems are now common in the industry.
The above-described trends may affect many components of a computerized system, in particular memory chips. Timing margins of memory sub-systems, such as double data rate (DDR) interfaces, are shrinking quickly as double data rate 3 (DDR3) and its extensions push beyond 1600 megatransfers per second (MT/s) and double data rate 4 (DDR4) reaches 2400 MT/s.
One of the problems encountered when designing and implementing high speed interfaces is skew. In the context of data interface buses, skew may be described as the inconsistency of signal timing or phase across multiple lines of a communication and/or interface bus. Such inconsistency may result from differences in length, routing and/or resistance of wires comprising a communication and/or interface bus. Skew may generally be observed, for example, when data is communicated from a memory towards a memory controller hub (MCH), and/or within an MCH, for example, due to different circuitry comprising the paths of different wires. In the presence of skew, data integrity may be jeopardized.
Deskew, as known in the art, is the elimination or reduction of skew. Busses and/or interfaces typically employ byte lane deskew. Byte lane deskew assumes that skew within a byte is negligible and consequently employs skew counter-measures aimed at synchronizing bundles of eight (8) wires with other bundles of eight (8) wires, rather than synchronizing single wires with one another.
DQ lanes and DQ strobe (DQS) are typically matched within 100 mils, which translates into about 20 picoseconds (ps), a relatively small number. However, the total mismatch between the DQ lanes also includes the delay mismatch from the DQ pads to the latches. For the read path, the 6σ mismatch among eight DQ paths is estimated to be 7% to 10% of nominal path delay (about 1 unit interval (UI) for DDR3). So, the total delay mismatch, around 0.1*UI, becomes fairly large compared to the timing budget at high data rates.
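The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch only: the function names are hypothetical, the 0.2 ps per mil propagation figure is simply inferred from the statement that 100 mils corresponds to about 20 ps, and the 10% fraction is the upper end of the quoted 7% to 10% estimate.

```python
# Illustrative arithmetic only; names and defaults are assumptions.

def unit_interval_ps(rate_mts: float) -> float:
    """Unit interval in picoseconds for a data rate given in MT/s."""
    # 1 second = 1e12 ps and 1 MT/s = 1e6 transfers/s, hence 1e6 / rate.
    return 1e6 / rate_mts


def trace_skew_ps(mismatch_mils: float, ps_per_mil: float = 0.2) -> float:
    """Skew contributed by a trace length mismatch (assumed 0.2 ps per mil)."""
    return mismatch_mils * ps_per_mil


def path_mismatch_ps(rate_mts: float, fraction: float = 0.10,
                     path_delay_ui: float = 1.0) -> float:
    """Pad-to-latch mismatch as a fraction of a nominal path delay given in UI."""
    return fraction * path_delay_ui * unit_interval_ps(rate_mts)


if __name__ == "__main__":
    for rate in (1600, 2400):
        print(rate, "MT/s:",
              "UI =", round(unit_interval_ps(rate), 1), "ps,",
              "path mismatch =", round(path_mismatch_ps(rate), 1), "ps,",
              "trace skew =", round(trace_skew_ps(100), 1), "ps")
```

At 1600 MT/s the unit interval is 625 ps, so a mismatch of 0.1 UI is 62.5 ps, roughly three times the 20 ps attributed to trace length matching.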
The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:
It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.
In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of embodiments of the invention. However, it will be understood by those of ordinary skill in the art that the embodiments of the invention may be practiced without these specific details. In other instances, well-known methods, procedures, components and circuits have not been described in detail so as not to obscure the embodiments of the invention.
Some portions of the detailed description which follow are presented in terms of algorithms and symbolic representations of operations on data bits or binary digital signals within a computer memory. These algorithmic descriptions and representations may be the techniques used by those skilled in the data processing arts to convey the substance of their work to others skilled in the art.
An algorithm is here, and generally, considered to be a self-consistent sequence of acts or operations leading to a desired result. These include physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers or the like. It should be understood, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities.
Unless specifically stated otherwise, as apparent from the following discussions, it is appreciated that throughout the specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulate and/or transform data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
Embodiments of the present invention may include apparatuses for performing the operations herein. Such an apparatus may be specially constructed for the desired purposes, or it may comprise a general purpose computer selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including floppy disks, optical disks, CD-ROMs, magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), electrically programmable read-only memories (EPROMs), electrically erasable and programmable read only memories (EEPROMs), magnetic or optical cards, or any other type of media suitable for storing electronic instructions, and capable of being coupled to a computer system bus.
The processes and displays presented herein are not inherently related to any particular computer or other apparatus. Various general purpose systems may be used with programs in accordance with the teachings herein, or it may prove convenient to construct a more specialized apparatus to perform the desired method. The desired structure for a variety of these systems will appear from the description below. In addition, embodiments of the present invention are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the invention as described herein.
Unless explicitly stated, the method embodiments described herein are not constrained to a particular order or sequence. Additionally, some of the described method embodiments or elements thereof can occur or be performed at the same point in time.
According to embodiments of the invention, bit lane deskew may reduce or eliminate skew between distinct wires rather than skew between bundles of wires.
Reference is made to
According to embodiments of the invention, interface bus 120 may comprise any suitable number of wires. For example, bus 120 may comprise eight (8) data wires as shown by DQ[0] to DQ[7]. Alternatively, bus 120 may comprise, for example 16, 32 or 64 data wires. Bus 120 may further comprise strobe wires as shown by DQS and DQSb. According to embodiments of the invention, interface bus 120 may be, for example, an integrated drive electronics (IDE) bus, an advanced technology attachment (ATA) bus, serial advanced technology attachment (SATA) bus, small computer system interface (SCSI) bus, extended industry standard architecture (EISA) bus, video electronics standards association (VESA) bus or any other suitable bus, possibly in accordance with a connected device.
According to embodiments of the invention, memory controller hub (MCH) 130 may provide the system bus interface, memory controller, accelerated graphics port (AGP) interface, and hub interface for input/output (I/O). According to embodiments of the invention, input circuits 141 to 148 may perform any functionality that may be required in association with input signals. For example, input circuits 141 to 148 may attenuate, amplify or invert input signals. According to embodiments of the invention, input circuits 141 to 148 may comprise active and/or passive components, for example, logic gates, buffers, switches, latches, or any other suitable input circuit. According to embodiments of the invention, output circuits 151 to 158 may perform any functionality that may be required in association with output signals. For example, output circuits 151 to 158 may attenuate, amplify or invert output signals. According to embodiments of the invention, output circuits 151 to 158 may comprise active and/or passive components, for example, logic gates, buffers, switches, latches, or any other suitable output circuit. According to embodiments of the invention, delay cells 131 to 138 may delay a signal passing through them as described below.
Reference is made to
According to embodiments of the invention, a computer program may access delay cell 220 and alter its configuration parameters. For example, an introduced time delay may be altered in such fashion in accordance with deskewing parameters or requirements. According to other embodiments of the invention, controller 225 may control delay cell 220. According to other embodiments of the invention, controller 225 may be embedded in, or implemented as, a circuit or chip, an application specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or any other suitable hardware. According to embodiments of the invention, controller 225 may control delay cell 220 according to input received from majority detector 240. According to embodiments of the invention, majority detector 240 may be used in order to sample and/or verify an output signal, for example, as described further below. According to embodiments of the invention, majority detector 240 may additionally be used in order to aid the verification of the output signal by filtering out noise.
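The noise-filtering role of a majority detector can be illustrated with a simple vote over repeated samples of one wire. This is a minimal sketch under stated assumptions, not a description of majority detector 240 itself: the function name, the number of samples and the rule that a tie resolves to 0 are all hypothetical choices made for illustration.

```python
from typing import Sequence


def majority(samples: Sequence[int]) -> int:
    """Return 1 if strictly more than half of the samples are 1, else 0.

    Assumption: each sample is 0 or 1, and a tie resolves to 0.
    """
    if not samples:
        raise ValueError("need at least one sample")
    return 1 if 2 * sum(samples) > len(samples) else 0


if __name__ == "__main__":
    # One noisy sample out of five does not change the reported value.
    print(majority([1, 1, 0, 1, 1]))
    print(majority([0, 0, 1, 0, 0]))
```

Using an odd number of samples avoids ties altogether, which is one reason a voting window of three or five samples is a natural choice in such a sketch.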
It will be noted that although a single delay cell is shown in
Reference is made to
Reference is made to
Reference is made to
According to embodiments of the invention and as shown by block 330, the flow may include sending a training pattern, for example from memory 110 towards MCH 130. According to embodiments of the invention, a training pattern may be any suitable pattern, for example a pattern of 0101 may be used. According to embodiments of the invention, a training pattern may be sent simultaneously over all wires comprising an interface bus, for example, DQ[0] to DQ[7] in
According to embodiments of the invention, the flow may include measuring transmission delays associated with the wires comprising an interface bus. According to embodiments of the invention and as shown by block 340, the flow may further include finding the highest and lowest delay associated with a plurality of wires comprising an interface bus. Reference is additionally made to
According to embodiments of the invention and as shown by 412, the maximal delay may be determined as the point where the vector comprising all majority detectors assumes, for the first time, a value of all ones. For example, if eight wires comprise a bus under test, then the value of the vector may be 11111111. At such point, all wires may have communicated the signal. Accordingly, such point may reflect the maximal delay.
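The search for the delay bounds can be sketched as a scan over the majority detector vector recorded at successive sampling points. The maximal-delay rule follows the text above (first all-ones vector). The minimal-delay rule used here, the first vector containing any one, is an assumption made for illustration, as are the function name and the representation of the sweep as a list of per-wire bit lists.

```python
from typing import Optional, Sequence, Tuple


def find_delay_bounds(
    sweep: Sequence[Sequence[int]],
) -> Tuple[Optional[int], Optional[int]]:
    """Return (min_step, max_step) for a sweep of majority detector vectors.

    sweep[i][w] is the detector output (0 or 1) for wire w at sampling step i.
    min_step: first step at which any wire reads 1 (assumed rule).
    max_step: first step at which every wire reads 1 (rule from the text).
    Either value is None if the condition is never met.
    """
    min_step: Optional[int] = None
    max_step: Optional[int] = None
    for step, vector in enumerate(sweep):
        if min_step is None and any(vector):
            min_step = step
        if len(vector) > 0 and all(vector):
            max_step = step
            break  # every wire has communicated the signal
    return min_step, max_step


if __name__ == "__main__":
    # Hypothetical sweep for an eight-wire bus, DQ[0] to DQ[7].
    sweep = [
        [0, 0, 0, 0, 0, 0, 0, 0],
        [0, 0, 1, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 0, 1, 1],
        [1, 1, 1, 1, 1, 1, 1, 1],
    ]
    print(find_delay_bounds(sweep))  # (1, 4)
```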
According to embodiments of the invention, the flow may include calculating a corrective timing parameter. According to embodiments of the invention, a corrective timing parameter may be a timing or phase modification value that may be computed and further applied to the DQS timing or phase. For example, a corrective timing parameter may comprise a value and direction for modifying a timing parameter, for example a delay of 0.1 unit intervals or any other specified value of time or phase. According to embodiments of the invention and as shown by block 350, the flow may include adjusting a setting of the DQS timing (or phase). According to embodiments of the invention, the DQS timing (or phase) may be set to a value greater than the minimal delay measured for the wires comprising an associated interface bus and smaller than the maximal delay measured for the wires comprising an associated interface bus. For example, according to embodiments of the invention, a DQS timing (or phase) may be altered such that it aligns with a median of the measured delays. According to other embodiments of the invention, the DQS may be set to match an average, a weighted average, a mean, a midrange, or a mode associated with the delays measured.
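Selecting a DQS setting from the measured delays can be sketched as follows. This is an illustration under assumptions: the function name, the `method` parameter and the example delay values (in arbitrary delay steps) are hypothetical, and only three of the statistics named above are shown.

```python
import statistics
from typing import Sequence


def dqs_target(delays: Sequence[float], method: str = "median") -> float:
    """Pick a DQS timing from per-wire delays using the named statistic."""
    if not delays:
        raise ValueError("need at least one measured delay")
    lowest, highest = min(delays), max(delays)
    if method == "median":
        target = statistics.median(delays)
    elif method == "mean":
        target = statistics.mean(delays)
    elif method == "midrange":
        target = (lowest + highest) / 2
    else:
        raise ValueError(f"unsupported method: {method}")
    # Each of these statistics lies between the smallest and largest delay.
    assert lowest <= target <= highest
    return float(target)


if __name__ == "__main__":
    # Hypothetical delays for DQ[0] to DQ[7], in delay steps.
    delays = [3, 5, 4, 9, 6, 4, 5, 6]
    for method in ("median", "mean", "midrange"):
        print(method, dqs_target(delays, method))
```

With one slow wire in the example data, the median stays near the bulk of the wires while the midrange is pulled toward the outlier, which illustrates why the choice of statistic may matter.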
According to embodiments of the invention and as shown by block 360, the flow may include calibrating the delay of the relevant paths to match the DQS timing (or phase). Reference is additionally made to
According to embodiments of the invention, a logic module controlling the deskew process may alter the delay cells' parameters according to output received from the associated majority detectors. For example, a controller, possibly implemented as a chip or ASIC, may be wired to the output of the majority detectors. According to embodiments of the invention, such controller may additionally be wired to the delay cells' control inputs, e.g. 270, 280 and 290 as shown in
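A feedback loop of the kind described, in which a controller steps a delay cell according to the output of the associated majority detector, can be sketched in software. Everything specific here is an assumption for illustration: the function names, the signed setting range of -8 to 7 around a nominal midpoint, and the detector polarity (1 while the lane's edge arrives no later than the DQS edge).

```python
from typing import Callable, List, Sequence


def calibrate_lane(detector: Callable[[int], int],
                   lo: int = -8, hi: int = 7) -> int:
    """Return the largest delay-cell setting for which the detector reads 1.

    Settings are scanned upward from lo; the scan stops at the first 0.
    If the detector already reads 0 at lo, lo is returned (cell saturated).
    """
    setting = lo
    for candidate in range(lo, hi + 1):
        if detector(candidate) != 1:
            break
        setting = candidate
    return setting


def calibrate_bus(lane_delays: Sequence[int], dqs_delay: int) -> List[int]:
    """Calibrate every lane against a simulated majority detector."""
    settings = []
    for intrinsic in lane_delays:
        # Simulated detector: 1 while the delayed lane edge is not after DQS.
        def detector(s: int, d: int = intrinsic) -> int:
            return 1 if d + s <= dqs_delay else 0
        settings.append(calibrate_lane(detector))
    return settings


if __name__ == "__main__":
    lanes = [3, 5, 4, 9]  # hypothetical intrinsic lane delays, in steps
    settings = calibrate_bus(lanes, dqs_delay=5)
    print(settings)  # [2, 0, 1, -4]
    print([d + s for d, s in zip(lanes, settings)])  # [5, 5, 5, 5]
```

A linear scan is used for clarity; with a monotonic detector response, a binary search over the setting range would reach the same setting in fewer steps.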
While certain features of the invention have been illustrated and described herein, many modifications, substitutions, changes, and equivalents will now occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes as fall within the spirit of the invention.