Embodiments of the present invention are described hereinafter, making reference to the appended drawings.
In the embodiments of an inventive signal processing circuit 100 shown in
Both clock signals CK and FCK are provided by the first and the second clock signal lines 160 and 170 to the phase detector 150, which compares the phases of the two clock signals CK and FCK. Depending on the relation between the phases of the first clock signal CK and the second clock signal FCK, the phase detector 150 provides at the output 150c the comparison signal CS, which indicates this relation.
The comparison signal CS can, for instance, be an analog and/or a digital signal. Furthermore, the comparison signal CS can indicate the phase difference between the first clock signal CK and the second clock signal FCK. A negative value can, for instance, indicate that the second clock signal FCK is early with respect to the first clock signal CK by an amount given by the absolute value of the comparison signal CS. Accordingly, a positive value of the comparison signal CS can indicate that the second clock signal FCK is late with respect to the first clock signal CK, e.g. by an amount indicated by the value of the comparison signal CS. Moreover, a comparison signal whose absolute value is lower than a predetermined value indicates that the first clock signal CK and the second clock signal FCK are “in phase” or “synchronized” with respect to each other, to within the tolerance defined by the predetermined value. Since in this state the phase of the second clock signal FCK does not have to be modified to keep both clock signals CK and FCK synchronized within the predetermined value, this state is also referred to as “hold”.
As an alternative to providing a comparison signal CS indicating the phase difference between the two clock signals CK and FCK, the comparison signal CS can merely indicate the relation between the phases of the two clock signals CK and FCK. For instance, the comparison signal CS can take values indicating the states “early”, “late” and “hold”, depending on the value of the phase difference between the first and the second clock signal CK and FCK. If, for instance, the absolute value of the phase difference between the first clock signal CK and the second clock signal FCK is within the predetermined value, the comparison signal CS takes a first state indicating “hold”. Accordingly, if the second clock signal FCK precedes the first clock signal CK by more than the predetermined value, the comparison signal CS takes a second state indicating “early”. Moreover, if the second clock signal FCK lags behind the first clock signal CK by more than the predetermined value, the comparison signal CS takes a third state indicating “late”.
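The three-state comparison signal can be illustrated with a small classifier. This is a minimal sketch, not the phase detector 150 itself: it assumes the phase difference is already available as a signed number in units of tCK, with negative meaning that FCK is early (the sign convention of the signed comparison signal described above). The names `PhaseState` and `classify` are hypothetical, and treating a difference exactly equal to the predetermined value as outside the hold window is an assumption.

```python
from enum import Enum


class PhaseState(Enum):
    """Hypothetical three-state comparison signal CS."""
    EARLY = "early"  # FCK precedes CK by more than the predetermined value
    LATE = "late"    # FCK lags CK by more than the predetermined value
    HOLD = "hold"    # |phase difference| below the predetermined value


def classify(phase_diff: float, threshold: float) -> PhaseState:
    """Map a signed FCK-vs-CK phase difference (in tCK) to early/late/hold.

    Assumed sign convention: negative = FCK early, positive = FCK late.
    A difference exactly at the threshold counts as outside the hold window.
    """
    if abs(phase_diff) < threshold:
        return PhaseState.HOLD
    return PhaseState.EARLY if phase_diff < 0 else PhaseState.LATE
```

With a hold window of ±⅛·tCK, `classify(-0.3, 0.125)` returns `PhaseState.EARLY` and `classify(0.05, 0.125)` returns `PhaseState.HOLD`.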
A further alternative is for the phase detector 150 to use more than one predetermined value, so that the comparison signal CS at the output 150c can indicate more than the three states outlined above. A plurality of predetermined values thus allows the phase difference between the first clock signal CK and the second clock signal FCK to be indicated more accurately than with only one predetermined value and three states. It is also possible to use different predetermined values for positive and negative phase differences between the first clock signal CK and the second clock signal FCK.
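A comparison signal with several predetermined values, possibly different for the early and the late side, amounts to a multi-level quantizer. The sketch below assumes the thresholds are given as magnitudes in units of tCK, sorted in ascending order; the function name and the signed-level output format are hypothetical choices for illustration.

```python
from bisect import bisect_right
from typing import Sequence


def quantize_phase(
    phase_diff: float,
    early_thresholds: Sequence[float],
    late_thresholds: Sequence[float],
) -> int:
    """Map a signed phase difference (in tCK) to a signed comparison level.

    0 means "hold"; -k means FCK is early by at least early_thresholds[k-1];
    +k means FCK is late by at least late_thresholds[k-1].
    Both lists hold magnitudes in ascending order and may differ, so the
    early and late sides can use different predetermined values.
    """
    if phase_diff >= 0:
        # Count how many late thresholds the difference has reached.
        return bisect_right(late_thresholds, phase_diff)
    # Count how many early thresholds the magnitude of the difference has reached.
    return -bisect_right(early_thresholds, -phase_diff)
```

With early thresholds `[0.1, 0.3]` and late thresholds `[0.125, 0.25, 0.375]`, a difference of 0.3 tCK yields level 2 and a difference of -0.2 tCK yields level -1. A level of 0 plays the role of “hold”.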
The second clock signal generator 140 changes or alters the second clock signal FCK depending on the comparison signal CS received at the input 140b of the second clock signal generator 140. If, for instance, the comparison signal CS indicates that the second clock signal FCK precedes the first clock signal CK (“early”), the second clock signal generator 140 can increase the phase delay of the second clock signal FCK, i.e. shift it later, to reduce the absolute value of the phase difference between the two clock signals CK and FCK. This can, for instance, be achieved by activating additional phase-shifting parts of the second clock signal generator 140 or by directly modifying the generated second clock signal FCK according to the comparison signal CS.
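One way to picture a single correction step of the second clock signal generator 140 is as an update of a phase-control code. The sketch assumes a hypothetical phase shifter with `taps_per_cycle` equally spaced settings per cycle (8 here, i.e. ⅛-cycle steps) that wraps around after a full cycle, as a phase interpolator would. The function name, the code representation and the wrap-around behavior are assumptions, not details taken from the embodiments.

```python
def next_phase_code(code: int, cs: str, taps_per_cycle: int = 8) -> int:
    """Return the next setting of a hypothetical FCK phase shifter.

    `code` selects a delay of code / taps_per_cycle of one clock cycle.
    "early" -> add one step of delay (shift FCK later),
    "late"  -> remove one step of delay (shift FCK earlier),
    "hold"  -> keep the current setting.
    """
    if cs == "early":
        step = +1
    elif cs == "late":
        step = -1
    elif cs == "hold":
        step = 0
    else:
        raise ValueError(f"unknown comparison state: {cs!r}")
    # Assumed phase-interpolator behavior: a full cycle of delay wraps to zero.
    return (code + step) % taps_per_cycle
```

Repeating this step whenever a new comparison signal arrives, and stopping on “hold”, gives the closed feedback loop described in this section.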
In the embodiment described above, the first circuit 110 can, for instance, be a graphics processing unit (GPU), a central processing unit (CPU) or another memory controller circuit. The second circuit 120 can, for instance, be a memory circuit comprising or being connected to a memory core to write data to and/or to read data from.
An advantage of the embodiments of the present invention is that the first clock signal and the second clock signal can be aligned with respect to each other more easily. As a consequence, the data transfer rate can be increased more easily without impairing the exchange of data between the first circuit and the second circuit. Furthermore, due to the alignment of the first clock signal and the second clock signal, the overall circuitry of the signal processing circuit can be simplified, as complex synchronization circuits, which usually require an elaborate training scheme, can be omitted.
An advantage of an embodiment of the present invention is that an inventive signal processing circuit makes it possible to employ high data transfer rates between the first circuit and the second circuit (due to comparing the phases of the first clock signal and the second clock signal with each other), while the synchronization of the control signals (in the CK domain) to the data signals (in the FCK domain) is considerably simplified. As a further advantage of an embodiment of the present invention, the circuitry of the signal processing circuit in general and of the second circuit in particular can be simplified, as complex structures like a FIFO circuit (FIFO=First In First Out) do not have to be implemented. Furthermore, highly complex training procedures for the first circuit and the second circuit of the inventive signal processing circuit can also be omitted.
An embodiment of the present invention is based on the finding that the first clock signal CK and the second clock signal FCK can be more easily aligned with respect to each other by integrating an (auxiliary) phase detector 150 in the second circuit 120, which can, for instance, be a DRAM memory circuit (DRAM=Dynamic Random Access Memory).
In other words, a phase detector 150 is integrated into the second circuit, e.g. a DRAM memory circuit, which compares the first clock signal CK and the second clock signal FCK to obtain information concerning the phase difference between the two clock signals CK and FCK. The information as to whether the second clock signal FCK is too early, too late or “in time” (hold) with respect to the first clock signal CK is generated by the phase detector 150 and passed on via the further signal line 180 to the first circuit 110, which can, for instance, be a graphics processing unit (GPU). The first circuit 110 can then use the information obtained via the comparison signal CS to alter the phase of the second clock signal FCK, which can, for instance, be a data clock signal, by means of a controllable delay circuit until the first clock signal CK and the second clock signal FCK are aligned with each other. This alignment can, for instance, be carried out with an accuracy of ⅛th of a cycle period of the first clock signal, or with any other accuracy that simplifies the synchronization enough to render a FIFO unnecessary, as explained above. As a consequence, complex circuits such as a FIFO for synchronizing the second clock signal FCK with respect to the first clock signal CK can be left out.
In other words, at the interface between the part of the circuitry of the second circuit 120 controlled by the second clock signal FCK (FCK-domain) and the part controlled by the first clock signal CK (CK-domain), an implementation of a FIFO can be omitted. As a consequence, the command-domain or CK-domain controls the data-domain or FCK-domain as if the data-domain were part of the command-domain. Accordingly, the inventive signal processing circuit enables a great simplification of a signal processing circuit comprising more than one clock signal: owing to the better alignment of the second clock signal FCK with respect to the first clock signal CK, a complex FIFO solution, with its disadvantages concerning the area of the FIFO, its control and its influence on write times and read times, can be omitted.
To be more precise, the timing shifts of write and read operations between the second clock signal FCK (data clock signal) and the first clock signal CK (address/command clock signal) can, in some cases, increase the phase difference between the clock signals up to a whole clock cycle of the first clock signal CK. Hence, the synchronization of the data domain (FCK) and the command domain (CK) becomes more difficult. While the (net) data transfer rate remains the same, the latency (the time between events, e.g. a write process and a following read process) increases. By avoiding this additional latency, an embodiment of the present invention allows more data to be transferred per unit of time at the same (net) data transfer rate, i.e. it increases the effective data transfer rate.
Furthermore, the long and complex training procedures required to synchronize the FIFO circuits can also be omitted. Apart from the advantages of not having to incorporate a FIFO and of not having to carry out a long and complex training procedure, a further advantage of the embodiments of the present invention is that restrictions on the routing of the second clock signal line 170 with respect to the first clock signal line 160 no longer apply. To be more precise, the inventive signal processing circuit does not require the layout of a printed circuit board to use equally long clock signal lines 160, 170. A further advantage of an inventive signal processing circuit is that, inside the first circuit 110, e.g. the GPU, the second clock signal generator 140 (FCK-clock) and the first clock signal generator 130 (CK-clock) do not have to be fully matched.
Hence, compared to circuits according to the standards or specifications DDR, DDR2, DDR3, GDDR3 and GDDR4, which specify that the time difference tDQSS between the address/command clock signal and the data clock signal be at most ¼ of a clock signal period tCK in magnitude (tDQSS=+/−0.25·tCK), an inventive signal processing circuit does not require the substantial effort of an optimized layout of the internal design of the first circuit 110 and the second circuit 120 and of the printed circuit board to match the signal paths between the first circuit 110 and the second circuit 120, especially the first clock signal line 160 and the second clock signal line 170.
As the data transfer rate between the first circuit 110 and the second circuit 120 is increased further, aligning the first clock signal CK and the second clock signal FCK remains possible but becomes very difficult and costly. This problem will, for instance, be important for the new memory standard GDDR5. There, the time difference, or rather the phase difference, between the first clock signal CK and the second clock signal FCK, which is usually referred to as tDQSS, increases to up to +/−0.5·tCK, wherein tCK is again the period of the first clock signal CK. This essentially means that the first clock signal CK and the second clock signal FCK are not aligned with respect to each other, as the phase difference between the two clock signals can vary by up to half a cycle in either direction.
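For orientation, the tDQSS windows can be converted into absolute times. The example below assumes a CK frequency of 1 GHz, the frequency this description uses in its own numerical examples, and also lists the ±⅛·tCK alignment target that is used as an example elsewhere in this description.

```latex
\begin{aligned}
t_{CK} &= \frac{1}{f_{CK}} = \frac{1}{1\,\mathrm{GHz}} = 1\,\mathrm{ns} = 1000\,\mathrm{ps}\\
\text{DDR \dots GDDR4:}\quad |t_{DQSS}| &\le 0.25\,t_{CK} = 250\,\mathrm{ps}\\
\text{GDDR5:}\quad |t_{DQSS}| &\le 0.5\,t_{CK} = 500\,\mathrm{ps}\\
\text{aligned (example target):}\quad |t_{DQSS}| &\le \tfrac{1}{8}\,t_{CK} = 125\,\mathrm{ps}
\end{aligned}
```

At 1 GHz the GDDR5 window therefore spans a full 1 ns, twice the 500 ps window of the earlier standards.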
The inventive signal processing circuit offers the advantage that a compensating FIFO at the interface between the FCK-domain and the CK-domain is not required. Apart from saving the space for the FIFO circuit, a second advantage is that the long and complicated training procedure otherwise needed to align the two clock signals with respect to each other, i.e. to determine which CK-clock edge belongs to which FCK-clock edge, can be omitted. As a consequence, an inventive signal processing circuit is capable of a very fast power-up and of changing the clock frequency without significant disturbance.
Before describing the second embodiment of the present invention in more detail, it should be noted that objects with the same or similar functional properties are denoted with the same reference signs. Unless explicitly noted otherwise, the description with respect to objects with similar or equal functional properties can be exchanged with respect to each other.
The inventive signal processing circuit 100 shown in
The GPU master PLL 130 is furthermore connected to a second clock signal generator 140, which comprises a GPU data PLL Rx/Tx circuit or GPU data PLL 250. The GPU data PLL 250 generates an intermediate clock signal, which is provided to a phase control circuit 260, also comprised in the second clock signal generator 140. The phase control circuit 260 is connected via an output driver 270 to a second clock signal line 170, which is also referred to in
Apart from the second clock signal generator 140, the output driver 270 and the input driver 290, a data-domain 300 of the GPU 110 also comprises a read phase control circuit 310 and a write phase control circuit 320, which are connected to an output of the GPU data PLL 250. The write phase control circuit 320 is connected to a latch 330 receiving data to be written at an input d of the latch 330. An output q of the latch 330 is connected via an output driver 340 to 16 data lines (DQ; DQ=Data Query) and 2 data bit inversion lines (DBI), which together are referred to as data lines 350. Inside the data-domain 300 of the GPU 110, the data lines 350 are also connected via an input receiver 360 to an input d of a latch 370. A clock signal input of the latch 370 is connected to the read phase control circuit 310. An output q of the latch 370 provides read data to the rest of the GPU 110.
The DRAM memory 120 also comprises a command/address-domain 380, which is connected to the address/command signal line 240 and the first clock signal line 160. To be more precise, the address/command signal line 240 is connected via an input driver 390 to an input d of a latch 400 comprised in the command/address-domain 380. At an output q of the latch 400, address and/or command signals are provided to the rest of the DRAM memory 120. The command/address-domain 380 furthermore comprises an input receiver 410, which is connected to a clock signal input of the latch 400.
The data lines 350 are connected to a data-domain 420 comprised in the DRAM memory 120. To be more precise, the data lines 350 are connected to both an output driver 430 and an input receiver 440. The second clock signal line 170 is connected to an input receiver 450, whose output is connected to clock signal inputs of a latch 460 and a latch 470. An input d of the latch 460 is connected to the input receiver 440, and an output q of the latch 460 is connected to an input d of a latch 480. A clock signal input of the latch 480 is connected to the input receiver 410, so that the latch 480 is triggered by the first clock signal CK. An output q of the latch 480 provides write data to a memory core not shown in
Moreover, the DRAM memory 120 comprises a phase detector or tDQSS phase detector 150, which is connected to the input receiver 450, receiving the second clock signal FCK via the second clock signal line 170, and to the input receiver 410, receiving the first clock signal CK generated in essence by the first clock signal generator 130 of the GPU 110. An output of the phase detector 150 is connected to an output driver 510, which provides the comparison signal CS to the further signal line 180.
The inventive signal processing circuit 100 shown in
As already laid out, due to employing the phase detector 150 and the second clock signal generator 140, which is capable of controlling the phase of the second clock signal FCK, both a FIFO circuit and a long and complex training procedure can be omitted. Furthermore, the first clock signal line 160 (CK-clock) and the second clock signal line 170 (FCK-clock) do not have to be laid out with equal lengths. Nor is it necessary inside the GPU 110 to fully match the second clock signal generator 140, or rather the FCK clock 140, to the first clock signal generator 130, or rather the CK clock 130.
In other words, the inventive signal processing circuit 100 shown in
The closed feedback loop shown in
If, however, the misalignment drifts noticeably, the closed feedback loop can be activated periodically. In the case of a DRAM memory, the auto-refresh cycles, which are necessary anyway, can be used for this purpose: during an auto-refresh cycle, the data lines 350 and, hence, the whole bus are not used, so that the timing can easily be updated.
While
Furthermore, the GPU 110 comprises a WPH decoder 620, which is connected to the output of the input receiver 290 of the GPU 110. The WPH decoder 620 is connected both to the decoder 280 of the second clock signal generator 140 and to a write phase CDR circuit (CDR=Clock Data Recovery), which is connected to the write phase control 320 and provides it with appropriate write phase control data. The WPH decoder 620 decodes the WPH data frames generated by the WPH frame generator 600, which comprise both information concerning the write phase (WPH) and the comparison signal CS generated by the phase detector 150, which is also referred to as tDQSS or tCK2FCK information.
Hence, the embodiment shown in
Furthermore, the embodiment shown in
A further difference between the embodiments shown in
In the implementation of the third embodiment shown in
In the embodiment shown in
As already explained, the phase information concerning the phase relation between the first clock signal CK and the second clock signal FCK (tCK2FCK or tDQSS) can be sent back to the GPU 110 in the WPH frame by using two preamble bits, which are not needed for transferring write phase information. In a conventional WPH frame, these two preamble bits do not contain any information and are, hence, empty.
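Reusing two otherwise empty frame bits for the comparison signal can be sketched as a pack/unpack pair. The description does not specify the bit encoding, so the 2-bit code table below (hold=00, early=01, late=10, with 11 reserved) is hypothetical, as are the function names. Because the CS bits are described here as preamble bits and elsewhere as the last two bits of the write phase data, their position is left as the `offset` parameter.

```python
# Hypothetical 2-bit encoding of the comparison signal CS; not specified in the text.
CS_CODES = {"hold": 0b00, "early": 0b01, "late": 0b10}
CS_STATES = {code: state for state, code in CS_CODES.items()}


def insert_cs(frame_bits: list, cs: str, offset: int) -> list:
    """Return a copy of a WPH frame with the CS code written MSB-first at offset, offset+1."""
    code = CS_CODES[cs]
    bits = list(frame_bits)
    bits[offset] = (code >> 1) & 1
    bits[offset + 1] = code & 1
    return bits


def extract_cs(frame_bits: list, offset: int) -> str:
    """Read the 2-bit CS code back from a received WPH frame."""
    code = (frame_bits[offset] << 1) | frame_bits[offset + 1]
    if code not in CS_STATES:
        raise ValueError(f"reserved CS code {code:#04b}")
    return CS_STATES[code]
```

Keeping the code table in one dictionary and deriving the inverse from it guarantees that the DRAM-side encoder and the GPU-side decoder cannot disagree.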
If, however, no write commands are carried out by the inventive signal processing circuit shown in
The inventive signal processing circuit makes it possible, for instance, to operate a high speed memory circuit at a higher speed without significantly affecting the so-called write/read turnaround. This is because it can limit the phase difference between the first clock signal CK and the second clock signal FCK to far less than +/−0.5·tCK, i.e. +/−500 ps at a frequency of 1 GHz; a phase difference of that size creates considerable problems concerning the timing and other effects in the write path of a memory circuit. Hence, an inventive signal processing circuit can render highly complex and lengthy write training solutions obsolete.
In the embodiment shown in
Before discussing a typical system power-on sequence of an inventive signal processing circuit shown in
Furthermore, an inventive signal processing circuit is not limited to using exactly the length or number of bits and/or number of bit lines, as described with respect to the three embodiments discussed above. It is possible to use a different number of bit lines and/or different number of bits.
Furthermore, with respect to the embodiments shown in
Furthermore, it should be noted that the closed feedback loop comprising the second clock signal line 170, the further signal line 180, the second clock signal generator 140 and the phase detector 150 can be implemented as a digital and/or an analog feedback loop system. In addition, the comparison signal CS provided by the phase detector 150 can comprise more than three states, as described in the framework of the second and third embodiments of the present invention. To be more precise, the comparison signal CS can indicate an arbitrary number of states indicating how the phase of the second clock signal FCK is to be adjusted with respect to the first clock signal CK in either direction. In principle, the number of states indicated by the comparison signal CS is only limited by the number of bits available for the comparison signal CS if a digital transmission is used, or by the signal-to-noise ratio of an analog comparison signal CS. Depending on the concrete implementation and the properties of the comparison signal CS, the second clock signal generator 140 can adjust the phase of the second clock signal FCK by increasing or decreasing it in discrete steps or in a continuous fashion. Due to the digital nature of most signal processing circuits comprised in computer systems, a second clock signal generator capable of shifting the phase of the second clock signal FCK in steps of one cycle divided by 2ⁿ, wherein n is a positive integer, is especially favorable. However, other step sizes are also possible; examples include steps of ⅛th, 1/10th or 1/16th of a cycle.
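The discrete-step options can be made concrete with a little arithmetic. The sketch assumes tCK is given in picoseconds and that a requested phase shift is rounded to the nearest whole number of steps; the function names are hypothetical. It also shows the bound on the number of states a digital comparison signal with `bits` bits can carry.

```python
def step_size_ps(t_ck_ps: float, n: int) -> float:
    """Phase step of one clock cycle divided by 2**n, in picoseconds."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return t_ck_ps / 2 ** n


def quantize_shift(shift_ps: float, step_ps: float) -> int:
    """Signed number of whole steps closest to a requested phase shift.

    Uses Python's round(), which rounds exact halves to the nearest even integer.
    """
    return round(shift_ps / step_ps)


def max_cs_states(bits: int) -> int:
    """Upper bound on the states a digitally transmitted CS with `bits` bits can indicate."""
    return 2 ** bits
```

At 1 GHz (tCK = 1000 ps), n = 3 gives 125 ps steps and n = 4 gives 62.5 ps steps; a requested shift of -300 ps then corresponds to -2 steps of 125 ps. A 1/10-cycle step would simply use `t_ck_ps / 10` instead.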
Furthermore, in the embodiments shown in
A typical system power-on sequence of an inventive signal processing circuit as described and shown in
1. Turn on of the first clock signal generator 130 (GPU PLL). As the second clock signal generator 140 is coupled to the first clock signal generator 130 in the embodiment shown in
2. Turn on the further clock signal generator 640 (DRAM PLL). After a certain period of time has elapsed, the further clock signal generator 640 has locked onto the second clock signal FCK provided by the second clock signal generator 140.
3. Turn on the CDR circuit comprised in the WPH decoder 620 (DQ-CDR or READ GPU) to enable decoding and obtaining the information comprised in the WPH frame received at the WPH-pin of the GPU 110. To be more precise, the CDR circuit comprised in the WPH decoder 620 is needed to decode the comparison signal CS or tDQSS signal provided by the phase detector 150 and encoded by the WPH frame generator 600 of the DRAM memory 120.
4. Align the second clock signal FCK to the first clock signal CK. As will be laid out later, this alignment process is carried out slowly compared to the alignment carried out by the CDR circuit comprised in the WPH decoder 620 to make sure that the CDR of the WPH decoder 620 is capable of keeping track of the WPH signal comprising the idle frame 710 or the write phase frame 700 shown in
5. Reset the FIFO circuits 490, 650 according to a target phase shift with respect to the first clock signal CK and the second clock signal FCK. If, for instance, the target phase shift of the inventive signal processing circuit is tDQSS=+/−⅛·tCK (+/−125 ps at a frequency of 1 GHz), the FIFO circuits 490, 650 are reset accordingly.
6. Turn off the phase detector 150. As outlined above, the alignment of the first clock signal CK and the second clock signal FCK with respect to each other does not have to be carried out all the time.
7. Wait for a short, predetermined period of time for the CDR comprised in the WPH decoder 620 (DQ-CDR) to settle.
8. The DQ-CDR, which was turned on in step 3 above, can now be used to write data to the DRAM memory 120 and to read data from the DRAM memory 120 and transfer it to the GPU 110.
9. Depending on the alignment achieved or specified, the phase difference between the first clock signal CK and the second clock signal FCK is allowed to drift to a predetermined extent during the operation, before the feedback loop should be reactivated. In the case of a “perfect” alignment after the feedback loop is switched off, i.e. no or almost no phase difference between the first clock signal CK and the second clock signal FCK, the phase difference is in principle allowed to drift up to 0.5·tCK in either direction, until the feedback loop should be activated again. In case of a non-optimal alignment, the tolerable drift of the phase difference should be adjusted accordingly.
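The drift allowance of step 9 can be expressed as a small calculation. The sketch assumes that the allowance of 0.5·tCK in either direction shrinks linearly by the residual misalignment left when the feedback loop is switched off; the text only says the tolerable drift "should be adjusted accordingly", so this linear rule and the function name are assumptions.

```python
def drift_budget(residual: float, limit: float = 0.5) -> tuple:
    """Remaining tolerable drift (early, late) in units of tCK.

    residual: signed FCK-vs-CK offset when the loop is switched off,
              negative = FCK early, positive = FCK late.
    limit:    maximum tolerable offset in either direction (0.5 tCK for
              a perfect alignment, as stated in step 9).
    Assumption: the budget on each side shrinks linearly with the residual.
    """
    if abs(residual) >= limit:
        raise ValueError("residual offset already at or beyond the limit")
    early_budget = limit + residual  # distance to -limit
    late_budget = limit - residual   # distance to +limit
    return early_budget, late_budget
```

For a perfect alignment the result is 0.5·tCK on both sides; a residual of +⅛·tCK (FCK slightly late) leaves 0.625·tCK of early drift but only 0.375·tCK of late drift before the loop should be reactivated.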
In essence, the power-on sequence comprises a CDR training loop, which leads to a state in which a read data eye is aligned with the incoming data and the position of the first bit (bit 0) in the burst is known. Nevertheless, the read/write latency is, at this point, not known. In a next step, the inventive FCK/CK training loop is executed, which leads to an alignment of the second clock signal FCK with respect to the first clock signal CK. The FCK/CK training loop will be described in more detail below. Afterwards, in the following steps, the latency times are determined and further initialization sequences are carried out.
In a next step, S120, the GPU 110 observes the data received at the WPH-pin and extracts the comparison signal CS from the WPH frame; the comparison signal is comprised in the last two bits of the write phase data and indicates the relation between the second clock signal FCK and the first clock signal CK. In step S130, the position of the second clock signal FCK in relation to the first clock signal CK is determined. If the second clock signal FCK is late with respect to the first clock signal CK, the GPU 110 shifts the second clock signal FCK one step earlier in step S140a by adjusting the phase control circuit 260 accordingly. In step S150a, the GPU 110 then waits for M clock cycles of the first clock signal CK, i.e. for a time M·tCK, wherein M is a positive integer, to allow the further clock signal generator 640 of the DRAM memory 120 and its DRAM PLL regulation loop to settle. The system then once again enters the CDR training loop S110, as explained above.
If, however, the GPU 110 detects in step S130 that the position of the second clock signal FCK with respect to the first clock signal CK is early, the GPU 110 shifts the second clock signal FCK by adjusting the phase control circuit 260 accordingly one step later in step S140b. Afterwards, in step S150b, the GPU 110 waits for M clock cycles of the first clock signal CK to once again allow the DRAM PLL regulation loop of the further clock signal generator 640 of the DRAM memory 120 to settle before the CDR training loop is entered in step S110. In the steps S150a and S150b, the time waited by the GPU 110 is determined by the integer M, which can be a predetermined value, for instance, a value of 24, or can be adjusted according to the behavior of the inventive signal processing circuit.
If, finally, in step S130, the position of the second clock signal FCK with respect to the first clock signal CK is found to be aligned (hold), the FCK/CK training loop is exited in step S160.
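The FCK/CK training loop of steps S110 to S160 can be summarized as a control routine. In this minimal sketch the four callbacks are hypothetical hooks standing for hardware actions (CDR training, reading CS from the WPH frame, moving the phase control circuit 260 by one step, waiting M·tCK); +1 is assumed to mean "one step later" and -1 "one step earlier". The default M = 24 is the example value given above, and the `max_iterations` safeguard is an addition not present in the described flow.

```python
from typing import Callable


def fck_ck_training(
    cdr_training: Callable[[], None],
    read_cs: Callable[[], str],
    shift_fck: Callable[[int], None],
    wait_ck_cycles: Callable[[int], None],
    m: int = 24,
    max_iterations: int = 64,
) -> int:
    """Run the FCK/CK training loop until the comparison signal reports "hold".

    Returns the number of phase steps applied before the loop exited (S160).
    """
    for steps in range(max_iterations):
        cdr_training()          # S110: retrain the read CDR on the WPH signal
        cs = read_cs()          # S120/S130: CS taken from the WPH frame
        if cs == "hold":
            return steps        # S160: FCK aligned to CK, leave the loop
        if cs == "late":
            shift_fck(-1)       # S140a: FCK late -> one step earlier
        elif cs == "early":
            shift_fck(+1)       # S140b: FCK early -> one step later
        else:
            raise ValueError(f"unexpected comparison state: {cs!r}")
        wait_ck_cycles(m)       # S150a/b: wait M*tCK for the DRAM PLL to settle
    raise RuntimeError("FCK/CK training did not reach hold")
```

Driven by a simple software model in which FCK starts three steps late and each call of `shift_fck` moves it by one step, the routine exits with "hold" after three shifts, having waited 3·24 = 72 CK cycles and run the CDR training four times.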
Depending on certain implementation requirements of the inventive methods, the inventive methods can be implemented in hardware or in software. The implementation can be performed using a digital storage medium, in particular, a disc, CD or a DVD having an electronically readable control signal stored thereon, which co-operates with a programmable computer system such that an embodiment of the inventive methods is performed. Generally, an embodiment of the present invention is, therefore, a computer program product with a program code stored on a machine-readable carrier, the program code being operative for performing the inventive methods when the computer program product runs on the computer. In other words, embodiments of the inventive methods are, therefore, a computer program having a program code for performing at least one of the inventive methods when the computer program runs on a computer.
While the foregoing has been particularly shown and described with reference to particular embodiments thereof, it will be understood by those skilled in the art that various other changes in the form and details may be made without departing from the spirit and scope thereof. It is to be understood that various changes may be made in adapting to different embodiments without departing from the broader concept disclosed herein and comprehended by the claims that follow.