This disclosure pertains to computing systems, and in particular (but not exclusively) to a computing system having a channel with a daisy-chain type interconnect topology whose junctions are limited by reflection resonances.
In computing systems, when a channel used for high speed signaling comprises multiple slots or nodes in a daisy-chain interconnect topology, a junction effect exists in which noise signals are created by multiple reflections. For instance, in a typical daisy-chain interconnect topology with two or more junctions per channel, the multiple reflections between junctions are significant and seriously degrade channel signaling performance.
The current state of the art avoids this problem by running memory channels at slower speeds, by improving the electrical performance of components in the channel to compensate for the junction effects, and/or by reducing the number of slots or nodes per channel, depending on the signal integrity requirements of the system.
The present invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention, which, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
In one embodiment, the reflected noise signals of junctions are eliminated by eliminating the multiple reflections between junctions, particularly for memory channels, which in turn eliminates the reflection resonances between junctions. This reduces inter-symbol interference (ISI) and harmful coupling and reduces timing jitter, all of which are induced by the reflected signals at the junctions.
In one embodiment, in order to eliminate the multiple reflections between junctions, three techniques are used, including (1) reducing the routing length of the interconnect routing between junctions, which pushes the resonance frequency higher and thus helps mitigate junction effects; (2) matching the impedance of the interconnect routing between junctions to the impedance of the junctions, which reduces the impedance discontinuities and thus suppresses the junction resonance effects; and (3) changing a two-junction topology to a single-junction topology, which reduces, and potentially eliminates, the multiple reflections between junctions.
Referring to
MB DIMM-DIMM 107 and MB DIMM-DIMM 108 are connected together with an interconnect portion of the memory channel that includes junction 121, while MB DIMM-DIMM 108 and MB DIMM-DIMM 109 are connected together with an interconnect portion of the memory channel that includes junction 122. Each of DIMM cards 110-112 includes a memory and a DIMM connector to interface to the interconnect portions of the memory channel. Thus, junction 121 includes the port to DIMM2, and junction 122 includes the port to DIMM1.
Junctions 121 and 122 are 3-port junctions. For a 3-port junction, the input impedance of one port can be expressed as

Zjunction = (Z1·Z2)/(Z1+Z2)   (1)
where Z1 and Z2 are the output impedances of the other two ports. Given Z1≅Z2, Zjunction≅0.5Z1. This means that the junction impedance is only half of the characteristic impedance of the regular routings in the channel. Thus, the impedance discontinuities due to junctions are usually much larger compared with other discontinuities in the channel. In the case of DDR, this results in the degradation of DDR signal integrity due to the multiple reflection resonances caused by the junctions.
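Purely as an illustrative, non-limiting numerical sketch (not part of any claimed embodiment), the following Python snippet evaluates the parallel-impedance relationship above; the 50-ohm values are assumed example values chosen only to show that the junction impedance drops to roughly half the characteristic impedance of the regular routing.

    # Illustrative sketch only: junction input impedance of a 3-port junction,
    # modeled as the parallel combination of the other two port impedances.
    # The 50-ohm values below are assumed examples, not taken from the disclosure.
    def junction_impedance(z1, z2):
        """Input impedance seen at one port of a 3-port junction."""
        return (z1 * z2) / (z1 + z2)

    z1 = z2 = 50.0  # assumed characteristic impedance of the regular routing, in ohms
    z_junction = junction_impedance(z1, z2)
    print(f"Z_junction = {z_junction:.1f} ohm")  # prints 25.0 ohm, i.e. ~0.5 * Z1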
In order to improve the signaling performance (e.g., the DDR signaling performance), techniques described herein substantially reduce, or potentially eliminate, the multiple reflections between these junctions. Specifically, in one embodiment, three techniques are used to suppress the junction reflection resonance and reduce the multiple reflections.
The first of the three techniques involves reducing the interconnect length between two junctions.
In
where c is the speed of an electromagnetic wave in vacuum (free space), ∈eff is the effective relative permittivity of the interconnect medium, and L is the length of the interconnect routing. Based on equation (2), the resonance frequency can be pushed to a higher frequency by reducing the routing length L, which can improve the signaling performance. In addition, it also can reduce the crosstalk slightly. In one embodiment, the junction-to-junction interconnect routing length of the topology is reduced from 0.7 in. to 0.15 in. In such a case, the resonance frequency has been pushed out to 10 GHz.
Various other routing lengths include, but are not limited to, 0.8 in., 0.6 in., 0.4 in., and 0.3 in., while the routing width is 4 or 6 mils. Note that other lengths and widths may be used and selected to reduce reflections at either or both of the junctions.
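Purely by way of illustration, and without reproducing equation (2) itself, the following Python sketch shows how the junction-to-junction resonance frequency scales with routing length L. It assumes a quarter-wave resonance condition and an effective relative permittivity of about 4 (typical of FR-4); both are assumptions made here only because they are consistent with the roughly 10 GHz figure cited above for the 0.15 in. routing, and they are not taken from the disclosure.

    # Illustrative sketch only: scaling of the junction-to-junction resonance
    # frequency with routing length L. The quarter-wave condition and the
    # effective relative permittivity of 4.0 are ASSUMED; the disclosure gives
    # the exact relation in equation (2).
    from math import sqrt

    C = 299_792_458.0          # speed of light in vacuum, m/s
    EPS_EFF = 4.0              # assumed effective relative permittivity
    INCH = 0.0254              # meters per inch

    def resonance_ghz(length_in):
        """Approximate quarter-wave resonance frequency (GHz) for a routing length in inches."""
        length_m = length_in * INCH
        return C / (4.0 * length_m * sqrt(EPS_EFF)) / 1e9

    for length in (0.7, 0.4, 0.3, 0.15):
        print(f"L = {length:>4} in -> f_res ~ {resonance_ghz(length):.1f} GHz")
    # 0.7 in -> ~2.1 GHz; 0.15 in -> ~9.8 GHz, consistent with the ~10 GHz cited above.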
The second of the three techniques involves matching the routing impedance of the interconnect with the junction impedance. Since junctions have much smaller impedance than the regular routing, the interconnect impedance can be chosen to match that of the two junctions, so that the reflections between the two junctions are substantially reduced, and potentially eliminated. More specifically, as shown in formula (3) below, the impedance of the routing is approximately proportional to the dielectric height h, and is approximately inversely proportional to the routing width w and the dielectric constant ∈.
Thus, the impedance can be matched in one or more of the following three ways: increasing the routing width; reducing the routing dielectric thickness; or increasing the routing dielectric constant. Note that increasing the routing width by several mils adds a small amount of crosstalk in DDR channel implementations when compared to thinner interconnect routings, but this is negligible compared to the overall crosstalk in the DDR channel because some components, including the DIMM connectors, dominate the channel crosstalk and are typically more than 10 dB higher than the crosstalk of the junction-to-junction interconnect routing. Note, in one embodiment, the benefits are increased, and potentially maximized, based on selection of the routing width.
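Formula (3) itself is not reproduced here. Purely as an illustrative sketch, the following Python snippet uses a common closed-form microstrip approximation (IPC-2141 style, not formula (3) of this disclosure) to show the qualitative trend stated above: the characteristic impedance rises with dielectric height h and falls as the routing width w or the dielectric constant increases. The dimensions and dielectric constant are assumed example values.

    # Illustrative sketch only: a common closed-form microstrip approximation,
    # NOT formula (3) from the disclosure, used to show the trend that Z0 rises
    # with dielectric height h and falls with routing width w and dielectric
    # constant er. Dimensions (in mils) and er are assumed example values.
    from math import log, sqrt

    def microstrip_z0(w_mils, h_mils, er, t_mils=1.4):
        """Approximate microstrip characteristic impedance in ohms."""
        return 87.0 / sqrt(er + 1.41) * log(5.98 * h_mils / (0.8 * w_mils + t_mils))

    er = 4.2  # assumed dielectric constant
    for w, h in [(4, 4), (6, 4), (4, 3), (4, 2.5)]:
        print(f"w = {w} mil, h = {h} mil -> Z0 ~ {microstrip_z0(w, h, er):.0f} ohm")
    # Wider routing or a thinner dielectric lowers Z0 toward the reduced junction impedance.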
In
The third of the three techniques involves changing from a traditional topology with 2 junctions to a new “+” topology with 1 junction.
Note that there is no limitation on the location of the “+” junction in the channel, but the empty connector effect needs to be handled appropriately.
In
In
Table 2 illustrates a comparison of the routing lengths of the four different “+” junction topologies in
Table 3 illustrates a comparison of the PCB routing densities of three different “+” junction topologies (shown in
Note that in one embodiment the three topologies of
Topologies with Staggered Vias
In one embodiment, the interconnect routing used for a channel includes staggered transition vias. One embodiment of this arrangement is shown in block diagram form in
The interconnect topology of
In order to mitigate some of the limiting factors, a hybrid T topology with staggered transition vias as illustrated in
Referring to
CPU 101 is connected to the circuit board on the same side as DIMMs 110-112. CPU 101 is connected to via 901 in the circuit board using a surface mount connector. Via 901 is connected to staggered transition via 905 through interconnect portion 902. In one embodiment, interconnect portion 902 comprises a stripline or microstrip.
Note that in alternative embodiments, the CPU and DIMMs are connected to their respective vias and micro-vias using connectors other than surface mount connectors.
Referring to
In one embodiment, a processing element refers to hardware or logic to support a software thread. Examples of hardware processing elements include: a thread unit, a thread slot, a thread, a process unit, a context, a context unit, a logical processor, a hardware thread, a core, and/or any other element, which is capable of holding a state for a processor, such as an execution state or architectural state. In other words, a processing element, in one embodiment, refers to any hardware capable of being independently associated with code, such as a software thread, operating system, application, or other code. A physical processor (or processor socket) typically refers to an integrated circuit, which potentially includes any number of other processing elements, such as cores or hardware threads.
A core often refers to logic located on an integrated circuit capable of maintaining an independent architectural state, wherein each independently maintained architectural state is associated with at least some dedicated execution resources. In contrast to cores, a hardware thread typically refers to any logic located on an integrated circuit capable of maintaining an independent architectural state, wherein the independently maintained architectural states share access to execution resources. As can be seen, when certain resources are shared and others are dedicated to an architectural state, the line between the nomenclature of a hardware thread and core overlaps. Yet often, a core and a hardware thread are viewed by an operating system as individual logical processors, where the operating system is able to individually schedule operations on each logical processor.
Physical processor 1400, as illustrated in
As depicted, core 1401 includes two hardware threads 1401a and 1401b, which may also be referred to as hardware thread slots 1401a and 1401b. Therefore, software entities, such as an operating system, in one embodiment potentially view processor 1400 as four separate processors, i.e., four logical processors or processing elements capable of executing four software threads concurrently. As alluded to above, a first thread is associated with architecture state registers 1401a, a second thread is associated with architecture state registers 1401b, a third thread may be associated with architecture state registers 1402a, and a fourth thread may be associated with architecture state registers 1402b. Here, each of the architecture state registers (1401a, 1401b, 1402a, and 1402b) may be referred to as processing elements, thread slots, or thread units, as described above. As illustrated, architecture state registers 1401a are replicated in architecture state registers 1401b, so individual architecture states/contexts are capable of being stored for logical processor 1401a and logical processor 1401b. In core 1401, other smaller resources, such as instruction pointers and renaming logic in allocator and renamer block 1430 may also be replicated for threads 1401a and 1401b. Some resources, such as re-order buffers in reorder/retirement unit 1435, ILTB 1420, load/store buffers, and queues may be shared through partitioning. Other resources, such as general purpose internal registers, page-table base register(s), low-level data-cache and data-TLB 1415, execution unit(s) 1440, and portions of out-of-order unit 1435 are potentially fully shared.
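As a purely illustrative aside, and not as part of any embodiment, the following Python snippet shows how system software typically enumerates the logical processors described above; os.cpu_count() is a standard-library call that reports logical processors (hardware thread slots), while the optional third-party psutil package can additionally report physical cores.

    # Illustrative sketch only: how an operating system / runtime exposes the
    # logical processors (hardware thread slots) described above.
    import os

    logical = os.cpu_count()                          # number of logical processors
    print(f"logical processors: {logical}")

    try:
        import psutil                                 # optional third-party package
        physical = psutil.cpu_count(logical=False)    # number of physical cores
        print(f"physical cores: {physical}")          # e.g. 2 cores with 2-way SMT -> 4 logical processors
    except ImportError:
        pass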
Processor 1400 often includes other resources, which may be fully shared, shared through partitioning, or dedicated by/to processing elements. In
Core 1401 further includes decode module 1425 coupled to fetch unit 1420 to decode fetched elements. Fetch logic, in one embodiment, includes individual sequencers associated with thread slots 1401a, 1401b, respectively. Usually core 1401 is associated with a first ISA, which defines/specifies instructions executable on processor 1400. Often machine code instructions that are part of the first ISA include a portion of the instruction (referred to as an opcode), which references/specifies an instruction or operation to be performed. Decode logic 1425 includes circuitry that recognizes these instructions from their opcodes and passes the decoded instructions on in the pipeline for processing as defined by the first ISA. For example, as discussed in more detail below, decoders 1425, in one embodiment, include logic designed or adapted to recognize specific instructions, such as a transactional instruction. As a result of the recognition by decoders 1425, the architecture or core 1401 takes specific, predefined actions to perform tasks associated with the appropriate instruction. It is important to note that any of the tasks, blocks, operations, and methods described herein may be performed in response to a single or multiple instructions, some of which may be new or old instructions. Note decoders 1426, in one embodiment, recognize the same ISA (or a subset thereof). Alternatively, in a heterogeneous core environment, decoders 1426 recognize a second ISA (either a subset of the first ISA or a distinct ISA).
In one example, allocator and renamer block 1430 includes an allocator to reserve resources, such as register files to store instruction processing results. However, threads 1401a and 1401b are potentially capable of out-of-order execution, where allocator and renamer block 1430 also reserves other resources, such as reorder buffers to track instruction results. Unit 1430 may also include a register renamer to rename program/instruction reference registers to other registers internal to processor 1400. Reorder/retirement unit 1435 includes components, such as the reorder buffers mentioned above, load buffers, and store buffers, to support out-of-order execution and later in-order retirement of instructions executed out-of-order.
Scheduler and execution unit(s) block 1440, in one embodiment, includes a scheduler unit to schedule instructions/operations on execution units. For example, a floating point instruction is scheduled on a port of an execution unit that has an available floating point execution unit. Register files associated with the execution units are also included to store instruction processing results. Exemplary execution units include a floating point execution unit, an integer execution unit, a jump execution unit, a load execution unit, a store execution unit, and other known execution units.
Lower level data cache and data translation buffer (D-TLB) 1450 are coupled to execution unit(s) 1440. The data cache is to store recently used/operated on elements, such as data operands, which are potentially held in memory coherency states. The D-TLB is to store recent virtual/linear to physical address translations. As a specific example, a processor may include a page table structure to break physical memory into a plurality of virtual pages.
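Purely as an illustrative sketch, and assuming conventional 4 KB pages (a common page size that this disclosure does not specify), the following Python snippet shows the address arithmetic that a page table walk or D-TLB lookup implements: a virtual address is split into a virtual page number, which is translated, and a page offset, which passes through unchanged. The toy page-table mapping is hypothetical.

    # Illustrative sketch only: virtual-to-physical address arithmetic assuming
    # 4 KB pages (an assumed, common page size; not specified by the disclosure).
    PAGE_SHIFT = 12                      # 4 KB = 2**12 bytes
    PAGE_MASK = (1 << PAGE_SHIFT) - 1

    def translate(vaddr, page_table):
        """Look up the physical frame for a virtual address via a toy page table (dict)."""
        vpn = vaddr >> PAGE_SHIFT        # virtual page number
        offset = vaddr & PAGE_MASK       # byte offset within the page
        frame = page_table[vpn]          # a real D-TLB caches recent vpn -> frame entries
        return (frame << PAGE_SHIFT) | offset

    toy_table = {0x12345: 0x00042}       # hypothetical mapping
    print(hex(translate(0x12345678, toy_table)))   # -> 0x42678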
Here, cores 1401 and 1402 share access to higher-level or further-out cache, such as a second level cache associated with on-chip interface 1410. Note that higher-level or further-out refers to cache levels increasing or getting further away from the execution unit(s). In one embodiment, higher-level cache is a last-level data cache—last cache in the memory hierarchy on processor 1400—such as a second or third level data cache. However, higher level cache is not so limited, as it may be associated with or include an instruction cache. A trace cache—a type of instruction cache—instead may be coupled after decoder 1425 to store recently decoded traces. Here, an instruction potentially refers to a macro-instruction (i.e., a general instruction recognized by the decoders), which may decode into a number of micro-instructions (micro-operations).
In the depicted configuration, processor 1400 also includes on-chip interface module 1410. Historically, a memory controller, which is described in more detail below, has been included in a computing system external to processor 1400. In this scenario, on-chip interface 1410 is to communicate with devices external to processor 1400, such as system memory 1475, a chipset (often including a memory controller hub to connect to memory 1475 and an I/O controller hub to connect peripheral devices), a memory controller hub, a northbridge, or other integrated circuit. And in this scenario, bus 1405 may include any known interconnect, such as a multi-drop bus, a point-to-point interconnect, a serial interconnect, a parallel bus, a coherent (e.g., cache coherent) bus, a layered protocol architecture, a differential bus, and a GTL bus.
Memory 1475 may be dedicated to processor 1400 or shared with other devices in a system. Common examples of types of memory 1475 include DRAM, SRAM, non-volatile memory (NV memory), and other known storage devices. In one embodiment, the memory channel to interface the memory to the remainder of the computing system includes an interconnect topology described above. As discussed above, the memory can be in accordance with a Joint Electron Devices Engineering Council (JEDEC) low power double data rate (LPDDR)-based design such as the LPDDR standards being referred to as LPDDR3 or LPDDR4. In various implementations the individual memory devices may be of different package types such as single die package (SDP), dual die package (DDP) or quad die package (Q17P). These devices, in some embodiments, are directly soldered onto a motherboard to provide a lower profile solution. In a particular illustrative embodiment, memory is sized between 2 GB and 16 GB, and may be configured as a DDR3LM package or an LPDDR2 or LPDDR3 memory that is soldered onto a motherboard via a ball grid array (BGA). In one embodiment, the memory channel comprises an interconnect topology described above.
Note that device 1480 may include a graphic accelerator, processor or card coupled to a memory controller hub, data storage coupled to an I/O controller hub, a wireless transceiver, a flash device, an audio controller, a network controller, or other known device.
Recently, however, as more logic and devices are being integrated on a single die, such as an SOC, each of these devices may be incorporated on processor 1400. For example, in one embodiment, a memory controller hub is on the same package and/or die with processor 1400. Here, a portion of the core (an on-core portion) 1410 includes one or more controller(s) for interfacing with other devices such as memory 1475 or a graphics device 1480. The configuration including an interconnect and controllers for interfacing with such devices is often referred to as an on-core (or un-core) configuration. As an example, on-chip interface 1410 includes a ring interconnect for on-chip communication and a high-speed serial point-to-point link 1405 for off-chip communication. Yet, in the SOC environment, even more devices, such as the network interface, co-processors, memory 1475, graphics processor 1480, and any other known computer devices/interfaces may be integrated on a single die or integrated circuit to provide a small form factor with high functionality and low power consumption.
One interconnect fabric architecture includes the Peripheral Component Interconnect (PCI) Express (PCIe) architecture. The more recent versions of PCI Express take advantage of advances in point-to-point interconnects, Switch-based technology, and packetized protocol to deliver new levels of performance and features. Power Management, Quality Of Service (QoS), Hot-Plug/Hot-Swap support, Data Integrity, and Error Handling are among some of the advanced features supported by PCI Express.
In some embodiments, a system comprises: a processor; a plurality of devices; and a channel coupling the processor to the plurality of devices, the channel having an interconnect topology with a plurality of interconnect portions coupled together with two or more junctions, at least one of the two or more junctions having first and second interconnect portions that cross each other to form a plus-shaped junction, and wherein interconnect routing between the two or more junctions has an impedance matched to the impedance of the two or more junctions. In some embodiments, the channel comprises a memory channel with multiple slots for interfacing to DDR memory devices.
In some embodiments, the first and second interconnect portions of the plus-shaped junction of the interconnect topology of a channel are perpendicular to each other.
In some embodiments, the first interconnect portion of the plus-shaped junction of the interconnect topology of a channel is connected to a third interconnect portion at a slot for one of the devices. In some embodiments, the one device is closest in the channel to the processor. In some embodiments, the plurality of devices comprises three devices and the one device is between two of the three devices.
In some embodiments, the plus-shaped junction of the interconnect topology of a channel is located between two of the plurality of devices closest in the channel to the processor.
In some embodiments, the routing length of an interconnect routing between two junctions of the two or more junctions is set based on resonance frequency of the interconnect routing between the two junctions, effective relative permittivity of the interconnect routing, and electromagnetic wave speed.
In some embodiments, interconnect routing between the two or more junctions has an impedance matched to impedance of the two or more junctions by at least one of: increasing routing width of the interconnect routing; reducing routing dielectric thickness of the interconnect routing; and increasing routing dielectric constant of the interconnect routing.
In some embodiments, the topology includes a staggered transition via in which a first interconnect portion connecting a second interconnect portion, which is connected to the processor, to a third interconnect portion, which is connected to a first set of devices, is connected at a location at the second interconnect portion away from being directly below any of the first set of devices. In some embodiments, the first interconnect portion of the staggered transition via is connected to the second interconnect portion at a location between two devices in the first set of devices. In some embodiments, the second interconnect portion is coupled to one or more devices of the first set of devices via micro-vias. In some embodiments, the first set of devices comprises a plurality of DIMMs. In some embodiments, the third interconnect portion comprises a stripline or microstrip.
In some embodiments, a channel for use in providing communication between a processor and a plurality of devices includes an interconnect topology with a plurality of interconnect portions coupled together with two or more junctions, at least one of the two or more junctions having first and second interconnect portions that cross each other to form a plus-shaped junction, and wherein interconnect routing between the two or more junctions has an impedance matched to the impedance of the two or more junctions.
In some embodiments, the first and second interconnect portions of the plus-shaped junction of the interconnect topology of the channel are perpendicular to each other. In some embodiments, the first interconnect portion of the plus-shaped junction is connected to a third interconnect portion at a slot for one of the devices, and wherein the one device is closest in the channel to the processor or is between two of three devices. In some embodiments, the plus-shaped junction is located between two of the plurality of devices closest in the channel to the processor. In some embodiments, the interconnect routing between the two or more junctions has an impedance matched to impedance of the two or more junctions by at least one of: increasing routing width of the interconnect routing; reducing routing dielectric thickness of the interconnect routing; and increasing routing dielectric constant of the interconnect routing.
In some embodiments, the topology of the channel includes a staggered transition via in which a first interconnect portion connecting a second interconnect portion, which is connected to the processor, to a third interconnect portion, which is connected to a first set of devices, is connected at a location at the second interconnect portion away from being directly below any of the first set of devices. In some embodiments, the first interconnect portion of the staggered transition via is connected to the second interconnect portion at a location between two devices in the first set of devices.
In some embodiments, a method for reducing multiple reflections between junctions in a channel having an interconnect topology includes: communicating information from a processor to one or more of a plurality of devices using a channel; and communicating information to the processor from one or more of the plurality of devices using the channel, wherein the channel has an interconnect topology with a plurality of interconnect portions coupled together with two or more junctions, at least one of the two or more junctions having first and second interconnect portions that cross each other to form a plus-shaped junction, and wherein interconnect routing between the two or more junctions has an impedance matched to the impedance of the two or more junctions. In some embodiments, the first and second interconnect portions of the plus-shaped junction of the interconnect topology are perpendicular to each other.
In some embodiments, the first interconnect portion of the plus-shaped junction is connected to a third interconnect portion at a slot for one of the devices, and the one device is closest in the channel to the processor, is between two of three devices, or is located between two of the devices closest in the channel to the processor.
In some embodiments, the topology includes a staggered transition via in which a first interconnect portion connecting a second interconnect portion, which is connected to the processor, to a third interconnect portion, which is connected to a first set of devices, is connected at a location at the second interconnect portion away from being directly below any of the first set of devices.
Use of the phrase ‘to’ or ‘configured to,’ in one embodiment, refers to arranging, putting together, manufacturing, offering to sell, importing and/or designing an apparatus, hardware, logic, or element to perform a designated or determined task. In this example, an apparatus or element thereof that is not operating is still ‘configured to’ perform a designated task if it is designed, coupled, and/or interconnected to perform said designated task. As a purely illustrative example, a logic gate may provide a 0 or a 1 during operation. But a logic gate ‘configured to’ provide an enable signal to a clock does not include every potential logic gate that may provide a 1 or 0. Instead, the logic gate is one coupled in some manner that during operation the 1 or 0 output is to enable the clock. Note once again that use of the term ‘configured to’ does not require operation, but instead focuses on the latent state of an apparatus, hardware, and/or element, where in the latent state the apparatus, hardware, and/or element is designed to perform a particular task when the apparatus, hardware, and/or element is operating.
Furthermore, use of the phrases ‘capable of/to’ and/or ‘operable to,’ in one embodiment, refers to some apparatus, logic, hardware, and/or element designed in such a way to enable use of the apparatus, logic, hardware, and/or element in a specified manner. Note as above that use of ‘to,’ ‘capable to,’ or ‘operable to,’ in one embodiment, refers to the latent state of an apparatus, logic, hardware, and/or element, where the apparatus, logic, hardware, and/or element is not operating but is designed in such a manner to enable use of an apparatus in a specified manner.
Reference throughout this specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. Thus, the appearances of the phrases “in one embodiment” or “in an embodiment” in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
In the foregoing specification, a detailed description has been given with reference to specific exemplary embodiments. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than a restrictive sense. Furthermore, the foregoing use of embodiment and other exemplary language does not necessarily refer to the same embodiment or the same example, but may refer to different and distinct embodiments, as well as potentially the same embodiment.
Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims which in themselves recite only those features regarded as essential to the invention.