Embodiments of the present invention relate generally to the technical field of electronic circuits, and more particularly to resonant rotary clocking in electronic circuits.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure. Unless otherwise indicated herein, the approaches described in this section are not prior art to the claims in the present disclosure and are not admitted to be prior art by inclusion in this section.
The silicon industry is moving towards die disintegration and chiplet-based systems, in which smaller heterogeneous dies are integrated on a single substrate to obtain superior functionality and enhanced operating characteristics. Designing a robust, high-speed, low-skew, low-jitter, and low-power clock across such chiplet-based systems is extremely challenging. The traditional globally asynchronous locally synchronous (GALS) solution carries design overhead and verification challenges that have distanced designers from asynchronous solutions in general. However, enabling clock synchronization for a chiplet-based system (across multiple dies) is extremely difficult and remains a key challenge in multi-die systems.
Embodiments will be readily understood by the following detailed description in conjunction with the accompanying drawings. To facilitate this description, like reference numerals designate like structural elements. Embodiments are illustrated by way of example and not by way of limitation in the figures of the accompanying drawings.
Various embodiments herein provide apparatuses, systems, and methods for resonant rotary clocking for die-to-die communication. For example, a multi-die system may include an interposer and two or more dies coupled to the interposer. The interposer may include a resonant rotary ring structure to form one or more resonant rotary oscillators (e.g., of a resonant rotary oscillator array). The resonant rotary oscillators may be traveling wave and/or standing wave oscillators. The dies may tap respective clock signals from the rotary ring structure and use the clock signals for die-to-die communication.
In the following detailed description, reference is made to the accompanying drawings that form a part hereof wherein like numerals designate like parts throughout, and in which is shown by way of illustration embodiments that may be practiced. It is to be understood that other embodiments may be utilized and structural or logical changes may be made without departing from the scope of the present disclosure. Therefore, the following detailed description is not to be taken in a limiting sense, and the scope of embodiments is defined by the appended claims and their equivalents.
Various operations may be described as multiple discrete actions or operations in turn, in a manner that is most helpful in understanding the claimed subject matter. However, the order of description should not be construed as to imply that these operations are necessarily order dependent. In particular, these operations may not be performed in the order of presentation. Operations described may be performed in a different order than the described embodiment. Various additional operations may be performed and/or described operations may be omitted in additional embodiments.
The terms “substantially,” “close,” “approximately,” “near,” and “about,” generally refer to being within +/−10% of a target value. Unless otherwise specified the use of the ordinal adjectives “first,” “second,” and “third,” etc., to describe a common object, merely indicate that different instances of like objects are being referred to, and are not intended to imply that the objects so described must be in a given sequence, either temporally, spatially, in ranking or in any other manner.
For the purposes of the present disclosure, the phrases “A and/or B” and “A or B” mean (A), (B), or (A and B). For the purposes of the present disclosure, the phrase “A, B, and/or C” means (A), (B), (C), (A and B), (A and C), (B and C), or (A, B, and C).
The description may use the phrases “in an embodiment,” or “in embodiments,” which may each refer to one or more of the same or different embodiments. Furthermore, the terms “comprising,” “including,” “having,” and the like, as used with respect to embodiments of the present disclosure, are synonymous.
As used herein, the term “circuitry” may refer to, be part of, or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group), a combinational logic circuit, and/or other suitable hardware components that provide the described functionality. As used herein, “computer-implemented method” may refer to any method executed by one or more processors, a computer system having one or more processors, a mobile device such as a smartphone (which may include one or more processors), a tablet, a laptop computer, a set-top box, a gaming console, and so forth.
Rotary traveling wave oscillators (RTWO) may include a ring structure on which the clock signal travels as a traveling wave. Multiple RTWOs may be coupled to one another in a rotary oscillator array (ROA) to distribute the clock signal over a larger area. For example,
In embodiments, the RTWO may be modeled as an inductor-capacitor (LC) oscillator, where the frequency $f_{osc}$ is estimated by:
$$f_{osc} = \frac{v_p}{2l} = \frac{1}{2\sqrt{L_T C_T}} \qquad (1)$$
In Equation (1), $v_p$ is the phase velocity and $l$ is the length/perimeter of the ring. The factor of 2 in the denominator arises from the fact that the pulse requires two complete laps for a single cycle. Further, $L_T$ and $C_T$ denote the total inductance and total capacitance of the rotary ring, respectively. $L_T$ depends on the geometry of the rotary ring, and $C_T$ includes the capacitance of the ring, the interconnects, and the devices connected to the rotary ring.
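The frequency estimate can be checked numerically. The following is a minimal sketch, assuming Equation (1) takes the commonly cited RTWO form $f_{osc} = v_p/(2l) = 1/(2\sqrt{L_T C_T})$ for a uniform ring; the function names and the component values (2.5 nH, 6.25 pF, a 4 mm perimeter) are illustrative assumptions and are not taken from the disclosure.

```python
import math


def rtwo_frequency_from_lc(l_total_h: float, c_total_f: float) -> float:
    """Estimate f_osc = 1 / (2 * sqrt(L_T * C_T)) in hertz."""
    return 1.0 / (2.0 * math.sqrt(l_total_h * c_total_f))


def rtwo_frequency_from_velocity(v_p_m_per_s: float, perimeter_m: float) -> float:
    """Estimate f_osc = v_p / (2 * l) in hertz."""
    return v_p_m_per_s / (2.0 * perimeter_m)


# Illustrative (assumed) ring parameters.
L_T = 2.5e-9        # total ring inductance, henries
C_T = 6.25e-12      # total ring capacitance, farads
PERIMETER = 4.0e-3  # ring perimeter, metres

# For a uniform line, v_p = l / sqrt(L_T * C_T), so both forms agree.
V_P = PERIMETER / math.sqrt(L_T * C_T)

f_from_lc = rtwo_frequency_from_lc(L_T, C_T)
f_from_vp = rtwo_frequency_from_velocity(V_P, PERIMETER)

if __name__ == "__main__":
    print(f"f_osc from L_T, C_T: {f_from_lc / 1e9:.3f} GHz")
    print(f"f_osc from v_p, l:   {f_from_vp / 1e9:.3f} GHz")
```

With these assumed values both forms give roughly 4 GHz. Because $C_T$ includes the tapped devices, adding load to the ring lowers the estimated frequency.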
Various embodiments herein include techniques to use ROAs to provide clock synchronization across a multi-die system (MDS) for die-to-die (D2D) communication (e.g., D2D input-output (IO)). The MDS may include, for example, a System-In-Package (SiP). The MDS may include multiple dies coupled to a common base die (e.g., interposer) and/or otherwise integrated into a same package. The dies may include heterogeneous dies of different types and/or capabilities. Additionally, or alternatively, the dies may include multiple similar/same dies. For example, the dies may include one or more processor dies, memory dies, graphics processor dies, input-output (IO) dies, power management dies, and/or other suitable types of die.
Aspects of various embodiments herein may include, but are not limited to:
In various embodiments, the resonant clocking circuit may be implemented in a multi-die system using a passive or active interposer (also referred to as a base die).
In some embodiments, the multi-die system 200 may include an active base die 204. For example,
The resonant rings in the base die 204 may enable the dies 202 to tap synchronized clock signals with deterministic phase points. In some embodiments, the base die 204 may include bumps 214 coupled to a lower surface of the base die 204, e.g., to mount the multi-die system on a motherboard or another circuit structure. The bumps 214 may be larger (e.g., C4 bumps) than the μ-bumps 206 used to couple the die 202 to the base die 204 in some embodiments.
Silicon interposer-based systems allow for integration of heterogeneous dies capitalizing on the yield and cost benefits. The footprint on the interposer is important because passive interposers demonstrate superior yield with cost reduction through die partitioning, while active interposers demonstrate superior performance while trading-off with cost/yield. Embodiments herein enable the resonant clocking circuit to be used with either a passive or active interposer.
In various embodiments, the synchronized resonant clock signal described herein may be used for die-to-die (D2D) communication (also referred to as D2D input-output (IO)), e.g., in a multi-die system. In some embodiments, the D2D communication may use alternate phase tap-off points at the transmit (Tx) side and receive (Rx) side (e.g., at the Tx side serializer and Rx side deserializer), as described further below.
The figures of merit for D2D IO include the bandwidth (BW)/mm and energy/bit. In prior implementations, D2D IO typically includes a phase-locked loop (PLL) on the Tx side to generate high speed edges, which are used to serialize data and a strobe bundle (e.g., for higher BW/mm). This bundle is forwarded to the receiver, where it is deserialized and transitioned to the Rx die clock domain (e.g., using a clock-domain-crossing (CDC) first-in/first-out (FIFO)).
This IO clocking infrastructure (e.g., PLLs on the Tx side, drivers for strobe forwarding, and delay-locked loops (DLLs) on the Rx side) contributes to a dominant chunk of overall energy/bit of the D2D IO and occupies significant area footprint. To improve energy/bit, schemes typically adopt approaches such as voltage scaling or balancing the ratio of data lines/strobe.
In accordance with various embodiments herein, a rotary oscillator array may be laid out across the base die (e.g., interposer), providing deterministic phase points across the multi-die system (e.g., SiP). In embodiments, this resonant clock may be used as the common IO clock for D2D communication. For example, the resonant clock signal may be tapped at deterministic phase points at the Tx and Rx side of D2D IO within respective dies (e.g., chiplets). The use of the synchronized resonant clock signal may avoid the need for Tx-side PLLs, strobe forwarding, and Rx-side DLLs of prior techniques, thereby reducing the overall energy/bit and/or area footprint of D2D IO.
In some embodiments, alternate phase tap-off points may be used at the Tx side and Rx side for D2D IO. For example, on the Tx side, data may be serialized using Phase-A of the resonant clock signal. On the Rx side, data may be captured using Phase-B of the resonant clock. The captured data may be de-serialized and passed to CDC FIFO. In one example, Phase-B leads Phase-A in phase, e.g., by 45-135 degrees. The phase lead between Phase-B and Phase-A may be used to improve setup margin of the IO scheme (e.g., by 16-37%), thereby enabling faster operation.
The D2D IO scheme described herein may be complementary to voltage scaling techniques. Accordingly, voltage scaling may also be used in some embodiments. Additionally, the lack of strobe forward lines means that more data lines can be included in the same circuit area.
The dies 302a-f may further include respective IO circuitry (e.g., PHY circuitry) to communicate with other dies of the dies 302a-f. For example,
In embodiments, the IO circuitries 308a-b may communicate (e.g., transmit and/or receive data) with one another via a bus 310. The bus 310 may include one or more communication paths (e.g., data wires). For example, the bus 310 may include a plurality of communication paths for parallel communication. In various embodiments, the resonant clock signals described herein may be used at the dies 302a-b to serialize data (e.g., at the Tx side) and/or de-serialize data (e.g., at the Rx side) that is transmitted on the bus 310.
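The serialize/de-serialize step can be illustrated in software. The following is a behavioral sketch only, assuming LSB-first ordering and one bit per tapped clock edge; the bit order, the word width, and the function names are hypothetical choices and do not describe the circuits of the disclosure.

```python
from typing import List


def serialize(words: List[int], width: int) -> List[int]:
    """Flatten parallel words into a bit stream, LSB first (assumed order)."""
    return [(word >> i) & 1 for word in words for i in range(width)]


def deserialize(bits: List[int], width: int) -> List[int]:
    """Regroup a bit stream into parallel words, LSB first (assumed order)."""
    if len(bits) % width != 0:
        raise ValueError("bit stream length must be a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        word = 0
        for i, bit in enumerate(bits[start:start + width]):
            word |= bit << i
        words.append(word)
    return words


if __name__ == "__main__":
    tx_words = [0xA5, 0x3C]
    stream = serialize(tx_words, width=8)   # Tx side: parallel to serial
    rx_words = deserialize(stream, width=8)  # Rx side: serial to parallel
    print(stream, rx_words)
```

In hardware, each bit of the stream would be launched and captured on edges of the tapped resonant clock; the sketch only models the data reordering.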
For example,
The IO circuitry 308b may include one or more Rx drivers 320 coupled to the bus 310, one or more deserializers 322 coupled to respective Rx drivers 320, and logic 324 coupled to the one or more deserializers 322. The one or more Rx drivers 320 may receive the serialized data and pass the serialized data to respective deserializers 322. The deserializers 322 may tap a resonant clock 318b from a resonant clock circuitry (e.g., resonant rings 306 of
The IO circuitry 308a-b is merely one example, and other configurations of IO circuitry may be used with resonant clock signals in accordance with various embodiments herein. For example, some embodiments may not use serializers and deserializers. In one such example, the transmitter may send data directly based on a Tx clock. The Tx clock may be the resonant clock signal or a frequency-adjusted version of the resonant clock signal. For example, the resonant clock signal may be used as a global clock for the multi-die system, and the IO circuitries may use transmit/receive clocks that have a lower frequency than the global clock.
In some embodiments, the resonant clock signal may be tapped off at equal phase points at the Tx side and Rx side. Accordingly, the entire period of the clock signal may be used as the D2D transmission window (e.g., flight time+setup margin).
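For the equal-phase case, the timing budget reduces to a single comparison. The following is a minimal sketch, assuming the transmission window equals one clock period and must cover flight time plus setup margin; the function name and the 4 GHz, 150 ps, and 40 ps figures are illustrative assumptions.

```python
def equal_phase_slack_ps(clock_hz: float, flight_ps: float, setup_ps: float) -> float:
    """Return the slack left when Tx and Rx tap the clock at equal phase points.

    The window is taken to be one full clock period (assumption drawn from the
    text); a negative result means the link does not meet timing.
    """
    period_ps = 1e12 / clock_hz
    return period_ps - (flight_ps + setup_ps)


# Illustrative (assumed) numbers: 4 GHz clock, 150 ps flight, 40 ps setup.
slack = equal_phase_slack_ps(4e9, flight_ps=150.0, setup_ps=40.0)

if __name__ == "__main__":
    print(f"slack: {slack:.2f} ps")
```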
For example,
In other embodiments, a multi-phase tap-off scheme may be used for D2D IO.
In various embodiments, the tapped clock signal used at the Rx side may lead in phase the tapped clock signal used at the Tx side, e.g., by 45-135 degrees. For example, consider the second die 602b transmitting to the first die 602a in
In various embodiments, transmitting between any two phase points on the same line (e.g., corresponding tap points 606a-b) increases the transmission window, e.g., by ⅛ to ⅜ of the overall period.
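The stated range follows from converting the phase offset to a fraction of the period: 45 degrees is 45/360 = ⅛ of a cycle and 135 degrees is 135/360 = ⅜. The following is a minimal sketch of that conversion, assuming the window grows by exactly the phase offset between the Tx and Rx tap points; the function names and the 4 GHz clock are illustrative assumptions.

```python
from fractions import Fraction


def window_gain(phase_offset_deg: int) -> Fraction:
    """Fraction of a clock period added to the window by a phase offset."""
    return Fraction(phase_offset_deg, 360)


def transmission_window_ps(clock_hz: float, phase_offset_deg: int) -> float:
    """One period plus the phase-offset gain, in picoseconds (assumed model)."""
    period_ps = 1e12 / clock_hz
    return period_ps * float(1 + window_gain(phase_offset_deg))


if __name__ == "__main__":
    for offset in (45, 90, 135):
        gain = window_gain(offset)
        window = transmission_window_ps(4e9, offset)
        print(f"{offset:3d} deg -> +{gain} period, window {window:.2f} ps at 4 GHz")
```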
In some embodiments, the D2D IO traffic may be asynchronous between the two dies (as the timing is determined by the phase relationship between Tx side clock and IO clock). On the Rx side, the data may be de-serialized and handed off to the CDC FIFO.
In embodiments, the interposer 904 may include a resonant ring 906 disposed across the region adjoining the dies 902a and 902b. For example, a first long edge 908a of the resonant ring 906 may be partially or completely under the first die 902a and a second long edge 908b of the resonant ring 906 may be partially or completely under the second die 902b.
The dies 902a-b may include respective D2D PHY circuitry 910a-b. In some embodiments, the D2D PHY circuitry 910a-b may be above the respective long edge 908a-b of the resonant ring 906. The D2D PHY circuitry 910a-b may include data serializers and/or de-serializers for data transmission. In some embodiments, the D2D PHY circuitry 910a-b may further include inverter pairs to excite the resonant ring 906. The resonant ring 906 provides a common strobe, with a deterministic phase at any tap-off point. This strobe may be tapped off by the D2D PHY circuitry 910a-b from the nearest points at both the serializer and the de-serializer to convert between parallel data and a serial bit stream.
For example, at die 902a, the resonant ring 906 may be tapped to pull the high-speed IO clock, which is used to serialize data and transmit the serialized data to die 902b (e.g., via D2D interconnects 912). At die 902b, a local IO clock copy may be tapped off the nearest point of the resonant ring and used to deserialize the data. The deserialized data may be passed to a clock-domain crossing (CDC) first-in, first-out (FIFO) circuit.
The deterministic phase difference between tap off points at the dies 902a-b (e.g., as described in the next section and elsewhere herein) ensures a reliable setup/hold margin for the data going across the dies 902a-b.
Typically, the edge of a die is much longer than the die to die spacing.
In embodiments herein, an alternate track tap out scheme may be used. For example, if a signal is transmitted using the clock derived from the outer ring 1002b, it is received using the clock derived from the inner ring 1002a.
As an example, consider the IO located close to the middle of the resonant ring 1000. At a first die (e.g., the die 902a of
Serializers in the first die use the nearest resonant ring tap off point to transmit data. In
Various embodiments herein include custom rotary ring structures for a rotary oscillator array (e.g., for D2D IO). The custom rotary rings may include on-chip interconnects and inverter pairs that are terminated mobiusly (as described herein) to generate a resonating clock signal with 50% duty cycle. The custom rings and/or custom rotary oscillator arrays may be used for tapping clock signals for D2D IO. The resonant rings oscillate to generate an IO clock with deterministic phase points across dies. The IO clock may be used to serialize and de-serialize data.
For example,
As with the regular resonant rings, the custom resonant rings may be implemented in the interposer (e.g., silicon interposer). In some embodiments, the inverter pairs to excite the resonant ring may be implemented in the dies that are coupled to the interposer and/or in the interposer itself. The inverter pairs may replace conventional strobe infrastructure (e.g., PLLs, strobe drivers, DLLs) of prior IO circuits.
The techniques described herein may enable the use of custom rotary rings that may be employed for D2D IOs in multi-die systems (e.g., heterogeneous multi-die systems). The custom rings may be coupled to one another to form custom rotary oscillator arrays to distribute the required clocks across a large area (e.g., the whole reticle). Embodiments may include chiplet-aware resonant array implementation to identify the required clock tap-points for D2D IOs. Accordingly, the shape of the resonant rings may be designed to provide tap points at desired locations to the top dies coupled to the interposer. The traveling wave scheme provides deterministic delay, which may facilitate use in D2D IO circuits. This scheme may enable the use of either the same phase points on multiple custom rings and/or different phase points with deterministic delays on the custom rings for D2D IO.
With the resonant traveling wave scheme, it is possible to tap the clock signals from different points of the custom ring and provide them as inputs to the chiplets. As the delay/phase at the tapping points are deterministic, the difference in the phase/delay is used as the transmission window.
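The phase at a tap point follows from its position on the ring. The following is a minimal sketch, assuming a uniform ring on which the wave needs two laps per clock cycle (so one lap spans 180 degrees); the function names, the 4 mm perimeter, and the tap positions are illustrative assumptions.

```python
def tap_phase_deg(distance_mm: float, perimeter_mm: float) -> float:
    """Phase at a tap point, given its distance along the ring from a reference.

    Two laps make one 360-degree cycle, so phase = 360 * d / (2 * l).
    """
    return (360.0 * distance_mm / (2.0 * perimeter_mm)) % 360.0


def tap_delay_ps(phase_a_deg: float, phase_b_deg: float, clock_hz: float) -> float:
    """Delay from tap A to tap B implied by their phase difference."""
    period_ps = 1e12 / clock_hz
    return ((phase_b_deg - phase_a_deg) % 360.0) / 360.0 * period_ps


# Illustrative (assumed) layout: 4 mm ring, taps at 0.5 mm and 1.5 mm.
phase_tx = tap_phase_deg(0.5, perimeter_mm=4.0)
phase_rx = tap_phase_deg(1.5, perimeter_mm=4.0)
delay = tap_delay_ps(phase_tx, phase_rx, clock_hz=4e9)

if __name__ == "__main__":
    print(f"Tx tap {phase_tx} deg, Rx tap {phase_rx} deg, delay {delay} ps")
```

Under these assumptions the two taps sit 45 degrees apart, which corresponds to 31.25 ps at 4 GHz. Because the phase depends only on geometry, the delay is fixed once the tap points are chosen.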
For instance, for a 4 GHz resonant ring in
Other schemes for D2D IO using an array of custom rings may be used in accordance with various embodiments. For example,
Further examples of custom rotary array schemes are depicted in
In a rotary traveling wave oscillator (RTWO), the clock signal continues to move in an uninterrupted fashion until it encounters another wave along the medium or until it encounters a boundary with another medium. For an RTWO, the distributed inverter pairs enable the multiple phases. Rotary traveling waves may be implemented using square rings and/or custom rings, as described herein. Both square and custom rings can be distributed using array structures as described herein. A sample RTWO 1800 with a square ring rotary structure is shown in
In a standing wave (SW) scheme using transmission lines, each point on the transmission line generates a sine wave with different amplitude due to the parasitic losses. The ring-based standing wave clocking topology is motivated by the goal of combining the energy recycling feature of the rotary clock scheme with the constant phase (across all points in the ring) of the standing wave oscillator. The mobius termination back to the source is used where the standing wave ring is a single cross coupled rotary wave oscillator. A sample standing wave oscillator 1850 with a square ring standing wave structure is shown in
The implication of having the mobius connection at the location of the cross-coupled inverters is that the ring's clock information is dual phased. A clock recovery circuit is used to obtain the required clock. Note that, due to the dual-phased nature of the clock, the clock recovery circuits on one side need to have their polarity reversed compared to the ones on the other side to enable same-phase tapping. Equal and opposite phased waves will meet at the middle of this differential loop. A traveling wave that originates due to wire losses will find its opposite wave at this middle and cancel the opposite wave.
In the RTWO structure, because the wave propagates in one direction along the transmission line, multiple-phase signals can be obtained from different positions on the transmission line. In the case of a standing wave oscillator (SWO), the generated signals have the same phase and different amplitudes. Both the RTWO and SWO circuits share the property of distributing a high-frequency clock with low skew and low jitter, which can be used for global clocking.
Various embodiments herein may use standing wave oscillators for D2D IO. The standing wave oscillators may include rectangular (e.g., square or other rectangle) rings, and/or custom (e.g., rectilinear) rings. The oscillators may include interconnects and inverter pairs (e.g., on the chip and/or interposer) that are terminated mobiusly to generate a resonating clock signal. The embodiments may enable the use of resonant rings with standing wave clocks that can be employed for D2D IO in any multi-die system.
As discussed above with respect to the traveling wave oscillator embodiments, the ring structures may be implemented in the interposer (e.g., silicon interposer). The inverter pairs may be implemented in the dies that are coupled to the interposer and/or in the interposer itself. The ring oscillators may be stacked to form standing wave oscillator arrays to distribute the required clocks across the whole reticle. Embodiments may include a chiplet-aware resonant standing wave array implementation to identify the required clock tap-points for D2D IO.
One of the key properties of standing wave rings is that the clock phase is constant across the rings. However, the amplitude varies. A clock recovery circuit may be used to extract the square wave clock. The clocks can be tapped out of these structures with clock-recovery circuits and provided across dies which are used to serialize and de-serialize data. Thus, the standing wave rings enable constant phase clocks. Accordingly, the clock signals may be tapped from different convenient points (e.g., with inherent synchronization/phase alignment by construction) on the ring structures and provided as respective inputs to the dies (e.g., for D2D IO).
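The effect of clock recovery on constant-phase, variable-amplitude taps can be illustrated with an idealized model. The following is a minimal sketch, assuming each tap sees a sine wave and the recovery circuit behaves as an ideal zero-crossing comparator; the function names, the amplitudes, and the `invert` and `reverse_polarity` flags (modeling taps on opposite sides of the ring) are hypothetical.

```python
import math
from typing import List


def sample_tap(amplitude: float, phases_deg: List[float], invert: bool = False) -> List[float]:
    """Sample an idealized standing-wave tap; `invert` models the opposite side."""
    sign = -1.0 if invert else 1.0
    return [sign * amplitude * math.sin(math.radians(p)) for p in phases_deg]


def recover_clock(samples: List[float], reverse_polarity: bool = False) -> List[int]:
    """Ideal comparator: 1 when the (optionally inverted) sample is positive."""
    return [1 if (-s if reverse_polarity else s) > 0.0 else 0 for s in samples]


# 16 sample points per cycle, offset by half a step to avoid exact zero crossings.
PHASES = [(i + 0.5) * 22.5 for i in range(16)]

clk_strong = recover_clock(sample_tap(1.0, PHASES))
clk_weak = recover_clock(sample_tap(0.3, PHASES))
clk_opposite = recover_clock(sample_tap(0.6, PHASES, invert=True), reverse_polarity=True)

if __name__ == "__main__":
    print(clk_strong)
    print(clk_weak)
    print(clk_opposite)
```

In this model all three taps recover the same square wave regardless of amplitude. A real recovery circuit would also need enough amplitude to switch reliably, which the sketch does not model.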
The clock signals from the rotary oscillator 1906 may be provided to the dies 1902a-c. For example,
Furthermore, the resonant rings may be implemented to enable a favorable transmit/receive window for D2D IO, e.g., depending on the architecture and/or placement of the dies 2102a-e. Thus, a chiplet-placement aware resonant rotary clocking scheme may be implemented on the interposer for efficient D2D IO. Note that, as discussed herein, this scheme may be extended to custom ring based standing wave oscillator arrays and other array topologies.
The system 3750 includes processor circuitry in the form of one or more processors 3752. The processor circuitry 3752 includes circuitry such as, but not limited to, one or more processor cores and one or more of cache memory, low drop-out voltage regulators (LDOs), interrupt controllers, serial interfaces such as SPI, I2C or universal programmable serial interface circuit, real time clock (RTC), timer-counters including interval and watchdog timers, general purpose I/O, memory card controllers such as secure digital/multi-media card (SD/MMC) or similar interfaces, mobile industry processor interface (MIPI) interfaces and Joint Test Action Group (JTAG) test access ports. In some implementations, the processor circuitry 3752 may include one or more hardware accelerators (e.g., same or similar to acceleration circuitry 3764), which may be microprocessors, programmable processing devices (e.g., FPGA, ASIC, etc.), or the like. The one or more accelerators may include, for example, computer vision and/or deep learning accelerators. In some implementations, the processor circuitry 3752 may include on-chip memory circuitry, which may include any suitable volatile and/or non-volatile memory, such as DRAM, SRAM, EPROM, EEPROM, Flash memory, solid-state memory, and/or any other type of memory device technology, such as those discussed herein.
The processor circuitry 3752 may include, for example, one or more processor cores (CPUs), application processors, GPUs, RISC processors, Acorn RISC Machine (ARM) processors, CISC processors, one or more DSPs, one or more FPGAs, one or more PLDs, one or more ASICs, one or more baseband processors, one or more radio-frequency integrated circuits (RFIC), one or more microprocessors or controllers, a multi-core processor, a multithreaded processor, an ultra-low voltage processor, an embedded processor, or any other known processing elements, or any suitable combination thereof. The processors (or cores) 3752 may be coupled with or may include memory/storage and may be configured to execute instructions stored in the memory/storage to enable various applications or operating systems to run on the platform 3750. The processors (or cores) 3752 are configured to operate application software to provide a specific service to a user of the platform 3750. In some embodiments, the processor(s) 3752 may be a special-purpose processor(s)/controller(s) configured (or configurable) to operate according to the various embodiments herein.
As examples, the processor(s) 3752 may include an Intel® Architecture Core™ based processor such as an i3, an i5, an i7, an i9 based processor; an Intel® microcontroller-based processor such as a Quark™, an Atom™, or other MCU-based processor; Pentium® processor(s), Xeon® processor(s), or another such processor available from Intel® Corporation, Santa Clara, California. However, any number of other processors may be used, such as one or more of Advanced Micro Devices (AMD) Zen® Architecture such as Ryzen® or EPYC® processor(s), Accelerated Processing Units (APUs), MxGPUs, or the like; A5-A12 and/or S1-S4 processor(s) from Apple® Inc., Snapdragon™ or Centriq™ processor(s) from Qualcomm® Technologies, Inc., Texas Instruments, Inc.® Open Multimedia Applications Platform (OMAP)™ processor(s); a MIPS-based design from MIPS Technologies, Inc. such as MIPS Warrior M-class, Warrior I-class, and Warrior P-class processors; an ARM-based design licensed from ARM Holdings, Ltd., such as the ARM Cortex-A, Cortex-R, and Cortex-M family of processors; the ThunderX2® provided by Cavium™, Inc.; or the like.
In some implementations, the processor(s) 3752 and/or other components of the system 3750 may be a part of a system on a chip (SoC), System-in-Package (SiP), a multi-chip package (MCP), and/or the like, in which the processor(s) 3752 and other components are formed into a single integrated circuit, or a single package, such as the Edison™ or Galileo™ SoC boards from Intel® Corporation. Other examples of the processor(s) 3752 are mentioned elsewhere in the present disclosure. In embodiments, two or more components of the system 3750 may be on different dies that are coupled to a same base die. The base die may include resonant rings of a ROA, as described herein. The dies may tap the clock signal from the resonant rings at deterministic phase points, e.g., for D2D IO communication and/or other purposes.
The system 3750 may include or be coupled to acceleration circuitry 3764, which may be embodied by one or more artificial intelligence (AI)/machine learning (ML) accelerators, a neural compute stick, neuromorphic hardware, an FPGA, an arrangement of GPUs, one or more SoCs (including programmable SoCs), one or more CPUs, one or more digital signal processors, dedicated ASICs (including programmable ASICs), PLDs such as complex (CPLDs) or high complexity PLDs (HCPLDs), and/or other forms of specialized processors or circuitry designed to accomplish one or more specialized tasks. These tasks may include AI/ML processing (e.g., including training, inferencing, and classification operations), visual data processing, network data processing, object detection, rule analysis, or the like. In FPGA-based implementations, the acceleration circuitry 3764 may comprise logic blocks or logic fabric and other interconnected resources that may be programmed (configured) to perform various functions, such as the procedures, methods, functions, etc. of the various embodiments discussed herein. In such implementations, the acceleration circuitry 3764 may also include memory cells (e.g., EPROM, EEPROM, flash memory, static memory (e.g., SRAM, anti-fuses, etc.)) used to store logic blocks, logic fabric, data, etc. in LUTs and the like.
In some implementations, the processor circuitry 3752 and/or acceleration circuitry 3764 may include hardware elements specifically tailored for machine learning and/or artificial intelligence (AI) functionality. In these implementations, the processor circuitry 3752 and/or acceleration circuitry 3764 may be, or may include, an AI engine chip that can run many different kinds of AI instruction sets once loaded with the appropriate weightings and training code. Additionally or alternatively, the processor circuitry 3752 and/or acceleration circuitry 3764 may be, or may include, AI accelerator(s), which may be one or more of the aforementioned hardware accelerators designed for hardware acceleration of AI applications. As examples, these processor(s) or accelerators may be a cluster of artificial intelligence (AI) GPUs, tensor processing units (TPUs) developed by Google® Inc., Real AI Processors (RAPS™) provided by AlphaICs®, Nervana™ Neural Network Processors (NNPs) provided by Intel® Corp., Intel® Movidius™ Myriad™ X Vision Processing Unit (VPU), NVIDIA® PX™ based GPUs, the NM500 chip provided by General Vision®, Hardware 3 provided by Tesla®, Inc., an Epiphany™ based processor provided by Adapteva®, or the like. In some embodiments, the processor circuitry 3752 and/or acceleration circuitry 3764 and/or hardware accelerator circuitry may be implemented as AI accelerating co-processor(s), such as the Hexagon 685 DSP provided by Qualcomm®, the PowerVR 2 NX Neural Net Accelerator (NNA) provided by Imagination Technologies Limited®, the Neural Engine core within the Apple® A11 or A12 Bionic SoC, the Neural Processing Unit (NPU) within the HiSilicon Kirin 970 provided by Huawei®, and/or the like.
In some hardware-based implementations, individual subsystems of system 3750 may be operated by the respective AI accelerating co-processor(s), AI GPUs, TPUs, or hardware accelerators (e.g., FPGAs, ASICs, DSPs, SoCs, etc.), etc., that are configured with appropriate logic blocks, bit stream(s), etc. to perform their respective functions.
The system 3750 also includes system memory 3754. Any number of memory devices may be used to provide for a given amount of system memory. As examples, the memory 3754 may be, or include, volatile memory such as random access memory (RAM), static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), RAMBUS® Dynamic Random Access Memory (RDRAM®), and/or any other desired type of volatile memory device. Additionally or alternatively, the memory 3754 may be, or include, non-volatile memory such as read-only memory (ROM), erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), flash memory, non-volatile RAM, ferroelectric RAM, phase-change memory (PCM), and/or any other desired type of non-volatile memory device. Access to the memory 3754 is controlled by a memory controller. The individual memory devices may be of any number of different package types such as single die package (SDP), dual die package (DDP) or quad die package (QDP). Any number of other memory implementations may be used, such as dual inline memory modules (DIMMs) of different varieties including but not limited to microDIMMs or MiniDIMMs.
Storage circuitry 3758 provides persistent storage of information such as data, applications, operating systems and so forth. In an example, the storage 3758 may be implemented via a solid-state disk drive (SSDD) and/or high-speed electrically erasable memory (commonly referred to as “flash memory”). Other devices that may be used for the storage 3758 include flash memory cards, such as SD cards, microSD cards, XD picture cards, and the like, and USB flash drives. In an example, the memory device may be or may include memory devices that use chalcogenide glass, multi-threshold level NAND flash memory, NOR flash memory, single or multi-level Phase Change Memory (PCM), a resistive memory, nanowire memory, ferroelectric transistor random access memory (FeTRAM), anti-ferroelectric memory, magnetoresistive random access memory (MRAM), memory that incorporates memristor technology, phase change RAM (PRAM), resistive memory including the metal oxide base, the oxygen vacancy base and the conductive bridge Random Access Memory (CB-RAM), or spin transfer torque (STT)-MRAM, a spintronic magnetic junction memory based device, a magnetic tunneling junction (MTJ) based device, a Domain Wall (DW) and Spin Orbit Transfer (SOT) based device, a thyristor based memory device, a hard disk drive (HDD), micro HDD, or a combination thereof, and/or any other memory. The memory circuitry 3754 and/or storage circuitry 3758 may also incorporate three-dimensional (3D) cross-point (XPOINT) memories from Intel® and Micron®.
The memory circuitry 3754 and/or storage circuitry 3758 is/are configured to store computational logic 3783 in the form of software, firmware, microcode, or hardware-level instructions to implement the techniques described herein. The computational logic 3783 may be employed to store working copies and/or permanent copies of programming instructions, or data to create the programming instructions, for the operation of various components of system 3700 (e.g., drivers, libraries, application programming interfaces (APIs), etc.), an operating system of system 3700, one or more applications, and/or for carrying out the embodiments discussed herein. The computational logic 3783 may be stored or loaded into memory circuitry 3754 as instructions 3782, or data to create the instructions 3782, which are then accessed for execution by the processor circuitry 3752 to carry out the functions described herein. The processor circuitry 3752 and/or the acceleration circuitry 3764 accesses the memory circuitry 3754 and/or the storage circuitry 3758 over the interconnect (IX) 3756. The instructions 3782 direct the processor circuitry 3752 to perform a specific sequence or flow of actions, for example, as described with respect to flowchart(s) and block diagram(s) of operations and functionality depicted previously. The various elements may be implemented by assembler instructions supported by processor circuitry 3752 or high-level languages that may be compiled into instructions 3781, or data to create the instructions 3781, to be executed by the processor circuitry 3752. The permanent copy of the programming instructions may be placed into persistent storage devices of storage circuitry 3758 in the factory or in the field through, for example, a distribution medium (not shown), through a communication interface (e.g., from a distribution server (not shown)), over-the-air (OTA), or any combination thereof.
The IX 3756 couples the processor 3752 to communication circuitry 3766 for communications with other devices, such as a remote server (not shown) and the like. The communication circuitry 3766 is a hardware element, or collection of hardware elements, used to communicate over one or more networks 3763 and/or with other devices. In one example, communication circuitry 3766 is, or includes, transceiver circuitry configured to enable wireless communications using any number of frequencies and protocols such as, for example, the Institute of Electrical and Electronics Engineers (IEEE) 802.11 (and/or variants thereof), IEEE 802.15.4, Bluetooth® and/or Bluetooth® low energy (BLE), ZigBee®, LoRaWAN™ (Long Range Wide Area Network), a cellular protocol such as 3GPP LTE and/or Fifth Generation (5G)/New Radio (NR), and/or the like. Additionally or alternatively, communication circuitry 3766 is, or includes, one or more network interface controllers (NICs) to enable wired communication using, for example, an Ethernet connection, Controller Area Network (CAN), Local Interconnect Network (LIN), DeviceNet, ControlNet, Data Highway+, or PROFINET, among many others.
The IX 3756 also couples the processor 3752 to interface circuitry 3770 that is used to connect system 3750 with one or more external devices 3772. The external devices 3772 may include, for example, sensors, actuators, positioning circuitry (e.g., global navigation satellite system (GNSS)/Global Positioning System (GPS) circuitry), client devices, servers, network appliances (e.g., switches, hubs, routers, etc.), integrated photonics devices (e.g., optical neural network (ONN) integrated circuit (IC) and/or the like), and/or other like devices.
In some optional examples, various input/output (I/O) devices may be present within, or connected to, the system 3750, which are referred to as input circuitry 3786 and output circuitry 3784 in
The components of the system 3750 may communicate over the IX 3756. The IX 3756 may include any number of technologies, including ISA, extended ISA, I2C, SPI, point-to-point interfaces, power management bus (PMBus), PCI, PCIe, PCIx, Intel® UPI, Intel® Accelerator Link, Intel® CXL, CAPI, OpenCAPI, Intel® QPI, Intel® OPA IX, RapidIO™ system IXs, CCIX, Gen-Z Consortium IXs, a HyperTransport interconnect, NVLink provided by NVIDIA®, a Time-Triggered Protocol (TTP) system, a FlexRay system, PROFIBUS, and/or any number of other IX technologies. The IX 3756 may be a proprietary bus, for example, used in a SoC based system.
The number, capability, and/or capacity of the elements of system 3700 may vary, depending on whether computing system 3700 is used as a stationary computing device (e.g., a server computer in a data center, a workstation, a desktop computer, etc.) or a mobile computing device (e.g., a smartphone, tablet computing device, laptop computer, game console, IoT device, etc.). In various implementations, the computing device system 3700 may comprise one or more components of a data center, a desktop computer, a workstation, a laptop, a smartphone, a tablet, a digital camera, a smart appliance, a smart home hub, a network appliance, and/or any other device/system that processes data.
Some non-limiting examples of various embodiments are provided below.
Example 1 is an apparatus comprising: a base die that includes resonant rings of respective rotary oscillators, wherein the resonant rings of different rotary oscillators are shorted to one another to form a rotary oscillator array (ROA); and a first die and a second die coupled to the base die, wherein the first die is to tap a clock signal from the ROA, and transmit the serialized data to the second die based on the tapped clock signal.
Example 2 may include the apparatus of example 1 or some other example herein, wherein the resonant rings of the respective rotary oscillators include a first ring and a second ring that are cross-coupled to one another, wherein the rotary oscillators further include one or more pairs of cross-coupled inverters that are coupled between the first ring and the second ring.
Example 3 may include the apparatus of example 2 or some other example herein, wherein the inverters are included in the base die.
Example 4 may include the apparatus of example 2 or some other example herein, wherein the inverters are included in at least one of the first die or the second die.
Example 5 may include the apparatus of example 4 or some other example herein, wherein the inverters are coupled to the resonant rings via micro-bumps.
Example 6 may include the apparatus of any of examples 2-5 or some other example herein, wherein the rotary oscillators include a first rotary oscillator and a second rotary oscillator, wherein the first ring of the first rotary oscillator is shorted to the second ring of the second rotary oscillator and the second ring of the first rotary oscillator is shorted to the first ring of the second rotary oscillator.
Example 7 may include the apparatus of any of examples 1-6 or some other example herein, wherein the clock signal is a first clock signal, and wherein the second die is to tap a second clock signal from the ROA, and receive the data based on the second clock signal.
Example 8 may include the apparatus of example 7 or some other example herein, wherein the first die includes transmit circuitry with one or more serializers to serialize the data based on the first clock signal for transmission to the second die, and wherein the second die includes receive circuitry with one or more deserializers to deserialize the data.
Example 9 may include the apparatus of any of examples 7-8 or some other example herein, wherein the first clock signal has a same phase as the second clock signal.
Example 10 may include the apparatus of any of examples 7-8 or some other example herein, wherein the first clock signal has a different phase than the second clock signal.
Example 11 may include the apparatus of example 10 or some other example herein, wherein the second clock signal is ahead in phase by 45 to 135 degrees compared to the first clock signal.
Example 12 may include the apparatus of any of examples 7, 8, 10, or 11, wherein the data is transmitted via a communication bus with multiple channels that use respective pairs of tap points, wherein the respective pairs of tap points have different phase differences between the first and second clock signals.
Example 13 may include the apparatus of any of examples 1-12 or some other example herein, wherein the rotary oscillators are rotary traveling wave oscillators.
Example 14 may include the apparatus of any of examples 1-9 or some other example herein, wherein the rotary oscillators are rotary standing wave oscillators.
Example 15 may include the apparatus of any of examples 1-14 or some other example herein, wherein at least one of the resonant rings has an irregular shape.
Example 16 may include the apparatus of example 15 or some other example herein, wherein the irregular shape is a non-rectangular rectilinear shape.
Example 17 may include the apparatus of any of examples 1-16 or some other example herein, wherein at least one of the resonant rings has a rectangular shape.
Example 18 may include the apparatus of any of examples 1-17 or some other example herein, wherein at least one of the rotary oscillators is operable in a traveling wave mode and a standing wave mode.
Example 19 may include the apparatus of example 18 or some other example herein, wherein the rotary oscillators include one or more switches coupled between the first ring and the second ring of the respective rotary oscillators to control whether the respective rotary oscillators are in the traveling wave mode or the standing wave mode.
Example 20 may include the apparatus of example 19 or some other example herein, wherein the switches are to be open when the respective rotary oscillator is in the traveling wave mode, and wherein a selected one of the switches is to be closed when the respective rotary oscillator is in the standing wave mode.
Example 21 may include the apparatus of any of examples 1-20 or some other example herein, wherein a first resonant ring of the resonant rings has a first long side below the first die and a second long side below the second die.
Example 22 may include the apparatus of example 21, wherein the first resonant ring is rectangular and further includes a first short side coupled between the first and second long sides, and a second short side coupled between the first and second long sides.
Example 23 may include the apparatus of any of examples 21-22 or some other example herein, wherein the first long side is at least partially below a first D2D PHY circuitry of the first die, and wherein the second long side is at least partially below a second D2D PHY circuitry of the second die.
Example 24 may include the apparatus of any of examples 1-23 or some other example herein, wherein the rotary oscillators are standing wave oscillators, and wherein the rotary oscillators include one or more clock recovery circuits coupled to the resonant rings, wherein a first clock recovery circuit of the one or more clock recovery circuits is to generate the clock signal.
Example 25 may include the apparatus of example 24 or some other example herein, wherein the first clock recovery circuit is coupled to the first and second rings of the respective resonant rings.
Example 26 may include the apparatus of example 25, wherein the clock recovery circuits are to generate a square wave.
Example 27 may include a multi-die system comprising: a base die that includes resonant rings of respective rotary oscillators, wherein the resonant rings of different rotary oscillators are shorted to one another to form a rotary oscillator array (ROA); a first die that includes transmit circuitry to: tap a first clock signal from the ROA, serialize data based on the first clock signal, and transmit the serialized data to the second die via a communication bus; and a second die that includes receive circuitry to receive the serialized data via the communication bus, tap a second clock signal from the ROA, and deserialize the data based on the second clock signal.
Example 28 may include an apparatus comprising: a base die that includes a resonant ring structure of a rotary oscillator; a first die coupled to the base die, wherein the first die includes transmit circuitry above a resonant ring, wherein the transmit circuitry is to tap a first clock signal from the resonant ring and transmit the data based on the first clock signal; and a second die coupled to the base die, wherein the second die includes receive circuitry above the resonant ring, and wherein the receive circuitry is to tap a second clock signal from the resonant ring and receive the data based on the second clock signal.
Example 29 may include the apparatus of example 28, wherein the transmit circuitry is above a first long edge of the resonant ring structure and is to tap the first clock signal from the first long edge, and wherein the receive circuitry is above a second long edge of the resonant ring structure and is to tap the second clock signal from the second long edge.
Example 30 may include the apparatus of example 28 or some other example herein, wherein the rotary oscillator is a rotary traveling wave oscillator.
Example 31 may include the apparatus of example 30 or some other example herein, wherein the first clock signal has a different phase than the second clock signal.
Example 32 may include the apparatus of example 31 or some other example herein, wherein the second clock signal is ahead in phase by 45 to 135 degrees compared to the first clock signal.
Example 33 may include the apparatus of example 28 or some other example herein, wherein the rotary oscillator is a rotary standing wave oscillator.
Example 34 may include the apparatus of example 33 or some other example herein, wherein the rotary oscillator further includes a clock recovery circuit coupled to a first ring and a second ring of the resonant ring structure to generate a square wave signal as the clock signal.
Example 35 may include the apparatus of any of examples 28-34 or some other example herein, wherein the transmit circuitry includes one or more serializers to serialize the data based on the first clock signal, and wherein the receive circuitry includes one or more deserializers to deserialize the data based on the second clock signal.
Example 36 may include a computer system comprising: a multi-die system (MDS) and one or more antennas coupled to the MDS to enable the computer system to wirelessly communicate with another device. The MDS may include: a base die that includes a resonant ring structure of a rotary traveling wave oscillator (RTWO) array; a first die coupled to the base die, wherein the first die includes transmit circuitry, and wherein the transmit circuitry is to tap a first clock signal from the resonant ring and serialize data based on the first clock signal; and a second die coupled to the base die, wherein the second die includes receive circuitry above the resonant ring, wherein the receive circuitry is to tap a second clock signal from the resonant ring and deserialize the data based on the second clock signal, and wherein the second clock signal has a different phase than the first clock signal.
Example 37 may include the system of example 36, wherein the second clock signal is ahead in phase by 45 to 135 degrees compared to the first clock signal.
Example 38 may include the system of example 36 or 37, wherein the data is transmitted via a communication bus with multiple channels that use respective pairs of tap points, wherein the respective pairs of tap points have different phase differences between the respective first and second clock signals.
Example 39 may include the system of any of examples 36-38, wherein the transmit circuitry and the receive circuitry are to tap the respective first and second clock signals from a same ring of the resonant ring structure.
Example 40 may include the system of any of examples 36-38, wherein the transmit circuitry and the receive circuitry are to tap the respective first and second clock signals from different rings of the resonant ring structure.
Example 41 may include a computer system comprising: the apparatus of any one of examples 1-40; and at least one of a memory, a communication interface, a radio frequency circuit, or one or more antennas coupled to the multi-die system.
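As a purely illustrative aid to the phase relationships recited in the examples above (e.g., a second clock signal that is ahead in phase by 45 to 135 degrees compared to the first clock signal), the following Python sketch models the phase of a clock tapped from a rotary traveling wave oscillator ring. It is a minimal sketch under stated assumptions and is not part of any embodiment: the function names, the `position` parameter (distance along the ring as a fraction of one physical lap), the linear phase model, the default of 180 degrees of phase per lap (assumed here because the cross-coupled rings invert the signal once per lap), and the sign convention that a larger phase value means "ahead" are all hypothetical choices made for illustration.

```python
def tap_phase_deg(position: float, degrees_per_lap: float = 180.0) -> float:
    """Phase, in [0, 360) degrees, of a clock tapped at `position`.

    Assumption (illustrative only): phase advances linearly with distance
    along the ring, and `position` is measured in physical laps from an
    arbitrary reference tap point.
    """
    return (position * degrees_per_lap) % 360.0


def phase_lead_deg(tx_position: float, rx_position: float,
                   degrees_per_lap: float = 180.0) -> float:
    """Amount, in [0, 360) degrees, by which the receive-side (second)
    clock is ahead of the transmit-side (first) clock under the
    assumed sign convention."""
    tx = tap_phase_deg(tx_position, degrees_per_lap)
    rx = tap_phase_deg(rx_position, degrees_per_lap)
    return (rx - tx) % 360.0


def lead_in_window(lead_deg: float, low: float = 45.0,
                   high: float = 135.0) -> bool:
    """True if the lead falls in the 45 to 135 degree window."""
    return low <= lead_deg <= high


# Hypothetical tap-point pairs (transmit, receive) for three bus channels.
channel_taps = [(0.0, 0.5), (0.25, 1.0), (0.5, 0.0)]
leads = [phase_lead_deg(tx, rx) for tx, rx in channel_taps]
in_window = [lead_in_window(lead) for lead in leads]
```

With these hypothetical tap positions, each channel has a different phase difference between its first and second clock signals, and a check such as `lead_in_window` could be used to flag a tap pair (here, the third) whose lead falls outside the 45 to 135 degree range.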
Although certain embodiments have been illustrated and described herein for purposes of description, this application is intended to cover any adaptations or variations of the embodiments discussed herein. Therefore, it is manifestly intended that embodiments described herein be limited only by the claims.
Where the disclosure recites “a” or “a first” element or the equivalent thereof, such disclosure includes one or more such elements, neither requiring nor excluding two or more such elements. Further, ordinal indicators (e.g., first, second, or third) for identified elements are used to distinguish between the elements, and do not indicate or imply a required or limited number of such elements, nor do they indicate a particular position or order of such elements unless otherwise specifically stated.
This application is a continuation of International Application No. PCT/US2022/022658, filed Mar. 30, 2022, which is hereby incorporated by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/US2022/022658 | Mar 2022 | WO |
Child | 18821887 | | US |