The present disclosure relates in general to systems and methods for generating asymmetrical clock signals for single flux quantum (SFQ) circuits and optimizing pipeline stages in SFQ circuits.
In Single Flux Quantum (SFQ) logic, information is stored in the form of magnetic flux quanta and transferred in the form of SFQ voltage pulses. Devices that implement SFQ logic use superconducting devices, such as Josephson Junction (JJ) devices, to process digital signals. An SFQ voltage pulse is produced when magnetic flux through a superconducting loop containing a JJ device changes by one flux quantum as a result of the junction switching. SFQ logic expresses 1 and 0 by detecting whether or not an SFQ exists in superconductor circuit loops that include the JJ devices. SFQ logic requires a clock input for read and write operations. With the clock signal, for a read operation, an SFQ voltage pulse is generated at the output of an SFQ gate in state “1” and no SFQ voltage pulse is generated at the output of an SFQ gate in state “0”.
In one embodiment, an apparatus for generating asymmetrical clock signals is generally described. The apparatus can include an integrated circuit. The integrated circuit can be configured to receive a stream of single flux quantum (SFQ) pulses. The integrated circuit can be further configured to generate at least one pair of SFQ clock signals based on the stream of SFQ pulses. Each pair of SFQ clock signals can include a first SFQ clock signal and a second SFQ clock signal that is out of phase with the first SFQ clock signal. The second SFQ clock signal can have the same frequency as the first SFQ clock signal. The integrated circuit can be further configured to output the at least one pair of SFQ clock signals to a processor implementing a pipeline.
In another embodiment, a system for optimizing a pipeline is generally described. The system can include a processor configured to implement a pipeline. The processor can be further configured to receive at least one pair of SFQ clock signals. Each pair of SFQ clock signals can include a first SFQ clock signal and a second SFQ clock signal that is out of phase with the first SFQ clock signal. The second SFQ clock signal can have the same frequency as the first SFQ clock signal. For each pair of SFQ clock signals, the processor can be configured to define a first clock cycle based on a delay from the first SFQ clock signal to the second SFQ clock signal. For each pair of SFQ clock signals, the processor can be configured to define a second clock cycle based on a delay from the second SFQ clock signal to the first SFQ clock signal of a next pair of SFQ clock signals. The second clock cycle can be greater than the first clock cycle. The processor can be further configured to assign the first clock cycle and the second clock cycle to different stages of the pipeline. The assignment can be based on an amount of delay incurred by the different stages.
In another embodiment, a method for optimizing a pipeline is generally described. The method can include generating at least one pair of SFQ clock signals based on a stream of SFQ pulses. Each pair of SFQ clock signals can include a first SFQ clock signal and a second SFQ clock signal that is out of phase with the first SFQ clock signal. The second SFQ clock signal can have the same frequency as the first SFQ clock signal. The method can further include, for each pair of SFQ clock signals, defining a first clock cycle based on a delay from the first SFQ clock signal to the second SFQ clock signal. The method can further include, for each pair of SFQ clock signals, defining a second clock cycle based on a delay from the second SFQ clock signal to the first SFQ clock signal of a next pair of SFQ clock signals. The second clock cycle can be greater than the first clock cycle. The method can further include assigning the first clock cycle and the second clock cycle to different stages of a pipeline. The assignment can be based on an amount of delay incurred by the different stages.
Further features as well as the structure and operation of various embodiments are described in detail below with reference to the accompanying drawings. In the drawings, like reference numbers indicate identical or functionally similar elements.
The present application will now be described in greater detail by referring to the following discussion and drawings that accompany the present application. It is noted that the drawings of the present application are provided for illustrative purposes only and, as such, the drawings are not drawn to scale. It is also noted that like and corresponding elements are referred to by like reference numerals.
In the following descriptions, numerous specific details are set forth, such as particular structures, components, materials, dimensions, processing steps and techniques, in order to provide an understanding of the various embodiments of the present application. However, it will be appreciated by one of ordinary skill in the art that the various embodiments of the present application may be practiced without these specific details. In other instances, well-known structures or processing steps have not been described in detail in order to avoid obscuring the present application.
In an aspect, SFQ logic can provide a relatively high energy efficiency due to the size of SFQ voltage pulses being relatively small (e.g., duration in the picosecond scale). However, the delay of wires connecting SFQ devices with one another can be relatively large compared to the delay of SFQ gates. For example, in gate level pipelines being implemented by SFQ logic gates, a transition from one pipeline stage to another pipeline stage can include having a SFQ voltage pulse travel through wires connecting different SFQ logic gates, and these wires can incur delays. Further, the incurred delays can be inconsistent. For example, a delay resulting from a transition from a first pipeline stage to a second pipeline stage can be greater than a delay resulting from a transition from the second pipeline stage to a third pipeline stage. Under this inconsistency, if a single clock is being used by all stages of the gate level pipeline, then the frequency of the clock needs to be at a level that is low enough to accommodate the longer delay stages. However, a higher frequency clock would be sufficient for the shorter delay stages, so clocking the shorter delay stages at the lower frequency can degrade performance of the circuit. The usage of one low frequency clock for both long and short delay stages can reduce operation speed of the gate level pipeline, resulting in longer processing times and increased energy usage.
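As an illustrative, non-limiting sketch of the single clock constraint described above, the following Python example computes the highest frequency at which one shared clock could run when its period has to cover the slowest stage. The function name and the stage delay values are hypothetical and chosen only for illustration, and timing margins (e.g., setup and hold times) are ignored.

```python
def max_single_clock_frequency_ghz(stage_delays_ps):
    """Return the highest shared clock frequency (GHz) for the given stage delays (ps).

    With one clock for all stages, the clock period cannot be shorter than
    the delay of the slowest stage. Timing margins are ignored in this sketch.
    """
    slowest_ps = max(stage_delays_ps)
    # A period of 1 ps corresponds to 1000 GHz, so f[GHz] = 1000 / period[ps].
    return 1000.0 / slowest_ps


# Hypothetical stage delays in picoseconds (not taken from the disclosure).
delays_ps = [20, 50, 25, 40]
shared_frequency_ghz = max_single_clock_frequency_ghz(delays_ps)
```

In this hypothetical case the 50 ps stage limits every stage to the same frequency, even though the 20 ps stage could be clocked considerably faster.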
Integrated circuit 110 can include clock drivers and splitters configured to drive and split or divide base clock signal 106 into a plurality of SFQ clock signals 116. Integrated circuit 110 can include at least one driver, such as a driver 112, configured to receive base clock signal 106 and output N copies of base clock signal 106, labeled as 106_1 to 106_N. Integrated circuit 110 can distribute and split each copy of base clock signal 106 into a pair of SFQ clock signals to generate N pairs of SFQ clock signals 116. In one embodiment, each copy of base clock signal 106 can be directly outputted as a SFQ clock signal to processor 120, and can also undergo a delay 114 to be outputted as a delayed SFQ clock signal. For example, a copy of base clock signal 106_1 can be provided to an output pin of integrated circuit 110 to be outputted as a SFQ clock signal 116a_1, and also undergo delay 114 before being outputted as a delayed SFQ clock signal 116b_1. In one embodiment, delay 114 can be implemented by a Josephson transmission line (JTL).
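The generation of a pair of SFQ clock signals from one copy of the base clock signal can be sketched behaviorally as follows. This Python example models pulse arrival times only and is not a circuit model. The function name, the base period, and the delay value are assumptions introduced for illustration, with the delay standing in for delay 114.

```python
def clock_pair(base_period_ps, delay_ps, num_pulses):
    """Model one pair of SFQ clock signals as lists of pulse times (ps).

    The first signal passes the base clock through unchanged; the second
    signal is the same pulse train shifted by a fixed delay, so both
    signals have the same frequency but are out of phase.
    """
    if not 0 < delay_ps < base_period_ps:
        raise ValueError("delay must be positive and shorter than the base period")
    first = [k * base_period_ps for k in range(num_pulses)]
    second = [t + delay_ps for t in first]
    return first, second


def generate_pairs(base_period_ps, delay_ps, num_pulses, num_copies):
    """Model N copies of the base clock, each split into one pair."""
    return [clock_pair(base_period_ps, delay_ps, num_pulses) for _ in range(num_copies)]


# Hypothetical values: 100 ps base period, 30 ps delay, 3 pulses, 2 copies.
pairs = generate_pairs(100, 30, 3, 2)
```

Each entry of `pairs` holds the pulse times of an undelayed signal and its delayed counterpart, corresponding in this sketch to signals such as 116a_1 and 116b_1.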
In the example shown in
Integrated circuit 110 can provide SFQ clock signals 116 to processor 120. Processor 120 can be a processing element, such as a central processor unit (CPU), a processor core, a microprocessor, and/or other types of processing elements. In one embodiment, processor 120 can include a plurality of SFQ circuits, including SFQ logic gates, configured to perform digital logic operations. As described in more detail below, processor 120 can receive SFQ clock signals 116 and define asymmetrical clock cycles based on SFQ clock signals 116. For example, processor 120 can use SFQ clock signals 116 to define a short clock cycle and a long clock cycle, where the long clock cycle has a longer time period than the short clock cycle. SFQ circuits in processor 120 can be used for implementing a pipeline, and processor 120 can categorize stages of the pipeline into different categories. For example, processor 120 can categorize each stage of the pipeline into a first delay stage (e.g., which can also be referred to as a short delay stage) or a second delay stage (e.g., which can also be referred to as a long delay stage), where the second delay stage can incur more delay or latency than the first delay stage. Processor 120 can assign the asymmetrical clock cycles to different stages of the pipeline based on the categories of the stages to optimize a utilization of clock cycles by the pipeline. For example, processor 120 can assign a first clock cycle (e.g., short clock cycle) to trigger operations of stages categorized as the first delay stage, and assign a second clock cycle (e.g., long clock cycle) to trigger operations of stages categorized as the second delay stage.
Processor 120 (see
In one or more embodiments, if the first and second delay stages (e.g., the short and long delay stages, respectively) have considerably small delays compared with the first clock cycle (e.g., the short clock cycle), processor 120 can arbitrarily assign either one of the first clock cycle or the second clock cycle to stages, regardless of whether the stages are first delay stages or second delay stages. In another embodiment, a frequency of base clock signal 106 can be increased such that cycles of base clock signal 106 are decreased until either the first clock cycle or the second clock cycle is comparable to, or sufficiently close to be equivalent to, either the delay of the first delay stages or the delay of the second delay stages.
Processor 120 can classify or categorize each stage among stages 304 into different delay stages 306. Delay stages 306 can be either a first delay stage (e.g., short delay stage labeled as “S”) or a second delay stage (e.g., long delay stage labeled as “L”). In one embodiment, a first delay stage of pipeline 302 can be a stage including a relatively simple gate or a gate with a relatively short delay. For example, a first delay stage can implement SFQ logic gates that have no more than two input terminals (e.g., a 2-to-1 AND gate, a 2-to-1 OR gate, an inverter, etc.). A second delay stage of pipeline 302 can be a stage including a relatively complex gate, or a gate with a relatively long delay, or a combination of simple and complex gates. For example, a second delay stage can implement SFQ logic gates that have more than two input terminals and/or logic gates followed by a long wire line. In one or more embodiments, a stage that writes to and/or reads from a register can be considered as a second delay stage if the register write and read take a long period of time. A stage that performs data flow (e.g., transfers data from one block to another block) with a short data transfer period can be considered as a first delay stage, and a stage that performs data flow with a long data transfer period can be considered as a second delay stage.
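The categorization described above can be sketched, in one illustrative and non-limiting example, as a simple rule over per-stage properties. The `Stage` record, its field names, and the example stages below are hypothetical. The rule encodes only the gate input count and long wire criteria given above, and omits the register access and data transfer criteria.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical description of one pipeline stage."""
    name: str
    max_gate_inputs: int     # largest number of input terminals of any gate in the stage
    long_wire: bool = False  # True if a gate in the stage is followed by a long wire line


def categorize(stage):
    """Return "S" for a first (short) delay stage or "L" for a second (long) delay stage."""
    if stage.max_gate_inputs > 2 or stage.long_wire:
        return "L"
    return "S"


# Hypothetical stages for illustration.
stages = [
    Stage("and2", max_gate_inputs=2),
    Stage("maj3", max_gate_inputs=3),
    Stage("inv_then_wire", max_gate_inputs=1, long_wire=True),
]
labels = [categorize(s) for s in stages]
```

A delay threshold measured against the first clock cycle could be used instead of these structural criteria; the structural rule is shown here only because it mirrors the examples in the preceding paragraph.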
In response to categorizing stages 304 of pipeline 302, processor 120 can assign first clock cycle 210 and second clock cycle 212 to the categorized stages. In one embodiment, the clock cycles defined by processor 120 can be aligned alternately, such as SC1, LC1, SC2, LC2, . . . as shown in
In the example shown in
Subsequent to assigning long clock cycle LC2 to stage 304-4, a next clock cycle to be assigned is a short clock cycle SC3 and a next stage to be assigned with a clock cycle is stage 304-5. In one embodiment, if pipeline stage 304-5 can be an extra-long delay stage (e.g., longer than a long delay stage, denoted as XL in
By assigning short clock cycles 210 to first delay stages (e.g., short delay stages) and long clock cycles 212 to second delay stages (e.g., long delay stages), an efficiency of pipeline 302 can be improved by reducing unused time. For example, reducing the number of short delay stages that utilize long clock cycles can reduce an amount of wait time that occurs when tasks of a short delay stage are completed significantly earlier than an end of a long clock cycle, leading to optimized operation speed. The systems and methods described herein can improve processing pipelines implemented by SFQ circuits because SFQ logic gates need clock signals to operate. The systems and methods described herein can also improve processing pipelines implemented by complementary metal-oxide-semiconductor (CMOS) logic, especially for fast clock signals with very few combinational gate stages between D flip-flops (DFFs) or D latches (DLs) that receive clock signals to operate. For example, the systems and methods described herein can be applied to relatively faster systems, such as 20 gigahertz (GHz) systems, by supplying the asymmetrically separated clock signals to faster operation blocks (e.g., operation blocks that incur relatively smaller delays) instead of having these faster operation blocks use longer cycle clocks to accommodate other slower operation blocks.
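The reduction of unused time can be illustrated with a small, non-limiting calculation. The following Python sketch sums, over all stages, the time between the completion of a stage and the end of its assigned clock cycle. The function name, the stage delays, and the cycle durations are hypothetical values chosen only for illustration.

```python
def total_idle_ps(stage_delays_ps, assigned_cycles_ps):
    """Sum the unused time (ps) left in each stage's assigned clock cycle."""
    if len(stage_delays_ps) != len(assigned_cycles_ps):
        raise ValueError("one clock cycle is needed per stage")
    idle = 0
    for delay, cycle in zip(stage_delays_ps, assigned_cycles_ps):
        if delay > cycle:
            raise ValueError("stage delay exceeds its assigned clock cycle")
        idle += cycle - delay
    return idle


# Hypothetical pipeline alternating short (20 ps) and long (45 ps) delay stages.
delays_ps = [20, 45, 20, 45]

# One shared clock whose 50 ps cycle has to cover the long delay stages.
idle_single_ps = total_idle_ps(delays_ps, [50, 50, 50, 50])

# Asymmetrical cycles: 25 ps short cycles and 50 ps long cycles.
idle_asymmetric_ps = total_idle_ps(delays_ps, [25, 50, 25, 50])
```

Under these assumed numbers, the four stages occupy 200 ps with the shared clock and 150 ps with the asymmetrical clock cycles, and the difference is idle time removed from the short delay stages.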
In one embodiment, the gates shown as boxes with labels “S” or “C” shown in
Stage S2 includes multiple paths from stage S1 to stage S3. In this example, the timing critical path includes a 1-bit-length wire in 503 and one simple gate 512. Therefore, stage S2 can be categorized as a short delay stage. Because consecutive delay stages have to be assigned short clock cycles (SC) and long clock cycles (LC) alternately, the next available clock cycle LC1, where LC1 is a long clock cycle, can be assigned to stage S2 even though stage S2 is a short delay stage.
Stage S3 includes multiple paths from stage S2 to stage S4. In this example, the timing critical path includes a 2-bit-length wire in 505 and one simple gate 504. Therefore, stage S3 can be categorized as a short delay stage. For stage S3, a next available clock cycle SC2, where SC2 is a short clock cycle, can be assigned to stage S3.
Stage S4 includes multiple paths from stage S3 to stage S5. In this example, the timing critical path includes a 4-bit-length wire in 507 and one simple gate 506. Therefore, stage S4 can be categorized as a long delay stage. For stage S4, a next available clock cycle LC2, where LC2 is a long clock cycle, can be assigned to stage S4.
Stage S5 includes multiple paths from stage S4 to stage S6. In this example, the timing critical path includes an 8-bit-length wire in 509 and one complex gate 508. Therefore, stage S5 can be categorized as an extra-long delay stage whose delay is larger than the second clock cycle. In response to stage S5 being an extra-long delay stage, stage S5 can be divided into two delay stages, S5a and S5b, by inserting simple gates 514 such as DFFs before 508 as shown in
Stage S6 includes multiple paths from stage S5 to stage S7. In this example, the timing critical path includes an 8-bit-length wire in 511 and one simple gate 513. Therefore, stage S6 can be categorized as a long delay stage. In response to stage S6 being a long delay stage, stage S6 can be divided into two delay stages, S6a and S6b, by inserting simple gates 515 such as DFFs before 513 as shown in
Process 600 can be performed by a system, such as system 100 shown in
In one embodiment, the integrated circuit can generate the second SFQ clock signal by applying a delay on the first SFQ clock signal. In one embodiment, the integrated circuit can generate the second SFQ clock signal by applying a delay on the first SFQ clock signal using a Josephson transmission line (JTL). In one embodiment, the integrated circuit can implement an H-tree network to distribute the at least one pair of SFQ clock signals from a root to a leaf of the H-tree network. In one embodiment, the integrated circuit can receive a stream of SFQ pulses from a DC to SFQ converter, and the at least one pair of SFQ clock signals is generated based on the received stream of SFQ pulses.
Process 600 can proceed from block 602 to block 604. At block 604, blocks 604a and 604b can be performed for each pair of SFQ clock signals. At block 604a, a processor configured to be in communication with the integrated circuit can define a first clock cycle based on a delay from the first SFQ clock signal to the second SFQ clock signal. At block 604b, the processor can define a second clock cycle based on a delay from the second SFQ clock signal to the first SFQ clock signal of a next pair of SFQ clock signals. The second clock cycle can be greater than the first clock cycle.
In one embodiment, a duration of the first clock cycle can be equivalent to a delay from the first SFQ clock signal to the second SFQ clock signal. A duration of the second clock cycle can be equivalent to the delay from the second SFQ clock signal to the first SFQ clock signal of the next pair of SFQ clock signals. That is, the duration of the second clock cycle can be equivalent to a difference between a cycle of the SFQ clock signal and the first clock cycle.
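The durations described above can be sketched, in one illustrative and non-limiting example, by computing them from pulse arrival times. The function name and the pulse times below are hypothetical. Each list holds the pulse times, in picoseconds, of the first and second SFQ clock signals of one pair.

```python
def define_cycles(first_times_ps, second_times_ps):
    """Derive first (short) and second (long) clock cycle durations from pulse times.

    first_times_ps[k] and second_times_ps[k] are the k-th pulses of the
    first and second SFQ clock signals, respectively.
    """
    # First clock cycle: delay from a first-signal pulse to the matching second-signal pulse.
    first_cycles = [s - f for f, s in zip(first_times_ps, second_times_ps)]
    # Second clock cycle: delay from a second-signal pulse to the next first-signal pulse.
    second_cycles = [
        first_times_ps[k + 1] - second_times_ps[k]
        for k in range(len(first_times_ps) - 1)
    ]
    return first_cycles, second_cycles


# Hypothetical pulse times: 100 ps clock period, second signal delayed by 30 ps.
first_cycles, second_cycles = define_cycles([0, 100, 200], [30, 130, 230])
```

With these assumed values each first clock cycle lasts 30 ps and each second clock cycle lasts 70 ps, so the two durations add up to the 100 ps period of the SFQ clock signal. The second clock cycle is the greater of the two whenever the delay is less than half of that period.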
Process 600 can proceed from block 604 to block 606. At block 606, the processor can assign the first clock cycle and the second clock cycle to different stages of a pipeline. The assignment can be based on an amount of delay incurred by the different stages. In one embodiment, the pipeline can be a gate level pipeline being implemented by SFQ logic gates. In one embodiment, the processor can categorize different stages among the pipeline as one of a first delay stage and a second delay stage. The first delay stage can have less delay than the second delay stage. The processor can assign the first clock cycle to the first delay stages and assign the second clock cycle to the second delay stages.
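One possible way to carry out the assignment at block 606 is sketched below in Python, as an illustration only. The sketch assumes that clock cycles are consumed in alternating order (SC1, LC1, SC2, LC2, and so on), that a short delay stage can take whichever cycle is next, and that a stage that does not fit the next available cycle is divided into two sub-stages, as described above for the extra-long delay stage. The function name, the stage labels, and the input categories are hypothetical.

```python
def assign_cycles(categories):
    """Assign alternating short/long clock cycles to stages given in pipeline order.

    categories: list of "S" (short), "L" (long), or "XL" (extra-long) labels.
    Returns a list of (stage_label, cycle_label) tuples.
    """
    assignments = []
    slot = 0  # even slots are short cycles (SC), odd slots are long cycles (LC)

    def take_next_cycle():
        nonlocal slot
        kind = "SC" if slot % 2 == 0 else "LC"
        label = f"{kind}{slot // 2 + 1}"
        slot += 1
        return label

    for index, category in enumerate(categories, start=1):
        next_is_short = slot % 2 == 0
        # Assumption: an extra-long stage is always divided, and a long stage
        # is divided when only a short cycle is available next.
        must_split = category == "XL" or (category == "L" and next_is_short)
        if must_split:
            for suffix in ("a", "b"):
                assignments.append((f"S{index}{suffix}", take_next_cycle()))
        else:
            assignments.append((f"S{index}", take_next_cycle()))
    return assignments


# Hypothetical six-stage pipeline.
result = assign_cycles(["S", "S", "S", "L", "XL", "L"])
```

For this hypothetical input, the second stage receives long clock cycle LC1 even though it is a short delay stage, and the fifth and sixth stages are each divided into two sub-stages that receive consecutive clock cycles.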
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the blocks may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be performed substantially concurrently, or the blocks may sometimes be performed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises” and/or “comprising,” when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means or steps plus function elements, if any, in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
This invention was made with Government support under Contract No.: W911NF-14-C-0090 awarded by Army Research Office (ARO/ARMY). The Government has certain rights in this invention.
Number | Date | Country | |
---|---|---|---|
20230344432 A1 | Oct 2023 | US |