The various embodiments described herein are related to application specific integrated circuits (ASICs), and more particularly to the design of various ASICs.
Continuing advances in semiconductor device fabrication technology have yielded a steady decline in the size of process nodes. For example, 7 nanometer (nm) process nodes were introduced in 2017 but were quickly succeeded by 5 nm fin field-effect transistor (FinFET) process nodes in 2018, while 3 nm gate-all-around field-effect transistor (GAAFET) process nodes are projected for commercialization by the end of 2021.
The decrease in process node size allows a growing number of intellectual property (IP) cores or IP blocks to be placed on a single ASIC chip. The latest ASIC designs often use a comparatively large silicon die and include combinations of independent IP blocks and logic functions. At the same time, modern applications also require increased connectivity and large data transfers between various IP blocks. The vast majority of modern ASIC chips are heterogeneous systems, which enables optimization of performance and power figures for the numerous IPs, as well as multi-core implementations, leading to a very complicated interconnect sub-system.
All indications point to even higher levels of integration and data processing in future Systems on Chip (SoCs) in the years to come. This will allow even more functions to be added, making systems more complex, more intelligent, and more power efficient while putting even more pressure on the interconnect fabric.
Interconnect fabrics have changed over time to address the requirements of evolving systems. Traditional busses (such as AMBA AHB) evolved into more intelligent crossbars and later into hierarchical crossbars, which enabled faster data switching among multiple ports or port domains. Once the number of busses and the data width grew to an unmanageable amount, the industry responded with a more flexible packetized approach (as was done previously for computer hardware networks) through the development of Networks on Chip (NoCs).
NoCs have been able to handle bandwidth more efficiently by utilizing packetization and Quality of Service (QoS) channel prioritization strategies. The NoC started as a centralized IP, more like a smarter crossbar with a certain number of input ports and output ports, regulated by specific routing rules. Once SoC size started to grow significantly, the distance between IPs became significant. At that time, the centralized NoC slowly transformed into a distributed NoC, where individual routers were dispersed across the silicon area following a specific arrangement (such as ring, torus, mesh, etc.) and connected to each other to create a network.
Modern SoCs for Artificial Intelligence (AI) and Machine Learning (ML) require high-throughput and, most importantly, low-latency architectures. Data must move between GPUs, TMUs or CPUs and the memory system with minimum latency, because most of the operations use a very large amount of data and repeated linear matrix operations.
In a traditional Synchronous NoC the common way to minimize latency relies on running the system at the highest clock frequency possible. This approach generates two issues:
Therefore, what is needed are an apparatus and method that overcome these significant problems found in the aforementioned conventional approach to ASIC design, as well as a way of routing the information among the different IPs efficiently and with minimized latency.
Apparatuses and methods for ASIC design are provided.
In one embodiment, a centralized Network-on-Chip (NOC) system is disclosed. The NOC system comprises a plurality of intellectual property (IP) blocks; a centralized switch block; and communication channels coupled between the centralized switch block and one or more of the plurality of IP blocks, wherein each of the communication channels is configured (i) to transmit data between the centralized switch block and the one or more of the plurality of IP blocks and (ii) to encode the data using delay insensitive coding and transmit the encoded data using a quasi-delay insensitive logic and a clock-less temporal compression ratio.
In another embodiment, a System on Chip (SoC) using network-on-chip (NoC) sub-units is disclosed. The SoC comprises: a high speed (HS) switch block; a medium speed (MS) switch block; one or more fast IP blocks; one or more medium speed IP blocks; first communication channels coupled between the HS switch block and each of the one or more fast IP blocks; second communication channels coupled between the MS switch block and each of the one or more medium speed IP blocks; and a third communication channel coupled between the HS switch block and the MS switch block, wherein each of the first communication channels, the second communication channels, and the third communication channel is configured to encode data using delay insensitive coding and transmit the encoded data using a quasi-delay insensitive logic circuit and a clock-less temporal compression ratio.
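The two-switch arrangement of this embodiment can be pictured with a minimal routing sketch. The IP block names and the `route` helper are hypothetical and chosen only for illustration; the sketch assumes each IP block attaches to exactly one switch block and that every hop is one communication channel.

```python
# Hypothetical attachment table: fast IP blocks hang off the HS switch block,
# medium speed IP blocks hang off the MS switch block. Names are illustrative.
ATTACHED_SWITCH = {
    "cpu": "HS",
    "gpu": "HS",
    "uart": "MS",
    "spi": "MS",
}


def route(src, dst):
    """Return the nodes a transfer visits when going from src to dst."""
    src_switch = ATTACHED_SWITCH[src]
    dst_switch = ATTACHED_SWITCH[dst]
    if src_switch == dst_switch:
        # Both blocks share a switch: one first (or second) channel each way.
        return [src, src_switch, dst]
    # Different speed classes: the transfer also crosses the third channel
    # that couples the HS switch block to the MS switch block.
    return [src, src_switch, dst_switch, dst]


print(route("cpu", "gpu"))
print(route("cpu", "uart"))
```

In this sketch, traffic between two fast IP blocks never touches the MS switch block, which is one plausible reading of why the blocks are grouped by speed.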
Other features and advantages of the present inventive concept should be apparent from the following description which illustrates by way of example aspects of the present inventive concept.
The above and other aspects and features of the present inventive concept will be more apparent by describing example embodiments with reference to the accompanying drawings, in which:
While certain embodiments are described, these embodiments are presented by way of example only, and are not intended to limit the scope of protection. The methods and systems described herein may be embodied in a variety of other forms. Furthermore, various omissions, substitutions, and changes in the form of the example methods and systems described herein may be made without departing from the scope of protection.
This invention describes an Advanced Centralized Chronos NoC (ACC-NoC), which is able to efficiently satisfy the interconnect traffic requirements of modern SoCs, simplifying top level timing closure while providing high throughput and low latency.
To implement a Chronos Channel in a target technology, different circuits can be employed.
An encoder 111 is responsible for transforming the input data (e.g., input data received from a producer IP block to be transmitted to a consumer IP block), which is represented using “m” wires, into encoded data that uses “k” wires and a specific DI code. A Chronos Channel requires “j” encoders 111, where “j” is the size of the input data divided by the size of the DI code of choice. Also, encoder blocks 111 may require input control signals to indicate the validity of the data in their inputs. A clock signal (clockA) can be used for synchronous data inputs and an enable signal (enableA) can be used to enable or disable data consumption in order to fulfil specific data transmission protocol requirements. These encoder blocks 111 also generate an output control signal to indicate when the Chronos Channel is full and cannot accept new data. Note that data in either the inputs or the outputs of an encoder 111 can be digital or analog.
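The relationship between “m”, “k” and “j” can be illustrated with a minimal sketch. It assumes a 1-of-n (one-hot) DI code, which the text does not specify, so that each encoder maps m input wires to k = 2^m output wires; the function names are hypothetical.

```python
def one_of_n_encode(value, m):
    """Encode an m-bit value as a one-hot word on k = 2**m wires (assumed DI code)."""
    k = 1 << m
    if not 0 <= value < k:
        raise ValueError("value does not fit in m bits")
    return [1 if wire == value else 0 for wire in range(k)]


def encode_word(data, width, m):
    """Encode a width-bit word using j = width / m encoders, group 0 being the LSBs."""
    if width % m:
        raise ValueError("width must be a multiple of m")
    j = width // m  # number of encoders required
    mask = (1 << m) - 1
    return [one_of_n_encode((data >> (g * m)) & mask, m) for g in range(j)]


# A 4-bit word with a 1-of-4 code (m = 2, k = 4) needs j = 2 encoders.
print(encode_word(0b1001, 4, 2))
```

Exactly one wire per group is asserted, which is what lets a receiver detect data validity without a clock under this assumed code.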
The TC 112 splits a “j” sized set of encoded data into “j/i” sets of encoded data, each of size “i”, where “j/i” is the temporal compression ratio. Then, the TC 112 issues each of the “j/i” sets in its outputs, one at a time. To control the flow of this data, the handshake protocol defined by the choice of DI code is used. Note that the maximum time to transmit each of the “j/i” sets is the delay of the slot defined by the target cycle time divided by the compression ratio. In this way, and assuming that the remaining parts of the circuit will also be able to consume the data while guaranteeing cycle time performance, all the “j/i” sets will be sent in one cycle time. The outputs of the TC 112 can feed either a repeater 130 or the TD 122 directly. Also, note that in case “j/i” is not a natural number, but rather a positive rational number, the TC 112 will use only the required number of its outputs in the transmission of the last slot of data. Nevertheless, the number of slots into which the cycle time is divided will still be a natural number, defined as the ceiling function of “j/i”.
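The slot arithmetic described above can be sketched as follows. The function names are hypothetical, encoded symbols are modeled as plain list items, and the cycle time is in arbitrary time units.

```python
import math


def split_into_slots(encoded, i):
    """Split j encoded symbols into ceil(j / i) sets of at most i symbols each."""
    if i <= 0:
        raise ValueError("i must be positive")
    return [encoded[n:n + i] for n in range(0, len(encoded), i)]


def slot_delay(cycle_time, j, i):
    """Maximum time available per slot: cycle time divided by ceil(j / i)."""
    return cycle_time / math.ceil(j / i)


# j = 10 symbols over i = 4 outputs: three slots, the last one partially used.
slots = split_into_slots(list(range(10)), 4)
print(slots)
print(slot_delay(3.0, 10, 4))
```

With j = 10 and i = 4 the ratio is 2.5, so the cycle is divided into three slots and the last slot drives only two of the four outputs.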
Repeaters 130 have memory elements and are capable of holding encoded data and sending it to a next repeater or the TD 122. To control the flow of this data, the handshake protocol defined by the choice of DI code is used. Furthermore, the maximum time to transmit each of the “j/i” sets is also the delay of the slot defined by the target cycle time divided by the compression ratio. Note that repeaters 130 may or may not be required in a Chronos Channel, as they are used to fix slot delay violations in long paths that fail to meet cycle time requirements or to improve signal strength. Also, note that different numbers of repeaters 130 may be required for the different outputs of a TC 112. This is valid because, in a Chronos Channel, there is no global control signal dictating how events flow through the data path. Rather, each path from an output of a TC 112 to the input of a TD 122 has an independent flow control. Again, the only restriction is the specified cycle time.
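Why different TC outputs may need different numbers of repeaters can be shown with a rough sketch. It assumes a deliberately simplified model that is not taken from the source: repeaters cut a path into equal segments, each segment must fit within one slot delay, and the delay of the repeater itself is ignored.

```python
import math


def repeaters_needed(path_delay, slot_delay):
    """Estimate repeaters for one path under the simplified equal-segment model."""
    if slot_delay <= 0:
        raise ValueError("slot_delay must be positive")
    segments = math.ceil(path_delay / slot_delay)
    # A path that already fits in one slot needs no repeater.
    return max(0, segments - 1)


# Hypothetical per-output path delays for one TC, all against a slot delay of 1.0.
path_delays = [0.8, 2.5, 1.2]
print([repeaters_needed(d, 1.0) for d in path_delays])
```

Because each path has independent flow control, each entry can be sized on its own; a real flow would use extracted wire delays and the actual repeater cell timing.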
The TD 122 merges “q/i” sets of encoded data, each with size “i”, in a single set of encoded data with size “q”. Then the TD 122 issues the whole “q” sized set in its outputs, which feed the decoder blocks 121. To control the flow of this data, the handshake protocol defined by the choice of DI code is used. In this circuit, the maximum time to consume each of the “q/i” sets is the delay of the slot defined by the target cycle time divided by its compression ratio. Note that, in some embodiments, TDs 122 can have a different compression ratio than that of the TC 112 and can generate sets with a different size from those originally consumed by the TC 112. This is particularly useful when connecting transmitters and receivers with different clock frequencies. Also, if the compression ratio of the TD 122 is a positive rational number, it will only use the required number of its inputs in the consumption of the last slots of data.
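The merge performed by the TD can be sketched as the inverse of slot splitting. The `merge_slots` name is hypothetical, and the sketch assumes the slots arrive in order with only the last one allowed to be partially filled.

```python
def merge_slots(slots, q, i):
    """Merge ordered sets of at most i symbols into one set of q symbols."""
    for full_slot in slots[:-1]:
        if len(full_slot) != i:
            raise ValueError("only the last slot may be partially filled")
    if slots and not 0 < len(slots[-1]) <= i:
        raise ValueError("last slot must hold between 1 and i symbols")
    merged = [symbol for slot in slots for symbol in slot]
    if len(merged) != q:
        raise ValueError("slots do not add up to q symbols")
    return merged


# q = 10 symbols received over i = 4 inputs in three slots.
print(merge_slots([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]], 10, 4))
```

A TD with a different “i” than the transmitting side would simply regroup the same symbol stream into slots of another width, which is how differing bus widths can be bridged in this model.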
The decoder 121 is responsible for transforming input encoded data, which is represented using “k” wires and a specific DI code, back to the original input data that used “m” wires. In various embodiments, the decoder 121 is configured to transform the input encoded data to form a representation of the data signals input to the encoders 111, the representation being compliant with an input data format of the consumer IP block. To decode data, a Chronos Channel needs “q” decoders, as defined in the compression ratio of the TD 122. A decoder block may also require input control signals to indicate that data in its outputs was successfully collected. To do so, a clock signal (clockB) can be used, for synchronous data outputs, and an enable signal (enableB) can be used to enable or disable the generation of new data in the outputs of the Chronos Channel, to fulfil specific data transmission protocol requirements. Furthermore, decoders 121 also generate an output control signal to indicate when they are empty, which means there is no data in the Chronos Channel to be consumed. Note that data in either the inputs or the outputs of a decoder 121 can be digital or analog.
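A decoder can be sketched as follows, assuming a 1-of-n (one-hot) DI code, which the text does not specify, with group 0 carrying the least significant bits. The function names are hypothetical.

```python
def one_of_n_decode(wires):
    """Recover the value carried by a one-hot group of k wires (assumed DI code)."""
    if sum(wires) != 1:
        # All-zero or multi-hot groups carry no valid data in this assumed code.
        raise ValueError("not a valid 1-of-n codeword")
    return wires.index(1)


def decode_word(codewords, m):
    """Rebuild the original word from one k = 2**m wire group per decoder."""
    data = 0
    for group, wires in enumerate(codewords):
        if len(wires) != 1 << m:
            raise ValueError("each group must have 2**m wires")
        data |= one_of_n_decode(wires) << (group * m)
    return data


# Two 1-of-4 groups (m = 2) decode back to a 4-bit word.
print(bin(decode_word([[0, 1, 0, 0], [0, 0, 1, 0]], 2)))
```

Rejecting groups that are not one-hot mirrors how a DI receiver can tell valid data from an idle or in-transition channel.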
Another important concept in a Chronos Channel is the definition of TX and RX blocks. As
Due to the asynchronous communication between TX and RX blocks 110 and 120, a Chronos Channel can interface transmitters and receivers that operate at different frequencies and with different data bus widths (as the compression ratios can be different in the TX and RX blocks 110 and 120). However, to avoid data loss, it must be ensured that the receiver consumes data at least as fast as the producer generates new data. To do so, the output throughput must be greater than or equal to the input throughput. More specifically, recalling
Coupling controllers to the TX 110 and RX 120 can remove the requirement of constrained frequencies between transmitter and receiver blocks. Such controllers must be able to implement a communication protocol using the control signals provided by the TX and RX blocks 110 and 120. Note that these signals allow implementing a variety of communication protocols, such as (and not limited to) handshake- or credit-based protocols. The coupling of controllers to a Chronos Channel generates what is called a Chronos Link, and enables leveraging the full flexibility of Chronos Channels. This is because transmitters and receivers connected to Chronos Links can be completely asynchronous to each other and communication may be established by a handshake procedure without any need to perform complex timing closure. An example of such an implementation is given in U.S. Pat. No. 9,977,853, the disclosure of which is incorporated herein by reference in its entirety.
Further examples of the Chronos Channel are described in U.S. Pat. Nos. 9,977,852 and 9,977,853, the disclosures of which are incorporated herein by reference in their entireties as if set forth in full.
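A credit-based protocol of the kind mentioned above can be sketched with a small simulation. The `CreditLink` class, its methods and the tick-based `simulate` loop are hypothetical modeling choices and do not represent the controllers of the referenced patent; the “full” and “empty” conditions stand in for the control signals provided by the TX and RX blocks.

```python
from collections import deque


class CreditLink:
    """Toy link: the sender may transmit only while it holds a credit."""

    def __init__(self, depth):
        self.credits = depth  # one credit per free receiver buffer entry
        self.fifo = deque()

    def try_send(self, item):
        if self.credits == 0:  # channel reports "full"
            return False
        self.credits -= 1
        self.fifo.append(item)
        return True

    def try_receive(self):
        if not self.fifo:  # channel reports "empty"
            return None
        self.credits += 1  # consuming an entry returns a credit
        return self.fifo.popleft()


def simulate(n_items, depth, consumer_period):
    """Producer tries every tick; consumer only every consumer_period ticks."""
    link = CreditLink(depth)
    received, stalls, next_item, tick = [], 0, 0, 0
    while len(received) < n_items:
        if next_item < n_items:
            if link.try_send(next_item):
                next_item += 1
            else:
                stalls += 1
        if tick % consumer_period == 0:
            item = link.try_receive()
            if item is not None:
                received.append(item)
        tick += 1
    return received, stalls


received, stalls = simulate(8, depth=2, consumer_period=3)
print(received, stalls)
```

Even though the producer here attempts to send three times as often as the consumer reads, it is stalled rather than overrunning the receiver, so no data is lost and ordering is preserved.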
The proposed architecture of the ACC-NoC in
The architecture of
The architecture of
The present application claims the benefit of priority under 35 U.S.C. 119(e) to Provisional Patent Application Ser. No. 63/185,605, entitled “ADVANCED CENTRALIZED CHRONOS NoC”, filed on May 7, 2021, which is incorporated herein by reference as if set forth in full. The present application is also related to U.S. application Ser. No. 15/344,416, filed on Nov. 4, 2016, which granted as U.S. Pat. No. 9,977,852 on May 22, 2018; U.S. application Ser. No. 15/344,420, filed on Nov. 4, 2016, which granted as U.S. Pat. No. 9,977,853 on May 22, 2018; U.S. application Ser. No. 15/344,441, filed on Nov. 4, 2016, which granted as U.S. Pat. No. 10,073,939 on Sep. 11, 2018; U.S. application Ser. No. 15/645,917, filed on Jul. 10, 2017, which granted as U.S. Pat. No. 10,181,939 on Jan. 15, 2019; U.S. application Ser. No. 15/644,696, filed on Jul. 7, 2017, which granted as U.S. Pat. No. 10,331,835 on Jun. 25, 2019; U.S. application Ser. No. 16/053,486, filed on Aug. 2, 2018, which granted as U.S. Pat. No. 10,637,592 on Apr. 28, 2020; U.S. application Ser. No. 16/266,994, filed on Feb. 4, 2019; and U.S. application Ser. No. 16/827,256, filed on Mar. 23, 2020, the disclosures of which are each incorporated by reference in their entirety as if set forth in full.
Number | Date | Country
---|---|---
63185605 | May 2021 | US