The present technology is in the field of electronic system design and, more specifically, relates to physical implementation guidance during generation of a network-on-chip (NoC).
Multiprocessor systems have been implemented in systems-on-chips (SoCs) that communicate through NoCs. The SoCs include instances of initiator intellectual properties (IPs) and target IPs. Transactions, in the form of packets, are sent from an initiator to one or more targets using industry-standard protocols. The initiator, connected to the NoC, sends a request transaction to a target, using an address to select the target. The NoC decodes the address and transports the request from the initiator to the target. The target handles the transaction and sends a response transaction, which is transported back by the NoC to the initiator.
For a given set of performance requirements, such as connectivity and latency between source and destination, frequency of the various elements, maximum area available for the NoC logic and its associated routing (wiring), minimum throughput between sources and destinations, power consumption requirements for the NoC, and position on the floorplan of elements attached to the NoC, creating an optimal NoC that fulfills all the requirements with a minimum amount of logic and wires is a complex task. Creating this optimal NoC is typically the job of the chip architect or chip designer, and it is difficult and time-consuming. Moreover, the design of the NoC must be revised every time one of the requirements changes, such as a modification of the chip floorplan, the addition or deletion of IP components, or a modification of the expected performance. As a result, this task is redone frequently over the design time of the chip, which is time consuming and results in production delays. Therefore, what is needed is a system and method to efficiently generate a NoC from a set of constraints, which are listed as requirements, and a set of inputs. The system needs to produce the NoC with all its elements placed on a floorplan of a chip.
A current technical problem with creating a NoC is that physical floorplan information is not available. For example, two components that are logically plugged into the network in an adjacent configuration, or logically next to each other, may communicate through a single switch. However, on the physical floorplan, the two components may be far away from each other. As a result, it may be difficult or even impossible to meet the timing requirements once the NoC is implemented in the actual floorplan. The timing requirements are passed to a downstream tool and set forth parameters, for example, between Component A and Component B: the timing between Component A and Component B must be a value within a range. If Component A is too far away from Component B, the value measured from Component A to Component B exceeds the range of the timing requirements, and the downstream tools are known to be overworked, running longer than normal in an attempt to satisfy a timing requirement that is essentially not satisfiable. Thus, it would be desirable to create a NoC that the downstream tools can successfully implement while satisfying all the timing, power, area, and congestion design constraints.
In accordance with various embodiments and aspects of the invention, a process for generating implementation guidance during the synthesis of a NoC is described. Systems and methods are disclosed that generate a NoC using constraints and steps with inputs to produce the NoC with all its elements. The elements of the NoC are placed on a floorplan of a chip. An advantage of the invention is simplification of the design process and of the work of the chip architect or designer. Further, the technical solutions provide the technical effect of creating a NoC with an indication of the physical floorplan, which makes it easier, as compared to current techniques, to automate mechanisms such as, but not limited to, the addition of a distance pipe component.
In an embodiment, creating a synthesized NoC is configured to: augment the NoC with information to ensure that, as it is implemented in the downstream tools, it meets the performance criteria that were supplied in the specification for the NoC; and provide an indication of the floorplan so that timing estimation may be implemented to determine how far apart a first component is from a second component. The timing estimation may provide data indicating that, for example, the first component of a NoC is so far from the second component of the NoC that the timing requirement is exceeded. As a result, at least one distance pipeline or link may be inserted into the floorplan. Adding a distance pipe to the floorplan solves many of the problems associated with two components of the NoC exceeding the timing requirement. However, each distance pipe added to the floorplan of a NoC contributes latency, because it adds a clock cycle for data to traverse it. It is therefore desirable to add distance pipes where they are needed while simultaneously creating a NoC topology that minimizes the need for them. As a result, the overall performance, bandwidth, and latency may be optimized given the constraints of a floorplan.
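The timing estimation described above can be illustrated with a minimal sketch. It assumes Manhattan wire length between estimated component positions, a linear wire-delay model based on the technology parameter "time to cover 1 mm", and a single clock period per wire segment; gate delay and setup time are ignored. The names `Component` and `distance_pipe_stages`, and the 0.15 ns/mm and 0.5 ns values, are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A NoC element with an estimated floorplan position, in millimetres."""
    name: str
    x_mm: float
    y_mm: float


def manhattan_mm(a: Component, b: Component) -> float:
    """Estimated wire length, assuming rectilinear (Manhattan) routing."""
    return abs(a.x_mm - b.x_mm) + abs(a.y_mm - b.y_mm)


def distance_pipe_stages(a: Component, b: Component,
                         delay_ns_per_mm: float, clock_period_ns: float) -> int:
    """Pipe stages needed so that every wire segment fits in one clock period.

    A link with k distance pipe stages is split into k + 1 segments; each
    segment must be traversed within a single clock cycle. Gate delay and
    setup time are ignored in this simplified model.
    """
    wire_delay_ns = manhattan_mm(a, b) * delay_ns_per_mm
    segments = max(1, math.ceil(wire_delay_ns / clock_period_ns))
    return segments - 1


sw_a = Component("switch_a", 0.0, 0.0)
sw_b = Component("switch_b", 6.0, 2.0)
# 8 mm at an assumed 0.15 ns/mm is 1.2 ns; with an assumed 0.5 ns clock period
# the link needs 3 segments, i.e. 2 distance pipe stages (2 extra cycles of latency).
print(distance_pipe_stages(sw_a, sw_b, delay_ns_per_mm=0.15, clock_period_ns=0.5))  # prints 2
```

Each returned stage corresponds to one added clock cycle of latency, which is why a topology that keeps communicating components close together is preferred over one that relies on many pipe stages.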
The following describes various examples of the present technology that illustrate various aspects and embodiments of the invention. Generally, examples can use the described aspects in any combination. All statements herein reciting principles, aspects, and embodiments as well as specific examples thereof, are intended to encompass both structural and functional equivalents thereof. Additionally, it is intended that such equivalents include both currently known equivalents and equivalents developed in the future, i.e., any elements developed that perform the same function, regardless of structure.
It is noted that, as used herein, the singular forms “a,” “an” and “the” include plural referents unless the context clearly dictates otherwise. Reference throughout this specification to “one aspect,” “an aspect,” “certain aspects,” “various aspects,” or similar language means that a particular aspect, feature, structure, or characteristic described in connection with any embodiment is included in at least one embodiment of the invention.
Appearances of the phrases “in one embodiment,” “in at least one embodiment,” “in an embodiment,” “in certain embodiments,” and similar language throughout this specification may, but do not necessarily, all refer to the same embodiment or similar embodiments. Furthermore, aspects and embodiments of the invention described herein are merely exemplary, and should not be construed as limiting the scope or spirit of the invention as appreciated by those of ordinary skill in the art. The disclosed invention is effectively made or used in any embodiment that includes any novel aspect described herein. All statements herein reciting principles, aspects, and embodiments of the invention are intended to encompass both structural and functional equivalents thereof. It is intended that such equivalents include both currently known equivalents and equivalents developed in the future.
As used herein, a transaction may be a request transaction or a response transaction. Examples of request transactions include write requests and read requests.
As used herein, a node is defined as a distribution point or a communication endpoint that is capable of creating, receiving, and/or transmitting information over a communication path or channel. A node may refer to any one of the following: switches, splitters, mergers, buffers, and adapters. As used herein, splitters and mergers are switches; not all switches are splitters or mergers. As used herein and in accordance with the various aspects and embodiments of the invention, the term “splitter” describes a switch that has a single ingress port and multiple egress ports. As used herein and in accordance with the various aspects and embodiments of the invention, the term “merger” describes a switch that has a single egress port and multiple ingress ports.
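The splitter and merger definitions above depend only on port counts, so they can be expressed as a small classification function. This is an illustrative sketch; `SwitchKind` and `classify_switch` are hypothetical names, and treating every other port configuration as a generic switch is an assumption.

```python
from enum import Enum


class SwitchKind(Enum):
    SPLITTER = "splitter"  # one ingress port, multiple egress ports
    MERGER = "merger"      # multiple ingress ports, one egress port
    SWITCH = "switch"      # any other switch


def classify_switch(ingress_ports: int, egress_ports: int) -> SwitchKind:
    """Classify a switch from its ingress and egress port counts."""
    if ingress_ports < 1 or egress_ports < 1:
        raise ValueError("a switch needs at least one ingress and one egress port")
    if ingress_ports == 1 and egress_ports > 1:
        return SwitchKind.SPLITTER
    if egress_ports == 1 and ingress_ports > 1:
        return SwitchKind.MERGER
    return SwitchKind.SWITCH


print(classify_switch(1, 4))  # prints SwitchKind.SPLITTER
```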
Referring now to
Referring now to
Referring now to
Referring now to
Referring again to
In accordance with the various aspects of the invention, input 251 includes input about the global consolidation roadmap. The global consolidation roadmap includes a consolidation model that captures the global physical view of the connectivity of the floorplan's free space, as well as the connectivity across and between the initiators and targets. The global consolidation roadmap is modeled by a graph of physical nodes and canonical segments that are used to position the nodes (splitters, mergers, switches, adapters) of the network under construction. The global consolidation roadmap is used to speed up computation. In accordance with various aspects of the invention, the global consolidation roadmap is persistent, which means that it is data the system exports and re-consumes in incremental synthesis and subsequent runs.
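A graph of physical points joined by segments, as described above, lends itself to shortest-path queries along which network nodes can be positioned. The sketch below is an assumption-laden illustration: it uses Dijkstra's algorithm (a standard choice, not one stated in the text) and JSON export/import to mimic the persistence property. The class name `ConsolidationRoadmap` and the point names are hypothetical.

```python
import heapq
import json
from collections import defaultdict


class ConsolidationRoadmap:
    """Hypothetical model: physical points in floorplan free space joined by canonical segments."""

    def __init__(self):
        self.segments = []             # (point_a, point_b, length_mm), kept for export
        self.adj = defaultdict(list)   # point -> [(neighbor, length_mm), ...]

    def add_segment(self, a: str, b: str, length_mm: float) -> None:
        self.segments.append((a, b, length_mm))
        self.adj[a].append((b, length_mm))
        self.adj[b].append((a, length_mm))

    def shortest_route(self, src: str, dst: str):
        """Dijkstra's algorithm; returns (length_mm, [points]) or (inf, [])."""
        dist = {src: 0.0}
        prev = {}
        heap = [(0.0, src)]
        while heap:
            d, u = heapq.heappop(heap)
            if u == dst:
                break
            if d > dist[u]:
                continue  # stale heap entry
            for v, w in self.adj[u]:
                nd = d + w
                if nd < dist.get(v, float("inf")):
                    dist[v] = nd
                    prev[v] = u
                    heapq.heappush(heap, (nd, v))
        if dst not in dist:
            return float("inf"), []
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return dist[dst], path[::-1]

    def export(self) -> str:
        """Serialize the roadmap so a later (incremental) run can re-consume it."""
        return json.dumps(self.segments)

    @classmethod
    def load(cls, text: str) -> "ConsolidationRoadmap":
        roadmap = cls()
        for a, b, length_mm in json.loads(text):
            roadmap.add_segment(a, b, length_mm)
        return roadmap


rm = ConsolidationRoadmap()
rm.add_segment("INIU_M1", "p1", 1.0)
rm.add_segment("p1", "TNIU_S1", 2.0)
rm.add_segment("INIU_M1", "TNIU_S1", 5.0)
print(rm.shortest_route("INIU_M1", "TNIU_S1"))  # prints (3.0, ['INIU_M1', 'p1', 'TNIU_S1'])
```

Because the roadmap is computed once and reused, route queries during synthesis avoid re-deriving free-space connectivity, which is one plausible reading of how it speeds up computation.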
In accordance with some aspects of the invention, input 259 includes information about edge clustering. Edge clustering aims to minimize resources and meet performance goals through appropriate algorithms and techniques. In accordance with some aspects of the invention, edge clustering is applied in conjunction and in cooperation with input 260, node clustering. Edge clustering and node clustering can be used in combination by mixing, by being applied concurrently, or by being applied in sequence. The advantage and goal is to expand the spectrum of synthesis and to span a larger solution space for the network.
In accordance with various aspects of the invention, input 262 includes information about restructuring. Restructuring includes a variety of transformations and capabilities. In accordance with some aspects of the invention, the transformations are logical, in that there is a change in the structure of the network. In accordance with some aspects of the invention, the transformation is physical, because there is a physical change in the network, such as moving a node to a new location. Other examples of restructuring include: breaking a node into smaller nodes; reparenting between nodes; network sub-part duplication to avoid deadlocks and to deal with congestion; and physically re-routing links to avoid congestion areas or to meet timing constraints.
In accordance with the various aspects of the invention, another constraint that can be provided is the extension of the clock domain and power domain 212. The domain 212 includes areas of the chip where logic belonging to a particular domain is allowed to be placed.
Referring now to
In accordance with the various aspects of the invention, capabilities of the logic library, which will be used to implement the NoC, are provided. The information includes the size of a reference logic gate, and the time it takes for a signal to cover a 1 mm distance.
Referring again to
In accordance with the various aspects of the invention, initiators and targets are communicatively connected to the NoC. An initiator is a unit that sends requests; an initiator is typically configured to issue read and write commands. A target is a unit that serves or responds to the requests; a target is typically configured to respond to read and write commands. Each initiator is attached to the NoC through an NIU. The NIU that is attached to an initiator is called an Initiator Network Interface Unit (INIU). Further, each target is attached to the NoC through an NIU. The NIU that is attached to a target is called a Target Network Interface Unit (TNIU). The primary functionality of the NoC is to carry each request from an initiator to the desired destination target and, if the request requires a response, to carry the target's response back to the requesting initiator. Initiators and targets have many different parameters that characterize them. In accordance with the various aspects of the invention, the clock domain and power domain to which each initiator and target belongs are defined. The width, in bits, of the data bus each uses to send write payloads and receive read payloads is also defined. In accordance with the various aspects of the invention, the width of the data bus for the connection (the communication path to/from a target) used to send write requests and receive write responses is also defined. Furthermore, the clock and power domain definitions are references to the previously described clock and power domains existing in the SoC, as described herein.
Continuing with
In accordance with the various aspects of the invention, initiators are not required to be able to send requests to all targets connected to the NoC. The precise definition of which targets can receive requests from an initiator is set forth in the connectivity table, such as table 400. The connectivity and traffic class labeling information can be represented as an explicit or conceptual matrix in which each initiator has a row and each target has a column. If an initiator must be able to send traffic to a target, a traffic class label must be present at the intersection of the initiator row and the target column. If no label is present at an intersection, then the tool does not need to provide connectivity between that initiator and that target. For example, initiator 1 (M1) communicates with target 1 (S1) using a defined label 1 (L1), while M1 does not communicate with S2, and hence there is no label at the intersection of M1 and S2. In accordance with the various aspects of the invention, the actual format used to represent connectivity can be different, as long as each initiator-target pair has a precise definition of its traffic class, or no classification label if there is no connection. It is within the scope of this invention for an initiator/target connection to support a plurality of traffic classes.
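Since the text allows any format as long as each initiator-target pair has a precise traffic class definition, a sparse dictionary is one possible encoding. This sketch assumes such a dictionary keyed by `(initiator, target)`; the names `ConnectivityTable`, `traffic_classes`, and `as_matrix` are hypothetical, and the M2/S2 labels are invented for illustration. Only the M1/S1/L1 entry comes from the example above.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical sparse encoding of the connectivity table: (initiator, target) -> labels.
# A missing pair means the tool does not need connectivity between them.
ConnectivityTable = Dict[Tuple[str, str], Set[str]]


def traffic_classes(table: ConnectivityTable, initiator: str, target: str) -> Set[str]:
    """Traffic class labels for a pair; an empty set means no connection is required."""
    return table.get((initiator, target), set())


def as_matrix(table: ConnectivityTable, initiators: List[str],
              targets: List[str]) -> List[List[str]]:
    """One row per initiator, one column per target; '' where no label is present."""
    return [[",".join(sorted(traffic_classes(table, i, t))) for t in targets]
            for i in initiators]


table: ConnectivityTable = {
    ("M1", "S1"): {"L1"},          # from the example: M1 reaches S1 with label L1
    ("M2", "S2"): {"L2", "L3"},    # hypothetical pair carrying two traffic classes
}
for row in as_matrix(table, ["M1", "M2"], ["S1", "S2"]):
    print(row)
# prints ['L1', '']
#        ['', 'L2,L3']
```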
Referring now to
A scenario can be represented as two matrices, one defining read throughputs and one defining write throughputs. In accordance with the various aspects of the invention, read throughput requirements will be used to size the response network, which handles data returning from targets back to the initiators. Write throughput requirements will be used to size the request network, which carries data going from initiators to targets, in accordance with the various aspects of the invention. An example, in accordance with the various aspects of the invention, of the throughput requirements for the various scenarios is shown in table 500. The actual format used to represent a scenario can be different, as long as each (initiator, target) pair has a precise definition of its minimum required throughput for read and for write. In table 500, a read transaction from M1 to S1 has a minimum throughput of 100 MB/s, and a write transaction from M1 to S1 has a minimum throughput of 50 MB/s.
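The split between read matrices (sizing the response network) and write matrices (sizing the request network) can be sketched as follows. The aggregation rule, summing per initiator within a scenario and taking the maximum across scenarios so that every scenario is satisfied, is an assumption for illustration; the function names are hypothetical, and the second scenario's values are invented. Only the 100 MB/s and 50 MB/s figures come from table 500.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Matrix = Dict[Tuple[str, str], float]  # (initiator, target) -> minimum MB/s


def aggregate_by_initiator(matrix: Matrix) -> Dict[str, float]:
    totals: Dict[str, float] = defaultdict(float)
    for (initiator, _target), mb_per_s in matrix.items():
        totals[initiator] += mb_per_s
    return dict(totals)


def initiator_requirements(scenarios: List[Dict[str, Matrix]]):
    """Worst case over all scenarios, per initiator.

    Write throughput sizes the request network; read throughput sizes the
    response network.
    """
    request: Dict[str, float] = defaultdict(float)
    response: Dict[str, float] = defaultdict(float)
    for scenario in scenarios:
        for ini, mbps in aggregate_by_initiator(scenario["write"]).items():
            request[ini] = max(request[ini], mbps)
        for ini, mbps in aggregate_by_initiator(scenario["read"]).items():
            response[ini] = max(response[ini], mbps)
    return dict(request), dict(response)


scenarios = [
    {"read": {("M1", "S1"): 100.0}, "write": {("M1", "S1"): 50.0}},  # values from table 500
    {"read": {("M1", "S1"): 40.0, ("M1", "S2"): 80.0},                 # hypothetical second scenario
     "write": {("M1", "S2"): 20.0}},
]
print(initiator_requirements(scenarios))  # prints ({'M1': 50.0}, {'M1': 120.0})
```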
It is within the scope of this invention for latency to refer to the number of clock cycles it takes for data to make its way through the network. Latency can cause problems in transport networks even when they have high bandwidth (frequency). An example of a “real-time traffic class” would be video data from a camera in a self-driving car. Long latency in the propagation of that data is not acceptable; data that arrived too late or was lost would make the vehicle unsafe. A still real-time but less vital traffic class would be audio and/or video in the entertainment system: a gap in that data would be undesirable, but not unsafe. An example of non-real-time data would be data from a gas gauge sensor in a car. If that data is delayed for several seconds, it does not matter, since the rate of change of the data is quite slow compared to the operating speed of an SoC.
In accordance with some aspects of the invention, scenarios are not defined for the tool, in which case the tool optimizes the NoC synthesis process for physical cost, such as lowest gate cost and/or lowest wire cost.
Referring now to
The data width of each switch, and the clock domain it belongs to, is computed using the data width of each attached interface, and their clock domain, as inputs to the tool. In accordance with the various aspects of the invention, each step that transforms the network, which is part of the NoC, also performs the computation of the data width and the clock domain of the newly created network elements.
Referring now to
Referring now to
Referring now to
Referring now to
Referring now to
In accordance with the various aspects of the invention, the tool transforms the network in order to reduce the number of wires used between switches as much as achievable, while keeping the performance defined in the scenarios, which are a set of required minimum throughputs between initiators and targets. In accordance with the various aspects of the invention, switches are clustered for performance-aware switching; mergers and splitters that have been distributed on the roadmaps are treated like ordinary switches.
In accordance with an aspect of the invention, the tool uses an iterative process that merges switches, under the condition that the performance requirements are still met, until no further switch merge can occur. The tool uses a process that is described as follows:
In accordance with various aspects of the invention, it is possible for the process to ensure the switches do not grow above a certain size (maximum number of ingress ports, maximum number of egress ports). If a combined switch is above the set threshold, then the merge is prevented.
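The iterative merge with a size threshold can be sketched as a greedy loop. This is a simplified illustration, not the tool's actual process: switches are modeled only by the sets of sources and destinations on their ports, a merged switch's ports are the union of both, and the scenario performance check is abstracted as a caller-supplied `performance_ok` callback. All names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, FrozenSet, List


@dataclass(frozen=True)
class Switch:
    name: str
    ingress: FrozenSet[str]  # sources feeding this switch
    egress: FrozenSet[str]   # destinations served by this switch


def combine(a: Switch, b: Switch) -> Switch:
    return Switch(f"{a.name}+{b.name}", a.ingress | b.ingress, a.egress | b.egress)


def merge_switches(switches: List[Switch],
                   performance_ok: Callable[[Switch], bool],
                   max_ingress: int, max_egress: int) -> List[Switch]:
    """Greedily merge pairs of switches until no further merge is allowed."""
    current = list(switches)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(current, 2):
            cand = combine(a, b)
            if len(cand.ingress) > max_ingress or len(cand.egress) > max_egress:
                continue  # merged switch would exceed the size threshold
            if not performance_ok(cand):
                continue  # merge would violate the scenario requirements
            current = [s for s in current if s is not a and s is not b] + [cand]
            merged = True
            break  # restart the scan with the updated switch list
    return current


switches = [
    Switch("s1", frozenset({"M1"}), frozenset({"S1"})),
    Switch("s2", frozenset({"M2"}), frozenset({"S1"})),
    Switch("s3", frozenset({"M3"}), frozenset({"S2"})),
]
result = merge_switches(switches, performance_ok=lambda s: True, max_ingress=2, max_egress=2)
print(sorted(s.name for s in result))  # prints ['s1+s2', 's3']
```

Each successful merge removes one switch, so the loop always terminates. The final merge of `s3` is blocked by the two-ingress threshold, not by performance.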
Referring now to
Referring again to
Continuing with
In accordance with other aspects of the invention, the extension of the clock and power domains on the floorplan is provided, and each element is tested to ensure it is located within the bounds of its specified clock and power domain. If the test fails, the element is moved until a suitable location is found where the test passes. Once a suitable placement has been found for each element, each connection between elements is routed. The routing process finds a suitable path for the set of wires making the connections between elements. After routing is done, distance-spanning pipeline elements are inserted on the links if required, using the information provided regarding the capabilities of the technology, namely how long it takes for a signal to cover a 1 mm distance.
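The domain test and the move step can be sketched with rectangular domain extents. Assumptions: each clock or power domain is a list of axis-aligned rectangles; an element is legal where its clock and power domains overlap; and, instead of moving the element iteratively, the sketch moves it directly to the nearest legal point (Manhattan distance), which is a swapped-in simplification. `Rect`, `legal_region`, and `legalize` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Rect:
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1

    def clamp(self, p: Point) -> Point:
        """Closest point inside the rectangle."""
        return (min(max(p[0], self.x0), self.x1), min(max(p[1], self.y0), self.y1))

    def intersect(self, other: "Rect") -> Optional["Rect"]:
        r = Rect(max(self.x0, other.x0), max(self.y0, other.y0),
                 min(self.x1, other.x1), min(self.y1, other.y1))
        return r if r.x0 <= r.x1 and r.y0 <= r.y1 else None


def legal_region(clock_domain: List[Rect], power_domain: List[Rect]) -> List[Rect]:
    """Areas where an element lies in both its clock domain and its power domain."""
    regions = [a.intersect(b) for a in clock_domain for b in power_domain]
    return [r for r in regions if r is not None]


def legalize(p: Point, allowed: List[Rect]) -> Point:
    """Keep p if the domain test passes; otherwise move it to the nearest legal point."""
    if not allowed:
        raise ValueError("clock and power domains do not overlap")
    if any(r.contains(p) for r in allowed):
        return p
    candidates = [r.clamp(p) for r in allowed]
    return min(candidates, key=lambda q: abs(q[0] - p[0]) + abs(q[1] - p[1]))


allowed = legal_region([Rect(0.0, 0.0, 10.0, 10.0)], [Rect(5.0, 0.0, 20.0, 10.0)])
print(legalize((2.0, 3.0), allowed))  # prints (5.0, 3.0): moved into the overlap
print(legalize((7.0, 7.0), allowed))  # prints (7.0, 7.0): already legal
```

Routing and pipe insertion would follow on the legalized positions, for example using a wire-delay-per-millimetre estimate like the one in the timing estimation step.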
In accordance with some aspects and embodiments of the invention, the tool generates one or more computer files describing the generated NoC that includes:
In accordance with various aspects of the invention, the tool is used to generate metrics about the generated NoC, such as: histograms of wire length distribution, the number of switches, and histograms of switches by size.
In accordance with another aspect of the invention, the tool automatically inserts various adapters and buffers in the network. The tool inserts the adapters based on the adaptation required between two elements that have different data widths or different clock or power domains. The tool inserts the buffers based on the scenarios and the detected rate mismatch.
In accordance with some aspects and embodiments, the tool can perform multiple iterations of the synthesis for incremental optimization of the NoC, in which case one of the constraints provided to the tool is information about the previous run.
After execution of the synthesis process by the software, the results are produced in a machine-readable form, such as computer files using a well-defined format to capture information. An example of such a format is XML; another example is JSON. The scope of the invention is not limited by the specific format.
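The adapter and buffer decisions can be sketched as comparisons between the two ends of a link. Assumptions: each interface is described by data width, clock domain and frequency, and power domain; a rate buffer is proposed when the source bandwidth (width / 8 × MHz, in MB/s) exceeds the destination bandwidth. The scenario-driven part of buffer sizing is not modeled. The element names returned are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Interface:
    data_width_bits: int
    clock_domain: str
    clock_mhz: float
    power_domain: str

    def bandwidth_mb_per_s(self) -> float:
        # bytes per cycle * millions of cycles per second = MB/s
        return self.data_width_bits / 8 * self.clock_mhz


def required_insertions(src: Interface, dst: Interface) -> List[str]:
    """Hypothetical names of the elements a link from src to dst needs."""
    needed = []
    if src.data_width_bits != dst.data_width_bits:
        needed.append("width_adapter")
    if src.clock_domain != dst.clock_domain:
        needed.append("clock_adapter")
    if src.power_domain != dst.power_domain:
        needed.append("power_adapter")
    if src.bandwidth_mb_per_s() > dst.bandwidth_mb_per_s():
        needed.append("rate_buffer")  # producer can outpace consumer
    return needed


cpu = Interface(128, "clk_cpu", 800.0, "pd_main")
ddr = Interface(64, "clk_ddr", 400.0, "pd_main")
print(required_insertions(cpu, ddr))  # prints ['width_adapter', 'clock_adapter', 'rate_buffer']
```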
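Both formats mentioned above are supported by the Python standard library. The schema below (element names, attributes such as `pipe_stages`) is invented purely for illustration; it is not the tool's actual output format.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical result of a synthesis run; the schema is illustrative only.
noc = {
    "switches": [{"name": "sw0", "x_mm": 1.5, "y_mm": 2.0, "data_width": 128}],
    "links": [{"from": "INIU_M1", "to": "sw0", "pipe_stages": 0},
              {"from": "sw0", "to": "TNIU_S1", "pipe_stages": 2}],
}


def to_json(design: dict) -> str:
    return json.dumps(design, indent=2, sort_keys=True)


def to_xml(design: dict) -> str:
    root = ET.Element("noc")
    for sw in design["switches"]:
        ET.SubElement(root, "switch", {k: str(v) for k, v in sw.items()})
    for link in design["links"]:
        ET.SubElement(root, "link", {k: str(v) for k, v in link.items()})
    return ET.tostring(root, encoding="unicode")


print(to_json(noc))
print(to_xml(noc))
```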
Referring again to
It is desirable to guide the physical implementation tool to place the NoC topology in a way that follows the topology synthesis result, while still giving the physical implementation tool enough flexibility to move logic elements as needed. Rather than enforcing exact positions for switches, the routes traverse a set of floorplan regions or checkpoints.
In an embodiment, mechanisms for accommodating timing realization between a first component and a second component within a NoC include, but are not limited to, distance pipe insertion, adding buffer stages, switching buffers to higher-drive or faster buffers, adjusting wire widths, and/or indicating which wire layer(s) are to be used.
It is within the scope of this invention for different mechanisms to be physically implemented to overcome timing issues within a synthesized topology, resolving timing issues before they occur by communicating constraints and/or guidance for the physical implementation. It is within the scope of this invention for the physical implementation of a connection to include, but not be limited to, the physical placement of an object, a component, and/or a gate around the NoC topology so that a downstream tool does not place a first connection so far from a second connection that a timing constraint is exceeded. It is desirable to synthesize a network for all of the connections that satisfies the connectivity requirements while minimizing the number of gates in the synthesized network. Minimizing the number of gates is important for reasons including, but not limited to: enabling the NoC to fit within the topology parameters; ensuring the timing requirement can be met; building more efficiently with smaller networks; and minimizing the overall power used in the interconnect between signals.
In another embodiment, components on the driving side are replaced with components having greater drive strength to increase speed. For example, if the timing requirement is exceeded when a signal is forwarded from a first switch to a second switch because the distance between them is too great, a distance pipe may be placed between the first switch and the second switch, or the signal may be made faster by increasing the drive strength and by using wider wires having lower resistance to reduce the parasitic elements on that particular line.
In an embodiment, a floorplan region is a mechanism capable of, including but not limited to: constraining a component, such as a switch and/or a distance pipe stage, to a physical location and/or an approximate physical location; and/or dividing a logical interconnect into a plurality of portions that are sized to the capacity of the downstream tools for proper synthesis. Together, the physically and logically configured portions form the logical interconnect. The floorplan regions are used to constrain a component to a particular location. The need for floorplan regions is a response to the increasing size of the networks or interconnects of the SoCs being developed. In other words, the networks or interconnects of SoCs have grown beyond the capacity of downstream implementation tools, due to factors including, but not limited to, wider interconnects and more components being interconnected.
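Guidance by checkpoints, rather than exact coordinates, can be checked by testing whether a placed path visits the checkpoint regions in order. This is a minimal sketch assuming rectangular regions and a path given as a sequence of points; `Region` and `follows_checkpoints` are hypothetical names. Greedy in-order matching is sufficient here because the earliest match for each checkpoint never rules out a later one.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Region:
    """Axis-aligned floorplan region used as a route checkpoint (hypothetical)."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, p: Point) -> bool:
        return self.x0 <= p[0] <= self.x1 and self.y0 <= p[1] <= self.y1


def follows_checkpoints(path: List[Point], checkpoints: List[Region]) -> bool:
    """True if the placed path visits every checkpoint region in order.

    Only the order of regions is enforced; the exact positions inside each
    region are left to the physical implementation tool.
    """
    i = 0
    for p in path:
        # One point may satisfy several consecutive (overlapping) checkpoints.
        while i < len(checkpoints) and checkpoints[i].contains(p):
            i += 1
    return i == len(checkpoints)


route = [Region("R1", 0, 0, 2, 2), Region("R2", 5, 0, 7, 2)]
print(follows_checkpoints([(1, 1), (3, 1), (6, 1)], route))  # prints True
print(follows_checkpoints([(6, 1), (3, 1), (1, 1)], route))  # prints False
```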
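Dividing a logical interconnect into portions sized to a downstream tool's capacity can be sketched as bin packing per floorplan region. Assumptions: capacity is expressed as a gate count, each portion stays within one region, and components are packed with a simple next-fit heuristic in input order (a swapped-in technique, not one stated in the text). The component names and gate counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def partition_interconnect(components: List[Tuple[str, str, int]],
                           capacity_gates: int) -> List[Tuple[str, List[str]]]:
    """Split components into portions that respect floorplan regions and tool capacity.

    components: (name, floorplan_region, gate_count) triples.
    Each portion contains components from one region only, and its total
    gate count does not exceed capacity_gates.
    """
    by_region: Dict[str, List[Tuple[str, int]]] = defaultdict(list)
    for name, region, gates in components:
        if gates > capacity_gates:
            raise ValueError(f"{name} alone exceeds the tool capacity")
        by_region[region].append((name, gates))

    portions: List[Tuple[str, List[str]]] = []
    for region in sorted(by_region):
        current: List[str] = []
        used = 0
        for name, gates in by_region[region]:
            if current and used + gates > capacity_gates:
                portions.append((region, current))  # close the full portion
                current, used = [], 0
            current.append(name)
            used += gates
        if current:
            portions.append((region, current))
    return portions


parts = partition_interconnect(
    [("sw0", "R1", 400), ("sw1", "R1", 500), ("pipe0", "R1", 200), ("sw2", "R2", 300)],
    capacity_gates=1000)
print(parts)  # prints [('R1', ['sw0', 'sw1']), ('R1', ['pipe0']), ('R2', ['sw2'])]
```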
In a first aspect, it may be desirable to keep a switch in a particular constrained location, based on the data associated with the signals being received and transmitted by the switch, because the performance of the switch may be adversely affected if the switch were moved to a location outside the constrained location.
In a second aspect, an additional constraint may be utilized to place the gates of a component, for example a switch or a pipe stage, as close together as possible within each individual component. A current problem with existing placement programs is their tendency to place these gates too far away from each other. It is within the scope of this invention for a switch to be a single component having a plurality of gates, each of which needs to be placed. Although the netlist specifies the interconnection of these gates, the downstream tools physically place them and create the routes that implement the specified logical connectivity.
In a third aspect, the two separate constraints of the first aspect and the second aspect may be used simultaneously. For example, if the gates of a switch that is specified to have a particular performance are distributed over a larger area than specified, the switch will have lower performance than originally specified. Thus, it is desirable to keep the gates of this switch close to each other. Further, in a larger network, it is desirable to keep components within a particular region or area of the floorplan free space so that the overall network performs as specified or analyzed.
In an embodiment, based on approximations for a given semiconductor process, a maximum length of a single wire is established to ensure that the overall timing of that particular path can be met. A signal must travel from a first component to a second component within a given amount of time; if the length between the two components is excessive, it may not be possible to move the signal from the first component to the second component within that time. Inserting distance pipe stages solves this technical problem, as compared to current techniques.
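The combined constraints of the first and second aspects can be checked per component: every gate must lie inside the assigned region, and the gates' spread must stay small. This sketch uses the half-perimeter of the gates' bounding box as the spread metric (a common placement metric, assumed here rather than taken from the text); the region tuple, `max_spread` threshold, and gate coordinates are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def half_perimeter(points: List[Point]) -> float:
    """Half-perimeter of the bounding box of a component's gates."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def component_ok(gates: List[Point], region: Tuple[float, float, float, float],
                 max_spread: float) -> bool:
    """First aspect: all gates inside the region. Second aspect: gates kept close together."""
    x0, y0, x1, y1 = region
    inside = all(x0 <= x <= x1 and y0 <= y <= y1 for x, y in gates)
    return inside and half_perimeter(gates) <= max_spread


switch_gates = [(1.0, 1.0), (1.25, 1.0), (1.0, 1.5)]
print(component_ok(switch_gates, region=(0.0, 0.0, 2.0, 2.0), max_spread=1.0))  # prints True
```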
In an embodiment, the method further includes receiving an area estimation for the first component and the second component, and balancing the timing and other performance requirements in the optimization of the NoC so as to ensure the resulting NoC gates will be placeable, that is, they will fit into the free space in the floorplan allotted to the NoC.
In an embodiment, the method further includes receiving a power requirement for the NoC, and balancing the timing and other performance requirements in the optimization of the NoC so as to ensure the resulting NoC will not exceed the power requirement specified in the constraint.
In an embodiment, the method further includes incorporating physical routing congestion data, either as an estimation based on the NoC netlist structure or as feedback obtained from the downstream implementation tools, and balancing the timing and other performance requirements in the optimization of the NoC so as to ensure the resulting NoC will be routable in the free space in the floorplan allotted to the NoC.
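One way to balance timing, area, power, and congestion is to treat them as hard feasibility checks and then rank the feasible candidates with a weighted cost. This sketch is purely illustrative: the `Estimate` fields, the utilization-ratio cost, the default congestion limit of 1.0, and the weights are all assumptions, and the candidate values are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Estimate:
    """Hypothetical per-candidate estimates produced during NoC optimization."""
    name: str
    worst_slack_ns: float   # negative means a timing violation
    area_mm2: float         # area of the NoC gates
    power_mw: float
    congestion: float       # peak routing demand / capacity ratio


@dataclass(frozen=True)
class Limits:
    free_area_mm2: float    # free space in the floorplan allotted to the NoC
    power_mw: float         # power requirement from the constraints
    max_congestion: float = 1.0


def feasible(e: Estimate, lim: Limits) -> bool:
    """Hard checks: meets timing, placeable, within power, routable."""
    return (e.worst_slack_ns >= 0 and e.area_mm2 <= lim.free_area_mm2
            and e.power_mw <= lim.power_mw and e.congestion <= lim.max_congestion)


def cost(e: Estimate, lim: Limits, w: Dict[str, float]) -> float:
    """Soft objective: weighted utilization of each budget (lower is better)."""
    return (w["area"] * e.area_mm2 / lim.free_area_mm2
            + w["power"] * e.power_mw / lim.power_mw
            + w["congestion"] * e.congestion / lim.max_congestion)


def best_candidate(cands: List[Estimate], lim: Limits,
                   w: Dict[str, float]) -> Optional[Estimate]:
    ok = [c for c in cands if feasible(c, lim)]
    return min(ok, key=lambda c: cost(c, lim, w)) if ok else None


limits = Limits(free_area_mm2=4.0, power_mw=500.0)
weights = {"area": 1.0, "power": 1.0, "congestion": 1.0}
cands = [
    Estimate("A", 0.05, 3.0, 400.0, 0.7),
    Estimate("B", -0.02, 2.5, 350.0, 0.6),  # smallest, but fails timing
    Estimate("C", 0.10, 3.5, 450.0, 0.9),
]
print(best_candidate(cands, limits, weights).name)  # prints A
```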
In accordance with some aspects and embodiments of the invention, a method is disclosed for guiding physical generation of a NoC from a synthesized representation. The method includes receiving, at a tool, at least one constraint parameter for the NoC, the at least one constraint parameter being selected from a group of constraint parameters including at least one physical constraint and at least one performance constraint; augmenting, using the tool, a physical floorplan for the NoC with information that guides a physical implementation of the NoC; and constraining, using the physical floorplan, the physical implementation of a connection to a location on the physical floorplan based on the at least one constraint parameter.
In accordance with some aspects and embodiments of the invention, a method is disclosed wherein a gate of the connection is oriented, using the tool, at a shortest routing distance from the connection compared to a listing of routing distances between the gate and the connection detected by the tool.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes dividing, using the tool, a logical interconnect into a plurality of portions, whereby, the plurality of portions of the logical interconnect are calibrated to a size of the tool.
In accordance with some aspects and embodiments of the invention, a method is disclosed for guiding a physical implementation of a synthesized topology of a NoC. The method includes receiving, at a tool, at least one timing requirement for the NoC; receiving, at the tool, at least one performance constraint for the NoC; augmenting, using the tool, a physical floorplan for the NoC with information to guide the physical implementation; implementing a timing estimation configured to determine if a length between a first component and a second component placed on the physical floorplan exceeds the at least one timing requirement; and inserting, in response to exceeding the at least one timing requirement, at least one link in the physical floorplan.
In accordance with some aspects and embodiments of the invention, a method is disclosed, wherein the at least one link is inserted using a wire delay technology-specific parameter. In accordance with some aspects and embodiments of the invention, a method is disclosed further comprising creating, at the tool, at least one module region, wherein the at least one link of the NoC is assigned to the at least one module region.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes receiving an area-estimation for the first component and the second component.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes performing a balance to the at least one timing requirement in an optimization of the NoC to identify that at least one gate of the NoC can be placeable into at least a portion of the physical floorplan.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes performing a balance to the at least one performance constraint in an optimization of the NoC to identify that at least one gate of the NoC can be placeable into at least a portion of the physical floorplan.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes receiving a power requirement for the NoC.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes performing a balance to the at least one timing requirement in an optimization of the NoC to identify that the NoC will not exceed the power requirement specified in the at least one timing requirement.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes performing a balance to the at least one performance constraint in an optimization of the NoC to identify that the NoC will not exceed the power requirement specified in the at least one performance constraint. In accordance with some aspects and embodiments of the invention, a method is disclosed that includes providing physical routing congestion data.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes performing a balance to the at least one timing requirement in an optimization of the NoC to identify that the NoC can be routable into at least a portion of the physical floorplan.
In accordance with some aspects and embodiments of the invention, a method is disclosed that includes performing a balance to the at least one performance constraint in an optimization of the NoC to identify that at least one gate of the NoC can be routable into at least a portion of the physical floorplan.
Certain methods according to the various aspects of the invention may be performed by instructions that are stored upon a non-transitory computer readable medium. The non-transitory computer readable medium stores code including instructions that, if executed by one or more processors, would cause a system or computer to perform steps of the method described herein. The non-transitory computer readable medium includes: a rotating magnetic disk, a rotating optical disk, a flash random access memory (RAM) chip, and other mechanically moving or solid-state storage media. Any type of computer-readable medium is appropriate for storing code comprising instructions according to various examples.
Certain examples have been described herein and it will be noted that different combinations of different components from different examples may be possible. Salient features are presented to better explain examples; however, it is clear that certain features may be added, modified and/or omitted without modifying the functional aspects of these examples as described.
Various examples are methods that use the behavior of either or a combination of machines. Method examples are complete wherever in the world most constituent steps occur. For example and in accordance with the various aspects and embodiments of the invention, IP elements or units include: processors (e.g., CPUs or GPUs), RAM (e.g., off-chip dynamic RAM or DRAM), and a network interface for wired or wireless connections such as Ethernet, Wi-Fi, 3G, 4G long-term evolution (LTE), 5G, and other wireless interface standard radios. The IP may also include various I/O interface devices, as needed for different peripheral devices such as touch screen sensors, geolocation receivers, microphones, speakers, Bluetooth peripherals, and USB devices, such as keyboards and mice, among others. By executing instructions stored in RAM devices, processors perform steps of methods as described herein.
Some examples are one or more non-transitory computer readable media arranged to store such instructions for methods described herein. Whatever machine holds non-transitory computer readable media comprising any of the necessary code may implement an example. Some examples may be implemented as: physical devices such as semiconductor chips; hardware description language representations of the logical or functional behavior of such devices; and one or more non-transitory computer readable media arranged to store such hardware description language representations. Descriptions herein reciting principles, aspects, and embodiments encompass both structural and functional equivalents thereof. Elements described herein as coupled have an effectual relationship realizable by a direct connection or indirectly with one or more other intervening elements.
Practitioners skilled in the art will recognize many modifications and variations. The modifications and variations include any relevant combination of the disclosed features. Elements described herein as “coupled” or “communicatively coupled” have an effectual relationship realizable by a direct connection or indirect connection, which uses one or more other intervening elements. Embodiments described herein as “communicating” or “in communication with” another device, module, or elements include any form of communication or link and include an effectual relationship. For example, a communication link may be established using a wired connection, wireless protocols, near-field protocols, or RFID.
To the extent that the terms “including”, “includes”, “having”, “has”, “with”, or variants thereof are used in either the detailed description or the claims, such terms are intended to be inclusive in a similar manner to the term “comprising.”
The scope of the invention, therefore, is not intended to be limited to the exemplary embodiments shown and described herein. Rather, the scope and spirit of the present invention are embodied by the appended claims.