Integrated circuit (IC) development, involving design and fabrication, can be complicated and time consuming. The process can be particularly challenging with specialized integrated circuits, such as application-specific integrated circuits (ASICs) or system-on-chip (SoC) devices having many on-chip components such as transistors. There are a variety of approaches that have been employed to design such devices. In some approaches, a standard cell library may be used when designing the IC. The library contains a set of cell structures that may comprise transistors and interconnections between them, in which the cell structures perform specific functions such as a Boolean logic function or a state or storage function. Each cell is pre-characterized, and can be placed and routed at the transistor level. If a function to synthesize is not directly implementable in one cell, a combination of cells may be used to achieve it.
As part of the design process, technology mapping is used to express the Boolean logic functions associated with a netlist as an arrangement of elements selected from the standard cell library. This can be done to achieve an objective such as minimizing the total area or minimizing signal delay. However, the overall process may be challenging when it is not clear which cells should be used to design a circuit. In addition, existing approaches can be inefficient when going from logic gate abstraction to standard cell mapping, both in terms of the number of transistors required as well as the physical size of the circuit (e.g., according to poly pitch or other physical size factor).
Aspects of the technology involve transistor-level synthesis that can achieve significant benefits with integrated circuit design and fabrication. This includes novel optimization algorithms to reduce the literal count in combinational logic such that the circuit area after technology mapping to standard-cells can be improved. This may involve mapping the entire design directly to the transistor level instead of to a set of standard cells. The technical benefits include using fewer transistors than a conventional standard cell approach, a resultant smaller integrated circuit area, as well as reduced power consumption by the integrated circuit.
According to one aspect of the technology, a computer-implemented method is provided to perform transistor-level synthesis for an integrated circuit element. The method comprises: generating, by one or more processors of a computer system, single-stage transistor networks from Boolean functions, wherein each single-stage transistor network is composed of a pulldown network and a pullup network; scaling, by the one or more processors, the single-stage transistor networks to multi-stage transistor networks to globally optimize for factored form literals; and performing, by the one or more processors, technology mapping based on the factored form literals to generate a circuit design.
In an example, generating the single-stage transistor networks includes representing a function to be performed by the integrated circuit element as a sum-of-products (SOP), and finding a factorization that minimizes a number of the factored form literals. Here, finding the factorization may include performing one of algebraic or Boolean factoring. The Boolean factoring may generate a solution represented as an AND-OR graph, in which factored forms are generated for both the function to be performed and a complement of the function to be performed. Alternatively or additionally, finding the factorization may include creating an AND-OR graph for each transistor topology corresponding to the factored form literals.
Alternatively or additionally to the above, generating the single-stage transistor networks may comprise generating an irredundant sum-of-products (ISOP) from a truth table. Alternatively or additionally to the above, scaling the single-stage transistor networks to multi-stage transistor networks to globally optimize for factored form literals may include And-inverter graph (AIG) rewriting for the factored form literals. The AIG rewriting may include replacing a part of a circuit component using one or more precomputed smaller structures that are smaller than the circuit component. The AIG may use size as a cost function to limit a number of AIG nodes.
Alternatively or additionally to the above, scaling the single-stage transistor networks to multi-stage transistor networks to globally optimize for factored form literals may include And-inverter graph (AIG) resubstitution for the factored form literals. Alternatively or additionally to the above, scaling the single-stage transistor networks to multi-stage transistor networks to globally optimize for factored form literals may include performing refactoring. The refactoring may include rewriting maximum fanout-free cones (MFFCs) with a new factored implementation when a number of gates decreases. Alternatively or additionally to the above, scaling the single-stage transistor networks to multi-stage transistor networks to globally optimize for factored form literals may include performing technology mapping driven by the factored form literals.
According to another aspect of the technology, a computing system is provided that comprises memory configured to store integrated circuit information, and one or more processors operatively coupled to the memory. The one or more processors are configured to: generate single-stage transistor networks from Boolean functions, wherein each single-stage transistor network is composed of a pulldown network and a pullup network; scale the single-stage transistor networks to multi-stage transistor networks to globally optimize for factored form literals; and perform technology mapping based on the factored form literals to generate a circuit design. The one or more processors may be further configured to store the circuit design in the memory.
Generation of the single-stage transistor networks may include: representing a function to be performed by an integrated circuit element as a sum-of-products (SOP); and finding a factorization that minimizes a number of the factored form literals. Generation of the single-stage transistor networks may comprise generation of an irredundant sum-of-products (ISOP) from a truth table. The single-stage transistor networks may be scaled to multi-stage transistor networks to globally optimize for factored form literals by performance of And-inverter graph (AIG) rewriting for the factored form literals. The single-stage transistor networks may be scaled to multi-stage transistor networks to globally optimize for factored form literals by performance of And-inverter graph (AIG) resubstitution for the factored form literals. Alternatively or additionally, the single-stage transistor networks may be scaled to multi-stage transistor networks to globally optimize for factored form literals by performance of refactoring.
The process flow continues with performing functional design and logic design at block 106, and performing circuit design at block 108. Functional design may include refinement of the design's specification to achieve the functional behavior of the desired system. Logic design involves adding the design's structure to a behavioral representation of the desired design. Here, considerations include logic minimization, performance enhancement, as well as testability. This stage may consider problems associated with test vector generation, error detection and correction, and the like. By way of example, the functional design and logic design may include generating a behavioral model description (e.g., using HDL) and floor-planning. During circuit design, logic blocks are replaced by corresponding electronic circuits, which may include devices such as transistors. At this stage, circuit simulation may be performed in order to verify timing behavior and other constraints of the system. A Spice tool or other program may be used for circuit simulation.
Once the circuit design is complete, physical design may be performed at block 110 (e.g., component and wiring placement and routing), followed by physical verification and sign-off at block 112 (e.g., to obtain GDSII information with shapes to form the masks used to create the layers for fabricating the integrated circuit). During physical design, the actual layout of the integrated circuit is performed. Here, all of the components are placed and interconnected using metal interconnections. A circuit design that is able to pass testing of a circuit simulator in the circuit design stage may be found to be faulty after it has been packaged, e.g., due to geometric design rule issues. Thus, physical design rules are followed to ensure correctness during chip fabrication. Errors such as short or open circuits, open channels, or other issues may result when physical design rules are not followed. During physical verification and sign-off, the system performs any verification steps that are required before chip manufacturing. This can include design rule checking and correction, timing simulation, electromagnetic simulation, etc.
Layout post-processing occurs at block 114, then fabrication at block 116, and then packaging and testing at block 118. At block 114, the layout post-processing may include geometry processing before actual manufacturing, e.g., any dummy fill insertion, correction for optical proximity, mask optimization, etc. Fabrication comprises semiconductor manufacturing, which includes stages such as lithography patterning (masking), baking or annealing, etching, etc. Then the raw die of the chip is inserted into a package and I/O pins are connected to the package at block 118. Testing of the chip also occurs at this stage.
As shown, in the circuit design phase of block 108, the process may involve technology-independent synthesis at block 120. This step involves transferring the circuit definitions, such as register-transfer-level (RTL) descriptions, into generic data structures such as And-inverter graph (AIG), and optimizing the circuit in terms of nodes and levels. At block 122, technology mapping is performed based on information from a standard cell library 124. This step involves mapping the generic optimized AIG descriptions into real, manufacturable standard cells included in the standard cell library. From this, technology-dependent synthesis is then performed at block 126. This step further optimizes the circuit defined in the gate-level netlist in terms of power, performance and area, using standard-cell-based definitions from block 122.
One example of a system for performing circuit design is shown in
By way of example, the one or more processors may be any conventional processors, such as commercially available central processing units (CPUs), graphics processing units (GPUs) or tensor processing units (TPUs). Alternatively, the one or more processors may include a dedicated device such as an ASIC or other hardware-based processor. As shown in
The instructions may be any set of instructions to be executed directly (such as machine code) or indirectly (such as scripts) by the processor. For example, the instructions may be stored as computing device code on the computing device-readable medium. In that regard, the terms “instructions” and “programs” may be used interchangeably herein. The instructions may be stored in object code format for direct processing by the processor, or in any other computing device language including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance.
The data may be retrieved, stored or modified by the processor in accordance with the instructions. For instance, although the claimed subject matter is not limited by any particular data structure, the data may be stored in computing device registers, in a relational database as a table having a plurality of different fields and records, XML documents or flat files, HDL information, GDSII information, etc. The data may also be formatted in any computing device-readable format.
The computing devices may include all of the components normally used in connection with a computing device such as the processor and memory described above as well as a user interface having one or more user inputs (e.g., one or more of a button, mouse, keyboard, touch screen, gesture input and/or microphone), various electronic displays (e.g., a monitor having a screen or any other electrical device that is operable to display information), and speakers. The computing devices may also include a communication system having one or more wired or wireless connections to facilitate communication with other computing devices of system 200 and/or the fabrication facility 212.
The various computing devices may communicate directly or indirectly via one or more networks, such as network 210. The network 210 and any intervening nodes may include various configurations and protocols including short range communication protocols such as Bluetooth™, Bluetooth LE™, the Internet, World Wide Web, intranets, virtual private networks, wide area networks, local networks, private networks using communication protocols proprietary to one or more companies, Ethernet, WiFi and HTTP, and various combinations of the foregoing. Such communication may be facilitated by any device capable of transmitting data to and from other computing devices, such as modems and wireless interfaces.
In one example, computing device 202 may include one or more server computing devices having a plurality of computing devices, e.g., a load balanced server farm or cloud computing architecture, which exchange information with different nodes of a network for the purpose of receiving, processing, and transmitting the data to and from other computing devices. For instance, computing device 202 may include one or more server computing devices that are capable of communicating with computing devices 204, 206 and the fabrication facility 212 via the network 210. In some examples, client computing device 204 may be an engineering workstation used by a developer to perform circuit design and/or other processes for integrated circuit design and fabrication. Client computing device 206 may also be used by a developer, for instance to prepare system requirements for the integrated circuit or manage the manufacturing process with the fabrication facility 212.
Storage system 208 can be of any type of computerized storage capable of storing information accessible by the server computing devices 202, 204 and/or 206, such as a hard-drive, memory card, ROM, RAM, DVD, CD-ROM, flash drive and/or tape drive. In addition, storage system 208 may include a distributed storage system where data is stored on a plurality of different storage devices which may be physically located at the same or different geographic locations. Storage system 208 may be connected to the computing devices via the network 210 as shown in
Storage system 208 may store various types of information. For instance, the storage system 208 may store a standard cell library and transistor-level netlists. It may also maintain functions for logic optimization and transistor-level synthesis, as well as for performing technology mapping and other processes described herein.
Factored form is a powerful multi-level representation of a Boolean function that readily translates into an implementation of the function in CMOS technology. In particular, the number of literals of a factored form correlates well with the number of transistors in the CMOS implementation. Aspects of the technology involve developing novel methods for minimizing the total number of factored form literals needed to represent combinational logic given as an and-inverter graph (AIG). The methods lead to reduced literal counts, compared to the traditional methods focusing on minimizing the number of AIG nodes. Experiments show that applying these methods helps reduce area after technology mapping by an additional 2.6% on average. Deploying these methods as part of an industrial standard-cell design flow may be able to dramatically reduce design costs and power consumption. Additionally, this work enables efficient transistor-level synthesis with application in design automation.
Logic representations are the key to represent the functionality in EDA tools. They are fundamental for efficiently storing data in memory and running optimization algorithms on the logic. Sum-of-products (SOP) is a basic representation of Boolean logic. A powerful extension of SOPs to a multi-level representation is the factored form. An example is shown below:
A multi-level circuit may be represented as a directed acyclic graph (DAG) where each node is a gate, a primary input, or a primary output. Multi-level circuits represent the majority of the circuits in ASICs and FPGAs and tend to be smaller, more power efficient, and faster compared to their two-level counterparts. Functionality in DAGs is usually expressed using few primitives such that DAGs are easy to manipulate and have a small memory footprint. The most common multi-level representation is the AIG, where nodes act as two-input ANDs.
Logic optimization is a key element that enables designing efficient circuits. The most common and powerful optimization algorithms perform resubstitution, rewriting, refactoring, and balancing. Optimization scripts may comprise a combination of those algorithms.
One application of AIGs in synthesis is the representation of DAGs coming from Boolean decomposition and factored forms. In particular, factored forms can be represented as syntax trees where nodes are AND or OR operations, and leaves are literals (variables complemented or not complemented). Thus, factored forms can be directly represented by an AIG by translating ANDs to ANDs, and ORs to ANDs using De Morgan's law.
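By way of illustration only, the translation of a factored form syntax tree into an AIG may be sketched in Python as follows. The class name Aig, the literal encoding (twice the node index plus a complement bit), and the tuple-based expression format are assumptions made for this sketch and are not taken from any particular tool.

```python
from itertools import product


class Aig:
    """Tiny AIG. Inputs are nodes 1..n; a literal is 2*node + complement bit."""

    def __init__(self, inputs):
        self.inputs = list(inputs)
        self.ands = {}  # AND node id -> (literal0, literal1)
        self.next_id = len(self.inputs) + 1

    def lit(self, name, negated=False):
        return 2 * (self.inputs.index(name) + 1) + int(negated)

    def and_(self, a, b):
        node = self.next_id
        self.next_id += 1
        self.ands[node] = (a, b)
        return 2 * node

    def or_(self, a, b):
        # De Morgan: a OR b == NOT(NOT a AND NOT b); "^ 1" flips the complement bit.
        return self.and_(a ^ 1, b ^ 1) ^ 1

    def evaluate(self, literal, values):
        node, negated = literal >> 1, literal & 1
        if node in self.ands:
            a, b = self.ands[node]
            value = self.evaluate(a, values) and self.evaluate(b, values)
        else:
            value = values[self.inputs[node - 1]]
        return bool(value) != bool(negated)


def build(aig, expr):
    """expr is ("lit", name, negated) or ("and"/"or", child, child, ...)."""
    if expr[0] == "lit":
        return aig.lit(expr[1], expr[2])
    combine = aig.and_ if expr[0] == "and" else aig.or_
    result = build(aig, expr[1])
    for child in expr[2:]:
        result = combine(result, build(aig, child))
    return result


# Hypothetical factored form: a * (b + !c)
expr = ("and", ("lit", "a", False), ("or", ("lit", "b", False), ("lit", "c", True)))
aig = Aig(["a", "b", "c"])
root = build(aig, expr)

# Exhaustively compare the AIG against the original expression.
matches = all(
    aig.evaluate(root, dict(zip("abc", bits))) == (bits[0] and (bits[1] or not bits[2]))
    for bits in product([False, True], repeat=3)
)
```

In this sketch the OR node costs one AND node plus complemented edges, so the three-literal factored form yields two AND nodes.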
As an example,
Factored form optimization methods for literals reduction and results after technology mapping have been explored. The following Boolean resynthesis script was created, called compress2ff, which comprises the following commands:
For technology-independent synthesis, this approach was compared to compress2rs, which is a well-known default script in ABC for size optimization. For technology mapping, the best area-oriented mapper in ABC, called amap, was used. Since this mapper supports AIGs with at most 4000 logic levels, the approach employs the mapper nf for area using the command &nf -R 1000 for designs with higher depths. The designs were mapped to a technology library for a 3 nm node. Table I in
The literal count in factored forms is a known proxy for transistor count in CMOS transistor networks. Transistor count is a fundamental measure that strongly correlates with area. Even if transistor count alone does not capture other important factors affecting area and power such as transistor ordering, placement, and routing, it is one of the best indicators. In particular, factored forms describe the serial-parallel connection of transistors. A serial connection is described by an AND operator, while a connection in parallel is described by an OR operator. This relation allows a developer to generate CMOS transistor networks from factored forms. Since the pulldown and pullup networks in CMOS are complementary, two factored forms are needed, one the dual of the other.
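The relation between factored forms, their duals, and transistor counts may be sketched as follows. This is a minimal illustration under stated assumptions: expressions are nested tuples with string leaves, the example function is hypothetical, and the gate output is taken to be the complement of the pulldown function, as in a standard static CMOS gate.

```python
from itertools import product


def literal_count(expr):
    """A leaf (string) is one literal, i.e. one transistor in the network."""
    if isinstance(expr, str):
        return 1
    return sum(literal_count(child) for child in expr[1:])


def dual(expr):
    """Swap AND (series) and OR (parallel) while keeping the literals."""
    if isinstance(expr, str):
        return expr
    swapped = "or" if expr[0] == "and" else "and"
    return (swapped,) + tuple(dual(child) for child in expr[1:])


def conducts(expr, on):
    """True if the series-parallel network has a closed path."""
    if isinstance(expr, str):
        return on[expr]
    paths = [conducts(child, on) for child in expr[1:]]
    return all(paths) if expr[0] == "and" else any(paths)


f = ("and", "a", ("or", "b", "c"))  # pulldown network; gate output is NOT f
pullup = dual(f)                     # a in parallel with (b in series with c)
transistors = literal_count(f) + literal_count(pullup)

# For every input pattern exactly one of the two networks should conduct.
complementary = True
for bits in product([False, True], repeat=3):
    values = dict(zip("abc", bits))
    nmos_on = values                                  # NMOS conducts on a 1
    pmos_on = {k: not v for k, v in values.items()}   # PMOS conducts on a 0
    if conducts(f, nmos_on) == conducts(pullup, pmos_on):
        complementary = False
```

Here three literals in each factored form give six transistors in total, and the exhaustive check confirms that the two networks never conduct at the same time.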
Since an AIG could contain many factored forms, it naturally describes the connection of transistors in a multi-stage network. Using this relation, one can extract a transistor-level network after mapping each factored form into CMOS using technology mapping, the natural translation of factored forms, or other methods. This property opens the door to transistor-level synthesis, which offers flexibility in functionality that is not restricted by standard cell libraries, and a compact layout thanks to transistor placement opportunities. Hence, factored form literal optimization plays an important role in reshaping combinational circuits modeled as AIGs to globally minimize the number of transistors.
Generating Single-Stage Transistor Networks from Boolean Functions
According to one aspect of the technology, given a Boolean function, the goal is to find a transistor level netlist that implements the function. The method supports CMOS using only parallel-series connections of transistors. Serial connections can be interpreted as ANDs while parallel connections can be interpreted as ORs.
Single-stage transistor-level networks are composed of two main blocks called the pulldown and pullup networks. The pullup network is connected between VDD and the output, is composed of PMOS transistors, and is responsible for bringing a high (‘1’) state to the output. Conversely, the pulldown network is connected between the output and ground (VSS), is composed of NMOS transistors, and is responsible for bringing a low (‘0’) state to the output. The functions of the pulldown and pullup networks are designed such that when one network behaves as a short circuit, the other behaves as an open circuit. This relation is called duality, i.e., one function can be derived from the other by negating inputs and outputs.
This method starts by representing the function as a sum-of-products (SOP). An SOP contains a disjunction (OR) of terms (ANDs of literals). An SOP can be directly translated into a transistor-level network by transforming ANDs into serial connections and ORs into parallel connections, with each literal becoming a transistor. To reduce the number of transistors, it is important to find common expressions that can be shared. At this point, a goal is to find a factorization that minimizes the number of literals.
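As a simplified illustration of how sharing common expressions reduces the literal count, the following sketch factors an SOP by repeatedly dividing by its most frequent literal. This is a basic algebraic heuristic chosen for brevity and is not the sop_factoring or transistor_graph algorithm described herein. It assumes a non-empty SOP in which no cube is empty and no cube contains another; complemented literals would be written as distinct strings such as "!a".

```python
from collections import Counter


def sop_literals(cubes):
    """Literal count of the flat SOP: one per literal occurrence."""
    return sum(len(cube) for cube in cubes)


def literal_count(expr):
    """Literal count of a factored form given as nested tuples."""
    if isinstance(expr, str):
        return 1
    return sum(literal_count(child) for child in expr[1:])


def cube_expr(cube):
    lits = sorted(cube)
    return lits[0] if len(lits) == 1 else ("and",) + tuple(lits)


def factor(cubes):
    """Factor F as lit * Q + R using the literal shared by the most cubes."""
    cubes = [frozenset(cube) for cube in cubes]
    counts = Counter(lit for cube in cubes for lit in sorted(cube))
    lit, freq = counts.most_common(1)[0]
    if freq < 2:  # nothing can be shared: emit the SOP unchanged
        terms = [cube_expr(cube) for cube in cubes]
        return terms[0] if len(terms) == 1 else ("or",) + tuple(terms)
    quotient = [cube - {lit} for cube in cubes if lit in cube]
    remainder = [cube for cube in cubes if lit not in cube]
    shared = ("and", lit, factor(quotient))
    if not remainder:
        return shared
    return ("or", shared, factor(remainder))


# Hypothetical example: a*b + a*c + d  ->  a*(b + c) + d
cubes = [{"a", "b"}, {"a", "c"}, {"d"}]
factored = factor(cubes)
```

For this example the flat SOP has five literals while the factored form has four, which would correspond to one fewer transistor in each of the pulldown and pullup networks.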
To find a factored form there are mainly two methods: algebraic and Boolean. Algebraic methods are known to be fast but they cannot utilize some Boolean properties of the algebra. Boolean, instead, can exploit those opportunities at the cost of run time. Aspects of the technology implement both algebraic and Boolean algorithms, referred to herein as sop_factoring and transistor_graph.
Transistor_graph is a module that generates a single-stage transistor-level network starting from a truth table. The algorithm takes a truth table, generates an irredundant sum-of-products (ISOP), and factors it. The factored solution is represented as an AND-OR graph, where nodes can be ANDs or ORs and negations are allowed only for inputs and outputs. Factored forms are generated for both the target function and its complement. One will be assigned to the pullup network and one to the pulldown network. In one example, the algorithm for transistor_graph works as follows:
In one scenario, a Spice writer (or a writer for another analog electronic circuit simulator) can then take the generated AND-OR graph and write it to a .spi file or equivalent file. Moreover, additional routines may be used to report statistics of the transistor-level network and for validation.
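A minimal sketch of such a writer is given below, assuming the AND-OR graph is a series-parallel expression held as nested tuples. The subcircuit pin order, the internal node names, and the model names nmos and pmos are hypothetical placeholders, device sizes are omitted, and each literal is assumed to be the name of an available input net. Only the generic SPICE MOSFET element form (name, drain, gate, source, bulk, model) is relied upon.

```python
from itertools import count


def dual(expr):
    """Swap series (and) with parallel (or) to obtain the pullup topology."""
    if isinstance(expr, str):
        return expr
    swapped = "or" if expr[0] == "and" else "and"
    return (swapped,) + tuple(dual(child) for child in expr[1:])


def emit_network(expr, top, bottom, kind, lines, ids):
    """Append MOSFET lines realizing expr between nets top and bottom."""
    if isinstance(expr, str):
        if kind == "pmos":  # PMOS source faces the VDD side
            drain, source, bulk = bottom, top, "VDD"
        else:               # NMOS source faces the VSS side
            drain, source, bulk = top, bottom, "VSS"
        lines.append(f"M{next(ids)} {drain} {expr} {source} {bulk} {kind}")
    elif expr[0] == "or":   # parallel: every child spans the same two nets
        for child in expr[1:]:
            emit_network(child, top, bottom, kind, lines, ids)
    else:                   # series: chain children through fresh internal nets
        children = expr[1:]
        node = top
        for i, child in enumerate(children):
            last = i == len(children) - 1
            nxt = bottom if last else f"n{next(ids)}"
            emit_network(child, node, nxt, kind, lines, ids)
            node = nxt


def write_gate(name, inputs, f):
    """Return a subcircuit whose output is NOT f (static CMOS gate)."""
    ids = count(1)
    lines = [f".subckt {name} {' '.join(inputs)} out VDD VSS"]
    emit_network(f, "out", "VSS", "nmos", lines, ids)        # pulldown
    emit_network(dual(f), "VDD", "out", "pmos", lines, ids)  # pullup
    lines.append(f".ends {name}")
    return "\n".join(lines)


netlist = write_gate("gate0", ["a", "b", "c"], ("and", "a", ("or", "b", "c")))
device_lines = [line for line in netlist.splitlines() if line.startswith("M")]
```

The resulting text could then be saved to a .spi file; the same traversal could also collect statistics such as the transistor count.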
Another method can be used to generate all the transistor topologies that depend on how transistors in series are connected. The method creates one AND-OR graph for each one of the configurations.
The following is an example command or other function to implement such features:
Generating Multi-Stage Transistor Networks from Boolean Functions
The intuition behind a method used to scale the single-stage transistor network to multi-stage comes from noticing that realizing a multi-stage transistor network is equivalent to mapping a circuit into single-stage networks that are connected together. This problem is closely tied to logic synthesis, in that it involves both logic synthesis and technology mapping.
The network is initially described as an AIG. According to one aspect, it is beneficial to optimize the network such that the result after mapping has as few transistors as possible. The main concept here is to globally optimize for factored literals. It is important to realize that AIGs contain factored forms. Factored forms are composed of AND and OR nodes that have single fanout and no complementation. AIGs contain factored forms for all the logic cones that do not have multiple fanouts. If an additional fanout is present, that node must be considered as a new literal. Complementations can be partially ignored since they can be redistributed to literal nodes by the use of De Morgan's law. Given an AIG, one can measure the number of factored form (FF) literals of the structure by: adding the internal fanout count of all the PIs (which are literals), adding the internal fanout count of all the nodes that have internal fanout count greater than one, and adding one for each remaining node (not counted before) that is a PO.
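The counting rule above may be sketched as follows. The data layout (a dictionary from AND node name to its two fanins, with complemented edges ignored) is an assumption for illustration, as is the reading of "not counted before" as "neither a PI nor a node with internal fanout greater than one".

```python
from collections import Counter


def ff_literal_count(pis, ands, pos):
    """Count factored form literals of an AIG.

    pis:  list of primary input names
    ands: dict mapping AND node name -> (fanin0, fanin1), complements ignored
    pos:  list of node names that drive primary outputs
    """
    # Internal fanout: how many AND nodes use a node as a fanin.
    fanout = Counter()
    for fanin0, fanin1 in ands.values():
        fanout[fanin0] += 1
        fanout[fanin1] += 1

    # Rule 1: every use of a primary input is a literal.
    total = sum(fanout[pi] for pi in pis)
    counted = set(pis)

    # Rule 2: a shared internal node acts as a literal once per fanout.
    for node in ands:
        if fanout[node] > 1:
            total += fanout[node]
            counted.add(node)

    # Rule 3: one more for each remaining node that drives an output.
    total += sum(1 for node in set(pos) if node not in counted)
    return total


# Hypothetical AIG: n1 = a & b is shared by n2 = n1 & c and n3 = n1 & !c.
pis = ["a", "b", "c"]
ands = {"n1": ("a", "b"), "n2": ("n1", "c"), "n3": ("n1", "c")}
pos = ["n2", "n3"]
literals = ff_literal_count(pis, ands, pos)
```

In this example the inputs contribute 1 + 1 + 2, the shared node n1 contributes 2, and the two output nodes contribute 1 each, for a total of 8.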
Factored form (FF) literals are well correlated with the number of gates in an AIG which is the most used cost function for logic synthesis nowadays. Nevertheless, to tackle this problem at the root, an optimization script can be used that optimizes specifically for FF literals. This involves: AIG rewriting for FF literals, AIG resubstitution for FF literals, refactoring, and technology mapping driven by FF literals. These aspects are discussed below.
AIG rewriting is a DAG-aware optimization method that aims at minimizing the number of AND nodes by replacing small parts of the circuit using precomputed smaller structures. The advantage of being DAG-aware is the ability to reuse existing logic and to exploit structural hashing. AIG rewriting has been implemented to use FF literal minimization as a new cost function rather than size. To keep the network from growing in AIG node count and taking on a poor shape for subsequent optimization steps, size may be used as a second cost function, i.e., if literals cannot be improved, better size is accepted.
AIG resubstitution aims at minimizing the number of AND gates in an AIG by trying to replace some nodes with fewer ones built from a set of divisors. The advantage of this method is the ability to exploit local don't cares during the optimization process. To speed up the algorithm, the divisors can be collected in a window around the node to replace. The method evaluates the resubstitution for one node at a time. The gain for a single node resub is evaluated by considering the number of nodes that would be removed if the resub is accepted, i.e., the nodes in the maximum fanout-free cone (MFFC). The new structure is built by combining the divisors until the right functionality is achieved and the number of gates generated is lower than the number in the MFFC. The new version of the algorithm works similarly but evaluates the gain by counting the number of FF literals before and after the resub.
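The MFFC used in the gain computation can be found with a common reference-counting technique, sketched below under the same assumed AIG layout (a dictionary from AND node name to its two fanins). The function name and the example network are hypothetical.

```python
from collections import Counter


def mffc(root, ands, pos):
    """Return the AND nodes that would become unused if root were removed."""
    # Reference count = uses as a fanin plus uses as a primary output.
    refs = Counter(pos)
    for fanin0, fanin1 in ands.values():
        refs[fanin0] += 1
        refs[fanin1] += 1

    cone = set()

    def dereference(node):
        cone.add(node)
        for fanin in ands[node]:
            if fanin in ands:  # primary inputs are never part of the cone
                refs[fanin] -= 1
                if refs[fanin] == 0:  # no other user left: it goes too
                    dereference(fanin)

    dereference(root)
    return cone


# Hypothetical AIG: n2 is shared by n3 and n4, n1 is used only by n3.
ands = {
    "n1": ("a", "b"),
    "n2": ("c", "d"),
    "n3": ("n1", "n2"),
    "n4": ("n2", "a"),
}
pos = ["n3", "n4"]
cone_n3 = mffc("n3", ands, pos)
cone_n4 = mffc("n4", ands, pos)
```

Replacing n3 would free n3 and n1 but not the shared node n2, so a node-count gain compares the size of the new structure against two nodes; a literal-based gain would instead compare FF literal counts before and after the replacement.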
For refactoring, maximum fanout-free cones (MFFCs) are rewritten with a new factored implementation if the number of gates decreases.
Optimization flow, according to one scenario, utilizes a script where each command optimizes according to the literal cost. The script may include the following:
A mapper, such as may be implemented at block 122 of
The following is an example approach for generating a Spice (or equivalent) netlist using a specified command. In one example, the command requires a library that is used to evaluate the functions' cost in terms of number of transistors. The command can be interfaced with ABC to perform the mapping.
An example usage of the command is: flex_map -mv -w res.spi 6_3_fin.spi design.v
This aspect of the technology is used to enumerate various transistor networks given a few fixed topologies. Each segment in the topologies represents a transistor, and each vertex is a wire connecting two or more transistors. Example topologies are the ones shown in
An algorithm to enumerate the transistor networks works as follows. Given a topology composed of m edges, assign each edge to a literal or a constant connection. A constant zero assignment represents a disconnected edge. A constant one assignment represents a wire connection with no transistors. Literals representing variables in the non-complemented and complemented polarity are used to find binate functions. Given n variables, 2+2*n literals are created. The enumeration problem involves assigning all the combinations of the literals to the m edges: (2+2*n)^m combinations. For each combination, the functionality is extracted by simulating the network. Since the network is fixed, the function can be obtained in O(1) time using truth tables (represented on a 64-bit unsigned integer). In one example, the algorithm deals with up to 6 variables and 9 edges for a total of ~2*10^11 combinations. This enumeration problem can be solved in a few minutes using existing computing resources.
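The enumeration may be sketched on a deliberately small case. The three-edge topology below (two edges in series, in parallel with a third) and the two-variable setting are assumptions chosen so the example runs instantly; truth tables are held in Python integers rather than 64-bit words.

```python
from itertools import product


def enumerate_topology(n_vars, n_edges, simulate):
    """Try every literal/constant assignment and keep the cheapest per function."""
    bits = 1 << n_vars
    mask = (1 << bits) - 1
    # Constants first: "0" is a disconnected edge, "1" is a plain wire.
    literals = [("0", 0), ("1", mask)]
    for i in range(n_vars):
        tt = sum(1 << m for m in range(bits) if (m >> i) & 1)
        name = chr(ord("a") + i)
        literals += [(name, tt), ("!" + name, tt ^ mask)]

    best = {}   # truth table -> (transistor count, edge assignment)
    tried = 0
    for combo in product(literals, repeat=n_edges):
        tried += 1
        function = simulate([tt for _, tt in combo]) & mask
        cost = sum(1 for name, _ in combo if name not in ("0", "1"))
        if function not in best or cost < best[function][0]:
            best[function] = (cost, tuple(name for name, _ in combo))
    return tried, best


def series_parallel(edges):
    """Fixed topology: (edge0 in series with edge1) in parallel with edge2."""
    return (edges[0] & edges[1]) | edges[2]


tried, best = enumerate_topology(n_vars=2, n_edges=3, simulate=series_parallel)
```

With n = 2 and m = 3 the loop visits (2+2*2)^3 = 216 assignments. The function a AND b (truth table 0b1000) is found with two transistors, while a XOR b (0b0110) cannot be realized by this topology and is therefore absent from the result.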
For each combination, a cost based on the number of transistors is computed. This corresponds to the number of non-constant literals (i.e., literals other than zero and one) used in the topology. Each topology-function pair with the minimum cost is added to a hash map. At the end, the found functions and costs (and their corresponding duals) are written in a genlib library file. The cost associated with each gate in the library need not consider input and output inverters. However, the method may add inverter costs, and filter and clean the library. Alternatively, the approach may only reduce the size of the library by including only the gates that are representative in their P-class (set of functions that are reachable by permuting the inputs).
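Selecting one representative per P-class may be sketched as follows. Choosing the numerically smallest truth table as the representative is an assumption made for this illustration; any fixed rule that picks one member of the class would serve.

```python
from itertools import permutations


def permute_inputs(tt, n_vars, perm):
    """Truth table obtained by renaming input i to input perm[i]."""
    out = 0
    for minterm in range(1 << n_vars):
        if (tt >> minterm) & 1:
            moved = 0
            for i in range(n_vars):
                if (minterm >> i) & 1:
                    moved |= 1 << perm[i]
            out |= 1 << moved
    return out


def p_class_representative(tt, n_vars):
    """Smallest truth table reachable by permuting the inputs."""
    return min(
        permute_inputs(tt, n_vars, perm)
        for perm in permutations(range(n_vars))
    )


# With a = 0b1010 and b = 0b1100: a & !b is 0b0010 and !a & b is 0b0100.
rep_a_not_b = p_class_representative(0b0010, 2)
rep_b_not_a = p_class_representative(0b0100, 2)
rep_a_and_b = p_class_representative(0b1000, 2)
```

The two functions a AND NOT b and NOT a AND b share a representative, so only one of them would need to be kept in the library, whereas a AND b falls in a different class.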
An optimization can prune a considerable part of the search space by taking into account that one need not consider literals of negative polarity if the corresponding variable is not used in the positive polarity. This is because one can normalize the space to enumerate just one topology in an N-class (set of functions that are reachable by applying input negations). The other missing functions with the same topology can be found by enumerating input negations. Nevertheless, to construct a transistor library, it is not necessary to consider input negations. This trick translates into approximately a 10× speedup in compute time. The following is an example command:
Although the technology herein has been described with reference to particular embodiments and configurations, it is to be understood that these embodiments and configurations are merely illustrative of the principles and applications of the present technology. It is therefore to be understood that numerous modifications may be made to the illustrative embodiments and configurations, and that other arrangements may be devised without departing from the spirit and scope of the present technology as defined by the appended claims.
This application claims the benefit of the filing date and priority to U.S. Provisional Patent Application No. 63/426,935, filed Nov. 21, 2022, the entire disclosure of which is incorporated herein by reference.
| Number | Date | Country |
|---|---|---|
| 63426935 | Nov 2022 | US |