The present disclosure relates generally to programmable logic devices. More particularly, the present disclosure relates to improving resource utilization for heterogeneous field programmable gate arrays (FPGAs).
This section is intended to introduce the reader to various aspects of art that may be related to various aspects of the present disclosure, which are described and/or claimed below. This discussion is believed to be helpful in providing the reader with background information to facilitate a better understanding of the various aspects of the present disclosure. Accordingly, it may be understood that these statements are to be read in this light, and not as admissions of prior art.
Programmable logic devices, a class of integrated circuits, may be programmed to perform a wide variety of operations. Some programmable devices (e.g., FPGAs) may include a heterogeneous architecture, and as such may include a heterogeneous combination of logic elements such as lookup tables (LUTs) and additional logic elements, such as AND-inverter cones, or any other appropriate logic element. A challenge that may arise in heterogeneous FPGA architectures is that combinational logic in a user design may be mapped to each of the different types of logic element. However, each chip may include limited numbers of each type of available logic element. In some instances, the combinational logic may be mapped to the different types of logic elements in an inefficient or suboptimal manner, which may result in excess die-area consumed, lower maximum clock frequency (fmax), and increased wire length.
Various aspects of this disclosure may be better understood upon reading the following detailed description and upon reference to the drawings in which:
One or more specific embodiments will be described below. In an effort to provide a concise description of these embodiments, not all features of an actual implementation are described in the specification. It should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another. Moreover, it should be appreciated that such a development effort might be complex and time consuming, but would nevertheless be a routine undertaking of design, fabrication, and manufacture for those of ordinary skill having the benefit of this disclosure.
When introducing elements of various embodiments of the present disclosure, the articles “a,” “an,” and “the” are intended to mean that there are one or more of the elements. The terms “comprising,” “including,” and “having” are intended to be inclusive and mean that there may be additional elements other than the listed elements. Additionally, it should be understood that references to “one embodiment” or “an embodiment” of the present disclosure are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features.
The present disclosure describes systems and techniques related to a compiler flow that supports heterogeneous field-programmable gate array (FPGA) architecture. In certain heterogeneous FPGA architectures, logic array blocks (LABs) may include logic elements such as lookup tables (LUTs) in addition to other programmable logic elements, such as AND-inverter cones (AICs). A benefit of programmable logic elements such as AICs is that an AIC may implement wider functions than LUTs. For example, AICs may have 16 inputs, 32 inputs, or greater.
The additional logic elements may share their inputs and outputs with other logic in a block. An advantage of the additional programmable logic elements is that some of the combinational logic in a design that would otherwise be mapped to LUTs may now be mapped to these additional logic elements. This allows for a more area-efficient use of the FPGA by implementing fewer LABs, which might otherwise consume excessive on-die area. Additionally, using the additional programmable logic may improve logic utilization and wire usage of the design, as well as improving routing and timing closure.
A challenge associated with the heterogeneous FPGA architecture stems from the fact that combinational logic in a user design may be mapped to any type of logic element on a chip, but in each area of the chip there is a limited number of each type of logic element available. Moreover, not every function may be mapped to both LUTs and AICs. Consequently, during physical synthesis the combinational logic may not be mapped to corresponding logic elements in an efficient or optimal manner for each given logic element type.
For example, in an architecture where each LAB includes 80% LUTs and 20% AICs, it is possible that a synthesis tool may not map all the logic efficiently. For instance, all logic may be mapped to LUTs if that would be the most efficient implementation for each individual logic cone. Yet while each such mapping may be efficient viewed in isolation, in the placed design there may be wasted AIC logic between the LUTs, and the design may map to a larger area on the chip than desired. This results in an area-inefficient mapping and may decrease a maximum clock frequency (fmax) and increase wire length. Moreover, even if the physical synthesis were to map 80% of the design's combinational logic to LUTs and 20% of the combinational logic to AICs, this may still result in an undesirable placement, as each part of the chip has the same ratio of LUTs and AICs, leading to an excess number of LUTs in one area and an excess number of AICs in another area. This undesirable placement may result in inefficient die area usage, lower fmax, and increased wire length.
Accordingly, it may be desirable to provide a compiler flow that supports heterogeneous FPGA architecture, taking advantage of a combination of LUTs and other logic elements (e.g., AICs) to improve resource utilization (e.g., die area, wire length) and improve fmax and compile time.
With the foregoing in mind,
In a configuration mode of the integrated circuit system 12, a designer may use an electronic device 13 (e.g., a computer) to implement high-level designs (e.g., a system user design) using design software 14, such as a version of INTEL® QUARTUS® by INTEL CORPORATION. The electronic device 13 may use the design software 14 and a compiler 16 to convert the high-level program into a lower-level description (e.g., a configuration program, a bitstream). The compiler 16 may provide machine-readable instructions representative of the high-level program to a host 18 and the integrated circuit system 12. The host 18 may receive a host program 22 that may control or be implemented by the kernel programs 20. To implement the host program 22, the host 18 may communicate instructions from the host program 22 to the integrated circuit system 12 via a communications link 24 that may include, for example, direct memory access (DMA) communications or peripheral component interconnect express (PCIe) communications. In some embodiments, the kernel programs 20 and the host 18 may configure programmable logic blocks 110 on the integrated circuit system 12. The programmable logic blocks 110 may include circuitry and/or other logic elements and may be configurable to implement a variety of functions in combination with digital signal processing (DSP) blocks 120.
The designer may use the design software 14 to generate and/or to specify a low-level program, such as the low-level hardware description languages described above. Further, in some embodiments, the system 10 may be implemented without a separate host program 22. Thus, embodiments described herein are intended to be illustrative and not limiting.
An illustrative embodiment of a programmable integrated circuit system 12 such as a programmable logic device (PLD) that may be configured to implement a circuit design is shown in
Programmable logic in the integrated circuit system 12 may contain programmable memory elements. Memory elements may be loaded with configuration data (also called programming data or configuration bitstream) using input-output elements (IOEs) 102. Once loaded, the memory elements each provide a corresponding static control signal that controls the operation of an associated functional block (e.g., LABs 110, DSP 120, RAM 130, or input-output elements 102).
In one scenario, the outputs of the loaded memory elements are applied to the gates of metal-oxide-semiconductor transistors in a functional block to turn certain transistors on or off and thereby configure the logic in the functional block including the routing paths. Programmable logic circuit elements that may be controlled in this way include parts of multiplexers (e.g., multiplexers used for forming routing paths in interconnect circuits), look-up tables, logic arrays, AND, OR, NAND, and NOR logic gates, pass gates, etc.
The memory elements may use any suitable volatile and/or non-volatile memory structures such as random-access-memory (RAM) cells, fuses, antifuses, programmable read-only-memory cells, mask-programmed and laser-programmed structures, combinations of these structures, etc. Because the memory elements are loaded with configuration data during programming, the memory elements are sometimes referred to as configuration memory, configuration random-access memory (CRAM), or programmable memory elements.
Programmable logic device (PLD) 100 may be configured to implement a custom circuit design. For example, the configuration RAM may be programmed such that LABs 110, DSP 120, and RAM 130, programmable interconnect circuitry (i.e., vertical channels 140 and horizontal channels 150), and the input-output elements 102 form the circuit design implementation.
In addition, the programmable logic device may have input-output elements (IOEs) 102 for driving signals off the integrated circuit system 12 and for receiving signals from other devices. Input-output elements 102 may include parallel input-output circuitry, serial data transceiver circuitry, differential receiver and transmitter circuitry, or other circuitry used to connect one integrated circuit to another integrated circuit.
The integrated circuit system 12 may also include programmable interconnect circuitry in the form of vertical routing channels 140 (i.e., interconnects formed along a vertical axis of the integrated circuit system 12) and horizontal routing channels 150 (i.e., interconnects formed along a horizontal axis of the integrated circuit system 12), each routing channel including at least one track to route at least one wire. If desired, the interconnect circuitry may include pipeline elements, and the contents stored in these pipeline elements may be accessed during operation. For example, a programming circuit may provide read and write access to a pipeline element.
Note that other routing topologies, besides the topology of the interconnect circuitry depicted in
The compiler flow 200 may include a synthesis portion 202 that includes a register transfer level (RTL) elaboration block 204, a netlist optimization block 206, and a technology mapping block 208. The compiler flow 200 may include a placing and routing portion 210 that includes a periphery planning block 212, a global placement block 214, a clustering block 216, a detailed placement block 218, and a routing block 220. The compiler flow 200 may begin by receiving a user design file 222 (e.g., an RTL file) and applicable settings 224, and may perform a sign-off timing analysis 226 after the final routing stage in the routing block 220. Each of the blocks and functions in the compiler flow 200 may be performed by the compiler 16, the integrated circuit system 12, or one or more components or algorithms thereof.
In the RTL elaboration block 204, the compiler 16 may convert the user design file 222 to a netlist representation with operators and logic gates. In the netlist optimization block 206, a netlist optimization algorithm may optimize the netlist for area and maximum frequency (fmax). The compiler may infer blocks (e.g., RAM blocks, DSP blocks, carry chain blocks, and so on) that may be mapped efficiently to the FPGA. The netlist optimization algorithm may be made aware of any available logic elements, such as the LUTs and the AICs. Awareness of the various logic elements may affect the inferences of the netlist optimization algorithm. For example, in an architecture with LUTs and AICs, wide decoder logic may not be directly mapped to LUTs because the wide decoder logic may be more efficiently mapped to AICs instead.
In the technology mapping block 208, a technology mapping algorithm may map remaining combinational logic to the logic element primitives that are present in the FPGA. The technology mapping block 208 may map combinational logic to both LUTs and new logic elements (e.g., AICs). A technology mapping algorithm of the technology mapping block may be cut-based, similar to other LUT mapping algorithms. For each node in a netlist, the technology mapping algorithm computes cuts for both LUTs and AICs. Each of these cuts has an associated area cost and delay. In cases where the LUTs are more prevalent in an FPGA, the technology mapping algorithm may favor LUT cuts for functions that can be implemented as both LUTs and AICs. The cuts that can be implemented as AICs may have a lower area and/or delay cost than the cuts that cannot be implemented as AICs. Given these cuts, the technology mapping algorithm may iteratively choose a desired cut for each node to reduce the area consumed and delay. As a result, a mapping may be selected where AICs are only used if implementing AICs will clearly reduce area and delay. LUTs may be used in other areas, with an advantage given to LUTs that may also be mapped to AICs. If a final mapping includes more AICs than may be implemented on the device, a second iteration may be performed via the technology mapping algorithm that changes the cost of the AIC cuts. Additionally or alternatively, a post-processing step may remap some of the AICs to LUTs.
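The cut selection described above may be sketched as follows. All names (`Cut`, `choose_cut`) and the weighted area/delay cost model are illustrative assumptions for explanation only, not an implementation taken from this disclosure:

```python
# Illustrative sketch of cut-based selection; the cost model and the
# tie-break toward LUTs are assumptions, not taken from the disclosure.
from dataclasses import dataclass

@dataclass
class Cut:
    kind: str      # "LUT" or "AIC"
    area: float    # estimated area cost of implementing this cut
    delay: float   # estimated delay cost of implementing this cut

def choose_cut(cuts, area_weight=1.0, delay_weight=1.0):
    """Pick the cut with the lowest weighted area/delay cost.

    Ties favor LUTs, mirroring a preference for the more prevalent
    logic element when costs are otherwise equal.
    """
    def cost(cut):
        # Secondary sort key (0 before 1) breaks ties in favor of LUTs.
        return (area_weight * cut.area + delay_weight * cut.delay,
                0 if cut.kind == "LUT" else 1)
    return min(cuts, key=cost)

# A node whose AIC cut is clearly cheaper maps to an AIC...
assert choose_cut([Cut("LUT", 4.0, 2.0), Cut("AIC", 2.0, 2.0)]).kind == "AIC"
# ...while equal costs fall back to the LUT implementation.
assert choose_cut([Cut("LUT", 3.0, 2.0), Cut("AIC", 3.0, 2.0)]).kind == "LUT"
```

Raising the weights applied to AIC cut costs in a second iteration, as described above, would steer `choose_cut` back toward LUTs when the first mapping uses too many AICs.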
Moving to the placing and routing portion 210 of the compiler flow 200, the periphery planning block 212, which places ports and other periphery elements of the user design 222, may be unaffected by the heterogeneous logic elements. The global placement block 214 may determine approximate locations for core elements of the user design 222. The global placement block 214 (e.g., a global placer algorithm of the global placement block 214) may account for resources available at each location on the FPGA to generally determine placements that may decrease wire length and increase fmax. As will be discussed in greater detail later, before global placement, the netlist includes three categories of logic elements: LUTs that may only be implemented as LUTs (e.g., a 6-input XOR gate); AICs that may only be implemented as an AIC (e.g., an 8-input AND gate); and LUTs that may be implemented as both LUTs and AICs (e.g., a 4-input OR gate). This last category may be referred to herein as an AIC candidate, as it is a LUT that may be selected for implementation as an AIC.
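The three-way classification above may be expressed as a small helper. The function name and boolean predicates are hypothetical stand-ins for the real feasibility checks a mapper would perform:

```python
# Hypothetical sketch of the three netlist categories described above;
# the two boolean inputs stand in for real feasibility analysis.
def classify(can_be_lut: bool, can_be_aic: bool) -> str:
    if can_be_lut and can_be_aic:
        return "AIC_CANDIDATE"   # e.g., a 4-input OR gate
    if can_be_aic:
        return "AIC_ONLY"        # e.g., an 8-input AND gate
    return "LUT_ONLY"            # e.g., a 6-input XOR gate

assert classify(True, False) == "LUT_ONLY"
assert classify(False, True) == "AIC_ONLY"
assert classify(True, True) == "AIC_CANDIDATE"
```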
Returning to
In the clustering block 216, an implementation may be selected for each candidate AIC as either a LUT or an AIC. The clustering algorithm of the clustering block 216 may check the “legality” of implementation and structure. For example, the clustering algorithm may check whether a set of logic elements and registers can fit into a LAB 110 without violating a relevant specification. The clustering algorithm understands that candidate AICs may be implemented as either LUTs or AICs; thus, implementing a candidate AIC as a LUT or an AIC may not violate a given specification on its own.
Area-efficient clusters may have a combination of LUT and AIC logic. This allows more logic to be packed into a cluster, which may reduce the number of clusters. The clustering algorithm may favor clusters that have an efficient combination of LUTs and AICs. The clustering algorithm may also account for timing. A delay through an AIC may in some instances be slower than the delay through LUT inputs, which means it may be better to map timing-critical candidates to LUTs instead of AICs. However, logic that is not timing-critical may be mapped to AICs, as in some instances AICs consume less die-area. Clustering also accounts for routability. As the AICs added to the LAB 110 may allow more logic to be implemented in the LAB 110, there may be a risk of having too many unique inputs coming into the LAB 110, which may cause routing issues. The clustering algorithm limits the number of inputs into the LAB to ensure efficient routing.
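The two clustering decisions above — respecting a LAB input limit and steering timing-critical candidates toward LUTs — may be sketched as follows. The function names, the input-limit value, and the slot bookkeeping are illustrative assumptions:

```python
# Illustrative sketch of two clustering checks described above; the
# max_unique_inputs value and function names are assumptions.
def can_add_to_cluster(cluster_inputs: set, elem_inputs: set,
                       max_unique_inputs: int = 40) -> bool:
    """Check the LAB's unique-input limit before packing another element."""
    return len(cluster_inputs | elem_inputs) <= max_unique_inputs

def pick_implementation(timing_critical: bool, aic_slots_free: bool) -> str:
    """Timing-critical candidates keep the (faster) LUT implementation;
    non-critical logic goes to AICs when slots remain, saving die area."""
    if timing_critical or not aic_slots_free:
        return "LUT"
    return "AIC"

# Shared inputs count once toward the limit.
assert can_add_to_cluster({1, 2, 3}, {3, 4}, max_unique_inputs=4)
assert not can_add_to_cluster({1, 2, 3}, {4, 5}, max_unique_inputs=4)
assert pick_implementation(timing_critical=True, aic_slots_free=True) == "LUT"
assert pick_implementation(timing_critical=False, aic_slots_free=True) == "AIC"
```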
At the end of the clustering block 216, all AIC candidates may be mapped either to a LUT or an AIC. However, the clustering algorithm may continue to track which LUTs and/or AICs are candidates that may be mapped to another logic type. This provides flexibility in how these candidates are mapped.
In the detailed placement block 218, legal locations are found for each LAB 110. In the detailed placement block 218, individual elements are moved around between locations on the chip to get a better placement (in terms of fmax, wire length, wire use, and so on). The detailed placement block 218 may be adapted for the heterogeneous architecture such that it may switch an AIC candidate from a LUT to an AIC or vice versa based on a determination of the placement, routability, or timing. For example, AIC candidates that are not timing-critical may be implemented as AICs, which may free up LUTs for candidates that are more timing-critical.
For the purposes of a heterogeneous architecture, the routing block 220 may be made aware of the multiple types of logic elements. For example, for a LUT, the inputs may be freely rotated without making the LUT illegal (although the LUT function may be adjusted). However, for an AIC, it is not always possible to freely rotate inputs.
It may be advantageous to enable user control over some or all of the processes of the compiler flow 200. For example, primitives may allow the user to directly instantiate either a LUT or another logic element such as an AIC. Assignments may allow the user to control a percentage of combinational logic that the compiler 16 (e.g., via the technology mapping block 208) maps to each type of logic element. Settings for each step of the compiler flow 200 may allow the user to disable mapping to an alternate logic block. Further settings may control the amount of effort each step of the compiler flow 200 spends on balancing between the different logic blocks. Increasing the effort may increase compile time but may improve resource usage and fmax. User control may include settings which allow the user to group specific AICs and LUTs into the same adaptive logic module (ALM) or LAB 110.
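The user controls enumerated above might be collected into a settings structure such as the following. Every field name and default is a hypothetical illustration; the disclosure does not define a settings API:

```python
# Hypothetical settings container; all names and defaults are assumptions.
from dataclasses import dataclass, field

@dataclass
class HeteroMappingSettings:
    aic_mapping_enabled: bool = True   # allow/disable mapping to the alternate logic block
    target_aic_fraction: float = 0.2   # fraction of combinational logic to map to AICs
    balancing_effort: int = 1          # higher: better resource use/fmax, longer compile
    forced_groups: list = field(default_factory=list)  # AICs/LUTs to co-locate in one ALM/LAB

settings = HeteroMappingSettings(balancing_effort=3)
assert settings.aic_mapping_enabled
assert settings.balancing_effort == 3
```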
The method 400 may be performed by various components and algorithms within the synthesis portion 202 of the compiler flow. As will be described in greater detail below, various steps of the method 400 may be performed in the RTL elaboration block 204, the netlist optimization block 206, and/or the technology mapping block 208. In process block 402, the technology mapping algorithm (discussed with respect to the technology mapping block 208) may, for each node in a netlist, compute cuts for LUTs or AIC candidates. The technology mapping algorithm may compute the cuts based on combinational logic associated with the integrated circuit system 12 (e.g., an FPGA). The technology mapping algorithm may determine whether combinational logic may be cut (e.g., implemented) as a LUT or an AIC candidate.
In process block 404, the technology mapping algorithm may determine a technology mapping based on whether given combinational logic of the FPGA may be implemented as only LUTs and may be implemented as AIC candidates, respective die-area impacts of the LUTs and AIC candidates, and respective signal delay impacts of the LUTs and AIC candidates. In some scenarios, LUTs may consume a greater portion of a die while facilitating lower signal delay, and AICs may consume a smaller portion of the die while resulting in a greater signal delay. However, in other scenarios LUTs may have both a lower die-area impact and a lower signal delay than AICs given a certain implementation, or vice versa. There may be a desired ratio of LUT and AIC implementations (e.g., AIC candidates cut into AICs rather than LUTs) based on the die-area consumed by and the signal delay caused by respective implementations of AICs and LUTs.
That is, in the process block 404, the technology mapping algorithm may determine whether to implement LUTs, AICs, both, or neither into certain functional blocks such as LABs 110, DSPs 120, and RAM 130 based on said determinations. In query block 406, the technology mapping algorithm may determine whether the technology mapping (e.g., the implementations of LUTs, AICs, both, or neither) includes a greater number of AICs than may be implemented in the programmable device (e.g., FPGA). The technology mapping may include a greater number of AICs than may be implemented if the AICs consume a greater die-area than is allowed or desired on a die or consume more logic than is allowed or desired. There may be a set threshold number of AICs desired by a user. This threshold may be set by user control of the compiler flow 200. If, in the query block 406, it is determined that the present technology mapping does not include a greater number of AICs than may be implemented on the FPGA, in process block 408 the technology mapping algorithm may maintain the present technology mapping. However, if it is determined that the number of AICs is more than may be implemented, in process block 410, the technology mapping algorithm may adjust weighting factors for the AICs and/or LUTs and recalculate die-area impact and signal delay associated with AIC candidates to determine a second technology mapping. The technology mapping algorithm may adjust the weighting factors by applying a greater weight to the die-area impact and a greater weight to the signal delay associated with the AIC candidates or by applying a lesser weight to the die-area impact of the LUTs and a lesser weight to the signal delay associated with the LUTs. The technology mapping algorithm may then restart the method 400 from the process block 402. The technology mapping algorithm may iteratively perform the method 400 until a technology mapping with a desired ratio of LUTs and AICs is mapped.
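The iterate-and-reweight loop of method 400 may be sketched as follows. The function names, the doubling weight schedule, and the iteration cap are assumptions made for illustration; the disclosure only states that weighting factors are adjusted and mapping is repeated:

```python
# Illustrative sketch of the method 400 loop; the doubling schedule
# and max_iters cap are assumptions, not taken from the disclosure.
def iterate_mapping(compute_mapping, count_aics, max_aics, max_iters=10):
    """Repeat technology mapping, raising the AIC cost weight until the
    mapping fits within the device's AIC budget."""
    weight = 1.0
    mapping = compute_mapping(aic_weight=weight)
    for _ in range(max_iters):
        if count_aics(mapping) <= max_aics:
            return mapping          # query block 406: mapping fits, keep it
        weight *= 2.0               # process block 410: penalize AIC cuts more
        mapping = compute_mapping(aic_weight=weight)
    return mapping                  # give up after max_iters and keep best effort
```

For example, with a toy `compute_mapping` that yields fewer AICs as the AIC weight grows, the loop terminates once the AIC count drops to the budget.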
After completing technology mapping, the compiler flow 200 may proceed to the placing and routing portion 210. In the global placement block 214, a global placer algorithm may identify the various LUTs and AIC candidates from the technology mapping block 208 as objects placeable on-die with some constraints associated with the resources available in a given area on the die. For example, there is only so much room available for a given number of LUTs per LAB 110, a given number of AICs per LAB 110, and so on. The global placer algorithm will spread the logic elements across the die while trying to decrease or minimize wire length and space consumed and trying to increase or maximize routability and fmax. However, the determination of whether a logic element such as an AIC candidate is implemented on-die as a LUT or an AIC occurs in the clustering block 216. The clustering block 216 (e.g., via a clustering algorithm) may account for additional legality rules (e.g., input sharing rules) that may arise due to the use of new logic elements such as AICs.
In process block 452, a clustering algorithm may receive global placements from the global placer algorithm as discussed above. In process block 454, based on the global placement presented by the global placer algorithm, the clustering algorithm may form LABs including ALMs, the ALMs composed of LUTs and/or AICs. The clustering algorithm may also determine if there are unused DSPs 120 and/or RAM 130 with logic capable of implementing AICs. If so, the clustering algorithm may leverage the unused logic in the unused DSPs 120 and/or RAM 130 to implement AICs. For example, if the global placer algorithm places four AICs on a LAB 110 that can only handle two AICs, to ensure that the LAB 110 is legal (e.g., does not violate any specification) the clustering algorithm may either convert two of the four AICs to LUTs, or may implement the two AICs in nearby unused DSPs 120 or RAM 130.
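The legalization choice in the example above may be sketched as a small function. The name `legalize_lab` and its spill-then-convert priority are assumptions for illustration; the disclosure presents the two remedies as alternatives:

```python
# Illustrative sketch of legalizing an over-subscribed LAB; preferring
# spill over conversion is an assumption made for this example.
def legalize_lab(aics_in_lab, lab_aic_capacity, nearby_spare_slots):
    """Resolve excess AICs in a LAB: spill to unused DSP/RAM logic
    when available, otherwise convert the remainder back to LUTs."""
    excess = max(0, len(aics_in_lab) - lab_aic_capacity)
    spilled = min(excess, nearby_spare_slots)
    converted_to_lut = excess - spilled
    return spilled, converted_to_lut

# Four AICs in a LAB that holds two: spill both to spare DSP/RAM logic...
assert legalize_lab(["a", "b", "c", "d"], 2, nearby_spare_slots=5) == (2, 0)
# ...or, with no spare logic nearby, convert both back to LUTs.
assert legalize_lab(["a", "b", "c", "d"], 2, nearby_spare_slots=0) == (0, 2)
```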
In query block 456, the clustering algorithm may determine whether clustering has converged. Convergence may refer to a state wherein the various logic elements (LUTs, AICs, or any other appropriate logic elements) are placed in an optimal or desired arrangement via the clustering algorithm. That is, if the clusters have converged, there are no illegal placements or implementations within the FPGA; wire length, die-area consumption, and signal delay have been reduced or minimized; routability has been optimized; and fmax has been increased or maximized, among other relevant considerations. If the clustering algorithm determines that the clusters have not converged, the clustering algorithm may, in process block 460, change LUT implementations to AICs, or vice versa. Switching implementations for the logic elements may reduce or minimize wire length, die-area consumption, and signal delay, enhance routability, and increase or maximize fmax. Once the logic elements have been switched, the method 450 may return to the process block 454 and again reform the LABs 110 and implement AICs in unused DSPs 120 or RAM 130 as discussed above.
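The swap-and-reform loop of method 450 may be sketched as follows. The callback names and the iteration cap are hypothetical; the disclosure describes only the control flow between blocks 454, 456, and 460:

```python
# Illustrative sketch of the method 450 convergence loop; callback
# names and the max_iters cap are assumptions for this example.
def cluster_until_converged(form_labs, is_legal, swap_candidates, max_iters=20):
    """Re-form LABs and flip candidate implementations until the
    clustering converges to a legal, desired arrangement."""
    labs = form_labs()               # process block 454
    for _ in range(max_iters):
        if is_legal(labs):
            return labs              # query block 456: converged, keep implementations
        swap_candidates(labs)        # process block 460: LUT <-> AIC swaps
        labs = form_labs()           # return to process block 454 and reform LABs
    return labs
```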
If the clustering algorithm determines that the clusters have converged, in process block 510 the clustering algorithm may maintain the selected implementations. That is, the LABs 110 are legal, and the clustering algorithm may no longer change or adjust the LUTs and/or AICs implemented in the LABs 110 or the AICs implemented in the unused DSPs 120 or RAM 130. However, additional changes (e.g., replacement of a LAB 110, changing the content of a LAB 110) may be made in later processes, such as detailed placement, physical synthesis, and so on.
The clustering algorithm may maintain legality for multiple types of cluster architectures.
The processes discussed above may be carried out on the integrated circuit system 12, which may be a component included in a data processing system, such as a data processing system 500, shown in
The data processing system 500 may be part of a data center that processes a variety of different requests. For instance, the data processing system 500 may receive a data processing request via the network interface 506 to perform encryption, decryption, machine learning, video processing, voice recognition, image recognition, data compression, database search ranking, bioinformatics, network security pattern identification, spatial navigation, digital signal processing, or other specialized tasks.
The techniques and methods described herein may be applied with other types of integrated circuit systems. For example, the programmable routing bridge described herein may be used with central processing units (CPUs), graphics cards, hard drives, or other components.
While the embodiments set forth in the present disclosure may be susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and have been described in detail herein. However, the disclosure is not intended to be limited to the particular forms disclosed. The disclosure is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the disclosure as defined by the following appended claims.
The techniques presented and claimed herein are referenced and applied to material objects and concrete examples of a practical nature that demonstrably improve the present technical field and, as such, are not abstract, intangible or purely theoretical. Further, if any claims appended to the end of this specification contain one or more elements designated as “means for [perform]ing [a function] . . . ” or “step for [perform]ing [a function] . . . ”, it is intended that such elements are to be interpreted under 35 U.S.C. 112(f). However, for any claims containing elements designated in any other manner, it is intended that such elements are not to be interpreted under 35 U.S.C. 112(f).
EXAMPLE EMBODIMENT 1. A computer-implemented method comprising:
EXAMPLE EMBODIMENT 2. The computer-implemented method of example embodiment 1, comprising:
EXAMPLE EMBODIMENT 3. The computer-implemented method of example embodiment 2, wherein determining the first technology mapping comprises determining a first die-area impact of a LUT implementation and a second die-area impact of an AIC implementation.
EXAMPLE EMBODIMENT 4. The computer-implemented method of example embodiment 3, wherein determining the second technology mapping comprises adjusting a weighting factor associated with the second die-area impact of the AIC implementation.
EXAMPLE EMBODIMENT 5. The computer-implemented method of example embodiment 2, wherein determining the first technology mapping comprises determining a first signal delay associated with a LUT implementation.
EXAMPLE EMBODIMENT 6. The computer-implemented method of example embodiment 5, wherein determining the first technology mapping for the node of the plurality of nodes of the netlist comprises determining a second signal delay associated with an AIC implementation.
EXAMPLE EMBODIMENT 7. The computer-implemented method of example embodiment 6, wherein determining the second technology mapping comprises adjusting a weighting factor associated with the second signal delay associated with the AIC implementation.
EXAMPLE EMBODIMENT 8. A non-transitory, computer-readable medium comprising computer-readable code, that when executed by one or more processors, causes the one or more processors to:
EXAMPLE EMBODIMENT 9. The non-transitory, computer-readable medium of example embodiment 8, comprising the computer-readable code, that when executed by the one or more processors, causes the one or more processors to:
EXAMPLE EMBODIMENT 10. The non-transitory, computer-readable medium of example embodiment 8, comprising the computer-readable code, that when executed by the one or more processors, causes the one or more processors to move one or more LABs from a first cluster of the plurality of clusters to a second cluster of the plurality of clusters while maintaining legality of the one or more LABs.
EXAMPLE EMBODIMENT 11. The non-transitory, computer-readable medium of example embodiment 10, comprising the computer-readable code, that when executed by the one or more processors, causes the one or more processors to change logic associated with one or more LABs of a first cluster of the plurality of clusters while maintaining the legality.
EXAMPLE EMBODIMENT 12. The non-transitory, computer-readable medium of example embodiment 8, wherein forming the plurality of clusters of LABs comprises forming a LUT cluster including LUTs and forming an AIC cluster including AICs.
EXAMPLE EMBODIMENT 13. The non-transitory, computer-readable medium of example embodiment 8, wherein forming the plurality of clusters of LABs comprises forming a plurality of mixed LUT-AIC clusters comprising LUTs, AICs, or both.
EXAMPLE EMBODIMENT 14. The non-transitory, computer-readable medium of example embodiment 8, comprising the computer-readable code, that when executed by the one or more processors, causes the one or more processors to identify unused digital signal processing (DSP) blocks and implement AICs in the unused DSP blocks.
EXAMPLE EMBODIMENT 15. The non-transitory, computer-readable medium of example embodiment 8, comprising the computer-readable code, that when executed by the one or more processors, causes the one or more processors to identify unused random-access memory (RAM) and implement AICs in the unused RAM.
EXAMPLE EMBODIMENT 16. A non-transitory, computer-readable medium comprising computer-readable code, that when executed by one or more processors, causes the one or more processors to perform operations comprising:
EXAMPLE EMBODIMENT 17. The non-transitory, computer-readable medium comprising computer-readable code of example embodiment 16, that when executed by the one or more processors, causes the one or more processors to determine whether a first number of AICs associated with the second portion of the AIC candidates exceeds a desired number of AICs.
EXAMPLE EMBODIMENT 18. The non-transitory, computer-readable medium comprising computer-readable code of example embodiment 17, that when executed by the one or more processors, causes the one or more processors to:
EXAMPLE EMBODIMENT 19. The non-transitory, computer-readable medium comprising computer-readable code of example embodiment 18, wherein the first weighting factor is associated with a die-area impact of the AIC implementations and the second weighting factor is associated with a signal delay impact of the AIC implementations.
EXAMPLE EMBODIMENT 20. The non-transitory, computer-readable medium comprising computer-readable code of example embodiment 19, that when executed by the one or more processors, causes the one or more processors to determine a second technology mapping based on the first weighting factor, the second weighting factor, or both.