Embodiments of the invention relate to methods and apparatuses based on machine learning for placing circuit blocks with flexible aspect ratios on a semiconductor chip.
In integrated circuit (IC) design, macro placement is the process of placing circuit blocks on a chip canvas. A macro contains a post-synthesized description of a circuit block. The logic and electronic behavior of the macro are given, but the internal structural description may or may not be known. Mixed-size macro placement is the problem of placing macros of various sizes on a chip canvas to optimize an objective such as wirelength or congestion.
The number of circuit blocks involved in the placement can be on the order of hundreds or thousands. The placement of circuit blocks is a complicated and time-consuming process and typically relies on the manual efforts of human experts. The reliance on manual efforts severely limits the number of placement options that can be explored within a reasonable time. As a result, the manual placement may be suboptimal. If the chip design later calls for a different placement, the high iteration cost and impact on the schedule and resources would be prohibitive. Thus, there is a need to improve the quality and efficiency of circuit block placement.
In one embodiment, a method is provided for placing macros by a neural network on a chip canvas in an IC design. The method includes the steps of clustering the macros into multiple macro clusters, and generating, using the neural network, a probability distribution over locations on a grid and aspect ratios of a macro cluster. The grid represents the chip canvas and is formed by rows and columns of grid cells, and the macro cluster is described by at least an area size, aspect ratios, and wire connections. The method further includes the steps of generating action masks for respective ones of the aspect ratios to block out a subset of unoccupied grid cells based on design rules that optimize macro placement, generating a masked probability distribution by applying the action masks to the probability distribution, and selecting a location on the grid for placing the macro cluster with a chosen aspect ratio based on the masked probability distribution.
In another embodiment, a system is provided for placing macros on a chip canvas in an IC design. The system includes memory to store descriptions of the macros, and one or more processors coupled to the memory. At least one of the processors is operative to perform operations of a neural network. The one or more processors are operative to cluster the macros into multiple macro clusters, and generate, using the neural network, a probability distribution over locations on a grid and aspect ratios of a macro cluster. The grid represents the chip canvas and is formed by rows and columns of grid cells, and the macro cluster is described by at least an area size, aspect ratios, and wire connections. The one or more processors are further operative to generate action masks for respective ones of the aspect ratios to block out a subset of unoccupied grid cells based on design rules that optimize macro placement, generate a masked probability distribution by applying the action masks to the probability distribution, and select a location on the grid for placing the macro cluster with a chosen aspect ratio based on the masked probability distribution.
Other aspects and features will become apparent to those ordinarily skilled in the art upon review of the following description of specific embodiments in conjunction with the accompanying figures.
The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like references indicate similar elements. It should be noted that different references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such feature, structure, or characteristic in connection with other embodiments whether or not explicitly described.
In the following description, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures, and techniques have not been shown in detail in order not to obscure the understanding of this description. Those of ordinary skill in the art, with the included descriptions, will be able to implement appropriate functionality without undue experimentation.
This disclosure describes a macro placer that is based on reinforcement learning (RL) and incorporates design principles followed by circuit designers. The macro placer is driven by a reward to reduce routing wirelength and congestion, with human experts' design principles incorporated as constraints in an RL environment. The macro placer can place hundreds of macros within a few hours with the quality of human experts. A macro is a circuit block, which may be a circuit module coded in a register transfer language (RTL) or a post-synthesized circuit module. One example of a macro is a memory circuit such as static random access memory (SRAM).
A common goal of macro placement is to minimize wirelength, congestion, power, and area. In addition to this common goal, circuit designers also follow design principles, such as placing macros at the periphery, avoiding dead space (i.e., areas that cannot be used effectively by electronic design automation (EDA) tools), and maintaining regularity of the placement.
The learning-based macro placer disclosed herein aims at the common goal and incorporates circuit designers' design principles as constraints in the RL learning environment to achieve the quality of human experts. The macro placer determines the shape (e.g., the aspect ratio, which is the ratio of width to height of a rectangle) of each macro cluster and places the macro cluster at an optimal location. Furthermore, a convex optimization refiner and a rule-based refiner are applied to de-overlap the macro clusters and refine their positions to comply with the chip integration checker (CIC) rules. Therefore, the next stage of place-and-route for standard cells can start immediately without additional manual fine-tuning.
After hierarchy groups are formed, the leaf nodes having the same width and height within the same hierarchy group form a cluster. Six and four macro clusters are formed in the two illustrated examples.
Each macro cluster may have multiple possible shapes, such as rectangles with multiple possible aspect ratios. For example, if a macro cluster contains six macros, the macros may be arranged as 1×6, 6×1, 2×3, or 3×2, each arrangement corresponding to a different aspect ratio. Thus, a macro cluster can also be referred to as a “flexible block,” which is a circuit block having a fixed area and a flexible shape. The number of macros in a macro cluster may be subject to an upper limit. If a macro cluster contains more macros than this upper limit, the macro cluster may be split into several child macro clusters. In addition, a soft constraint is imposed on the RL agent's actions to guide these child macro clusters to be close to each other in the placement. The output of pre-processor 110 is fed to feature extractor 120.
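By way of example and not limitation, the candidate shapes of a macro cluster can be enumerated from the factor pairs of its macro count, as in the following Python sketch; the function and variable names are illustrative assumptions and not part of any embodiment.

```python
def candidate_shapes(num_macros):
    """Enumerate (rows, columns) arrangements for a cluster of identical macros."""
    shapes = []
    for rows in range(1, num_macros + 1):
        if num_macros % rows == 0:
            shapes.append((rows, num_macros // rows))
    return shapes

def aspect_ratios(num_macros, macro_width, macro_height):
    """Width-to-height ratio of each candidate arrangement."""
    return [(cols * macro_width) / (rows * macro_height)
            for rows, cols in candidate_shapes(num_macros)]

print(candidate_shapes(6))  # [(1, 6), (2, 3), (3, 2), (6, 1)]
```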
In one embodiment, feature embedding 625 is also fed into a fully connected (FC) network 640 (also referred to as a value network). FC network 640 outputs a predicted reward value for a corresponding action. The predicted reward value is used to update the coefficients of neural network 600. For example, the neural network's coefficients can be updated using a Proximal Policy Optimization (PPO) gradient estimator with generalized advantage estimation (GAE).
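By way of illustration only, the following Python sketch shows one possible arrangement of a policy head and an FC value head over a shared feature embedding. It assumes a PyTorch implementation; the grid dimensions, layer sizes, and names are assumptions rather than the disclosed architecture of neural network 600.

```python
import torch
import torch.nn as nn

M, N, S = 32, 32, 4  # grid rows, grid columns, aspect-ratio count (assumed sizes)

class PolicyValueNet(nn.Module):
    """Shared feature embedding driving a policy head and an FC value head."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.policy_head = nn.Linear(embed_dim, M * N * S)  # logits over (x, y, r)
        self.value_head = nn.Sequential(                    # predicts the reward
            nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, feature_embedding):
        return self.policy_head(feature_embedding), self.value_head(feature_embedding)
```

In such a sketch, the value head's output would serve as the baseline for the PPO gradient estimator with GAE when the network coefficients are updated.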
For each macro cluster to be placed, multiple action masks may be generated to block out grid cells based on a density constraint. For a density threshold of 1, a grid cell is blocked out if placing a given macro cluster on the grid cell would cause the sum of occupied areas in the grid cell to exceed 1. Different action masks may be generated for the different aspect ratios of a given macro cluster. The action masks may be denoted gt(x, y, r), which spans a space of size M×N×|S|. When gt(x, y, r) = 0, grid cell (x, y) is blocked for a flexible block with aspect ratio index r; when gt(x, y, r) = 1, grid cell (x, y) is available for placing the flexible block with aspect ratio index r. A macro placement module, such as macro placer 100, may generate these action masks.
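The density-based masking rule described above may be sketched as follows. This is a simplified illustration that assumes the cluster fully occupies every grid cell it covers; all names are hypothetical.

```python
import numpy as np

def density_action_masks(density, footprints):
    """Compute action masks gt(x, y, r) for one macro cluster.

    density:    (M, N) array, fraction of each grid cell already occupied.
    footprints: list of (h, w) footprints in grid cells, one per aspect ratio r.
    Returns an (M, N, |S|) 0/1 array; 1 means the location is placeable.
    """
    M, N = density.shape
    masks = np.zeros((M, N, len(footprints)), dtype=np.int8)
    for r, (h, w) in enumerate(footprints):
        for x in range(M - h + 1):
            for y in range(N - w + 1):
                # Assume the cluster fully occupies every covered cell; with a
                # density threshold of 1, all covered cells must still be empty.
                if np.all(density[x:x + h, y:y + w] == 0.0):
                    masks[x, y, r] = 1
    return masks
```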
The action masks gt(x, y, r) may be applied to the probability distribution P(x, y, r) to set the blocked areas to a zero probability value. A masked distribution 660 of size M×N×|S| is calculated by applying action masks 670 to the probability distribution P(x, y, r). In one embodiment, masked distribution 660 spans the action space formed by the valid placement locations and the available aspect ratios of a macro cluster. With a deterministic policy, the action with the highest probability according to masked distribution 660 may be chosen to place the macro cluster. With a stochastic policy, an action may be sampled according to masked distribution 660.
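A minimal sketch of applying the action masks and selecting an action under either policy follows; all names are hypothetical.

```python
import numpy as np

def select_action(probabilities, masks, deterministic=False):
    """Apply action masks to P(x, y, r) and choose a placement (x, y, r).

    probabilities: (M, N, |S|) distribution from the policy network.
    masks:         matching 0/1 action masks gt(x, y, r).
    """
    masked = probabilities * masks   # blocked actions get zero probability
    masked = masked / masked.sum()   # renormalize over the valid action space
    if deterministic:
        flat = int(masked.argmax())  # deterministic policy: highest probability
    else:
        flat = np.random.choice(masked.size, p=masked.ravel())  # stochastic policy
    return np.unravel_index(flat, masked.shape)
```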
In one embodiment, action masks are used to block out invalid placement locations that may cause severe macro overlapping (e.g., when the overlapping exceeds a threshold) or out-of-bound placement. Action masks are further used to block out undesired placement locations based on circuit designers' experiences and/or preferences.
After the placement of macro cluster 710, the state of the chip canvas is updated and the next macro cluster is to be placed on the updated canvas. After all macro clusters are placed, a reward is calculated. In one embodiment, the reward may be derived from an objective function over the wirelength and congestion, such that maximizing the reward minimizes both.
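By way of example, one common wirelength proxy is the half-perimeter wirelength (HPWL), and the reward can be sketched as a negated weighted sum. The weighting below is an illustrative assumption, not a disclosed value.

```python
def hpwl(nets):
    """Half-perimeter wirelength (HPWL), a common wirelength proxy.

    nets: iterable of nets, each a list of (x, y) pin coordinates.
    """
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def episode_reward(wirelength, congestion, congestion_weight=0.5):
    """Negated weighted sum: maximizing the reward minimizes both terms."""
    return -(wirelength + congestion_weight * congestion)
```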
The placement process may iterate a predetermined number of times or for a predetermined time period, or until the reward has reached a steady state or a given goal. After neural network 600 is trained and all macro clusters are placed, the resulting placement may be refined by a convex optimization refiner and a rule-based refiner.
After the convex optimization, the locations of the macro clusters are further refined by rule-based refiner 920.
The space between two adjacent macro clusters on a chip canvas is called channel spacing. Channel spacing is reserved for routing and buffer insertion between macro clusters to avoid timing violations. In one embodiment, the operations performed by channel reserve module 924 are as follows. For each given macro cluster, channel reserve module 924 calculates the distance from the given macro cluster to the nearest available channel. If the distance is greater than a predetermined threshold, channel reserve module 924 identifies a gap between two adjacent macro clusters, where the gap is the farthest gap from the given macro cluster and the distance from the given macro cluster to the gap is less than the predetermined threshold. Channel reserve module 924 then inserts a channel for routing and buffer insertion at the identified gap.
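The following one-dimensional Python sketch illustrates the rule described above; coordinates are taken along a single axis for simplicity, and all names are hypothetical.

```python
def reserve_channels(cluster_positions, channels, gaps, threshold):
    """One-dimensional sketch of the channel reservation rule.

    cluster_positions: coordinates of macro clusters along one axis.
    channels: coordinates of already-reserved channels (modified in place).
    gaps:     coordinates of gaps between adjacent macro clusters.
    """
    for c in cluster_positions:
        if channels and min(abs(c - ch) for ch in channels) <= threshold:
            continue  # an available channel is already close enough
        # Farthest gap from the cluster whose distance is still below threshold.
        candidates = [g for g in gaps if abs(c - g) < threshold]
        if candidates:
            gap = max(candidates, key=lambda g: abs(c - g))
            channels.append(gap)  # reserve routing/buffer space at this gap
    return channels
```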
In one embodiment, fine-tuning module 925 inspects the placement of each macro cluster to ensure that the placement meets the CIC rules and design rule check (DRC) rules as required by the chip foundry. For each macro cluster, fine-tuning module 925 determines whether or not the spacing between the macro cluster and the canvas boundary and the spacing between the macro cluster and its adjacent macro cluster(s) comply with the respective requirements according to the CIC and DRC rules. Fine-tuning module 925 moves the macro cluster if the rules are violated.
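By way of illustration, a simplified spacing check might look as follows; the minimum-gap values stand in for foundry-specific CIC/DRC requirements and are assumptions.

```python
def violates_spacing(rect, neighbor_rects, canvas, min_gap, min_edge):
    """Check assumed CIC/DRC spacing rules for one macro cluster.

    rect, neighbor_rects: rectangles as (x0, y0, x1, y1); canvas: (W, H).
    min_gap and min_edge stand in for foundry-specific spacing values.
    """
    x0, y0, x1, y1 = rect
    W, H = canvas
    if min(x0, y0, W - x1, H - y1) < min_edge:
        return True  # too close to the canvas boundary
    for nx0, ny0, nx1, ny1 in neighbor_rects:
        dx = max(nx0 - x1, x0 - nx1, 0)  # horizontal gap (0 if overlapping)
        dy = max(ny0 - y1, y0 - ny1, 0)  # vertical gap (0 if overlapping)
        if max(dx, dy) < min_gap:
            return True  # too close to (or overlapping) an adjacent cluster
    return False
```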
The disclosed learning-based macro placer not only digests gate-level connection information, but also follows backend physical design principles when creating a macro placement. By clustering the standard cells and macros, exploiting their connection signatures, exploring 3-D placement algorithms, incorporating physical design principles, and leveraging convex optimization and a rule-based refiner, the learning-based macro placer can place hundreds of macros in a few hours with the quality of human experts. In addition, the learning-based macro placer can learn from each placement project. Thus, the learning-based macro placer can accelerate and automate the physical design process.
Method 1100 starts with step 1110 at which the system clusters the macros into macro clusters. At step 1120, the system uses a neural network to generate a probability distribution over locations on a grid and aspect ratios of a macro cluster. The grid represents a chip canvas and is formed by rows and columns of grid cells. The macro cluster is described by at least an area size, aspect ratios, and wire connections. The system at step 1130 generates action masks for respective aspect ratios to block out a subset of unoccupied grid cells based on design rules that optimize macro placement. The system at step 1140 generates a masked probability distribution by applying the action masks to the probability distribution. The system at step 1150 selects a location on the grid for placing the macro cluster with a chosen aspect ratio based on the masked probability distribution.
In one embodiment, the system detects edge grid cells in a region of the grid, where each grid cell in the region is valid for placement. The system then removes non-edge grid cells from candidate grid cells to generate updated candidate grid cells. The system further detects one or more dead-space grid cells among the updated candidate grid cells. Placement of the macro cluster on any of the dead-space grid cells causes fragmentation of usable placement space in the grid. The system then removes the one or more dead-space grid cells from the updated candidate grid cells to generate target grid cells. The system generates an action mask that blocks out all grid cells in the grid except the target grid cells.
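A simplified sketch of this edge and dead-space filtering follows, assuming the dead-space detection itself is performed by a separate check (not shown); all names are hypothetical.

```python
import numpy as np

def periphery_mask(valid, dead):
    """Action mask keeping only edge cells that are not dead-space cells.

    valid: (M, N) boolean array of cells valid for placement.
    dead:  (M, N) boolean array of cells whose use would fragment the usable
           space (assumed to come from a separate dead-space check, not shown).
    """
    M, N = valid.shape
    edge = np.zeros((M, N), dtype=bool)
    for x in range(M):
        for y in range(N):
            if not valid[x, y]:
                continue
            # Edge cell: at least one 4-neighbor is off-grid or invalid.
            edge[x, y] = any(
                not (0 <= i < M and 0 <= j < N) or not valid[i, j]
                for i, j in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)))
    return (edge & ~dead).astype(np.int8)  # target cells only; all else blocked
```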
In one embodiment, macros having the same height and width and belonging to the same hardware hierarchy group are clustered into a macro cluster. When forming hardware hierarchy groups, each macro is a leaf node in a tree structure that describes a hierarchical hardware design. The tree structure is then partitioned into hardware hierarchy groups, with the number of macros in each hardware hierarchy group subject to an upper limit. In one embodiment, the neural network is an RL neural network that receives a reward for placing the macros on the grid. The reward is a measurement of the wirelength and congestion of the placement.
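By way of example and not limitation, the clustering and splitting rule can be sketched as follows; the attribute names are assumptions.

```python
from collections import defaultdict

def cluster_macros(macros, max_cluster_size):
    """Group leaf macros by hierarchy group and dimensions; split large clusters.

    macros: iterable of objects with .group, .width, and .height (assumed).
    """
    buckets = defaultdict(list)
    for m in macros:
        buckets[(m.group, m.width, m.height)].append(m)
    clusters = []
    for members in buckets.values():
        # Enforce the upper limit by splitting into child macro clusters.
        for i in range(0, len(members), max_cluster_size):
            clusters.append(members[i:i + max_cluster_size])
    return clusters
```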
In one embodiment, after placing all of the macro clusters on the grid, the system may apply a convex refiner to overlapping macro clusters to minimize a total macro displacement while satisfying a non-overlapping constraint for all of the macro clusters. The system may further apply a rule-based refiner to minimize wasted areas between adjacent macro clusters and between a chip canvas boundary and each macro cluster. The system may further apply a rule-based refiner to reserve channel space for each macro cluster. The system may further apply a rule-based refiner to enforce requirements of foundry process technologies with respect to spacing between adjacent macro clusters and spacing between a chip canvas boundary and the macro clusters.
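Non-overlap constraints are not convex in general; one common way to obtain a convex program is to fix the pairwise ordering of clusters from the initial placement, which linearizes the constraints. The following sketch, using the cvxpy library, illustrates that approach under this assumption; it is not necessarily the disclosed refiner formulation.

```python
import cvxpy as cp

def convex_refine(init_pos, sizes, orderings):
    """Minimize total displacement under linearized non-overlap constraints.

    init_pos:  initial lower-left corners (x, y) from the RL placement.
    sizes:     cluster sizes (w, h).
    orderings: tuples (i, j, axis) fixing cluster i left of ('x') or below
               ('y') cluster j; fixing the ordering from the initial placement
               makes the non-overlap constraints linear (an assumption).
    """
    n = len(init_pos)
    x, y = cp.Variable(n), cp.Variable(n)
    cons = []
    for i, j, axis in orderings:
        if axis == 'x':
            cons.append(x[i] + sizes[i][0] <= x[j])  # i stays left of j
        else:
            cons.append(y[i] + sizes[i][1] <= y[j])  # i stays below j
    disp = sum(cp.abs(x[k] - init_pos[k][0]) + cp.abs(y[k] - init_pos[k][1])
               for k in range(n))
    cp.Problem(cp.Minimize(disp), cons).solve()
    return list(zip(x.value, y.value))
```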
System 1200 further includes memory 1220 coupled to processing hardware 1210. Memory 1220 may include memory devices such as dynamic random access memory (DRAM), SRAM, flash memory, and other non-transitory machine-readable storage media, e.g., volatile or non-volatile memory devices. Memory 1220 may further include storage devices, for example, any type of solid-state or magnetic storage device. In one embodiment, memory 1220 may store one or more EDA tools 1240 and a macro placer including, but not limited to, a neural network 1260 (e.g., neural network 600).
In some embodiments, system 1200 may also include a network interface 1230 to connect to a wired and/or wireless network. It is understood that the illustrated embodiment is simplified for purposes of illustration.
The operations of the flow diagram have been described with reference to the exemplary embodiments above.
Various functional components or blocks have been described herein. As will be appreciated by persons skilled in the art, the functional blocks will preferably be implemented through circuits (either dedicated circuits or general-purpose circuits, which operate under the control of one or more processors and coded instructions), which will typically comprise transistors that are configured in such a way as to control the operation of the circuitry in accordance with the functions and operations described herein.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, and can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.
This application claims the benefit of U.S. Provisional Application No. 63/343,111 filed on May 18, 2022, and U.S. Provisional Application No. 63/373,207 filed on Aug. 23, 2022, both of which are incorporated by reference herein in their entirety.