This application claims priority to Taiwan Application Serial Number 112125499, filed on Jul. 7, 2023, which is herein incorporated by reference in its entirety.
The present disclosure relates to a technology to estimate chip performance values. More particularly, the present disclosure relates to a method for estimating a plurality of performance values of a plurality of chips, a computing system and a non-transitory computer-readable storage medium.
A critical path may be used to estimate the performance of an integrated circuit, since the signal delay of the critical path determines the maximum frequency of a chip. Under different process-voltage-temperature (PVT) conditions, the critical path of the chip may change. Because the outputs of ring oscillators change with PVT variations, they are often used to construct formulas that estimate chip performance values (i.e., the propagation delay of the critical path). However, different chips may have different PVT-to-delay sensitivities, in which case the performance values of different chips cannot be accurately estimated with the same formula.
The present disclosure provides a method for estimating a plurality of performance values of a plurality of chips. The method includes the following steps: (A) using a plurality of oscillation period vectors of a to-be-divided chip set to train a first neural network model corresponding to the to-be-divided chip set, to obtain a training error of the to-be-divided chip set, in which when step (A) is conducted for the first time, the to-be-divided chip set includes the plurality of chips; (B) dividing the to-be-divided chip set into a plurality of divided chip sets according to the training error; and (C) using a plurality of oscillation period vectors of the plurality of divided chip sets as training data of a second neural network model, so that the second neural network model outputs a plurality of weight vectors respectively corresponding to the plurality of divided chip sets. For each divided chip set, a product of the one or more oscillation period vectors of the divided chip set and the weight vector of the divided chip set is larger than a product of the same one or more oscillation period vectors and the weight vector of each of the rest of the divided chip sets.
The present disclosure provides a non-transitory computer-readable storage medium configured to store one or more computer-readable instructions. When executed by one or more processors, the one or more computer-readable instructions cause the one or more processors to conduct the following steps to estimate a plurality of performance values of a plurality of chips: (A) using a plurality of oscillation period vectors of a to-be-divided chip set to train a first neural network model corresponding to the to-be-divided chip set, to obtain a training error of the to-be-divided chip set, in which when step (A) is conducted for the first time, the to-be-divided chip set includes the plurality of chips; (B) dividing the to-be-divided chip set into a plurality of divided chip sets according to the training error; and (C) using a plurality of oscillation period vectors of the plurality of divided chip sets as training data of a second neural network model, so that the second neural network model outputs a plurality of weight vectors respectively corresponding to the plurality of divided chip sets. For each divided chip set, a product of the one or more oscillation period vectors of the divided chip set and the weight vector of the divided chip set is larger than a product of the same one or more oscillation period vectors and the weight vector of each of the rest of the divided chip sets.
The present disclosure provides a computing system including one or more processors. The one or more processors are configured to conduct the following steps to estimate a plurality of performance values of a plurality of chips: (A) using a plurality of oscillation period vectors of a to-be-divided chip set to train a first neural network model corresponding to the to-be-divided chip set, to obtain a training error of the to-be-divided chip set, in which when step (A) is conducted for the first time, the to-be-divided chip set includes the plurality of chips; (B) dividing the to-be-divided chip set into a plurality of divided chip sets according to the training error; and (C) using a plurality of oscillation period vectors of the plurality of divided chip sets as training data of a second neural network model, so that the second neural network model outputs a plurality of weight vectors respectively corresponding to the plurality of divided chip sets. For each divided chip set, a product of the one or more oscillation period vectors of the divided chip set and the weight vector of the divided chip set is larger than a product of the same one or more oscillation period vectors and the weight vector of each of the rest of the divided chip sets.
One of the advantages of the aforementioned method, non-transitory computer-readable storage medium and computing system is that the performance values of a plurality of chips with different PVT-to-delay sensitivities can be estimated accurately without extracting the critical paths of the chips.
It is to be understood that both the foregoing general description and the following detailed description are by way of example, and are intended to provide further explanation of the disclosure as claimed.
Reference will now be made in detail to the present embodiments of the disclosure, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers are used in the drawings and the description to refer to the same or like parts.
In some embodiments, each of the oscillator circuits 110_1-110_r can be implemented with a ring oscillator. In some embodiments, a period of each of the oscillation signals OS_1-OS_r is configured to reflect a process-voltage-temperature (PVT) variation of a respective location corresponding to the oscillation signal.
It is worth mentioning that at least some of the oscillator circuits 110_1-110_r are located near a critical path 130 of the chip 100, so that the oscillation signals OS_1-OS_r may be used to estimate the performance of the chip 100. Reducing the working voltage VDD may reduce the power consumption of the chip 100, but may also increase the sum of the propagation delay and the setup time of the critical path 130. When this sum increases to approximately the system clock period Tcycle of the system clock CLK, the chip 100 may enter a failure state. Therefore, in some embodiments, estimating the performance of the chip 100 means calculating the sum of the propagation delay and the setup time of the critical path 130 when the chip 100 is operated under a predetermined working voltage VDD, which helps achieve a balance between power consumption and performance when designing the chip 100.
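As a minimal illustration of the failure condition above, the check can be written as a timing-slack computation. The function names `timing_slack` and `near_failure`, the guard band `guard` standing in for "approximately equal", and the example delays are all hypothetical.

```python
def timing_slack(t_prop: float, t_setup: float, t_cycle: float) -> float:
    """Margin left in the clock period after the critical path settles.

    The performance value discussed above is t_prop + t_setup; a positive
    slack means the critical path meets timing at this clock period.
    """
    return t_cycle - (t_prop + t_setup)


def near_failure(t_prop: float, t_setup: float, t_cycle: float, guard: float = 0.0) -> bool:
    """Hypothetical failure check: True once the slack shrinks to `guard` or below.

    `guard` is an assumed guard band standing in for "approximately equal".
    """
    return timing_slack(t_prop, t_setup, t_cycle) <= guard


# Example with a 1.0 ns clock: lowering VDD stretches the propagation delay.
print(round(timing_slack(0.80, 0.05, 1.00), 3))  # nominal voltage: 0.15 ns of margin
print(near_failure(0.97, 0.05, 1.00))            # reduced voltage: True, delay + setup > Tcycle
```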
Because the periods of the oscillation signals OS_1-OS_r change with the working voltage VDD, a performance estimating function of a single chip 100 may be represented by the following Formula 1.
Here, “Tes” represents the performance value of the chip 100, that is, the sum of the propagation delay and the setup time of the critical path 130; “T1×r” is a 1×r array whose r elements are the periods of the oscillation signals OS_1-OS_r of the chip 100, respectively, where r is a positive integer; and “K1×rT” is the transpose of a 1×r array representing a weight vector of the chip 100.
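Formula 1 itself is not reproduced in this text. Assuming, as the definitions above suggest, that it reads Tes = T1×r × K1×rT (a dot product of the period vector and the weight vector), a minimal numpy sketch for one chip could look like this; the periods and weights are made-up numbers.

```python
import numpy as np

# Hypothetical data for one chip with r = 4 oscillator circuits (periods in ns).
periods = np.array([[1.20, 1.35, 1.10, 1.25]])   # T_{1×r}: one row vector
weights = np.array([[0.20, 0.15, 0.25, 0.10]])   # K_{1×r}: one weight row vector

# Assumed reading of Formula 1: Tes = T_{1×r} · (K_{1×r})^T
tes = (periods @ weights.T).item()   # (1×r)(r×1) -> scalar performance value
print(f"estimated delay + setup: {tes:.4f} ns")
```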
In some embodiments in which the performance values of a plurality of chips 100 need to be estimated simultaneously, the critical paths 130 of different chips 100 may have different PVT-to-delay sensitivities. Therefore, to estimate the performance values of the chips 100 accurately, Formula 1 is adjusted to the following Formula 2. For convenience of explanation, in the following embodiments, the number of chips 100 whose performance values need to be estimated is assumed to be 5. However, the present disclosure is not limited thereto; Formula 2 may be applied to estimate the performance values of any number of chips 100.
Referring to
It is worth mentioning that the 5 chips 100 are divided into 3 divided chip sets DGa-DGc according to their PVT-to-delay sensitivities. In the array T5×r, the three dashed boxes represent the divided chip sets DGa-DGc from top to bottom, respectively. The numbers of chips 100 in the divided chip sets DGa-DGc are 2, 2 and 1, respectively. However, the number of divided chip sets and the number of chips 100 in each divided chip set are not limited thereto, as long as each divided chip set comprises at least one chip 100. The grouping method for chips 100 is illustrated with reference to
The weight vector K1×r,jT is the transpose of a 1×r array representing the weight vector of the j-th divided chip set, where j is a positive integer from 1 to 3 since there are 3 divided chip sets in total. That is, as shown in
It is worth mentioning that, as shown in
The operator maxj( ) selects the maximum over the products related to the oscillation period vector(s) of the j-th divided chip set, and the selected maximum forms one or more elements of the array Tes5×1. Taking the divided chip set DGa as an example, the operator max1( ) selects the maximum over the products M1-M3. Ideally, the product M1 of the oscillation period vectors of the divided chip set DGa (i.e., the first and second row vectors of the array T5×r) and the weight vector K1×r,1T of the divided chip set DGa is larger than the products M2 and M3 of the same oscillation period vectors and the weight vectors K1×r,2T and K1×r,3T of the rest of the divided chip sets DGb and DGc; the reason is illustrated with reference to
In some embodiments, the statement that the product M1 is larger than the products M2 and M3 means that every element of the product M1 is larger than the elements of the products M2 and M3 at the corresponding positions. For example, the element “(D11, …, D1r)×K1×r,1T” of the product M1 is larger than both the element “(D11, …, D1r)×K1×r,2T” of the product M2 and the element “(D11, …, D1r)×K1×r,3T” of the product M3.
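Formula 2 itself does not appear in this text. Reading it from the definitions of the array T5×r, the weight vectors K1×r,jT and the operators maxj( ), one plausible form is shown below. Here T_a, T_b and T_c are names introduced for illustration only, denoting the rows of T5×r that belong to DGa (rows 1-2), DGb (rows 3-4) and DGc (row 5); each max is taken element-wise, and under this reading M1, M2 and M3 are the three products inside max1.

```latex
Tes_{5\times 1} =
\begin{bmatrix}
\max_1\!\left(T_a K^{T}_{1\times r,1},\; T_a K^{T}_{1\times r,2},\; T_a K^{T}_{1\times r,3}\right)\\
\max_2\!\left(T_b K^{T}_{1\times r,1},\; T_b K^{T}_{1\times r,2},\; T_b K^{T}_{1\times r,3}\right)\\
\max_3\!\left(T_c K^{T}_{1\times r,1},\; T_c K^{T}_{1\times r,2},\; T_c K^{T}_{1\times r,3}\right)
\end{bmatrix},
\qquad
T_{5\times r} =
\begin{bmatrix} T_a \\ T_b \\ T_c \end{bmatrix}
=
\begin{bmatrix}
D_{11} & \cdots & D_{1r}\\
\vdots &        & \vdots\\
D_{51} & \cdots & D_{5r}
\end{bmatrix}
```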
The process of estimating the performance values of the remaining divided chip sets of the aforementioned 5 chips 100 with Formula 2, and the functions of the operators max2( ) and max3( ), are similar to the corresponding parts of the aforementioned embodiments regarding the first divided chip set. For brevity, detailed descriptions thereof are omitted here.
In some embodiments, since the periods of the oscillation signals OS_1-OS_r are all measurable, the row vectors of the array T5×r are known. However, since the PVT-to-delay sensitivities of the chips 100 can hardly be measured directly, neither the values in the weight vectors K1×r,jT nor the grouping of the chips 100 (e.g., the total number of divided chip sets and the divided chip set to which each chip 100 belongs) can be determined simply by measuring the chips 100.
Therefore, to construct Formula 2, the present disclosure provides a method 300 for estimating the plurality of performance values of the plurality of chips 100, which is illustrated in
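The per-set evaluation of Formula 2 can be sketched in numpy as follows. This assumes the reading of maxj( ) described in the text (an element-wise maximum over the products of a set's oscillation period vectors with every weight vector); the function name `estimate_formula2` and all numbers are hypothetical.

```python
import numpy as np

def estimate_formula2(periods, groups, weights):
    """Sketch of Formula 2 under the reading described in the text (assumption).

    periods: (n, r) array, one oscillation period vector per chip (rows of T_{n×r}).
    groups:  list of row-index lists, one per divided chip set (e.g. DGa, DGb, DGc).
    weights: (J, r) array; row j is the weight vector K_{1×r,j} of the j-th set.
    Returns an (n,) array of estimated performance values (Tes_{n×1}).
    """
    tes = np.empty(periods.shape[0])
    for rows in groups:
        products = periods[rows] @ weights.T   # column k: product with K_{1×r,k} (M1, M2, M3 for DGa)
        tes[rows] = products.max(axis=1)       # max_j(): element-wise maximum over the products
    return tes

# Hypothetical periods (ns) of 5 chips with r = 3 oscillators each.
periods = np.array([[1.0, 1.0, 1.0],   # DGa
                    [1.1, 1.0, 0.9],   # DGa
                    [2.0, 0.5, 0.5],   # DGb
                    [1.9, 0.6, 0.5],   # DGb
                    [0.5, 0.5, 2.0]])  # DGc
weights = np.array([[0.30, 0.30, 0.30],   # K_{1×r,1} (DGa)
                    [0.45, 0.05, 0.05],   # K_{1×r,2} (DGb)
                    [0.05, 0.05, 0.46]])  # K_{1×r,3} (DGc)
tes = estimate_formula2(periods, [[0, 1], [2, 3], [4]], weights)
print(np.round(tes, 3))
```

Under this reading, each chip's value is simply the largest of its products with all weight vectors, so the per-set loop gives the same result as a row-wise maximum; the grouping then matters for training the weight vectors rather than for evaluating Formula 2.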
In some embodiments, the computing system is coupled to the aforementioned 5 chips 100, and is configured to receive the periods of the oscillation signals OS_1-OS_r of each of the 5 chips 100.
Referring to
More specifically, in step S310, the computing system may take the oscillation period vectors Vo1-Vo5 as training data of the first neural network model 410A and input them into a multi-layer neural structure 412 of the first neural network model 410A, so that the first neural network model 410A generates an initial weight vector K1×r′T corresponding to the to-be-divided chip set PDG, where K1×r′T is the transpose of a 1×r array. In some embodiments, the first neural network model 410A may be implemented with a feedforward neural network (FNN) model.
Then, to calculate the training error Terr1×m, the computing system first calculates the products (hereinafter referred to as the initial performance value Tes1×m′ of the to-be-divided chip set PDG) of the oscillation period vectors Vo1-Vo5 and the initial weight vector K1×r′T, where the initial performance value Tes1×m′ is a 1×m array and m is the number of chips 100 in the to-be-divided chip set PDG. The computing system then subtracts the product of a unit vector u1×mT (i.e., a vector whose elements are all 1) and the system clock period Tcycle of the system clock CLK from the initial performance value Tes1×m′ to obtain the training error Terr1×m. In other words, the loss function of the first neural network model 410A is the following Formula 3.
The computing system provides the training error Terr1×m to the optimizer 414 of the first neural network model 410A. The optimizer 414 is configured to adjust the weights of the multi-layer neural structure 412 according to the training error Terr1×m, which in turn adjusts the initial weight vector K1×r′T, so as to minimize the training error Terr1×m. In some embodiments, the optimizer 414 may be implemented with an Adam optimizer. Accordingly, the computing system may adjust the initial weight vector K1×r′T to minimize the training error Terr1×m, where the training error Terr1×m depends on the following two terms: (1) the product of the system clock period Tcycle and the unit vector u1×mT, and (2) the products of the oscillation period vectors Vo1-Vo5 of the to-be-divided chip set PDG and the initial weight vector K1×r′T.
Then, in step S320, the computing system divides the to-be-divided chip set PDG into a plurality of chip sets according to the training error Terr1×m. More specifically, the computing system provides the minimized training error Terr1×m to a detection module 420. The detection module 420 is configured to determine whether the m elements of the minimized training error Terr1×m follow a normal distribution. If the detection module 420 determines that the training error Terr1×m does not follow a normal distribution, the detection module 420 divides the to-be-divided chip set PDG into two chip sets GA and GB, where each of the chip sets GA and GB comprises at least one chip 100. In some embodiments, the detection module 420 uses k-means clustering to divide the to-be-divided chip set PDG into the chip sets GA and GB, so that the chip(s) 100 in the same set have the same or similar PVT-to-delay sensitivities.
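Following the description of Formula 3 in the text (Terr1×m = Tes1×m′ − Tcycle × u1×mT), the training error of a to-be-divided chip set can be computed in a few lines. This is a sketch only: the function name and the data are hypothetical, and the arrays are treated as flat numpy vectors rather than tracking row or column orientation.

```python
import numpy as np

def training_error(periods: np.ndarray, k_init: np.ndarray, t_cycle: float) -> np.ndarray:
    """Terr = Tes' - Tcycle·u for one to-be-divided chip set (one element per chip)."""
    tes_init = periods @ k_init           # initial performance values Tes' (one per chip)
    u = np.ones(periods.shape[0])         # all-ones vector u
    return tes_init - t_cycle * u

# Hypothetical periods (ns) for m = 5 chips with r = 3 oscillators each.
periods = np.array([[1.0, 1.0, 1.0],
                    [1.1, 1.0, 0.9],
                    [2.0, 0.5, 0.5],
                    [1.9, 0.6, 0.5],
                    [0.5, 0.5, 2.0]])
err = training_error(periods, np.array([0.4, 0.2, 0.2]), t_cycle=0.9)
print(np.round(err, 3))
```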
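For illustration only, the optimization in step S310 can be approximated without a neural network. If the aim is to minimize the sum of squared elements of Terr1×m (an assumption, since the text does not state how the error vector is reduced to a scalar), the best single weight vector is a linear least-squares solution. The sketch below uses `numpy.linalg.lstsq` in place of the FNN and the Adam optimizer; `fit_initial_weights` and the data are hypothetical.

```python
import numpy as np

def fit_initial_weights(periods: np.ndarray, t_cycle: float):
    """Simplified stand-in for step S310 (assumption: least squares, not FNN + Adam).

    Chooses K' minimizing the sum of squared elements of
    Terr = periods @ K' - Tcycle·u and returns (K', minimized Terr).
    """
    target = np.full(periods.shape[0], t_cycle)
    k_fit, *_ = np.linalg.lstsq(periods, target, rcond=None)
    return k_fit, periods @ k_fit - target

# Hypothetical periods (ns) of 6 chips with r = 3 oscillators each.
periods = np.array([[1.0, 1.2, 0.9],
                    [1.1, 1.3, 1.0],
                    [1.6, 0.7, 0.8],
                    [1.5, 0.8, 0.7],
                    [0.9, 0.6, 1.7],
                    [1.0, 1.0, 1.0]])
k_fit, err = fit_initial_weights(periods, t_cycle=1.0)
print("K' =", np.round(k_fit, 4))
print("Terr =", np.round(err, 4))
```

At the least-squares optimum the error vector is orthogonal to every column of the period matrix, which makes a quick sanity check; a trained FNN would generally not reproduce this exactly.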
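The text does not name the normality test used by the detection module 420, nor the features that the k-means clustering operates on. The sketch below assumes a Jarque-Bera test on the error values and a one-dimensional k-means (k = 2) on the same values; both choices, the function names and the 5.99 threshold are illustrative assumptions. Jarque-Bera is an asymptotic test, so with only a handful of chips it rarely rejects normality; a small-sample test such as Shapiro-Wilk would be a more realistic choice in practice.

```python
import numpy as np

def jarque_bera(errors) -> float:
    """Jarque-Bera statistic; large values suggest the sample is not normal."""
    x = np.asarray(errors, dtype=float)
    if x.max() == x.min():               # identical errors: nothing to split
        return 0.0
    d = x - x.mean()
    var = np.mean(d ** 2)
    skew = np.mean(d ** 3) / var ** 1.5
    kurt = np.mean(d ** 4) / var ** 2
    return x.size / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)

def looks_normal(errors, threshold: float = 5.99) -> bool:
    # 5.99 is roughly the 95th percentile of chi-square with 2 degrees of freedom.
    return jarque_bera(errors) <= threshold

def split_two_means(errors, iters: int = 100) -> np.ndarray:
    """1-D k-means with k = 2 on the error values; True marks the second set (e.g. GB)."""
    x = np.asarray(errors, dtype=float)
    if x.max() == x.min():
        return np.zeros(x.size, dtype=bool)
    centers = np.array([x.min(), x.max()])   # start from the two extremes
    labels = np.zeros(x.size, dtype=int)
    for _ in range(iters):
        labels = np.abs(x[:, None] - centers[None, :]).argmin(axis=1)
        new_centers = np.array([x[labels == k].mean() for k in (0, 1)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels == 1

# Hypothetical minimized errors of 60 chips that come from two sensitivity groups.
spread = np.linspace(-0.005, 0.005, 30)
errors = np.concatenate([-0.1 + spread, 0.1 + spread])
if not looks_normal(errors):
    in_gb = split_two_means(errors)
    print("GA:", int((~in_gb).sum()), "chips, GB:", int(in_gb.sum()), "chips")
```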
In step S320, the computing system further takes the chip sets GA and GB as to-be-divided chip sets PDGa and PDGb, respectively, and repeats steps S310-S320 on each of them. In other words, as shown in
In addition, after steps S310-S320 have been conducted one or more times, if the detection module 420 determines in step S320 that the training error Terr1×m of each chip set follows a normal distribution, the computing system takes all the chip sets as the plurality of divided chip sets DGa-DGc in
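The repeated application of steps S310-S320 is naturally recursive. The sketch below takes the training, normality-check and splitting routines as parameters, so any implementation of them can be plugged in; `divide_chip_set` and the toy placeholders in the usage lines are hypothetical and only reproduce the 2-2-1 grouping used in this description.

```python
from typing import Callable, List, Sequence, Tuple

ErrorFn = Callable[[Sequence[int]], List[float]]
NormalFn = Callable[[List[float]], bool]
SplitFn = Callable[[Sequence[int], List[float]], Tuple[List[int], List[int]]]

def divide_chip_set(chips: Sequence[int], training_error: ErrorFn,
                    is_normal: NormalFn, split_in_two: SplitFn) -> List[List[int]]:
    """Repeat steps S310-S320 until every chip set passes the normality check.

    training_error: trains a first model on `chips`, returns one error per chip (S310).
    is_normal:      the detection module's normality test (S320).
    split_in_two:   divides a set into two non-empty sets, e.g. by k-means (S320).
    """
    errors = training_error(chips)
    if len(chips) < 2 or is_normal(errors):
        return [list(chips)]              # this set becomes one divided chip set
    set_a, set_b = split_in_two(chips, errors)
    return (divide_chip_set(set_a, training_error, is_normal, split_in_two)
            + divide_chip_set(set_b, training_error, is_normal, split_in_two))


# Toy placeholders (hypothetical): each chip has a fixed "sensitivity", and the
# error of a chip is its deviation from the mean sensitivity of its current set.
SENSITIVITY = {0: 1.0, 1: 1.0, 2: 2.0, 3: 2.0, 4: 3.0}

def toy_error(chips):
    mean = sum(SENSITIVITY[c] for c in chips) / len(chips)
    return [SENSITIVITY[c] - mean for c in chips]

def toy_is_normal(errors):
    return max(errors) - min(errors) < 1e-9   # placeholder: "no spread left"

def toy_split(chips, errors):
    low = [c for c, e in zip(chips, errors) if e <= 0]
    high = [c for c, e in zip(chips, errors) if e > 0]
    return low, high

print(divide_chip_set(range(5), toy_error, toy_is_normal, toy_split))  # [[0, 1], [2, 3], [4]]
```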
Referring to
First, the computing system uses the oscillation period vectors Vo1-Vo5 as training data of a second neural network model 510 and inputs them into a multi-layer neural structure 512 of the second neural network model 510, so that the multi-layer neural structure 512 outputs the weight vectors K1×r,1T, K1×r,2T and K1×r,3T corresponding to the divided chip sets DGa-DGc, respectively.
Then, the computing system inputs the oscillation period vectors Vo1-Vo5 and the weight vectors K1×r,1T, K1×r,2T and K1×r,3T into a computing module 514 of the second neural network model 510. The computing module 514 calculates a training error L of the divided chip sets DGa-DGc using the loss function represented by the following Formula 4.
In Formula 4, the arrays Tmain_a, Tmain_b and Tmain_c represent “the product of the oscillation period vectors Vo1-Vo2 and the weight vector K1×r,1T”, “the product of the oscillation period vectors Vo3-Vo4 and the weight vector K1×r,2T”, and “the product of the oscillation period vector Vo5 and the weight vector K1×r,3T”, respectively.
Furthermore, in Formula 4, the arrays Tother_1a and Tother_2a represent “the product of the oscillation period vectors Vo1-Vo2 and the weight vector K1×r,2T of the divided chip set DGb” and “the product of the oscillation period vectors Vo1-Vo2 and the weight vector K1×r,3T of the divided chip set DGc”, respectively. The arrays Tother_1b and Tother_2b represent “the products of the oscillation period vectors Vo3-Vo4 and the weight vector K1×r,1T of the divided chip set DGa” and “the products of the oscillation period vectors Vo3-Vo4 and the weight vector K1×r,3T of the divided chip set DGc”, respectively. The arrays Tother_1c and Tother_2c represent “the products of the oscillation period vector Vo5 and the weight vector K1×r,1T of the divided chip set DGa” and “the products of the oscillation period vector Vo5 and the weight vector K1×r,2T of the divided chip set DGb”, respectively.
In Formula 4, the system clock period Tcycle is multiplied by a unit vector uT with a suitable number of columns so that the array subtraction is well defined. For simplicity, the unit vector uT is not shown in Formula 4. Accordingly, the training error L depends on: (1) the difference obtained by subtracting the product of the system clock period Tcycle and the unit vector from the product of the one or more oscillation period vectors of each divided chip set and the weight vector of that divided chip set (i.e., the array Tmain), and (2) the difference obtained by subtracting the product of the system clock period Tcycle and the unit vector from the products of the one or more oscillation period vectors of that divided chip set and the weight vector of each of the rest of the divided chip sets (i.e., the arrays Tother).
In step S330, the training error L is provided to an optimizer 516 of the second neural network model 510. The optimizer 516 is configured to adjust the weights of the multi-layer neural structure 512, which in turn adjusts the weight vectors K1×r,1T, K1×r,2T and K1×r,3T, so as to minimize the training error L. The weight vectors K1×r,1T, K1×r,2T and K1×r,3T obtained when the training error L is minimized may be used as the weight vectors K1×r,jT to construct Formula 2. It is worth mentioning that, in the process of minimizing the training error L, each array Tmain becomes larger than its respective arrays Tother. For example, each element of the array Tmain_a becomes larger than the elements at the corresponding positions of the arrays Tother_1a and Tother_2a, where the arrays Tmain_a, Tother_1a and Tother_2a are the products M1, M2 and M3 in
In some embodiments, the method 300 further comprises step S340. In step S340, the computing system estimates a plurality of (e.g., 5) performance values of the aforementioned plurality of (e.g., 5) chips 100 according to the weight vectors K1×r,1T, K1×r,2T and K1×r,3T. More specifically, the computing system uses the weight vectors K1×r,1T, K1×r,2T and K1×r,3T obtained in steps S310-S330 to construct Formula 2, and then uses Formula 2 to estimate the 5 performance values of the aforementioned 5 chips 100 (i.e., generating the array Tes5×1 in
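Formula 4 is not reproduced in this text, so its exact form is unknown here. As a hypothetical example only, a loss that depends on exactly the two differences listed above could be written as follows, where X_j stacks the oscillation period vectors of the j-th divided chip set, T_main,j = X_j K^T_{1×r,j}, T_other,jk = X_j K^T_{1×r,k} for k ≠ j, λ > 0 is a weighting factor, δ ≥ 0 is a margin, the max is element-wise, and N_main and N_other count the elements of the respective terms. The symbols X_j, λ and δ are introduced for illustration.

```latex
L_{\text{sketch}} =
\frac{1}{N_{\text{main}}}\sum_{j}\left\lVert T_{\text{main},j} - T_{\text{cycle}}\,u^{T} \right\rVert^{2}
+ \frac{\lambda}{N_{\text{other}}}\sum_{j}\sum_{k\neq j}
\left\lVert \max\!\left(T_{\text{other},jk} - T_{\text{cycle}}\,u^{T} + \delta,\; 0\right) \right\rVert^{2}
```

If both terms reach zero with δ > 0, every Tmain element equals Tcycle while every Tother element is at most Tcycle − δ, which is one way to obtain the ordering Tmain > Tother described for step S330.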
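For illustration, step S330 can be mimicked without the multi-layer neural structure 512 by running plain gradient descent directly on the three weight vectors. Because Formula 4 is not reproduced in this text, the sketch uses an assumed loss: squared (Tmain − Tcycle) terms plus squared hinge terms that penalize any Tother element above Tcycle minus a margin. The loss form, `lam`, `margin`, the learning rate, the function names and the data are assumptions, not the disclosed implementation.

```python
import numpy as np

def loss_and_grad(periods, groups, weights, t_cycle, lam=1.0, margin=0.05):
    """Hypothetical loss (not Formula 4) and its gradient w.r.t. every weight vector.

    periods: (n, r) oscillation period vectors; groups: row indices of each divided
    chip set; weights: (J, r) array whose row j is K_{1×r,j}.
    """
    n_main = sum(len(rows) for rows in groups)
    n_other = max(n_main * (len(groups) - 1), 1)
    loss, grad = 0.0, np.zeros_like(weights)
    for j, rows in enumerate(groups):
        x = periods[rows]                         # vectors of the j-th divided chip set
        resid = x @ weights.T - t_cycle           # column k: product with K_k minus Tcycle
        for k in range(len(groups)):
            if k == j:                            # Tmain term: pull toward Tcycle
                e = resid[:, k]
                loss += np.sum(e ** 2) / n_main
                grad[k] += 2.0 * (e @ x) / n_main
            else:                                 # Tother term: push below Tcycle - margin
                h = np.maximum(resid[:, k] + margin, 0.0)
                loss += lam * np.sum(h ** 2) / n_other
                grad[k] += 2.0 * lam * (h @ x) / n_other
    return loss, grad

def train_weights(periods, groups, t_cycle, lr=0.01, steps=2000):
    """Plain gradient descent on the weight vectors (stand-in for structure 512 + optimizer 516)."""
    start = t_cycle / periods.mean(axis=0).sum()  # every set starts from the same uniform guess
    weights = np.full((len(groups), periods.shape[1]), start)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(periods, groups, weights, t_cycle)
        history.append(loss)
        weights -= lr * grad
    return weights, history

# Hypothetical periods (ns) of 6 chips already grouped into 3 divided chip sets.
periods = np.array([[1.0, 1.2, 0.9], [1.1, 1.3, 1.0],
                    [1.6, 0.7, 0.8], [1.5, 0.8, 0.7],
                    [0.9, 0.6, 1.7], [0.8, 0.7, 1.6]])
groups = [[0, 1], [2, 3], [4, 5]]
weights, history = train_weights(periods, groups, t_cycle=1.0)
print(f"loss: {history[0]:.5f} -> {history[-1]:.5f}")
print(np.round(periods @ weights.T, 3))   # row i: products of chip i with K_1, K_2, K_3
```

With a small enough learning rate the assumed loss decreases monotonically; whether every Tmain element ends up above its Tother counterparts depends on the data and the margin.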
Consequently, the method 300 may estimate a plurality of performance values of the plurality of chips 100 with different PVT-to-delay sensitivities accurately without extracting critical paths 130 of chips 100.
In some embodiments, the first neural network models 410A-410C, the detection module 420 and/or the second neural network model 510 may be implemented with a plurality of computer-readable instructions stored in a non-transitory computer-readable storage medium, and one or more processors in the computing system may execute the computer-readable instructions so as to run the first neural network models 410A-410C, the detection module 420 and/or the second neural network model 510 on the computing system. In other embodiments, the first neural network models 410A-410C, the detection module 420 and/or the second neural network model 510 may be implemented with one or more processors in the computing system (e.g., application specific integrated circuits (ASICs) or field programmable gate arrays (FPGAs)), so as to run the first neural network models 410A-410C, the detection module 420 and/or the second neural network model 510 on the computing system.
As used herein, “around”, “about” or “approximately” shall generally mean within 20 percent, preferably within 10 percent, and more preferably within 5 percent of a given value or range. Numerical quantities given herein are approximate, meaning that the term “around”, “about” or “approximately” can be inferred if not expressly stated.
Certain terms are used in the specification and the claims to refer to specific components. However, those of ordinary skill in the art would understand that the same component may be referred to by different terms. The specification and claims do not distinguish components by differences in their names, but by differences in their functions. Furthermore, it should be understood that the term “comprising” used in the specification and claims is open-ended, that is, it means “including but not limited to”. In addition, “coupling” herein includes any direct and indirect connection means. Therefore, if it is described that a first component is coupled to a second component, it means that the first component can be directly connected to the second component through an electrical connection or a signal connection, including wireless transmission, optical transmission, and the like, or that the first component is indirectly electrically or signally connected to the second component through other component(s) or connection means.
It will be understood that, in the description herein and throughout the claims that follow, the phrase “and/or” includes any and all combinations of one or more of the associated listed items. Unless the context clearly dictates otherwise, the singular terms used herein include plural referents.
Although the present disclosure has been described in considerable detail with reference to certain embodiments thereof, other embodiments are possible. Therefore, the spirit and scope of the appended claims should not be limited to the description of the embodiments contained herein. It will be apparent to those skilled in the art that various modifications and variations can be made to the structure of the present disclosure without departing from the scope or spirit of the present disclosure. In view of the foregoing, it is intended that the present disclosure cover modifications and variations of this disclosure provided they fall within the scope of the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
112125499 | Jul 2023 | TW | national |