Processor designers rely heavily on representative benchmark simulations to evaluate design alternatives. However, accurate modeling of a complex design may reduce simulation speed, despite increasing processing power, thereby restricting the ability to study design tradeoffs. Researchers may use ad hoc solutions, such as simulating only a small fraction of the overall benchmark, in the hope that the simulated fraction is representative of the overall behavior. However, recent studies show that programs exhibit different behaviors during different execution phases that occur over a long time period. Attempts to avoid this problem by simulating several samples of program execution must be, by nature, based on code instrumentation and simulations, which restricts the ability to apply them to a wide range of applications running on complex native hardware.
Various exemplary features and advantages of embodiments of the invention will be apparent from the following, more particular description of exemplary embodiments of the present invention, as illustrated in the accompanying drawings wherein like reference numbers generally indicate identical, functionally similar, and/or structurally similar elements.
Exemplary embodiments of the invention are discussed in detail below. While specific exemplary embodiments are discussed, it should be understood that this is done for illustration purposes only. A person skilled in the relevant art will recognize that other components and configurations may be used without departing from the spirit and scope of the invention.
Embodiments of the present invention may provide a system and/or method for detecting phases of a program that may be executed by a processor. As referred to herein, a phase of a program may be defined by the loops of the program and their subordinates, as well as the data with which each loop is called. Exemplary embodiments of the invention may process sampling addresses to obtain the processor state in each sample and cluster the sampling addresses using clustering algorithms, for example, algorithms as would be known by a person having ordinary skill in the art, to detect the phases of the program.
Referring now to the drawings,
System 100 may also include interrupt device 103, control module 104, storage device 105, and analysis module 106. Interrupt device 103 may interrupt execution of the program at regular intervals of instructions executed (e.g., once every one million instructions) so that the performance metrics, for example, may be recorded. Control module 104 may obtain a state of the processor during the interrupt and generate sampling data based on the state of the processor. To obtain the state of the processor, control module 104 may record the EIPs and event counter totals (e.g., clocktick count and instruction count), for example. To generate sampling data, control module 104 may write the EIPs, process ID, module name, and cycles per instruction, for example, to storage device 105. This sampling data may then be used by analysis module 106 during phase analysis.
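As a minimal sketch of the sampling data described above, the following Python fragment shows one way a record might be formed at each interrupt. The `Sample` type, its field names, and the `make_sample` helper are hypothetical and are not taken from the specification. The sketch assumes that the event counters report running totals, so that cycles per instruction for the most recent interval is the ratio of the two counter deltas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    """One sampling record; the field names are illustrative assumptions."""
    eip: int           # instruction pointer captured at the interrupt
    process_id: int
    module_name: str
    cpi: float         # cycles per instruction since the previous interrupt


def make_sample(eip, process_id, module_name,
                clockticks, instructions,
                prev_clockticks, prev_instructions):
    """Build a record from running counter totals read at two interrupts."""
    delta_instructions = instructions - prev_instructions
    if delta_instructions <= 0:
        raise ValueError("instruction counter did not advance")
    cpi = (clockticks - prev_clockticks) / delta_instructions
    return Sample(eip, process_id, module_name, cpi)
```

For instance, an interval of one million instructions that consumed 1.5 million clockticks would yield a record with a CPI of 1.5.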
In block 202, program execution may be sampled. In an exemplary embodiment of the invention, program execution may be sampled in real-time, for example, by monitoring embedded event counters in a processor.
In block 203, the sampling data may be collected. To collect sampling data, a control module, such as, e.g., control module 104, may collect EIPs and other performance metrics. In an exemplary embodiment of the invention, the other data that is collected may include, but is not limited to, process IDs, module names, and CPIs. Once the data is collected, a control module, for example, may generate sampling data and store it in a storage device, for example, for later analysis.
In block 204, the phases of the program may be identified based on the sampling data. To identify the phases of a program, extended instruction pointer vectors (EIPVs) may be constructed from the sampling data. In exemplary embodiments of the invention, known algorithms, such as, e.g., the k-means, Spectral, and Agglomerative algorithms, may then be used to cluster the EIPVs, as would be understood by a person having ordinary skill in the art.
As an example, to construct EIPVs for clustering, the execution of the program may be divided into equal intervals, each of length 100 million instructions, for example. Each interval may be “signed” by a vector that corresponds to the normalized histogram of the EIPs sampled at interrupts during that interval. For example, let N be the total number of unique EIPs recorded throughout the complete execution of the program. The jth interval of 100 million executed instructions may then be represented by the N-dimensional vector $x_j = [x_{1j}, x_{2j}, \ldots, x_{Nj}]^T$, where $x_{ij}$ is the total number of times the ith EIP has been sampled during the jth interval divided by the total number of EIPs collected in that interval (vector $x_j$ is normalized so that the sum of its entries $x_{ij}$ equals one). If the code is sampled at a rate of once every million instructions executed, for example, then each histogram vector $x_j$ may be computed on the basis of 100 consecutive samples. In such an embodiment, $x_j$ may be called the jth EIPV. Following this representation, the Euclidean norm in N-dimensional space may be a natural distance metric to measure similarity between program segments. In other words, the nth and mth EIPVs may be declared similar if the Euclidean distance $d(x_n, x_m) = \lVert x_n - x_m \rVert$ is “small”, for example, within some predetermined amount. As will be understood by a person having ordinary skill in the art, because each EIPV covers a fixed number of instructions executed, the EIPV may be a machine-independent attribute.
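The construction above can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation of any embodiment: the function names and the `samples_per_interval` parameter are assumptions, the input is taken to be a flat list of sampled EIPs in execution order, and a trailing partial interval is simply dropped.

```python
from collections import Counter


def build_eipvs(eip_samples, samples_per_interval=100):
    """Return (sorted unique EIPs, list of normalized histogram vectors)."""
    unique_eips = sorted(set(eip_samples))            # the N unique EIPs
    index = {eip: i for i, eip in enumerate(unique_eips)}
    vectors = []
    # Only full intervals are used; a trailing partial interval is dropped.
    full = len(eip_samples) - len(eip_samples) % samples_per_interval
    for start in range(0, full, samples_per_interval):
        counts = Counter(eip_samples[start:start + samples_per_interval])
        vec = [0.0] * len(unique_eips)
        for eip, count in counts.items():
            vec[index[eip]] = count / samples_per_interval  # entries sum to one
        vectors.append(vec)
    return unique_eips, vectors


def euclidean_distance(a, b):
    """Euclidean distance between two EIPVs of equal length."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5
```

With one sample per million instructions and the default of 100 samples per interval, each returned vector covers 100 million instructions, matching the example in the text.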
Once the EIPVs have been constructed, they may then be clustered for program phase detection in block 204. Continuing with the above example, a program that is executed may be represented by its set of N-dimensional EIP vectors $X = \{x_1, x_2, \ldots, x_M\}$, where M is the total number of instruction segments executed (or total number of instruction segments in units of 100 million, for example). The k-means algorithm may then be applied on the set X to identify the k most representative clusters corresponding to the k program phases.
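The clustering step may be illustrated with a small, self-contained version of Lloyd's k-means iteration. The sketch below is an assumption-laden illustration, not the claimed method: the `kmeans` and `squared_distance` names are hypothetical, and the farthest-first seeding is chosen here only to make the result deterministic, since the text does not specify how initial centroids are selected.

```python
def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))


def kmeans(points, k, max_iterations=100):
    """Lloyd's algorithm; returns (centroids, labels)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Farthest-first seeding (an assumption made here for determinism).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each vector joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the iteration has converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels
```

Each label in the returned list identifies the cluster, and hence the candidate phase, to which the corresponding execution interval belongs.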
In an exemplary embodiment of the invention, random projection may then be used to reduce the dimensionality of the vectors, for example, without losing separability between clusters. Random projection may consist of replacing the original set of EIPVs $x_j$ ($j = 1, \ldots, M$) by their orthogonal projections $x'_j$ onto a randomly selected linear subspace of dimension $D \ll N$. In an exemplary embodiment of the invention, D=15 may be a suitable target dimensionality for random projection. In an exemplary embodiment of the invention, k-means clustering may then be applied to the set of “projected” EIPVs $X' = \{x'_1, x'_2, \ldots, x'_M\}$.
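One way to realize such a projection is sketched below. The text does not say how the random subspace is selected, so this sketch assumes that an orthonormal basis is obtained by applying Gram-Schmidt orthogonalization to Gaussian random vectors. The function names and the `seed` parameter are hypothetical.

```python
import random


def random_orthonormal_basis(n, d, seed=0):
    """Return d orthonormal vectors in R^n via Gram-Schmidt on Gaussian draws."""
    if not 1 <= d <= n:
        raise ValueError("need 1 <= d <= n")
    rng = random.Random(seed)
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Remove the components that lie along the vectors already chosen.
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = sum(x * x for x in v) ** 0.5
        if norm > 1e-12:  # redraw in the unlikely degenerate case
            basis.append([x / norm for x in v])
    return basis


def project(vectors, basis):
    """Coordinates of each vector's orthogonal projection in the given basis."""
    return [[sum(x * y for x, y in zip(v, b)) for b in basis] for v in vectors]
```

Calling `project(eipvs, random_orthonormal_basis(N, 15))` would give 15-dimensional vectors. Because the basis is orthonormal, distances between projected vectors never exceed the distances between the originals.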
The above example illustrates the effectiveness of the EIPV approach in identifying phase behavior. In an exemplary embodiment of the invention, the Bayesian Information Criterion (BIC) may be used, in connection with k-means clustering, to compute a number of distinct phases k of a program that approaches an optimum value. As will be understood by a person having ordinary skill in the art, the BIC may be used to identify a good value for the number of clusters (or phases) k. For a given choice of k, the BIC score for the set of projected EIPVs $X' = \{x'_1, x'_2, \ldots, x'_M\}$ may be written as follows:
$$\mathrm{BIC}(k) = \log p_k(X') - \frac{k(D+1)}{2}\,\log M,$$
where $p_k(X')$ is a determined (by k-means clustering) probability distribution of the data with estimated parameters, i.e., the centroids of clusters in k-means clustering, which may provide a measure of the distortion from the underlying data, and $k(D+1)$ is the total number of parameters (accounting for dimensionality).
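The score may be evaluated as in the following sketch. The text does not specify the form of the probability distribution, so the sketch assumes a mixture of spherical Gaussians centered on the k-means centroids, with a single shared variance and with mixing weights equal to the cluster proportions. That modeling choice, the `bic_score` name, and the small variance floor are all assumptions. The penalty term follows the expression above.

```python
import math


def bic_score(points, centroids, labels):
    """BIC for a k-means result under an assumed shared-variance spherical model."""
    k, m, d = len(centroids), len(points), len(points[0])
    # Total squared distortion of the points from their assigned centroids.
    sse = sum(
        sum((p - q) ** 2 for p, q in zip(x, centroids[label]))
        for x, label in zip(points, labels)
    )
    # Shared variance estimate; the floor avoids log(0) for perfect fits.
    variance = max(sse / (d * max(m - k, 1)), 1e-12)
    log_likelihood = -sse / (2.0 * variance)
    for c in range(k):
        size = labels.count(c)
        if size:
            log_likelihood += size * (
                math.log(size / m) - 0.5 * d * math.log(2.0 * math.pi * variance)
            )
    penalty = k * (d + 1) / 2.0 * math.log(m)
    return log_likelihood - penalty
```

In use, the score would be computed for several candidate values of k, and a value of k with a high score would be retained as the number of phases.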
In block 304, the state of the processor may be obtained. To obtain the state of the processor, data, such as, but not limited to, EIPs, process IDs, module name, CPIs, and the like may be recorded. Once the data is recorded, sampling data may be generated in block 305.
In block 306, the sampling data may be stored in a storage device for later use during phase detection, for example. In block 307, the phases of the program may be detected based on the sampling data. As discussed above, EIPVs may be constructed and clustered for program phase detection. In an exemplary embodiment of the invention, the EIPVs may be clustered using the k-means algorithm, as will be understood by a person having ordinary skill in the art.
The embodiments illustrated and discussed in this specification are intended only to teach those skilled in the art various ways known to the inventors to make and use the invention. Nothing in this specification should be considered as limiting the scope of the present invention. All examples presented are representative and non-limiting. The above-described embodiments of the invention may be modified or varied, without departing from the invention, as appreciated by those skilled in the art in light of the above teachings. It is therefore to be understood that the invention may be practiced otherwise than as specifically described.