Embodiments generally relate to accurate latency measurements and analysis. More particularly, embodiments relate to one or more profilers that provide an accurate measurement of the latency of source code.
A developer may analyze source code to identify hotspots and bugs in the source code. Such analysis may become difficult. For example, the source code may be a high-level language code. A high-level language code may be a code written with natural language elements, such as C# or Java, with a strong abstraction from the details of the computer, such as the underlying microarchitectures. The source code may be compiled into an intermediate language (e.g., bytecode), and then into a low-level language code (e.g., machine code or assembly code) that is executable by the microarchitecture of a computer. A low-level language code may be a code that provides little or no abstraction from a computer's instruction set architecture (ISA) or microarchitecture. For example, the low-level language code may include commands or functions in a language that maps closely to processor instructions, and may include assembly language instructions that are conceptually equivalent to machine code. Thus, the high-level language code may not consider the underlying computer architecture and may include instructions abstracted away from the computing architecture, whereas the low-level language code may be closely mapped to a computer architecture and include instructions specific to the computing architecture.
As such, a high-level language code may be compiled into different low-level language codes depending on the computer architecture. Therefore, accurate latency measurements of the source code may be difficult to obtain since the same source code may be implemented differently depending on the computer architecture. Thus, a developer may only be able to ascertain the latency of a computer program in relation to a single computing architecture.
The various advantages of the embodiments will become apparent to one skilled in the art by reading the following specification and appended claims, and by referencing the following drawings, in which:
As described below in greater detail, the source code analysis orchestrator 106 may operate in conjunction with the first and second platforms 108, 118 to identify hotspots of the source code 104, and identify the portions of the source code 104 that generated the hotspots. Doing so provides an enhanced understanding of the source code 104. In contrast, some conventional approaches may generate a lower-level language code (e.g., assembly code) from the source code. Due to a lack of source-level mapping, only portions of the lower-level language code are identified as causing the hotspots, leaving the programmer unaware of the original source code that caused the hotspots. Performance tuning tips and hotspots that are only identified at the assembly code level may typically only be useful for compiler developers. In contrast, some embodiments provide tuning tips and hotspot analysis at the source code level, and thus may be efficiently utilized by both application (e.g., .NET) developers and compiler developers.
Further, the first and second platforms 108, 118 may each undertake an independent hotspot analysis of the source code 104 to provide an enhanced understanding of source code 104 execution. For example, the source code 104 may generate a hotspot on certain architectures but not on other architectures. As one possibility, the same source code 104 may use general-purpose/XMM/YMM registers on different generations of microarchitectures, and specialized registers on other architectures. Such distinctions across architectures may generate unique hotspots for the particular architecture, as well as unique assembly code. By implementing the hotspot analysis across different architectures, a more accurate measurement and understanding may be achieved, thereby enhancing future generations of the source code 104.
On the other hand, if only a local machine, such as the computing device 102 (e.g., a user device), were used to execute profiling of the source code 104, the programmer may lack the ability to determine whether the local machine and a deploy-machine will have the same assembly, and therefore lack the ability to identify hotspots and/or bugs on the deploy-machine. Thus, the programmer may not be able to fine-tune the source code 104 for the deploy-machine. Some of the embodiments described herein remedy this deficiency by conducting hotspot analysis on multiple microarchitectures that resemble possible deploy-machines.
Further, the source code analysis orchestrator 106, the first platform 108 and the second platform 118 may be considered remote to the computing device 102, and/or constitute a FaaS cloud environment that may implement the source code 104 at a deploy time. The deploy time may be when the developer has finalized (e.g., debugged) the source code 104 and the source code 104 is to execute live and be utilized in real-world services. Since the same FaaS cloud environment that will deploy the source code 104 is used to profile the source code 104, the developer may have an accurate understanding of the latencies of the source code 104 during deployment.
The source code analysis orchestrator 106 may coordinate the analysis of the source code 104 through one or more function calls to the first platform 108 and the second platform 118. For example, the source code analysis orchestrator 106 may determine that a first platform 108 and a second platform 118 may compile and execute the source code 104. The first platform 108 and the second platform 118 may include different microarchitectures (e.g., central processing units, host processors, microcontrollers, etc.) from each other to generate different implementations of the source code 104. For example, a compiler 128 may compile the source code 104 into a first machine code on the first platform 108 and generate a symbolic database 116. The first machine code may be conceptually equivalent to a first assembly language code (i.e., the two may be treated as equivalents). The symbolic database 116 may be in a Portable PDB (Program Database) format that describes an encoding of debugging information produced by the compiler 128 and consumed by debuggers or profiler tools such as the profiler 110. For example, the symbolic database 116 may be an auxiliary file produced by the compiler 128 to provide other tools, particularly the profiler 110, information about what is in the first assembly language code and how the first assembly language code was produced. The symbolic database 116 may be operating system agnostic (e.g., may have cross-platform compatibility such that it may be generated on Windows and Linux).
The source code to disassembly mapper 114 may map a high-level language code to a low-level language code. In detail, the source code 104 (a high-level language code) may be mapped to a disassembly-level code (a low-level language code). For example, the source code to disassembly mapper 114 may read the symbolic database 116 to map a particular line of the source code 104 to a correct location in the first assembly language code so that a latency measurer 112 of the profiler 110 may set a breakpoint when measuring the latency of the source code 104. For example, the source code to disassembly mapper 114 may disassemble the first assembly language code into an intermediate language, and link the source code 104 to the first assembly language code through the intermediate language. In some embodiments, the compiler 128 may compile the source code 104 into bytecode or Intermediate Language (IL), which is then compiled into the first assembly language code that represents a first machine language code. In such embodiments, the source code to disassembly mapper 114 may interpret the symbolic database 116 to map the source code 104 to the bytecode (or Intermediate Language) and map the bytecode to the first assembly language code. Therefore, each line of the first machine code (first assembly language code) may be mapped to corresponding lines of the source code 104. As described, the first assembly language code or the first machine code may be low-level language codes.
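By way of a non-limiting illustration, the source-to-IL half of such a mapping may be recovered from a Portable PDB using the System.Reflection.Metadata library of .NET. The following C# sketch enumerates the sequence points that relate IL offsets to source documents and line numbers; the file name "app.pdb" is a hypothetical example, and an actual mapper such as the source code to disassembly mapper 114 would further relate the enumerated IL offsets to locations in the first assembly language code:

using System;
using System.IO;
using System.Reflection.Metadata;

class SequencePointDump
{
    static void Main()
    {
        // "app.pdb" is a hypothetical Portable PDB emitted by the compiler.
        using var pdbStream = File.OpenRead("app.pdb");
        using var provider = MetadataReaderProvider.FromPortablePdbStream(pdbStream);
        MetadataReader reader = provider.GetMetadataReader();

        // Each MethodDebugInformation row carries the sequence points that
        // map IL offsets back to a source document and line number.
        foreach (MethodDebugInformationHandle handle in reader.MethodDebugInformation)
        {
            MethodDebugInformation debugInfo = reader.GetMethodDebugInformation(handle);
            if (debugInfo.Document.IsNil)
                continue;
            Document document = reader.GetDocument(debugInfo.Document);
            string sourceFile = reader.GetString(document.Name);
            foreach (SequencePoint point in debugInfo.GetSequencePoints())
            {
                if (point.IsHidden)
                    continue;
                Console.WriteLine($"{sourceFile}:{point.StartLine} -> IL offset 0x{point.Offset:X4}");
            }
        }
    }
}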
The profiler 110 of the first platform 108 may profile the source code 104. For example, the latency measurer 112 may measure the execution (e.g., obtain latency measurements of the execution) of the first assembly language code, and associate the measurements with the source code 104. For example, the latency measurer 112 may measure a latency of execution of each portion of the first assembly language code. The latency measurer 112 may then determine a latency of the corresponding lines of the source code 104 and the first assembly language code.
The profiler 110 may combine the information from the latency measurer 112 and the source code to disassembly mapper 114 to generate latency measurements in relation to the source code 104. For example, the profiler 110 may generate a latency data structure that includes each line of the source code 104 and a latency of the line. As one example, a particular line of the source code 104 may be mapped to several lines of the first assembly language code. The particular line of the source code 104 may have a total latency that is a summation of each latency of the several lines of the first assembly language code, and the particular line of the source code 104 may therefore be associated with the total latency.
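A minimal C# sketch of such a latency data structure is provided below; the record, class and method names are illustrative assumptions rather than elements of the embodiments:

using System.Collections.Generic;
using System.Linq;

// Illustrative record: one profiled assembly line and the source line it maps to.
public record AssemblySample(int SourceLine, string Instruction, double LatencyMs);

public static class LatencyAggregator
{
    // Sum the latencies of all assembly lines mapped to each source line.
    public static Dictionary<int, double> TotalsBySourceLine(IEnumerable<AssemblySample> samples) =>
        samples.GroupBy(s => s.SourceLine)
               .ToDictionary(g => g.Key, g => g.Sum(s => s.LatencyMs));
}

Under this sketch, two assembly lines mapped to source code line 19 with latencies of 373.26 ms and 462 ms would yield a total latency of 835.26 ms for that line, consistent with the example discussed below.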
Similarly, the second platform 118 may generate a second assembly language code (second machine code) and a symbolic database 126. The second assembly language code may be different from the first assembly language code. In further detail, the generated first and second assembly language codes may be particular to the underlying microarchitectures of the first and second platforms 108, 118.
Similarly, the second platform 118 may include a profiler 120 that includes a latency measurer 122 and a source code to disassembly mapper 124. For the sake of brevity, a detailed description of the compiler 130, the profiler 120, the latency measurer 122 and the source code to disassembly mapper 124 will be omitted. It is worth noting, however, that the compiler 130, the profiler 120, the latency measurer 122 and the source code to disassembly mapper 124 operate and are configured similarly to the compiler 128, profiler 110, latency measurer 112 and source code to disassembly mapper 114 described above.
The latency measurer 122 may measure the execution of the second assembly language code. As noted above, the second assembly language code may be different from the first assembly language code. Moreover, the second assembly language code may operate on a different architecture than the first assembly language code. Thus, the latency measurements of the latency measurer 122 may differ from the latency measurements of the latency measurer 112, even for the same line of the source code 104. As described above, the profiler 120 may combine the information from the latency measurer 122 and the source code to disassembly mapper 124 to generate latency measurements in relation to the source code 104. For example, the profiler 120 may generate a latency data structure that includes each line of the source code 104 and a latency of the line.
The profilers 110, 120 may provide the latency measurements to the source code analysis orchestrator 106. For example, the profilers 110, 120 may provide the source code 104 and the latencies of each line of the source code 104 (e.g., the latency data structures) to the source code analysis orchestrator 106. The source code analysis orchestrator 106 may in turn average the latency measurements of the profilers 110, 120. For example, for each line of the source code 104, the source code analysis orchestrator 106 may average the latency measured by the profiler 110 for that line and the latency measured by the profiler 120 for that line. The source code analysis orchestrator 106 may in turn present the received information, the latencies and the hotspots to the computing device 102 for display.
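The averaging performed by the source code analysis orchestrator 106 may take the following form, sketched in C# under the assumption that each profiler reports a per-line latency table keyed by source line number (all names here are hypothetical):

using System.Collections.Generic;

public static class LatencyAverager
{
    // Average the per-line latencies reported by two profilers. Lines present
    // on only one platform are omitted in this simplified sketch.
    public static Dictionary<int, double> AverageByLine(
        IReadOnlyDictionary<int, double> firstPlatform,
        IReadOnlyDictionary<int, double> secondPlatform)
    {
        var averages = new Dictionary<int, double>();
        foreach (var (line, first) in firstPlatform)
        {
            if (secondPlatform.TryGetValue(line, out var second))
                averages[line] = (first + second) / 2.0;
        }
        return averages;
    }
}

For instance, a line measured at 835.26 ms by the profiler 110 and 698.26 ms by the profiler 120 would be reported with an average latency of (835.26 + 698.26)/2 = 766.76 ms.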
Thus, the user of the computing device 102 may receive dynamic hotspot and latency information indicating the execution of the source code 104 across the first and second platforms 108, 118. Such an implementation may allow a more comprehensive and accurate overview of the source code 104, which in turn allows a user to modify the source code 104. In some embodiments, the source code analysis orchestrator 106 may provide suggestions to the user, such as modifying specific lines of code, or may automatically revise some portions of the source code 104 that are identified as being hotspots.
For example,
The graphical user interface 140 may include a source code mapping 142. In the source code mapping 142, a portion of the source code 104, a latency for the first platform 108, a latency for the second platform 118 and an average latency of the first platform 108 and the second platform 118 are presented. As illustrated, each line of the source code corresponds to several latencies. Thus, a developer may quickly ascertain an overall performance through the average latency, as well as platform-specific (i.e., architecture-specific) latencies for the first and second platforms 108, 118. For example, line 19 (i.e., foreach (var item)) has a latency of 835.26 ms on the first platform 108, a latency of 698.26 ms on the second platform 118 and an average latency of 766.76 ms. Thus, the developer may quickly ascertain that line 19 is a hotspot, and modify the source code 104 if needed. It is worth noting that the same line 19 of code generates different latencies on the first and second platforms 108, 118 due to the differing underlying architectures.
The graphical user interface 140 further includes an assembly code mapping 144 of the source code 104 that may be used by the source code analysis orchestrator 106 to derive the latencies presented by the source code mapping 142. The assembly code mapping 144 may include assembly code of the first assembly language code of the first platform 108, an address of the first assembly language code, corresponding source code lines, and latencies. In detail, each line of the first assembly code may be associated with a different address and correspond to a line of the source code. For example, source code line 19 corresponds to the assembly code "mov ebi, dword" and "mov, dword ptr [r]." The assembly code "mov ebi, dword" has a latency of 373.26 ms, and the assembly code "mov, dword ptr [r]" has a latency of 462 ms. Thus, source code line 19 (i.e., "foreach (var item)") has a total latency of 835.26 ms on the first platform 108, which is the summation of the latencies of each line of the assembly code that corresponds to source code line 19. An assembly code line may be deemed to correspond to a source code line when the assembly code line implements the source code line. For example, a compiler may compile source code line 19 to the assembly code "mov ebi, dword" and "mov, dword ptr [r]" to represent source code line 19 in assembly language. Thus, the latencies for the first platform 108 presented by the source code mapping 142 may be derived from the assembly code mapping 144.
The graphical user interface 140 further includes an assembly code mapping 146 of the source code 104 that may include assembly code of the second assembly language code generated by the second platform 118, an address of the second assembly language code, corresponding source code lines, and latencies. As above, the latencies for the second platform 118 presented by the source code mapping 142 may be derived by the source code analysis orchestrator 106 from the assembly code mapping 146.
The graphical user interface 140 may further graphically link different lines of code. For example, if the user selects source code line 19 in one or more of the source code mapping 142, the assembly code mapping 144 or the assembly code mapping 146, each latency, assembly code, address and line corresponding to source code line 19 may be highlighted. In some embodiments, the graphical user interface 140 may automatically place a graphical emphasis (e.g., highlighting) on lines that include hotspots, such as each line that corresponds to source code line 19.
Thus, the graphical user interface 140 may present the latencies of the first and second platforms 108, 118 in relation to the source code 104. By doing so, an enhanced and global platform perspective (e.g., across different architectures) may be provided to the user. Moreover, enhancing the source code 104 may be less cumbersome and time-consuming since a user may quickly understand which lines of code are presenting the highest latencies and on which architectures. The graphical user interface 140 may present more lines of the source code 104 depending on the nature of the user's preferences and display screen size.
Turning back to
In some embodiments, the first platform 108 and/or second platform 118 may build the source code 104. In such embodiments, the computing device 102 may be omitted. In some embodiments, the first and second platforms 108, 118 are co-located at the same node, and in some embodiments, the first and second platforms 108, 118 are located at different nodes (e.g., servers, mobile devices, tablets) from each other.
Thus, the enhanced architecture 100 may leverage the convenience of FaaS (Function as a Service) and several performance tools/architectures, as well as debug format information such as the symbolic databases 116, 126, to offer application developers the ability to tune applications for various architectures. That is, the architecture 100 may implement a universal FaaS-based tuning solution. Combined with performance tools and FaaS mechanisms, the architecture 100 may leverage the symbolic databases 116, 126 (which may be in a traditional PDB, Portable PDB or another debug information format) to quickly achieve application source code to disassembly level mapping and offer performance tuning tips to the application developers to tune their code. For example, for .NET Core developers, the architecture 100 may generate performance tuning tips for C# code.
For example, computer program code to carry out operations shown in the method 300 may be written in any combination of one or more programming languages, including an object oriented programming language such as JAVA, SMALLTALK, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. Additionally, logic instructions might include assembler instructions, instruction set architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, state-setting data, configuration data for integrated circuitry, state information that personalizes electronic circuitry and/or other structural components that are native to hardware (e.g., host processor, central processing unit/CPU, microcontroller, etc.).
Illustrated processing block 302 may identify a trigger to analyze source code. The trigger may be a request from a user, an identification of the source code as having a particular importance, the source code being modified and saved in an IDE, a button actuation in an IDE, an extension/add-on to an IDE that sends the source code/DLL to a database on a cloud for analysis, or a "new push to the code base" command (e.g., in GitHub or source control tools that include extensions to support the command) that sends the source code to the cloud. The trigger event may also be that a new DLL and PDBs are generated.
Illustrated processing block 304 may generate a first low-level language code (e.g., a first assembly code) that corresponds to the source code. For example, illustrated processing block 304 may compile the source code into an intermediate language code, and then generate the first assembly code. In some embodiments, the compiler may compile the source code into the first assembly code. In such embodiments, the compiler may generate a symbolic database that indicates the relationship between the source code and the first assembly code.
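As one concrete, non-limiting possibility for a .NET toolchain, building with the portable debug type emits both the compiled assembly and a Portable PDB symbolic database alongside it:

dotnet build -c Release /p:DebugType=portable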
Illustrated processing block 306 may profile the first low-level language code to identify a first latency of a first portion of the first low-level language code. The profiling may include executing the first low-level language code and timing a total execution of each portion of the first low-level language code until completion.
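A simplified C# sketch of the timing in block 306 is shown below. Wall-clock timing is used for brevity; an actual profiler may instead rely on breakpoints or sampling as described above, and the portion delegate is a hypothetical stand-in for one mapped region of the low-level code:

using System;
using System.Diagnostics;

public static class PortionTimer
{
    // Execute one portion of the compiled code and return its elapsed time.
    public static double MeasureLatencyMs(Action portion)
    {
        var stopwatch = Stopwatch.StartNew();
        portion(); // run the mapped low-level code region
        stopwatch.Stop();
        return stopwatch.Elapsed.TotalMilliseconds;
    }
}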
Illustrated processing block 308 may map the first portion to a source portion of the source code. For example, block 308 may reference the symbolic database to determine that the first portion of the first low-level language code is an assembly code representation of the source portion. Illustrated processing block 310 may associate the first latency with the source portion based on the first portion being mapped to the source portion.
In the illustrated example, block 402 may detect a source code analysis event (e.g., a trigger to analyze source code). Illustrated processing block 404 may detect whether the source code is compatible with more than one microarchitecture. For example, the source code may be configured to operate on any operating system or computing device. Some source codes may be configured to work on any version of certain mobile phone architectures, and some source codes may be designed for various cloud-based architectures. Thus, block 404 may identify compatible microarchitectures. If the source code is compatible with more than one microarchitecture, illustrated processing block 406 may identify one or more nodes (compute nodes) that have the compatible microarchitectures. Illustrated processing block 408 may obtain user permission to execute on one or more of the compatible microarchitectures. In detail, FaaS may include a granular billing system in which a user may be charged per function and/or time. Thus, a user may be consulted to ensure that the user agrees to test the source code across the different compatible microarchitectures. The permission may be set ahead of time (e.g., a blanket permission to execute on any compatible microarchitecture), and/or a user may be queried when the compatible microarchitectures are identified to obtain the permission.
Illustrated processing block 410 may execute a profiler analysis of the source code on each of the one or more microarchitectures that block 408 has obtained permission to execute upon. For example, a first latency may be identified for the source code at a first microarchitecture, and a second latency may be identified for the source code at a second microarchitecture. Illustrated processing block 412 may provide the profiler analysis (e.g., hotspot identification) to the user.
If in processing block 404 the source code is compatible with only one microarchitecture, processing block 414 may execute a profiler analysis on the compatible microarchitecture. Illustrated processing block 416 may provide the profiler analysis (e.g., hotspot identification) to the user.
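The overall flow of blocks 404 through 416 may be sketched in C# as follows, where every identifier is an illustrative assumption and the permission check and per-architecture profiling are supplied as delegates:

using System;
using System.Collections.Generic;

public static class ProfilerOrchestrator
{
    // Returns per-architecture latency tables keyed by source line number.
    public static Dictionary<string, Dictionary<int, double>> ProfileAcrossArchitectures(
        string sourceCode,
        IReadOnlyList<string> compatibleArchitectures,           // block 404
        Func<string, bool> userGrantsPermission,                 // block 408
        Func<string, string, Dictionary<int, double>> profileOn) // blocks 410/414
    {
        var results = new Dictionary<string, Dictionary<int, double>>();
        if (compatibleArchitectures.Count > 1)
        {
            // Blocks 406-410: profile on each compatible, permitted node.
            foreach (var arch in compatibleArchitectures)
                if (userGrantsPermission(arch)) // FaaS billing is granular, so consult the user
                    results[arch] = profileOn(sourceCode, arch);
        }
        else if (compatibleArchitectures.Count == 1)
        {
            // Block 414: only one compatible microarchitecture.
            results[compatibleArchitectures[0]] = profileOn(sourceCode, compatibleArchitectures[0]);
        }
        return results; // blocks 412/416: provide the analysis to the user
    }
}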
In the method 500, various processing blocks may be executed by a user device and a cloud (e.g., a FaaS architecture). In
In illustrated processing block 502, a user device may trigger a profiling event for an application. Illustrated processing block 504 may be executed by the cloud, and detect the profiling event. Illustrated processing block 506 may trigger functions in response to the detected profiling event. The functions may be profile functions, as described below with respect to illustrated processing block 508.
Processing block 508 may profile the application (e.g., the source code of the application) with functions that invoke performance tools on different microarchitectures. For example, the microarchitectures may be different generations of processors and/or types of processors. Illustrated processing block 510 may use symbolic resolution to obtain a source-to-assembly mapping. For example, the source code may be related to the results (latency measurements) of the performance tools through the source-to-assembly mapping.
Processing block 512 may display performance tuning tips to a programmer based on the source-to-assembly mapping of the results. Thus, the programmer may identify hotspots in the application and adjust the source code.
Turning now to
The illustrated system 158 also includes a graphics processor 168 (e.g., graphics processing unit/GPU) and an input output (IO) module 166 implemented together with the processor 160 (e.g., as microcontrollers) on a semiconductor die 170 as a system on chip (SOC), where the IO module 166 may communicate with, for example, a display 172 (e.g., touch screen, liquid crystal display/LCD, light emitting diode/LED display), an input peripheral 156 (e.g., mouse, keyboard, microphone), a network controller 174 (e.g., wired and/or wireless), and mass storage 176 (e.g., HDD, optical disc, SSD, flash memory or other NVM).
A user may provide a source code to the computing system through the network controller 174. In some embodiments, the source code may be provided to the SOC 170 through the input peripheral 156. The SOC 170 may implement instructions stored on, for example, the NVM 176 and/or system memory 164. For example, the host processor 160 may implement instructions stored on the system memory 164 to profile the source code in response to a detected trigger. For example, if the source code is saved, the host processor 160 may automatically profile the source code to determine profiling results, such as hotspots of the source code, and link the hotspots to the source code. In some embodiments, the host processor 160 may include two different cores P0, P1. The cores P0, P1 may be heterogeneous (e.g., different generations or types of cores) from each other. The host processor 160 may separately profile the source code on each of the cores P0, P1 to determine hotspots for each respective one of the cores P0, P1. For example, the host processor 160 may profile the source code on core P0 to identify latencies, and separately profile the source code on core P1 to identify latencies. Thus, the host processor 160 may identify whether hotspots are generated by core P0 and/or core P1, and may further average the latencies.
In some embodiments, the host processor 160 may send an instruction through the network controller 174 to a second compute node (e.g., another computing system) to profile the source code. The second compute node may include a SOC that is similar to the SOC 170 described above, and a description thereof is omitted for brevity. The SOC 170 may receive a result of the profiling from the second compute node and combine the profiling results to obtain data across different microarchitectures and operating systems. The profiling results may be displayed on the display 172 and/or transmitted to the user through the network controller 174.
The processor core 200 is shown including execution logic 250 having a set of execution units 255-1 through 255-N. Some embodiments may include a number of execution units dedicated to specific functions or sets of functions. Other embodiments may include only one execution unit or one execution unit that can perform a particular function. The illustrated execution logic 250 performs the operations specified by code instructions.
After completion of execution of the operations specified by the code instructions, back end logic 260 retires the instructions of the code 213. In one embodiment, the processor core 200 allows out of order execution but requires in order retirement of instructions. Retirement logic 265 may take a variety of forms as known to those of skill in the art (e.g., re-order buffers or the like). In this manner, the processor core 200 is transformed during execution of the code 213, at least in terms of the output generated by the decoder, the hardware registers and tables utilized by the register renaming logic 225, and any registers (not shown) modified by the execution logic 250.
Although not illustrated in
Referring now to
The system 1000 is illustrated as a point-to-point interconnect system, wherein the first processing element 1070 and the second processing element 1080 are coupled via a point-to-point interconnect 1050. It should be understood that any or all of the interconnects illustrated in
As shown in
Each processing element 1070, 1080 may include at least one shared cache 1896a, 1896b. The shared cache 1896a, 1896b may store data (e.g., instructions) that are utilized by one or more components of the processor, such as the cores 1074a, 1074b and 1084a, 1084b, respectively. For example, the shared cache 1896a, 1896b may locally cache data stored in a memory 1032, 1034 for faster access by components of the processor. In one or more embodiments, the shared cache 1896a, 1896b may include one or more mid-level caches, such as level 2 (L2), level 3 (L3), level 4 (L4), or other levels of cache, a last level cache (LLC), and/or combinations thereof.
While shown with only two processing elements 1070, 1080, it is to be understood that the scope of the embodiments is not so limited. In other embodiments, one or more additional processing elements may be present in a given processor. Alternatively, one or more of the processing elements 1070, 1080 may be an element other than a processor, such as an accelerator or a field programmable gate array. For example, additional processing element(s) may include additional processor(s) that are the same as a first processor 1070, additional processor(s) that are heterogeneous or asymmetric to the first processor 1070, accelerators (such as, e.g., graphics accelerators or digital signal processing (DSP) units), field programmable gate arrays, or any other processing element. There can be a variety of differences between the processing elements 1070, 1080 in terms of a spectrum of metrics of merit including architectural, microarchitectural, thermal, power consumption characteristics, and the like. These differences may effectively manifest themselves as asymmetry and heterogeneity amongst the processing elements 1070, 1080. For at least one embodiment, the various processing elements 1070, 1080 may reside in the same die package.
The first processing element 1070 may further include memory controller logic (MC) 1072 and point-to-point (P-P) interfaces 1076 and 1078. Similarly, the second processing element 1080 may include a MC 1082 and P-P interfaces 1086 and 1088. As shown in
The first processing element 1070 and the second processing element 1080 may be coupled to an I/O subsystem 1090 via P-P interconnects 1076, 1086, respectively. As shown in
In turn, I/O subsystem 1090 may be coupled to a first bus 1016 via an interface 1096. In one embodiment, the first bus 1016 may be a Peripheral Component Interconnect (PCI) bus, or a bus such as a PCI Express bus or another third generation I/O interconnect bus, although the scope of the embodiments is not so limited.
As shown in
Note that other embodiments are contemplated. For example, instead of the point-to-point architecture of
Example 1 may include one or more compute nodes including a network controller, a first microarchitecture, one or more host processors, and one or more memories coupled to the one or more host processors, the one or more memories including executable program instructions, which when executed by the one or more host processors, cause the one or more compute nodes to profile a first low-level language code to identify a first latency of a first portion of the first low-level language code that is to execute on the first microarchitecture, map the first portion to a source portion of a source code based on an identification that the first portion is a low-level language code representation of the source portion, wherein the source code is a high-level language code, and associate the first latency with the source portion based on the first portion being mapped to the source portion.
Example 2 may include the one or more compute nodes of example 1, wherein the one or more compute nodes includes a second microarchitecture different from the first microarchitecture, and the executable program instructions, when executed by the one or more host processors, cause the one or more compute nodes to generate a second low-level language code, wherein the second low-level language code is configured to execute on the second microarchitecture and is a low-level language code representation of the source code.
Example 3 may include the one or more compute nodes of example 2, wherein the executable program instructions, when executed by the one or more host processors, cause the one or more compute nodes to profile the second low-level language code to identify a second latency of a second portion of the second low-level language code, map the second portion to the source portion based on an identification that the second portion is a low-level language code representation of the source portion, and associate the second latency with the source portion based on the second portion being mapped to the source portion.
Example 4 may include the one or more compute nodes of example 3, wherein the executable program instructions, when executed by the one or more host processors, cause the one or more compute nodes to execute the first low-level language code on the first microarchitecture, time the execution of the first low-level language code on the first microarchitecture to identify the first latency, execute the second low-level language code on the second microarchitecture, and time the execution of the second low-level language code on the second microarchitecture to identify the second latency.
Example 5 may include the one or more compute nodes of example 3, wherein the executable program instructions, when executed by the one or more host processors, cause the one or more compute nodes to average the first latency and the second latency to generate an average latency.
Example 6 may include the one or more compute nodes of example 5, wherein the executable program instructions, when executed by the one or more host processors, cause the one or more compute nodes to instruct, with the network controller, a user device to display one or more of the first latency, the second latency or the average latency, instruct, with the network controller, the user device to display the source portion, and instruct, with the network controller, the user device to display a graphical link that indicates an association between the displayed source portion and the displayed one or more of the first latency, the second latency or the average latency.
Example 7 may include a semiconductor apparatus including one or more substrates, and logic coupled to the one or more substrates, wherein the logic is implemented in one or more of configurable logic or fixed-functionality logic hardware, the logic coupled to the one or more substrates to profile a first low-level language code to identify a first latency of a first portion of the first low-level language code, map the first portion to a source portion of a source code based on an identification that the first portion is a low-level language code representation of the source portion, wherein the source code is a high-level language code, and associate the first latency with the source portion based on the first portion being mapped to the source portion.
Example 8 may include the apparatus of example 7, wherein the first low-level language code is configured to execute on a first microarchitecture, and the logic coupled to the one or more substrates is to generate a second low-level language code, wherein the second low-level language code is configured to execute on a second microarchitecture different from the first microarchitecture and is a low-level language code representation of the source code.
Example 9 may include the apparatus of example 8, wherein the logic coupled to the one or more substrates is to profile the second low-level language code to identify a second latency of a second portion of the second low-level language code, map the second portion to the source portion based on an identification that the second portion is a low-level language code representation of the source portion, and associate the second latency with the source portion based on the second portion being mapped to the source portion.
Example 10 may include the apparatus of example 9, wherein the logic coupled to the one or more substrates is to execute the first low-level language code on the first microarchitecture, time the execution of the first low-level language code on the first microarchitecture to identify the first latency, execute the second low-level language code on the second microarchitecture, and time the execution of the second low-level language code on the second microarchitecture to identify the second latency.
Example 11 may include the apparatus of example 9, wherein the logic is to average the first latency and the second latency to generate an average latency.
Example 12 may include the apparatus of example 11, wherein the logic is to instruct a user device to display one or more of the first latency, the second latency or the average latency, instruct the user device to display the source portion, and instruct the user device to display a graphical link indicating an association between the displayed source portion and the displayed one or more of the first latency, the second latency or the average latency.
Example 13 may include the apparatus of example 7, wherein the logic coupled to the one or more substrates includes transistor channel regions that are positioned within the one or more substrates.
Example 14 may include at least one computer readable storage medium including a set of instructions, which when executed by one or more compute nodes, cause the one or more compute nodes to profile a first low-level language code to identify a first latency of a first portion of the first low-level language code, map the first portion to a source portion of a source code based on an identification that the first portion is a low-level language code representation of the source portion, wherein the source code is a high-level language code, and associate the first latency with the source portion based on the first portion being mapped to the source portion.
Example 15 may include the at least one computer readable storage medium of example 14, wherein the first low-level language code is configured to execute on a first microarchitecture, and wherein the instructions, when executed, cause the one or more compute nodes to generate a second low-level language code, wherein the second low-level language code is configured to execute on a second microarchitecture different from the first microarchitecture and is a low-level language code representation of the source code.
Example 16 may include the at least one computer readable storage medium of example 15, wherein the instructions, when executed, cause the one or more compute nodes to profile the second low-level language code to identify a second latency of a second portion of the second low-level language code, map the second portion to the source portion based on an identification that the second portion is a low-level language code representation of the source portion, and associate the second latency with the source portion based on the second portion being mapped to the source portion.
Example 17 may include the at least one computer readable storage medium of example 16, wherein the instructions, when executed, cause the one or more compute nodes to execute the first low-level language code on the first microarchitecture, time the execution of the first low-level language code on the first microarchitecture to identify the first latency, execute the second low-level language code on the second microarchitecture, and time the execution of the second low-level language code on the second microarchitecture to identify the second latency.
Example 18 may include the at least one computer readable storage medium of example 16, wherein the instructions, when executed, cause the one or more compute nodes to average the first latency and the second latency to generate an average latency.
Example 19 may include the at least one computer readable storage medium of example 18, wherein the instructions, when executed, cause the one or more compute nodes to instruct a user device to display one or more of the first latency, the second latency or the average latency, instruct the user device to display the source portion, and instruct the user device to display a graphical link indicating an association between the displayed source portion and the displayed one or more of the first latency, the second latency or the average latency.
Example 20 may include a method including profiling a first low-level language code to identify a first latency of a first portion of the first low-level language code, mapping the first portion to a source portion of a source code based on an identification that the first portion is a low-level language code representation of the source portion, wherein the source code is a high-level language code, and associating the first latency with the source portion based on the mapping.
Example 21 may include the method of example 20, wherein the first low-level language code is configured to execute on a first microarchitecture, and the method further includes generating a second low-level language code, wherein the second low-level language code is configured to execute on a second microarchitecture different from the first microarchitecture and is a low-level language code representation of the source code.
Example 22 may include the method of example 21, further including profiling the second low-level language code to identify a second latency of a second portion of the second low-level language code, mapping the second portion to the source portion based on an identification that the second portion is a low-level language code representation of the source portion, and associating the second latency with the source portion based on the second portion being mapped to the source portion.
Example 23 may include the method of example 22, wherein profiling the first low-level language code includes executing the first low-level language code on the first microarchitecture, and timing the execution of the first low-level language code on the first microarchitecture to identify the first latency, and profiling the second low-level language code includes executing the second low-level language code on the second microarchitecture, and timing the execution of the second low-level language code on the second microarchitecture to identify the second latency.
Example 24 may include the method of example 22, further including averaging the first latency and the second latency to generate an average latency.
Example 25 may include the method of example 24, further including instructing a user device to display one or more of the first latency, the second latency or the average latency, instructing the user device to display the source portion, and instructing the user device to display a graphical link indicating an association between the displayed source portion and the displayed one or more of the first latency, the second latency or the average latency.
Example 26 may include a semiconductor apparatus including means for profiling a first low-level language code to identify a first latency of a first portion of the first low-level language code, means for mapping the first portion to a source portion of a source code based on an identification that the first portion is a low-level language code representation of the source portion, wherein the source code is a high-level language code, and means for associating the first latency with the source portion based on the mapping.
Example 27 may include the semiconductor apparatus of example 26, wherein the first low-level language code is to be configured to execute on a first microarchitecture, and the semiconductor apparatus further includes means for generating a second low-level language code, wherein the second low-level language code is to be configured to execute on a second microarchitecture different from the first microarchitecture and is a low-level language code representation of the source code.
Example 28 may include the semiconductor apparatus of example 27, further including means for profiling the second low-level language code to identify a second latency of a second portion of the second low-level language code, means for mapping the second portion to the source portion based on an identification that the second portion is a low-level language code representation of the source portion, and means for associating the second latency with the source portion based on the second portion being mapped to the source portion.
Example 29 may include the semiconductor apparatus of example 28, wherein the means for profiling the first low-level language code includes means for executing the first low-level language code on the first microarchitecture, and means for timing the execution of the first low-level language code on the first microarchitecture to identify the first latency, and the means for profiling the second low-level language code includes means for executing the second low-level language code on the second microarchitecture, and means for timing the execution of the second low-level language code on the second microarchitecture to identify the second latency.
Example 30 may include the semiconductor apparatus of example 28, further including means for averaging the first latency and the second latency to generate an average latency.
Example 31 may include the semiconductor apparatus of example 30, further including means for instructing a user device to display one or more of the first latency, the second latency or the average latency, means for instructing the user device to display the source portion, and means for instructing the user device to display a graphical link indicating an association between the displayed source portion and the displayed one or more of the first latency, the second latency or the average latency.
Thus, technology described herein may support source code analysis that previously was not enabled. For example, the technology may allow for enhanced mapping of source code to assembly code to identify hotspots of the source code. Moreover, the technology may allow the source code to be analyzed across a series of different platforms with different microarchitectures to generate a more accurate latency analysis on a global implementation of the source code.
Embodiments are applicable for use with all types of semiconductor integrated circuit (“IC”) chips. Examples of these IC chips include but are not limited to processors, controllers, chipset components, programmable logic arrays (PLAs), memory chips, network chips, systems on chip (SOCs), SSD/NAND controller ASICs, and the like. In addition, in some of the drawings, signal conductor lines are represented with lines. Some may be different, to indicate more constituent signal paths, have a number label, to indicate a number of constituent signal paths, and/or have arrows at one or more ends, to indicate primary information flow direction. This, however, should not be construed in a limiting manner. Rather, such added detail may be used in connection with one or more exemplary embodiments to facilitate easier understanding of a circuit. Any represented signal lines, whether or not having additional information, may actually comprise one or more signals that may travel in multiple directions and may be implemented with any suitable type of signal scheme, e.g., digital or analog lines implemented with differential pairs, optical fiber lines, and/or single-ended lines.
Example sizes/models/values/ranges may have been given, although embodiments are not limited to the same. As manufacturing techniques (e.g., photolithography) mature over time, it is expected that devices of smaller size could be manufactured. In addition, well known power/ground connections to IC chips and other components may or may not be shown within the figures, for simplicity of illustration and discussion, and so as not to obscure certain aspects of the embodiments. Further, arrangements may be shown in block diagram form in order to avoid obscuring embodiments, and also in view of the fact that specifics with respect to implementation of such block diagram arrangements are highly dependent upon the computing system within which the embodiment is to be implemented, i.e., such specifics should be well within purview of one skilled in the art. Where specific details (e.g., circuits) are set forth in order to describe example embodiments, it should be apparent to one skilled in the art that embodiments can be practiced without, or with variation of, these specific details. The description is thus to be regarded as illustrative instead of limiting.
The term “coupled” may be used herein to refer to any type of relationship, direct or indirect, between the components in question, and may apply to electrical, mechanical, fluid, optical, electromagnetic, electromechanical or other connections. In addition, the terms “first”, “second”, etc. may be used herein only to facilitate discussion, and carry no particular temporal or chronological significance unless otherwise indicated.
As used in this application and in the claims, a list of items joined by the term “one or more of” may mean any combination of the listed terms. For example, the phrases “one or more of A, B or C” may mean A; B; C; A and B; A and C; B and C; or A, B and C.
Those skilled in the art will appreciate from the foregoing description that the broad techniques of the embodiments can be implemented in a variety of forms. Therefore, while the embodiments have been described in connection with particular examples thereof, the true scope of the embodiments should not be so limited since other modifications will become apparent to the skilled practitioner upon a study of the drawings, specification, and following claims.