Many computing devices, including personal computers, servers, and portable computing devices (e.g., mobile phones, tablet computers, portable game consoles, navigation devices, wearable devices, and other battery-powered devices) include a system on chip (SoC). A design approach typically used in developing these types of devices is to develop integrated circuits to support various functions. The desired operational performance, usability, and market success of a computing device may be directly determined by the software that is developed to run on the programmable subsystems of the computing device.
System trace and debugging systems have been developed to expose various characteristics of the operation of the system to the software and hardware development teams. Such embedded trace systems include a number of SoC peripherals such as cells or circuit modules for processing and buffering trace data. Some of these SoC peripherals insert a timestamp into a trace data stream. These conventional time stamping techniques have included the insertion of a timestamp in a native trace packet layer that corresponds to the trace protocol being used by the trace source. A native trace packet layer includes a set of packet-based protocols for tracing the operation of various hardware cores. Each packet-based trace protocol is able to differentiate between trace sources, recognize instructions, arguments, timestamps, and other performance monitoring data.
Many server and higher-end mobile SoCs have a strong need for low-cost, cycle-accurate, logic-analyzer-style cross triggering. “Logic analyzer style” triggering refers to triggering capabilities that enable logic analyzers to observe real-time behavior of a number of SoC pins connected to the logic analyzer. Many existing SoCs have limited capabilities to bring all of the tens to hundreds of thousands of internal signals of interest to primary pins for external logic analyzer observation. Trace capture memory typically exists at one or more points in the SoC. The logic analyzer trigger functionality may comprise a hardware block within the SoC that is connected to a number of internal signals that are expected to be useful for generating relevant triggering scenarios, which may be used for trace capture or other system behavior (e.g., causing an interrupt to a system CPU).
An existing method for performing cycle-accurate triggering distributed across the SoC involves including a number of triggering hardware blocks (e.g., trigger generation units (TGUs)) at each point where there are bundles of signals of interest to triggering scenarios. This distributed approach has significant disadvantages. For example, the triggering hardware blocks can be area-intensive because each must provide storage for a mini triggering “program”.
Accordingly, there is a need for improved systems and methods for distributing and replaying trigger packets via a variable latency bus interconnect.
Systems and methods are disclosed for distributing and replaying trigger packets via a variable latency bus interconnect in a trace system. An embodiment of such a method comprises generating a plurality of trigger packets from a plurality of trigger sources on a system on chip. Each trigger packet defines a corresponding event and a corresponding system-generated timestamp. The plurality of trigger packets are distributed from the corresponding trigger sources to a centralized logic analyzer via a variable latency bus interconnect. The received trigger packets are re-ordered according to the corresponding system-generated timestamps into an order in which the corresponding events occurred. The received trigger packets are replayed in the order in which the corresponding events occurred.
An embodiment of a system comprises a plurality of trigger sources, a variable latency bus interconnect, and a centralized logic analyzer. The plurality of trigger sources generate a plurality of trigger packets, each trigger packet defining a corresponding event and a corresponding system-generated timestamp. The variable latency bus interconnect is electrically coupled to the plurality of trigger sources for distributing the plurality of trigger packets to the centralized logic analyzer. The centralized logic analyzer is configured to: receive the trigger packets via the variable latency bus interconnect; re-order the received trigger packets according to the corresponding system-generated timestamps into an order in which the corresponding events occurred; and replay the received trigger packets in the order in which the corresponding events occurred.
In the Figures, like reference numerals refer to like parts throughout the various views unless otherwise indicated. For reference numerals with letter character designations such as “102A” or “102B”, the letter character designations may differentiate two like parts or elements present in the same Figure. Letter character designations for reference numerals may be omitted when it is intended that a reference numeral encompass all parts having the same reference numeral in all Figures.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects.
In this description, the term “application” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, an “application” referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
The term “content” may also include files having executable content, such as: object code, scripts, byte code, markup language files, and patches. In addition, “content” referred to herein may also include files that are not executable in nature, such as documents that may need to be opened or other data files that need to be accessed.
As used in this description, the terms “component,” “database,” “module,” “system,” and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, software, or software in execution. For example, a component may be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of illustration, both an application running on a computing device and the computing device may be a component. One or more components may reside within a process and/or thread of execution, and a component may be localized on one computer and/or distributed between two or more computers. In addition, these components may execute from various computer readable media having various data structures stored thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the Internet with other systems by way of the signal).
In this description, the terms “communication device,” “wireless device,” “wireless telephone,” “wireless communication device,” and “wireless handset” are used interchangeably. With the advent of third generation (“3G”) and fourth generation (“4G”) wireless technology, greater bandwidth availability has enabled more portable computing devices with a greater variety of wireless capabilities. Therefore, a portable computing device may include a cellular telephone, a pager, a PDA, a smartphone, a navigation device, or a hand-held computer with a wireless connection or link.
Trigger source B 104 comprises a system cache, which may comprise a multi-level system cache. Trigger source C 106 comprises another subsystem on the SoC. The system 100 may comprise any number (N) of integrated elements or subsystems on the SoC that comprise the trigger sources.
As further illustrated in the embodiment of
The timestamp generators 112, 120, and 128 may augment or function in conjunction with the trigger collector generators 114, 122, and 130. As known in the art, the trigger collector generators 114, 122, and 130 may format trace data and generate trigger packets. As illustrated in
As further illustrated in
The re-order/replay unit 138 may provide a trigger input signal for each trace source to the logic analyzer hardware 148. For example, a trigger input 1 signal 140 may correspond to a first trace source. Trigger input 2 signal 142 may correspond to a second trace source. Trigger input 3 signal 144 may correspond to a third trace source. Trigger input N signal 146 may correspond to an Nth trace source. The logic analyzer hardware 148 receives the trigger input signals 140, 142, 144, and 146 and, in response, provides a capture trigger signal 152 to the trace capture block 150.
An exemplary embodiment of the operation and structure of the re-order/replay unit 138 is illustrated in
As illustrated in
To re-order the trigger packets based on the system-generated timestamps, the re-order/replay unit 138 may generate a trigger packet window timer 210 for each received trigger packet. Each trigger packet window timer 210 may have a duration equal to a maximum difference in source-to-receive latency of the trigger sources in the system 100. In the embodiment of
As described below in more detail, when a trigger packet is received, the re-order/replay unit 138 generates a trigger packet window timer 210 with a value equal to the number of clock cycles representing the maximum latency differential. When a trigger packet window timer 210 expires (i.e., the maximum number of clock cycles elapses), the re-order/replay unit 138 searches all other active trigger packet window timers 210. The timestamp values (TSVALactive) for each active trigger packet window timer 210 may be determined and compared to the timestamp values for the expired trigger packet window (TSVALexpired). If TSVALactive is less than or equal to TSVALexpired, the timestamp values are committed to a first-in-first-out (FIFO) structure with cycle information from the last replayed trigger packet. The re-order/replay unit 138 may empty the FIFO structure as follows. If a FIFO word is available and if the word is cycle info from the last replayed trigger packet, the re-order/replay unit 138 counts that many cycles and then proceeds to read the next entry in the FIFO structure if one is available.
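The window-timer re-ordering described above can be sketched in software as follows. This is a minimal behavioral model, not the hardware implementation: the names `TriggerPacket`, `reorder`, and `with_cycle_gaps` are hypothetical, and breaking ties between equal timestamps by source label is an assumption the text does not specify. Because every window timer has the same duration, timers expire in arrival order, so the model walks the packets in arrival order and, on each expiry, commits every still-active packet whose timestamp is less than or equal to that of the expired packet.

```python
from typing import List, NamedTuple, Tuple


class TriggerPacket(NamedTuple):
    timestamp: int  # system-generated timestamp, in clock cycles
    source: str     # hypothetical label for the trigger source
    arrival: int    # cycle at which the re-order/replay unit receives the packet


def reorder(packets: List[TriggerPacket], window: int) -> List[TriggerPacket]:
    """Commit packets in timestamp order using equal-length window timers."""
    by_arrival = sorted(packets, key=lambda p: p.arrival)
    committed: List[TriggerPacket] = []
    done = set()
    for i, expired in enumerate(by_arrival):
        if i in done:
            continue  # already committed when an earlier timer expired
        expiry = expired.arrival + window
        # Active packets (received by the expiry cycle) with TSVAL <= TSVAL_expired.
        batch = [
            (j, p)
            for j, p in enumerate(by_arrival)
            if j not in done
            and p.arrival <= expiry
            and p.timestamp <= expired.timestamp
        ]
        # Assumption: equal timestamps are ordered by source label.
        batch.sort(key=lambda jp: (jp[1].timestamp, jp[1].source))
        for j, p in batch:
            done.add(j)
            committed.append(p)
    return committed


def with_cycle_gaps(ordered: List[TriggerPacket]) -> List[Tuple[int, TriggerPacket]]:
    """Pair each packet with the cycles elapsed since the previous packet."""
    out = []
    prev = None
    for p in ordered:
        gap = 0 if prev is None else p.timestamp - prev
        out.append((gap, p))
        prev = p.timestamp
    return out
```

The gap values produced by `with_cycle_gaps` stand in for the cycle information that accompanies each entry in the FIFO structure, so a replay stage can reproduce the original spacing between events.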
In the embodiment, the FIFO may be implemented via a FIFO buffer comprising N entries, with each entry comprising a predetermined number of bits (e.g., M bits), yielding a total number of bits (e.g., N×M bits). The FIFO buffer may present, for example, a FULL flag out of a write interface to the re-order/replay unit 138. The FIFO buffer may present an EMPTY flag out of a read interface to logic replaying into logic analyzer hardware (e.g., trigger generation unit (TGU)). When the FIFO indicates not empty (e.g., EMPTY flag=0), this means one or more valid M-bit words of data are present in the FIFO. The replay unit may read a word when one is available (again via the EMPTY flag indication) and when it is finished replaying the last trigger to the logic analyzer hardware.
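As a rough software analogue of the FIFO buffer just described, the sketch below models N entries of M bits each with FULL and EMPTY flags. The class name `TriggerFifo` and its method names are hypothetical, and truncating over-wide words to M bits is an assumption made only to keep the model simple.

```python
from collections import deque


class TriggerFifo:
    """Behavioral model of an N-entry, M-bit FIFO with FULL/EMPTY flags."""

    def __init__(self, depth: int, width_bits: int) -> None:
        self.depth = depth                   # N entries
        self.mask = (1 << width_bits) - 1    # keeps each word to M bits
        self._words = deque()

    @property
    def full(self) -> bool:
        # FULL flag, as seen on the write interface.
        return len(self._words) >= self.depth

    @property
    def empty(self) -> bool:
        # EMPTY flag, as seen on the read interface.
        return not self._words

    def write(self, word: int) -> bool:
        """Store one word; return False (no write) when FULL is asserted."""
        if self.full:
            return False
        self._words.append(word & self.mask)
        return True

    def read(self) -> int:
        """Return the oldest word; only valid when EMPTY is deasserted."""
        if self.empty:
            raise IndexError("read from empty FIFO")
        return self._words.popleft()
```

A replay loop would poll `empty` and call `read()` only after it has finished replaying the previous trigger, mirroring the read condition stated above.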
It should be appreciated that incoming trigger packets 118, 126, and 134 may comprise timestamp information and trigger number information. For example, consider an exemplary embodiment in which there are 128 distinct triggers scattered throughout the system 100. The centralized logic analyzer 108 may comprise 128 input trigger ports. The trigger number information may be encoded using, for example, a binary encoding of 7 bits. The 7 bits of trigger number information may be stripped out of the corresponding trigger packet and written into the FIFO as part of each entry's M bits. This information may be read out of the FIFO by the re-order/replay unit 138 to determine which of the 128 trigger inputs to drive a pulse on. It should be appreciated that triggers may be levels or pulses, and the trigger packets may carry that information.
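To illustrate the 7-bit trigger number encoding, the following sketch packs a trigger number and a cycle count into one FIFO word and decodes it back to a single trigger input. The entry width of 16 bits and the field layout (trigger number in the low 7 bits, cycle count in the remaining bits) are assumptions chosen for illustration; the text fixes only the 7-bit trigger number. The function names are hypothetical.

```python
TRIGGER_BITS = 7                          # 2**7 = 128 distinct triggers
TRIGGER_MASK = (1 << TRIGGER_BITS) - 1
ENTRY_BITS = 16                           # hypothetical value of M


def pack_entry(trigger_number: int, cycle_gap: int) -> int:
    """Build one FIFO word: cycle gap in the upper bits, trigger number below."""
    max_gap = (1 << (ENTRY_BITS - TRIGGER_BITS)) - 1
    if not 0 <= trigger_number <= TRIGGER_MASK:
        raise ValueError("trigger number does not fit in 7 bits")
    if not 0 <= cycle_gap <= max_gap:
        raise ValueError("cycle gap does not fit in the entry")
    return (cycle_gap << TRIGGER_BITS) | trigger_number


def unpack_entry(word: int) -> tuple:
    """Return (trigger_number, cycle_gap) from a FIFO word."""
    return word & TRIGGER_MASK, word >> TRIGGER_BITS


def one_hot(trigger_number: int) -> int:
    """Bit vector with only the selected one of the 128 trigger inputs set."""
    return 1 << trigger_number
```

For example, `unpack_entry(pack_entry(5, 3))` yields `(5, 3)`, and `one_hot(5)` selects trigger input 5 out of the 128.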
Timeline 308 illustrates that the trigger packets T0-A, T1-A, and T2-A are delayed by 9 clock cycles through the variable latency bus interconnect 110. The re-order/replay unit 138 receives the trigger packets T0-A, T1-A, and T2-A at clock cycles T9, T10, and T11, respectively. In response to receiving trigger packet T0-A, a corresponding 8-cycle window timer (T0-A) is generated, which expires at the end of clock cycle T17. In response to receiving trigger packet T1-A, a corresponding 8-cycle window timer (T1-A) is generated, which expires at the end of clock cycle T18. In response to receiving trigger packet T2-A, a corresponding 8-cycle window timer (T2-A) is generated, which expires at the end of clock cycle T19.
Timeline 310 illustrates that the trigger packets T7-B, T8-B, T9-B, and T10-B are delayed by 1 clock cycle through the variable latency bus interconnect 110. The re-order/replay unit 138 receives the trigger packets T7-B, T8-B, T9-B, and T10-B at clock cycles T8, T9, T10, and T11, respectively. In response to receiving trigger packet T7-B, a corresponding 8-cycle window timer (T7-B) is generated, which expires at the end of clock cycle T16. In response to receiving trigger packet T8-B, a corresponding 8-cycle window timer (T8-B) is generated, which expires at the end of clock cycle T17. In response to receiving trigger packet T9-B, a corresponding 8-cycle window timer (T9-B) is generated, which expires at the end of clock cycle T18. In response to receiving trigger packet T10-B, a corresponding 8-cycle window timer (T10-B) is generated, which expires at the end of clock cycle T19.
Timeline 312 illustrates that the trigger packets T2-C, T7-C, T8-C, and T12-C are delayed by 4 clock cycles through the variable latency bus interconnect 110. The re-order/replay unit 138 receives the trigger packets T2-C, T7-C, T8-C, and T12-C at clock cycles T6, T11, T12, and T16, respectively. In response to receiving trigger packet T2-C, a corresponding 8-cycle window timer (T2-C) is generated, which expires at the end of clock cycle T14. In response to receiving trigger packet T7-C, a corresponding 8-cycle window timer (T7-C) is generated, which expires at the end of clock cycle T19. In response to receiving trigger packet T8-C, a corresponding 8-cycle window timer (T8-C) is generated, which expires at the end of clock cycle T20. In response to receiving trigger packet T12-C, a corresponding 8-cycle window timer (T12-C) is generated, which expires at the end of clock cycle T24.
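The arithmetic in timelines 308, 310, and 312 can be checked with a short script. The latencies (9, 1, and 4 cycles) and the packet timestamps are taken from the example; the dictionary and function names are hypothetical. The 8-cycle window is consistent with the maximum latency differential of the three sources, 9 − 1 = 8.

```python
# Per-source bus latency and send timestamps, taken from the example timelines.
LATENCY = {"A": 9, "B": 1, "C": 4}
SENT = {"A": [0, 1, 2], "B": [7, 8, 9, 10], "C": [2, 7, 8, 12]}


def window_duration(latency: dict) -> int:
    """Maximum difference in source-to-receive latency across sources."""
    return max(latency.values()) - min(latency.values())


def schedule(sent: dict, latency: dict) -> list:
    """Return (packet name, arrival cycle, window expiry cycle) per packet."""
    window = window_duration(latency)
    rows = []
    for src, stamps in sent.items():
        for ts in stamps:
            arrival = ts + latency[src]
            rows.append((f"T{ts}-{src}", arrival, arrival + window))
    return rows


if __name__ == "__main__":
    for name, arrival, expiry in schedule(SENT, LATENCY):
        print(f"{name}: received at T{arrival}, window expires at end of T{expiry}")
```

Each printed row matches the receive and expiry cycles stated in the three timelines.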
Following the example in
The window slot machine 508 may be connected to a window expired decision module 526 for determining when the window timers 518 for active slots 510 have expired. The window expired decision module 526 may be connected to the slot machine manager 506. The window expired decision module 526 may provide to the slot machine manager 506 an index identifying slot(s) 510 having expired window timer(s) 518.
As mentioned above, when a window timer 518 expires, the re-order/replay unit 138 may determine all other active trigger packet window timers 518. The timestamp values (TSVALactive) for each active trigger packet window timer 518 may be determined and compared to the timestamp values for the expired trigger packet window (TSVALexpired). If TSVALactive is less than or equal to TSVALexpired, the timestamp values are committed to a first-in-first-out (FIFO) structure 530 with cycle information from the last replayed trigger packet. As illustrated in the embodiment of
As further illustrated in
As mentioned above, the system 100 may be incorporated into any desirable computing system.
A display controller 628 and a touch screen controller 630 may be coupled to the CPU 602. In turn, the touch screen display 625 external to the on-chip system 601 may be coupled to the display controller 628 and the touch screen controller 630.
Further, as shown in
As further illustrated in
It should be appreciated that one or more of the method steps described herein may be stored in the memory as computer program instructions, such as the modules described above. These instructions may be executed by any suitable processor in combination or in concert with the corresponding module to perform the methods described herein.
Certain steps in the processes or process flows described in this specification naturally precede others for the invention to function as described. However, the invention is not limited to the order of the steps described if such order or sequence does not alter the functionality of the invention. That is, it is recognized that some steps may be performed before, after, or in parallel with (substantially simultaneously with) other steps without departing from the scope and spirit of the invention. In some instances, certain steps may be omitted or not performed without departing from the invention. Further, words such as “thereafter”, “then”, “next”, etc. are not intended to limit the order of the steps. These words are simply used to guide the reader through the description of the exemplary method.
Additionally, one of ordinary skill in programming is able to write computer code or identify appropriate hardware and/or circuits to implement the disclosed invention without difficulty based on the flow charts and associated description in this specification, for example.
Therefore, disclosure of a particular set of program code instructions or detailed hardware devices is not considered necessary for an adequate understanding of how to make and use the invention. The inventive functionality of the claimed computer implemented processes is explained in more detail in the above description and in conjunction with the Figures which may illustrate various process flows.
In one or more exemplary aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that may be accessed by a computer. By way of example, and not limitation, such computer-readable media may comprise RAM, ROM, EEPROM, NAND flash, NOR flash, M-RAM, P-RAM, R-RAM, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to carry or store desired program code in the form of instructions or data structures and that may be accessed by a computer.
Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (“DSL”), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium.
Disk and disc, as used herein, include compact disc (“CD”), laser disc, optical disc, digital versatile disc (“DVD”), floppy disk, and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Alternative embodiments will become apparent to one of ordinary skill in the art to which the invention pertains without departing from its spirit and scope. Therefore, although selected aspects have been illustrated and described in detail, it will be understood that various substitutions and alterations may be made therein without departing from the spirit and scope of the present invention, as defined by the following claims.