In some systems, such as Advanced Driver Assistance Systems (ADAS), multiple high bandwidth sensors may be used for computer vision techniques. These sensors may include, for example, cameras, radars, light detection and ranging (LIDAR) systems, and the like. Data collected from these sensors may be used, for example, for perception of surrounding areas, automated driving assistance, etc. The data may be processed through dedicated hardware accelerators (“HWAs”).
At times, it may be desirable to capture live data as it is transmitted within a system, thereby “tracing” the data through the various capture and processing stages. By capturing such trace data, a designer can test new processing techniques or replay data through the system. In addition, trace information may be desirable during playback, for example, to mimic actual sensor data. Capture and playback of sensor data may be useful, for example, to improve automated driving algorithms.
This disclosure relates to a system, e.g., a system-on-chip (SoC), that includes one or more processing resources, e.g., HWAs, an on-chip memory, and interfaces. The one or more HWAs are configured to receive sensor data from one or more sensors. The one or more HWAs are configured to transmit the sensor data without trace information to external memory via a first interface and to the on-chip memory with associated trace information added. The data with associated trace information is then transmitted to a second interface.
An example system includes a plurality of processing resources, each including a buffer, a scheduler, and a controller. The system further includes an event bus; a system bus; a memory coupled to each of the plurality of processing resources via the system bus; a first interface coupled to the controller of each of the plurality of processing resources; a direct memory access (DMA) controller coupled to the event bus; and a second interface coupled to the DMA controller. With this arrangement, the controller of each processing resource transmits data from the buffer of that processing resource to the first interface for storage. In the example in which the system is a system-on-chip (SoC) the storage may be an off-chip memory. The controller of each processing resource also transmits that data with associated trace information to the memory, which is on-chip in the SoC example. The controller may use different channels for these two transmissions. The DMA controller then transmits the data with the associated trace information from the on-chip memory to the second interface.
In another example, a method includes transmitting, by a first direct memory access (DMA) controller, data from a buffer of a processing resource to a first interface, in which the data is comprised of elements; transmitting, by the first DMA controller, the elements of the data, each with associated trace information, to a memory; transmitting, by the first DMA controller, a first signal to a second DMA controller over an event bus, wherein the first signal indicates completion of the transmission of the elements of the data, each with the associated trace information, to the memory; transmitting, in response to the first signal, by the second DMA controller, the elements of the data, each with the associated trace information, to a second interface; and transmitting, by the second DMA controller, a second signal to a scheduler for the buffer of the processing resource, wherein the second signal indicates completion of the transmission of the elements of the data, each with the associated trace information, to the second interface.
For a detailed description of various examples, reference will now be made to the accompanying drawings in which:
The same reference number is used in the drawings for the same or similar (either by function and/or structure) features.
The following description is directed to a technique for unobtrusively capturing trace data on a system-on-chip (SoC) which, according to one or more embodiments, uses an existing SoC without using the trace resources of the SoC for high bandwidth data. In some systems, high bandwidth data tracing utilizes system resources in a manner which affects the performance of the system. As such, a traced system acts differently than an untraced system. An unobtrusive capture of trace data uses a unique architecture such that performance of the system is not affected. In particular, in some embodiments, an improved architecture may be used to manage trace data for the capture and processing of sensor and/or HWA data on an SoC. By unobtrusively tracing data, the performance of the SoC is not impacted by tracing functionality.
Embodiments described herein are directed to providing an event interface within an SoC encompassing various high throughput trace sources and sinks. In addition, on-chip memory may be used to store the data from the sources along with trace information. The on-chip memory may be configured to be used for trace data as well as algorithm data for use when processing the sensor data. The share of the on-chip memory for trace data and algorithm data may be dynamically modified; for example, memory allocation may be modified by software in some embodiments.
According to some embodiments, the SoC includes one or more processing resources, such as HWAs, each having an internal buffer, a scheduler, and a DMA controller. The various HWAs may be individual components configured to perform specific tasks. For example, in some embodiments, the SoC may include HWAs configured to perform image processing functions, machine learning techniques, optical flow, and the like. The HWA processes data, and the scheduler of a given HWA causes the DMA controller to transmit data from the internal buffer to the common on-chip memory within the SoC. A first DMA channel transmits the data to an external memory interface (e.g., DRAM), whereas a second DMA channel transmits the data to on-chip memory along with trace information associated with the data. A second DMA controller reads the data from the common on-chip buffer memory and sends the data out to an external interface (e.g., PCIE, CSI2) for tracing. The first and second DMA controllers use events to signal production and consumption of data at smaller sizes (finer granularity). In some embodiments, the on-chip memory allows the SoC to capture not only the transmitted sensor data but also intermediate data. Intermediate data is a processed version of sensor data. For example, an HWA may perform a Fourier transform of the sensor data to convert the time-domain data to the frequency domain for analysis. The scheduler monitors for the completion of writes of data to external memory and on-chip memory prior to removing the data from the internal buffer on the HWA, and also monitors the availability of internal memory so that new data can be added to it. The scheduler receives signals over the event bus indicating completion of the transmission events or sends signals over the event bus indicating data availability.
According to some embodiments, techniques described herein allow tracing high bandwidth masters without relying on compression and without interfering with the operation of algorithms being run that utilize the sensor data. As such, embodiments described herein allow for nonintrusive tracing of sensor data processing. In addition, the techniques described herein allow for flexibility in recording and tracking trace data. Further, the techniques described herein provide the flexibility to throttle input and/or output to manage data flow without any underflow or overflow.
The trace technique may be performed in a computing device. As illustrated in
Stored data, i.e., data stored by a storage device 115, may be accessed by SoC 105 during the execution of computer-executable instructions or process steps to instruct one or more components within the computing device 100. Storage device 115 may be partitioned or split into multiple sections that may be accessed by different software programs. For example, storage device 115 may include a section designated for specific purposes, such as storing program instructions or data for updating software of the computing device 100. In certain cases, the computing device 100 may include multiple operating systems. For example, the computing device 100 may include a general-purpose or real-time operating system which is utilized for normal operations. The computing device 100 may include a bootloader for performing specific tasks, such as upgrading and recovering the operating system, and allowing access to the computing device 100 at a level generally not available through the operating system. Both the operating system and bootloader may have access to the section of storage device 115 designated for specific purposes.
The one or more communications interfaces 120 may include a Peripheral Component Interconnect (PCI), Serial Peripheral Interface (SPI), Camera Serial Interface (CSI), or Display (DSI, DP) communications interface for interfacing with one or more other SoC or board components. In certain cases, elements coupled to the processor may be included on hardware shared with the processor. For example, the communications interface 120, storage device 115, and memory 110 may be included, along with other elements such as memory, in a single chip or package, such as an SoC.
Once captured, the trace data in storage 206 may be managed by software 208. Specifically, the software can be used to read, display, and edit the trace data, and the like. In a playback use case 210, software 208 may be used to feed in captured sensor data to storage 214 on PCB 212. The captured sensor data may then be used by system 100B as if system 100B were capturing data from sensors 130. As such, the performance of SoC 105B and DRAM 110B may be monitored to mimic the performance of SoC 105A and DRAM 110A using the same sensor data. Accordingly, a nonintrusive trace is provided which avoids using resources affecting the performance of the system which is being traced.
In some embodiments, multiple trace techniques may be employed on the same SoC. For example, a trace bus may be used for software trace and low throughput HWAs, whereas techniques described herein with respect to the on-chip memory may be used for high throughput HWAs.
The processor 302 may be coupled to a first trace architecture, as shown at embedded trace macrocell (ETM) 304. Similarly, the DSP/GPU 306 may be connected to trace macrocell 308. Other various components used for tracing data from various tracing sources may be utilized. For example, a sniffer 332 sniffs the outgoing data (intended for the system data bus 334) from HWA 330. That sniffed data is transmitted over trace bus 344 using system trace macrocell (STM) 336. The trace data and corresponding timestamp(s) across all sources on trace bus 344 are collected in embedded trace buffer (ETB) 310. Collected trace data from trace bus 344 is concurrently captured by trace port interface unit (TPIU) 312. From TPIU 312, the trace data and timestamp(s) are exported to digital output pins 314 and/or high-speed bus interface 316 (e.g., Aurora) of SoC 300.
In addition, techniques described herein provide an alternative trace architecture, which may coexist with the first trace architecture described above with respect to the low throughput masters on the SoC. This additional trace architecture includes using system data bus 334, on-chip memory 324, and event bus 322, which provides communication across various components, such as high throughput HWAs 318 and 320, DMA controller (CTRL) 348, and communication interface PCIE 328 and/or CSI2 326. In some embodiments, event bus 322 may be configured to transmit signals across various high throughput providers, such as HWA 318 and HWA 320, along with other components, such as DMA controller 348. In some embodiments, the event bus 322 may connect other components not used for managing trace data for the high throughput HWAs, such as processor 302 and DSP/GPU 306.
The SoC 300 also includes a system data bus 334 configured to transmit live data and trace information among various components, including the HWA 318, HWA 320, DMA CTRL 348, on-chip memory 324, and interfaces, such as CSI2 TX 326 and PCIE 328. The high throughput HWAs 318 and 320 may receive data from various sources, such as from sensors within the same system as the SoC 300, from other HWAs, and the like. A high throughput HWA may be one which requires significant data flow, such as on the order of gigabytes per second. According to some embodiments, the HWAs 318 and 320 may be enhanced for tracing, for example, by including a scheduler, an event interface, and a DMA engine, as will be described below with respect to
According to one or more embodiments, lines of data captured in buffer 402 are transferred to larger memory, such as external memory 412. The data may be captured, for example, from sensors, from other masters, and the like. To enable tracing, the scheduler 404 causes DMA CTRL 408 to transmit the data from the buffer 402 to an external memory interface 410 and an on-chip memory 416 across an event bus 406. In addition, DMA CTRL 408 may transmit the same particular set of data in smaller chunks (e.g., lines) along with trace header/footer information for the data to on-chip memory 416. DMA CTRL 408 may transmit and receive signals across the event bus 406 indicating the start and completion of transmission. The DMA CTRL 418 external to HWA 400 may receive the signal over the event bus 406 indicating that the first data and trace data have been transmitted to the on-chip memory. The signal may trigger DMA CTRL 418 to transmit the data with trace information from the on-chip memory 416 to an interface, such as CSI2 TX 420 and/or PCIE 421 for transmission out of the SoC.
DMA CTRL 418 transmits a signal across the event bus 406 when the transmission is complete. The scheduler 404 can track when a particular data chunk set has been transmitted to the external memory interface 410 as well as the second interface CSI2 TX 420 based on signals received over the event bus 406. In turn, the scheduler 404 can clear a particular data set from the buffer 402 when the received signals indicate that the data has been transmitted to the external memory interface 410 and to the second interface CSI2 TX 420.
In some embodiments, if the rate at which the HWA generates data differs from the rate at which that data is consumed for transmission to the external memory and to the communication interface, then flow control (speeding up or slowing down at the granularity of a smaller chunk, e.g., a line) is required. This requires special handling of the events within the HWA.
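The line-level flow control described above can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the disclosed hardware: the bounded FIFO stands in for the trace region of on-chip memory, and the names `LineFifo`, `simulate`, `produce_every`, and `consume_every` are hypothetical. The producer is stalled whenever the FIFO is full and the consumer waits whenever it is empty, so neither overflow nor underflow occurs.

```python
from collections import deque


class LineFifo:
    """Bounded FIFO standing in for the trace region of on-chip memory."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines = deque()

    def can_produce(self) -> bool:
        return len(self.lines) < self.capacity

    def can_consume(self) -> bool:
        return len(self.lines) > 0


def simulate(total_lines: int, capacity: int,
             produce_every: int, consume_every: int):
    """Step a producer and a consumer that run at different rates.

    Returns (lines consumed in order, producer stall count, peak occupancy).
    """
    fifo = LineFifo(capacity)
    produced, consumed, stalls, peak = 0, [], 0, 0
    tick = 0
    while len(consumed) < total_lines:
        tick += 1
        # Producer side: attempt one line every `produce_every` ticks.
        if produced < total_lines and tick % produce_every == 0:
            if fifo.can_produce():
                fifo.lines.append(produced)
                produced += 1
            else:
                stalls += 1  # throttled: the same line is retried later
        peak = max(peak, len(fifo.lines))
        # Consumer side: drain one line every `consume_every` ticks.
        if tick % consume_every == 0 and fifo.can_consume():
            consumed.append(fifo.lines.popleft())
    return consumed, stalls, peak


# A fast producer (every tick) feeding a slow consumer (every third tick).
consumed, stalls, peak = simulate(total_lines=16, capacity=4,
                                  produce_every=1, consume_every=3)
```

In this run the producer is throttled several times, yet every line arrives in order and the occupancy never exceeds the configured capacity.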
The flowchart 600 begins at block 602, where a first set of data is written to buffer 402 of HWA 400 on a chip. The HWA may be a specialized component configured to perform a particular function. For example, in some embodiments, the HWA is configured to process sensor data to perform machine vision functions. The data may be stored in a buffer 402 within HWA 400 prior to being transmitted off the chip. For purposes of clarity, flowchart 600 describes a technique using a single HWA; however, it should be understood that multiple HWAs may be used for tracing according to some embodiments. Moreover, in some embodiments, the data written to the buffer may include other types of data, such as data from other accelerators or external memory. Further, in some embodiments, the HWA may transmit or receive data from sensors, other accelerators, or external memory.
The flowchart continues at block 604, where a DMA controller for the HWA receives a signal from the scheduler over an internal event bus indicating that the first set of data should be written out of the buffer. In some embodiments, the DMA controller reads and transmits a first amount of data that may be determined based on computer-readable instructions. For example, the amount of data transmitted by the DMA controller with each read may be programmable and software dependent. That is, the data may be transmitted from the buffer in transmission packets of a size that is software dependent. In some embodiments, the amount of data may be throttled based on events occurring within the SoC. The data may be transmitted, for example, at the line level, the slice level, or the frame level.
The flowchart continues at block 606, where the DMA of the HWA transmits the data from block 604 to an external memory over a system data bus. In particular, the DMA may transmit the first set of data to a memory interface associated with the external memory. In response to transmitting the data, the HWA DMA may transmit a signal over the event bus indicating the transmission is complete. The flowchart continues at block 610, where the HWA is notified over the event bus of the completion of transmission of a given chunk of data to the external memory.
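As a minimal sketch of the programmable transfer granularity described above, the following Python splits a frame into line-, slice-, or frame-sized chunks. The function name `split_frame`, the `lines_per_slice` parameter, and the flat byte layout are illustrative assumptions and are not taken from the described hardware.

```python
from typing import Iterator


def split_frame(frame: bytes, line_bytes: int, granularity: str,
                lines_per_slice: int = 4) -> Iterator[bytes]:
    """Yield chunks of `frame` at line, slice, or frame granularity.

    Hypothetical model: a frame is a flat byte string of equal-length lines.
    """
    if line_bytes <= 0 or len(frame) % line_bytes != 0:
        raise ValueError("frame must hold a whole number of lines")
    if not frame:
        return  # nothing to transfer
    if granularity == "line":
        step = line_bytes
    elif granularity == "slice":
        step = line_bytes * lines_per_slice
    elif granularity == "frame":
        step = len(frame)
    else:
        raise ValueError(f"unknown granularity: {granularity}")
    # Each yielded chunk models one DMA transfer out of the buffer.
    for offset in range(0, len(frame), step):
        yield frame[offset:offset + step]


frame = bytes(range(32))  # 8 lines of 4 bytes each
line_chunks = list(split_frame(frame, 4, "line"))
slice_chunks = list(split_frame(frame, 4, "slice", lines_per_slice=4))
frame_chunks = list(split_frame(frame, 4, "frame"))
```

Smaller chunks allow completion events to be raised more often, at the cost of more transfers per frame.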
The HWA DMA may also transmit the data with trace information for the data to on-chip memory over the system data bus, as shown at block 612. As shown, the data may be transmitted to the external memory interface and the on-chip memory in parallel in small chunks (e.g., lines). For example, the transmissions may occur over different channels of the DMA controller over a system data bus. Notably, the on-chip memory may be a memory 416 on the SoC which is different and separate from a memory external to the SoC (i.e., external memory 412) onto which the sensor data and/or HWA data is gathered in entire frames by accumulating smaller chunks (e.g., lines). The flowchart continues at block 614, where the HWA DMA controller transmits a signal across the event bus indicating the transmission to the on-chip memory has occurred.
The flowchart continues at block 616 where a DMA controller external to the HWA receives the signal over the event bus indicating that the transmission to the on-chip memory is complete and transmits the data from the on-chip memory to an interface for transmittal out of the SoC. The interface may include, for example, CSI2 TX or PCIE, as shown at 420. Accordingly, the trace data may be provided to an external component via the interfaces. From there, the system can use the trace data, for example, for recording or testing systems using the sensors. At block 618, the external DMA controller causes a signal to be transmitted on the event bus indicating the transmission to the interface has occurred.
The flowchart concludes at block 620 where a scheduler within the HWA receives the signal over the event bus indicating the data transmission to the external memory is complete from block 610, and the signal over the event bus indicating transmission to the interface is complete at block 618. In response, the scheduler can clear the buffer of the data portion which was transmitted at 602 for additional sensor data.
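The sequence of flowchart 600 can be sketched as a small event-driven model. This is an illustration under stated assumptions rather than the disclosed implementation: the event bus is modeled as synchronous publish/subscribe, the two DMA transfers that the description performs in parallel are run one after the other, and all class, method, and event names (`EventBus`, `TracePipeline`, `"ext_done"`, and so on) are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """Synchronous publish/subscribe stand-in for the on-chip event bus."""

    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, event, handler):
        self.handlers[event].append(handler)

    def publish(self, event, chunk_id):
        for handler in self.handlers[event]:
            handler(chunk_id)


class TracePipeline:
    def __init__(self):
        self.bus = EventBus()
        self.buffer = {}           # HWA internal buffer: chunk_id -> data
        self.external_memory = {}  # reached through the first interface
        self.on_chip_memory = {}   # data stored with trace information
        self.trace_port = []       # second interface (e.g., CSI2 TX or PCIE)
        self.pending = {}          # chunk_id -> completions still awaited
        self.log = []
        self.bus.subscribe("write_out", self.hwa_dma_transfer)
        self.bus.subscribe("onchip_done", self.trace_dma_transfer)
        self.bus.subscribe("ext_done", lambda c: self.complete(c, "ext_done"))
        self.bus.subscribe("trace_done",
                           lambda c: self.complete(c, "trace_done"))

    def write(self, chunk_id, data):
        # Block 602: data lands in the HWA buffer.
        self.buffer[chunk_id] = data
        self.pending[chunk_id] = {"ext_done", "trace_done"}
        # Block 604: the scheduler tells the HWA DMA to write the chunk out.
        self.bus.publish("write_out", chunk_id)

    def hwa_dma_transfer(self, chunk_id):
        data = self.buffer[chunk_id]
        # Blocks 606 and 610: data to external memory, then a completion event.
        self.external_memory[chunk_id] = data
        self.bus.publish("ext_done", chunk_id)
        # Blocks 612 and 614: data plus trace info to on-chip memory.
        self.on_chip_memory[chunk_id] = (("trace", chunk_id), data)
        self.bus.publish("onchip_done", chunk_id)

    def trace_dma_transfer(self, chunk_id):
        # Blocks 616 and 618: the DMA outside the HWA drains on-chip memory.
        self.trace_port.append(self.on_chip_memory.pop(chunk_id))
        self.bus.publish("trace_done", chunk_id)

    def complete(self, chunk_id, event):
        # Block 620: free the buffer entry only after both completions.
        self.log.append(event)
        self.pending[chunk_id].discard(event)
        if not self.pending[chunk_id]:
            del self.pending[chunk_id]
            del self.buffer[chunk_id]
            self.log.append("release")


pipeline = TracePipeline()
pipeline.write(0, b"line0")
```

The key property in this sketch is that the buffer entry is released only after both completion events have been observed, which mirrors the role of the scheduler at block 620.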
The programmable processor (e.g., ARM) writes a trace header before the start of the trace data and, after transmission, a trace footer. The header and footer contain various information associated with the trace data.
Trace information accompanies the traced data and includes information to identify time instance, producer, etc. This additional trace information is transmitted with the data, for example, from the buffer to the on-chip memory by the DMA controller of the HWA. As shown in the example of
Examples of data that may be found in the header 808 include a Start Data identifier to identify the beginning and length of the header (for example, in bytes), and a timestamp. The timestamp may be a 64-bit count, for example, and may be derived from an “always on” domain with a frequency of at least 10 MHz, for example. Header data may also include a number of buffers sent in the upcoming payload, a number of slices into which the current buffers are split and interleaved across multiple buffers, a buffer format, a buffer width (e.g., number of pixels), and a buffer height (e.g., number of lines). Other information that may be included in the header includes an amount of line padding in each line (e.g., in pixels), a number of lines padded in toward the end of the frame or buffer, a CRC signature of the header, and a string identifying the end of the header.
Examples of data that may be found in the footer 812 include a string indicating the start of the footer, a buffer slice status indicating whether errors have occurred during transmission of the slice, a CRC signature for the footer, and a string indicating the end of the footer. Software on the processor 802 may generate the header and footer.
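To make the header and footer contents concrete, the following sketch packs and validates them with Python's `struct` and `zlib` modules. The field order, field widths, little-endian layout, marker strings, and use of CRC-32 are all assumptions chosen for illustration; the description does not specify a binary format.

```python
import struct
import zlib

HDR_START, HDR_END = b"THDR", b"EHDR"  # hypothetical marker strings
FTR_START, FTR_END = b"TFTR", b"EFTR"
# Assumed little-endian body: start marker, header length, 64-bit timestamp,
# buffer count, slice count, format, width, height, line padding, padded lines.
HDR_BODY = struct.Struct("<4sHQ7H")
HDR_TAIL = struct.Struct("<I4s")       # CRC-32 of the body, end marker
FTR_BODY = struct.Struct("<4sH")       # start marker, slice status
FTR_TAIL = struct.Struct("<I4s")       # CRC-32 of the body, end marker


def pack_header(timestamp, num_buffers, num_slices, fmt, width, height,
                line_padding=0, padded_lines=0):
    length = HDR_BODY.size + HDR_TAIL.size
    body = HDR_BODY.pack(HDR_START, length, timestamp, num_buffers,
                         num_slices, fmt, width, height,
                         line_padding, padded_lines)
    return body + HDR_TAIL.pack(zlib.crc32(body), HDR_END)


def unpack_header(raw):
    """Return the header fields after checking the markers and the CRC."""
    body, tail = raw[:HDR_BODY.size], raw[HDR_BODY.size:]
    crc, end = HDR_TAIL.unpack(tail)
    if end != HDR_END or crc != zlib.crc32(body):
        raise ValueError("corrupt trace header")
    fields = HDR_BODY.unpack(body)
    if fields[0] != HDR_START:
        raise ValueError("missing header start marker")
    return fields[1:]


def pack_footer(status):
    body = FTR_BODY.pack(FTR_START, status)
    return body + FTR_TAIL.pack(zlib.crc32(body), FTR_END)


header = pack_header(timestamp=123456789, num_buffers=2, num_slices=4,
                     fmt=1, width=1920, height=1080)
footer = pack_footer(status=0)
packet = header + b"payload-bytes" + footer
```

A receiver could use the length field and the CRC to locate and validate the header before interpreting the payload that follows it.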
Turning to
As shown at the remote end 902, the virtually constructed buffer can be reconstructed after transmission to form reconstructed buffer 912. Accordingly, the remote end 902 will have interleaved slices of data from multiple buffers. As described above, the header and/or footer may contain trace information which allows for better management of the buffer data. In some embodiments, the trace information in the header will allow for demultiplexing of the buffer data into the original buffers.
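The interleaving and demultiplexing described above can be sketched as follows. This is a simplified model under stated assumptions: slices are taken round-robin across buffers, and each slice carries a hypothetical `(buffer_index, slice_index)` tag standing in for the trace information in the header; the function names are illustrative only.

```python
def split_slices(buffer: bytes, num_slices: int):
    """Cut a buffer into `num_slices` equal-sized slices."""
    if num_slices <= 0 or len(buffer) % num_slices != 0:
        raise ValueError("buffer length must be a multiple of num_slices")
    size = len(buffer) // num_slices
    return [buffer[i * size:(i + 1) * size] for i in range(num_slices)]


def interleave(buffers, num_slices):
    """Build one stream: slice 0 of every buffer, then slice 1, and so on."""
    per_buffer = [split_slices(b, num_slices) for b in buffers]
    stream = []
    for s in range(num_slices):
        for b, slices in enumerate(per_buffer):
            stream.append((b, s, slices[s]))  # tag each slice with its origin
    return stream


def demultiplex(stream, num_buffers, num_slices):
    """Rebuild the original buffers at the remote end from tagged slices."""
    slots = [[None] * num_slices for _ in range(num_buffers)]
    for b, s, payload in stream:
        slots[b][s] = payload
    if any(p is None for slices in slots for p in slices):
        raise ValueError("stream is missing at least one slice")
    return [b"".join(slices) for slices in slots]


buffers = [b"AAAABBBB", b"11112222"]
stream = interleave(buffers, num_slices=2)
restored = demultiplex(stream, num_buffers=2, num_slices=2)
```

Because every slice carries its origin, the remote end can rebuild each buffer even though the slices arrive interleaved.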
In this description, the term “couple” may cover connections, communications, or signal paths that enable a functional relationship consistent with this description. For example, if device A generates a signal to control device B to perform an action: (a) in a first example, device A is coupled to device B by direct connection; or (b) in a second example, device A is coupled to device B through intervening component C if intervening component C does not alter the functional relationship between device A and device B, such that device B is controlled by device A via the control signal generated by device A.
A device that is “configured to” perform a task or function may be configured (e.g., programmed and/or hardwired) at a time of manufacturing by a manufacturer to perform the function and/or may be configurable (or reconfigurable) by a user after manufacturing to perform the function and/or other additional or alternative functions. The configuration may be through firmware and/or software programming of the device, through a construction and/or layout of hardware components and interconnections of the device, or a combination thereof.
This application is a continuation of U.S. patent application Ser. No. 17/677,638, filed Feb. 22, 2022, the content of which is incorporated by reference herein.
Relation | Number | Date | Country
---|---|---|---
Parent | 17677638 | Feb 2022 | US
Child | 18819007 | | US