Disclosed aspects are directed to improving configuration of registers in a firmware-hardware interface for expedited hardware processing, in example applications such as computer vision and video encoding.
In conventional hardware (HW)-firmware (FW) processing flows, HW-configurable registers, e.g., software interface (SWI) registers, are derived and programmed in sequence by the FW at fixed intervals. The fixed intervals may be, for example, on a per-frame basis for video processing. In these designs, a HW processor or core raises an interrupt (IRQ) to the FW indicating the end of the current interval. Upon detecting the IRQ, a FW processor (e.g., a video processor) can initiate configuration of a new sequence in the SWI registers. Programming these SWI registers incurs processing time. This includes the time taken by the FW processor to derive new values for the sequences, memory access latencies for reading related information from an external memory, and related cache misses. This processing time can become significant and limit the time available for HW processing on the HW core. The problem is exacerbated at higher frame rates. Multi-processing architectures, or multi-threaded processing of each frame, e.g., for computer vision algorithms, further increase the need to reduce these processing times.
Correspondingly, there is a need to mitigate the aforementioned delays associated with programming the registers and to improve processing speed.
This summary identifies features of some example aspects and is not an exclusive or exhaustive description of the disclosed subject matter. Whether features or aspects are included in, or omitted from, this summary is not intended to indicate the relative importance of such features. Additional features and aspects are described, and will become apparent to persons skilled in the art upon reading the following detailed description and viewing the drawings that form a part thereof.
An exemplary method implemented in a firmware domain and a hardware domain is disclosed. The method may comprise writing, by the firmware domain, configuration information to a memory for a plurality of passes of hardware processing. The method may also comprise programming, by the hardware domain, configuration registers with the configuration information retrieved from the memory. The method may further comprise processing, by the hardware domain, the plurality of passes in accordance with the configuration information programmed in the configuration registers. Programming the configuration registers may occur subsequent to the configuration information being written to the memory.
An exemplary apparatus is disclosed. The apparatus may comprise a firmware domain, a hardware domain, and a memory accessible by both the firmware and hardware domains. The firmware domain may comprise a firmware processor configured to write configuration information to the memory for a plurality of passes of hardware processing. The hardware domain may comprise a register reconfiguration using direct descriptor fetch (RRDF) controller and a hardware core. The RRDF controller may be configured to program configuration registers with the configuration information retrieved from the memory. The hardware core may be configured to process the plurality of passes in accordance with the configuration information programmed in the configuration registers. The RRDF controller may program the configuration registers subsequent to the firmware processor writing the configuration information to the memory.
Another exemplary apparatus is disclosed. The apparatus may comprise means for writing configuration information to a memory for a plurality of passes of hardware processing. The apparatus may also comprise means for programming configuration registers with the configuration information retrieved from the memory. The apparatus may further comprise means for processing the plurality of passes in accordance with the configuration information programmed in the configuration registers. The means for programming may program the configuration registers after the means for writing has written the configuration information to the memory.
Other objects and advantages associated with the aspects disclosed herein will be apparent to those skilled in the art based on the accompanying drawings and detailed description.
The accompanying drawings are presented to aid in the description of examples of one or more aspects of the disclosed subject matter and are provided solely for illustration of the examples and not limitation thereof:
Aspects of the subject matter are provided in the following description and related drawings directed to specific examples of the disclosed subject matter. Alternates may be devised without departing from the scope of the disclosed subject matter. Additionally, well-known elements will not be described in detail or will be omitted so as not to obscure the relevant details.
The word “exemplary” is used herein to mean “serving as an example, instance, or illustration.” Any aspect described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects. Likewise, the term “aspects” does not require that all aspects include the discussed feature, advantage, or mode of operation.
The terminology used herein describes particular aspects only and should not be construed to limit any aspects disclosed herein. As used herein, the singular forms “a,” “an,” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. Those skilled in the art will further understand that the terms “comprises,” “comprising,” “includes,” and/or “including,” as used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Further, various aspects may be described in terms of sequences of actions to be performed by, for example, elements of a computing device. Those skilled in the art will recognize that various actions described herein can be performed by specific circuits (e.g., an application specific integrated circuit (ASIC)), by program instructions being executed by one or more processors, or by a combination of both. Additionally, these sequences of actions described herein can be considered to be embodied entirely within any form of non-transitory computer-readable medium having stored thereon a corresponding set of computer instructions that upon execution would cause an associated processor to perform the functionality described herein. Thus, the various aspects described herein may be embodied in a number of different forms, all of which have been contemplated to be within the scope of the claimed subject matter. In addition, for each of the aspects described herein, the corresponding form of any such aspects may be described herein as, for example, “logic configured to” and/or other structural components configured to perform the described action.
Exemplary aspects of this disclosure are directed to decoupling FW programming of the SWI registers from HW processing, to minimize FW intervention in the overall processing. In exemplary aspects, the FW is configured to prepare all the required programming sequences for the SWI registers in advance and to store the prepared programming sequences in external memory (e.g., DDR). The FW may then initiate an exemplary HW scheme referred to as Register Reconfiguration using Direct Descriptor Fetch (RRDF). RRDF is designed to fetch the programming sequence data from the external memory and configure the SWI registers in HW (e.g., a HW core may fetch the programming sequence from the external memory and configure the SWI registers without requiring FW intervention). Accordingly, the FW can independently compute all the required parameters and prepare the buffers that store the programming sequences, while the HW reads and programs the registers at a significantly higher speed than is possible with conventional FW programming of the SWI registers.
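The division of labor described above can be modeled in a few lines. The sketch below is a minimal, hypothetical Python model rather than the disclosed hardware: "external memory" is a list of per-pass programming sequences, each sequence is a list of (SWI address, write data) pairs, and `fw_prepare_sequences`, `RrdfController`, and the `hw_core` callback are illustrative names. The FW writes all sequences once; the controller then programs the registers and hands them to the HW core pass after pass, with no FW step in between.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# One programming sequence = ordered (SWI address, write data) pairs.
ProgrammingSequence = List[Tuple[int, int]]


def fw_prepare_sequences(pass_configs: List[Dict[int, int]]) -> List[ProgrammingSequence]:
    """FW side (hypothetical): derive every pass's programming sequence in
    advance and store it in simulated external memory."""
    return [sorted(cfg.items()) for cfg in pass_configs]


@dataclass
class RrdfController:
    """HW side (hypothetical): fetches each stored sequence and programs the
    SWI registers itself, so no FW intervention is needed between passes."""
    swi_registers: Dict[int, int] = field(default_factory=dict)

    def run(self, memory: List[ProgrammingSequence],
            hw_core: Callable[[int, Dict[int, int]], None]) -> None:
        for pass_index, sequence in enumerate(memory):
            for address, data in sequence:
                self.swi_registers[address] = data
            # Hand the HW core a snapshot of the freshly programmed registers.
            hw_core(pass_index, dict(self.swi_registers))


# Usage: FW fills memory once and starts RRDF a single time.
processed = []
memory = fw_prepare_sequences([{0x10: 1, 0x14: 64}, {0x10: 2, 0x14: 128}])
RrdfController().run(memory, lambda i, regs: processed.append((i, regs)))
```

The only FW actions are the single call to `fw_prepare_sequences` and the single start of the controller; everything inside `run` models work that the disclosure places in HW.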
The processing system 200 may also include memory 220 (e.g., DDR memory), with respective interfaces 216 and 218 to the FW domain 206 and the HW domain 210. The memory 220 may be external to both the FW and HW domains 206, 210. The RRDF controller 225 may be configured to fetch the programming sequence data from the memory 220 and configure the SWI registers 212.
The processing system 200 implementing the RRDF controller 225 can significantly reduce latency as compared to the conventional processing system 100. In the conventional processing system 100, the FW processor 102 programs the SWI registers 112, the HW core 114 processes a thread in accordance with the programming and initiates the IRQ when the current interval ends, and the FW processor 102 programs the SWI registers 112 for the next interval upon detecting the IRQ. This loop can take a considerable amount of time, i.e., it can introduce significant latency. Moreover, the loop is repeated multiple times, so the latencies accumulate. Note that in the conventional processing system 100, the HW core 114 only reads from the SWI registers 112, as indicated by a single-ended arrow from the SWI registers 112 to the HW core 114. The SWI registers 112 are programmed by the FW processor 102 in each loop.
However, in the proposed processing system 200, the FW domain 206 (e.g., the FW processor 202) may prepare and prepopulate the memory 220 with multiple programming sequences and then simply initiate the HW domain 210 (e.g., the HW core 214). The HW core 214, with the RRDF controller 225, may then retrieve a programming sequence from the memory 220, program the SWI registers 212, and process the thread in accordance with the programming. When the current interval ends, instead of generating the IRQ and waiting for the FW processor 202, the HW domain 210 (e.g., the HW core 214 and the RRDF controller 225) can itself retrieve the next programming sequence from the memory 220, program the SWI registers 212 (unlike in the conventional processing system 100), and process the thread accordingly. As a result, the latency associated with programming the SWI registers 212 can be reduced significantly. Note also that, unlike in the conventional processing system 100, the HW core 214 of the proposed processing system 200 may write to, as well as read from, the SWI registers 212, as indicated by a double-ended arrow between the SWI registers 212 and the HW core 214.
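To make the accumulation argument concrete, the following first-order model counts time per pass. It is a hypothetical sketch: every parameter (`fw_program`, `irq_latency`, `fw_prepare`, `hw_fetch`, `hw_process`) is an assumed input, not a measured value, and the model ignores any overlap between FW and HW work as well as caching effects. In the conventional loop each pass pays the FW programming time and one IRQ round trip; with RRDF, the FW preparation of all passes is paid once up front and each pass pays only the HW fetch time.

```python
from typing import Tuple


def conventional_total(n_passes: int, fw_program: int,
                       irq_latency: int, hw_process: int) -> int:
    """Conventional loop: FW programs the SWI registers, HW processes the
    pass, then HW raises one IRQ that FW services before the next pass."""
    return n_passes * (fw_program + hw_process + irq_latency)


def rrdf_total(n_passes: int, fw_prepare: int,
               hw_fetch: int, hw_process: int) -> int:
    """RRDF: FW writes all programming sequences to memory once (fw_prepare);
    afterwards the RRDF controller fetches each pass's sequence itself."""
    return fw_prepare + n_passes * (hw_fetch + hw_process)


def per_pass_gap(fw_program: int, irq_latency: int, hw_fetch: int) -> Tuple[int, int]:
    """HW-core idle time between consecutive passes: (conventional, RRDF)."""
    return fw_program + irq_latency, hw_fetch
```

Subtracting the two totals shows that, under these assumptions, RRDF is faster exactly when `fw_prepare < n_passes * (fw_program + irq_latency - hw_fetch)`, so the benefit grows with the number of passes, which matches the observation that the conventional latencies accumulate.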
With reference to
On the other hand, for RRDF programming denoted by the reference numeral 310, a single FW programming event 312a is sufficient for the programming of both of the HW threads 314a-b in the example shown. Interrupt processing (IRQ) does not incur delays such as 306a-b shown in the non-RRDF conventional implementation. This is because the FW need not monitor the IRQ reception from the HW threads 314a-b for subsequent programming of the SWI registers. Rather, once the execution of HW thread 314a is completed, for example, based on an initial programming of the SWI registers during FW programming 312a (which can involve writing to external memory and reading from external memory to program the SWI registers through a HW core), the execution of the subsequent HW thread 314b may proceed entirely within HW by consulting the previously programmed SWI registers. Accordingly,
To further explain the exemplary implementations, consider a general approach to reducing the cycles involved in communication between blocks. In conventional implementations, an additional controller may be added among the various processing blocks of a processing sequence, but a special interface or protocol may then be required for interactions with that controller. In the exemplary implementations, by contrast, the RRDF (e.g., the RRDF controller 225) itself acts as the controller between the blocks (e.g., the HW threads) without the need for any special interface. With RRDF, the existing interfaces may be reused without requiring any changes to the blocks. Such RRDF implementations may be suitable for example implementations that utilize an SWI interface.
In exemplary aspects, once the FW derives the required programming sequence, it may prepare a buffer containing that sequence, wherein the programming sequence may comprise the SWI addresses and the corresponding write data. The RRDF scheme enables the FW to prepare the programming sequences corresponding to independent HW threads (or passes) in advance.
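One possible in-memory layout for such a buffer is sketched below. It assumes, hypothetically, 32-bit SWI addresses and data packed as interleaved little-endian words, with the FW recording the byte offset at which each independent pass begins. The layout and the helpers `build_sequence_buffer` and `read_pass` are illustrative only; the disclosure does not fix an encoding.

```python
import struct
from typing import List, Tuple

Pair = Tuple[int, int]  # (SWI address, write data)


def build_sequence_buffer(passes: List[List[Pair]]) -> Tuple[bytes, List[int]]:
    """Pack each pass's pairs as interleaved 32-bit little-endian words.

    Returns the buffer and the byte offset at which each pass starts."""
    words: List[int] = []
    pass_offsets: List[int] = []
    for sequence in passes:
        pass_offsets.append(len(words) * 4)  # 4 bytes per 32-bit word
        for address, data in sequence:
            words.extend((address, data))
    return struct.pack(f"<{len(words)}I", *words), pass_offsets


def read_pass(buffer: bytes, offset: int, n_pairs: int) -> List[Pair]:
    """Unpack n_pairs (address, data) pairs starting at a byte offset."""
    words = struct.unpack_from(f"<{2 * n_pairs}I", buffer, offset)
    return list(zip(words[0::2], words[1::2]))


# Two independent passes: the second starts 16 bytes (two pairs) in.
buf, offsets = build_sequence_buffer([[(0x100, 7), (0x104, 9)], [(0x100, 8)]])
```

Because each pass has its own start offset, a pass's sequence can be located without parsing the passes before it.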
In some aspects, the SWI addresses and data may be separated into two different buffers. Since the programming sequence (i.e., the set of SWI addresses) may be substantially fixed for a particular use case, the data may vary from pass to pass while the addresses remain constant. Thus, by separating the address and data buffers, the address buffer can be written a single time, and only the data buffer needs to be modified as the data varies, without requiring multiple copies of the invariant address buffer. Further efficiencies may thereby be achieved in these aspects.
Accordingly, the FW may initially prepare the address buffer a single time and, while retaining the same address buffer, continue to update the configuration (data) values for the SWI registers for subsequent passes. This reduces the FW overhead of maintaining the buffers, compared with writing both addresses and data for every pass.
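The saving from retaining a single address buffer can be counted directly. The function below is a back-of-the-envelope sketch, assuming that each SWI address and each data value occupies one buffer word and that every pass programs the same `n_registers` registers. With a combined buffer, the FW writes an address word and a data word for every register in every pass; with split buffers, it writes the addresses once and only the data per pass.

```python
def fw_words_written(n_registers: int, n_passes: int, split_buffers: bool) -> int:
    """Buffer words the FW writes to set up n_passes passes (assumed model)."""
    if split_buffers:
        # Address buffer written once; data buffer rewritten for each pass.
        return n_registers + n_registers * n_passes
    # Combined (address, data) pairs rewritten for each pass.
    return 2 * n_registers * n_passes
```

For example, 32 registers over 10 passes costs 640 words with a combined buffer and 352 with split buffers; as the number of passes grows, the split layout approaches half the writes.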
With reference now to
Accordingly, separating the SWI address and the data into two descriptors is seen to provide more generality and flexibility for the FW in the exemplary RRDF scheme. For example, with continued reference to
In an aspect of multi-pass processing and multi-thread synchronization, in addition to the above-described programming of the SWI registers, the RRDF scheme also allows for synchronization between multiple threads. In an example process denoted as “SYNC_EVENT”, the FW can insert one or more predetermined addresses, referred to herein as “MARKERs”, in the SWI address buffer 402a. Each MARKER may correspond to an interrupt from a particular thread. If a MARKER is encountered in the address buffer 402a, the RRDF scheme may halt further programming and wait for the completion of the corresponding EVENT. The advantages of this feature are illustrated in the following example scenarios: (1) multiple concurrent HW threads, wherein execution of each thread may depend on the completion of another thread, and (2) multi-pass processing of a video frame in the same HW thread, wherein the SWI configuration of each pass is predetermined.
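On the FW side, inserting MARKERs amounts to placing reserved values in the SWI address buffer ahead of the sequence they gate. The sketch below assumes, purely for illustration, that MARKERs occupy a reserved address range `MARKER_BASE | event_id` that no real SWI register uses, and that each MARKER gets a placeholder slot in the data buffer so the two buffers stay index-aligned. Neither encoding is specified by the disclosure; `marker` and `fw_build_buffers` are hypothetical helpers.

```python
from typing import List, Optional, Tuple

MARKER_BASE = 0xFFFF0000  # hypothetical reserved address range for MARKERs


def marker(event_id: int) -> int:
    """Encode a MARKER that waits on the given event (e.g., a thread's IRQ)."""
    if not 0 <= event_id <= 0xFFFF:
        raise ValueError("event_id must fit in 16 bits")
    return MARKER_BASE | event_id


def fw_build_buffers(
    passes: List[Tuple[Optional[int], List[Tuple[int, int]]]]
) -> Tuple[List[int], List[int]]:
    """Each pass is (event to wait for, or None; [(SWI address, data), ...])."""
    address_buffer: List[int] = []
    data_buffer: List[int] = []
    for wait_event, sequence in passes:
        if wait_event is not None:
            address_buffer.append(marker(wait_event))
            data_buffer.append(0)  # assumed placeholder keeps buffers aligned
        for address, data in sequence:
            address_buffer.append(address)
            data_buffer.append(data)
    return address_buffer, data_buffer


# Pass 2 depends on the event signalled when pass 1 completes (event 1).
addrs, data = fw_build_buffers([(None, [(0x300, 5)]), (1, [(0x300, 6)])])
```

The dependency is expressed entirely in the buffer contents, so no extra FW action is needed at run time to enforce it.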
With combined reference to
In an example with combined reference to
Referring to
Exemplary aspects of RRDF may be used in implementing computer vision algorithms (e.g., CVP) such as object detection, and video processing such as true motion estimation. The following is a non-exhaustive list of some example use cases that may benefit from the exemplary RRDF scheme: TME-Search, TME-HOG, CV-HOG-SVM, and High Frame Rate Vcodec processing.
In
In block 810 of
In block 820, the firmware processor may provide configuration retrieve parameters to the hardware domain, e.g., to the RRDF controller. Then in block 830, the RRDF controller may program the configuration registers with the configuration information retrieved from the memory. The configuration registers may be programmed based on the configuration retrieve parameters. In block 840, the hardware core may process the plurality of passes in accordance with the configuration information programmed in the configuration registers.
In an aspect, block 810 may precede block 830. That is, the firmware processor may write the configuration information to the memory for the plurality of passes of hardware processing, and then the RRDF controller may program the configuration registers with the configuration information retrieved from the memory.
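The ordering of blocks 810 through 840 can be illustrated with a small simulation. In this hypothetical sketch, `SharedMemory` stands in for the memory accessible to both domains, the configuration retrieve parameters are reduced to a base key and a pass count, and `block_840_hw_process` merely records the configuration each pass ran with. The guard in `block_830_rrdf_program` reflects the aspect in which block 810 precedes block 830.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class SharedMemory:
    """Memory visible to both the FW and HW domains (simulated)."""
    regions: Dict[int, List[Dict[int, int]]] = field(default_factory=dict)


def block_810_fw_write(mem: SharedMemory, base: int,
                       passes: List[Dict[int, int]]) -> None:
    mem.regions[base] = [dict(p) for p in passes]


def block_820_retrieve_parameters(base: int, n_passes: int) -> Dict[str, int]:
    return {"base": base, "n_passes": n_passes}  # hypothetical parameter set


def block_830_rrdf_program(mem: SharedMemory, params: Dict[str, int],
                           pass_index: int, swi_registers: Dict[int, int]) -> None:
    if params["base"] not in mem.regions:
        raise RuntimeError("block 810 must complete before block 830")
    swi_registers.update(mem.regions[params["base"]][pass_index])


def block_840_hw_process(swi_registers: Dict[int, int]) -> Tuple[Tuple[int, int], ...]:
    # Placeholder for real HW work: report the configuration used.
    return tuple(sorted(swi_registers.items()))


mem, regs, results = SharedMemory(), {}, []
block_810_fw_write(mem, 0x8000, [{0x10: 1}, {0x10: 2}])
params = block_820_retrieve_parameters(0x8000, 2)
for i in range(params["n_passes"]):
    block_830_rrdf_program(mem, params, i, regs)
    results.append(block_840_hw_process(regs))
```

Here each pass is programmed and then processed before the next pass is programmed, which is one of the orderings the method permits.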
As indicated above, the SWI registers, which may be in the hardware domain, may be examples of the configuration registers. Also, the memory may comprise an SWI address buffer (e.g., address buffer 402a) and an SWI data buffer (e.g., data buffer 402b). The configuration information may be viewed as comprising a plurality of SWI addresses and a plurality of SWI data corresponding to the plurality of SWI addresses.
In this instance, the firmware processor may implement block 810 by writing the plurality of SWI addresses into the SWI address buffer and writing the plurality of SWI data into the SWI data buffer. Also, the RRDF controller may implement block 830 by retrieving the plurality of SWI addresses from the SWI address buffer, retrieving the plurality of SWI data from the SWI data buffer, and pairing, in hardware, each SWI address with its corresponding SWI data to program the SWI registers.
In an aspect, when the configuration registers comprise the SWI registers, then it may be said that the configuration retrieve parameters include SWI retrieve parameters. In this instance, the SWI retrieve parameters provided by the firmware processor in block 820 to the RRDF controller may include an SWI address buffer start and an SWI data buffer start. The SWI address buffer start may be a start address of the SWI address buffer, and the SWI data buffer start may be a start address of the SWI data buffer. The RRDF controller may implement block 830 by retrieving the plurality of SWI addresses starting from the SWI address buffer start, and retrieving the plurality of SWI data starting from the SWI data buffer start. Further, the hardware core may implement block 840 by processing the plurality of passes in accordance with the programmed SWI registers.
In an alternative aspect, the SWI retrieve parameters may also include a program size indicating a total number of SWI registers to be programmed. Then the RRDF controller may implement block 830 by retrieving the plurality of SWI addresses and the plurality of SWI data until the program size is reached.
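A minimal sketch of block 830 driven by the SWI retrieve parameters follows. It assumes, hypothetically, a flat byte-addressed memory, 32-bit little-endian address and data entries, and a `SwiRetrieveParameters` record holding the SWI address buffer start, the SWI data buffer start, and the program size. The controller walks both buffers in lockstep and stops once the program size is reached.

```python
import struct
from dataclasses import dataclass
from typing import Dict, List

WORD = 4  # assumed size of one SWI address or data entry, in bytes


@dataclass(frozen=True)
class SwiRetrieveParameters:
    address_buffer_start: int  # start address of the SWI address buffer
    data_buffer_start: int     # start address of the SWI data buffer
    program_size: int          # total number of SWI registers to program


def write_words(memory: bytearray, start: int, words: List[int]) -> None:
    """FW side: store 32-bit little-endian words at a memory address."""
    struct.pack_into(f"<{len(words)}I", memory, start, *words)


def rrdf_program(memory: bytearray, params: SwiRetrieveParameters,
                 swi_registers: Dict[int, int]) -> None:
    """HW side: fetch address/data entries in lockstep until program_size."""
    for i in range(params.program_size):
        (address,) = struct.unpack_from(
            "<I", memory, params.address_buffer_start + i * WORD)
        (data,) = struct.unpack_from(
            "<I", memory, params.data_buffer_start + i * WORD)
        swi_registers[address] = data


memory = bytearray(256)
write_words(memory, 0x00, [0x400, 0x404, 0x408])  # SWI address buffer
write_words(memory, 0x40, [11, 22, 33])           # SWI data buffer
regs: Dict[int, int] = {}
rrdf_program(memory, SwiRetrieveParameters(0x00, 0x40, program_size=2), regs)
```

Because `program_size` is 2, the controller stops before the third entry, so `regs` maps 0x400 to 11 and 0x404 to 22, and 0x408 is left unprogrammed.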
Recall that in block 810, the firmware processor may write the configuration information to the memory for a plurality of passes. Nevertheless, in another aspect, the RRDF controller and the hardware core may implement blocks 830 and 840 one pass at a time. That is, the RRDF controller may program the SWI registers and the hardware core may process the plurality of passes in accordance with the programmed SWI registers such that one pass is programmed in the SWI registers and processed prior to a next pass being programmed in the SWI registers and processed.
In a further aspect, synchronization among the plurality of passes may be maintained by the firmware processor inserting, in the SWI address buffer, one or more predetermined address values referred to as MARKERs. For example, if first and second passes of the plurality of passes are such that the second pass depends on the first pass, the firmware processor may write a MARKER in the SWI address buffer among the SWI addresses corresponding to the second pass. This may occur when the firmware processor performs block 810.
When the RRDF controller retrieves the MARKER from the SWI address buffer while retrieving the SWI addresses of the second pass, e.g., when performing block 830, the RRDF controller may wait for an IRQ from the first pass, e.g., an IRQ issued when the first pass has been processed. Upon detecting the IRQ from the first pass, the RRDF controller may resume retrieving the plurality of SWI addresses and the plurality of SWI data of the second pass. The first and second passes may be passes of the same hardware thread or passes of different hardware threads.
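The controller-side behavior can be sketched as a small state machine. This is a hypothetical model, not the disclosed circuit: it assumes MARKERs are encoded as `MARKER_BASE | event_id` in a reserved address range, that the data buffer holds a placeholder at each MARKER position, and that `on_irq` is how the controller learns that a pass has completed. `step` programs registers until it either finishes or stalls on a MARKER whose event has not yet been signalled.

```python
from typing import Dict, List, Optional, Set

MARKER_BASE = 0xFFFF0000  # hypothetical reserved range; low 16 bits = event id


def is_marker(address: int) -> bool:
    return (address & 0xFFFF0000) == MARKER_BASE


class RrdfController:
    def __init__(self, address_buffer: List[int], data_buffer: List[int]):
        self.address_buffer = address_buffer
        self.data_buffer = data_buffer
        self.index = 0
        self.swi_registers: Dict[int, int] = {}
        self.completed_events: Set[int] = set()
        self.waiting_on: Optional[int] = None

    def on_irq(self, event_id: int) -> None:
        """Record an IRQ, e.g., one raised when a pass finishes processing."""
        self.completed_events.add(event_id)

    def step(self) -> bool:
        """Program registers until done (True) or stalled on a MARKER (False)."""
        while self.index < len(self.address_buffer):
            address = self.address_buffer[self.index]
            if is_marker(address):
                event_id = address & 0xFFFF
                if event_id not in self.completed_events:
                    self.waiting_on = event_id  # halt until the IRQ arrives
                    return False
                self.waiting_on = None  # IRQ seen: skip the MARKER and resume
            else:
                self.swi_registers[address] = self.data_buffer[self.index]
            self.index += 1
        return True


# First pass programs 0x300; the second pass is gated by a MARKER on event 1.
ctrl = RrdfController([0x300, MARKER_BASE | 1, 0x304], [5, 0, 6])
stalled = not ctrl.step()            # stops at the MARKER, before 0x304
after_first = dict(ctrl.swi_registers)
ctrl.on_irq(1)                       # first pass completes and raises its IRQ
finished = ctrl.step()               # resumes with the second pass
```

The same mechanism covers both cases named above: the event may come from an earlier pass of the same HW thread or from a different thread.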
The method 800 may end after block 840. Optionally, in block 850, the firmware processor may prepare and write the next configuration information into the memory. Recall from above that one reason for separating out the SWI address and data into different buffers is that the programming sequence may be substantially invariable or fixed for a particular use case, i.e., the data may vary while the addresses remain constant. Thus, the firmware processor may implement block 850 by writing the next plurality of SWI data into the SWI data buffer without writing any of the next plurality of SWI addresses into the SWI address buffer, e.g., when the SWI addresses do not change. If block 850 is performed, then the method may proceed back to block 820.
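The loop from block 850 back to block 820 can be sketched as follows, assuming the SWI addresses are identical for every frame. In this hypothetical model, `run_frames` keeps the address buffer as an immutable tuple to emphasize that only the data buffer is rewritten between frames. The retrieve parameters are reduced to a program size, and the dictionary built at block 830 stands in for the programmed SWI registers.

```python
from typing import Dict, List, Sequence


def run_frames(addresses: Sequence[int],
               frames_data: List[Sequence[int]]) -> List[Dict[int, int]]:
    """Simulate blocks 810-850 over several frames with a fixed address buffer."""
    address_buffer = tuple(addresses)  # block 810: addresses written once
    programmed: List[Dict[int, int]] = []
    for data in frames_data:
        if len(data) != len(address_buffer):
            raise ValueError("one data word is needed per SWI address")
        data_buffer = list(data)            # block 810 (first) / 850 (later): data only
        program_size = len(address_buffer)  # block 820: retrieve parameters re-issued
        swi_registers = dict(zip(address_buffer[:program_size],
                                 data_buffer[:program_size]))  # block 830
        programmed.append(swi_registers)    # block 840 would process with these
    return programmed


frames = run_frames([0x500, 0x504], [[1, 2], [3, 4], [5, 6]])
```

Each iteration after the first corresponds to one pass through block 850, with no SWI addresses rewritten.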
Those of skill in the art will appreciate that information and signals may be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, symbols, and chips that may be referenced throughout the above description may be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
Further, those of skill in the art will appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the aspects disclosed herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The methods, sequences and/or algorithms described in connection with the aspects disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor.
Accordingly, an aspect of the invention can include a computer-readable medium embodying a method implemented in a firmware-hardware interface. The invention is therefore not limited to the illustrated examples, and any means for performing the functionality described herein are included in aspects of the invention.
While the foregoing disclosure shows illustrative aspects of the invention, it should be noted that various changes and modifications could be made herein without departing from the scope of the invention as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the aspects of the invention described herein need not be performed in any particular order. Furthermore, although elements of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
The present application for patent claims the benefit of U.S. Provisional Application No. 62/650,230, entitled “REGISTER RECONFIGURATION USING DIRECT DESCRIPTOR FETCH FOR EFFICIENT MULTI-PASS PROCESSING OF COMPUTER VISION AND VIDEO ENCODING APPLICATIONS,” filed Mar. 29, 2018, assigned to the assignee hereof, and expressly incorporated herein by reference in its entirety.
Number | Date | Country
---|---|---
62650230 | Mar 2018 | US