The accompanying drawings illustrate a number of exemplary embodiments and are a part of the specification. Together with the following description, these drawings demonstrate and explain various principles of the instant disclosure.
Throughout the drawings, identical reference characters and descriptions indicate similar, but not necessarily identical, elements. While the exemplary embodiments described herein are susceptible to various modifications and alternative forms, specific embodiments have been shown by way of example in the drawings and will be described in detail herein. However, the exemplary embodiments described herein are not intended to be limited to the particular forms disclosed. Rather, the instant disclosure covers all modifications, equivalents, and alternatives falling within the scope of the appended claims.
Features from any of the embodiments described herein may be used in combination with one another in accordance with the general principles described herein. These and other embodiments, features, and advantages will be more fully understood upon reading the following detailed description in conjunction with the accompanying drawings and claims.
In many firmware (FW)-controlled system designs, FW prepares a programming sequence and programs hardware (HW) with it to achieve a specific functionality. Preparing this sequence, programming it into HW, waiting for HW completion, and monitoring the HW state for any additional information may involve context switching in FW and cause high processing latency. The latency may become critical in throughput-driven designs in which multiple HW threads work in a pipelined fashion toward a common task, such as a transcoder that decodes a video sequence of a particular format and encodes the video sequence into different formats and resolutions.
The present disclosure is generally directed to systems and methods for facilitating efficient hardware-firmware interactions. In order to minimize FW context switching and processing latency, the systems described herein offload some of this work from FW and implement some of the programming features in HW. In one embodiment, a new HW module, called command direct memory access (CDMA), may be added to the transcoder solution or other hardware configuration. In some examples, a CDMA may support a pointer-to-pointer scheme for basic register programming, a special marker that enables the HW to distinguish between register write operations and special operations (e.g., read, wait, etc.), a wait-for-done command, and/or debug and performance traces. This may enable FW to use dedicated buffers for a programming sequence that is common across frames for a given HW thread. In some embodiments, this system may minimize FW buffer updates (or writes) and/or save command list preparation time.
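To illustrate how a special marker might let HW tell register writes apart from special operations, the following is a minimal sketch that assumes each command-list entry is a 32-bit register address paired with a 32-bit data word. The reserved marker address `SPECIAL_MARKER_ADDR`, the opcode values, and the 8-bit-opcode/24-bit-operand packing are all hypothetical choices for illustration; the disclosure does not specify an encoding.

```python
from dataclasses import dataclass
from enum import IntEnum

# Hypothetical reserved address that flags an entry as a special marker.
SPECIAL_MARKER_ADDR = 0xFFFF_FFF0


class Opcode(IntEnum):
    # Hypothetical opcode values; not taken from the disclosure.
    READ = 0x1
    WAIT_FOR_DONE = 0x2
    DEBUG = 0x3
    TERMINATE = 0x4
    ENABLE_INTERRUPT = 0x5


@dataclass(frozen=True)
class RegisterWrite:
    addr: int
    value: int


@dataclass(frozen=True)
class SpecialOp:
    opcode: Opcode
    operand: int  # e.g., a register to read or a thread/sequence ID to wait on


def encode_special(opcode, operand=0):
    """FW side: pack an 8-bit opcode and a 24-bit operand behind the marker address."""
    if not 0 <= operand < (1 << 24):
        raise ValueError("operand must fit in 24 bits")
    return (SPECIAL_MARKER_ADDR, (int(opcode) << 24) | operand)


def decode(entry):
    """HW side: classify one (address, data) entry as a register write or a special op."""
    addr, data = entry
    if addr != SPECIAL_MARKER_ADDR:
        return RegisterWrite(addr, data)
    return SpecialOp(Opcode(data >> 24), data & 0xFF_FFFF)
```

Because ordinary entries are plain address/data pairs, FW can keep emitting simple register writes and only spend a marker entry when it needs a read, wait, debug, or terminate step.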
In some embodiments, the systems described herein may improve the functioning of a computing device by increasing the speed at which the computing device performs operations. Additionally, the systems described herein may improve the fields of computational efficiency and/or video transcoding by improving the efficiency with which computing devices can execute certain command sequences, such as the command sequences used in video transcoding.
In some embodiments, the systems described herein may facilitate efficient hardware-firmware interaction.
At step 204, the systems described herein may send, by a FW module, a command to the HW module directing the HW module to perform a non-register-write operation via the special marker. The term “non-register-write operation” may generally refer to any operation performed by hardware that does not exclusively consist of writing data to a memory register. For example, a non-register-write operation may include a register read operation, a wait-for-done operation, a terminate operation, and/or a debug operation. The systems described herein may perform step 204 in a variety of ways. In one example, FW may send a wait-for-done command to the CDMA. In another example, FW may send a terminate command to the CDMA.
At step 206, the systems described herein may receive, by the HW module, the command directing the HW module to perform the non-register-write operation via the special marker. The systems described herein may perform step 206 in a variety of ways. For example, the CDMA may read the command from a command queue. In some embodiments, the CDMA may check a designated section of memory for commands from FW.
At step 208, the systems described herein may perform, by the HW module, in response to receiving the command, the non-register-write operation signified by the special marker. For example, the CDMA may read data, wait for a thread to complete, and/or terminate operations. In one example, the CDMA may send debug data to FW. For example, upon receiving a debug command signified by a debug opcode in the special marker, the CDMA may output debugging information into external memory that can be used by FW for performance monitoring, analysis, and/or debugging processes. In another example, the CDMA may receive a wait-for-done command and, in response, pause operation until detecting that a hardware thread specified by the wait-for-done command has completed.
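Steps 206 and 208 can be pictured as a dispatch loop inside the HW module. The sketch below is a hypothetical software model (`ToyCdma`, `Op`, and the `(op, arg, value)` command tuple are illustrative names, not part of the disclosure) that performs register writes and reads, pauses at a wait-for-done until the named thread is reported complete, snapshots state into a debug log standing in for external memory, and stops on terminate.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Op(Enum):
    WRITE = auto()
    READ = auto()
    WAIT_FOR_DONE = auto()
    DEBUG = auto()
    TERMINATE = auto()


@dataclass
class ToyCdma:
    """Hypothetical software model of a CDMA; not a hardware description."""
    registers: dict = field(default_factory=dict)
    done_threads: set = field(default_factory=set)   # HW threads that have reported done
    read_results: list = field(default_factory=list)
    debug_log: list = field(default_factory=list)    # stands in for external memory read by FW

    def run(self, commands, start=0):
        """Execute (op, arg, value) commands starting at index `start`.

        Returns (index, state), where state is "waiting" (paused at a
        wait-for-done), "terminated", or "done" (end of list reached).
        """
        for pc in range(start, len(commands)):
            op, arg, value = commands[pc]
            if op is Op.WRITE:
                self.registers[arg] = value
            elif op is Op.READ:
                self.read_results.append(self.registers.get(arg, 0))
            elif op is Op.WAIT_FOR_DONE:
                if arg not in self.done_threads:
                    return pc, "waiting"  # resume later from this same index
            elif op is Op.DEBUG:
                self.debug_log.append({"pc": pc, "registers": dict(self.registers)})
            elif op is Op.TERMINATE:
                return pc, "terminated"
        return len(commands), "done"
```

In this model, pausing is expressed by returning the current index; a caller resumes by invoking `run` again with `start` set to that index once the awaited thread appears in `done_threads`.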
In some embodiments, the systems described herein may support a programming sequence of a thread that is split across multiple physical buffers in memory. For example, as illustrated in
In one embodiment, FW may provide an address pointer and size for the list of commands stored in command queue 404 and the CDMA may fetch the list of commands via the address pointer and size. In some embodiments, the FW may provide the address pointer and size repeatedly, as the buffer may include instructions that are referenced repeatedly, such as the debug programming sequence, clock sequence, reset sequence, and/or interrupt clear sequence.
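The pointer-and-size scheme, including a thread's sequence split across multiple physical buffers, can be sketched as follows. This assumes a hypothetical in-memory layout of 8-byte entries (a little-endian 32-bit register address followed by 32-bit data); FW writes a common buffer, such as a reset sequence, once and reuses its descriptor alongside a different per-frame buffer each time.

```python
import struct

# Hypothetical entry layout: 32-bit register address, 32-bit data, little-endian.
ENTRY = struct.Struct("<II")


def write_buffer(memory, base, entries):
    """FW side: place (addr, data) entries at `base`; return the (pointer, size) descriptor."""
    offset = base
    for addr, data in entries:
        ENTRY.pack_into(memory, offset, addr, data)
        offset += ENTRY.size
    return base, offset - base


def fetch(memory, descriptors):
    """CDMA side: follow each (pointer, size) descriptor in order and yield its entries."""
    for ptr, size in descriptors:
        if size % ENTRY.size:
            raise ValueError("buffer size must be a multiple of the entry size")
        for off in range(ptr, ptr + size, ENTRY.size):
            yield ENTRY.unpack_from(memory, off)
```

For example, a descriptor for a common buffer at offset `0x000` can be passed together with frame 0's descriptor and later with frame 1's descriptor, so FW never rewrites the shared entries between frames.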
In some examples, once the buffers are ready, FW may provide all the pointers to the CDMA through a CSR. In one example, there may be multiple buffers to process, in which case the CDMA may not have explicit information about when to send a CDMA interrupt to FW. In one embodiment, the CDMA may therefore allow FW to push an enable-interrupt command into the command queue. When this command is received, the CDMA may generate an interrupt after processing the corresponding buffer.
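A minimal sketch of the enable-interrupt mechanism follows. It assumes, as a hypothetical reading of "the corresponding buffer", that an `("enable_interrupt",)` queue entry arms an interrupt for the next buffer descriptor behind it in the queue; the queue entry shapes and the `process_queue` name are illustrative only.

```python
from collections import deque


def process_queue(queue, process_buffer):
    """Drain the command queue; return names of buffers after which an interrupt fired.

    Assumption (hypothetical): an ("enable_interrupt",) entry arms an interrupt
    for the next ("buffer", name) entry that follows it in the queue.
    """
    interrupts = []
    armed = False
    while queue:
        item = queue.popleft()
        if item[0] == "enable_interrupt":
            armed = True
            continue
        _, name = item
        process_buffer(name)
        if armed:
            interrupts.append(name)  # stands in for raising an interrupt to FW
            armed = False
    return interrupts
```

With this arrangement, FW decides where in a long list of buffers it wants to be notified, rather than the CDMA guessing or interrupting after every buffer.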
In some embodiments, the systems described herein may use a sequence identifier (ID) inserted in the special marker to facilitate cross-thread dependency and/or efficiency within a single thread. In some examples, a sequence ID may be represented as a continuously incrementing eight-bit value. For example, as illustrated in
In some examples, the main challenge of cross-thread dependency modeling may be the variable processing time of each thread. Some threads may finish faster than others, making synchronization difficult. To solve this problem, the systems described herein may use a sequence ID. For example, the systems described herein may store the sequence ID for each scalar in the corresponding CDMA thread. In this example, when a dependent thread is waiting, the thread may compare its own wait-sequence-ID against the stored value from the master thread and may proceed once the stored sequence ID is greater than or equal to its wait-sequence-ID.
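The check a dependent thread performs, proceeding once the master thread's stored sequence ID has reached its wait-sequence-ID, can be sketched as below. Because an eight-bit counter wraps from 255 to 0, a plain `>=` would misjudge IDs across the wrap; this sketch therefore swaps in serial-number arithmetic (as in RFC 1982), assuming the two IDs are never more than 127 steps apart. How the hardware actually handles wraparound is not stated in the disclosure, and `SeqStore` is a hypothetical name.

```python
SEQ_MOD = 1 << 8       # eight-bit sequence IDs
HALF = SEQ_MOD // 2


def next_seq(seq):
    """Continuously incrementing eight-bit sequence ID (wraps 255 -> 0)."""
    return (seq + 1) % SEQ_MOD


def seq_reached(stored, wait):
    """True if `stored` is at or past `wait` on a wrapping eight-bit counter.

    Assumes the two IDs are never more than HALF - 1 steps apart.
    """
    return (stored - wait) % SEQ_MOD < HALF


class SeqStore:
    """Latest completed sequence ID per master thread, as a CDMA might record it."""

    def __init__(self):
        self._done = {}

    def record_done(self, thread, seq):
        self._done[thread] = seq

    def may_proceed(self, master, wait_seq):
        stored = self._done.get(master)
        return stored is not None and seq_reached(stored, wait_seq)
```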
In one example, as illustrated in
In one example, when the CDMA is processing BS 804, the CDMA may identify the wait-for-done marker for ENC 802. The CDMA may internally compare the stored value from ENC 802 against the wait-for-done marker to check whether the stored value is greater than or equal to it, and may wait until that condition is met before programming BS 804. In some examples, ENC 802 may not have to wait at each done message for the done to be sampled by all dependent threads. In this way, a thread with a variable processing time may not stall other threads.
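The ENC/BS interaction above can be illustrated with a tick-based simulation. This is a hypothetical model that uses unbounded integers for sequence IDs (ignoring eight-bit wraparound for clarity): ENC records each completed frame's sequence ID and moves on without waiting for BS, while BS starts frame `i` only when it is idle and the stored ENC sequence ID is at least `i`.

```python
def simulate(enc_finish, bs_cost, max_ticks=1000):
    """Return the tick at which BS starts each frame.

    enc_finish[i] is the tick at which ENC completes frame i (sequence ID i),
    in ascending order; bs_cost is how many ticks BS spends per frame.
    """
    stored = -1      # latest ENC sequence ID stored by the CDMA (-1: none yet)
    next_enc = 0     # next ENC completion to deliver
    next_bs = 0      # next BS frame; its wait-for-done marker equals next_bs
    bs_free = 0      # tick at which BS becomes idle
    starts = []
    for t in range(max_ticks):
        while next_enc < len(enc_finish) and enc_finish[next_enc] <= t:
            stored = next_enc  # ENC reports done and continues; it never waits for BS
            next_enc += 1
        if next_bs < len(enc_finish) and t >= bs_free and stored >= next_bs:
            starts.append(t)   # CDMA programs BS for frame next_bs
            bs_free = t + bs_cost
            next_bs += 1
        if next_bs == len(enc_finish):
            break
    return starts
```

For instance, with ENC finishing frames at ticks 3, 4, and 10 and BS needing 2 ticks per frame, ENC completes frame 1 while BS is still busy with frame 0, and BS simply picks it up at tick 5 from the stored value.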
In some embodiments, a CDMA may terminate processing when certain conditions are met. For example, the CDMA may receive a terminate command from FW.
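The terminate behavior recited in the examples of this disclosure (waiting for in-flight hardware threads, draining prefetched data, emptying the command queue, and confirming completion to FW) can be sketched as an ordered cleanup routine. `CdmaState`, `terminate`, and the confirmation tuple are hypothetical names used only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class CdmaState:
    """Hypothetical model of the state a CDMA cleans up on terminate."""
    busy_threads: set = field(default_factory=set)
    prefetch: deque = field(default_factory=deque)
    command_queue: deque = field(default_factory=deque)
    messages_to_fw: list = field(default_factory=list)


def terminate(state, thread_done_events):
    """Run the terminate steps; thread_done_events yields thread IDs as they complete."""
    # (i) Pause until in-flight hardware threads have completed.
    events = iter(thread_done_events)
    while state.busy_threads:
        tid = next(events, None)
        if tid is None:
            raise RuntimeError("hardware threads never completed")
        state.busy_threads.discard(tid)
    # (ii) Drain prefetched command data without executing it.
    drained = len(state.prefetch)
    state.prefetch.clear()
    # (iii) Empty the command queue.
    dropped = len(state.command_queue)
    state.command_queue.clear()
    # (iv) Confirm completion of the terminate operation to FW.
    state.messages_to_fw.append(("terminate_done", drained, dropped))
    return state
```

Waiting for the threads first keeps the CDMA from abandoning a HW thread mid-programming, and clearing the prefetch and queue afterward ensures nothing stale executes once FW restarts the CDMA.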
In some embodiments, a CDMA may time out under certain conditions. In some examples, between passes or frames, a CDMA may wait for completion from the corresponding HW thread. In order to recover from any hang scenarios, FW may enable timeout behavior and program a timeout value. Upon reaching the timeout value (e.g., after waiting for a hardware thread for a number of seconds, milliseconds, or other unit of time that matches the timeout value), a CDMA thread may generate a timeout message to send to FW and wait in the same state until receiving a message from FW. In some examples, FW may continue to wait after receiving the timeout message or may issue a terminate command to the CDMA.
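The timeout flow can be sketched as a tick-driven wait loop. This is a hypothetical model: `is_done` reports whether the HW thread has completed, and `fw_decide` stands in for FW's reply to a timeout message, either `"continue"` or `"terminate"`. Restarting the timer after FW chooses to keep waiting is an assumption, and the reply is modeled as arriving immediately rather than after the CDMA idles in the wait state.

```python
def wait_for_done(is_done, timeout_ticks, fw_decide, max_ticks=10_000):
    """Wait for a HW thread; on timeout, notify FW and stay in the wait state.

    Returns (outcome, tick, timeouts_sent), where outcome is "done" or "terminated".
    """
    timeouts = 0
    waited = 0
    for tick in range(max_ticks):
        if is_done(tick):
            return "done", tick, timeouts
        waited += 1
        if waited >= timeout_ticks:
            timeouts += 1  # timeout message sent to FW
            if fw_decide(tick) == "terminate":
                return "terminated", tick, timeouts
            waited = 0     # FW chose to keep waiting; restart the timer (assumption)
    raise RuntimeError("simulation limit reached")
```

For example, a thread that completes at tick 25 with a 10-tick timeout and a FW policy of always continuing produces two timeout messages before the wait succeeds.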
As described above, the systems and methods described herein may improve the efficiency of various computing processes, such as video transcoding, by using a special marker to communicate with a CDMA that receives commands from FW and reads from and writes to registers. By storing repeatedly accessed information and command sequences in buffers in memory that the CDMA can read via the command queue, the systems described herein may eliminate redundant reprogramming of the same information into buffers by FW between different sequences. The systems described herein may direct the CDMA via a special marker with different opcodes for different operations, such as debug, terminate, and wait-for-done. Using a wait-for-done command with a sequence ID that specifies a thread may enable the systems described herein to facilitate cross-thread dependency by maintaining and transmitting information about the current status of each thread, enabling threads to wait only for the relevant threads to finish processing rather than for all threads.
Example 1: A system for facilitating efficient hardware-firmware interactions may include (i) a plurality of memory registers, (ii) a hardware module that directly reads from and writes to the plurality of memory registers and is configured to interpret a special marker that distinguishes between register write operations and non-register-write operations, and (iii) a firmware module that directs the hardware module to perform operations at least in part by sending the special marker.
Example 2: The system of example 1, where the non-register-write operations include at least one of a register read operation, a wait-for-done operation, and/or a debug operation.
Example 3: The system of examples 1-2, where the special marker includes an address of a predefined special memory register and an operation code.
Example 4: The system of examples 1-3, where the firmware module prepares a list of commands stored in memory, the firmware module provides at least one address pointer and size for the list of commands to the hardware module, and the hardware module fetches the list of commands via the at least one address pointer and size.
Example 5: The system of examples 1-4, where the firmware module provides, to the hardware module, a plurality of address pointers that each point to a different segment of a single command in the list of commands.
Example 6: The system of examples 1-5, where the hardware module stores the at least one address pointer in a memory register within the plurality of memory registers.
Example 7: The system of examples 1-6, where the firmware module provides the at least one address pointer to the hardware module repeatedly at different points in time.
Example 8: The system of examples 1-7, where the hardware module receives a command to perform a wait-for-done operation, the hardware module pauses operating until detecting that a hardware thread has completed, and the hardware module resumes operating in response to detecting that the hardware thread has completed.
Example 9: The system of examples 1-8, where the command to perform the wait-for-done operation includes a sequence identifier and the hardware module facilitates cross-thread dependency by pausing operating until detecting that the hardware thread specified by the sequence identifier has completed.
Example 10: The system of examples 1-9, where the hardware module receives a command to perform a terminate operation and, in response, the hardware module pauses operating until detecting that at least one hardware thread has completed, drains prefetched data, empties a command queue, and confirms a completion of the terminate operation to the firmware module.
Example 11: The system of examples 1-10, where the hardware module receives a command from the firmware to perform a debug operation and in response, the hardware module writes data to memory that is accessible to the firmware.
Example 12: The system of examples 1-11, where the hardware module stores a timeout value that, when reached, prompts the hardware module to pause operating and send a timeout message to the firmware module.
Example 13: The system of examples 1-12, where the hardware module stores, in at least one memory register within the plurality of memory registers, a current status of the hardware module.
Example 14: A computer-implemented method for facilitating efficient hardware-firmware interactions may include (i) identifying a hardware module that directly reads from and writes to a plurality of memory registers and is configured to interpret a special marker that distinguishes between register write operations and non-register-write operations, (ii) sending, by a firmware module, a command to the hardware module directing the hardware module to perform a non-register-write operation via the special marker, (iii) receiving, by the hardware module, the command directing the hardware module to perform the non-register-write operation via the special marker, and (iv) performing, by the hardware module, in response to receiving the command, the non-register-write operation signified by the special marker.
Example 15: The computer-implemented method of example 14, where the non-register-write operation includes a register read operation and the hardware module performs the register read operation by reading data from a memory register within the plurality of memory registers.
Example 16: The computer-implemented method of examples 14-15, where (i) the non-register-write operation includes a wait-for-done operation, (ii) the hardware module performs the wait-for-done operation by pausing operating until the hardware module detects that a hardware thread has completed, and (iii) the hardware module resumes operating in response to detecting that the hardware thread has completed.
Example 17: The computer-implemented method of examples 14-16, where the command to perform the wait-for-done operation includes a sequence identifier and the hardware module facilitates cross-thread dependency by pausing operating until detecting that the hardware thread specified by the sequence identifier has completed.
Example 18: The computer-implemented method of examples 14-17, where the non-register-write operation includes a debug operation and the hardware module performs the debug operation by writing data to memory that is accessible to the firmware.
Example 19: The computer-implemented method of examples 14-18, where the non-register-write operation includes a terminate operation and the hardware module performs the terminate operation by (i) pausing operating until detecting that at least one hardware thread has completed, (ii) draining prefetched data, (iii) emptying a command queue, and (iv) confirming a completion of the terminate operation to the firmware module.
Example 20: An apparatus may include (i) a plurality of memory registers, (ii) a hardware module that directly reads from and writes to the plurality of memory registers and is configured to interpret a special marker that distinguishes between register write operations and non-register-write operations, and (iii) a hardware element configured to execute a firmware module that directs the hardware module to perform operations at least in part by sending the special marker.
As detailed above, the computing devices and systems described and/or illustrated herein broadly represent any type or form of computing device or system capable of executing computer-readable instructions, such as those contained within the modules described herein. In their most basic configuration, these computing device(s) may each include at least one memory device and at least one physical processor.
In some examples, the term “memory device” generally refers to any type or form of volatile or non-volatile storage device or medium capable of storing data and/or computer-readable instructions. In one example, a memory device may store, load, and/or maintain one or more of the modules described herein. Examples of memory devices include, without limitation, Random Access Memory (RAM), Read Only Memory (ROM), flash memory, Hard Disk Drives (HDDs), Solid-State Drives (SSDs), optical disk drives, caches, variations or combinations of one or more of the same, or any other suitable storage memory.
In some examples, the term “physical processor” generally refers to any type or form of hardware-implemented processing unit capable of interpreting and/or executing computer-readable instructions. In one example, a physical processor may access and/or modify one or more modules stored in the above-described memory device. Examples of physical processors include, without limitation, microprocessors, microcontrollers, Central Processing Units (CPUs), Field-Programmable Gate Arrays (FPGAs) that implement softcore processors, Application-Specific Integrated Circuits (ASICs), portions of one or more of the same, variations or combinations of one or more of the same, or any other suitable physical processor.
Although illustrated as separate elements, the modules described and/or illustrated herein may represent portions of a single module or application. In addition, in certain embodiments one or more of these modules may represent one or more software applications or programs that, when executed by a computing device, may cause the computing device to perform one or more tasks. For example, one or more of the modules described and/or illustrated herein may represent modules stored and configured to run on one or more of the computing devices or systems described and/or illustrated herein. One or more of these modules may also represent all or portions of one or more special-purpose computers configured to perform one or more tasks.
In addition, one or more of the modules described herein may transform data, physical devices, and/or representations of physical devices from one form to another. For example, one or more of the modules recited herein may receive image data to be transformed, transform the image data into a data structure that stores user characteristic data, output a result of the transformation to select a customized interactive ice breaker widget relevant to the user, use the result of the transformation to present the widget to the user, and store the result of the transformation to create a record of the presented widget. Additionally or alternatively, one or more of the modules recited herein may transform a processor, volatile memory, non-volatile memory, and/or any other portion of a physical computing device from one form to another by executing on the computing device, storing data on the computing device, and/or otherwise interacting with the computing device.
In some embodiments, the term “computer-readable medium” generally refers to any form of device, carrier, or medium capable of storing or carrying computer-readable instructions. Examples of computer-readable media include, without limitation, transmission-type media, such as carrier waves, and non-transitory-type media, such as magnetic-storage media (e.g., hard disk drives, tape drives, and floppy disks), optical-storage media (e.g., Compact Disks (CDs), Digital Video Disks (DVDs), and BLU-RAY disks), electronic-storage media (e.g., solid-state drives and flash media), and other distribution systems.
The process parameters and sequence of the steps described and/or illustrated herein are given by way of example only and can be varied as desired. For example, while the steps illustrated and/or described herein may be shown or discussed in a particular order, these steps do not necessarily need to be performed in the order illustrated or discussed. The various exemplary methods described and/or illustrated herein may also omit one or more of the steps described or illustrated herein or include additional steps in addition to those disclosed.
The preceding description has been provided to enable others skilled in the art to best utilize various aspects of the exemplary embodiments disclosed herein. This exemplary description is not intended to be exhaustive or to be limited to any precise form disclosed. Many modifications and variations are possible without departing from the spirit and scope of the instant disclosure. The embodiments disclosed herein should be considered in all respects illustrative and not restrictive. Reference should be made to the appended claims and their equivalents in determining the scope of the instant disclosure.
Unless otherwise noted, the terms “connected to” and “coupled to” (and their derivatives), as used in the specification and claims, are to be construed as permitting both direct and indirect (i.e., via other elements or components) connection. In addition, the terms “a” or “an,” as used in the specification and claims, are to be construed as meaning “at least one of.” Finally, for ease of use, the terms “including” and “having” (and their derivatives), as used in the specification and claims, are interchangeable with and have the same meaning as the word “comprising.”