This application claims priority under 35 U.S.C. §119(a) to an Indian Complete Patent Application Serial No. 201641002928, which was filed in the Indian Intellectual Property Office on Jan. 27, 2016, the entire content of which is incorporated herein by reference.
1. Field of the Disclosure
The present disclosure relates generally to a data processing system, and more particularly, to a method and an accelerator unit for interrupt handling.
2. Description of the Related Art
Generally, a processor communicatively coupled to a data processing system receives interrupts from an interrupt controller. As interrupts can be generated in a plurality of devices, such as peripheral devices external to the processor, an interrupt control system is usually used for collecting external interrupt signals received from a plurality of interrupt sources and sending the external interrupt signals to the processor as interrupt request signals (IRQs). Examples of the data processing system may include an embedded system, a mobile device, a computer, etc.
An IRQ notifies the processor of an occurrence of an irregular and/or exceptional event. An interrupt is usually the result of an event occurring outside of a central processor, such as an event occurring in a peripheral device, or of an event from an internal component, which requires the processor to pause current operations that are being executed and switch to an interrupt service routine (ISR), which may be stored in a memory. Generally, an ISR is specifically configured to handle the interrupt. Thereafter, the processor resumes a normal instruction routine.
Further, the interrupt controller may provide a vector address for each external interrupt signal. The interrupt controller may be associated with interrupt controller hardware, e.g., a vectored interrupt controller (VIC), a generic interrupt controller (GIC), etc., which is used to prioritize interrupts in order to provide the highest priority interrupt to the processor. Once the processor receives the vector address corresponding to the external interrupt signal from the VIC, the processor abandons a current execution routine and restarts any multi-cycle instruction, such as a load or store multiple registers instruction (e.g., LDM), PUSH, or POP, if the interrupt request received from the VIC is made partway through its execution. The ISR can then be fetched from an instruction cache, the memory, or a tightly-coupled memory (TCM), while the data cache lines fill up.
For example, the ARM® Cortex®-R4 takes 20 cycles to respond to an interrupt using a dedicated fast interrupt request (FIQ), which is the best case and assumes that an ISR is immediately available in the R4 TCM or cache. The worst case response increases to 30 cycles when using the IRQ.
Also, during post ISR scheduling, the time consumed by a scheduler decision procedure to identify the highest priority ready task to be activated, and by the context restore for the preempted or relinquished task, causes a delay in processing the IRQ.
While there are some conventional methods that attempt to reduce interrupt latency by using nested interrupt handlers, which allow further interrupts to occur while currently servicing an interrupt routine, these conventional methods are only used in some systems, and there is not much evidence about any actual reduction in interrupt latency. Even an optimal real time operating system (RTOS) requires a minimal program code to store a return state (context) when a running task is interrupted.
Additionally, a faster interrupt response is always critical in any data processing system. Thus, there exists a need to reduce interrupt latency for a deterministic behavior of the data processing system.
An aspect of the present disclosure is to provide a method and an accelerator unit for interrupt handling.
Another aspect of the present disclosure is to provide a mechanism for receiving an interrupt request at an accelerator unit.
Another aspect of the present disclosure is to provide a mechanism through which an accelerator unit is utilized for stacking a general purpose register or a plurality of general purpose registers in an inbuilt last in first out (LIFO) unit.
Another aspect of the present disclosure is to provide a mechanism through which an accelerator unit can send a vector address corresponding to an interrupt request to a processor to process the interrupt request.
Another aspect of the present disclosure is to provide a mechanism for receiving an interrupt request by a processor, from an accelerator unit, and processing the interrupt request.
Another aspect of the present disclosure is to provide a mechanism for detecting, by a processor, a scheduler indication associated with an interrupt request.
Another aspect of the present disclosure is to provide a mechanism for performing, by a processor, an action based on scheduler indication post processing of an interrupt request.
In accordance with an aspect of the present disclosure, a method is provided for interrupt handling. The method includes receiving, by an accelerator unit, an interrupt request; stacking, by the accelerator unit, a plurality of general purpose registers in an inbuilt last in first out (LIFO) unit; and sending, by the accelerator unit, a vector address corresponding to the interrupt request to a processor, which processes the interrupt request.
In accordance with another aspect of the present disclosure, a method is provided for handling an interrupt. The method includes receiving, by a processor from an accelerator unit, an interrupt request; processing, by the processor, the interrupt request; detecting, by the processor, a scheduler indication associated with the interrupt request; and performing, by the processor, an action based on the scheduler indication.
In accordance with another aspect of the present disclosure, an interrupt controller is provided for interrupt handling, including a processor; and an accelerator unit including an inbuilt last in first out (LIFO) unit. The accelerator unit is configured to receive an interrupt request, stack a plurality of general purpose registers in the inbuilt LIFO unit, and send a vector address corresponding to the interrupt request to the processor, which processes the interrupt request.
In accordance with another aspect of the present disclosure, an accelerator unit is provided, which includes an inbuilt last in first out (LIFO) unit for interrupt handling. The accelerator unit is configured to receive an interrupt request. Further, the accelerator unit is configured to stack a plurality of general purpose registers in the inbuilt LIFO unit. Furthermore, the accelerator unit is configured to send a vector address corresponding to the interrupt request to a processor to process the interrupt request.
In accordance with another aspect of the present disclosure, an accelerator unit is provided, which includes a master unit that receives an interrupt request; and an inbuilt last in first out (LIFO) unit in which a plurality of general purpose registers are stacked. The master unit is also configured to send a vector address corresponding to the interrupt request to a processor, which processes the interrupt request.
In accordance with another aspect of the present disclosure, a processor is provided for handling an interrupt. The processor is configured to receive an interrupt request from an accelerator unit; process the interrupt request; detect a scheduler indication associated with the interrupt request; and perform an action based on the scheduler indication.
The above and other aspects, features, and advantages of certain embodiments of the present disclosure will be more apparent from the following detailed description taken in conjunction with the accompanying drawings.
Various embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. In the following description, specific details such as detailed configuration and components are merely provided to assist the overall understanding of these embodiments of the present disclosure. Therefore, it should be apparent to those skilled in the art that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. In addition, descriptions of well-known functions and constructions are omitted for clarity and conciseness.
Throughout the drawings, similar reference characters may refer to corresponding features.
Further, examples used herein are intended merely to facilitate an understanding of ways in which the embodiments herein can be practiced and to further enable those skilled in the art to practice the embodiments herein. Accordingly, the examples should not be construed as limiting the scope of the disclosure.
Herein, the term “or” refers to a non-exclusive or, unless otherwise indicated.
Unless defined otherwise, all technical terms used herein have the same meanings as commonly understood by a person having ordinary skill in the art to which this disclosure belongs.
The term “application programming interface (API) dependent” may non-exclusively refer to an ISR of an interrupt that could invoke RTOS APIs affecting a task scheduling decision of a kernel.
The term “API independent” may non-exclusively refer to an ISR of an interrupt that may not invoke RTOS APIs affecting a task scheduling decision of a kernel.
According to an embodiment of the present disclosure, a method is provided for interrupt handling, which includes receiving, at an accelerator unit, an interrupt request, stacking, by the accelerator unit, a plurality of general purpose registers in an inbuilt LIFO unit, and sending, by the accelerator unit, a vector address corresponding to the interrupt request to a processor, which processes the interrupt request.
Further, the method may include detecting, by the accelerator unit, an end of the interrupt indication, and unstacking the plurality of general purpose registers from the inbuilt LIFO unit.
For example, for a long term evolution (LTE) modem operating at a higher user equipment (UE) capability, during peak data throughput, there will be a large number of interrupts that will be processed for uplink and downlink data transmissions. The interrupt latency of each interrupt adds to the time taken before the processor can schedule a task, which in turn affects the performance of the data processing system.
During LTE high data throughput, a protocol stack (modem) processor processes multiple interrupts every millisecond. Accordingly, the total number of interrupts processed in the complete system will be several thousand every second. The total interrupt count further increases for a multi-core platform, wherein a higher number of interrupts is used by an RTOS scheduler for inter-core communication.
Unlike the conventional mechanism, a method and system in accordance with an embodiment of the present disclosure can reduce the delay involved in processing the interrupt, both by an interrupt controller and a processor. Further, unique hardware logic is provided to reduce an interrupt latency and to improve performance of the data processing system.
Referring to
In the second phase, an interrupt latency due to the CPU is incurred when the CPU accesses the vector address corresponding to the IRQ received from the interrupt controller. Further, a processor associated with the CPU invokes a corresponding ISR to stack a plurality of general purpose registers associated with the CPU, to switch a register bank, to check whether the interrupt requires an ISR to be invoked, to locate or branch to a beginning of an interrupt handler, and to unstack the saved registers at the end of the ISR.
Thus, the processor performs an independent operation for stacking and unstacking the CPU registers during an interrupt. Fetching, decoding, and executing pipeline activities are also reset at the IRQ, and consequently, the processor resumes by fetching an instruction into the pipeline after the hardware interrupt process. As a result of the latency incurred in the first phase and the second phase, there are significant delays in the data processing system due to the interrupt latency of the hardware and the processor, before the actual ISR is invoked.
Referring to
For example, as illustrated in
At first, task-A is suspended for a message on Queue-X. While task-A is suspended, the kernel schedules task-B.
When task-B is scheduled by the kernel, task-B posts a message to Queue-X. As a result, the kernel identifies that there is a higher priority task waiting on Queue-X. Thus, the kernel preempts task-B, and resumes task-A. This is a predictable preemption mode with respect to task scheduling.
As another example, for an interrupt driven (forced) preemption, where task-B is currently running, whenever an external interrupt is signaled, the control jumps to an interrupt handler, preempting task-B execution. The kernel stores the context of preempted task-B on the stack. When the received interrupt signal has been serviced, the kernel invokes the scheduler to identify the highest priority ready task (task-B in this case) and restores the preempted task (task-B context from the stack).
This preemption is forced (or induced) because of asynchronous external factors like interrupts. Generally, after the completion of the ISR, the original operation is resumed from the point where it was preempted.
Alternatively, when the priority of task-A is greater than the priority of task-B and the priority of task-B is greater than the priority of task-C, at first, task-A is suspended for a message on Queue-X. While task-A is suspended, the kernel schedules task-B.
Whenever an external interrupt is signaled, the CPU program control (e.g., referred to as a program counter of a computer processor) jumps to the interrupt handler, preempting task-B execution. The kernel stores the preempted task context in the task-B stack.
After the signaled interrupt, the kernel sends a message to Queue-X, thereby activating task-A. Thus, the kernel invokes a scheduler to identify the highest priority ready task (task-A in this case) and restores the relinquished task (task-A context) from the stack.
The kernel resumes task-A from the point where it had relinquished the processor and defers execution of preempted task-B.
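By way of illustration only, the two interrupt driven cases described above may be modeled with the following minimal Python sketch. The helper names (`pick_highest_ready`, `post_isr_schedule`), the numeric priority values, and the use of task names as dictionary keys are hypothetical and are not part of the disclosure; the sketch merely models which task the kernel resumes after the ISR completes.

```python
# Illustrative model only. Assumption: a larger number means higher priority.
PRIORITY = {"task-A": 3, "task-B": 2, "task-C": 1}


def pick_highest_ready(ready):
    """Return the ready task with the highest priority."""
    return max(ready, key=PRIORITY.__getitem__)


def post_isr_schedule(running, ready, isr_posts_to_queue_x, waiting_on_queue_x):
    """Model the kernel's decision after an ISR has preempted `running`.

    running: task preempted by the interrupt (its context is on its stack).
    ready: other tasks that are ready to run.
    isr_posts_to_queue_x: True if a message is sent to Queue-X.
    waiting_on_queue_x: task suspended on Queue-X, or None.
    """
    candidates = set(ready) | {running}      # the preempted task stays ready
    if isr_posts_to_queue_x and waiting_on_queue_x is not None:
        candidates.add(waiting_on_queue_x)   # the message activates the waiter
    return pick_highest_ready(candidates)


# Case 1: nothing is posted to Queue-X, so preempted task-B is resumed.
resumed_plain = post_isr_schedule("task-B", {"task-C"}, False, "task-A")

# Case 2: a message is posted to Queue-X, so task-A is activated and runs
# first, and execution of preempted task-B is deferred.
resumed_after_post = post_isr_schedule("task-B", {"task-C"}, True, "task-A")
```

In the first case the scheduler decision merely confirms the preempted task, which is the situation in which the decision procedure adds latency without changing the outcome.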
Referring to
Referring to
Unlike the conventional mechanism illustrated in the
Specifically, the accelerator unit 202 receives an IRQ from an interrupt request logic associated with the interrupt controller 200a. The control register unit 211 is accessed by a processor unit 200b, coupled to the interrupt controller 200a, to configure the accelerator unit 202. The CPU registers are accessed from the accelerator unit 202 through the master unit 212 associated with the bus unit 213. The bus unit 213 can read all of the CPU register values at once and store the register values into the inbuilt LIFO unit 214. The inbuilt LIFO unit 214 may be implemented using a general LIFO memory logic. The master unit 212 and the control register unit 211 may have logical hardware that is implemented based on the functionalities discussed throughout this disclosure.
When an interrupt is received from an IRQ controller at the accelerator unit 202, the master unit 212 receives the interrupt. The master unit 212 finds the number of CPU registers to be stored in the inbuilt LIFO unit 214 by accessing the control register unit 211.
The control register unit 211 is configured by the program running in the processor unit 200b during system initialization. The control register unit 211 contains information about the number of CPU registers to be stored as part of a context store operation, which depends on the RTOS used in the system. The master unit 212 performs the context store operation by accessing the CPU register values from the processor unit 200b using the inbuilt bus unit 213 and stores the CPU register values to the inbuilt LIFO unit 214. Upon completing the context store operation, the master unit 212 triggers the interrupt nIRQ to the processor unit 200b and also provides the vector address.
Upon completing the execution of the ISR for the triggered interrupt, the processor unit 200b clears the interrupt vector to indicate a completion of interrupt processing. The master unit 212 receives the indication and performs a context restore operation by fetching the CPU register values back from the inbuilt LIFO unit 214 to the processor unit 200b using the bus unit 213, for the number of CPU registers configured in the control register unit 211.
Further, the CPU registers can be stacked in the inbuilt LIFO unit 214 while the vector address is provided to the CPU. The number of registers pushed to the stack can be variably decided by the CPU and configured accordingly in the control register unit 211.
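By way of illustration only, the context store and context restore operations described above may be modeled in software as follows. The class name `AcceleratorUnit`, its method names, the register values, and the vector address `0x18` are hypothetical and are chosen only for this sketch; the actual accelerator unit 202 is hardware logic, and the sketch does not model bus timing or cycle counts.

```python
class AcceleratorUnit:
    """Toy software model of the accelerator: control register, bus, LIFO."""

    def __init__(self, num_regs_to_save):
        # Control register unit: configured once during system initialization.
        self.num_regs_to_save = num_regs_to_save
        # Inbuilt LIFO unit: each entry holds one saved register context.
        self.lifo = []

    def on_interrupt(self, cpu_regs, vector_address):
        """Context store, then provide the vector address to the processor."""
        saved = list(cpu_regs[: self.num_regs_to_save])  # bus reads registers
        self.lifo.append(saved)                          # push onto the LIFO
        return vector_address                            # provided with nIRQ

    def on_end_of_interrupt(self, cpu_regs):
        """Context restore once the processor clears the interrupt."""
        saved = self.lifo.pop()                          # most recent context
        cpu_regs[: len(saved)] = saved                   # write registers back


# Hypothetical usage: 17 register values, of which the RTOS saves 13.
cpu_regs = list(range(100, 117))
original = list(cpu_regs)
accel = AcceleratorUnit(num_regs_to_save=13)

vector = accel.on_interrupt(cpu_regs, 0x18)
cpu_regs[0:13] = [0] * 13            # the ISR overwrites the saved registers
accel.on_end_of_interrupt(cpu_regs)  # registers return to pre-interrupt values
```

Because the saved contexts are held in last in first out order, the same model extends to nested interrupts: each further interrupt pushes another context, and each end of interrupt indication pops the most recent one.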
The processor unit 200b, associated with the CPU, can be configured to process the vector address corresponding to the IRQ received from the interrupt controller 200a.
The inbuilt LIFO unit 214, which is associated with the interrupt controller 200a, may be an inbuilt dedicated memory (e.g., a Random Access Memory (RAM)).
The register unit 204 included in the processor unit 200b is used to store the register values (i.e., CPU registers).
Although
Further, the labels or names of the components in
Referring to
At this time, a bus unit associated with the accelerator unit 302 reads the CPU registers at once and stores the required number of CPU registers into an inbuilt LIFO unit. For example, the stored values can be the CPU context during the interrupt.
A processor unit that processes the IRQ (included in the CPU) can therefore initiate the ISR without storing any CPU register context, which saves the approximately 17 cycles otherwise needed to store all 17 ARM® Cortex®-R4 registers (R0-R15 and the CPSR (Current Program Status Register)). In general, ‘n’ cycles can be saved based on the RTOS implementation, where the CPU registers stored can vary between R0-R(n−1) (where n<=17).
Upon completing the ISR, the processor unit clears the interrupt, which in turn informs the interrupt controller 300 that the ISR is completed and the interrupt can be marked as inactive. During this operation, the accelerator unit 302 restores the CPU registers from the inbuilt LIFO unit.
The same implementation is applicable to the interrupt controller 300 for nested interrupt and interrupt prioritization.
The memory size for an inbuilt LIFO unit can be selected based on a memory required to store a worst case number of CPU registers and depth of nested interrupt supported.
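By way of illustration only, the sizing rule above can be expressed as a short calculation. The function name `lifo_size_bytes`, the default register width of 4 bytes (32-bit registers), and the example nesting depth of 4 are assumptions made for this sketch and are not values stated in the disclosure.

```python
def lifo_size_bytes(worst_case_regs, nesting_depth, reg_width_bytes=4):
    """Memory needed to hold one worst-case context per nesting level."""
    if worst_case_regs < 0 or nesting_depth < 0 or reg_width_bytes <= 0:
        raise ValueError("sizes must be non-negative and width positive")
    return worst_case_regs * nesting_depth * reg_width_bytes


# Hypothetical example: all 17 registers saved, 4 levels of nesting supported.
size_full_context = lifo_size_bytes(worst_case_regs=17, nesting_depth=4)

# Hypothetical example: an RTOS that saves only R0-R5 (6 registers).
size_small_context = lifo_size_bytes(worst_case_regs=6, nesting_depth=4)
```

Under these assumptions the first configuration needs 272 bytes and the second needs 96 bytes, which shows how the RTOS-dependent register count drives the memory requirement.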
For example, Tables 1 and 2 illustrate profiling results of an RTOS following cycle consumption using a hardware platform based on the ARM® Cortex®-R4 with a vector interrupt controller.
As shown above, there is a 10 to 26 cycle reduction in the above data if the CPU register context store and restore occur using an accelerator unit, as described above (i.e., a flat 26 cycle reduction for an RTOS storing R0 to R12 and a 10 cycle reduction for certain RTOSs storing only R0-R5).
Referring to
In step 404, the accelerator unit 202 stacks a plurality of general purpose registers in an inbuilt LIFO unit.
In step 406, the accelerator unit 202 sends the vector address corresponding to the IRQ to the processor unit 200b to process the IRQ received.
In step 408, the processor unit 200b processes the IRQ received from the accelerator unit 202.
In step 410, the processor unit 200b detects a scheduler indication associated with the IRQ.
In step 412, the processor unit 200b performs an action based on the scheduler indication post processing of the IRQ.
For example, a VIC associated with the accelerator unit 202 receives the interrupt. The VIC processes the received interrupt and provides the vector address based on the priority of the interrupt to the CPU.
Unlike the conventional mechanism, the method illustrated in
Furthermore, the accelerator unit 202 communicates with the processor unit 200b associated with the CPU. The processor unit 200b therefore executes the ISR and provides an end of interrupt indication to the accelerator unit 202. The CPU registers are then restored from the inbuilt LIFO unit, and the status of the interrupt is updated (marked as inactive).
The various actions, acts, blocks, steps, etc., as illustrated in
For example, the proposed mechanism can be broadly classified as API Independent, which does not invoke any RTOS API that would impact a scheduling decision, or API dependent, which invokes RTOS API that might impact a post interrupt scheduling decision.
Because the particular classification is known at the beginning of the interrupt processing through the accelerator unit 202 and the interrupt controller 200a, invoking the scheduler at the end of interrupt processing can be avoided for API independent ISRs, and the amount of time spent in context store and context restore operations is reduced.
Referring to
In step 604, the processor unit 200b receives an external interrupt. Each interrupt may be associated with a ‘sticky bit’ that categorizes the associated ISR as being API dependent or API independent. A user (e.g., a programmer), at the time of defining or registering the ISR, configures the sticky bit, which could be stored in a global interrupt table.
In step 606, at the accelerator unit 202, the CPU program control jumps to the interrupt handler, preempting task-A execution, and the kernel stores the preempted task context in the task-A stack.
In step 608, the processor unit 200b receives an external interrupt for the ISR, from the interrupt handler associated with the accelerator unit 202.
In step 610, the processor unit 200b determines whether to invoke an API dependent configuration.
If the processor unit 200b determines that the API dependent configuration is to be invoked, the processor unit 200b invokes a kernel scheduler, based on the information available in the global interrupt table.
In step 614, the kernel scheduler determines whether task-A is still the highest priority ready task.
If task-A is no longer the highest priority ready task, the processor unit 200b restores the context for the next highest priority ready task in step 616. If task-A is still the highest priority ready task in step 614 (or the API dependent configuration is not to be invoked in step 610), the processor unit 200b performs a context restore for task-A in step 618.
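By way of illustration only, the post ISR decision of steps 610 through 618 may be sketched as follows. The table contents, the interrupt numbers 5 and 9, and the function name `post_isr_action` are hypothetical; the scheduler is passed in as a callable so that the sketch can show that it is not invoked at all for an API independent ISR.

```python
API_DEPENDENT = True
API_INDEPENDENT = False

# Hypothetical global interrupt table: interrupt number -> sticky bit.
INTERRUPT_TABLE = {5: API_DEPENDENT, 9: API_INDEPENDENT}


def post_isr_action(irq, preempted_task, run_scheduler):
    """Return (task whose context is restored, whether scheduler was invoked).

    run_scheduler: callable returning the highest priority ready task.
    """
    if INTERRUPT_TABLE[irq] == API_INDEPENDENT:
        # Step 610 "no" branch: skip the scheduler, restore the preempted task.
        return preempted_task, False
    # Step 610 "yes" branch: invoke the kernel scheduler (step 614 decision).
    highest_ready = run_scheduler()
    # Steps 616/618: restore whichever task is now the highest priority ready.
    return highest_ready, True


scheduler_calls = []


def scheduler():
    """Hypothetical scheduler stub: records the call, reports task-B ready."""
    scheduler_calls.append("called")
    return "task-B"


result_independent = post_isr_action(9, "task-A", scheduler)
result_dependent = post_isr_action(5, "task-A", scheduler)
```

Here the API independent interrupt returns directly to task-A without any scheduler call, whereas the API dependent interrupt invokes the scheduler once and restores the task it selects.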
For example, one possible approach for implementation on a current ARM® based platform is to map all API dependent ISRs to the nIRQ and all API independent ISRs to the nFIQ, and to perform only a minimal context store (e.g., an SRS instruction, saving the CPSR (Current Program Status Register) and the LR (Link Register) in the task stack) for API independent ISRs, as shown.
The nIRQ is the interrupt vector number, where ‘n’ stands for the interrupt number, which ranges from 1 to the total number of interrupts supported in the system.
The term nIRQ is generic ARM literature terminology to denote interrupt number n. Similarly, the “n” in “nFIQ” stands for the fast interrupt number, which ranges from 1 to the total number of fast interrupts supported in the system, and nFIQ is generic ARM literature terminology to denote fast interrupt number n.
Nested interrupt support can also be provided, wherein, after the first invocation of an API dependent ISR, all further ISRs (i.e., API dependent and API independent) can be treated as API dependent ISRs.
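By way of illustration only, the nesting rule above may be sketched as follows. The function name `classify_nested` and the representation of the sticky bits as a list of booleans (True for API dependent), ordered from the outermost ISR to the innermost, are assumptions made for this sketch.

```python
def classify_nested(sticky_bits):
    """Return the effective classification of each nested ISR.

    sticky_bits: configured sticky bits, outermost ISR first
                 (True = API dependent, False = API independent).
    Once an API dependent ISR has been invoked, every later ISR in the
    nest is treated as API dependent, regardless of its own sticky bit.
    """
    effective = []
    seen_dependent = False
    for bit in sticky_bits:
        seen_dependent = seen_dependent or bit
        effective.append(seen_dependent)
    return effective


# Hypothetical nest: independent, dependent, independent, independent.
effective_bits = classify_nested([False, True, False, False])
```

In this example only the outermost ISR keeps its API independent treatment; the two innermost ISRs are promoted because they run after an API dependent ISR has already been invoked.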
For example, Table 3 below shows the latency reduction during post ISR activities.
As shown in Table 3, there is a flat reduction of 37 cycles incurred in scheduler activities, which is an approximate 40% reduction of post ISR scheduling latency.
Although the method and the accelerator unit 202 described in the aforementioned embodiments are for interrupt handling, it is to be understood that other embodiments are not limited thereto. A person having ordinary skill in the art may identify that the proposed method or accelerator unit 202 can be used to perform various operations in a process, a thread component, a task, a job, etc., thereby decreasing the number of CPU cycles as compared with the conventional mechanisms.
For example, in conventional systems, when a user generated instruction requires information to be read from a register, the CPU uses PUSH and POP operations to read the information from the register. The CPU allocates a separate CPU cycle for each such PUSH or POP operation.
Unlike the conventional mechanisms, the proposed accelerator unit 202 provides an option to trigger hardware assisted PUSH and POP operations within a single CPU cycle.
Specifically, the CPU instructs the accelerator unit 202 to read the information from the register. The accelerator unit 202 uses a bus unit to read all of the registers at once via PUSH and POP operations within a single CPU cycle, thereby reducing the delay involved in the CPU instruction cycle.
Referring to
The storage unit 712 may include one or more computer-readable storage media. The storage unit 712 may include non-volatile storage elements. Examples of such non-volatile storage elements may include magnetic hard discs, optical discs, floppy discs, flash memories, or forms of electrically programmable memories (EPROM) or electrically erasable and programmable (EEPROM) memories. In addition, the storage unit 712 may, in some embodiments, be considered a non-transitory storage medium. The term “non-transitory” may indicate that the storage medium is not embodied in a carrier wave or a propagated signal. However, the term “non-transitory” should not be interpreted to mean that the storage unit 712 is non-movable. The storage unit 712 may be configured to store larger amounts of information than the memory 710. For example, a non-transitory storage medium may store data that can, over time, change (e.g., in Random Access Memory (RAM) or cache).
The overall computing environment 702 can be composed of multiple homogeneous and/or heterogeneous cores, multiple CPUs of different kinds, special media and other accelerators.
The processing unit 708 is responsible for processing the instructions of the technique. Further, a plurality of processing units 708 may be located on a single chip or over multiple chips.
The instructions and code required to implement the technique are stored in the memory unit 710 and/or the storage 712.
At the time of execution, the instructions may be fetched from the corresponding memory 710 and/or storage 712, and executed by the processing unit 708.
In the case of hardware implementations, various networking devices 716 or external I/O devices 714 may be connected to the computing environment to support the implementation through the networking unit and the I/O device unit.
The networking device 716 may be used to perform the instructions received from an accelerator unit. The networking device 716 may be used to communicate interrupt signals associated with the IRQ with the various units of an interrupt controller and further with the various units of the CPU.
The embodiments disclosed herein can be implemented through at least one software program running on at least one hardware device and performing network management functions to control the elements. The elements illustrated in
While the present disclosure has been particularly shown and described with reference to certain embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present disclosure as defined by the following claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
201641002928 | Jan 2016 | IN | national |