The present invention relates generally to a hardware-based approach to calculating CPU utilization.
A real-time operating system is an operating environment for software that allows a processor to perform multiple time-critical tasks according to predetermined execution frequencies and execution priorities. Such an operating system includes a complex methodology for scheduling the various tasks such that each task is completed prior to the expiration of its deadline. During software development, it is important to understand the typical processor utilization to ensure that the code is sufficiently compact and that all deadlines are met.
A method of determining processor utilization includes: counting, via a first counter on a processor, a number of elapsed clock cycles while code is being executed; counting, via a second counter on the processor, a total number of free-running clock cycles; and dividing the number of clock cycles during which code is being executed by the total number of free-running clock cycles to determine a CPU utilization.
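The division step may be illustrated by the following minimal sketch, which assumes the two counter values have already been read. The function name `cpu_utilization` and the sample counts are hypothetical and are used only for illustration.

```python
def cpu_utilization(busy_cycles: int, total_cycles: int) -> float:
    """Return the fraction of free-running clock cycles spent executing code."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    if not 0 <= busy_cycles <= total_cycles:
        raise ValueError("busy_cycles must lie between 0 and total_cycles")
    return busy_cycles / total_cycles


# Hypothetical readings: 250,000 busy cycles out of 1,000,000 total cycles.
utilization = cpu_utilization(250_000, 1_000_000)
print(f"{utilization:.1%}")  # prints "25.0%"
```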
In one configuration, the processor may include an instruction execution unit configured for software code execution, and a performance monitor unit configured to monitor the performance of the instruction execution unit. The performance monitor unit may be configured to operate separate from the instruction execution unit, and may maintain the first counter in a first register.
The step of counting the number of elapsed clock cycles where code is being executed may include: initializing the first counter to a predetermined value; detecting, via hardware, the start of an interrupt service routine; unfreezing the first counter to allow the counter to begin incrementing clock cycles; detecting, via hardware, the completion of the interrupt service routine; freezing the first counter to prevent the counter from further incrementing; and determining the number of clock cycles that have elapsed since the first counter was initialized.
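The counting steps above may be sketched as a small software model. This is an illustration only: in the described method the start and completion of the interrupt service routine are detected in hardware, whereas here the hypothetical methods `on_isr_start`, `on_isr_complete`, and `clock_tick` stand in for those hardware events. The class name and the initial value of 100 are likewise assumptions.

```python
class GatedCycleCounter:
    """Software model of a counter that increments only while unfrozen."""

    def __init__(self, initial_value: int = 0) -> None:
        self.initial_value = initial_value  # the "predetermined value"
        self.value = initial_value
        self.frozen = True  # assumed to start frozen until an ISR begins

    def on_isr_start(self) -> None:
        """Stand-in for hardware detecting the start of an ISR."""
        self.frozen = False

    def on_isr_complete(self) -> None:
        """Stand-in for hardware detecting the completion of an ISR."""
        self.frozen = True

    def clock_tick(self, cycles: int = 1) -> None:
        """Advance the clock; the counter increments only while unfrozen."""
        if not self.frozen:
            self.value += cycles

    def elapsed(self) -> int:
        """Cycles counted since the counter was initialized."""
        return self.value - self.initial_value


counter = GatedCycleCounter(initial_value=100)
counter.clock_tick(40)     # background idle: counter stays frozen
counter.on_isr_start()
counter.clock_tick(25)     # ISR executing: counter increments
counter.on_isr_complete()
counter.clock_tick(35)     # idle again: counter stays frozen
print(counter.elapsed())   # prints 25
```

The model ignores counter overflow, which a real register of finite width would need to account for.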
The above features and advantages and other features and advantages of the present invention are readily apparent from the following detailed description of the best modes for carrying out the invention when taken in connection with the accompanying drawings.
Referring to the drawings, wherein like reference numerals are used to identify like or identical components in the various views,
The present method 10 presents a substantially hardware-based approach to determining CPU utilization, which does not require software intervention to operate. This method may be used, for example, with any processor having a performance monitor unit that is separate from the core's general instruction execution unit.
In general, the performance monitor unit (or other hardware equivalents) is a customizable portion of the core that can count and/or time any of a number of predefined events. The performance monitor unit may be a fully autonomous logic circuit having customizable behavior according to the states of various dedicated memory registers. In other embodiments, the performance monitor unit may include certain low-level, dedicated processing capabilities to allow it to function in the manner described below. As presently configured, the performance monitor unit may be configured to allow the first counter to begin incrementing whenever an interrupt service routine (ISR) is being executed, and may suspend incrementing of the first counter when the ISR has completed and/or when the instruction execution unit has reverted back to a “background idle” task state.
The memory module 24 may be, for example, non-volatile memory that is either on-board the processor 20, or readily accessible by the processor 20. The memory module 24 may include program memory 40 that includes a plurality of interrupt service routines (ISRs) (i.e., ISRs 42, 44, 46, 48, 50). Each ISR may be embodied by software code that is organized into a plurality of sequential commands to accomplish a particular task or computation. Each ISR may be assigned a respective frequency and/or priority at which it should be executed by the core 22.
Within the core 22, the instruction execution unit 26 may be responsible for general software code execution. The instruction execution unit 26 may be in communication with the memory module 24 via a communications bus 60, and may include a plurality of volatile general purpose registers 62, 64, 66. During software execution, the instruction execution unit 26 may load and execute the various ISRs in a manner that respects their ideal execution frequency and/or priority. A programmable interrupt controller 68, for example, may schedule/prioritize the various ISRs for the instruction execution unit 26, and/or may manage one or more Interrupt Requests (IRQs). Based on the requested execution frequencies and timing, there may be periods of time in which the instruction execution unit 26 has completed the execution of an ISR, and has not yet been instructed to begin a subsequent ISR. In these periods of time, the instruction execution unit 26 may operate in a “background idle” state, where it may execute other non-time-critical tasks and/or wait for the next interrupt to occur. While this description of code execution is likely an oversimplification of the operation of a typical microprocessor, it should be viewed as generally illustrative of the handling of ISRs in a real-time operating environment.
The performance monitor unit 28 may be in communication with a clock 30/oscillator that sets the cadence for all operations within the processor 20. In general, the clock 30 alternates between two states (i.e., high (1) and low (0)) on a regular and periodic basis. One cycle of the clock 30 may equal one full “high” state, and one full “low” state.
The performance monitor unit 28 may further include a first register 80 and a second register 82. Each of the first register 80 and second register 82 may be configured as a counter to count cycles of the clock 30. The performance monitor unit 28 may be configured to “freeze” the first register 80 (i.e., temporarily suspend it from further counting) while the instruction execution unit 26 is in a background idle state, and to “unfreeze” the first register 80 (i.e., allow it to count/increment) while the instruction execution unit 26 is executing code from an ISR. Conversely, the second register 82 may be configured to continuously count clock cycles on a free-running basis, regardless of the behavior of the instruction execution unit 26.
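The relationship between the two registers may be illustrated with a per-cycle trace. In the following sketch, the trace encoding (one character per clock cycle, `I` for ISR code executing and `.` for background idle) and the function name `count_cycles` are assumptions made for illustration; they do not describe the actual register interface.

```python
def count_cycles(trace: str) -> tuple[int, int]:
    """Model both registers over a trace with one character per clock cycle.

    'I' = ISR code executing, '.' = background idle (hypothetical encoding).
    Returns (first_register, second_register).
    """
    first_register = 0   # gated: frozen during background idle
    second_register = 0  # free-running: counts every cycle
    for state in trace:
        if state not in "I.":
            raise ValueError(f"unexpected trace symbol: {state!r}")
        second_register += 1
        if state == "I":
            first_register += 1
    return first_register, second_register


busy, total = count_cycles("II..III...")
print(busy, total)  # prints "5 10"
```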
The performance monitor unit 28 may selectively freeze and unfreeze the incrementing of the first register 80 specifically at the direction of a control bit 84 within the MSR 32 (i.e., the performance monitor mark (PMM) bit 84). More particularly, in one configuration, the PMM bit 84 may be set low when an interrupt occurs (i.e., when an ISR is called or initiated), and may return high when the ISR completes and/or when the instruction execution unit 26 returns to a background idle state. In one configuration, the PMM bit 84 may be toggled automatically between high and low states by the CPU 22 when an ISR is called/completed. For example, in one configuration, upon entry into an ISR, the CPU 22 may automatically (via hardware) set the PMM bit 84 low. Upon completion of the ISR, the CPU 22 may return the PMM bit 84 to whatever it was previously set to prior to that ISR. In addition to automatic hardware manipulation, the PMM bit 84 may be manually set to a particular value by software code that may be executed via the instruction execution unit 26. Said another way, in one configuration, PMM bit 84 in the MSR 32 may always be automatically cleared by the CPU 22 and then restored by the CPU 22 at the respective beginning and end of every interrupt. The code that is then executed within the interrupt may also selectively alter the state of the PMM bit 84 at a time between the hardware manipulations.
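The clear-and-restore behavior of the PMM bit may be sketched as follows. The class `PmmBitModel` and its method names are hypothetical, and the use of a stack to hold the prior bit value across nested interrupts is an assumption of this sketch rather than a statement of how the hardware stores it.

```python
class PmmBitModel:
    """Software model of the PMM bit being cleared and restored around ISRs."""

    def __init__(self, initial_high: bool = True) -> None:
        self.pmm_high = initial_high
        self._saved: list[bool] = []  # assumed storage for prior bit values

    def enter_isr(self) -> None:
        """Hardware action on ISR entry: save the bit, then set it low."""
        self._saved.append(self.pmm_high)
        self.pmm_high = False

    def exit_isr(self) -> None:
        """Hardware action on ISR completion: restore the prior value."""
        self.pmm_high = self._saved.pop()

    def software_set(self, high: bool) -> None:
        """Software running inside the ISR may alter the bit manually."""
        self.pmm_high = high


pmm = PmmBitModel(initial_high=True)  # background idle: bit high
states = [pmm.pmm_high]
pmm.enter_isr()                       # first ISR begins: bit cleared
states.append(pmm.pmm_high)
pmm.enter_isr()                       # nested ISR begins: bit stays low
states.append(pmm.pmm_high)
pmm.software_set(True)                # ISR code sets the bit manually
states.append(pmm.pmm_high)
pmm.exit_isr()                        # nested ISR completes: prior value (low)
states.append(pmm.pmm_high)
pmm.exit_isr()                        # first ISR completes: prior value (high)
states.append(pmm.pmm_high)
print(states)  # prints [True, False, False, True, False, True]
```

In this model, a manual change made by software inside an ISR persists only until that ISR completes, at which point the earlier value is restored.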
Periodically, and at a low priority, an ISR (e.g., ISR 48) may interface with the first and/or second performance monitor unit registers 80, 82 to compute a CPU utilization rate (i.e., step 16 from
Using the number of clock cycles counted by the first counter, total CPU utilization may be computed in two slightly differing manners.
As shown in
While the method 110 illustrated in
As shown in
While the methods 110, 130 described above are useful in determining a total processor utilization rate (i.e., processor utilization across all ISRs), the performance monitor unit 28 may also be used to aid in determining a utilization rate for one or more specific tasks (rather than for all tasks, as described above with respect to
In a task-specific monitoring configuration, the performance monitor unit 28 may be initialized to count clock cycles (i.e., increment register 80) only while the PMM bit 84 is set high (as opposed to when it is set low, which is described above with respect to
While the best modes for carrying out the invention have been described in detail, those familiar with the art to which this invention relates will recognize various alternative designs and embodiments for practicing the invention within the scope of the appended claims. The states of “high” and “low” for the PMM bit 84 should not be read as specifically limiting, though they should be understood as being distinct from each other. It is contemplated that the performance monitor unit 28 may be configured to freeze a counter at a high state and unfreeze it at a low state, or vice versa. It is intended that all matter contained in the above description or shown in the accompanying drawings shall be interpreted as illustrative only and not as limiting.