This disclosure is related to multi-core processors.
The statements in this section merely provide background information related to the present disclosure. Accordingly, such statements are not intended to constitute an admission of prior art.
Processors are electronic devices configured with one or more central processing units (CPUs) and associated memory and storage devices that execute routines to perform tasks. Performance of a processor may be improved by increasing the clock speed of the CPU, resulting in faster execution of routines. There is an upper bound on clock speed and associated processor performance due to mechanical, electrical and thermal limitations of the processor hardware and interface devices.
Multi-core processors have been introduced to improve performance in executing routines to perform tasks. In such architectures, the presence of multiple processing cores enables true parallel task execution. However, tasks simultaneously executing on different cores may need to synchronize and/or coordinate with one another because of application-level requirements.
A method for managing task execution in a multi-core processor includes employing a spinlock to effect a dynamically enforceable mutual exclusion constraint and employing a multi-processor priority ceiling protocol to effect the dynamically enforceable mutual exclusion constraint to synchronize a plurality of tasks executing in first and second processing cores of the multi-core processor.
One or more embodiments will now be described, by way of example, with reference to the accompanying drawings, in which:
Referring now to the drawings, wherein the showings are for the purpose of illustrating certain exemplary embodiments only and not for the purpose of limiting the same,
Each of a plurality of tasks Ti includes a sequence of runnables, wherein m(i) denotes the number of runnables belonging to task Ti. The individual runnables are denoted as Ri,1 through Ri,m(i). The worst-case execution time of each runnable Ri,j is assumed to be known and is denoted as Ci,j. The cumulative worst-case execution time of task Ti is the sum of all constituent runnable execution times, which is denoted as Ci (i.e., Ci=Ci,1+Ci,2+ . . . +Ci,m(i)). The term P(Ti) denotes the processing core to which task Ti is assigned. The first runnable Ri,1 of any task Ti is assumed to be triggered either periodically every Ti or by an event Ei,1 set by another task with period Ti. All subsequent runnables Ri,j (j>1) are assumed to be triggered by either the completion of the previous runnable Ri,j-1 or an external event Ei,j. Each runnable Ri,j is also given an offset Oi,j>0 such that the runnable is eligible to execute only after Oi,j time units have elapsed since the corresponding release of Ri,1.
The runnable that triggers or sets an event Ei,j is denoted by πi,j. In the scenario that a runnable Ri,j (j>1) is triggered by the previous runnable Ri,j-1, Ei,j is set by πi,j=Ri,j-1. For convenience, πi,j=∅ for any runnable Ri,1 that is assumed to be triggered periodically every Ti. Each task has a deadline equal to its period Ti. This assumption follows from the fact that another iteration of task Ti will start if Ti does not complete in Ti time units. The priority assignment is assumed to follow rate-monotonic scheduling, in which tasks with shorter periods are assigned higher scheduling priorities. Without loss of generality, the task set is given in non-decreasing order of periods and thus non-increasing order of priorities. The term hp(Ti) is employed to denote the set of tasks with higher priority than Ti, and lp(Ti) is employed to denote the set of tasks with lower priority than Ti. The term p(Ti) is employed to denote the priority of task Ti. For any lock M protecting a mutually exclusive shared resource, the term I(M) is employed to denote the number of tasks that access lock M, and CM is employed to represent the maximum duration for which M can be held.
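The task model described above can be captured compactly in code. The following is a minimal Python sketch offered for illustration only; the class names `Runnable` and `Task`, their field names, and the helpers `higher_priority` and `lower_priority` are hypothetical and do not appear in the disclosure. The sketch assumes rate-monotonic priorities with distinct periods, so that priority can be inferred from the period alone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Runnable:
    wcet: float                         # C_{i,j}: worst-case execution time
    offset: float = 0.0                 # O_{i,j}: offset from the release of R_{i,1}
    externally_triggered: bool = False  # True if started by an event E_{i,j}


@dataclass
class Task:
    name: str
    period: float                       # T_i, also used as the deadline
    core: int                           # P(T_i): core the task is assigned to
    runnables: List[Runnable] = field(default_factory=list)

    @property
    def wcet(self) -> float:
        """C_i: sum of the worst-case execution times of all runnables."""
        return sum(r.wcet for r in self.runnables)


def higher_priority(task: Task, tasks: List[Task]) -> List[Task]:
    """hp(T_i) under rate-monotonic assignment (assumes distinct periods)."""
    return [t for t in tasks if t.period < task.period]


def lower_priority(task: Task, tasks: List[Task]) -> List[Task]:
    """lp(T_i) under rate-monotonic assignment (assumes distinct periods)."""
    return [t for t in tasks if t.period > task.period]
```

For instance, a task built from runnables with execution times 1 and 2 reports a cumulative worst-case execution time of 3, and a task with period 10 falls in hp(·) of a task with period 20.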
Synchronization structures such as precedence constraints may be realized using events for multi-core precedence constraints. By way of example, two tasks are considered, including task T1 executing on processing core P1 and task T2 executing on processing core P2. The application requires a runnable R2,d of task T2 to start execution after the completion of a runnable R1,s of task T1. In this scenario, runnable R2,d on P2 can be made to Pend/Wait on an event E2,d, which can in turn be set by the completion of runnable R1,s on P1. Set/wait events are employed to statically enforce mutual exclusion constraints by enforcing precedence relationships between runnables in different tasks running on the same core, and are generalized to the context of multi-core processors.
Analysis of tasks with set/wait events includes developing a response-time analysis for such tasks. By way of example, a task Ti on processing core P with runnables Ri,1 through Ri,m(i) that uses set/wait events is evaluated. In one scenario none of the higher priority tasks Th (i.e., higher priority than Ti) on P use set/wait events to synchronize with other tasks, i.e., ∀Th∈hp(Ti) and ∀k>1, πh,k=Rh,k-1. In this scenario, a bound on the worst-case response time of task Ti can be derived, as follows. Let F(Ri,j) denote an upper bound on the finish time of runnable Ri,j. In order to calculate F(Ri,j), the last runnable Ri,e of Ti at or before Ri,j that was triggered by an external event is employed, i.e., e≤j is the largest value such that e=1 or πi,e≠Ri,e-1. The time instant at which this external event is set can be denoted as Si,e. If Wi,{e . . . j} denotes the worst-case response time of the Ti segment including runnables Ri,e through Ri,j, then F(Ri,j) is determined as follows.
F(Ri,j)=Si,e+Wi,{e . . . j} [1]
The finish time of runnable Ri,j is thus no more than Wi,{e . . . j} from the setting of event Ei,e at Si,e.
An upper bound on the worst-case response time Wi,{e . . . j} can be obtained under the assumption that none of the higher-priority tasks on processing core P use external set/wait events. The worst-case response time Wi,{e . . . j} is calculated by using the standard response-time test, which is the convergence of the following:
Assuming that runnables set their corresponding events by the end of their execution, the result includes the following.
Si,e=F(πi,e) [3]
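The finish-time bound of EQ. 1 can be illustrated with a short Python sketch. The sketch assumes that the standard response-time test takes the usual fixed-priority form, in which the segment response time W is iterated as W = (sum of Ci,q for q from e to j) + (sum over higher-priority tasks Th of ceil(W/Th)·Ch) until it stops changing; this form is an assumption of the sketch. The function names and the `limit` parameter are hypothetical.

```python
import math


def segment_response_time(segment_wcet, higher_priority, limit):
    """Iterate the assumed response-time recurrence to a fixed point.

    segment_wcet    -- sum of C_{i,q} for runnables R_{i,e} .. R_{i,j}
    higher_priority -- list of (period, wcet) pairs for tasks in hp(T_i)
                       on the same core
    limit           -- give up once the bound exceeds this value
                       (for example the task period)
    Returns the converged bound, or None if it exceeds the limit.
    Integer time units are assumed so that equality is exact.
    """
    w = segment_wcet
    while w <= limit:
        interference = sum(math.ceil(w / t) * c for t, c in higher_priority)
        nxt = segment_wcet + interference
        if nxt == w:
            return w
        w = nxt
    return None


def finish_time_bound(set_time, response_time):
    """EQ. 1: F(R_{i,j}) = S_{i,e} + W_{i,{e..j}}."""
    return set_time + response_time
```

With a segment execution time of 3 and a single higher-priority task of period 4 and execution time 2, the iteration visits 3, 5 and 7 and settles at 7; if the triggering event is set at time 4, the finish-time bound is 11.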
This operation is shown with reference to
In the scenario shown in
A mechanism includes assigning release offsets in addition to the event triggers to address release jitter. By way of example, a runnable triggered by an external event Ei,e is evaluated. The event Ei,e is guaranteed to be set by the worst-case finish time F(πi,e) of the runnable πi,e setting Ei,e. Therefore, assigning a static offset of Φi,e=F(πi,e) guarantees that the event Ei,e is set before runnable Ri,e starts executing. Static offsets thus act as a simple period enforcer, which allows for a periodic release of runnable Ri,e and relieves tasks with lower priority than task Ti from dealing with the release jitter of runnable Ri,e. Omitting such static offsets may lead to longer worst-case response times and unpredictable runnable release times during system operation.
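The effect of a static offset on release jitter can be shown with a small Python sketch. The function names and the example numbers are hypothetical; the sketch assumes a runnable becomes eligible once both its offset has elapsed and its triggering event has been set.

```python
def static_offset(setter_finish_bound):
    """Phi_{i,e} = F(pi_{i,e}): offset equal to the setter's worst-case finish."""
    return setter_finish_bound


def release_time(period_start, offset, event_set_time):
    """Earliest instant at which the runnable may start.

    The runnable waits for whichever comes later: the end of its
    offset or the setting of its triggering event.
    """
    return max(period_start + offset, event_set_time)


# The setter may finish anywhere between 3 and 7 time units after the
# period start; 7 is taken as its worst-case finish time bound.
observed_set_times = (3, 5, 7)
phi = static_offset(7)

with_offset = {release_time(0, phi, s) for s in observed_set_times}
without_offset = {release_time(0, 0, s) for s in observed_set_times}
```

In this example `with_offset` collapses to the single release instant 7, whereas `without_offset` contains three different release instants, which is the jitter that lower-priority tasks would otherwise have to absorb.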
Details of the operation depicted in
The simultaneous execution of runnable R1,1 211 of T1 210 and runnable R4,1 241 of T4 240 allows execution of runnable R1,2 212 of T1 210 at timestep 5 followed by execution of runnable R1,3 213 of T1 210 at timestep 6. This action precludes execution of runnable R2,2 222 of T2 220 due to its lower priority in the first processing core. Thus, T2 220 fails to execute in its allotted time period, and a fault ensues, as indicated by element 235.
Introducing the static offset 214 after execution of runnable R1,1 211 frees a timestep in the first processing core during which lower priority T2 220 may execute its runnable. Thus, introducing the static offset 214 after execution of runnable R1,1 211 increases the likelihood that lower priority T2 220 executes its runnables in a timely manner without affecting response time of execution of T1 210. Furthermore, introducing the static offset 214 improves timing predictability of lower priority tasks, e.g., T2 220.
A spinlock primitive may be employed as a mechanism for run-time inter-core task synchronization in a multi-core operating system. A spinlock is a synchronization mechanism wherein a processing thread waits in a loop, or spins, while waiting to synchronize. The thread remains active but does not perform a useful task while spinning. Once acquired, a spinlock is held until it is released, whether by synchronization or by another action.
As described herein, runnables are required to acquire a spinlock before accessing any shared resource. If a resource is currently being used and the spinlock is being held, then the runnable requesting this resource continues to spin (busy-wait) on the lock until the lock is released or until the runnable is preempted by a higher-priority runnable on the same core; a spinning runnable thus remains preemptible. This acts as a dynamic run-time mechanism for mutual exclusion. A spinlock can be employed to improve system utilization by avoiding the overhead introduced by set/wait events. Shared logic resources can be protected across multiple processing cores using the spinlock primitive. Tasks accessing any mutually-exclusive shared resource acquire and hold the spinlock before doing so. If a task tries to acquire a spinlock that is currently being held by some other task, the task spins or busy waits on the lock until it is released by the holding task. This mechanism thus provides support for realizing mutual exclusion of a shared resource in a multi-core processor.
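The acquire, spin and release pattern described above can be sketched in Python. This is an illustration only and not the disclosed implementation: the class `SpinLock` and the function `run_demo` are hypothetical names, and a non-blocking `threading.Lock.acquire` call stands in for the hardware test-and-set operation that an embedded multi-core operating system would typically use.

```python
import threading


class SpinLock:
    """Busy-waiting lock; a non-blocking try-acquire plays the role of test-and-set."""

    def __init__(self):
        self._flag = threading.Lock()

    def acquire(self):
        # Spin: retry until the try-acquire succeeds. The thread stays
        # runnable and does no useful work while it waits.
        while not self._flag.acquire(blocking=False):
            pass

    def release(self):
        self._flag.release()


def run_demo(num_threads=4, increments=500):
    """Have several threads update a shared counter under the spinlock."""
    lock = SpinLock()
    shared = {"count": 0}

    def worker():
        for _ in range(increments):
            lock.acquire()
            try:
                shared["count"] += 1   # critical section on the shared resource
            finally:
                lock.release()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["count"]
```

Because every update happens while the lock is held, `run_demo(4, 500)` returns 2000. Note that the sketch makes no attempt to grant the lock in priority order, which is the limitation that the later discussion of starvation and priority inversion addresses.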
In multi-core processor architectures, both spinlocks and set/wait events may be employed. For tasks where the mutual exclusion constraints need to be statically enforced, set/wait events are employed. For tasks where the mutual exclusion constraints can be dynamically enforced, spinlocks and/or a multi-processor priority ceiling protocol (MPCP) may be employed. For example, consider a system with multiple sensors and actuators in which the processing is handled on a dual-core processor. The processing tasks themselves may use set/wait event constraints for statically defined mutual exclusion. The sensor and actuator data handling tasks might use run-time mutual exclusion primitives for ensuring consistent data readings. Although spinlocks successfully ensure mutual exclusion from a functional perspective, they may pose multiple problems from a timing perspective. Timing challenges associated with spinlocks include deadlocks, priority inversion, and starvation. A deadlock occurs when a task holding a lock is preempted by a higher-priority task on the same processing core that requires the same lock. In this case, the higher-priority task requesting the resource will spin or busy-wait forever, because the preempted lock holder never gets to run and release the lock. One solution to address this issue is to place design constraints that prevent tasks on the same processing core from using spinlocks.
A priority inversion occurs when a high-priority task waits for a low-priority task to release a resource. The use of spinlocks can lead to an unbounded duration of priority inversion, which presents a significant challenge to tasks with strict deadlines. In situations where there are many medium-priority tasks and multiple preemptions on a task, a high-priority task may end up facing an unbounded amount of priority inversion. Priority inversion is a more serious problem with spinlocks since the high-priority tasks are basically busy-waiting/wasting cycles on their processing cores. Long durations of priority inversion thus lead to significant loss in useful system utilization. Bounding such priority inversion is important to achieve both timing predictability and better utilization.
Starvation occurs when a task is starved from getting access to the shared resource. In one example, a spinlock is acquired and released back-and-forth between lower-priority tasks even though a higher-priority task is waiting on it. This sequence is possible when a hardware test-and-set implementation is not guaranteed to respect task priorities. When using spinlocks, starvation can arise because the hardware scheduling of test-and-set operations can be more favorable to certain processing cores, or because the task spinning on the lock might be preempted on its processing core whenever the lock gets released by other tasks. As is inherent to spinlocks, the busy waiting during such starvation also leads to utilization loss from wasted processor cycles.
Task response-times with spinlocks can be determined for achieving bounded timing behavior that takes into consideration design constraints and assumptions. This includes using the analysis provided in EQS. 1, 2, and 3 by adding a spinning time to account for the lock-waiting delays and a blocking time for the non-preemptive duration of lower-priority tasks. The blocking terms are calculated as follows. A lock-waiting delay is determined using the previously described set of design constraints. A mutual exclusion object (mutex) is a mechanism that ensures that no two processes or processing threads access a shared logic resource, e.g., shared memory cache or location, during the same time period, thus preventing corruption of the shared resource. Each time a mutex M is requested by a runnable Ri,j of task Ti, the lock-waiting time using spinlocks is restricted to a lock-waiting delay of (I(M)−1)CM, wherein CM is the maximum lock-holding time. A cumulative spinning time CI(i,j) can be calculated for each runnable Ri,j of each task Ti. A non-preemptive duration is determined for each event, which includes each periodically-triggered runnable Ri,j of each task Ti. A non-preemptive duration Bn(i, j) of lower-priority tasks is bounded by using the maximum non-preemptive duration, which is determined as a maximum I(M)CM over all mutexes M that can be held by any task with lower priority than Ti. The lower-priority task is assumed to be non-preemptive over this entire duration. For runnables Ri,j (j>1) that are triggered by the completion of the previous runnable Ri,j-1, the non-preemptive blocking duration Bn(i, j) is set as 0.
The spinning term CI(i,j) is added to the worst-case execution time of each runnable Ri,j for all tasks (including the higher-priority tasks). The blocking term Bn(i,j) need only be applied for the task Ti whose response time is calculated using EQ. 2. The aforementioned analysis is thus extended to include the use of spinlocks.
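The two spinlock blocking terms can be worked through with a short Python sketch. The function names and the dictionary layout (lock name mapped to the pair I(M), CM) are hypothetical; the arithmetic follows the bounds stated above, namely (I(M)−1)·CM per lock request and the maximum of I(M)·CM over locks that lower-priority tasks can hold.

```python
def lock_waiting_delay(num_accessors, max_hold):
    """Per-request spin bound: (I(M) - 1) * C_M."""
    return (num_accessors - 1) * max_hold


def cumulative_spin(requested_locks, locks):
    """CI(i,j): total spin bound over every lock the runnable requests.

    requested_locks -- lock names requested by runnable R_{i,j}, one
                       entry per request
    locks           -- dict mapping lock name -> (I(M), C_M)
    """
    return sum(lock_waiting_delay(*locks[m]) for m in requested_locks)


def nonpreemptive_blocking(lower_priority_locks, locks, triggered_by_previous):
    """Bn(i,j): max of I(M) * C_M over locks held by lower-priority tasks.

    Set to 0 when the runnable is triggered by completion of the
    previous runnable, or when no lower-priority task holds a lock.
    """
    if triggered_by_previous or not lower_priority_locks:
        return 0
    return max(locks[m][0] * locks[m][1] for m in lower_priority_locks)


# Example: M1 is accessed by 3 tasks and held for at most 2 time units,
# M2 is accessed by 2 tasks and held for at most 5 time units.
example_locks = {"M1": (3, 2), "M2": (2, 5)}
```

With these example values, a runnable that requests both locks once has a cumulative spin bound of 4 + 5 = 9, and a periodically triggered runnable facing lower-priority tasks that may hold either lock has a non-preemptive blocking bound of max(6, 10) = 10.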
Placing a restrictive set of design constraints can lead to bounded response times with spinlocks. However, the spinlock mechanism still does not ensure a priority-ordered service of resource requests. In order to relax the assumptions and ensure a priority-driven service, a multi-processor priority ceiling protocol (MPCP) may be employed for task synchronization in a multi-core processor. Terms of interest include a global mutex MG, which is a mutex shared by tasks deployed on different processing cores. The corresponding critical sections are referred to as global critical sections (GCS). Conversely, a local mutex is only shared between tasks on the same processing core, and the corresponding critical sections are local critical sections. When a task T acquires MG, it executes the GCS corresponding to the global mutex MG at a priority that is set as follows:
p(MG)=p(G)+p(T0)
wherein
The MPCP minimizes remote blocking and priority inversions when global resources are shared. The MPCP has the following characteristics. Tasks use their assigned priorities unless within critical sections, and a single-processor priority ceiling protocol is used for all requests to local mutexes. A task within a global critical section (GCS) guarded by global mutex MG has the priority of its GCS, i.e., (p(G)+p(T0)). A task within a GCS can preempt another task T* within a GCS if the priority of the GCS for T0 is greater than the priority of the GCS for T*. When a task T requests a global mutex MG, the global mutex MG can be granted to T by means of an atomic transaction on shared memory if MG is not held by another task. If a request for a global mutex MG cannot be granted, the task T is added to a prioritized queue on MG before being preempted. The priority used as the key for queue insertion is the normal priority assigned to T. When a task T attempts to release a global mutex MG, the highest priority task Th waiting for MG is signaled and becomes eligible for execution at Th's host processing core at its GCS priority. If no tasks are suspended on global mutex MG, it is released.
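The request, queue and release rules listed above can be sketched as a small single-threaded Python model. It is an illustration under stated assumptions rather than the disclosed implementation: the class `GlobalMutex` and its method names are hypothetical, larger numbers denote higher priority, and the two constructor arguments are simply the two addends of p(MG)=p(G)+p(T0), whose meanings are assumed here to be a base level above all normal priorities and the priority of the highest-priority task that can lock the mutex.

```python
import heapq
import itertools


class GlobalMutex:
    """Model of a global mutex with a priority-ordered wait queue."""

    def __init__(self, base_priority, highest_user_priority):
        # Assumed reading of p(MG) = p(G) + p(T0).
        self.gcs_priority = base_priority + highest_user_priority
        self.holder = None
        self._waiting = []               # heap keyed on negated normal priority
        self._seq = itertools.count()    # tie-breaker keeps equal priorities FIFO

    def request(self, task, normal_priority):
        """Grant the mutex if free; otherwise queue the task by normal priority."""
        if self.holder is None:
            self.holder = task
            return True
        heapq.heappush(self._waiting, (-normal_priority, next(self._seq), task))
        return False

    def release(self):
        """Hand the mutex to the highest-priority waiter, or free it."""
        if self._waiting:
            _, _, task = heapq.heappop(self._waiting)
            self.holder = task           # the waiter resumes at gcs_priority
        else:
            self.holder = None
        return self.holder
```

For example, if task A holds the mutex while B (priority 2) and then C (priority 4) request it, the first release hands the mutex to C and the second to B, regardless of arrival order. A real implementation would perform the grant as an atomic transaction on shared memory and would signal the waiter on its host core; both are omitted here.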
A benefit of using the multi-processor priority ceiling protocol includes permitting priority-driven access to shared resources, wherein each global mutex has a priority queue of tasks suspended on it. When the resource is released, it is given to the highest priority task waiting on it. This property is not provided by spinlocks, which allow the hardware test-and-set primitives to determine the task that gets the resource. Another benefit of using the multi-processor priority ceiling protocol includes a restricted preemption of tasks that are holding locks. Under MPCP, a global priority ceiling p(G) is employed that allows tasks holding mutexes to be preempted by tasks holding mutexes with a higher remote priority ceiling, wherein p(MG) represents the remote priority ceiling. This ensures the responsiveness of the highest priority tasks, which may have short deadlines.
Another benefit of using the multi-processor priority ceiling protocol includes no cycles wasted in a busy waiting mode, including in a suspension-based implementation of MPCP where tasks are allowed to suspend when the requested lock is not available. The task is added to a priority queue on the mutex and it is notified when the resource is granted. This avoids any cycles wasted in busy waiting. However, the task suspension itself introduces the possibility of lower-priority tasks executing and requesting global mutexes. This may lead to preemptions from such tasks at a later point of execution. In order to avoid such a penalty, a spin-based implementation of MPCP can be employed where tasks spin on the requested lock until it is available.
The aforementioned response-time test for tasks with set/wait events can be readily extended to handle synchronization with MPCP. Blocking terms are defined for a global-mutex-waiting delay, and a lower-priority global-mutex-holding duration. The global-mutex-waiting delays are described as follows. The tasks acquire global mutexes in priority order with the MPCP. The mutex can thus be viewed as a resource scheduled in a fixed-priority order. Blocking time BMi,j of a runnable Ri,j of task Ti accessing a global mutex M is defined as follows.
The first term corresponds to the maximum duration for which the mutex M can be held by any lower-priority task Tl when Ti requests M. The second term represents the maximum duration for which higher-priority tasks Th can hold the mutex M before task Ti can acquire M. Here, W′lM is the maximum global-mutex-holding time of task Tl with respect to global mutex M. Under MPCP, tasks holding the global mutex M can still be preempted by tasks holding mutexes with higher remote priority ceiling. The maximum global-mutex-holding time of task Tl is thus given by the convergence as follows.
In EQ. 5, the first term CM represents the worst-case execution time when holding the global mutex M. The second term represents the maximum preemption possible when Ti is holding the global mutex M by tasks Tk on the same processing core as Ti when they acquire mutexes M′ with higher remote priority ceilings than the global mutex M.
The total global-mutex-waiting delay Bi,j for runnable Ri,j is determined by summing the blocking times BMi,j for each access from Ri,j over all mutexes M. Bi is represented as the sum of waiting delays Bi,j over all runnables Ri,j belonging to task Ti.
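The structure of the global-mutex-waiting terms can be illustrated with a Python sketch. The equations themselves are not reproduced in this text, so the forms used below are assumptions that follow the verbal description only: the holding time is taken as CM plus, for each task on the same core, its longest critical section under a higher remote priority ceiling (a single-pass reading), and the blocking time is taken as the longest lower-priority holding time plus an iterated term in which each higher-priority task Th contributes (ceil(B/Th)+1) holding times. The "+1" carry-in allowance, the function names and the `limit` parameter are all hypothetical.

```python
import math


def holding_time(c_m, higher_ceiling_sections_by_task):
    """Assumed W': C_M plus the longest higher-ceiling section of each co-located task.

    higher_ceiling_sections_by_task -- one list per task on the same core,
        holding the lengths of its critical sections on mutexes M' whose
        remote priority ceiling exceeds that of M (may be empty).
    """
    return c_m + sum(max(cs, default=0) for cs in higher_ceiling_sections_by_task)


def mutex_blocking(lower_holding, higher, limit):
    """Assumed blocking recurrence for one access to a global mutex M.

    lower_holding -- holding times W' of lower-priority tasks that use M
    higher        -- (period, W') pairs of higher-priority tasks that use M
    limit         -- give up once the bound exceeds this value
    Integer time units are assumed so that equality is exact.
    """
    first = max(lower_holding, default=0)
    b = first + sum(w for _, w in higher)
    while b <= limit:
        nxt = first + sum((math.ceil(b / t) + 1) * w for t, w in higher)
        if nxt == b:
            return b
        b = nxt
    return None


def total_waiting_delay(per_access_blocking):
    """B_{i,j}: sum of the blocking times over all accesses of a runnable."""
    return sum(per_access_blocking)
```

As a worked example under these assumptions, a mutex held for 2 time units by a task whose co-located tasks have higher-ceiling sections of lengths [1, 3], [2] and none gives a holding time of 2 + 3 + 2 + 0 = 7; a lower-priority holding time of 3 together with one higher-priority user of period 20 and holding time 2 gives a blocking bound that converges at 7.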
A lower-priority global-mutex-holding duration is determined as follows. Whenever a task Tl with lower priority than Ti acquires a global mutex M, its priority gets promoted to the remote priority ceiling of M. This remote priority ceiling is defined to be above all normal execution priorities and hence causes a preemption of Ti even if Ti is not holding any locks. This preemption from the lower-priority global-mutex-holding duration can be accommodated by the blocking term Hi,j for each runnable Ri,j. If Ri,j is triggered externally and it acquires ρi,j global mutexes during execution, then Hi,j is determined as follows.
If Ri,j (with j>1) is triggered by the completion of Ri,j-1 and it acquires ρi,j global mutexes during execution, then:
The total lower-priority global-mutex-holding duration Hi experienced by task Ti can be calculated by summing up the Hi,j over all runnables Ri,j belonging to task Ti.
The worst-case response-time convergence given in EQ. 2 can be modified as follows.
The analysis thus includes mutual exclusion using the Multi-processor Priority Ceiling Protocol (MPCP). In addition to providing priority-driven service and bounded timing properties, the use of MPCP eliminates the constraints and assumptions required for spinlocks.
The disclosure has described certain preferred embodiments and modifications thereto. Further modifications and alterations may occur to others upon reading and understanding the specification. Therefore, it is intended that the disclosure not be limited to the particular embodiment(s) disclosed as the best mode contemplated for carrying out this disclosure, but that the disclosure will include all embodiments falling within the scope of the appended claims.
Number | Date | Country | |
---|---|---|---|
20140040904 A1 | Feb 2014 | US |