A common concept in computer programming is the execution of one or more instructions repetitively according to a given criterion. This repetitive execution can be accomplished using recursion, fixed-point iteration, or looping constructs, such as nested loops. In various instances, computer programs can include nested repetitions of processes, in which a first repetitive process may execute a certain number of times according to a criterion, and in one or more instances of the execution of the first repetitive process a second repetitive process can execute according to its own criterion. In such an instance, if the first repetitive process criterion directs the first repetitive process to execute “n” times, and the second repetitive process criterion directs the second repetitive process to execute “m” times, the total number of executions of the repetitive processes can be as great as n*m.
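The n*m bound can be illustrated with a short sketch (hypothetical Python for illustration only, not part of any claimed implementation): a nested loop in which the outer repetitive process runs n times and the inner repetitive process runs m times per outer iteration executes its innermost body n*m times.

```python
def count_inner_executions(n, m):
    # Outer repetitive process: n iterations; inner repetitive
    # process: m iterations per outer iteration.
    count = 0
    for _ in range(n):
        for _ in range(m):
            count += 1      # innermost body runs n * m times in total
    return count
```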
In some computer systems with multiple processors or multi-core processors, execution of processes can be run in parallel with each other on the multiple processors or cores. Such parallel execution of repetitive processes can improve the performance of the computer system. For example, in a computer system with four or more processors or processor cores, if the first repetitive process criterion directs the first repetitive process to execute n number of times, n can be split into p divisions, for example n0, n1, n2, . . . np. The p divisions of n can each represent a subset of the number of times to execute the first repetitive process. The first repetitive process can be assigned to execute on respective processors or processor cores for one of the subsets n0, n1, n2, . . . np. Each of the processors or processor cores can also execute the second repetitive process within the first repetitive process for the subset of n to which they are assigned.
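One way to form the p divisions n0, n1, n2, . . . np is sketched below (illustrative Python; the contiguous-block split is an assumption, not the only possible assignment of iterations to divisions):

```python
def partition_iterations(n, p):
    # Split the outer iteration space [0, n) into p contiguous
    # subsets n0, n1, ..., one per processor or processor core.
    # When n does not divide evenly, the first n % p subsets
    # receive one extra iteration.
    base, extra = divmod(n, p)
    partitions, start = [], 0
    for i in range(p):
        size = base + (1 if i < extra else 0)
        partitions.append(range(start, start + size))
        start += size
    return partitions
```

Each processor or processor core would then execute the first repetitive process for one of the returned subsets.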
However, in many computer systems, parallel execution alone does not alleviate the overall overhead involved in executing nested repetitive processes. In a task-based run-time system, a separate task can be created for each of the p divisions of the first repetitive process and each of the m iterations of the second repetitive process, creating p*m tasks. The greater the number of tasks, the greater the overhead created for managing all of the tasks.
The methods and apparatuses of various aspects provide circuits and methods for task-based handling of nested repetitive processes. An aspect method may include partitioning iterations of an outer repetitive process into a first plurality of outer partitions, initializing a first task for executing iterations of a first outer partition, initializing a first shadow task for executing iterations of an inner repetitive process for the first task, initializing a second task for executing iterations of a second outer partition, executing the first task by a first processor core and the second task by a second processor core in parallel, and executing the first shadow task for the iterations of the inner repetitive process each time a condition calls for executing the inner repetitive process upon availability of the second processor core and assignment to the second processor core.
An aspect method may further include completing execution of the second task, determining whether the first outer partition is divisible, and partitioning the first outer partition of the first task into a second plurality of outer partitions in response to determining that the first outer partition is divisible.
An aspect method may further include assigning a third outer partition of the second plurality of outer partitions to the first task, assigning a fourth outer partition of the second plurality of outer partitions to the second task, executing the first task on the third outer partition by the first processor core and the second task on the fourth outer partition by the second processor core in parallel, completing execution of the second task a subsequent time resulting in availability of the second processor core, and assigning the first shadow task to the second processor core.
An aspect method may further include discarding the second task, initializing a third task for executing iterations of a fourth outer partition of the second plurality of outer partitions, assigning a third outer partition of the second plurality of outer partitions to the first task, assigning the fourth outer partition of the second plurality of outer partitions to the third task, executing the first task on the third outer partition by the first processor core and the third task on the fourth outer partition by the second processor core in parallel, completing execution of the third task resulting in availability of the second processor core, and assigning the first shadow task to the second processor core.
In an aspect, completing execution of the second task results in availability of the second processor core, and an aspect method may further include determining whether the inner repetitive process of the first task is divisible in response to determining that the first outer partition of the outer repetitive process is indivisible, partitioning the iterations of the inner repetitive process into a first plurality of inner partitions in response to determining that the inner repetitive process of the first task is divisible, assigning the iterations of the inner repetitive process to the first shadow task, in which the iterations of the inner repetitive process comprise a first inner partition, and assigning the first shadow task to the second processor core.
An aspect method may further include initializing a second shadow task for executing the iterations of the inner repetitive process for the first task upon availability of a third processor core, assigning a second inner partition to the second shadow task, assigning the second shadow task to the third processor core, and executing the second shadow task for iterations of the second inner partition of the inner repetitive process each time a condition calls for executing the inner repetitive process.
An aspect method may further include partitioning the iterations of the inner repetitive process by a number of partitions equivalent to a number of available processor cores.
An aspect method may further include partitioning the iterations of the outer repetitive process by a number of partitions equivalent to a number of available processor cores.
An aspect method may further include initializing a first pointer for the first task, updating the first pointer to indicate the execution of the iterations of the inner repetitive process of the first outer partition, and checking the first pointer to determine an iteration of the inner repetitive process of the first outer partition for executing by the first shadow task.
An aspect includes a computing device having a plurality of processor cores in which at least one processor core is configured with processor-executable instructions to perform operations of one or more of the aspect methods described above.
An aspect includes a non-transitory processor-readable medium having stored thereon processor-executable software instructions to cause a plurality of processor cores to perform operations of one or more of the aspect methods described above.
An aspect includes a computing device having means for performing functions of one or more of the aspect methods described above.
The accompanying drawings, which are incorporated herein and constitute part of this specification, illustrate example aspects of the invention, and together with the general description given above and the detailed description given below, serve to explain the features of the invention.
The various aspects will be described in detail with reference to the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts. References made to particular examples and implementations are for illustrative purposes, and are not intended to limit the scope of the invention or the claims.
The term “computing device” is used herein to refer to any one or all of cellular telephones, smartphones, personal or mobile multi-media players, personal data assistants (PDAs), personal computers, laptop computers, tablet computers, smartbooks, ultrabooks, palm-top computers, wireless electronic mail receivers, multimedia Internet enabled cellular telephones, wireless gaming controllers, desktop computers, compute servers, data servers, telecommunication infrastructure rack servers, video distribution servers, application specific servers, and similar personal or commercial electronic devices which include a memory and one or more programmable multi-core processors.
The terms “system-on-chip” (SoC) and “integrated circuit” are used interchangeably herein to refer to a set of interconnected electronic circuits typically, but not exclusively, including multiple hardware cores, a memory, and a communication interface. The hardware cores may include a variety of different types of processors, such as a general purpose multi-core processor, a multi-core central processing unit (CPU), a multi-core digital signal processor (DSP), a multi-core graphics processing unit (GPU), a multi-core accelerated processing unit (APU), and a multi-core auxiliary processor. A hardware core may further embody other hardware and hardware combinations, such as a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), other programmable logic devices, discrete gate logic, transistor logic, performance monitoring hardware, watchdog hardware, and time references. Integrated circuits may be configured such that the components of the integrated circuit reside on a single piece of semiconductor material, such as silicon. Such a configuration may also be referred to as the IC components being on a single chip.
In an aspect, a process executing in a scheduler, within or separate from an operating system, for a multi-processor or multi-core processor system may reduce the overhead of nested repetitive processes (e.g., nested loops) in task-based run-time systems that employ parallel processing. Such systems process tasks, each including a portion of the processing of an outer repetitive process (or first repetitive process), in parallel across multiple processors or processor cores, and the overhead reduction is achieved by creating a shadow task for each task for potentially processing an inner repetitive process (or second repetitive process). In an aspect, the outer repetitive process may have a criterion to execute until an outer repetition value (or first repetition value) with a relationship to a value n is realized. The relationship between the outer repetition value and the value n may be any arithmetic or logical relationship. To employ parallel processing of the outer repetitive process, tasks may be initialized for subsets, or partitions, of the criterion. For example, if the criterion is to repeat the outer repetitive process for each value between a starting value and the value n by incrementing the outer repetition value until it equals n, then each task may be assigned a subset of the repetitions between the starting value and the value n.
The number of tasks, represented here by p, and how they are assigned their respective subsets may vary. In an aspect, the number of tasks may be equal to the number of available processors or processor cores. For example, with four available processors or processor cores (i.e., p=4), four subsets may be initialized, represented here by n0, n1, n2, and n3, and four tasks t may be initialized, represented here by t0, t1, t2, and t3. Each subset may be associated with a task t, for example, n0 with t0, n1 with t1, n2 with t2, and n3 with t3.
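The four-core example can be written out concretely (a sketch assuming, for illustration only, n = 20 outer iterations divided evenly; the value of n is hypothetical):

```python
# Four available processors or processor cores: p = 4.
n, p = 20, 4
size = n // p   # assumes n divides evenly by p for this illustration
subsets = [range(i * size, (i + 1) * size) for i in range(p)]  # n0..n3
tasks = {f"t{i}": subsets[i] for i in range(p)}                # t0..t3
```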
While each task is executed by its respective processor or processor core, there is the potential for an inner repetitive process, nested within the outer repetitive process, to be executed. In a task-based run-time system, processing the inner repetitive process would require initializing a new task each time the inner repetitive process is to be executed until an inner repetition value (or second repetition value) with a relationship to a value m is realized. As discussed above, this may potentially result in p*m initialized tasks. To avoid initializing a task for each time the inner repetitive process is to be executed, a shadow task for the inner repetitive process may be initialized for each task of the outer repetitive process. In other words, there may be p shadow tasks. Continuing with the example above, shadow task st0 may be initialized for task t0, st1 may be initialized for task t1, st2 may be initialized for task t2, and st3 may be initialized for task t3. During execution of the tasks, the computer system may store a pointer, or other type of reference, for each task to a memory location accessible by the respective shadow task and indicating the progress of the respective task. In different cases, the shadow task may or may not execute for various iterations of its respective task. With each iteration of the inner repetitive processes of the tasks, the respective pointers may be updated. By making the pointers accessible to the shadow tasks, the computer system may not have to delete existing shadow tasks or initialize new shadow tasks. In an aspect in which a condition exists for the shadow task to execute, the shadow task may check the pointer associated with the respective task to determine the iteration of the inner repetitive process that the respective task is executing, partition the remaining inner iterations, and execute its share of the inner iteration space while the respective task works on its own share.
The shadow task may create new tasks to help with the inner iteration space.
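The pointer-and-shadow-task interaction can be sketched as follows (hypothetical Python; the class name, the lock, and the tail-half split are illustrative assumptions rather than a required implementation):

```python
import threading

class OuterTask:
    """A task for one partition of the outer repetitive process.
    It publishes its inner-loop progress through a pointer (here a
    lock-guarded attribute) readable by its shadow task."""
    def __init__(self, m):
        self.m = m                  # inner repetitive process iteration count
        self.inner_pointer = 0      # next inner iteration the task will execute
        self.lock = threading.Lock()

    def claim_shadow_share(self):
        # Invoked on behalf of the shadow task when a processor core
        # becomes available: read the pointer, partition the remaining
        # inner iterations, and hand the tail half to the shadow task
        # while the task keeps the head half.
        with self.lock:
            remaining = self.m - self.inner_pointer
            shadow_count = remaining // 2
            shadow_start = self.m - shadow_count
            shadow_range = range(shadow_start, self.m)
            self.m = shadow_start   # task's share now ends where the shadow's begins
            return shadow_range
```

Because only the pointer is consulted, the same shadow task may be reused across iterations of the outer repetitive process without being deleted and re-initialized.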
In an aspect in which the task completes its iterations, i.e., the outer repetition value for the task equals a final repetition value for the task's subset of n, the processor may discard the task and its shadow task. While one task may complete, one or more of the other tasks may continue to execute. Discarding the completed task may make the respective processor or processor core that executed the completed task available for other work. While at least one task is still executing, the scheduler may further divide the subset of the executing task into one or more new subsets, or subpartitions, and initialize one or more tasks and shadow tasks to execute for the new subsets on the now available processor(s) or processor core(s). In an aspect, rather than discarding completed tasks and shadow tasks, while other tasks continue to execute, the scheduler may reassign the completed task and shadow task to a new subset of the further divided subset. When the executing task subset can no longer be subdivided, the scheduler may initialize one or more shadow tasks associated with subsets of the criterion for executing the inner repetitive process to be executed on the available processors or processor cores when the shadow task is executed.
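The scheduler decision described above, namely subdividing the executing task's outer partition when it is divisible and otherwise falling back to partitioning the inner repetitive process for shadow tasks, might be sketched as follows (illustrative Python; the half-and-half split is an assumption):

```python
def redistribute_on_completion(outer_remaining, inner_remaining):
    # Sketch of the scheduler's choice when a core becomes available.
    # The outer partition is divisible when more than the currently
    # executing iteration remains; otherwise the inner iterations are
    # partitioned for execution by shadow tasks.
    if len(outer_remaining) > 1:
        mid = (len(outer_remaining) + 1) // 2
        return ("outer", outer_remaining[:mid], outer_remaining[mid:])
    mid = (len(inner_remaining) + 1) // 2
    return ("inner", inner_remaining[:mid], inner_remaining[mid:])
```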
The computing device 10 and/or SoC 12 may include one or more memories 16 configured for various purposes. The memory 16 may be a volatile or non-volatile memory configured for storing data and processor-executable code for access by the processor 14. In an aspect, the memory 16 may be configured to, at least temporarily, store data related to tasks of nested repetitive processes as described herein. As discussed in further detail below, each of the processor cores of the processor 14 may be assigned a task comprising a subset, or partition, of the n iterations of the outer repetitive process by a scheduler of a high level operating system running on the computing device 10.
The communication interface 18, communication component 22, antenna 26 and/or network interface 28, may work in unison to enable the computing device 10 to communicate over a wireless network 30 via a wireless connection 32, and/or a wired network 44 with the remote computing device 50. The wireless network 30 may be implemented using a variety of wireless communication technologies, including, for example, radio frequency spectrum used for wireless communications, to provide the computing device 10 with a connection to the Internet 40 by which it may exchange data with the remote computing device 50.
The storage interface 20 and the storage component 24 may work in unison to allow the computing device 10 to store data on a non-volatile storage medium. The storage component 24 may be configured much like an aspect of the memory 16 in which the storage component 24 may store the data related to tasks of nested repetitive processes, such that the data may be accessed by one or more processors 14. The storage interface 20 may control access to the storage component 24 and allow the processor 14 to read data from and write data to the storage component 24.
It should be noted that some or all of the components of the computing device 10 may be differently arranged and/or combined while still serving the necessary functions. Moreover, the computing device 10 may not be limited to one of each of the components, and multiple instances of each component, in various configurations, may be included in the computing device 10.
In the example illustrated in
In
Each of the groups of processor cores illustrated in
In an aspect, the number of tasks may be equal to the number of partitions, as described above, or to the number available processors or processor cores. For example, with four available processors or processor cores (see
In an aspect, a shadow task may execute the iterations of the inner repetitive process on a different processor or processor core from the related task, while the related task executes and the different processor or processor core is available. In an aspect, a task may execute all of the iterations of the outer repetitive process and inner repetitive process before a processor or processor core becomes available to execute the related shadow task, and the related shadow task may not execute any iterations of the inner repetitive process.
In row 1106, each of the processors or processor cores may begin to execute their respective tasks. Executing the tasks may include executing the assigned partitions of the iterations of the outer repetitive processes and the associated inner repetitive processes. In an aspect, as described further herein, a shadow task of the respective tasks may help execute the iterations of the associated inner repetitive processes when hardware resources are available. In row 1108, the processors or processor cores may encounter inner repetitive processes for respective tasks. Upon encountering the inner repetitive process for the first time during the execution of each task, in row 1110, each of the processors or processor cores may initialize a shadow task for a respective task, the shadow task being initialized for potentially executing the inner repetitive process, or inner loop, of the outer repetitive process. The shadow task may be initialized regardless of whether the shadow task executes or not. In this example, shadow task st0 may be initialized for task t0 and processor 0, shadow task st1 may be initialized for task t1 and processor 1, shadow task st2 may be initialized for task t2 and processor 2, and shadow task stp may be initialized for task tp and processor p. In an aspect, a shadow task may be initialized whenever a task is executed in anticipation of potentially executing an inner repetitive process, regardless of whether an inner repetitive process exists. In another aspect, a shadow task may be initialized whenever an inner repetitive process is identified for a task, either before or upon encountering the inner repetitive process during execution of the task. For each task, one shadow task may suffice, and the shadow task may be executed multiple times depending on whether multiple iterations of the partition of the outer repetitive process of the task require the execution of the inner repetitive process. 
In an aspect, one shadow task may be initialized to handle multiple inner repetitive processes, or multiple shadow tasks may be initialized to handle one or more inner repetitive processes.
Also upon encountering the inner repetitive process for the first time during the execution of each task, in row 1112, a pointer, or other reference type, may be initialized for the respective task. In this example, pointer 0 may be initialized for task t0, pointer 1 may be initialized for task t1, pointer 2 may be initialized for task t2, and pointer p may be initialized for task tp. The pointers may be used to track the progress of the execution of the inner repetitive processes for their respective tasks, and the pointers may be accessible by shadow tasks for use in determining when to execute the shadow tasks and for which iteration of the inner repetitive process, as described further herein. In an aspect, a pointer may be initialized for each of one or more inner repetitive processes for each task. The shadow task may access the pointer of the respective task to identify the inner repetitive process iteration of the task when instructed to execute. In row 1114, the processors or processor cores may update the respective pointers to indicate the start or completion of execution of the inner repetitive processes of the respective tasks. Throughout the execution of the tasks, the pointers may be repeatedly updated to indicate the iteration of the inner repetitive processes for the iteration of the outer repetitive processes being executed.
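The pointer bookkeeping of rows 1112 and 1114 can be sketched as follows (hypothetical Python; the attribute and function names are illustrative):

```python
class TaskPointer:
    """Reference initialized on the first encounter of the inner
    repetitive process (row 1112) and updated on each inner iteration
    (row 1114); readable by the task's shadow task."""
    def __init__(self):
        self.outer_iteration = None
        self.inner_iteration = None

    def update(self, outer_iteration, inner_iteration):
        self.outer_iteration = outer_iteration
        self.inner_iteration = inner_iteration

def run_partition(partition, m, pointer):
    # Execute a partition of the outer repetitive process, updating
    # the pointer for each iteration of the inner repetitive process.
    for i in partition:
        for j in range(m):
            pointer.update(i, j)
    return pointer
```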
Several of the states in the above described rows 1108, 1110, 1112, 1114 may be repeated to complete execution of the tasks for all of the iterations of the respective partitions of the outer repetitive process and all the iterations of one or more inner repetitive processes on each of the processors or processor cores. Depending on various factors, such as size of the partitions, characteristics of the processors or processor cores, and number of executions of the inner repetitive process, one or more of the tasks may complete executing at the same or different times. For example, in rows 1116 and 1118, tasks t2 and tp finish executing, while the remaining tasks, in this example t0 and t1, may continue to execute. As described herein, after completing the execution of a task, the processor or processor core may become available for further processing, and various schemes may be implemented to engage the available processor or processor core with further task execution.
In this example, processor 2 and processor p may implement different schemes. The scheme for processor 2 may include discarding the completed task t2 in row 1118. Again, depending on the implemented scheme for processor 2, the related shadow task st2 in row 1120 may be discarded when there are no iterations of the inner repetitive process for the respective shadow task to execute. In row 1122, processor 2 may be assigned a subpartition of one of the ongoing tasks being executed by another of the processors or processor cores. The subpartition may be one or more iterations of the outer repetitive process that has yet to be executed by one of the ongoing tasks. The partition of the remaining iterations of the ongoing task may be divided into two or more subpartitions, and the subpartitions may be assigned to tasks. Particularly, one of the subpartitions may be assigned to the original task of the partition, and the other subpartition(s) may be assigned to other new or existing but completed tasks. In this example, partition 0 of ongoing task t0 being executed on processor 0 may include unexecuted iterations of the outer repetitive process. Partition 0 may be divided into two subpartitions, one of which may be assigned to processor 0 and task t0, and the other may be assigned to processor 2 and a newly initialized task tp+1 in rows 1122 and 1124. Much like above, in row 1126, processor 2 may begin executing task tp+1, encounter an inner repetitive process for the respective task in row 1128, initialize a shadow task stp+1 for task tp+1 in row 1130, and initialize a pointer, or other reference type, for the respective task in row 1132. In an aspect, initializing the pointer may involve initializing a new pointer for the task, or updating the existing pointer. Also as described above, during the execution of task tp+1, the respective pointer for task tp+1 may be updated for the current or last executed iteration of the inner repetitive process.
The scheme for processor p differs from the scheme for processor 2 described above, in that rather than discarding the completed task and shadow task, and initializing a new task and shadow task to execute a subpartition of the iterations of the outer repetitive process, processor p uses the existing completed task and shadow task. In this example, partition 1 of ongoing task t1 being executed on processor 1 may include unexecuted iterations of the outer repetitive process. Partition 1 may be divided into two subpartitions, one of which may be assigned to processor 1 and task t1, and the other may be assigned to processor p and existing completed task tp in row 1120. Much like above, in row 1122 processor p may begin executing task tp for the subpartition, encounter an inner repetitive process in row 1124, and update the respective pointer for the iteration of the inner repetitive process for task tp in row 1126. In this example scheme, there is no need to initialize a new pointer or shadow task, as they both may exist from the previous execution of task tp; however, one or both of a new pointer and a new shadow task may be initialized if so desired. In an aspect, when the previous execution of task tp did not result in initializing a pointer and shadow task, a pointer, or other reference type, and shadow task may be initialized upon encountering the inner repetitive process during this execution of task tp.
For the respective scheme implemented to engage the available processor or processor core with further task execution, several of the states in the above described rows 1124, 1126, 1128, 1130, and 1132 may be repeated to complete execution of the tasks for all of the iterations of the respective subpartitions of the outer repetitive process and the related inner repetitive processes on each of the processors or processor cores. Depending on various factors, such as the ones described above, one or more of the tasks may complete executing at the same or different times. For example, in row 1134, tasks t1, tp+1, and tp may finish executing, while task t0 may continue to execute. In an aspect, where only one ongoing task remains and the ongoing task is executing the final iteration of its partition of the iterations of the outer repetitive process, the partition cannot be subpartitioned to assign iterations of the outer repetitive process to the available processors or processor cores like in rows 1120 and 1122 described above. However, it may be possible to reassign the existing shadow task for the ongoing task to an available processor or processor core, and initialize extra shadow tasks for the ongoing task to aid in executing the iterations of the inner repetitive process. Continuing with the example in
While the final ongoing task continues to execute its last iteration, several of the states in the above described rows 1146 and 1148 may be repeated to aid in executing the iterations of the inner repetitive process when necessary. In row 1150 the final ongoing task, task t0 in this example, may complete its execution. With no remaining outer or inner repetitive process iterations, task t0 and its shadow tasks may be discarded in row 1152.
It should be noted that the various described states of the processors or processor cores may occur in a different order than in the examples described herein. The descriptions of
In block 1206 the iterations of the outer repetitive process may be divided into partitions for execution as part of the initialized tasks in parallel on the multiple processors or processor cores. In an aspect, the number of partitions may be determined by the number of initialized tasks, or available processors or processor cores. The makeup of each partition may be determined by various factors including characteristics of the processors or processor cores, characteristics of the program and/or the nested repetitive process, and states of the computing device, including temperature and power availability. The partitions may divide the number of iterations of the outer repetitive process as equally as possible, or the partitions may be unequal in the number of iterations of the outer repetitive process.
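An unequal division driven by processor characteristics might look like the following (illustrative Python; proportional-to-weight sizing is one assumption among the factors listed above):

```python
def weighted_partition(n, weights):
    # Divide n outer iterations in proportion to per-core weights
    # (e.g., relative processing speed); rounding remainders go to
    # the highest-weighted cores.
    total = sum(weights)
    sizes = [n * w // total for w in weights]
    leftover = n - sum(sizes)
    by_weight = sorted(range(len(weights)), key=lambda i: -weights[i])
    for i in by_weight[:leftover]:
        sizes[i] += 1
    return sizes
```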
In block 1208 the partitions of the outer repetitive process may be assigned to respective tasks. In block 1210 the initialized tasks, and thereby the respective partitioned iterations of the outer repetitive process, may be assigned to respective processors or processor cores. Much like initializing the tasks and partitioning the iterations, assignments to particular processors or processor cores may be determined by various factors including characteristics of the processors or processor cores, characteristics of the program and/or the nested repetitive process, and states of the computing device, including temperature and power availability. In block 1212, the assigned tasks may begin executing in parallel on the respective processors or processor cores to which the tasks are assigned.
During the execution of an iteration of the outer repetitive process of a task, an inner repetitive process may be encountered. In determination block 1214, the processor or processor core may determine whether an inner repetitive process is encountered. In response to determining that an inner repetitive process has not been encountered (i.e., determination block 1214=“No”), the processor or processor core may determine whether the iterations of the outer repetitive process for a respective task are complete in determination block 1224. In response to determining that an inner repetitive process is encountered (i.e., determination block 1214=“Yes”), the processor or processor core may determine whether it is the first encounter of the inner repetitive process for the task in determination block 1216. In response to determining that the encountered inner repetitive process is encountered for the first time for the executing task (i.e., determination block 1216=“Yes”), the processor or processor core may initialize a pointer, or other type of reference, in block 1218 for each task encountering the inner repetitive process. The pointer may be accessible by its respective task and a respective shadow task. The pointer may be used to track the iterations of the inner repetitive processes so that the respective tasks and shadow tasks know which iterations of the inner repetitive process to execute. The processor or processor core may initialize a shadow task for the executing task, in block 1220, so that the shadow task may potentially execute the iterations of the inner repetitive process when processing resources are available. In block 1222, the respective pointers for the tasks may be updated to reflect changes in the iterations of the inner repetitive processes of the executing tasks, such as completion or starting of an iteration of the inner repetitive processes.
In response to determining that it is not the first encounter of the inner repetitive process (i.e., determination block 1216=“No”), the respective pointers for the tasks may be updated in block 1222 as described above.
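The logic of determination blocks 1214 through 1222 can be sketched as follows (hypothetical Python; the dictionary-based task state is illustrative only):

```python
def on_iteration(task_state, inner_encountered):
    # Determination block 1214: was an inner repetitive process encountered?
    if not inner_encountered:
        return task_state
    # Determination block 1216: first encounter for this task?
    if not task_state.get("initialized"):
        task_state["pointer"] = -1        # block 1218: initialize pointer
        task_state["shadow_task"] = True  # block 1220: initialize shadow task
        task_state["initialized"] = True
    task_state["pointer"] += 1            # block 1222: update pointer
    return task_state
```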
In an aspect, rather than determining whether an inner repetitive process is encountered and/or determining it is the first encounter of the inner repetitive process for an executing task before initializing the shadow task, the shadow task and pointer, or other reference type, may be initialized along with or shortly after initialization of the related task. Therefore, in an aspect, determination block 1216 may be obviated, and blocks 1218 and 1220 may execute regardless of the presence of an inner repetitive process. In such an aspect, in response to determining that an inner repetitive process is encountered (i.e., determination block 1214=“Yes”), the pointers may be updated in block 1222 as described above.
In determination block 1224, the processor or processor core may determine whether the iterations of the outer repetitive process for a respective task are complete. In response to determining that the iterations of the outer repetitive process for a respective task are incomplete, or that there are remaining iterations for execution (i.e., determination block 1224=“No”), the processor or processor core may continue to execute the respective task in block 1226, and again check whether an inner repetitive process is encountered in determination block 1214. In response to determining that the iterations of the outer repetitive process for a respective task are complete, or that there are no remaining iterations for execution (i.e., determination block 1224=“Yes”), in determination block 1228 the processor or processor core may determine whether the remaining iterations for another respective task are divisible. The remaining iterations may be divisible when more than the executing iteration remains to be executed, and indivisible when only the executing iteration for the other respective task remains. In response to determining that the remaining iterations for the other respective task are divisible (i.e., determination block 1228=“Yes”), depending on the implemented scheme the processor or processor core may divide the remaining iterations of the outer repetitive process into subpartitions as described below in either method 1300 (see
In block 1302, the completed task and its completed, related shadow task may be discarded. In block 1304, the iterations of the ongoing task may be divided into subpartitions of the partition of iterations assigned to the ongoing task. For example, a partition of iterations of an outer repetitive process assigned to a task may include 500 iterations. In such an example, the ongoing task may have executed 174 iterations, and the task may be executing the 175th iteration, leaving 325 iterations yet to be executed. With resources, such as processors or processor cores, available to aid in executing these remaining iterations of the task, the remaining 325 iterations may be divided into subpartitions of the original 500 iteration partition, or of what is now the 325 remaining iterations partition. In this example, one or more processors or processor cores may be available, and the remaining 325 iterations may be divided in any manner over any number of the available processors or processor cores. For instance, the remaining iterations may be divided equally or unequally over the available processors or processor cores, and it is possible that at least one available processor or processor core is not assigned a subpartition of the remaining iterations. Further, the processor or processor core executing the task with the remaining iterations may be assigned at least the executing iteration of the task at the time the remaining iterations are divided. How the remaining iterations are divided into subpartitions may depend on a variety of factors including characteristics of the processors or processor cores (e.g., relative processing speed, relative power efficiency/current leakage, etc.), characteristics of the program and/or the nested repetitive process, and states of the computing device, including temperature and power availability (e.g., on-battery or charging).
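The division of the remaining 325 iterations into subpartitions can be sketched with a simple helper. The function name `subpartition` and the equal-split policy are illustrative assumptions; as noted above, the split may also be unequal and may depend on processor characteristics and device state:

```python
def subpartition(first, last, workers):
    """Divide remaining iterations [first, last) into contiguous
    subpartitions, one per available worker (block 1304).

    An equal split is shown for simplicity; sizes differ by at most one.
    """
    total = last - first
    base, extra = divmod(total, workers)
    parts, start = [], first
    for w in range(workers):
        size = base + (1 if w < extra else 0)
        parts.append((start, start + size))
        start += size
    return parts

# The 500-iteration partition with 325 iterations remaining
# (the 175th iteration is executing; indices 175..499 remain):
parts = subpartition(175, 500, 3)
# → [(175, 284), (284, 392), (392, 500)]
```

The first subpartition covers the executing iteration, consistent with the executing processor being assigned at least that iteration when the remainder is divided.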
In block 1306, tasks may be initialized for the remaining unassigned subpartitions. In block 1308, one subpartition may be assigned to the ongoing task for which the iterations are being divided. Thus, all of the subpartitions are assigned either to the existing ongoing task or to a newly initialized task for executing on the available processor(s) or processor core(s).
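Blocks 1306 and 1308 can be sketched as a simple assignment step. The task identifiers and dictionary-based bookkeeping below are illustrative assumptions, not details from the source:

```python
def assign_subpartitions(subpartitions, ongoing_task):
    """Assign one subpartition to the ongoing task (block 1308) and
    initialize a new task for each remaining subpartition (block 1306)."""
    # The ongoing task keeps the subpartition containing its executing iteration.
    assignments = {ongoing_task: subpartitions[0]}
    new_tasks = []
    for n, part in enumerate(subpartitions[1:], start=1):
        task = f"task-{n}"  # stand-in for initializing a task on an available core
        new_tasks.append(task)
        assignments[task] = part
    return assignments, new_tasks

# Subpartitions of the 325 remaining iterations from the example above:
assignments, new_tasks = assign_subpartitions(
    [(175, 284), (284, 392), (392, 500)], "ongoing")
```

Every subpartition ends up assigned, so no remaining iteration is orphaned when the ongoing task's partition is divided.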
In determination block 1310, the processor or processor core may determine whether the task is an ongoing task or a new task. In response to determining that the task is an ongoing task (i.e., determination block 1310=“Yes”), the processor or processor core executing the ongoing task may continue executing the task in block 1226 (see
In block 1402, the remaining iterations of an ongoing task may be divided into subpartitions much like in block 1304 described above with reference to
The completed task may have freed up processing resources, such as one of the processors or processor cores, for execution of other tasks or shadow tasks. In optional block 1502, the shadow task of a completed task may execute on the available processor or processor core; however, there may be no iterations of the inner repetitive processes remaining for execution. In block 1504, the completed task and its completed, related shadow task may be discarded. In determination block 1506, the processor or processor core may determine whether any ongoing tasks are executing indivisible partitions. As described above, an indivisible partition of iterations is a partition containing only the executing iteration of the outer repetitive process. In response to determining that neither divisible nor indivisible partitions remain (i.e., determination block 1506=“No”), method 1500 may end. In response to determining that at least one indivisible partition remains (i.e., determination block 1506=“Yes”), inner repetitive process iterations of the ongoing task may be partitioned in block 1508 in much the same way as the iterations of the outer repetitive process in block 1206 described above with reference to
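The decision an available processor faces in determination block 1506 can be sketched as follows. Representing an outer partition as an `(executing, last)` tuple and returning descriptive strings are illustrative assumptions for this sketch:

```python
def classify(partition):
    """Classify an ongoing task's outer partition.

    partition = (executing, last): 'executing' is the iteration index now
    running, and iterations [executing, last) remain. The partition is
    divisible when more than the executing iteration remains, and
    indivisible when only the executing iteration remains.
    """
    executing, last = partition
    remaining = last - executing
    if remaining > 1:
        return "divisible"
    if remaining == 1:
        return "indivisible"
    return "complete"

def free_worker_action(partitions):
    """What an available processor does after its task completes: divide a
    divisible outer partition if one exists; otherwise partition the inner
    iterations of an indivisible one (block 1508); otherwise end."""
    kinds = [classify(p) for p in partitions]
    if "divisible" in kinds:
        return "subpartition outer iterations"
    if "indivisible" in kinds:
        return "partition inner iterations"   # block 1508
    return "end"                              # determination block 1506=“No”
```

In this way the available processor only descends to the inner repetitive process when no outer-level work remains to be divided.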
The smartphone computing device 1600 may have one or more radio signal transceivers 1608 (e.g., Peanut, Bluetooth, Zigbee, Wi-Fi, RF radio) and antennae 1610, for sending and receiving communications, coupled to each other and/or to the multi-core processor 1602. The transceivers 1608 and antennae 1610 may be used with the above-mentioned circuitry to implement the various wireless transmission protocol stacks and interfaces. The smartphone computing device 1600 may include a cellular network wireless modem chip 1616 that enables communication via a cellular network and is coupled to the processor.
The smartphone computing device 1600 may include a peripheral device connection interface 1618 coupled to the multi-core processor 1602. The peripheral device connection interface 1618 may be singularly configured to accept one type of connection, or may be configured to accept various types of physical and communication connections, common or proprietary, such as USB, FireWire, Thunderbolt, or PCIe. The peripheral device connection interface 1618 may also be coupled to a similarly configured peripheral device connection port (not shown).
The smartphone computing device 1600 may also include speakers 1614 for providing audio outputs. The smartphone computing device 1600 may also include a housing 1620, constructed of plastic, metal, or a combination of materials, for containing all or some of the components discussed herein. The smartphone computing device 1600 may include a power source 1622 coupled to the multi-core processor 1602, such as a disposable or rechargeable battery. The rechargeable battery may also be coupled to the peripheral device connection port to receive a charging current from a source external to the smartphone computing device 1600. The smartphone computing device 1600 may also include a physical button 1624 for receiving user inputs. The smartphone computing device 1600 may also include a power button 1626 for turning the smartphone computing device 1600 on and off.
The various aspects described above may also be implemented within a variety of other computing devices, such as a laptop computer 1700 illustrated in
The various aspects may also be implemented on any of a variety of commercially available server devices, such as the server 1800 illustrated in
Computer program code or “program code” for execution on a programmable processor for carrying out operations of the various aspects may be written in a high level programming language such as C, C++, C#, Smalltalk, Java, JavaScript, Visual Basic, a Structured Query Language (e.g., Transact-SQL), Perl, or in various other programming languages. Program code or programs stored on a computer readable storage medium as used in this application may refer to machine language code (such as object code) whose format is understandable by a processor.
The operating system kernels of many computing devices are organized into a user space (in which non-privileged code runs) and a kernel space (in which privileged code runs). This separation is of particular importance in Android and other general public license (GPL) environments, where code that is part of the kernel space must be GPL licensed, while code running in the user space may not be GPL licensed. It should be understood that the various software components/modules discussed here may be implemented in either the kernel space or the user space, unless expressly stated otherwise.
The foregoing method descriptions and the process flow diagrams are provided merely as illustrative examples and are not intended to require or imply that the operations of the various aspects must be performed in the order presented. As will be appreciated by one of skill in the art, the operations in the foregoing aspects may be performed in any order. Words such as “thereafter,” “then,” “next,” etc. are not intended to limit the order of the operations; these words are simply used to guide the reader through the description of the methods. Further, any reference to claim elements in the singular, for example, using the articles “a,” “an” or “the,” is not to be construed as limiting the element to the singular.
The various illustrative logical blocks, modules, circuits, and algorithm operations described in connection with the various aspects may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and operations have been described above generally in terms of their functionality. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The hardware used to implement the various illustrative logics, logical blocks, modules, and circuits described in connection with the aspects disclosed herein may be implemented or performed with a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general-purpose processor may be a microprocessor, but, in the alternative, the processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. Alternatively, some operations or methods may be performed by circuitry that is specific to a given function.
In one or more aspects, the functions described may be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored as one or more instructions or code on a non-transitory computer-readable medium or a non-transitory processor-readable medium. The operations of a method or algorithm disclosed herein may be embodied in a processor-executable software module that may reside on a non-transitory computer-readable or processor-readable storage medium. Non-transitory computer-readable or processor-readable storage media may be any storage media that may be accessed by a computer or a processor. By way of example but not limitation, such non-transitory computer-readable or processor-readable media may include RAM, ROM, EEPROM, FLASH memory, CD-ROM or other optical disk storage, magnetic disk storage or other magnetic storage devices, or any other medium that may be used to store desired program code in the form of instructions or data structures and that may be accessed by a computer. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray disc, wherein disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above are also included within the scope of non-transitory computer-readable and processor-readable media. Additionally, the operations of a method or algorithm may reside as one or any combination or set of codes and/or instructions on a non-transitory processor-readable medium and/or computer-readable medium, which may be incorporated into a computer program product.
The preceding description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the following claims and the principles and novel features disclosed herein.
This application claims the benefit of priority to U.S. Provisional Application No. 61/968,720 entitled “Method for Exploiting Parallelism in Nested Parallel Patterns in Task-based Systems” filed Mar. 21, 2014, the entire contents of which are hereby incorporated by reference.
Number | Date | Country
---|---|---
61968720 | Mar 2014 | US