This invention pertains to the field of information processing, particularly to techniques for managing execution of multiple concurrent, multi-task software programs on parallel processing hardware.
Conventional microprocessor and computer system architectures rely on system software for handling runtime matters relating to sharing processing resources among multiple application programs and their instances, tasks etc., as well as orchestrating the concurrent (parallel and/or pipelined) execution between and within the individual applications sharing the given set of processing resources. However, the system software itself consumes ever increasing portions of the system processing capacity as the number of applications, their instances and tasks, and the pooled processing resources grows, and as the dynamic resource management among the applications and their tasks needs to be re-optimized ever more frequently in response to variations in the processing loads of the applications, their instances and tasks, and other variables of the processing environment. As such, the conventional approaches for supporting dynamic execution of concurrent programs on shared processing capacity pools will not scale well.
This presents significant challenges to the scalability of the networked utility (‘cloud’) computing model, in particular as there will be a continuously increasing need for greater degrees of concurrent processing also at intra-application levels, in order to keep increasing individual applications' on-time processing throughput performance, without automatic speed-up from processor clock rates being available due to the practical physical and economic constraints faced by semiconductor and other physical hardware implementation technologies.
To address the challenges per above, there is a need for inventions enabling scalable, multi-application dynamic concurrent execution on parallel processing systems, with high resource utilization efficiency, high application processing on-time throughput performance, as well as built-in, architecture-based security and reliability.
An aspect of the invention provides systems and methods for arranging secure and reliable, concurrent execution of a set of internally parallelized and pipelined software programs on a pool of processing resources shared dynamically among the programs, wherein the dynamic sharing of the resources is based at least in part on i) processing input data loads for instances and tasks of the programs and ii) contractual capacity entitlements of the programs.
An aspect of the invention provides methods and systems for intelligent, destination task defined prioritization of inter-task communications (ITC) for a computer program, for architectural ITC performance isolation among a set of programs executing concurrently on a dynamically shared data processing platform, as well as for prioritizing instances of the program tasks for execution at least in part based on which of the instances have available to them their input data, including ITC data, enabling any given one of such instances to execute at the given time.
An aspect of the invention provides a system for prioritizing instances of a software program for execution. Such a system comprises: 1) a subsystem for determining which of the instances are ready to execute on an array of processing cores, at least in part based on whether a given one of the instances has available to it input data to process, and 2) a subsystem for assigning a subset of the instances for execution on the array of cores based at least in part on the determining. Various embodiments of that system include further features such as features whereby a) the input data is from a data source to which the given instance has assigned a high priority for purposes of receiving data; b) the input data is such data that it enables the given program instance to execute; c) the subset includes cases of none, some as well as all of the instances of said program; d) the instance is: a process, a job, a task, a thread, a method, a function, a procedure, an instance of any of the foregoing, or an independent copy of the given program; and/or e) the system is implemented by hardware logic that is able to operate without software involvement.
An aspect of the invention provides a hardware logic implemented method for prioritizing instances of a software program for execution, with such a method involving: classifying instances of the program into the following classes, listed in the order from higher to lower priority for execution, i.e., in their reducing execution priority order: (I) instances indicated as having high priority input data for processing, and (II) any other instances. Various embodiments of that method include further steps and features such as features whereby a) the other instances are further classified into the following sub-classes, listed in their reducing execution priority order: (i) instances indicated as able to execute presently without the high priority input data, and (ii) any remaining instances; b) the high priority input data is data from a source from which the destination instance of said program is expecting high priority input data; c) a given instance of the program comprises tasks, with one of said tasks referred to as a destination task and the others as source tasks of the given instance, and for the given instance, a unit of the input data is considered high priority if it is from such one of the source tasks that the destination task has assigned a high priority for inter-task communications to it; d) for any given one of the instances, a step of computing the number of those of its non-empty source task specific input data buffers that belong to source tasks indicated at the time as high priority source tasks for communications to the destination task of the given instance, with this number referred to as an H number for its instance, and wherein, within class (I), the instances are prioritized for execution at least in part according to magnitudes of their H numbers, in descending order such that an instance with a greater H number is prioritized before an instance with a lower H number; e) in case of two or more of the instances tied for the greatest H number, such tied instances are prioritized at least in part according to their respective total numbers of non-empty input data buffers; and/or f) at least one of the instances is either a process, a job, a task, a thread, a method, a function, a procedure, an instance of any of the foregoing, or an independent copy of the given program.
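By way of a non-limiting illustration of features d) and e) above, the following Verilog sketch shows one possible hardware computation of the per-instance H number together with the tie-breaking count of all non-empty input buffers. This is a minimal sketch only; the module, parameter and signal names (h_number_calc, N_SRC, buf_nonempty, src_high_prio etc.) are assumptions of this illustration rather than elements defined elsewhere in this specification.

    // Per-instance H-number logic (illustrative sketch).
    module h_number_calc #(
      parameter N_SRC = 4  // source task specific input buffers per instance
    )(
      input  wire [N_SRC-1:0] buf_nonempty,  // per-buffer “has data” flags
      input  wire [N_SRC-1:0] src_high_prio, // priority bits set by the destination task
      output reg  [$clog2(N_SRC+1)-1:0] h_number,      // non-empty high-priority buffers
      output reg  [$clog2(N_SRC+1)-1:0] nonempty_total // tie-breaker per feature e)
    );
      integer i;
      always @* begin
        h_number       = 0;
        nonempty_total = 0;
        for (i = 0; i < N_SRC; i = i + 1)
          if (buf_nonempty[i]) begin
            nonempty_total = nonempty_total + 1;
            if (src_high_prio[i])
              h_number = h_number + 1; // buffer is both non-empty and high priority
          end
      end
    endmodule

Instances with a non-zero h_number would fall into class (I) and be ranked by descending h_number, with nonempty_total breaking ties, per features d) and e) above.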
An aspect of the invention provides a system for processing a set of computer program instances, with inter-task communications (ITC) performance isolation among the set of program instances. Such a system comprises: 1) a number of processing stages; and 2) a group of multiplexers connecting ITC data to a given stage among the processing stages, wherein a multiplexer among said group is specific to one given program instance among said set. The system hosts each task of the given program instance at a different one of the processing stages, and supports copies of same task software code being located at more than one of the processing stages in parallel. Various embodiments of this system include further features such as a) a feature whereby at least one of the processing stages comprises multiple processing cores such as CPU execution units, with, for any of the cores, at any given time, one of the program instances assigned for execution; b) a set of source task specific buffers for buffering data destined for a task of the given program instance located at the given stage, referred to as a destination task, and hardware logic for forming a hardware signal indicating whether sending ITC is presently permitted to a given buffer among the source task specific buffers, with such forming based at least in part on a fill level of the given buffer, and with such a signal being connected to the source task for which the given buffer is specific to; c) a feature providing, for the destination task, a set of source task specific buffers, wherein a given buffer is specific to one of the other tasks of the program instance for buffering ITC from said other task to the destination task; d) a feature whereby the destination task provides ITC prioritization information for other tasks of the program instance located at their respective ones of the stages; e) a feature whereby the ITC prioritization information is provided by the destination task via a set of one or more hardware registers, with each register of the set specific to one of the other tasks of the program instance, and with each register configured to store a value specifying a prioritization level of the task that it is specific to, for purposes of ITC communications to the destination task; f) an arbitrator controlling from which source task of the program instance the multiplexer specific to that program instance will read its next ITC data unit for the destination task; and/or g) a feature whereby the arbitrator prioritizes source tasks of the program instance for selection by the multiplexer to read its next ITC data unit based at least in part on at least one of: (i) source task specific ITC prioritization information provided by the destination task, and (ii) source task specific availability information of ITC data for the destination task from the other tasks of the program instance.
Accordingly, aspects of the invention involve application-program instance specific hardware logic resources for secure and reliable ITC among tasks of application program instances hosted at processing stages of a multi-stage parallel processing system. Rather than seeking to inter-connect the individual processing stages or cores of the multi-stage manycore processing system as such, the invented mechanisms efficiently inter-connect the tasks of any given application program instance using the per application program instance specific inter-processing stage ITC hardware logic resources. Due to the ITC being handled with such application program instance specific hardware logic resources, the ITC performance experienced by one application instance does not depend on the ITC resource usage (e.g., data volume and inter-task communications intensiveness) of the other applications sharing the given data processing system per the invention. This results in effective inter-application isolation for ITC in a multi-stage parallel processing system shared dynamically among multiple application programs.
An aspect of the invention provides systems and methods for scheduling instances of software programs for execution based at least in part on (1) availability of input data of differing priorities for any given one of the instances and/or (2) availability, on their fast-access memories, of memory contents needed by any given one of the instances to execute.
An aspect of the invention provides systems and methods for optimally allocating and assigning input port capacity to a data processing system among data streams of multiple software programs based at least in part on input data load levels and contractual capacity entitlements of the programs.
An aspect of the invention provides systems and methods for resolution of resource access contentions, for resources including computing, storage and communication resources such as memories, queues, ports or processors. Such methods enable multiple potential user systems of a shared resource to avoid, in a coordinated and fair manner, conflicting resource access decisions, even while multiple user systems are deciding on access to a set of shared resources concurrently, including at the same clock cycle.
An aspect of the invention provides systems and methods for load balancing, whereby the load balancer is configured to forward, by its first layer, any packets it receives from its network input that have no destination instance specified within their destination application (referred to as no-instance-specified packets, or NIS packets for short) to the one of the processing systems in the local load balancing group that presently has the highest score for accepting NIS packets for the destination app of the given NIS packet. The load balancers further have destination processing system specific (i.e., for each given application, instance group specific) sub-modules, which, for NIS packets forwarded to them by the first layer balancing logic, specify a destination instance among the available, presently inactive instance resources of the destination app of a given NIS packet to which to forward the given NIS packet. In at least some embodiments of the invention, the score for accepting NIS packets for a destination processing system among the load balancing group is based at least in part on the amount of presently inactive instance resources at the given processing system for the destination application of a given NIS packet.
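As a non-limiting illustration of the first-layer forwarding decision, the following Verilog sketch selects the processing system with the presently highest NIS acceptance score. The module and signal names (nis_first_layer, N_SYS, scores, dest_sys) and the flattened score-bus format are assumptions of this sketch, not elements of the specification.

    // First-layer NIS forwarding choice (illustrative sketch).
    module nis_first_layer #(
      parameter N_SYS   = 4, // processing systems in the load balancing group
      parameter SCORE_W = 8  // width of a per-system acceptance score
    )(
      input  wire [N_SYS*SCORE_W-1:0] scores,   // per-system scores, flattened
      output reg  [$clog2(N_SYS)-1:0] dest_sys  // chosen destination system
    );
      integer s;
      reg [SCORE_W-1:0] best;
      always @* begin
        best     = scores[0 +: SCORE_W];
        dest_sys = 0;
        for (s = 1; s < N_SYS; s = s + 1)
          if (scores[s*SCORE_W +: SCORE_W] > best) begin
            best     = scores[s*SCORE_W +: SCORE_W];
            dest_sys = s; // highest score seen so far wins
          end
      end
    endmodule

The per-app scores themselves are assumed to be maintained elsewhere, e.g., tracking the amount of presently inactive instance resources at each processing system as described above.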
FIGS. and related descriptions in the following provide specifications for embodiments and aspects of hardware-logic based systems and methods for inter-task communications (ITC) with destination task defined source task prioritization, for input data availability based prioritization of instances of a given application task for execution on processing cores of a processing stage hosting the given task, for architecture-based application performance isolation for ITC in multi-stage manycore data processing system, as well as for load balancing of incoming processing data units among a group of such processing systems.
The invention is described herein in further detail by illustrating the novel concepts in reference to the drawings. General symbols and notations used in the drawings:
General notes regarding this specification (incl. text in the drawings):
Illustrative embodiments and aspects of the invention are described in the following with references to the FIGS.
Platform Overview
This embodiment provides a functionality and architecture oriented, end-to-end technical description of the multi-application dynamic parallel program execution environment for a parallel program development and execution platform-as-a-service (PaaS) 800.
The parallel program development and execution PaaS 800 enables application software developers and service providers to develop, test and deploy their application programs on the manycore processors per this description with high productivity and cost-efficiency, taking advantage of the dynamic parallel program execution features of the manycore processors as described in the following.
A major productivity and cost-efficiency challenge facing many high-performance application software developers and service providers is the need to keep improving the application program processing throughput performance as it is becoming economically and physically infeasible to increase the processor hardware (CPU) clock rates. Moreover, conventional software development tools, operating systems and manycore processor hardware architectures do not enable the type of dynamic parallelized processing that would be needed to keep cost-efficiently scaling up application processing throughput, especially in the multi-user shared-processor environments that are becoming the norm in the cloud-computing age.
The PaaS 800 based on the herein described dynamic parallel execution technology addresses this pressing challenge of the application software developers and service providers by offering an end-to-end platform that automates and optimizes the back-end development and execution of the customers' application programs on the manycore processors per this description that are designed for optimized-throughput, dynamic parallel processing of client applications.
Optimizing resource usage dynamically in a large capacity parallel processing system among a large number of applications and their instances and tasks, in pursuit of both predictable, high performance for each individual application and efficient system resource utilization, presents a complex problem, the resolving of which would itself consume plenty of the system's resources if handled in software. It is not trivial to answer the question: to which application task instance should any given processing resource be assigned at any given time, to achieve optimal system-wide application processing throughput?
To address the above challenge, the dynamic parallel execution environment described herein is based on an architecture for extensible, application program load and type adaptive, multi-stage manycore processing systems (
By partitioning the system-wide dynamic resource management functionality per above, the individual functions of resource management for dynamically shared manycore arrays become feasible (e.g., in terms of complexities of data structures needed) for direct hardware (e.g., FPGA) implementation. The all-hardware implementation of such system functions further adds to the architecture's scalability through system software overhead reduction. Since the hardware automated system functions do not consume any of the system processor capacity no matter how frequently the capacity is reallocated, since the hardware algorithms run in just a few clock cycles, and since hardware automated task switching for the processor cores is invisible to software, this architecture also enables re-optimizing the system resource assignment as frequently as useful to accommodate the applications' processing load variations.
The main structures and elements of the architecture, and their operation, are described in the following, proceeding generally along the flow of data through the system, starting from the load balancers in front of an array of the multi-stage manycore processors.
System Dimensioning
Sizing cloud processing platforms based on the multi-stage manycore processing systems per this description involves setting a set of parameter values as follows:
Load Balancing
The load balancing per
The mechanisms per the above three bullet points are designed to eliminate all packet drops in the system that are avoidable by system design, i.e., those occurring for reasons other than app-instance specific buffer overflows caused by systemic mismatches between the input data loads to a given app-inst and the capacity entitlement level subscribed to by the given app.
In the architecture per
General operation of the application load adaptive, multi-stage parallel data processing system per
The application program tasks executing on the entry stage manycore processor are typically of ‘master’ type for parallelized/pipelined applications, i.e., they manage and distribute the processing workloads for ‘worker’ type tasks running (in pipelined and/or parallel manner) on the worker stage manycore processing systems (note that the processor system hardware is similar across all instances of the processing stages 300). The instances of master tasks typically do preliminary processing (e.g., message/request classification, data organization) and workflow management based on given input data units (packets), and then typically engage appropriate worker tasks at their worker stage processors to perform the data processing called for by the given input packet, potentially in the context of and in connection with other related input packets and/or other data elements (e.g., in memory or storage resources accessible by the system) referred to by such packets. (The processors have access to system memories through interfaces additional to the IO ports shown in
To provide isolation among the different applications configured to run on the processors of the system, by default the hardware controller of each processor 300, rather than any application software (executing on a given processor), inserts the application ID # bits for the data packets passed to the PS 200. That way, the tasks of any given application running on the processing stages in a system can trust that the packets they receive from the PS are from their own application. Note that the controller determines, and therefore knows, the application ID # that each given core within its processor is assigned to at any given time, via the application-instance to core mapping info that the controller produces. Therefore the controller is able to insert the presently-assigned app ID # bits for the inter-task data units being sent from the cores of its processing stage over the core-specific output ports to the PS.
While the processing of any given application (server program) at a system per
Notably, the architecture enables the aforesaid flexibility and efficiency through its hardware logic functionality, so that no system or application software running on the system needs to either keep track of whether or where any of the instances of any of the app-tasks may be executing at any given time, or which port any given inter-task or external communication may have used. Thus the system, while providing a highly dynamic, application workload adaptive usage of the system processing and communications resources, allows the software running on and/or remotely using the system to be designed with a straightforward, abstracted view of the system: the software (both remote and local programs) can assume that all the applications, and all their tasks and instances, hosted on the given system are always executing on their virtual dedicated processor cores within the system. Also, where useful, said virtual dedicated processors can also be considered by software to be time-share slices on a single (unrealistically high speed) processor.
The presented architecture thereby enables achieving, at the same time, both the vital application software development productivity (simple, virtual static view of the actually highly dynamic processing hardware) together with high program runtime performance (scalable concurrent program execution with minimized overhead) and resource efficiency (adaptively optimized resource allocation) benefits. Techniques enabling such benefits of the architecture are described in the following through more detailed technical description of the system 1 and its subsystems.
The any-to-any connectivity among the app-tasks of all the processing stages 300 provided by the PS 200 enables organizing the worker tasks (located at the array of worker stage processors) flexibly to suit the individual demands (e.g., task inter-dependencies) of any given application program on the system: the worker tasks can be arranged to conduct the work flow for the given application using any desired combinations of parallel and pipelined processing. E.g., it is possible to have the same task of a given application located on any number of the worker stages in the architecture per
The set of applications configured to run on the system can have their tasks identified by (intra-app) IDs according to their descending order of relative (time-averaged) workload levels. Under such an (intra-app) task ID assignment principle, the sum of the intra-application task IDs, each representing the workload ranking of its task within its application, of the app-tasks hosted at any given processing system is equalized by appropriately configuring the tasks of differing ID #s, i.e., of differing workload levels, across the applications for each processing system, to achieve optimal overall load balancing. For instance, in case of T=4 worker stages, if the system is shared among M=4 applications and each of that set of applications has four worker tasks, for each application of that set, the busiest task (i.e., the worker task most often called for or otherwise causing the heaviest processing load among tasks of the app) is given task ID #0, the second busiest task ID #1, the third busiest ID #2, and the fourth ID #3. To balance the processing loads across the applications among the worker stages of the system, the worker stage #t gets task ID #(t+m) (rolling over at 3 to 0) of the application ID #m (t=0, 1, . . . T−1; m=0, 1, . . . M−1) (note that the master task, ID #4 of each app, is located at the entry/exit stages). In this example scenario of four application streams, four worker tasks per app as well as four worker stages, the above scheme causes the task IDs of the set of apps to be placed at the processing stages per Table 1 below:
As seen in the example of Table 1, the sum of the task ID #s (with each task ID # representing the workload ranking of its task within its app) is the same for any row, i.e., for each worker stage. This load balancing scheme can be straightforwardly applied for differing numbers of processing stages/tasks and applications, so that the overall task processing load is, as much as possible, equalized across all worker-stage processors of the system. Advantages of such schemes include achieving optimal utilization efficiency of the processing resources and eliminating, or at least minimizing, the possibility and effects of any of the worker-stage processors forming system-wide performance bottlenecks.
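The placement rule of this example scenario reduces to a modulo-T addition, as the following toy Verilog sketch (with parameter and signal names assumed for illustration only) shows:

    // Task-to-stage placement rule of the example above (illustrative sketch).
    module task_placement #(
      parameter T = 4, // number of worker stages (= worker tasks per app)
      parameter M = 4  // number of applications sharing the system
    )(
      input  wire [7:0] stage_id, // t in 0..T-1
      input  wire [7:0] app_id,   // m in 0..M-1
      output wire [7:0] task_id   // worker task ID # hosted at stage t for app m
    );
      // “Rolling over at T-1 back to 0” is a modulo-T addition.
      assign task_id = (stage_id + app_id) % T;
    endmodule

With T = M = 4, this yields the rotated task-to-stage pattern of Table 1, equalizing the per-stage sums of the task ID #s.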
A non-exclusive alternative task to stage placement principle targets grouping tasks from the apps so as to minimize any variety among the processing core types demanded by the set of app-tasks placed on any given individual processing stage; that way, if all app-tasks placed on a given processing stage optimally run on the same processing core type, there is no need for reconfiguring the core slots of the manycore array at the given stage regardless of which of the locally hosted app-tasks get assigned to which of its core slots (see the Task-type Adaptive Core Reconfiguration section below for task type adaptive core slot reconfiguration, which may be used when the app-tasks located on the given processing stage demand different execution core types).
For a system of
Besides the division of the app-specific submodules 202 of the stage RX logic per
The app-instance specific RX logic per
Note that when considering the case of RX logic of the entry-stage processing system of the multi-stage architecture per
Before the actual multiplexer, the app-instance specific RX logic per
For clarity, the “local” task refers to the task of the app-instance that is located at the processing stage 300 that the RX logic under study interfaces to, with that processing stage or processor being referred to as the local processing stage or processor. Please recall that per any given app, the individual tasks are located at separate processing stages. Note though that copies of the same task for a given app can be located at multiple processing stages in parallel. Note further that, at any of the processing stages, there can be multiple parallel instances of any given app executing concurrently, as well as that copies of the task can be located in parallel at multiple processing stages of the multi-stage architecture, allowing for processing speed-up via parallel execution at application as well as task levels, besides between the apps.
The app-instance RX module 203 per
Each given app-instance software provides a logic vector 595 to the arbitrating logic 270 of its associated app-instance RX module 203, which has a priority indicator bit within it per each of its individual source stage specific FIFO modules 245: while a bit of such a vector relating to a particular source stage is at its active state (e.g., logic ‘1’), ITC from the source stage in question to the local task of the app-instance will be considered to be high priority, and otherwise normal priority, by the arbitrator logic in selecting the source stage specific FIFO from where to read the next ITC packet to the local (destination) task of the studied app-instance.
The arbitrator selects the source stage specific FIFO 260 (within the array 240 of the local app-instance RX module 203) for reading 265, 290 the next packet per the following source priority ranking algorithm:
Note that the ITC source task prioritization info 595 from the task software of app-instances to their RX logic modules 203 can change dynamically, as the processing state and demands of input data for a given app-instance task evolve over time, and the arbitrator modules 270 (
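Purely as an illustration of one plausible arbitrator consistent with the source priority ranking referenced above, the following Verilog sketch implements a two-level selection: non-empty FIFOs whose source stage is flagged high priority via the vector 595 win over non-empty normal-priority FIFOs, with the lowest source stage index winning within a level. The fixed intra-level ordering and all module/signal names are assumptions of this sketch, not the specification's actual ranking listing.

    // Source-stage FIFO selection for one app-instance (illustrative sketch).
    module itc_src_arbiter #(
      parameter N_STAGE = 8 // source processing stages feeding this app-instance
    )(
      input  wire [N_STAGE-1:0] fifo_nonempty, // per-source-stage FIFO has a packet
      input  wire [N_STAGE-1:0] high_prio,     // per-source-stage priority bits (595)
      output reg                        sel_valid, // a FIFO was selected this cycle
      output reg [$clog2(N_STAGE)-1:0]  sel_stage  // FIFO to read the next packet from
    );
      integer k;
      reg found;
      always @* begin
        sel_valid = 1'b0;
        sel_stage = 0;
        found     = 1'b0;
        // Level 1: high-priority source stages with data waiting.
        for (k = 0; k < N_STAGE; k = k + 1)
          if (!found && fifo_nonempty[k] && high_prio[k]) begin
            found = 1'b1; sel_valid = 1'b1; sel_stage = k;
          end
        // Level 2: any remaining source stage with data waiting.
        for (k = 0; k < N_STAGE; k = k + 1)
          if (!found && fifo_nonempty[k]) begin
            found = 1'b1; sel_valid = 1'b1; sel_stage = k;
          end
      end
    endmodule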
In addition, the app-instance RX logic per
Each of the source stage specific FIFO modules 245 of a given app-instance at the RX logic for a given processing stage maintains a signal 212 indicating whether the task (of the app instance under study) located at the source stage that the given FIFO 260 is specific to is presently permitted to send ITC to the local (destination) task of the app-instance under study: the logic denies the permit when the FIFO fill level is above a defined threshold, while it otherwise grants the permit.
As a result, any given (source) task, when assigned for execution at a core 520 (
Each given processing stage receives and monitors ITC permit signals 212 from those of the processing stages that the given stage actually is able to send ITC data to; please see
The ITC permit signal buses 212 will naturally be connected across the multi-stage system 1 between the app-instance specific modules 203 of the RX logic modules 202 of the ITC destination processing stages and the ITC source processing stages (noting that a given stage 300 will be both a source and destination for ITC as illustrated in
Note that, notwithstanding the functional illustration in
Each source task applies these ITC send permission signals from a given destination task of its app-instance at times that it is about to begin sending a new packet over its (assigned execution core specific) processing stage output port 210 to that given destination task. The ITC destination FIFO 260 monitoring threshold for allowing/disallowing further ITC data to be sent to the given destination task (from the source task that the given FIFO is specific to) is set to a level where the FIFO still has room for at least one ITC packet worth of data bytes, with the size of such ITC packets being configurable for a given system implementation, and the source tasks are to restrict the remaining length of their packet transmissions to destination tasks denying the ITC permissions according to such configured limits.
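A minimal Verilog sketch of the per-FIFO send-permit logic follows, assuming (as parameter names of this illustration only) a FIFO capacity DEPTH and a configured maximum ITC packet length MAX_PKT_WORDS; per the threshold rule above, the permit 212 is granted while the FIFO retains room for at least one maximum-size ITC packet.

    // Per-source-FIFO ITC send permit (illustrative sketch).
    module itc_send_permit #(
      parameter DEPTH         = 64, // FIFO capacity in data words (assumed)
      parameter MAX_PKT_WORDS = 8   // configured maximum ITC packet length (assumed)
    )(
      input  wire [$clog2(DEPTH+1)-1:0] fill_level, // current FIFO occupancy
      output wire                       permit      // signal 212: 1 = source may send
    );
      // Grant the permit only while the FIFO still has room for at least
      // one maximum-size ITC packet worth of data.
      assign permit = (fill_level <= DEPTH - MAX_PKT_WORDS);
    endmodule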
The app-level RX logic per
The logic for prioritizing the instances of the given app for its execution priority list 535, via a continually repeating process, signals (via hardware wires dedicated for the purpose) to the controller 540 of the local manycore processor 500 (
The process periodically starts from priority order 0 (i.e., the app's instance with the greatest priority score P), and steps through the remaining priority orders 1 through the maximum supported number of instances for the given application (specifically, for its task located at the processing stage under study) less 1, producing one instance entry per each step on the list, which is sent to the controller as such individual entries. Each entry of such a priority list comprises, as its core info, simply the instance ID # (as the priority order of any given instance is known from the number of clock cycles since the bit pulse marking the priority order 0 at the start of a new list). To simplify the logic, also the priority order (i.e., the number of clock cycles since the bit pulse marking the priority order 0) of any given entry on these lists is sent along with the instance ID #.
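For illustration, the following Verilog sketch streams such a priority list to the controller, one entry per clock cycle, with a pulse marking priority order 0 and the priority order sent along with the instance ID #. The pre-sorted input vector and all names are assumptions of this sketch.

    // Streaming of an execution priority list 535 (illustrative sketch).
    module prio_list_tx #(
      parameter MAX_INST = 16,
      parameter IDW      = 4  // = $clog2(MAX_INST), width of an instance ID #
    )(
      input  wire                    clk,
      input  wire                    rst,
      input  wire [MAX_INST*IDW-1:0] sorted_inst, // instance IDs, descending priority
      output reg                     start_pulse, // marks the priority order 0 entry
      output reg  [IDW-1:0]          prio_order,  // priority order sent with the entry
      output reg  [IDW-1:0]          inst_id      // instance ID # of the entry
    );
      reg [IDW-1:0] ord; // internal step counter through the priority orders
      always @(posedge clk) begin
        if (rst) begin
          ord         <= 0;
          start_pulse <= 1'b0;
        end else begin
          start_pulse <= (ord == 0);                 // pulse accompanies order 0
          prio_order  <= ord;                        // order sent along with entry
          inst_id     <= sorted_inst[ord*IDW +: IDW]; // next entry's instance ID #
          ord         <= (ord == MAX_INST-1) ? 0 : ord + 1;
        end
      end
    endmodule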
At the beginning of its core to app-instance assignment process, the controller 540 of the manycore processor uses the most recent set of complete priority order lists 535 received from the application RX modules 202 to determine which (highest priority) instances of each given app to assign for execution for the next core allocation period on that processor.
Per the foregoing, the ITC source prioritization, program instance execution prioritization and ITC flow control techniques provide effective program execution optimization capabilities for each of a set of individual programs configured to dynamically share a given data processing system 1 per this description, without any of the programs impacting, or being impacted by, the other programs of such set in any manner. Moreover, for the ITC capabilities, also the individual instances (e.g., different user sessions) of a given program are fully independent from each other. The herein described techniques and architecture thus provide effective performance and runtime isolation between individual programs among groups of programs running on the dynamically shared parallel computing hardware.
From here, we continue by exploring the internal structure and operation of a given processing stage 300 beyond its RX logic per
Per
The monitoring of the buffered input data availability 261 at the destination app-instance FIFOs 260 of the processing stage RX logic enables optimizing the allocation of processing core capacity of the local manycore processor among the application tasks hosted on the given processing stage. Since the controller module 540 of the local manycore processor determines which instances of the locally hosted tasks of the apps in the system 1 execute at which of the cores of the local manycore array 515, the controller is able to provide the dynamic control 560 for the muxes 450 per
Internal elements and operation of the application load adaptive manycore processor system 500 are illustrated in
As illustrated in
Any of the cores 520 of a processor per
The hardware logic-based controller 540 module within the processor system, through a periodic process, allocates and assigns the cores 520 of the processor among the set of applications and their instances based on the applications' core demand figures (CDFs) 530 as well as their contractual core capacity entitlements (CEs). This application instance to core assignment process is exercised periodically, e.g., at intervals such as once per a defined number (for instance 64, 256 or 1024, or so forth) of processing core clock or instruction cycles. The app-instance to core assignment algorithms of the controller produce, per the app-instances on the processor, identification 550 of their execution cores (if any, at any given time), as well as per the cores of the fabric, identification 560 of their respective app-instances to execute. Moreover, the assignments 550, 560 between app-insts and the cores of the array 515 control the access between the cores 520 of the fabric and the app-inst specific memories at the fabric network and memory subsystem 800 (which can be implemented e.g., per the section below titled Memory Access Subsystem).
The app-instance to core mapping info 560 also directs the muxing 450 of input data from the RX buffers 260 of an appropriate app-instance to each core of the array 515, as well as the muxing 580 of the input data read control signals (570 to 590, and 575 to 595) from the core array to the RX logic submodule (
Similarly, the core to app-inst mapping info 560 also directs the muxing 600 of the (source) app-instance specific ITC permit signals (212 to 213) from the destination processing stages to the cores 520 of the local manycore array, according to which app-instance is presently mapped to which core.
Controller
Control Process
The app-instance to core mapping process implemented by the controller 540 of the manycore processor (of any given processing stage in the given multi-stage system) is used for maximizing the (value-add of the) application program processing throughput of the manycore fabric 510 shared among a number of software programs. This process, periodically selecting and mapping the to-be-executing instances of the set of app-tasks to the array of processing cores of the local processor, involves the following steps:
The periodically produced and updated outputs of the controller process will be used for periodically reconfiguring connectivity through the processor input data and read control multiplexers as well as the manycore fabric memory access subsystem.
Hardware Automation of Dynamic Resource Management
To enable rapidly re-optimizing the allocation and assignment of the system processing core capacity among the instances and tasks of the applications sharing the processing system per
Algorithm for Allocating the Cores Among the Applications
Objectives for the core allocation algorithm include maximizing the processor core utilization (i.e., generally minimizing, and so long as there are ready app-insts, eliminating, core idling), while ensuring that each application gets at least up to its entitled (e.g., a contract-based minimum) share of the processor core capacity whenever it has processing load to utilize such amount of cores. Each application configured for a given manycore processor is specified its entitled quota of the cores, at least up to which number of cores it is to be allocated whenever it is able to execute on such number of cores in parallel. Naturally, the sum of the applications' core entitlements (CEs) is not to exceed the total number of core slots in the given processor. Each application program on the processor gets from each run of the core allocation algorithm:
This algorithm allocating the cores to application programs runs as follows:
Moreover, the iterations of steps (ii) and (iii) per above are started from a revolving application program ID # within the set, e.g., so that the application ID # to be served first by these iterations is incremented by one (returning to 0 after reaching the highest application ID #) for each successive run of the algorithm. Furthermore, the revolving start app ID #s for the steps (ii) and (iii) are kept at an offset from each other equal to the number of apps sharing the processor divided by two.
Accordingly, all cores of the array are allocated on each run of the above algorithm according to the applications' processing load variations while honoring their contractual entitlements. I.e., the algorithm allocates the array of cores so as to minimize the greatest amount of unmet demand for cores (i.e., the greatest difference between the CDF and the allocated number of cores for any given application) among the set of programs, while ensuring that any given program gets its CDF met at least up to its CE on each successive run of the algorithm.
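For concreteness, the following behavioral Verilog sketch outlines one allocation pass consistent with the above: phase (i) meets each application's CDF up to its CE, phase (ii) serves demand beyond entitlements starting from the revolving app ID #, and phase (iii) hands out any still-unallocated cores so that no core idles. The loop-based form, the 8-bit figure widths and all names are illustration assumptions; the actual controller is stated to be hardware logic running in a few clock cycles.

    // Core allocation among applications (illustrative behavioral sketch).
    module core_alloc #(
      parameter N_APP  = 4,
      parameter N_CORE = 16
    )(
      input  wire [N_APP*8-1:0] cdf,       // core demand figures, 8 bits per app
      input  wire [N_APP*8-1:0] ce,        // contractual core entitlements
      input  wire [7:0]         start_app, // revolving start app ID # for phase (ii)
      output reg  [N_APP*8-1:0] alloc      // cores allocated per app on this run
    );
      integer a, i, free;
      reg [7:0] d, e, got;
      always @* begin
        alloc = 0;
        free  = N_CORE;
        // Phase (i): each app gets min(CDF, CE), while cores remain.
        for (a = 0; a < N_APP; a = a + 1) begin
          d   = cdf[a*8 +: 8];
          e   = ce[a*8 +: 8];
          got = (d < e) ? d : e;
          if (got > free) got = free;
          alloc[a*8 +: 8] = got;
          free = free - got;
        end
        // Phase (ii): serve CDF beyond CE one core at a time, starting from
        // the revolving app ID # so no app is systematically favored.
        for (i = 0; i < N_APP*N_CORE; i = i + 1) begin
          a = (start_app + i) % N_APP;
          if (free > 0 && cdf[a*8 +: 8] > alloc[a*8 +: 8]) begin
            alloc[a*8 +: 8] = alloc[a*8 +: 8] + 1;
            free = free - 1;
          end
        end
        // Phase (iii): hand out any remaining cores so the whole array is
        // always allocated, per the objective of eliminating core idling.
        for (i = 0; i < N_CORE; i = i + 1) begin
          a = (start_app + i) % N_APP;
          if (free > 0) begin
            alloc[a*8 +: 8] = alloc[a*8 +: 8] + 1;
            free = free - 1;
          end
        end
      end
    endmodule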
Algorithm for Assigning App-Insts for the Cores
Following the allocation of the array of cores among the applications, for each application on the processor that was allocated one or more cores by the latest run of the core allocation algorithm, the individual ready-to-execute app-insts are selected and mapped to the number of cores allocated to the given application at module 1010 of
The app-instance to core assignment algorithm for each given application begins by keeping any continuing app-insts, i.e., app-insts selected to run on the core array both on the present and the next core allocation period, mapped to their current cores. After that rule is met, any newly selected app-insts for the given application are mapped to available cores. Specifically, assuming that a given application was allocated k (a positive integer) cores beyond those used by its continuing app-insts, the k highest priority ready but not-yet-mapped app-insts of the application are mapped to the k next available (i.e., not-yet-assigned) cores within the array. In case any given application has fewer than k ready but not-yet-mapped app-insts, the highest priority other (e.g., waiting) app-insts are mapped to the remaining available cores among the number of cores allocated to the given application; these other app-insts can thus directly begin executing on their assigned cores once they become ready.
Note further that, when the app-instance to core mapping module 1010 of the controller gets an updated list of selected app-insts for the applications (following a change in either or both of the core to application allocations or the app-instance priority lists of one or more applications), it will be able to identify from them the following: I. The set of activating, to-be-mapped, app-insts, i.e., app-insts within the lists not mapped to any core by the previous run of the placement algorithm. This set I is produced by taking those app-insts from the updated selected app-instance lists whose ‘present assignment core’ in the latest app-instance assignment table was indicated as presently not mapped; II. The set of deactivating app-insts, i.e., app-insts that were included in the previous, but not in the latest, selected app-instance lists. This set II is produced by taking those app-insts from the latest assignment table whose core ID # indicated the app-instance as presently mapped, but that were not included in the updated selected app-instance lists; and III. The set of available cores, i.e., cores which in the latest assignment table were assigned to the set of deactivating app-insts (set II above).
The app-instance to core assignment algorithm uses the above info to map the active app-insts to cores of the array in a manner that keeps the continuing app-insts executing on their present cores, to maximize the utilization of the core array for processing the user applications. Specifically, the placement algorithm maps the individual app-insts within the set I of activating app-insts in their increasing app-instance ID # order for processing at core instances within the set III of available cores in their increasing core ID # order.
Moreover, regarding placement of activating app-insts (set I as discussed above), the assignment algorithm seeks to minimize the number of core slots for which the activating app-instance demands a different execution core type than the deactivating app-instance did. I.e., the app-instance to core assignment algorithm will, to the extent possible, place activating app-insts into such core slots (within the core array of the local processor) where the deactivating app-instance had the same execution core type. E.g., an activating app-instance demanding a DSP type execution core will be placed into a core slot where the deactivating app-inst also ran on a DSP type core. This sub-step in placing the activating app-insts to their target core slots uses as one of its inputs the new and preceding versions of the core slot ID indexed active app-instance ID and core type arrays, to allow matching the activating app-insts and the available core slots according to the core type, in order to minimize the need for core slot reconfigurations. For details on the core slot dynamic reconfiguration, please see the section below titled Task-type Adaptive Core Reconfiguration.
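Purely as an illustration of the core-type-aware placement of a single activating app-inst (set I) into the available core slots (set III), the following Verilog sketch first seeks an available slot whose deactivating occupant used the demanded core type, and only failing that takes the first available slot in increasing core ID # order. All names, widths and the one-instance-at-a-time framing are assumptions of this sketch; a surrounding process would iterate it over the activating app-insts in increasing ID # order, clearing each chosen slot's avail bit.

    // Core-type-matched placement of one activating app-inst (illustrative sketch).
    module inst_to_core_place #(
      parameter N_CORE = 8,
      parameter TW     = 2  // core type code width (e.g., CPU/DSP/GPU/ASP)
    )(
      input  wire [N_CORE-1:0]    avail,      // set III: slots freed by deactivating insts
      input  wire [N_CORE*TW-1:0] slot_type,  // core type each slot last ran
      input  wire                 act_valid,  // an activating app-inst awaits placement
      input  wire [TW-1:0]        act_type,   // execution core type the app-inst demands
      output reg                      placed,
      output reg [$clog2(N_CORE)-1:0] core_id // chosen slot for the app-inst
    );
      integer c;
      reg done;
      always @* begin
        placed  = 1'b0;
        core_id = 0;
        done    = 1'b0;
        if (act_valid) begin
          // Pass 1: a matching-type slot avoids a core slot reconfiguration.
          for (c = 0; c < N_CORE; c = c + 1)
            if (!done && avail[c] && slot_type[c*TW +: TW] == act_type) begin
              done = 1'b1; placed = 1'b1; core_id = c;
            end
          // Pass 2: otherwise, the first available slot in core ID # order.
          for (c = 0; c < N_CORE; c = c + 1)
            if (!done && avail[c]) begin
              done = 1'b1; placed = 1'b1; core_id = c;
            end
        end
      end
    endmodule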
Summary of Process Flow and Information Formats for the App-Instance to Core Mapping Process
The production of updated mappings and control signals between the selected-for-execution app-instances and the processing core slots of the manycore array and the controller (of a given processing stage) from the core demand figures (CDFs) and app-instance priority lists of the apps (sharing the given manycore processor), as functionally detailed above, proceeds through the following stages and (intermediate) results.
The logic at the core allocation module 1010 of the controller 540 periodically samples the applications' CDF bits and, based on such samples, forms an application ID-indexed table (per Tbl. 2 below) as a ‘snapshot’ of the application CDFs to serve as an input for the next exercising of the core allocation algorithm (which is the first phase of the app-instance to core slot mapping process of the controller). An example of such format of the information is provided in Tbl. 2 below; note, however, that in the hardware logic implementation, the application ID index, e.g., for range A through P, is represented by a digital number, e.g., in range 0 through 15, and as such, the application ID # serves as the index for the CDF entries of this array, eliminating the need to actually store any representation of the application ID for the app-ID indexed look-up Tbl. 2:
Regarding Tbl. 2 above, note that the values of entries shown naturally are simply examples of possible values of some of the application CDFs, and that the CDF values of the applications can change arbitrarily for each new run of the controller process.
Based on the app ID # indexed CDF array per Tbl. 2 above (and on the CEs of the apps), the core allocation algorithm produces another similarly formatted app ID indexed table, whose entries at this stage are the number of cores allocated to each application, as shown in Tbl. 3 below:
Regarding Tbl. 3 above, note again that the values of entries shown are simply examples of possible numbers of cores allocated to some of the applications after a given run of the core allocation algorithm, as well as that in hardware logic this look-up-table is simply the numbers of cores allocated per application, since the application ID # for any given entry of this array is knowable from the index # of the given entry in the array.
The app-instance selection sub-process, done individually for each app, uses as its inputs the per-application core allocations per Tbl. 3 above, as well as priority ordered lists of ready app-instance IDs of each given app. Each such app specific list has the (descending) app-instance priority level as its index, and, as values stored at each such indexed element, the intra-application scope instance ID #, plus where applicable, an indication of the target core type (e.g., CPU, DSP, GPU or a specified ASP) demanded by the app-inst, per the example of Tbl. 4 below:
Notes regarding implicit indexing and non-specific examples used for values per Tbls. 2 and 3 apply also for Tbl. 4.
The input data receive (RX) logic writes per each app the intra-app instance priority list per Tbl. 4 to the controller 540, to be used as an input for the active app-instance selection sub-process, which produces per-app listings of selected app-instances, along with their corresponding target core types where applicable. Based at least on the app specific lists of selected app-instances, the core to app-instance assignment algorithm produces an array indexed with the app and instance IDs, providing as its contents the assigned processing core ID (if any) for the app-instance with ID equal to the index of the given entry, per Tbl. 5 below:
Finally, by inverting the roles of index and contents from Tbl. 5, an array expressing to which app-instance ID # each given core of the manycore array got assigned, per Tbl. 6 below, is formed. Specifically, the Tbl. 6 format can be formed by using as its index the contents of Tbl. 5, i.e., the core ID numbers (other than those marked ‘Y’), and as its contents the app-instance ID index from Tbl. 5 corresponding to each core ID #, along with, where applicable, the core type demanded by the given app-inst, with the core type for any given selected app-instance being denoted as part of the information flow produced from a data array per Tbl. 4. The format for the app-instance to core mapping info, along with the demanded core slot type info (noting that in the logic implementation, the app ID # bits are used as such to determine the demanded core type), is illustrated in the example below:
Regarding Tbls. 5 and 6 above, note that the symbolic application IDs (A through P) used here for clarity will in the digital logic implementation map into numeric representations, e.g., in the range from 0 through 15. Also, the notes per Tbls. 2-4 above regarding the implicit indexing (e.g., the core ID for any given app-instance ID entry is given by the index of the given entry, eliminating the need to store the core IDs in this array) apply for the logic implementation of Tbls. 5 and 6 as well.
By comparing Tbls. 5 and 6 above, it is seen that the information contents of Tbl. 5 are the same as those of Tbl. 6; the difference in purposes between them is that while Tbl. 6 gives for any core slot its active app-instance ID # to process, along with the demanded core type, Tbl. 5 gives for any given app-instance its processing core slot (if any at a given time).
Note further that when the app-instance to core placement module gets an updated list of selected app-instances for one or more applications (following a change in either or both of the core to application allocations or the app-instance priority lists of one or more applications), it will be able to identify from Tbls. 5 and 6 the sets I, II and III discussed in the section above titled Algorithm for Assigning App-Insts for the Cores.
Finally, note that the primary purpose of the description of the specific info formats and the associated processing in this subchapter is to give a concrete example of the operation of the controller algorithms. The actual hardware logic implementation differs somewhat from these (illustration purposes info formats) in order to achieve higher efficiency of the logic implementation.
Task-Switching
The capabilities per
To direct write and read control access from the array of cores 515 to the array of app-instance specific memories 1110, the controller 540 identifies, for app-instance specific muxes (
Based on the control by the controller 540 for a given core indicating that it will be subject to an app-instance switchover, the currently executing app-instance is made to stop executing and its processing state from the core is backed up to the segment of that exiting app-instance at the memory array, while the processing state of the next instance assigned to execute on the given core is retrieved to the core from the memory array. Note that ‘processing state’ herein refers to processing status data, if any, stored at the core, such as the current executing app-instance-specific processor register file contents. During these app-instance-switching proceedings the operation of the cores subject to instance switchover is controlled through the controller and switchover logic at the cores, with said switchover logic backing up and retrieving the outgoing and incoming app-instance processing states from the memories. Cores not indicated by controller as being subject to instance switchover continue their processing uninterruptedly through the core allocation period transitions.
Note that applying updated app-instance ID # configurations for the core specific muxes of the XC (
Memory Access Subsystem
Architecture
Each processing stage of the herein described multi-stage manycore processing system includes a memory access subsystem per
A key benefit of the herein described fast-access memory content optimization and associated task instance scheduling optimizations, as is the case with the rest of the system runtime functionality per this description, is that no user or system software running on the processors utilizing these inventive techniques needs to get involved with, or even be aware of, these hardware automated routines handling the dynamic optimization of the execution environment for the user programs. This system architecture thus enables scaling the application program capacities and processing performance beyond the limits of conventional systems, where the increase in system software overhead would place a limit on scalability.
Regarding the text in
Since the different application instances are isolated from each other in accessing their memories, the operation of the memory access system per
Updating On-Chip RAMs
The processing stage controller periodically assigns an instance of one of the app-tasks hosted at the local processor for execution on one of the cores within its manycore fabric. The given task instance executing on its assigned core accesses its program instructions and processing data from its dedicated fast-access, e.g., on-chip, random access memory 1410. The task, with the hardware resources per
When a task thus needs access to instructions or data that has to be fetched from the slow-access RAM 1420, the task writes to specified hardware device registers (at defined addresses within the task's memory space) information that is used by the associated hardware logic, referred to as the “RAM broker” in
A specification for the task-instance specific hardware device registers (in the RAM broker logic module) writeable and readable by software executing on a core assigned at that time for the given application task instance, controlling the memory transfer operations performed by the RAM broker, is provided in Tbl. 7 below:
Regarding the example fast/slow-access memory content transfer control and status device registers in Tbl. 7, note that in various scenarios, multiple types of variations of the information formats are possible. For instance, it is possible that the software configuring the commands for copying contents between certain blocks at fast and slow access memories, instead of specifying the actual hardware memory address ranges, uses e.g., enumerated references to memory blocks to be copied, with the hardware providing a look-up table storing the physical memory address ranges corresponding to any given target memory block referred to by the software via such shorthand notations. That way, the software requesting a slow/fast access memory transfer by configuring this device register does not need to know the actual source or destination physical memory addresses of the content block to be transferred.
As illustrated in Tbl. 7, in addition to specifying the memory ranges to be copied between fast- and slow-access RAMs, the app instance software also sets a device register bit indicating, when applicable, that the app instance is waiting for requested memory content transfers to be completed before it can resume executing. While it has that way signaled (through an associated bit to the controller) that it is waiting for updating of its fast-access memory, the app instance software can however back up its state (e.g., processor register file contents) from its present execution core to its dedicated RAM. The RAM broker module resets this memory contents transfer status bit once the specified transfers are completed, and this status bit is provided as a status back to the app instance software (readable by the task from the same device register location to where the task set that bit). This memory content transfer completion status is also provided from the RAM broker to the controller, so that the controller knows which program task instances at any given time are waiting for updating of their fast-access RAM contents before such task instances are able to resume their execution.
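Since Tbl. 7 gives only an example register layout, the following Verilog sketch merely illustrates the kind of device-register interface the text describes; every register name, address and width here is hypothetical, and the software read-back path for the status bit is omitted for brevity.

    // RAM broker control/status device registers (hypothetical sketch).
    module ram_broker_regs (
      input  wire        clk,
      input  wire        rst,
      // Device-register write port from the core presently assigned to the
      // task instance (addresses lie within the task's memory space).
      input  wire        wr_en,
      input  wire [3:0]  wr_addr,
      input  wire [31:0] wr_data,
      // Completion strobe from the broker's fast/slow RAM transfer engine.
      input  wire        xfer_done,
      // Fields consumed by the transfer engine and the stage controller.
      output reg  [31:0] src_addr, // start of the block to copy from
      output reg  [31:0] dst_addr, // start of the destination block
      output reg  [15:0] xfer_len, // transfer length in words
      output reg         waiting   // set by software; also visible to controller
    );
      always @(posedge clk) begin
        if (rst) begin
          waiting <= 1'b0;
        end else begin
          if (wr_en)
            case (wr_addr)
              4'h0: src_addr <= wr_data;
              4'h1: dst_addr <= wr_data;
              4'h2: xfer_len <= wr_data[15:0];
              4'h3: waiting  <= wr_data[0]; // software: “waiting for transfer”
            endcase
          if (xfer_done)
            waiting <= 1'b0; // the RAM broker resets the status bit on completion
        end
      end
    endmodule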
Forming and Usage of App Core Demand Figures and Instance Priority Lists Based on App-Instance Fast-Access RAM Status
The processing stage controller uses (among any other relevant info, incl. the input data availability as described in previous chapters), these fast-access memory contents ready/not ready status bits from the application task instances hosted on its local processor in deciding which task instances to select for execution on the cores of the local processor at any given time. To minimize core idling, the controller task selection algorithm gives greater selection priority, at least among instances which otherwise would have equal selection priority, to such task instances whose status indicates that the task is not waiting for a completion of fast/slow-access memory content transfers before it can continue its execution.
The controller process uses the fast-access memory ready status indications of the application task instances sharing the array of processing cores as follows:
Specifically, the execution priority order of the instances of the given application is determined according to their decreasing order of prio_index signals per the below Verilog code (TOP_STAGE_INDEX equals the count of worker stages in the processing system):
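The referenced listing is not reproduced here; the fragment below is a speculative reconstruction of its flavor only, under the assumptions that a larger prio_index means earlier execution, that the fast-access-memory-ready status forms the most significant field, and that input data availability (the H number and the total count of non-empty source stage buffers) forms the lower fields. Apart from TOP_STAGE_INDEX, all names are illustrative.

    // Speculative sketch only; not the specification's actual listing.
    localparam CNT_W = $clog2(TOP_STAGE_INDEX + 1); // worker stage count width

    wire             mem_ready;      // fast-access RAM contents ready (not waiting)
    wire [CNT_W-1:0] h_count;        // H number: non-empty high-priority source buffers
    wire [CNT_W-1:0] nonempty_count; // all non-empty source stage buffers

    // Memory readiness dominates, then the H number, then total availability.
    wire [2*CNT_W:0] prio_index = {mem_ready, h_count, nonempty_count};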
Accordingly, the intra-app instance execution order prioritization is done per the below steps:
Note that, per
The app-task-instance to core assignment algorithms that the controller periodically performs result in the controller providing dynamic configuration for the interface logic between the cores of its local processor and its app-task-instance specific RAMs as well as device registers so that each given core has read and write access to the RAM and the device registers of the app-task-instance assigned presently for execution on the given core, and so that external input and inter-task communication data gets connected dynamically to that core where any given app-task-instance may be executing at any given time.
The control outputs from the controller also include indications for presently executing task instances that were not selected for execution on the next CAP to back up their processing context from their present execution cores to their memories before the assigned tasks are switched for these cores. Note that this minimal interaction between the software and the processor hardware fabric can also be replaced by hardware routines, as follows: when an app-task-instance software is signaled by the controller to exit its present execution core, the software running on it configures a given device register at its present core to launch a hardware routine to automatically copy the state variables (e.g., processor core register file contents) from the core to a specified address range of the RAM associated with the app-task-instance signaled to exit. Moreover, as the exiting app-task-instance's processing state thus gets transferred to the exiting app-task-instance's RAM, another hardware routine copies the previously backed up processing state for the next app-task-instance assigned for the given core (e.g., to the core's register file) from the RAM of such incoming app-task-instance.
Access to Off-Chip RAMs
In addition to a dedicated fast-access RAM on the processor for each of its locally hosted application task instances, there is a dedicated slow-access RAM for each application program hosted on the given processor. Such an application-program specific RAM has memory segments within it for each of the application's task instances dynamically executing on that processor. Note that the input and output (IO) pin capacity limits of the processor chip may not allow providing separate slow-access RAMs for each application task instance hosted on the given processor, which is why the task instances of any given application may have to share the same application-specific slow-access RAM. The RAM broker logic in such implementation scenarios is likewise specific to a given application program, and for each given application, its specific RAM broker arbitrates the write and read accesses to the slow-access RAM of the application requested by its local task instances.
In such implementations, the RAM broker, in arbitrating access among the memory content transfer requests of the instances of its associated application to the slow-access RAM of that application, uses a request priority index formed from the following criteria in selecting the next memory transfer request to be performed: (1) the time that a given requested memory content transfer has been waiting to be served, with longer elapsed waiting times increasing the priority of the request, (2) the execution priority of the requesting instance, considered without regard to whether any given instance is waiting for completion of its memory content transfer requests, and (3) the indicated length of the requested memory content transfer, with longer transfers getting reduced priority. With such prioritization and (dynamic) scheduling of the memory transfer requests, the RAM broker can determine the presently expected total time until the completion of any given fast/slow-access RAM content transfer requested by an app instance, and accordingly advertise this as the memory content transfer status (as a number of clock cycles until completion of updating the fast-access RAM for a given app instance) to the controller of the local processing stage, for the controller to take as an input in prioritizing and selecting the task instances for execution on the cores under its control for the successive CAPs.
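As one possible expression of this arbitration rule, the following sketch, with hypothetical signal names and an assumed relative weighting of the three criteria (waiting time most significant, transfer length least), selects among up to four pending requests:

module ram_broker_arbiter (
  input  wire [3:0]  req,        // pending slow-access RAM transfer request per instance
  input  wire [31:0] wait_time,  // 8-bit elapsed waiting time per instance (saturating)
  input  wire [15:0] exec_prio,  // 4-bit execution priority per instance
  input  wire [15:0] xfer_len,   // 4-bit coarse length of the requested transfer
  output reg  [1:0]  grant       // instance whose request is served next
);
  integer    i;
  reg [15:0] best, cand;
  always @* begin
    grant = 2'd0;
    best  = 16'd0;
    for (i = 0; i < 4; i = i + 1) begin
      // Assumed weighting: longest-waiting first, then highest execution
      // priority, then shortest transfer (length inverted so that longer
      // transfers get reduced priority, per criterion (3) above).
      cand = {wait_time[8*i +: 8], exec_prio[4*i +: 4], ~xfer_len[4*i +: 4]};
      if (req[i] && (cand >= best)) begin
        best  = cand;
        grant = i[1:0];
      end
    end
  end
endmodule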
Access to Non-Volatile Memory
Furthermore, besides the slow-access RAM, there is also a non-volatile memory for storing the application programs hosted on a given processor that utilizes the invented memory management techniques. Note that in certain implementation scenarios, what is in
While there is a dedicated fast-access RAM for each supported instance of each application task hosted on a given processor, along with a dedicated slow-access RAM for each application hosted on that processor, there is a common non-volatile memory for storing the program code and any backup data for all the applications dynamically sharing the given manycore processor. This practice reduces the IO pin count for the processor chip while still providing sufficient memory access performance, since accesses by the applications to their non-volatile memory (the slowest-access of the three memories discussed) will in practice be relatively infrequent, and can in certain cases be limited mainly to application start-up periods. In cases of such common non-volatile memory being shared among all the applications running on a given processor, the application-specific RAM brokers interact through a per-processor-chip common arbitrator hardware logic module, which provides for each application its fair share of interface bandwidth to the common backup memory and enforces write and read access rules between the different applications, e.g., by keeping any given application-specific segments of the memory non-writeable and/or non-readable by the other applications, as well as potentially non-writeable also by the application to which such memory segments belong. In a particular implementation scenario, the arbitrator connecting the processor to the common backup memory interface simply time-divides the access to the memory among the applications on the processor, either evenly or according to the applications' contractual entitlements to such interface capacity. In an alternative implementation, the arbitrator for accessing the memory allows any given application to get as much of its demand for the interface bandwidth (e.g., time share over a specified monitoring period) as is possible without violating any other application's actually-materialized demand for access to its fair or contractually entitled share of the interface capacity. In alternative implementation scenarios still, there is a dedicated non-volatile memory for each application hosted on a given processor, with no need for arbitration among the applications for access to these interfaces.
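For the time-division variant of the arbitrator, a minimal sketch (hypothetical module and signal names, with an illustrative slot table) could cycle through a table of interface time slots whose entries reflect the applications' contractual shares:

module backup_mem_tdm (
  input  wire       clk,
  input  wire       rst,
  output reg  [2:0] app_sel    // application granted the backup memory interface this slot
);
  // Eight-entry slot table granting each application its contractual share
  // of the interface bandwidth; the entries are illustrative only (app 0
  // entitled to 3/8 of the slots, apps 1 and 2 to 2/8 each, app 3 to 1/8).
  reg [2:0] slot_tbl [0:7];
  reg [2:0] idx;
  initial begin
    slot_tbl[0] = 3'd0; slot_tbl[1] = 3'd1; slot_tbl[2] = 3'd0; slot_tbl[3] = 3'd2;
    slot_tbl[4] = 3'd0; slot_tbl[5] = 3'd1; slot_tbl[6] = 3'd3; slot_tbl[7] = 3'd2;
  end
  always @(posedge clk) begin
    if (rst) begin
      idx     <= 3'd0;
      app_sel <= 3'd0;
    end else begin
      idx     <= idx + 3'd1;   // wraps modulo 8
      app_sel <= slot_tbl[idx];
    end
  end
endmodule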
Inter-App Isolation
Together, the memory architecture and resource access systems and methods per above keep the individual applications among a set of applications dynamically sharing a given manycore processor effectively, as desired, isolated from each other. Such isolation means, e.g., that the resource access levels available for any given application among such a set will not be negatively impacted by the behavior of any other application among that set, at least compared to a case of static resource allocation among the applications and possibly their tasks and instances. Moreover, the hardware-based dynamic resource management techniques per these disclosures do not enable any undesired or unauthorized interaction between the applications sharing the manycore processor systems according to these specifications. At the same time, the applications running on the processors using the described inventive techniques benefit from the cost-efficiencies created by the secure, deterministic, yet dynamically optimized sharing of the processing resources.
Task-Type Adaptive Core Reconfiguration
Background
Note: This chapter applies to programmable logic (FPGA) implementations of the manycore array (of a processing stage as otherwise described herein).
The following publications provide 3rd party (FPGA vendor created) material for the description in this chapter:
The reference [X1] provides user documentation for reconfiguring portions of programmable logic chips. The references [X2], [X3], [X4], [X5] and [X6] discuss implementation techniques for, under the control of user logic, reconfiguring portions (slots) in programmable logic chips, such as the core slots of the herein described manycore array, with identified alternative hardware logic functions, such as the differing processing core types discussed, e.g., Application Specific Processors (ASPs). The reference [A1] discusses techniques for translating functions of software programs to custom hardware logic implementations, e.g., ASPs.
More specifically, concerning reconfiguring the logic of parts of programmable logic devices or field programmable gate array microchips (FPGAs), [X2] discusses techniques for how the FPGA logic can control reconfiguring sub-areas of the FPGA, while [X3] details an implementation of an FPGA logic design to control an “Internal Configuration Access Port” (ICAP) of a Xilinx FPGA to reconfigure a particular area of the FPGA with an identified logic configuration bitstream; see in particular pp. 46-47 of the source journal of [X3] referring to the FIGS. 2 and 3 of the article, under its captions “Reconfiguration Process” and “Inside ICAP”. [X4] describes interacting with said ICAP (specifically, ICAPE2 in Xilinx Series 7 FPGAs) by user designed logic, including specifying a configuration bitstream (by its start address in a non-volatile memory storing multiple alternative full and/or partial configuration bitstreams) to be used for a (partial) reconfiguration of the FPGA; see, in particular, subsections ‘IPROG’ and ‘WBSTAR’ on pp. 122-123, and “IPROG Reconfiguration” and “IPROG Using ICAPE2” on pp. 124-125. [X5] provides documentation for creating partial reconfiguration logic programming bit files, while [X6] describes techniques for partial reconfiguration of the logic etc. in a defined sub-area of an FPGA chip, while keeping the functions of the chip that are not subject to any given partial reconfiguration process unimpacted during such partial reconfigurations. [A1] discusses an OpenCL compiler for translating software (C-language) program functions to hardware that implements each operation of such functions.
Note, however, that these 3rd-party technologies do not enable adapting the types of processing resources in a given resource pool according to the processing load and type demand variations presented by a group of applications configured to dynamically share the given pool of processing resources. The technology as herein described enables accomplishing that goal.
General
The processes of adapting the execution core slots of the manycore arrays to match the types of the app tasks assigned for execution on any given core slot are operationally independent of each other, and thus the description of such a process in the following is focused on the reconfiguration of just one (arbitrary) core slot within the manycore array of any of the processing stages of the given multi-stage manycore processing system (as otherwise described in this description). Moreover, since there is just one task type per any given application located at any given processing stage, any and all instances of any given application present the same task for processing on the core slot under study. Thus, for the purposes of the descriptions in this chapter, all instances of the given app assigned for the given core slot under study are identical, and moreover, so are all instances of those applications whose tasks hosted at the given processing stage under study demand the same core type.
Logic Implementation
In the context of
Per
In the specific logic system illustrated in
Please see the reference [X4], especially pp. 124-125, for details of a particular implementation possibility; in such an implementation scenario, the value of the Warm Boot Start Address (WBSTAR) register can be used to identify the logic configuration file for the partial reconfiguration demanded to reprogram the hardware logic of a given target core slot to the demanded core type, matching the processing application assigned for such target core slot, and the issuing of the IPROG command can be used to launch the demanded reconfiguration with the identified partial reconfiguration file. Note that in these implementation scenarios, the individual partial logic reconfiguration files also identify their target core slot; in such scenarios, for each core type, an individual file is needed per each possible target core slot among the array. The RAP further provides to the RAPIF the status of the demanded core slot logic reprogramming, including its completion. Based on the timing of the control and status of the configuration access port, the RAPIF provides any applicable control, such as reset, for the core slot instance subject to the reconfiguration. Such control provided during the reconfiguration of a given core slot prevents unintended interactions between that core slot and the rest of the system, by keeping the inputs to and outputs from the core slot under reconfiguration (other than the inputs and any outputs used for reconfiguration) at their passive values. The reference [X3] provides a specification for a possible implementation of such control and status signals.
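For concreteness, the following sketch lists the ICAPE2 command word sequence per [X4] for launching an IPROG-initiated reconfiguration from the bitstream identified via WBSTAR; the module and signal names are hypothetical, and details such as the per-byte bit ordering required on the ICAPE2 data input are per the cited user guide:

module iprog_seq #(
  parameter [31:0] WBSTAR_ADDR = 32'h0 // start address of the target (partial) bitstream
) (
  input  wire [2:0]  step,  // sequencer step, one word per ICAP clock
  output reg  [31:0] word   // to be bit-swapped per [X4] before driving the ICAPE2 data input
);
  always @* begin
    case (step)
      3'd0: word = 32'hFFFFFFFF;    // dummy word
      3'd1: word = 32'hAA995566;    // sync word
      3'd2: word = 32'h20000000;    // type-1 NOOP
      3'd3: word = 32'h30020001;    // type-1 write, 1 word, to WBSTAR
      3'd4: word = WBSTAR_ADDR;     // warm boot start address (bitstream location)
      3'd5: word = 32'h30008001;    // type-1 write, 1 word, to CMD
      3'd6: word = 32'h0000000F;    // IPROG command: restart configuration from WBSTAR
      default: word = 32'h20000000; // type-1 NOOP
    endcase
  end
endmodule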
Furthermore, besides the identification of a core type for a given core slot, the signals from the processing stage controller include an identification of changes in the demanded core type for the given core slot. This information about change events in the core type demanded for a given core slot is used by the RAPIF (
Note that the techniques per above, along with those per, e.g., [A1] for synthesizing segments of software programs into custom hardware logic designs, referred to here as application specific processors (ASPs), enable creating logic configuration files that configure the programmable logic of their target core slot into a hardware logic implementation which performs the information processing function directly according to its source software program (segment) without a need for any executable program instructions. That is, such ASPs, for which the techniques described herein enable configuring processing cores as demanded, are able to produce the intended processing results of their associated software programs or tasks thereof without any software overhead (including without fetching, caching, scheduling, pipelining or serially processing any instructions), by processing the appropriate input data directly in the custom hardware logic to produce the requested results, e.g., output data. For instance, an ASP can process, in parallel custom hardware logic gates, all of the functionality of its source software program that does not need to be processed sequentially. Such ASPs, compared to conventional processor cores that rely on sequences of program instructions for controlling their operation, can thus significantly speed up a given information processing function as well as improve the energy etc. resource efficiency of the processing, in particular when used in combination with the other application load and type adaptive processing techniques per this description, including its incorporated references.
Billing Sub-System
Objectives
The presented billing techniques are designed for maximizing the value-add of the application processing throughput of a multi-user-application parallel computing platform across a set of users of the service provided with the platform. These billing techniques, for any given user contract among the contracts supported by the platform, and on any given billing assessment period, determine a level of a demand for the capacity of the platform associated with the given contract that is met by a level of access to the capacity of the platform allocated to the given contract, and assess billables for the given contract based on (1) such met demand and (2) a level of assured access to the capacity of the platform associated with the given contract, as well as (3) billing rates, applicable for the given billing assessment period, for (a) the met demand and (b) the level of assured access associated with the given contract.
A logic block diagram of a billing subsystem 1610 for each processing stage of the cloud processor per the foregoing is presented in
The presented cloud processor billing techniques target maximizing: i) the on-time data processing throughput per unit cost for the users of a given processing system per this description, and ii) the revenue over a period of time for the service provider operating such a system of a certain total cost. Accordingly, these techniques have the following objectives:
These objectives reflect the utility for the users running their programs on a system per this description; the users are assumed to perceive value in, and be willing to pay for, assured access to their desired level of capacity of a given compute system and their actual usage of the platform capacity. Accordingly, the above objectives 1) and 2) are among principal factors driving the revenue for the operator of the given system per this description.
Billing Formula
Per
B=x*CE+y*DBCA (Equation 1),
wherein CE stands for core entitlement for the user, DBCA stands for the amount of core allocations to that user's program to meet its CDFs for the Core Allocation Periods (CAPs, e.g., 1 microsecond each) during the contract time period in question, and x and y are billing rates per the contract that convert CE and DBCA into monetary figures.
An advantage of this billing method is that a portion (i.e., the term y*DBCA) of the cost of the utility computing service for a user running its program on a system per this description is based on the CDFs of the user's program (to the degree that the CDFs are met by core allocations). Therefore, each user of the system per this description has an economic incentive to configure its programs so that they eliminate any CDFs beyond the number of cores that the given program is actually able to utilize at the given time. If so allowed for a given user contract, the system will generate the CDFs for the user automatically based on the input data load levels for the user program instances. Whether the CDFs are generated by the user programs or by the system on their behalf, the users have the incentive not to automatically demand (or cause a demand for) at least their CE worth of cores irrespective of how many cores the given program is able to execute on in parallel at any given time. This incentive leads to increasing the average amount of surplus cores for runs of the core allocation algorithm, i.e., cores that can be allocated in a fully demand driven manner (rather than in a manner to just meet the CDFs by each application for their CE figure worth of cores). Such maximally demand driven core allocation (which nevertheless allows guaranteeing each user application an assured, contract defined minimum capacity access level whenever actually demanded) facilitates providing maximized value-adding processing throughput per normalized cost across the set of user applications dynamically sharing the system per this description.
Moreover, either or both of the billing rates x and y for Equation 1 can be specified in the user contract to vary over time. The term x*CE can take the form of a sum such as x1*CE1+x2*CE2, wherein, for example, x1 is the billing rate for a core entitlement during specified premium business hours (e.g., Monday-Friday 9 am-5 pm in the local time zone of the given platform or user) and x2 the billing rate for a core entitlement outside the premium business hours, while CE1 and CE2 are the core entitlements for the given user contract for the premium and non-premium hours, respectively. Naturally, there can be more than just two time phases with their respective billing rates. For instance, in addition to premium pricing during the business hours, the evening hours 5 pm-1 am could also have a different billing rate than 1 am-9 am, and so forth, depending on the popularity of compute capacity usage during any given hours of the day. Similarly, different days of the week, special calendar days etc. can have different billing rates, based on the expected popularity of compute capacity on such days. Naturally, this discussion applies also for the coefficient y of the term y*DBCA in Equation 1.
Per
Usage Scenarios
The compute capacity provider operating a platform based on system(s) per this description can offer different types of CE time profiles for different application types. For instance, a service provider operating the platform could sell four basic contract types with differing CE time profiles per examples of contract plans A, B, C and D in Tbl. 8 below:
As illustrated in Tbl. 8, the capability to allow configuring compute capacity contracts with differing CE time profiles, particularly contract types with non-overlapping CE peaks, on a given platform per this description can be used both for improving the computing cost-efficiency for the users of the compute service provided through the platform and for increasing the revenues that the compute capacity service provider is able to achieve with a platform of a certain cost of ownership. Either or both of the CE and DBCA billing rates can be set to different values on the different billing assessment periods (BAPs) within a day, week, month, etc., in order to optimally even out the user programs' collective processing load for a given system per this description over time, and thereby maximize the cost efficiency for the users of the computing service provided with the given platform and/or the revenue generation rate for the service provider operating the platform. For instance, in an example scenario, the CE billing rate on business days could be $0.08 per core for the BAP of the business hours, $0.04 for the BAP of the evening hours, and $0.01 for the BAP of the night hours, while the DBCA billing rate, per the average number of demand-based cores allocated to a given program over the eight hours of these daily BAPs, could be $0.04 for the business, $0.02 for the evening, and $0.01 for the night BAPs. These daily BAP billing rates can naturally be set to any other values as well, can have differing values on different calendar days, and different weekdays (e.g., Monday-Friday versus Saturday-Sunday) can have non-uniform BAP phasing (e.g., Saturday-Sunday could replace the business hour BAP of Monday-Friday with an ‘extended’ evening hour BAP), etc.
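For illustration with the above example rates (and hypothetical usage figures), consider a contract with a flat CE of 4 cores: the daily CE charge would be 4*($0.08+$0.04+$0.01) = $0.52, and if the program's met demand averaged 6 cores over the business-hour BAP, 2 cores over the evening BAP and 1 core over the night BAP, the daily DBCA charge would be 6*$0.04 + 2*$0.02 + 1*$0.01 = $0.29, for a total daily billable of $0.81 per Equation 1.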
With the example values of Tbl. 8 for a mix (or ‘basket’) of enterprise, entertainment (including news etc.), batch job (overnight block data processing), and always-on type of applications, it can be seen that the capability to configure applications for a given platform per this description with different CE time profiles enables the service provider operating the platform to support a given set of applications, with their collective CE requirements, with a significantly reduced system processing core capacity requirement, i.e., with a lower cost base for the revenues generated by the given set of user applications. With the numerical example shown in Tbl. 8, this system core utilization efficiency gain with time-profiled contract CEs compared to flat CEs enables a reduction from 30 to 16 cores needed for the provided mix of user contracts. In turn, this compute resource utilization efficiency gain through time profiled CEs reduces the cost of revenue for the utility computing service provider by an accordant factor. Put differently, the service provider's revenue per unit cost of the service provided (driven by the number of cores needed to support a given set of contracts) is multiplied accordingly.
Note that in the discussion herein regarding the example of Tbl. 8, also the flat CE reference, against which the cost-efficiency of the time-profiled CE contracts is compared, is assumed to be implemented on a system per this description that supports the application load adaptive core allocation dynamic parallel execution techniques per the preceding chapters. Since the described dynamic compute resource allocation with contract specified minimum system access level guarantees (to be met when so demanded) is not supported by conventional computing systems, the contracts supported with a platform per this description, i.e., contracts with the capability to burst up to the full system core capacity while having a contract defined minimum assured level of access to the shared system capacity, provide a higher market value than conventional contract types, which provide either only a dedicated capacity share (but without a capability to dynamically, without user or platform operator involvement, burst beyond the dedicated cores) or a capability to burst (but without a contract defined, minimum core count based access level that the user contract is guaranteed to get whenever demanded).
Moreover, regarding Tbl. 8, also note that a CE level of 0 does not imply that such a contract type would not allow the application under that contract to execute on its host system per this description during the hours in question; instead, a CE of 0 indicates that, while the application is not guaranteed to have its CDFs met up to any specified minimum core count, it will still in practice get its demand-based fair share of the cores allocated to it after the CDFs of the set of applications up to their CE levels have been met (per the algorithm for allocating the cores among the applications). In fact, at times when there are no other user applications expressing a positive CDF at a given system per this description, the application with a CE of 0 will get its CDFs met all the way up to the total core count of the array.
The 24-hour cycle for the CE time profiles per the example of Tbl. 8 is merely to illustrate the capability to facilitate efficient combining of applications with differing demand time profiles for compute capacity into a shared compute capacity pool. In various scenarios, there can be, for instance, further variants of plans within the basic contract types (e.g., plans A through D per Tbl. 8) that offer greater CE levels than the norm for the given base plan (e.g., plan A) at specified seasons or calendar dates of the year (either during the peak hours of the profile or throughout given 24-hour days) in exchange for lower CE levels than the norm for that base plan at other dates or seasons. Besides combining contracts with differing CE profiles within 24-hour cycles as illustrated in Tbl. 8 to dynamically share the same capacity pools, the system also facilitates combining the seasonally differing variants of contracts within a given plan type (i.e., variants with non-coinciding seasonal peaks in their CE profiles) in the same capacity pools for further capacity utilization efficiency gains, in addition to the 8-hour phases shown in Tbl. 8. Moreover, there can be variants of contract types within a given base plan that have finer time granularity in their CE profiles. For instance, among the contracts of type B, there can be a variant that offers greater than the standard CE level of the plan type for the night hours (e.g., 1 am-9 am) at specific timeslots (e.g., for newscasts, for 15 minutes at 6 am, 7 am and 8 am) in exchange for a lower CE at other times during the night hours. The system facilitates efficiently combining these types of variants of contracts within a given base type, with complementary peaks and valleys in their CE profiles, also within a given (8-hour) phase of the 24-hour cycle. As well, this type of combining of complementary variants of a given contract type (whether seasonally, within 24-hour cycles, etc.) can take place within the aggregate CE subpool of the contracts of the given base type. In the example shown in Tbl. 8, this type of intra-contract-type combining of complementary variants can thus take place, e.g., among the three contracts of type B, whose aggregate CE level during the night hours is, for instance, worth 3*2=6 cores for each CAP. At systems per this description with a greater number of cores, there will normally be a greater number of applications of any given type sharing the systems (and a greater subpool of CEs for each contract type) than what is shown in the simple illustrative example of Tbl. 8.
Hardware Implementation for High Resolution Billing with Minimized Overhead
The direct hardware logic implementation of the user application billing counters per
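As an illustrative sketch (hypothetical signal names, not the complete billing logic), the DBCA counter of a given application can be realized as an accumulator that adds the number of demand-based cores allocated to the application on each CAP and is read out and cleared at each billing assessment period boundary:

module dbca_counter #(
  parameter CNT_W = 48
) (
  input  wire             clk,
  input  wire             rst,
  input  wire             cap_pulse,   // one pulse per Core Allocation Period
  input  wire [7:0]       cores_alloc, // cores allocated to this app for the CAP
  input  wire             bap_end,     // end-of-Billing-Assessment-Period strobe
  output reg  [CNT_W-1:0] dbca         // DBCA accumulated over the current BAP
);
  always @(posedge clk) begin
    if (rst || bap_end)
      dbca <= {CNT_W{1'b0}};           // read out (for billing) and clear at BAP boundary
    else if (cap_pulse)
      dbca <= dbca + cores_alloc;      // accumulate demand-based core allocations
  end
endmodule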
Summary
The presented dynamic parallel cloud computing billing model enables combining the desired aspects of per-user dedicated and multi-user shared-capacity-based computing services. Each user is guaranteed access to its contract-specified level of the processing capacity whenever actually demanded. However, the contract specified capacity entitlements are neither kept locked down to their associated programs (at times when the processing load associated with a given user program does not demand its entitlement worth of processing core capacity) nor do they impose any limit on the maximum capacity available to their user programs (at times when the processing load of a given user program exceeds its entitlement worth of core capacity). In fact, the incentives that the billing model provides for the user programs to economize on their core capacity demand expressions (i.e., to demand just as much capacity as their current processing load requires, rather than at least their capacity entitlement worth of processing cores regardless of the actual processing load) lead to maximization of the portion of the system processing capacity available for realtime, application processing load variation based capacity allocation, to match the processing capacity demand peaks of the user programs (beyond their capacity entitlement levels).
Accordingly, the presented billing techniques for parallel processing system capacity utilization and application processing performance (per normalized cost) optimization described in the foregoing provide the following fundamental advantages:
The presented pricing optimization and billing techniques, in particular when combined with dynamic parallel cloud computing techniques per the preceding chapters of this execution environment system description, thus are designed for maximizing the overall utility computing cost-efficiency, particularly for workflows demanding parallel execution for on-time processing throughput performance gain.
Further reference specifications for aspects and embodiments of the invention are in the references [1] through [10].
The functionality of the invented systems and methods described in this specification, where not otherwise mentioned, is implemented by hardware logic of the system (wherein hardware logic naturally also includes any necessary signal wiring, memory elements and such).
Generally, this description and drawings are included to illustrate architecture and operation of practical embodiments of the invention, but are not meant to limit the scope of the invention. For instance, even though the description does specify certain system elements to certain practical types or values, persons of skill in the art will realize, in view of this description, that any design utilizing the architectural or operational principles of the disclosed systems and methods, with any set of practical types and values for the system parameters, is within the scope of the invention. Moreover, the system elements and process steps, though shown as distinct to clarify the illustration and the description, can in various embodiments be merged or combined with other elements, or further subdivided and rearranged, etc., without departing from the spirit and scope of the invention. Finally, persons of skill in the art will realize that various embodiments of the invention can use different nomenclature and terminology to describe the system elements, process phases etc. technical concepts in their respective implementations. Generally, from this description many variants and modifications will be understood by one skilled in the art that are yet encompassed by the spirit and scope of the invention.
This application is a continuation of U.S. application Ser. No. 18/116,389 filed Mar. 2, 2023, which is a continuation of U.S. application Ser. No. 17/979,542 filed Nov. 2, 2022 (now U.S. Pat. No. 11,687,374), which is a continuation of U.S. application Ser. No. 17/859,657 filed Jul. 7, 2022 (now U.S. Pat. No. 11,500,682), which is a continuation of U.S. application Ser. No. 17/470,926 filed Sep. 9, 2021 (now U.S. Pat. No. 11,385,934), which is a continuation application of U.S. application Ser. No. 17/463,098 filed Aug. 31, 2021 (now U.S. Pat. No. 11,347,556), which is a continuation application of U.S. application Ser. No. 17/344,636 filed Jun. 10, 2021 (now U.S. Pat. No. 11,188,388), which is a continuation application of U.S. application Ser. No. 17/195,174 filed Mar. 8, 2021 (now U.S. Pat. No. 11,036,556), which is a continuation application of U.S. application Ser. No. 16/434,581 filed Jun. 7, 2019 (now U.S. Pat. No. 10,942,778), which is a continuation application of U.S. application Ser. No. 15/267,153 filed Sep. 16, 2016 (now U.S. Pat. No. 10,318,353), which is a continuation application of U.S. application Ser. No. 14/318,512 filed Jun. 27, 2014 (now U.S. Pat. No. 9,448,847), which claims the benefit and priority of the following provisional applications:
[1]. U.S. Provisional Application No. 61/934,747 filed Feb. 1, 2014; and
[2]. U.S. Provisional Application No. 61/869,646 filed Aug. 23, 2013.
This application is also related to the following co-pending, previously pending, or patented applications:
[3]. U.S. Utility application Ser. No. 13/184,028, filed Jul. 15, 2011;
[4]. U.S. Utility application Ser. No. 13/270,194, filed Oct. 10, 2011 (now U.S. Pat. No. 8,490,111);
[5]. U.S. Utility application Ser. No. 13/277,739, filed Nov. 21, 2011 (now U.S. Pat. No. 8,561,078);
[6]. U.S. Utility application Ser. No. 13/297,455, filed Nov. 16, 2011;
[7]. U.S. Utility application Ser. No. 13/684,473, filed Nov. 23, 2012 (now U.S. Pat. No. 8,789,065);
[8]. U.S. Utility application Ser. No. 13/717,649, filed Dec. 17, 2012 (now U.S. Pat. No. 8,745,626);
[9]. U.S. Utility application Ser. No. 13/901,566, filed May 24, 2013 (now U.S. Pat. No. 8,793,698); and
[10]. U.S. Utility application Ser. No. 13/906,159, filed May 30, 2013 (now U.S. Pat. No. 8,935,491).
All above identified applications are hereby incorporated by reference in their entireties for all purposes.
Number | Name | Date | Kind |
---|---|---|---|
4402046 | Cox et al. | Aug 1983 | A |
4403286 | Fry et al. | Sep 1983 | A |
4404628 | Angelo | Sep 1983 | A |
4956771 | Neustaedter | Sep 1990 | A |
5031146 | Umina et al. | Jul 1991 | A |
5237673 | Orbits et al. | Aug 1993 | A |
5303369 | Borcherding et al. | Apr 1994 | A |
5341477 | Pitkin et al. | Aug 1994 | A |
5452231 | Butts et al. | Sep 1995 | A |
5519829 | Wilson | May 1996 | A |
5600845 | Gilson | Feb 1997 | A |
5612891 | Butts et al. | Mar 1997 | A |
5692192 | Sudo | Nov 1997 | A |
5752030 | Konno et al. | May 1998 | A |
5809516 | Ukai et al. | Sep 1998 | A |
5931959 | Kwiat | Aug 1999 | A |
6072781 | Feeney et al. | Jun 2000 | A |
6108683 | Kamada et al. | Aug 2000 | A |
6211721 | Smetana | Apr 2001 | B1 |
6212544 | Borkenhagen et al. | Apr 2001 | B1 |
6289434 | Roy | Sep 2001 | B1 |
6289440 | Casselman | Sep 2001 | B1 |
6334175 | Chih | Dec 2001 | B1 |
6345287 | Fong et al. | Feb 2002 | B1 |
6353616 | Elwalid et al. | Mar 2002 | B1 |
6366157 | Abdesselem et al. | Apr 2002 | B1 |
6605960 | Veenstra | Aug 2003 | B2 |
6721948 | Morgan | Apr 2004 | B1 |
6728959 | Merkey | Apr 2004 | B1 |
6769017 | Bhat et al. | Jul 2004 | B1 |
6782410 | Bhagat et al. | Aug 2004 | B1 |
6816905 | Sheets et al. | Nov 2004 | B1 |
6909691 | Goyal et al. | Jun 2005 | B1 |
6912706 | Stamm et al. | Jun 2005 | B1 |
6986021 | Master et al. | Jan 2006 | B2 |
7028167 | Soltis, Jr. et al. | Apr 2006 | B2 |
7058868 | Guettaf | Jun 2006 | B2 |
7093258 | Miller et al. | Aug 2006 | B1 |
7099813 | Nightingale | Aug 2006 | B2 |
7110417 | El-Hennawey et al. | Sep 2006 | B1 |
7117372 | Trimberger | Oct 2006 | B1 |
7165256 | Boudnik et al. | Jan 2007 | B2 |
7177961 | Brice, Jr. et al. | Feb 2007 | B2 |
7178145 | Bono | Feb 2007 | B2 |
7200837 | Stevens | Apr 2007 | B2 |
7307445 | Liang | Dec 2007 | B2 |
7315897 | Hardee et al. | Jan 2008 | B1 |
7328314 | Kendall et al. | Feb 2008 | B2 |
7349414 | Sandstrom | Mar 2008 | B2 |
7370013 | Aziz et al. | May 2008 | B1 |
7389403 | Alpert et al. | Jun 2008 | B1 |
7406407 | Larus | Jul 2008 | B2 |
7437730 | Goyal | Oct 2008 | B2 |
7444454 | Yancey et al. | Oct 2008 | B2 |
7447873 | Nordquist | Nov 2008 | B1 |
7461376 | Geye et al. | Dec 2008 | B2 |
7469311 | Tsu et al. | Dec 2008 | B1 |
7478097 | Emeis et al. | Jan 2009 | B2 |
7490328 | Gavish et al. | Feb 2009 | B2 |
7503045 | Aziz et al. | Mar 2009 | B1 |
7518396 | Kondapalli et al. | Apr 2009 | B1 |
7581079 | Pechanek | Aug 2009 | B2 |
7599753 | Taylor et al. | Oct 2009 | B2 |
7631107 | Pandya | Dec 2009 | B2 |
7665092 | Vengerov | Feb 2010 | B1 |
7669035 | Young et al. | Feb 2010 | B2 |
7685409 | Du et al. | Mar 2010 | B2 |
7698541 | Robles | Apr 2010 | B1 |
7738496 | Raza | Jun 2010 | B1 |
7743001 | Vermeulen et al. | Jun 2010 | B1 |
7760625 | Miyaho et al. | Jul 2010 | B2 |
7765512 | Neuendorffer | Jul 2010 | B1 |
7765547 | Cismas et al. | Jul 2010 | B2 |
7774532 | Yamazaki | Aug 2010 | B2 |
7802255 | Pilkington | Sep 2010 | B2 |
7805706 | Ly et al. | Sep 2010 | B1 |
7818699 | Stuber et al. | Oct 2010 | B1 |
7861063 | Golla et al. | Dec 2010 | B1 |
7908606 | Depro et al. | Mar 2011 | B2 |
7971072 | Donlin et al. | Jun 2011 | B1 |
7984246 | Yung et al. | Jul 2011 | B1 |
7990974 | Gmuender et al. | Aug 2011 | B1 |
7996346 | Bell, Jr. et al. | Aug 2011 | B2 |
8001549 | Henmi | Aug 2011 | B2 |
8015392 | Naik et al. | Sep 2011 | B2 |
8018866 | Kasturi et al. | Sep 2011 | B1 |
8018961 | Gopinath et al. | Sep 2011 | B2 |
8024731 | Cornwell et al. | Sep 2011 | B1 |
8032889 | Conrad et al. | Oct 2011 | B2 |
8046766 | Rhine | Oct 2011 | B2 |
8050256 | Bao et al. | Nov 2011 | B1 |
8055880 | Fujisawa et al. | Nov 2011 | B2 |
8059674 | Cheung et al. | Nov 2011 | B2 |
8060610 | Herington | Nov 2011 | B1 |
8087029 | Lindholm et al. | Dec 2011 | B1 |
8095662 | Lappas et al. | Jan 2012 | B1 |
8098255 | Fouladi et al. | Jan 2012 | B2 |
8136153 | Zhang et al. | Mar 2012 | B2 |
8144149 | Jiao et al. | Mar 2012 | B2 |
8145894 | Casselman | Mar 2012 | B1 |
8174287 | German | May 2012 | B2 |
8195896 | Barsness et al. | Jun 2012 | B2 |
8230070 | Buyya et al. | Jul 2012 | B2 |
8234652 | Arimilli et al. | Jul 2012 | B2 |
8271730 | Piry et al. | Sep 2012 | B2 |
8296434 | Miller et al. | Oct 2012 | B1 |
8299816 | Yamada et al. | Oct 2012 | B2 |
8327126 | Bell, Jr. et al. | Dec 2012 | B2 |
8352609 | Maclinovsky et al. | Jan 2013 | B2 |
8352611 | Maddhuri et al. | Jan 2013 | B2 |
8407658 | Needham | Mar 2013 | B2 |
8429630 | Nickolov et al. | Apr 2013 | B2 |
8429656 | Duluk, Jr. et al. | Apr 2013 | B1 |
8443377 | Inoue et al. | May 2013 | B2 |
8447933 | Nishihara | May 2013 | B2 |
8484287 | Gavini et al. | Jul 2013 | B2 |
8516491 | Nitta et al. | Aug 2013 | B2 |
8528001 | Song et al. | Sep 2013 | B2 |
8533674 | Abrams et al. | Sep 2013 | B2 |
8539207 | LeGrand | Sep 2013 | B1 |
8544014 | Gopalan et al. | Sep 2013 | B2 |
8561183 | Muth et al. | Oct 2013 | B2 |
8566836 | Ramaraju et al. | Oct 2013 | B2 |
8572622 | Alexander et al. | Oct 2013 | B2 |
8595832 | Yee et al. | Nov 2013 | B1 |
8612330 | Certain et al. | Dec 2013 | B1 |
8626970 | Craddock et al. | Jan 2014 | B2 |
8635675 | Kruglick | Jan 2014 | B2 |
8645955 | Yim et al. | Feb 2014 | B2 |
8656077 | Miloushev et al. | Feb 2014 | B2 |
8683471 | Brent et al. | Mar 2014 | B2 |
8713572 | Chambliss et al. | Apr 2014 | B2 |
8713574 | Creamer et al. | Apr 2014 | B2 |
8719415 | Sirota et al. | May 2014 | B1 |
8738333 | Behera et al. | May 2014 | B1 |
8738860 | Griffin et al. | May 2014 | B1 |
8745241 | Waldspurger | Jun 2014 | B2 |
8762595 | Muller et al. | Jun 2014 | B1 |
8789065 | Sandstrom | Jul 2014 | B2 |
8793698 | Sandstrom | Jul 2014 | B1 |
8850437 | Shutkin et al. | Sep 2014 | B2 |
8850574 | Ansel et al. | Sep 2014 | B1 |
8881141 | Koch et al. | Nov 2014 | B2 |
8893016 | Diamond | Nov 2014 | B2 |
8910109 | Orthner | Dec 2014 | B1 |
8924481 | Conlon et al. | Dec 2014 | B2 |
8935491 | Sandstrom | Jan 2015 | B2 |
9038072 | Nollet et al. | May 2015 | B2 |
9047137 | Solihin | Jun 2015 | B2 |
9092339 | Giacomoni et al. | Jul 2015 | B1 |
9104453 | Anand et al. | Aug 2015 | B2 |
9141350 | Stravers et al. | Sep 2015 | B2 |
9154442 | Mital et al. | Oct 2015 | B2 |
9164953 | Lippett | Oct 2015 | B2 |
9183052 | Muthiah et al. | Nov 2015 | B2 |
9218195 | Anderson et al. | Dec 2015 | B2 |
9262360 | Wagh et al. | Feb 2016 | B2 |
9323794 | Indeck et al. | Apr 2016 | B2 |
9348724 | Ota et al. | May 2016 | B2 |
9390046 | Wagh | Jul 2016 | B2 |
9390130 | Kakarlamudi et al. | Jul 2016 | B2 |
9411636 | Ting et al. | Aug 2016 | B1 |
9424090 | Sandstrom | Aug 2016 | B2 |
9448847 | Sandstrom | Sep 2016 | B2 |
9465667 | Sandstrom | Oct 2016 | B1 |
9483291 | Chen et al. | Nov 2016 | B1 |
9503093 | Karras et al. | Nov 2016 | B2 |
9507632 | Hartog et al. | Nov 2016 | B2 |
9507640 | Capps, Jr. et al. | Nov 2016 | B2 |
9519518 | Kamath et al. | Dec 2016 | B2 |
9589088 | Mishra et al. | Mar 2017 | B1 |
9608933 | Emaru | Mar 2017 | B2 |
9632833 | Sandstrom | Apr 2017 | B2 |
9645854 | Sander et al. | May 2017 | B2 |
9690600 | Jung et al. | Jun 2017 | B2 |
9697161 | Mangano et al. | Jul 2017 | B2 |
9774520 | Kasturi et al. | Sep 2017 | B1 |
9841994 | Henrikkson | Dec 2017 | B2 |
9910708 | Williamson | Mar 2018 | B2 |
9985848 | Ward, Jr. | May 2018 | B1 |
10009441 | Xue et al. | Jun 2018 | B2 |
10013662 | Brandwine et al. | Jul 2018 | B2 |
10133599 | Sandstrom | Nov 2018 | B1 |
10133600 | Sandstrom | Nov 2018 | B2 |
10223317 | Atta | Mar 2019 | B2 |
10282330 | Khan | May 2019 | B2 |
10318353 | Sandstrom | Jun 2019 | B2 |
10360168 | Griffin | Jul 2019 | B1 |
10430242 | Sandstrom | Oct 2019 | B2 |
10452997 | Yang et al. | Oct 2019 | B2 |
10515046 | Fleming et al. | Dec 2019 | B2 |
10521357 | Ramey | Dec 2019 | B1 |
10606750 | Mattina | Mar 2020 | B1 |
10650452 | Parsons et al. | May 2020 | B2 |
10789099 | Sandstrom | Sep 2020 | B1 |
10942778 | Sandstrom | Mar 2021 | B2 |
10963306 | Sandstrom | Mar 2021 | B2 |
11036556 | Sandstrom | Jun 2021 | B1 |
11150948 | Sandstrom | Oct 2021 | B1 |
11182320 | Khan et al. | Nov 2021 | B2 |
11188388 | Sandstrom | Nov 2021 | B2 |
11347556 | Sandstrom | May 2022 | B2 |
11385934 | Sandstrom | Jul 2022 | B2 |
11500682 | Sandstrom | Nov 2022 | B1 |
11550500 | Gao et al. | Jan 2023 | B2 |
11687374 | Sandstrom | Jun 2023 | B2 |
11915055 | Sandstrom | Feb 2024 | B2 |
20020040400 | Masters | Apr 2002 | A1 |
20020056033 | Huppenthal | May 2002 | A1 |
20020107962 | Richter et al. | Aug 2002 | A1 |
20020112091 | Schott et al. | Aug 2002 | A1 |
20020124012 | Liem et al. | Sep 2002 | A1 |
20020129080 | Hentschel et al. | Sep 2002 | A1 |
20020141343 | Bays | Oct 2002 | A1 |
20020143843 | Mehta | Oct 2002 | A1 |
20020152305 | Jackson et al. | Oct 2002 | A1 |
20020169828 | Blanchard | Nov 2002 | A1 |
20030018807 | Larsson et al. | Jan 2003 | A1 |
20030200408 | Mekhiel | Oct 2003 | A1 |
20030235200 | Kendall et al. | Dec 2003 | A1 |
20040088488 | Ober et al. | May 2004 | A1 |
20040098718 | Yoshii et al. | May 2004 | A1 |
20040111724 | Libby | Jun 2004 | A1 |
20040128401 | Fallon et al. | Jul 2004 | A1 |
20040158637 | Lee | Aug 2004 | A1 |
20040168170 | Miller | Aug 2004 | A1 |
20040177245 | Murphy | Sep 2004 | A1 |
20040193806 | Koga et al. | Sep 2004 | A1 |
20040210900 | Jones et al. | Oct 2004 | A1 |
20040215987 | Farkas et al. | Oct 2004 | A1 |
20050010502 | Birkestrand et al. | Jan 2005 | A1 |
20050013705 | Farkas et al. | Jan 2005 | A1 |
20050021931 | Anderson et al. | Jan 2005 | A1 |
20050036515 | Cheung et al. | Feb 2005 | A1 |
20050044344 | Stevens | Feb 2005 | A1 |
20050055694 | Lee | Mar 2005 | A1 |
20050080999 | Angsmark et al. | Apr 2005 | A1 |
20050081202 | Brokenshire et al. | Apr 2005 | A1 |
20050182838 | Sheets et al. | Aug 2005 | A1 |
20050188372 | Inoue et al. | Aug 2005 | A1 |
20050193081 | Gruber et al. | Sep 2005 | A1 |
20050193186 | Gazsi et al. | Sep 2005 | A1 |
20050198476 | Gazsi et al. | Sep 2005 | A1 |
20050235070 | Young et al. | Oct 2005 | A1 |
20050257030 | Langhammer | Nov 2005 | A1 |
20050268298 | Hunt et al. | Dec 2005 | A1 |
20050278551 | Goodnow et al. | Dec 2005 | A1 |
20060036774 | Schott et al. | Feb 2006 | A1 |
20060059485 | Onufryk et al. | Mar 2006 | A1 |
20060061794 | Ito et al. | Mar 2006 | A1 |
20060070078 | Dweck et al. | Mar 2006 | A1 |
20060075265 | Hamaoka et al. | Apr 2006 | A1 |
20060085554 | Shah et al. | Apr 2006 | A1 |
20060136606 | Guzy et al. | Jun 2006 | A1 |
20060179194 | Jensen | Aug 2006 | A1 |
20060195847 | Amano et al. | Aug 2006 | A1 |
20060212870 | Arndt et al. | Sep 2006 | A1 |
20060218376 | Pechanek | Sep 2006 | A1 |
20070074011 | Borkar et al. | Mar 2007 | A1 |
20070153802 | Anke et al. | Jul 2007 | A1 |
20070198981 | Jacobs et al. | Aug 2007 | A1 |
20070220517 | Lippett | Sep 2007 | A1 |
20070226482 | Borkar et al. | Sep 2007 | A1 |
20070283311 | Karoubalis et al. | Dec 2007 | A1 |
20070283349 | Creamer et al. | Dec 2007 | A1 |
20070283358 | Kasahara et al. | Dec 2007 | A1 |
20070291576 | Yang | Dec 2007 | A1 |
20080046997 | Wang | Feb 2008 | A1 |
20080077927 | Armstrong et al. | Mar 2008 | A1 |
20080086395 | Brenner et al. | Apr 2008 | A1 |
20080134191 | Warrier et al. | Jun 2008 | A1 |
20080164907 | Mercaldi-Kim et al. | Jul 2008 | A1 |
20080189703 | Im et al. | Aug 2008 | A1 |
20080201716 | Du et al. | Aug 2008 | A1 |
20080222640 | Daly et al. | Sep 2008 | A1 |
20080244588 | Leiserson et al. | Oct 2008 | A1 |
20080256339 | Xu et al. | Oct 2008 | A1 |
20080270752 | Rhine | Oct 2008 | A1 |
20080276322 | Sueyoshi | Nov 2008 | A1 |
20080285581 | Maiorana et al. | Nov 2008 | A1 |
20080288747 | Inglett et al. | Nov 2008 | A1 |
20090025004 | Barnard et al. | Jan 2009 | A1 |
20090037554 | Herington | Feb 2009 | A1 |
20090049443 | Powers et al. | Feb 2009 | A1 |
20090070762 | Franaszek et al. | Mar 2009 | A1 |
20090178047 | Astley et al. | Jul 2009 | A1 |
20090187756 | Nollet et al. | Jul 2009 | A1 |
20090198866 | Chen et al. | Aug 2009 | A1 |
20090265712 | Herington | Oct 2009 | A1 |
20090278564 | Dehon et al. | Nov 2009 | A1 |
20090282477 | Chen et al. | Nov 2009 | A1 |
20090320031 | Song | Dec 2009 | A1 |
20090327446 | Wittenschlaeger | Dec 2009 | A1 |
20100011116 | Thornton et al. | Jan 2010 | A1 |
20100043008 | Marchand | Feb 2010 | A1 |
20100046546 | Ram et al. | Feb 2010 | A1 |
20100049963 | Bell, Jr. et al. | Feb 2010 | A1 |
20100058346 | Narang et al. | Mar 2010 | A1 |
20100100883 | Booton | Apr 2010 | A1 |
20100131955 | Brent et al. | May 2010 | A1 |
20100138913 | Saroj et al. | Jun 2010 | A1 |
20100153700 | Capps, Jr. et al. | Jun 2010 | A1 |
20100153955 | Sirota et al. | Jun 2010 | A1 |
20100153956 | Capps, Jr. et al. | Jun 2010 | A1 |
20100161573 | Chan et al. | Jun 2010 | A1 |
20100162230 | Chen et al. | Jun 2010 | A1 |
20100192155 | Nam et al. | Jul 2010 | A1 |
20100205602 | Zedlewski et al. | Aug 2010 | A1 |
20100228951 | Liu | Sep 2010 | A1 |
20100232396 | Jing et al. | Sep 2010 | A1 |
20100268889 | Conte et al. | Oct 2010 | A1 |
20100287320 | Querol et al. | Nov 2010 | A1 |
20100333099 | Kupferschmidt et al. | Dec 2010 | A1 |
20110014893 | Davis et al. | Jan 2011 | A1 |
20110035749 | Krishnakumar et al. | Feb 2011 | A1 |
20110047546 | Kivity et al. | Feb 2011 | A1 |
20110050713 | McCrary et al. | Mar 2011 | A1 |
20110055480 | Guyetant et al. | Mar 2011 | A1 |
20110078411 | Maclinovsky et al. | Mar 2011 | A1 |
20110083125 | Komatsu et al. | Apr 2011 | A1 |
20110096667 | Arita et al. | Apr 2011 | A1 |
20110119674 | Nishikawa | May 2011 | A1 |
20110125960 | Casselman | May 2011 | A1 |
20110154348 | Elnozahy et al. | Jun 2011 | A1 |
20110161969 | Arndt et al. | Jun 2011 | A1 |
20110161976 | Alexander et al. | Jun 2011 | A1 |
20110173432 | Cher et al. | Jul 2011 | A1 |
20110197048 | Chung et al. | Aug 2011 | A1 |
20110238792 | Phillips et al. | Sep 2011 | A1 |
20110247012 | Uehara | Oct 2011 | A1 |
20110249678 | Bonicatto et al. | Oct 2011 | A1 |
20110258317 | Sinha et al. | Oct 2011 | A1 |
20110296138 | Carter et al. | Dec 2011 | A1 |
20110307661 | Smith et al. | Dec 2011 | A1 |
20110321057 | Mejdrich et al. | Dec 2011 | A1 |
20120005473 | Hofstee et al. | Jan 2012 | A1 |
20120017218 | Branson et al. | Jan 2012 | A1 |
20120022832 | Shannon et al. | Jan 2012 | A1 |
20120079501 | Sandstrom | Mar 2012 | A1 |
20120089985 | Adar et al. | Apr 2012 | A1 |
20120173734 | Kimbrel et al. | Jul 2012 | A1 |
20120216012 | Vorbach et al. | Aug 2012 | A1 |
20120221886 | Barsness et al. | Aug 2012 | A1 |
20120222038 | Katragadda et al. | Aug 2012 | A1 |
20120222042 | Chess et al. | Aug 2012 | A1 |
20120246450 | Abdallah | Sep 2012 | A1 |
20120266176 | Vojnovic et al. | Oct 2012 | A1 |
20120284492 | Zievers | Nov 2012 | A1 |
20120303809 | Patel et al. | Nov 2012 | A1 |
20120324458 | Peterson et al. | Dec 2012 | A1 |
20130013903 | Bell, Jr. et al. | Jan 2013 | A1 |
20130179895 | Calder et al. | Jul 2013 | A1 |
20130182555 | Raaf et al. | Jul 2013 | A1 |
20130222402 | Peterson et al. | Aug 2013 | A1 |
20130285739 | Blaquiere et al. | Oct 2013 | A1 |
20130312002 | Yamauchi et al. | Nov 2013 | A1 |
20130325998 | Hormuth et al. | Dec 2013 | A1 |
20130339977 | Dennis et al. | Dec 2013 | A1 |
20140089635 | Shifer et al. | Mar 2014 | A1 |
20140092728 | Alvarez-Icaza Rivera et al. | Apr 2014 | A1 |
20140123135 | Huang et al. | May 2014 | A1 |
20140149993 | Sandstrom | May 2014 | A1 |
20140181501 | Hicok et al. | Jun 2014 | A1 |
20140317378 | Lippett | Oct 2014 | A1 |
20140331236 | Mitra et al. | Nov 2014 | A1 |
20140372167 | Hillier | Dec 2014 | A1 |
20140380025 | Kruglick | Dec 2014 | A1 |
20150100772 | Jung et al. | Apr 2015 | A1 |
20150178116 | Jorgensen et al. | Jun 2015 | A1 |
20150277920 | Bradbury et al. | Oct 2015 | A1 |
20150339798 | Peterson et al. | Nov 2015 | A1 |
20150378776 | Lippett | Dec 2015 | A1 |
20160034295 | Cochran | Feb 2016 | A1 |
20160048394 | Vorbach et al. | Feb 2016 | A1 |
20160080201 | Huang et al. | Mar 2016 | A1 |
20160087849 | Balasubramian et al. | Mar 2016 | A1 |
20160328222 | Arumugam et al. | Nov 2016 | A1 |
20160378538 | Kang | Dec 2016 | A1 |
20170024573 | Bhattacharyya et al. | Jan 2017 | A1 |
20170097838 | Nagapudi et al. | Apr 2017 | A1 |
20170310794 | Smith et al. | Oct 2017 | A1 |
20180089119 | Khan et al. | Mar 2018 | A1 |
20180097709 | Box et al. | Apr 2018 | A1 |
20190361745 | Sandstrom | Nov 2019 | A1 |
20200192454 | De Rochemont | Jun 2020 | A1 |
20200285517 | Sandstrom | Sep 2020 | A1 |
20210055965 | Sandstrom | Feb 2021 | A1 |
20210191781 | Sandstrom | Jun 2021 | A1 |
20210303354 | Sandstrom | Sep 2021 | A1 |
20210303361 | Sandstrom | Sep 2021 | A1 |
20210397479 | Sandstrom | Dec 2021 | A1 |
20210397484 | Sandstrom | Dec 2021 | A1 |
20210406083 | Sandstrom | Dec 2021 | A1 |
20220276903 | Sandstrom | Sep 2022 | A1 |
20220342715 | Sandstrom | Oct 2022 | A1 |
20230046107 | Sandstrom | Feb 2023 | A1 |
20230053365 | Sandstrom | Feb 2023 | A1 |
Number | Date | Country |
---|---|---|
3340123 | May 1985 | DE |
255857 | Feb 1988 | EP |
889622 | Jul 1999 | EP |
1372084 | Dec 2003 | EP |
2309388 | Apr 2011 | EP |
2328077 | Jun 2011 | EP |
2704022 | Mar 2014 | EP |
1236177 | Jun 1971 | GB |
2145255 | Mar 1985 | GB |
2272311 | May 1994 | GB |
05197619 | Aug 1993 | JP |
06004314 | Jan 1994 | JP |
11353291 | Dec 1999 | JP |
2014-230174 | Dec 2014 | JP |
1327106 | Jul 1987 | SU |
2000070426 | Nov 2000 | WO |
2001061525 | Aug 2001 | WO |
0209285 | Jan 2002 | WO |
2008061162 | May 2008 | WO |
2008112779 | Sep 2008 | WO |
2010037177 | Apr 2010 | WO |
2011090776 | Jul 2011 | WO |
2011123467 | Oct 2011 | WO |
2012040691 | Mar 2012 | WO |
Entry |
---|
Wolf et al., “Runtime support for multicore packet processing systems,” IEEE Network. Jul. 23, 2007;21(4): 29-37. (previously submitted in related U.S. Appl. No. 17/979,526). |
Qiang, W. and Wolf, T., “Dynamic Workload Profiling and Task Allocation in Packet Processing Systems”, 2008 International Conference on High Performance Switching and Routing, IEEE, May 15, 2008, pp. 123-130. (previously submitted in related U.S. Appl. No. 17/979,526). |
Ye, X. et al., “MAPS: Multi-Algorithm Parallel Circuit Simulation”, 2008 IEEE/ACM International Conference on Computer-Aided Design, IEEE, Nov. 10, 2008, pp. 73-78. (previously submitted in related U.S. Appl. No. 17/979,526). |
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Jun. 7, 2022. (previously submitted in related U.S. Appl. No. 18/116,389). |
Notice of Allowance issued in U.S. Appl. No. 17/979,526 dated Mar. 29, 2023. (previously submitted in related U.S. Appl. No. 17/979,526). |
Non-Final Rejection issued in related U.S. Appl. No. 18/116,389 dated Jan. 12, 2022. (previously submitted in related U.S. Appl. No. 18/116,389). |
Wu, “Dynamic Resource Management For High-Performance Many-Core Packet Processing Systems,” submitted to the Graduate School of the University of Massachusetts Amherst, Feb. 2011. (previously submitted in related U.S. Appl. No. 18/116,389). |
Banerjee et al., “Multi-Stage Parallel Processing of Design Element Access Tasks in FPGA-based Logic Emulation Systems,” 2011 3rd Asia Symposium on Quality Electronic Design (ASQED), Kuala Lumpur, Malaysia, 2011, pp. 301-309, doi: 10.1109/ASQED.2011.6111765. (previously submitted in related U.S. Appl. No. 18/116,389). |
Non-Final Office Action issued in U.S. Appl. No. 18/116,389 dated Mar. 14, 2023. (previously submitted in related U.S. Appl. No. 18/116,389). |
Notice of Allowance mailed in U.S. Appl. No. 18/116,389 on Jul. 6, 2023. (previously submitted in related U.S. Appl. No. 18/116,389). |
Zynq-7000 Extensible Processing Platform: Technical Reference Manual, Xilinx, May 8, 2012. (previously submitted in related U.S. Appl. No. 18/116,389). |
Ostler, Patrick Sutton, FPGA Bootstrapping Using Partial Reconfiguration, Theses and Dissertations, BYU Scholars Archive, Sep. 28, 2011. (previously submitted in related U.S. Appl. No. 18/116,389). |
Lockwood, John W., et al., A Low-Latency Library in FPGA Hardware for High-Frequency Trading (HFT), 2012 IEEE 20th Annual Symposium on High Performance Interconnects, 2012. (previously submitted in related U.S. Appl. No. 18/116,389). |
Zeineddini, Amir & Wesselkamper, Jim, PRC/ERPC: Data Integrity and Security Controller for Partial Reconfiguration, Xilinx, Jun. 7, 2012. (previously submitted in related U.S. Appl. No. 18/116,389). |
Blott, Michaela, et al., FPGA Research Design Platform Fuels Network Advances, Xcell Journal, Fourth Quarter 2010, pp. 24-29. (previously submitted in related U.S. Appl. No. 18/116,389). |
Morris, Kevin, Xilinx Spartan-6 FPGAs Enable PCI Express Compliant System Design for Low-Power, Low-Cost Connectivity Applications, Electronic Engineering Journal, 2009 <https://www.eejournal.com/article/20091005_03/>. (previously submitted in related U.S. Appl. No. 18/116,389). |
Xilinx, Virtex-4 Family Overview, DS112 (ve.1), Aug. 30, 2010. (previously submitted in related U.S. Appl. No. 18/116,389). |
Sproull, Todd, et al., Control and Configuration Software for a Reconfigurable Networking Hardware Platform, 2002 10th Annual Symposium on Field-Programmable Custom Computing Machines, 2002. (previously submitted in related U.S. Appl. No. 18/116,389). |
Moscola, James, et al., Implementation of a Content-Scanning Module for an Internet Firewall, 2003 11th Annual Symposium on Field-Programmable Custom Computing Machines, 2003. (previously submitted in related U.S. Appl. No. 18/116,389). |
Lockwood, John W., et al., An Extensible, System-On-ProgrammableChip, Content-Aware Internet Firewall, 2003. (previously submitted in related U.S. Appl. No. 18/116,389). |
Lockwood, John W., et al., Reprogrammable Network Packet Processing on the Field Programmable Port Extender (FPX), FPGA 2001, Feb. 11-12, 2001. (previously submitted in related U.S. Appl. No. 18/116,389). |
Lockwood, John W., Network Packet Processing in Reconfigurable Hardware, Reconfigurable Computing: The Theory and Practice of FPGA-Based Computation (Scott Hauck & Andre DeHon eds., 2008). (previously submitted in related U.S. Appl. No. 18/116,389). |
Attig, Michael & Lockwood, John, SIFT: Snort Intrusion Filter for TCP, IEEE Symposium on High Performance Interconnects (Hot Interconnects-13), Aug. 17-19, 2005. (previously submitted in related U.S. Appl. No. 18/116,389). |
Xilinx, Virtex-II Pro and Virtex-II Pro X Platform FPGAs: Complete Data Sheet, DS083 (v5.0) Jun. 21, 2011. (previously submitted in related U.S. Appl. No. 18/116,389). |
Xilinx, Virtex-II Platform FPGA User Guide, UG002 (v2.0) Mar. 23, 2005. (previously submitted in related U.S. Appl. No. 18/116,389). |
Xilinx, Virtex-E 1.8 V Field Programmable Gate Arrays, DS022-1 (v2.3) Jul. 17, 2002. (previously submitted in related U.S. Appl. No. 18/116,389). |
Xilinx, 7 Series FPGAs Configuration: User Guide, UG470 (v1.7) Oct. 22, 2013. (previously submitted in related U.S. Appl. No. 18/116,389). |
Nielson, Matt, Using a Microprocessor to Configure 7 Series FPGAs via Slave Serial or Slave SelectMap Mode, Xilinx, XAPP583 (v.10), May 31, 2012. (previously submitted in related U.S. Appl. No. 18/116,389). |
Lin, Mingjie, The Amorphous FPGA Architecture, FPGA '08, Feb. 24-26, 2008. (previously submitted in related U.S. Appl. No. 18/116,389). |
Final Office Action mailed in U.S. Appl. No. 18/116,389 on Aug. 2, 2023. (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Office Action mailed in U.S. Appl. No. 18/116,389 on Apr. 4, 2023. (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Office Action issued in U.S. Appl. No. 18/116,389 dated Mar. 20, 2023. (previously submitted in related U.S. Appl. No. 18/116,389).
Papadimitriou et al., "Performance Of Partial Reconfiguration In FPGA Systems: A Survey And A Cost Model", ACM Transactions on Reconfigurable Technology and Systems, vol. 4, Issue 4, Dec. 28, 2011, pp. 1-24. (previously submitted in related U.S. Appl. No. 17/979,542).
Russinovich, M. and Solomon, D., "Microsoft Windows Internals, Fourth Edition: Microsoft Windows Server 2003, Windows XP, and Windows 2000", Microsoft Press, Dec. 1, 2004. (previously submitted in related U.S. Appl. No. 17/979,542).
Tuan, Vu Manh, "A Study On A Multitasking Environment For Dynamically Reconfigurable Processors", School of Science for Open and Environmental Systems, Graduate School of Science and Technology, Keio University, 2009. (previously submitted in related U.S. Appl. No. 17/979,542).
Tullsen, Dean Michael, "Simultaneous Multithreading", University of Washington, 1996. (previously submitted in related U.S. Appl. No. 17/979,542).
Wang, Zheng, "Internet QoS Architectures and Mechanisms for Quality of Service", Morgan Kaufmann, 2001. (Partial reference submitted, Chapter 2, pp. 60-64) (previously submitted in related U.S. Appl. No. 17/979,542).
Yadav, et al., "Scheduling Algorithm: Tasks Scheduling Algorithm For Multiple Processors With Dynamic Reassignment", Journal of Computer Systems, Networks, and Communications, Jan. 1, 2008. (previously submitted in related U.S. Appl. No. 17/979,542).
Non-Final Rejection issued in related U.S. Appl. No. 18/116,389 dated Jun. 11, 2023, 6 pages.
Notice of Allowance issued in related U.S. Appl. No. 18/116,389 dated Aug. 23, 2023, 11 pages.
Supplemental Notice of Allowability issued in U.S. Appl. No. 18/116,389 dated Jan. 24, 2024, 3 pages.
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Jun. 5, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Supplemental Notice of Allowability issued in U.S. Appl. No. 18/116,389 dated Aug. 21, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Blodget, B., Bobda, C., Huebner, M., Niyonkuru, A. (2004). Partial and Dynamically Reconfiguration of Xilinx Virtex-II FPGAs. In: Becker, J., Platzner, M., Vernalde, S. (eds) Field Programmable Logic and Application. FPL 2004. Lecture Notes in Computer Science, vol. 3203. Springer, Berlin, Heidelberg. (Year: 2004). (previously submitted in related U.S. Appl. No. 18/116,389).
Opencores, PCI Bridge, 2001 (Year: 2001). (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Rejection issued in related U.S. Appl. No. 18/116,389 dated Jul. 15, 2021. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 17/195,174 dated May 14, 2021. (previously submitted in related U.S. Appl. No. 18/116,389).
Corrected Notice of Allowability issued in U.S. Appl. No. 18/116,389 dated Sep. 16, 2021. (previously submitted in related U.S. Appl. No. 18/116,389).
Final Office Action issued in U.S. Appl. No. 18/116,389 dated Dec. 2, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Office Action issued in U.S. Appl. No. 18/116,389 dated Mar. 27, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Rejection issued in related U.S. Appl. No. 18/116,389 dated Mar. 7, 2014. (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Rejection issued in related U.S. Appl. No. 18/116,389 dated Aug. 6, 2018. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Mar. 3, 2020. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Dec. 11, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in related U.S. Appl. No. 18/116,389 dated Aug. 9, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Supplemental Notice of Allowability issued in U.S. Appl. No. 18/116,389 dated Jan. 30, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Jul. 8, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Final Rejection issued in related U.S. Appl. No. 18/116,389 dated Nov. 14, 2018. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Aug. 13, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in related U.S. Appl. No. 18/116,389 dated Aug. 29, 2018. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in related U.S. Appl. No. 18/116,389 dated Aug. 27, 2018. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in related U.S. Appl. No. 18/116,389 dated Dec. 13, 2018. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in related U.S. Appl. No. 18/116,389 dated Feb. 6, 2019. (previously submitted in related U.S. Appl. No. 18/116,389).
Non-Final Office Action issued in U.S. Appl. No. 18/116,389 dated Nov. 18, 2021. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Jul. 23, 2020. (previously submitted in related U.S. Appl. No. 18/116,389).
Notice of Allowance issued in U.S. Appl. No. 18/116,389 dated Jan. 19, 2021. (previously submitted in related U.S. Appl. No. 18/116,389).
Supplemental Notice of Allowability issued in U.S. Appl. No. 18/116,389 dated Feb. 24, 2021. (previously submitted in related U.S. Appl. No. 18/116,389).
Casavant et al., "A Taxonomy of Scheduling in General-Purpose Distributed Computing Systems," IEEE Transactions on Software Engineering, Feb. 1988, 14(2): 141-154. (previously submitted in related U.S. Appl. No. 18/116,389).
Diessel et al., "Dynamic Scheduling of Tasks on Partially Reconfigurable FPGAs," Aug. 2, 1999, 21 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Elgindy et al., "Task Rearrangement on Partially Reconfigurable FPGAs with Restricted Buffer," 2000, 10 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Fangpeng et al., "Scheduling Algorithms for Grid Computing: State of the Art and Open Problems," School of Computing, Queen's University, Kingston, Ontario, Jan. 2006, pp. 1-55. (previously submitted in related U.S. Appl. No. 18/116,389).
Chen et al., "Configuration-Sensitive Process Scheduling for FPGA-Based Computing Platforms," 2000, 6 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Hauck, "Configuration Prefetch for Single Context Reconfigurable Coprocessors," ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 1998, 10 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Kazuya et al., "Proposal of Adapted Load Balancing Model for Dynamically Reconfigurable Devices," Graduate School of Computer Science and Engineering, The University of Aizu, 2006, Japan, 1 page. (previously submitted in related U.S. Appl. No. 18/116,389).
Rutten et al., "Eclipse: A Heterogeneous Multiprocessor Architecture for Flexible Media Processing," IEEE Design & Test of Computers, 2002, 17 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Schmit et al., "Pipeline Reconfigurable FPGAs," Journal of VLSI Signal Processing Systems, 2000, 24:129-146. (previously submitted in related U.S. Appl. No. 18/116,389).
Teich et al., "Compile-Time Optimization of Dynamic Hardware Reconfigurations," 1999, Germany, 7 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Teich et al., "Optimization of Dynamic Hardware Reconfigurations," The Journal of Supercomputing, 2001. (previously submitted in related U.S. Appl. No. 18/116,389).
Walder et al., "Online Scheduling for Block-partitioned Reconfigurable Devices," Design, Automation and Test in Europe Conference and Exhibition, IEEE, 2003, 6 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Wigley et al., "The Development of an Operating System for Reconfigurable Computing," 2001, Australia, 11 pages. (previously submitted in related U.S. Appl. No. 18/116,389).
Taylor, D. et al., "Dynamic Hardware Plugins: Exploiting Reconfigurable Hardware For High Performance Programmable Routers," Computer Networks 38, 3 (Feb. 2002), pp. 295-310. (previously submitted in related U.S. Appl. No. 17/979,526).
Anwer, M. and Feamster, N., "Building A Fast, Virtualized Data Plane With Programmable Hardware", Proceedings of the 1st ACM Workshop on Virtualized Infrastructure Systems and Architectures, Aug. 17, 2009, pp. 1-8. (previously submitted in related U.S. Appl. No. 17/979,526).
Balkan, A. et al., "Mesh-of-Trees and Alternative Interconnection Networks for Single-Chip Parallelism", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 17, No. 10, Oct. 10, 2009, pp. 1419-1432. (previously submitted in related U.S. Appl. No. 17/979,526).
Bell, S. et al., "TILE64 Processor: A 64-Core SoC with Mesh Interconnect", 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers, Feb. 3, 2008, pp. 88-598. (previously submitted in related U.S. Appl. No. 17/979,526).
Dally, W. and Towles, B., "Route Packets, Not Wires: On-Chip Interconnection Networks", Proceedings of the 38th Annual Design Automation Conference, Jun. 22, 2001, pp. 684-689. (previously submitted in related U.S. Appl. No. 17/979,526).
Lockwood, J. et al., "NetFPGA - An Open Platform for Gigabit-rate Network Switching and Routing", 2007 IEEE International Conference on Microelectronic Systems Education, Jun. 3, 2007, pp. 160-161. (previously submitted in related U.S. Appl. No. 17/979,526).
Mallik, A. et al., "Automated Task Distribution in Multicore Network Processors using Statistical Analysis", Proceedings of the 3rd ACM/IEEE Symposium on Architecture for Networking and Communications Systems, Dec. 3, 2007, pp. 67-76. (previously submitted in related U.S. Appl. No. 17/979,526).
Ouaiss, I. and Vemuri, R., "Hierarchical Memory Mapping During Synthesis in FPGA-Based Reconfigurable Computers", Proceedings Design, Automation and Test in Europe Conference and Exhibition, Mar. 13, 2001, pp. 650-657. (previously submitted in related U.S. Appl. No. 17/979,526).
Plishker, W. et al., "Automated Task Allocation for Network Processors", Network System Design Conference Proceedings, Oct. 2004, pp. 235-245. (previously submitted in related U.S. Appl. No. 17/979,526).
Al-Fares et al., "Hedera: Dynamic Flow Scheduling for Data Center Networks", NSDI, vol. 10, No. 8, Apr. 28, 2010. (previously submitted in related U.S. Appl. No. 17/195,174).
Binotto et al., "Dynamic Self-Rescheduling of Tasks over a Heterogeneous Platform," 2008 International Conference on Reconfigurable Computing and FPGAs, 2008, pp. 253-258. (previously submitted in related U.S. Appl. No. 17/195,174).
Clemente et al., "A Task-Graph Execution Manager for Reconfigurable Multi-tasking Systems," pp. 73-83, 2010, Microprocessors and Microsystems, vol. 34, Issues 2-4. (previously submitted in related U.S. Appl. No. 17/195,174).
Ebrahimi et al., "Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems", ACM SIGPLAN Notices, vol. 45, No. 3, Mar. 2010, pp. 335-346. (previously submitted in related U.S. Appl. No. 17/195,174).
George et al., "Novo-G: At the Forefront of Scalable Reconfigurable Supercomputing", Computing in Science Engineering, vol. 13, Issue 1, Dec. 30, 2010, pp. 82-86. (previously submitted in related U.S. Appl. No. 17/195,174).
Gohringer et al., "CAP-OS: Operating system for runtime scheduling, task mapping and resource management on reconfigurable multiprocessor architectures," 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010, pp. 1-8, doi: 10.1109/IPDPSW.2010.5470732. (previously submitted in related U.S. Appl. No. 17/195,174).
Gohringer et al., "Operating System for Runtime Reconfigurable Multiprocessor Systems," International Journal of Reconfigurable Computing, Feb. 14, 2011, pp. 1-17, vol. 2011, Hindawi Publishing Corporation. (previously submitted in related U.S. Appl. No. 17/195,174).
Jacobs et al., "Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing," ACM Trans. Reconfigurable Technol. Syst. 5, 4, Article 21 (Dec. 2012), 30 pages. (previously submitted in related U.S. Appl. No. 17/195,174).
Joselli et al., "An architecture with automatic load balancing for real-time simulation and visualization systems," Journal of Computational Interdisciplinary Sciences, 2010, 1(3): 207-224. (previously submitted in related U.S. Appl. No. 17/195,174).
May et al., "Queueing Theory Modeling of a CPU-GPU System," Northrop Grumman Corporation, Electronic Systems Sector, May 11, 2010, 2 pages. (previously submitted in related U.S. Appl. No. 17/195,174).
Notice of Allowance issued in U.S. Appl. No. 16/434,581 dated Oct. 27, 2020. (previously submitted in related U.S. Appl. No. 17/195,174).
Odajima et al., "GPU/CPU Work Sharing with Parallel Language XcalableMP-dev for Parallelized Accelerated Computing," 2012 41st International Conference on Parallel Processing Workshops, Pittsburgh, PA, 2012, pp. 97-106, doi: 10.1109/ICPPW.2012.16. (previously submitted in related U.S. Appl. No. 17/195,174).
Ranjan et al., "Parallelizing a Face Detection and Tracking System for Multi-Core Processors," Proceedings of the 2012 9th Conference on Computer and Robot Vision, CRV 2012 (2012), pp. 290-297, doi: 10.1109/CRV.2012.45. (previously submitted in related U.S. Appl. No. 17/195,174).
Roy et al., "Efficient Autoscaling in the Cloud using Predictive Models for Workload Forecasting", 2011 IEEE 4th International Conference on Cloud Computing, Washington DC, Jul. 4-9, 2011, pp. 500-507. (previously submitted in related U.S. Appl. No. 17/195,174).
Supplemental Notice of Allowability issued in U.S. Appl. No. 16/014,658 dated Sep. 18, 2018, 34 pages. (previously submitted in related U.S. Appl. No. 17/195,174).
Supplemental Notice of Allowability issued in U.S. Appl. No. 17/195,174 dated Sep. 7, 2018, 26 pages. (previously submitted in related U.S. Appl. No. 17/195,174).
Toss, Julio, "Work Stealing Inside GPUs," Universidade Federal do Rio Grande do Sul, Instituto de Informática, 39 pages, 2011, Curso de Ciência da Computação: Ênfase em Ciência da Computação: Bacharelado. (previously submitted in related U.S. Appl. No. 17/195,174).
Wu et al., "Runtime Task Allocation in Multicore Packet Processing Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 23, No. 10, pp. 1934-1943, Oct. 2012, doi: 10.1109/TPDS.2012.56. (previously submitted in related U.S. Appl. No. 17/195,174).
Ziermann et al., "Adaptive Traffic Scheduling Techniques for Mixed Real-Time and Streaming Applications on Reconfigurable Hardware," 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010, pp. 1-4, doi: 10.1109/IPDPSW.2010.5470738. (previously submitted in related U.S. Appl. No. 17/195,174).
Notice of Allowance issued in U.S. Appl. No. 17/195,174 dated May 14, 2021. (previously submitted in related U.S. Appl. No. 17/470,926).
Hutchings et al., "Implementation approaches for reconfigurable logic applications," Field-Programmable Logic and Applications, Springer Berlin/Heidelberg, 1995. <http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.17.3063&rep=rep1&type=pdf>. (previously submitted in related U.S. Appl. No. 17/470,926).
"Introduction to Implementing Design Security with Microsemi SmartFusion2 and IGLOO2 FPGAs," by Microsemi, Nov. 2013, 13 pages. (previously submitted in related U.S. Appl. No. 17/470,926).
Shin et al., "AVANT-GUARD: Scalable and Vigilant Switch Flow Management in Software-Defined Networks," 2013. (previously submitted in related U.S. Appl. No. 17/470,926).
"Design of a Secure Plane Bridge," Microsemi, 2013. (previously submitted in related U.S. Appl. No. 17/470,926).
Unnikrishnan et al., "ReClick—A Modular Dataplane Design Framework for FPGA-Based Network Virtualization," 2011 ACM/IEEE Seventh Symposium on Architectures for Networking and Communications Systems, 2011, pp. 145-155, doi: 10.1109/ANCS.2011.31. (previously submitted in related U.S. Appl. No. 17/470,926).
Notice of Allowance issued in U.S. Appl. No. 17/470,926 dated Oct. 14, 2021. (previously submitted in related U.S. Appl. No. 17/470,926).
Supplemental Notice of Allowability issued in U.S. Appl. No. 17/470,926 dated Nov. 5, 2021. (previously submitted in related U.S. Appl. No. 17/470,926).
Non-Final Office Action issued in U.S. Appl. No. 17/470,926 dated Nov. 26, 2021. (previously submitted in related U.S. Appl. No. 17/470,926).
Non-Final Rejection issued in related U.S. Appl. No. 17/859,657 dated Aug. 16, 2022. (previously submitted in related U.S. Appl. No. 17/859,657).
Notice of Allowance issued in U.S. Appl. No. 17/979,542 dated Sep. 21, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Supplemental Notice of Allowability issued in U.S. Appl. No. 17/979,542 dated Oct. 13, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Decision Granting Institution of Inter Partes Review in IPR2022-00527 dated Sep. 19, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Decision Granting Institution of Inter Partes Review in IPR2022-00528 dated Sep. 19, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Decision Granting Institution of Inter Partes Review in IPR2022-00574 dated Sep. 19, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Decision Denying Institution of Inter Partes Review in IPR2022-00757 dated Nov. 1, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Decision Denying Institution of Inter Partes Review in IPR2022-00758 dated Oct. 11, 2022. (previously submitted in related U.S. Appl. No. 17/979,542).
Agrawal et al., "Adaptive scheduling with parallelism feedback." Proceedings of the eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Mar. 29, 2006, pp. 100-109. (previously submitted in related U.S. Appl. No. 17/979,542).
Aron et al., "Cluster reserves: A mechanism for resource management in cluster-based network servers." Proceedings of the 2000 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, Jun. 1, 2000, pp. 90-101. (previously submitted in related U.S. Appl. No. 17/979,542).
Blelloch et al., "Provably efficient scheduling for languages with fine-grained parallelism." Journal of the ACM (JACM), vol. 46, Issue 2, Mar. 1, 1999, pp. 281-321. (previously submitted in related U.S. Appl. No. 17/979,542).
Chen et al., "Configuration-sensitive process scheduling for FPGA-based computing platforms." Proceedings Design, Automation and Test in Europe Conference and Exhibition, vol. 1, IEEE, Feb. 16, 2004, pp. 486-493. (previously submitted in related U.S. Appl. No. 17/979,542).
Coffman Jr., E.G. and Whitt, W., "Recent asymptotic results in the probabilistic analysis of schedule makespans." (1995). (previously submitted in related U.S. Appl. No. 17/979,542).
Compton, K., and Hauck, S., "Reconfigurable computing: a survey of systems and software." ACM Computing Surveys (CSUR), vol. 34, Issue 2 (Jun. 1, 2002), pp. 171-210. (previously submitted in related U.S. Appl. No. 17/979,542).
Feitelson, D. G., "Job Scheduling In Multiprogrammed Parallel Systems", Institute of Computer Science, The Hebrew University, Aug. 1997. (previously submitted in related U.S. Appl. No. 17/979,542).
Gabor et al., "Service Level Agreement for Multithreaded Processors", ACM Transactions on Architecture and Code Optimization, vol. 6, Issue 2, Jun. 2009, pp. 1-33. (previously submitted in related U.S. Appl. No. 17/979,542).
Gregori, E. et al., eds., Networking 2002: Networking Technologies, Services, and Protocols; Performance of Computer and Communication Networks; Mobile and Wireless Communications: Second International IFIP-TC6 Networking Conference, Pisa, Italy, May 19-24, 2002 Proceedings, vol. 2345, Springer Science & Business Media, 2007. (Partial reference submitted, pp. 65-68) (previously submitted in related U.S. Appl. No. 17/979,542).
Karam, et al., "Trends In Multi-Core DSP Platforms", IEEE Signal Processing Magazine, vol. 26, No. 6, Nov. 2009, pp. 38-49. (previously submitted in related U.S. Appl. No. 17/979,542).
Kumar, et al., "Single-ISA Heterogeneous Multi-Core Architectures: The Potential For Processor Power Reduction", Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, Dec. 5, 2003, pp. 81-92. (previously submitted in related U.S. Appl. No. 17/979,542).
Lety, et al., "MiMaze, a 3D Multi-Player Game on the Internet", Proceedings of the 4th International Conference on Virtual Systems and Multimedia, Gifu, Japan, vol. 1, 1998. (previously submitted in related U.S. Appl. No. 17/979,542).
Dingchao et al., "Scheduling Task Graphs Onto Heterogeneous Multiprocessors," Proceedings of TENCON '94-1994 IEEE Region 10's 9th Annual International Conference on ‘Frontiers of Computer Technology’, vol. 2, 1994, pp. 556-563. (previously submitted in related U.S. Appl. No. 17/979,542).
Magro et al., "Hyper-Threading Technology: Impact on Compute-Intensive Workloads", Intel Technology Journal, vol. 6, Issue 1, Feb. 14, 2002. (previously submitted in related U.S. Appl. No. 17/979,542).
[#HADOOP-3445] Implementing core scheduler functionality in Resource Manager (V1) for Hadoop, Accessed May 18, 2018, 12 pages, https://issues.apache.org/jira/si/jira.issueviews:issue-html/HADOOP-3445/HADOOP-3445.html. (previously submitted in related U.S. Appl. No. 15/267,153).
7 Series FPGAs Configuration User Guide, a Xilinx, Inc. User Guide UG470 (v1.4), Jul. 19, 2012. (previously submitted in related U.S. Appl. No. 15/267,153).
Borges, et al., "Sun Grid Engine, a new scheduler for EGEE middleware," (2018). (previously submitted in related U.S. Appl. No. 15/267,153).
Cooper, Brian F. et al., Building a Cloud for Yahoo!, 2009, 9 pages, IEEE Computer Society Technical Committee on Data Engineering, https://www.researchgate.net/profile/Rodrigo_Fonseca3/publication/220282767_Building_a_Cloud_for_Yahoo/links/0912f5109da99ddf6a000000/Building-a-Cloud-for-Yahoo.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Dye, David, Partial Reconfiguration of Xilinx FPGAs Using ISE Design Suite, a Xilinx, Inc. White Paper WP374 (v1.2), May 30, 2012. (previously submitted in related U.S. Appl. No. 15/267,153).
Examination Report issued in IN Application No. 1219/MUM/2012 dated Jul. 19, 2019. (previously submitted in related U.S. Appl. No. 17/195,174).
Examination Report issued in IN Application No. 2414/MUM/2011 dated Jul. 25, 2019. (previously submitted in related U.S. Appl. No. 17/195,174).
Examiner's Answer issued in related U.S. Appl. No. 13/297,455 dated Feb. 10, 2016, 9 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Apr. 18, 2013, 18 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Mar. 26, 2015, 14 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Sep. 3, 2014, 18 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Final Rejection issued in related U.S. Appl. No. 14/521,490 dated Jul. 28, 2017, 16 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
First Examination Report issued in IN Application No. 401/MUM/2011 on Nov. 9, 2018. (previously submitted in related U.S. Appl. No. 15/267,153).
Fischer, Michael J. et al., Assigning Tasks for Efficiency in Hadoop, 2010, 11 pages, https://www.researchgate.net/profile/Xueyuan_Su/publication/221257628_Assigning_tasks_for_efficiency_in_Hadoop/links/53df31100cf216e4210c5fd1/Assigning-tasks-for-efficiency-in-Hadoop.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Gentzsch, et al., "Sun Grid Engine: Towards Creating a Compute Power Grid." IEEE Computer Society, Proceedings of the 1st International Symposium on Cluster Computing and the Grid (2001). (previously submitted in related U.S. Appl. No. 15/267,153).
Ghodsi, Ali, et al., Dominant Resource Fairness: Fair Allocation of Multiple Resource Types, Proceedings of NSDI '11: 8th USENIX Symposium on Networked Systems Design and Implementation, Mar. 30, 2011, pp. 323-336. (previously submitted in related U.S. Appl. No. 15/267,153).
Han, Wei, et al., Multi-core Architectures with Dynamically Reconfigurable Array Processors for the WiMAX Physical Layer, pp. 115-120, 2008. (previously submitted in related U.S. Appl. No. 15/267,153).
Hindman, Benjamin, et al., Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, Proceedings of NSDI '11: 8th USENIX Symposium on Networked Systems Design and Implementation, Mar. 30, 2011, pp. 295-308. (previously submitted in related U.S. Appl. No. 15/267,153).
Isard, Michael et al., Quincy: Fair Scheduling for Distributed Computing Clusters, Accessed May 18, 2018, 20 pages, https://www.sigops.org/sosp/sosp09/papers/isard-sosp09.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Ismail, M. I., et al., "Program-based static allocation policies for highly parallel computers," Proceedings International Phoenix Conference on Computers and Communications, Scottsdale, AZ, 1995, pp. 61-68. (previously submitted in related U.S. Appl. No. 15/267,153).
Jean, J. et al., Dynamic reconfiguration to support concurrent applications, IEEE Transactions on Computers, vol. 48, Issue 6, pp. 591-602, Jun. 1999. (previously submitted in related U.S. Appl. No. 15/267,153).
Lamonnier et al., Accelerate Partial Reconfiguration with a 100% Hardware Solution, Xcell Journal, Issue 79, Second Quarter 2012, pp. 44-49. (previously submitted in related U.S. Appl. No. 15/267,153).
Lim, Harold C. et al., Automated Control in Cloud Computing: Challenges and Opportunities, Jun. 19, 2009, 6 pages, ACM, https://www2.cs.duke.edu/nicl/pub/papers/acdc09-lim.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Loh, Gabriel H., 3D-Stacked Memory Architectures for Multi-Core Processors, IEEE Computer Society, pp. 453-464, 2008. (previously submitted in related U.S. Appl. No. 15/267,153).
McCann, Cathy, et al., A Dynamic Processor Allocation Policy for Multiprogrammed Shared-Memory Multiprocessors, 1993, ACM, 33 pages (146-178). (previously submitted in related U.S. Appl. No. 15/267,153).
Mohan, Shiwali et al., Towards a Resource Aware Scheduler in Hadoop, Dec. 21, 2009, 10 pages, Computer Science and Engineering, University of Michigan, Ann Arbor, https://pdfs.semanticscholar.org/d2e3/c7b60967934903f0837219772c6972ede93e.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Morishita, et al., Design of a multiprocessor system supporting interprocess message communication, Journal of the Faculty of Engineering, University of Tokyo, Series A, No. 24, 1986, pp. 36-37. (previously submitted in related U.S. Appl. No. 15/267,153).
Murthy, Arun C., et al., Architecture of Next Generation Apache Hadoop MapReduce Framework, 2011, 14 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Jun. 19, 2014, 15 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 13/297,455 dated Mar. 14, 2013, 23 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Oct. 3, 2014, 29 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Feb. 12, 2016, 25 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Jun. 1, 2016, 18 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated May 17, 2018, 23 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated May 4, 2017, 19 pages. (previously submitted in related U.S. Appl. No. 15/267,153).
Non-Final Rejection issued in related U.S. Appl. No. 15/267,153 dated Aug. 24, 2018, 54 pages. (previously submitted in related U.S. Appl. No. 17/195,174).
Non-Final Rejection issued in related U.S. Appl. No. 17/195,174 dated Mar. 9, 2018, 23 pages. (previously submitted in related U.S. Appl. No. 17/195,174).
Notice of Allowance issued in U.S. Appl. No. 15/267,153 dated Jan. 17, 2019. (previously submitted in related U.S. Appl. No. 17/195,174).
Partial Reconfiguration Tutorial, PlanAhead Design Tool, a Xilinx, Inc. User Guide UG743 (v14.1), May 8, 2012. (previously submitted in related U.S. Appl. No. 15/267,153).
Partial Reconfiguration User Guide, a Xilinx, Inc. user document UG702 (v14.2), Jul. 25, 2012. (previously submitted in related U.S. Appl. No. 15/267,153).
Sandholm, Thomas et al., Dynamic Proportional Share Scheduling in Hadoop, Accessed May 18, 2018, 20 pages, Hewlett-Packard Laboratories, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.591.4477&rep=rep1&type=pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Shankar, Uma, Oracle Grid Engine Administration Guide, Release 6.2 Update 7, Aug. 2011, 202 pages, Oracle Corporation. (previously submitted in related U.S. Appl. No. 15/267,153).
Shieh, Alan, et al., Sharing the Data Center Network, Proceedings of NSDI '11: 8th USENIX Symposium on Networked Systems Design and Implementation, Mar. 30, 2011, pp. 309-322. (previously submitted in related U.S. Appl. No. 15/267,153).
Singh, Deshanand, Implementing FPGA Design with the OpenCL Standard, an Altera Corporation White Paper WP-01173-2.0, Nov. 2012. (previously submitted in related U.S. Appl. No. 15/267,153).
Tam et al., Fast Configuration of PCI Express Technology through Partial Reconfiguration, a Xilinx, Inc. Application Note XAPP883 (v1.0), Nov. 19, 2010. (previously submitted in related U.S. Appl. No. 15/267,153).
Tian, Chao et al., A Dynamic MapReduce Scheduler for Heterogeneous Workloads, 2009, pp. 218-224, IEEE Computer Society, https://pdfs.semanticscholar.org/679f/73d810e2ac9e2e84de798d853b6fb0b0206a.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Tsai, Chang-Hao, System Architectures with Virtualized Resources in a Large-Scale Computing Infrastructure, 2009, 146 pages, Computer Science and Engineering, The University of Michigan, https://kabru.eecs.umich.edu/papers/thesis/chtsai-thesis.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Warneke et al., "Nephele: efficient parallel data processing in the cloud," MTAGS '09 Proceedings of the 2nd Workshop on Many-Task Computing on Grids and Supercomputers, Article No. 8 (2009). (previously submitted in related U.S. Appl. No. 17/195,174).
Wen et al., "Minimizing Migration on Grid Environments: an Experience on Sun Grid Engine", Journal of Information Technology and Applications, vol. 1, No. 4, pp. 297-304 (2007). (previously submitted in related U.S. Appl. No. 15/267,153).
Zaharia, Matei et al., Job Scheduling for Multi-User MapReduce Clusters, Apr. 30, 2009, actual publication date unknown, 18 pages, Electrical Engineering and Computer Sciences, University of California at Berkeley, https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-55.pdf. (previously submitted in related U.S. Appl. No. 15/267,153).
Number | Date | Country
---|---|---
20240126609 A1 | Apr 2024 | US
Number | Date | Country
---|---|---
61934747 | Feb 2014 | US
61869646 | Aug 2013 | US
Relation | Number | Date | Country
---|---|---|---
Parent | 18116389 | Mar 2023 | US
Child | 18394944 | | US
Parent | 17979542 | Nov 2022 | US
Child | 18116389 | | US
Parent | 17859657 | Jul 2022 | US
Child | 17979542 | | US
Parent | 17470926 | Sep 2021 | US
Child | 17859657 | | US
Parent | 17463098 | Aug 2021 | US
Child | 17470926 | | US
Parent | 17344636 | Jun 2021 | US
Child | 17463098 | | US
Parent | 17195174 | Mar 2021 | US
Child | 17344636 | | US
Parent | 16434581 | Jun 2019 | US
Child | 17195174 | | US
Parent | 15267153 | Sep 2016 | US
Child | 16434581 | | US
Parent | 14318512 | Jun 2014 | US
Child | 15267153 | | US