The field of invention relates generally to computing system design, and, more specifically, to an apparatus and method for invocation of a multi-threaded accelerator.
As semiconductor manufacturing processes are reaching an era that approaches 1 trillion transistors per die, design engineers are presented with the issue of how to most effectively put to use all the available transistors. One design approach is to implement specific computation intensive functions with dedicated hardware “acceleration” on die along with one or more general purpose CPU cores.
Acceleration is achieved with dedicated logic blocks designed to perform specific computation intensive functions. Migrating intensive computations to such dedicated logic blocks frees the CPU core(s) from executing significant numbers of instructions thereby increasing the effectiveness and efficiency of the CPU core(s).
Although “acceleration” in the form of co-processors (such as graphics co-processors) is known in the art, such traditional co-processors are viewed by the OS as a separate “device” (within a larger computing system) that is external to the CPU core(s) that the OS runs on. These co-processors are therefore accessed through special device driver software and do not operate out of the same memory space as a CPU core. As such, traditional co-processors do not share or contemplate the virtual address-to-physical address translation scheme implemented on a CPU core.
Moreover, large latencies are encountered when a task is offloaded by an OS to a traditional co-processor. Specifically, as a CPU core and a traditional co-processor essentially correspond to separate, isolated sub-systems, significant communication resources are expended when tasks defined by the main OS on a general purpose processing (GPP) core are passed to the “kernel” software of the co-processor. Such large latencies favor system designs that invoke relatively infrequent tasks on the co-processor from the main OS, but with large associated blocks of data per task. In effect, traditional co-processors are primarily utilized in a coarse grain fashion rather than a fine grain fashion.
As current system designers are interested in introducing more acceleration into computing systems with finer grained usages, a new paradigm for integrating acceleration in computing systems is warranted.
The present invention is illustrated by way of example and not limitation in the figures of the accompanying drawings, in which like references indicate similar elements.
Here, standard instructions are read from memory and executed by the traditional functional units in the CPU core 102. Other types of instructions that are received by the processing core 100_1, however, will trigger an accelerator into action. In a particular implementation, the underlying hardware supports the software's ability to call out a specific accelerator in code. That is, a specific command can be embodied into the code by the software programmer (or by a compiler), where the specific command calls out and defines the input operand(s) for a specific accelerator unit.
The command is ultimately represented in some form of object code. During runtime, the underlying hardware “executes” the object code and, in so doing, invokes the specific accelerator with the associated input data.
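As a concrete illustration, the following C sketch models such an accelerator-invocation command in software. The command layout, the `accel_invoke` stand-in, and all field names are hypothetical assumptions for illustration only; on real hardware the call would be a single instruction or compiler intrinsic emitted into the object code.

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical descriptor for the accelerator-invocation command: it
 * names the accelerator unit and its input operand(s). */
typedef struct {
    uint32_t accel_id;   /* which accelerator unit to invoke        */
    const void *input;   /* virtual address of the input operand(s) */
    void *output;        /* virtual address for the resultant       */
    size_t len;          /* operand size; may exceed register width */
} accel_cmd;

/* Stand-in for the single command/instruction; simulated in software. */
static int accel_invoke(const accel_cmd *cmd)
{
    printf("invoking accelerator %u on %zu bytes\n",
           (unsigned)cmd->accel_id, cmd->len);
    return 0;   /* 0 = success */
}

int main(void)
{
    uint8_t in[4096], out[4096];
    /* Operands are identified with ordinary (virtual) addresses; the
     * accelerator shares the invoking core's address translation. */
    accel_cmd cmd = { .accel_id = 7, .input = in, .output = out,
                      .len = sizeof in };
    return accel_invoke(&cmd);
}
```

Note that the operands are named with virtual addresses, anticipating the shared memory space discussed next.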
Upon being invoked, the accelerator operates out of the same memory space as the CPU core 102. As such, data operands may be identified to the accelerator with virtual addresses whose corresponding translation into physical address space is the same as that used by the CPU core 102. Moreover, the time an accelerator unit takes to execute a command is generally longer than that of a traditional/standard instruction (owing to the complex nature of the tasks being performed). The input operand(s) and/or resultant may also be larger than the standard register sizes of the instruction execution pipeline(s) within the CPU core 102.
An accelerator can therefore be generally viewed as being coarser grained (having larger execution times and/or operating on larger data chunks) than the traditional functional units and instructions of the CPU 102. At the same time, an accelerator can also generally be viewed as being finer grained, or at least more tightly coupled to the CPU core 102 than a traditional co-processor.
Specifically, the avoidance of a time-expensive “driver call” in order to invoke the accelerator, and/or the sharing of the same memory space by the accelerator and the general purpose CPU 102, corresponds to tighter coupling between the general purpose CPU 102 and the accelerator as compared to that of a traditional co-processor. Moreover, the specific individual tasks that the accelerators are called on to perform may also be more fine grained than the larger, wholesale tasks traditionally performed by a co-processor. Specific individual tasks that are suitable for implementation with an accelerator as a single “invokable” operation include texture sampling, motion search or motion compensation, security related computations (e.g., cryptography, encryption, etc.), specific financial computations, and/or specific scientific computations.
The general purpose CPU core 102 may have one or more instruction execution pipelines. Modern day CPU cores are typically capable of concurrently executing multiple threads. Concurrent execution of multiple threads with multiple pipelines is a straightforward concept; however, a single pipeline can also be designed to support concurrent execution of multiple threads.
As the purpose of an accelerator is to provide higher performance for specific computations than what the general purpose CPU core is capable of providing, some discussion of how “higher performance” might be obtained is worthwhile.
At point 202, thread 201_1 invokes an accelerator. In a typical implementation, the accelerator includes one or more special purpose execution units that are specially designed to perform complex tasks. The invocation of these special purpose execution units provides some of the acceleration provided by the accelerator. As observed in
Micro-threads 203_1 to 203_Y are to be distinguished from micro-code. Micro-code is atomic program code internal to an execution unit that the execution unit utilizes to perform the instructions it is designed to perform. The execution units of the accelerator may be micro-coded but need not be. Micro-threads 203_1 to 203_Y, by contrast, are instruction streams like threads 201_1 to 201_X. That is, micro-threads 203_1 to 203_Y specify the instructions to be performed by the execution units of the accelerator, rather than correspond to internal program code within the execution units.
Referring to
Spawning multiple micro-threads 203_1 to 203_Y into an instruction set architecture (ISA) that is different than the ISA of the main thread 201_1 is more common than spawning multiple micro-threads into an ISA that is the same as the main thread. In a typical case where the accelerator ISA is different than the main thread ISA (e.g., a GPU), the accelerator and main thread execute out of different and isolated program memory regions and data memory regions. In a phrase, the general purpose CPU core and accelerator are different isolated machines each having their own respective program code and data domains.
By contrast, in the case where the micro-threads 203_1 to 203_Y are executed on the same ISA as the main thread 201_1, significantly closer linkage between the accelerator and the main thread/general purpose CPU core is possible. For example, referring to
In order to support invocation of a multi-threaded accelerator where the ISA does not change as compared to the general purpose CPU, certain semantic definitions should be established. These include definitions for: i) initial micro-thread architectural state; ii) maintenance of micro-architectural state; iii) thread scheduling; and iv) final micro-thread architectural state. Other semantic definitions should also be established for exceptions, interrupts and violations. Each of these is discussed in succession below.
Micro-Thread Initialization, Maintenance, Scheduling and Conclusion
Initialization of micro-thread architectural state corresponds to the environment in which each of the micro-threads is started. In an embodiment where the accelerator and general purpose CPU core are closely linked, the micro-threads may be started in a fashion that is the same as, or similar to, the manner in which subroutines are called by a main program executing as a typical/standard thread on the general purpose CPU core. As such, some if not all aspects of an Application Binary Interface (ABI) or embedded ABI (EABI) that is supported by the general purpose CPU core ISA are used to start the micro-threads. Thus, there is similarity between the manner in which the main thread invokes the accelerator and the manner in which the main thread calls upon a typical subroutine that is executed on the general purpose CPU core.
An ABI or EABI essentially defines how a function call is made from a first routine to a second routine (the subroutine). Part of an ABI or EABI (hereinafter, simply “ABI”) specifies standard conventions for file formats, data types, register usage, stack frame organization, and function parameter passing of an embedded software program. Another factor for consideration is that a thread, by definition, has its own “context”, where context corresponds to the specific values within instruction, data and control register and/or memory space. Two different threads, unless they correspond to identical programs operating on identical data, are expected to have different contexts over the course of their execution. In this sense, the multiple micro-threads of the accelerator can be viewed as independent threads having their own respective contexts.
As such, in an embodiment, the invocation of a multi-threaded accelerator by a general purpose CPU core thread includes passing multiple instances of the main thread's context to each of the multiple accelerator threads, where each passing of the main thread's context is performed consistently with the ABI of the main core's ISA. At one extreme, the accelerator has its own associated registers 330, including dedicated register space for each individual micro-thread, and copies of the main thread's context are copied over multiple times 331 into each micro-thread's dedicated accelerator register space 330. In this case, the general purpose CPU core includes first logic circuitry 310 to copy 331 the different copies of the main thread's context into the dedicated register space 330 of the accelerator.
According to this same approach, the main thread's context can subsequently be “switched out” of the general purpose CPU core while the accelerator is performing its task. As such, another thread, e.g., of another program can have its context switched into the general purpose CPU core and execute in its place during accelerator execution.
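The following C sketch models the register-copy extreme described above: the main thread's context is replicated once per micro-thread, and each micro-thread is entered like an ABI-style subroutine call. The `mthread_ctx` layout and register names are hypothetical assumptions, not tied to any real ISA.

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

#define N_MICROTHREADS 4

/* Hypothetical subset of the main thread's context that the ABI passes
 * to a callee: argument registers, a stack pointer, an entry pointer. */
typedef struct {
    uint64_t arg[6];   /* ABI argument registers       */
    uint64_t sp;       /* stack pointer for the callee */
    uint64_t ip;       /* entry instruction pointer    */
} mthread_ctx;

/* A micro-thread entry looks like an ordinary subroutine called per the
 * ABI: inputs arrive as parameters, not via a device-driver interface. */
static void microthread_entry(const mthread_ctx *ctx, int id)
{
    printf("micro-thread %d starts with arg0=%llu\n",
           id, (unsigned long long)ctx->arg[0]);
}

int main(void)
{
    mthread_ctx main_ctx = { .arg = { 42 }, .sp = 0x7fff0000, .ip = 0 };
    mthread_ctx per_thread[N_MICROTHREADS];

    /* Replicate the main thread's context once per micro-thread, as in
     * the register-copy embodiment described above. */
    for (int i = 0; i < N_MICROTHREADS; i++) {
        memcpy(&per_thread[i], &main_ctx, sizeof main_ctx);
        microthread_entry(&per_thread[i], i);
    }
    return 0;
}
```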
At another extreme, each of the accelerator's multiple micro-threads is simply given access to the main thread's context as it sits in the register space 340 of the main CPU core. In this case, the main thread's context may or may not be switched out of the main CPU core depending on designer preference as to whether the main thread's context is to be made permanently available or only initially available to the accelerator micro-threads. In the latter case, the main thread's context can be switched out as described above.
In another approach, where the micro-threads operate out of the general CPU core's register space 303, just prior to the actual invocation of the accelerator code, the main thread executes one or more allocation instructions to allocate a stack region and a copy of the main thread's context for each micro-thread within the main CPU core register space 303. The allocated stack reserves space for each micro-thread to make its own function calls.
With respect to the allocation of the main thread's context, each micro-thread of the accelerator has its own copy of the main thread's context in register space (for ease of drawing, neither these copies nor the coupling between the accelerator and register space 303 is shown). Therefore, again, the general purpose CPU core includes logic circuitry 310 to store multiple copies of the main thread's context (although in this case the different copies are stored in the register space 303 of the general purpose CPU core). Technically speaking, register space 303 corresponds to the operand and control register space used by the instruction execution pipeline that is executing the main thread. Here, logic circuitry 310 may be the logic circuitry used to execute an allocation instruction (aloc).
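A minimal C sketch of this pre-invocation allocation step follows, with the aloc-style allocation modeled as an ordinary function; the frame layout and sizes are assumptions chosen for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define N_MICROTHREADS 4
#define STACK_BYTES    (16 * 1024)

typedef struct { uint64_t regs[16]; } thread_ctx;

typedef struct {
    thread_ctx ctx;    /* private copy of the main thread's context */
    uint8_t   *stack;  /* reserved space for this micro-thread's own
                          function calls                            */
} mthread_frame;

/* Models the one or more allocation ("aloc"-style) instructions run by
 * the main thread just prior to invoking the accelerator. */
static mthread_frame *alloc_microthread_frames(const thread_ctx *main_ctx)
{
    mthread_frame *f = calloc(N_MICROTHREADS, sizeof *f);
    if (!f)
        return NULL;
    for (int i = 0; i < N_MICROTHREADS; i++) {
        f[i].ctx   = *main_ctx;            /* context copy */
        f[i].stack = malloc(STACK_BYTES);  /* stack region */
    }
    return f;
}

int main(void)
{
    thread_ctx main_ctx = { .regs = { 1, 2, 3 } };
    mthread_frame *frames = alloc_microthread_frames(&main_ctx);
    if (!frames)
        return 1;
    /* ... accelerator invoked here; micro-threads run out of frames[] */
    for (int i = 0; i < N_MICROTHREADS; i++)
        free(frames[i].stack);
    free(frames);
    return 0;
}
```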
In another intermediate approach, the main thread's context is copied over to an intermediate buffer or storage area (e.g., spare register and/or memory space not shown in the figures).
In a further embodiment, as a matter of efficiency, less than all of the context of the main thread is made available to the micro-threads by logic circuitry 310. For example, according to one approach, if the accelerator does not make use of content within a particular type of register space (such as SIMD register space), the context of that register type is not made available to the micro-threads (e.g., the micro-threads are not provided with SIMD context).
In another or related embodiment, the micro-threads are only provided from logic circuitry 310 with context that can be identified and/or used by the ABI. Here it is pertinent to understand the dynamic of a function call made through an ABI. Typically, a function call only passes a few “input” parameters to the subroutine it is calling. The subroutine then performs its operations without any further reference to the calling thread's data. As such, only the limited set of register space that is used to pass the input parameters to the called routine is actually utilized by the ABI. Thus an ABI may set a limit or otherwise identify a smaller subset of registers than the entire context of the calling thread.
Along a similar train of thought, when a main thread invokes an accelerator and its multiple micro-threads, the context information of the main thread that is passed to the micro-threads only corresponds to the limited subset of context information permitted by the ABI. In the case of an ISA that supports multiple different ABIs, the subset of registers corresponding to one, more than one, or all of the ABIs can be chosen as the permissible set of context that can be passed to the micro-threads. As such, the general purpose CPU core either re-uses the logic circuitry used to effect a typical subroutine function call amongst threads processed by the general purpose CPU core for the purpose of invoking the accelerator, or has added logic circuitry used to effect an invocation of the accelerator consistent with the ABI.
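The following sketch models the ABI-limited context passing described above; the register classes and the six-argument-register convention are assumptions chosen for illustration, not a specific ABI.

```c
#include <stdint.h>
#include <stdbool.h>
#include <string.h>
#include <stdio.h>

typedef struct {
    uint64_t gpr[16];     /* general purpose registers */
    uint64_t simd[8][4];  /* wide SIMD register file   */
} full_ctx;

typedef struct {
    uint64_t abi_args[6]; /* only the ABI parameter registers */
} abi_subset_ctx;

/* Copy just the registers the ABI uses to pass input parameters; SIMD
 * context is deliberately omitted when the accelerator makes no use of
 * it. */
static void extract_abi_subset(const full_ctx *src, abi_subset_ctx *dst,
                               bool accel_uses_simd)
{
    memcpy(dst->abi_args, src->gpr, sizeof dst->abi_args);
    (void)accel_uses_simd;  /* a fuller version would conditionally
                               copy SIMD state here */
}

int main(void)
{
    full_ctx main_thread = { .gpr = { 11, 22, 33 } };
    abi_subset_ctx passed;
    extract_abi_subset(&main_thread, &passed, false);
    printf("arg0 passed to micro-threads: %llu\n",
           (unsigned long long)passed.abi_args[0]);
    return 0;
}
```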
In various embodiments, regardless of how the accelerator is provided with context of the main thread, the context of the main thread (e.g., the architectural state of the machine for the main thread when (e.g., immediately after) the accelerator is invoked) is not modified, except for the instruction pointer being modified (e.g., incremented) and some registers being adjusted to reflect the execution status (success/failure and related details). Said another way, any changes made to the main thread's context/architectural state by a micro-thread are hidden from the invoking application of the main thread. This allows the micro-threads to execute out of order.
In a further embodiment, since the accelerator can operate out of the same program memory space as the main thread, the instruction pointer is changed as part of the invocation process to point to the start of the accelerator code. As such, the instruction pointer context of the main thread is not copied as part of the accelerator invocation for the micro-threads to operate out of (it can be copied so as to return program flow to the main thread after the accelerator has completed its operations). Here, it is worthwhile to point out that a programmed, multi-threaded accelerator is expected to have its own instruction fetch unit(s) for fetching the micro-thread instructions from program memory.
Depending on the approach, one of the accelerator's micro-threads can be deemed the master micro-thread that starts operation before the other micro-threads and controls the start of one or more of the other micro-threads. In this case, micro-thread scheduling is essentially performed by the compiler that creates the micro-threads (through its crafting of the master micro-thread code). In an embodiment, the instruction pointer is changed to point to the start of the master micro-thread. This approach may be suitable where there is some relatedness amongst the threads (i.e., the threads are not operating in total isolation).
In an alternate embodiment, e.g., where the micro-threads have no relation or dependencies on each other, a group of instruction pointers is passed to the accelerator, each holding a respective starting address for a different one of the multiple micro-threads (such as all the micro-threads of the code to be executed by the accelerator). The micro-threads simply start, e.g., in parallel, through immediate reference to their respective instruction pointer. The group of instruction addresses can be passed as input parameters of the invocation made by the main thread of the general purpose CPU core. Here, a separate register permitted for use by the ABI may be used for each different starting address. In another approach, if the accelerator code is able to refer to SIMD register space, the starting addresses may be kept within a single vector within the SIMD register space.
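As a sketch of this scheme, the following C program stands in instruction-pointer values with ordinary function pointers, one respective starting address per micro-thread; the names are illustrative only.

```c
#include <stdio.h>

#define N_MICROTHREADS 3

static void mt_a(void) { puts("micro-thread A runs"); }
static void mt_b(void) { puts("micro-thread B runs"); }
static void mt_c(void) { puts("micro-thread C runs"); }

int main(void)
{
    /* One respective starting address per micro-thread; on hardware each
     * could occupy one ABI register, or all could sit in a single SIMD
     * vector. */
    void (*start[N_MICROTHREADS])(void) = { mt_a, mt_b, mt_c };

    for (int i = 0; i < N_MICROTHREADS; i++)
        start[i]();   /* each micro-thread begins at its own pointer */
    return 0;
}
```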
If the micro-threads have some relatedness, in an embodiment, micro-thread scheduling hints are provided to the accelerator by the main thread as an input parameter of the accelerator invocation. For example, specific input parameters describing some aspect of the start sequence order of the different micro-threads may be passed from the main thread to the accelerator hardware. The accelerator hardware instruction fetch logic refers to this information to understand or determine which micro-threads should be started at which cycle time. The instruction address pointers for the individual micro-threads may be passed to the accelerator by any of the techniques discussed just above. In an embodiment, the compiler adds the hints to the main thread code.
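A possible encoding of such scheduling hints is sketched below; the hint fields (start order, cycle gap) are hypothetical assumptions about what the fetch logic might consume.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical hint encoding the accelerator's instruction fetch logic
 * might consult when starting related micro-threads. */
typedef struct {
    uint8_t  n_threads;
    uint8_t  start_order[8];    /* micro-thread ids in preferred order */
    uint16_t start_gap_cycles;  /* cycles between successive starts    */
} sched_hints;

int main(void)
{
    sched_hints h = { .n_threads = 3,
                      .start_order = { 2, 0, 1 },
                      .start_gap_cycles = 16 };

    /* Passed to the accelerator as an ordinary invocation parameter;
     * printed here in place of real fetch-logic behavior. */
    for (int i = 0; i < h.n_threads; i++)
        printf("start micro-thread %u at cycle %u\n",
               (unsigned)h.start_order[i],
               (unsigned)(i * h.start_gap_cycles));
    return 0;
}
```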
The specific results returned by the accelerator to the main thread are, in an embodiment, also presented in a manner consistent with the ABI. In an embodiment, a master micro-thread of the accelerator oversees and/or combines the results of the multiple micro-threads into a single result. In an embodiment, just before the accelerator resultant is returned to the main thread, any micro-thread context within the general purpose CPU core's register space is switched out of the general purpose CPU core's register space, and, if the main thread's context was switched out of the main CPU core's register space during accelerator operation, it is switched back into the general purpose CPU core's register space. As such, the main thread returns to the state from which it invoked the accelerator and reawakes to find the result returned from the accelerator. The result can be a scalar provided in scalar register space, or a vector provided in vector (e.g., SIMD) register space.
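The following C sketch models the master micro-thread combining per-thread partial results into the single resultant handed back per the ABI; the combining operation (a sum) and all names are illustrative assumptions.

```c
#include <stdint.h>
#include <stdio.h>

#define N_MICROTHREADS 4

/* Partial results produced by the worker micro-threads (illustrative). */
static uint64_t partial[N_MICROTHREADS] = { 3, 1, 4, 1 };

/* The master micro-thread oversees/combines the partial results. */
static uint64_t master_combine(void)
{
    uint64_t sum = 0;
    for (int i = 0; i < N_MICROTHREADS; i++)
        sum += partial[i];
    return sum;  /* returned to the main thread as an ABI scalar return */
}

int main(void)
{
    printf("accelerator resultant: %llu\n",
           (unsigned long long)master_combine());
    return 0;
}
```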
Here, for any of the context switching activities described above, where main thread or micro-thread context is switched in/out of the general purpose CPU core's register space, the general purpose CPU core has respective logic circuitry designed to effect the respective context switching activity.
Exceptions, Interrupts and Violations
An exception is a problem detected within one of the accelerator's micro-threads, typically by the micro-thread itself. An interrupt is an event that is external to the accelerator (e.g., a new user command is entered).
In an embodiment, referring to
The main thread then invokes an exception and/or interrupt handler which handles the problem (e.g., by referring to the externally saved micro-thread state information in the case of an exception) 404. After the problem is handled, the interrupt/exception handler restores the externally saved state of the micro-threads back into the accelerator 405. The accelerator's micro-threads then resume operation from the point of the original interrupt/exception 406.
In an alternate approach, in the case of an exception, rather than return to the main thread to have it call the exception handler, the accelerator hardware instead calls the exception handler directly without waking the main thread and passes a pointer to the location of the saved state information of the excepting thread (here, the internal state information of the micro-threads within the accelerator is again externally saved in response to the exception). According to this approach, the exception handler refers directly to the excepting code and fixes the problem. Execution is subsequently returned to the accelerator, e.g., without involvement of the main CPU thread. The accelerator recalls the externally saved micro-thread state information and resumes operation.
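A software model of this direct-call exception path is sketched below, with the hardware steps (external state save, direct handler call, resume) modeled as ordinary functions; all names and the recovery action are assumptions for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct { uint64_t regs[16]; uint64_t ip; } saved_state;

static saved_state save_area;  /* externally saved micro-thread state */

/* Handler receives a pointer to the excepting thread's saved state. */
static void exception_handler(saved_state *s)
{
    printf("handler fixes problem at ip=0x%llx\n",
           (unsigned long long)s->ip);
    s->ip += 4;   /* e.g., step past the faulting instruction */
}

int main(void)
{
    save_area.ip = 0x1000;          /* ... exception detected here ...  */
    exception_handler(&save_area);  /* called directly; the main thread
                                       is not awakened                  */
    /* The accelerator recalls the saved state and resumes operation. */
    printf("resuming micro-thread at ip=0x%llx\n",
           (unsigned long long)save_area.ip);
    return 0;
}
```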
In an embodiment, state information associated with the original invocation of the accelerator by the main thread is saved in register or memory space so that program control can pass from the exception handler to the accelerator directly without involvement of the main thread. According to one approach, the logic circuitry that implements an IRET instruction (or similar instruction used to determine where program flow is directed upon return from the interrupt handler) includes micro-code or other circuitry that uses the saved invocation state information to return flow to the accelerator. As such, the IRET instruction has an input parameter of some kind that indicates the interrupt is from the accelerator and, in response, returns program flow to the accelerator. Without the input parameter indicating the exception is from the accelerator, the IRET instruction logic/micro-code returns program flow to a main CPU thread.
In other embodiments, the exception handler may be redesigned to use the saved invocation state information to return flow to the accelerator, or the excepting micro-thread is allowed to complete, in effect continuing operation to the extent possible as if no exception had been thrown. In the latter approach, accelerator micro-thread state need not be externally saved. When the accelerator finally returns its result to the main thread, the earlier exception causes the main thread to invoke the exception handler. When the exception handler fixes the problem, the accelerator is re-invoked from scratch as if the accelerator had not been invoked earlier. Here, the saved invocation state information can be used to re-invoke the accelerator.
A violation is triggered by code that does not comply with a requirement established by the underlying machine. According to one possibility, the accelerator itself may impose restrictions, such as restrictions on the accelerated application callback code (e.g., 64 bit mode only, etc.). In the case that micro-thread code does not comply with a requirement established for the accelerator, the violation can be labeled as such but treated the same as, or similarly to, an exception.
In one embodiment, the exception handler can use the storage area where micro-thread state is saved to complete the originally accelerated operation in a non-accelerated mode (e.g., with the general purpose CPU core). At completion of the operation, the exception handler returns execution to the instruction following the accelerator invocation. Alternatively, or in combination, the instruction that triggered the violation is executed in software (e.g., with instructions executed by the general purpose CPU core) in order to effect its operation. The micro-thread that raised the violation has its state saved as described above by the handler, with a marker to return operation to the next instruction following the violating instruction when its operation is resumed.
In another approach, the violation is hidden from the software (e.g., the main application software program that invoked the accelerator) altogether. That is, there is no exception handler. Instead, the processor uses micro-code to perform the following in a manner that is hidden from the software: 1) freeze the state of the threads “as if” an exception handler were being called (e.g., externally save micro-thread state); and 2) do not invoke an exception handler and instead continue execution (e.g., with micro-code) from the freeze point on the general purpose CPU (which supports all the instructions, so there is no problem). At that point the recovery can either switch execution back to the accelerator (“unfreezing”) from the “updated” point (since at least one instruction was executed on the general purpose CPU), or simply finish the execution of all the micro-threads on the general purpose CPU without switching back to the accelerator at all. This is very different from allowing a software exception handler to use the general purpose CPU, because a software handler is not invoked at all. As far as the software is concerned, the system “just works” without any exceptions.
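The following sketch models this software-invisible recovery flow, with the micro-code steps written as ordinary C functions; the freeze/unfreeze names and the recovery choice are hypothetical assumptions, since the real mechanism is internal processor micro-code.

```c
#include <stdbool.h>
#include <stdio.h>

typedef struct { unsigned long ip; } frozen_state;

static frozen_state freeze(void)         /* 1) externally save state,
                                            "as if" a handler were called */
{
    return (frozen_state){ .ip = 0x2000 };
}

static void run_on_gpp(frozen_state *s)  /* 2) continue from the freeze
                                            point on the general purpose
                                            CPU; no handler is invoked   */
{
    s->ip += 4;  /* execute (at least) the violating instruction */
}

int main(void)
{
    frozen_state s = freeze();
    run_on_gpp(&s);

    bool switch_back = true;  /* design choice: unfreeze or finish here */
    if (switch_back)
        printf("unfreezing accelerator at updated ip=0x%lx\n", s.ip);
    else
        printf("finishing all micro-threads on the general purpose CPU\n");
    return 0;
}
```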
Here, the general purpose CPU and/or accelerator may have respective one or more logic circuits to effect any of the processes discussed above.
Any of the processes taught by the discussion above may be performed with software, hardware logic circuitry, or some combination thereof. It is believed that processes taught by the discussion above may also be described in source level program code in various object-oriented or non-object-oriented computer programming languages. An article of manufacture may be used to store program code. An article of manufacture that stores program code may be embodied as, but is not limited to, one or more memories (e.g., one or more flash memories, random access memories (static, dynamic or other)), optical disks, CD-ROMs, DVD ROMs, EPROMs, EEPROMs, magnetic or optical cards or other type of machine-readable media suitable for storing electronic instructions. Program code may also be downloaded from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of data signals embodied in a propagation medium (e.g., via a communication link (e.g., a network connection)).
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.