Executing instruction sequence code blocks by using virtual cores instantiated by partitionable engines

Abstract
A method for executing instructions using a plurality of virtual cores for a processor. The method includes receiving an incoming instruction sequence using a global front end scheduler, and partitioning the incoming instruction sequence into a plurality of code blocks of instructions. The method further includes generating a plurality of inheritance vectors describing interdependencies between instructions of the code blocks, and allocating the code blocks to a plurality of virtual cores of the processor, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines. The code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors.
Description
FIELD OF THE INVENTION

The present invention is generally related to digital computer systems, more particularly, to a system and method for selecting instructions comprising an instruction sequence.


BACKGROUND OF THE INVENTION

Processors are required to handle multiple tasks that are either dependent or totally independent. The internal state of such processors usually consists of registers that might hold different values at each particular instant of program execution. At each instant of program execution, the internal state image is called the architecture state of the processor.


When code execution is switched to run another function (e.g., another thread, process or program), the state of the machine/processor has to be saved so that the new function can utilize the internal registers to build its new state. Once the new function terminates, its state can be discarded, the state of the previous context is restored, and execution resumes. Such a switch process is called a context switch and usually takes tens or hundreds of cycles, especially with modern architectures that employ a large number of registers (e.g., 64, 128, 256) and/or out of order execution.


In thread-aware hardware architectures, it is normal for the hardware to support multiple context states for a limited number of hardware-supported threads. In this case, the hardware duplicates all architecture state elements for each supported thread. This eliminates the need for a context switch when executing a new thread. However, this still has multiple drawbacks, namely the area, power and complexity of duplicating all architecture state elements (i.e., registers) for each additional thread supported in hardware. In addition, if the number of software threads exceeds the number of explicitly supported hardware threads, then the context switch must still be performed.


This becomes common as parallelism is needed on a fine granularity basis requiring a large number of threads. The hardware thread-aware architectures with duplicate context-state hardware storage do not help non-threaded software code and only reduce the number of context switches for software that is threaded. However, those threads are usually constructed for coarse grain parallelism and result in heavy software overhead for initiating and synchronizing, leaving fine grain parallelism, such as function calls and parallel execution of loops, without efficient threading initiation/auto generation. These overheads are accompanied by the difficulty of auto parallelization of such codes using state of the art compiler or user parallelization techniques for software codes that are not explicitly or easily parallelized/threaded.


SUMMARY OF THE INVENTION

In one embodiment, the present invention is implemented as a system for executing instructions using a plurality of memory fragments for a processor. The system includes a global front end scheduler for receiving an incoming instruction sequence, wherein the global front end scheduler partitions the incoming instruction sequence into a plurality of code blocks of instructions and generates a plurality of inheritance vectors describing interdependencies between instructions of the code blocks. The system further includes a plurality of virtual cores of the processor coupled to receive code blocks allocated by the global front end scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein the code blocks are executed by using the partitionable engines in accordance with a virtual core mode and in accordance with the respective inheritance vectors. A plurality of memory fragments are coupled to the partitionable engines for providing data storage.


Other embodiments of the present invention utilize a common scheduler, a common register file and a common memory subsystem to implement fragmented address spaces for multiple partitionable engines of a processor. The partitionable engines can be used to implement a plurality of virtual cores. Fragmentation enables the scaling of microprocessor performance by allowing additional virtual cores to cooperatively execute instruction sequences. The fragmentation hierarchy can be the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the common register file). The fragmentation hierarchy can divide the address space into fragments using address bits, where the address bits are used such that the fragments are above cache line boundaries and below page boundaries. Each fragment can be configured to utilize a multiport bank structure for storage.


The foregoing is a summary and thus contains, by necessity, simplifications, generalizations and omissions of detail; consequently, those skilled in the art will appreciate that the summary is illustrative only and is not intended to be in any way limiting. Other aspects, inventive features, and advantages of the present invention, as defined solely by the claims, will become apparent in the non-limiting detailed description set forth below.





BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which like reference numerals refer to similar elements.



FIG. 1A shows an overview of the manner in which the global front end generates code blocks and inheritance vectors to support the execution of code sequences on their respective partitionable engines.



FIG. 1B shows an overview diagram of partitionable engines and their components, including segmented scheduler and register files, global interconnects and a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention.



FIG. 2 shows a scheduler flow diagram in accordance with one embodiment of the present invention.



FIG. 3 shows a diagram of exemplary hardware circuits that shows a segmented register file storing operands and results with an interconnect in accordance with one embodiment of the present invention.



FIG. 4 shows a diagram depicting a global front end Fetch & scheduler in accordance with one embodiment of the present invention.



FIG. 5 shows an alternative implementation of the distribution of instructions across many virtual cores in accordance with one embodiment of the present invention.



FIG. 6 shows a plurality of register segments with a corresponding plurality of register files and operand & result buffers in accordance with one embodiment of the present invention.



FIG. 7 shows a more detailed diagram of a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention.



FIG. 8 shows a diagram depicting how bits of an address can be used by address generation to enumerate fragments in accordance with one embodiment of the present invention.



FIG. 9 shows a diagram of how loads and stores are handled by embodiments of the present invention.



FIG. 10 shows the manner in which fragments can be split into two or more domains in accordance with one embodiment of the present invention.



FIG. 11 shows a mode of operation of the processor wherein virtual cores are configured as physical cores that correspond to logical cores in executing applications in accordance with one embodiment of the present invention.



FIG. 12 shows a mode of operation of the processor wherein virtual cores are configured as soft cores that correspond to logical cores in executing applications in accordance with one embodiment of the present invention.



FIG. 13 shows a mode of operation of the processor wherein the virtual cores are configured as soft cores that correspond to a single logical core in executing applications in accordance with one embodiment of the present invention.



FIG. 14 shows an exemplary implementation of fragment segmentation being used to support logical core and virtual core functionality in accordance with one embodiment of the present invention.



FIG. 15 shows a fragment memory of an exemplary four fragment processor implementing a many physicals to many logicals mode in accordance with one embodiment of the present invention.



FIG. 16 shows a fragment memory of an exemplary four fragment processor implementing a many physicals to many logicals mode in accordance with an alternative embodiment of the present invention.



FIG. 17 shows a fragment memory of an exemplary four fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.



FIG. 18 shows a fragment memory of an exemplary four fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.



FIG. 19 shows address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four fragment processor implementing a physicals to logicals mode in accordance with one embodiment of the present invention.



FIG. 20 shows an alternative implementation for the address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four fragment processor to implement a many physicals to many logicals mode in accordance with one embodiment of the present invention.



FIG. 21 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four fragment processor implementing a many soft cores to many logicals mode in accordance with one embodiment of the present invention.



FIG. 22 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.



FIG. 23 shows a diagram of an exemplary microprocessor pipeline in accordance with one embodiment of the present invention.





DETAILED DESCRIPTION OF THE INVENTION

Although the present invention has been described in connection with one embodiment, the invention is not intended to be limited to the specific forms set forth herein. On the contrary, it is intended to cover such alternatives, modifications, and equivalents as can be reasonably included within the scope of the invention as defined by the appended claims.


In the following detailed description, numerous specific details such as specific method orders, structures, elements, and connections have been set forth. It is to be understood however that these and other specific details need not be utilized to practice embodiments of the present invention. In other circumstances, well-known structures, elements, or connections have been omitted, or have not been described in particular detail in order to avoid unnecessarily obscuring this description.


References within the specification to “one embodiment” or “an embodiment” are intended to indicate that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present invention. The appearances of the phrase “in one embodiment” in various places within the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Moreover, various features are described which may be exhibited by some embodiments and not by others. Similarly, various requirements are described which may be requirements for some embodiments but not other embodiments.


Some portions of the detailed descriptions, which follow, are presented in terms of procedures, steps, logic blocks, processing, and other symbolic representations of operations on data bits within a computer memory. These descriptions and representations are the means used by those skilled in the data processing arts to most effectively convey the substance of their work to others skilled in the art. A procedure, computer executed step, logic block, process, etc., is here, and generally, conceived to be a self-consistent sequence of steps or instructions leading to a desired result. The steps are those requiring physical manipulations of physical quantities. Usually, though not necessarily, these quantities take the form of electrical or magnetic signals of a computer readable storage medium and are capable of being stored, transferred, combined, compared, and otherwise manipulated in a computer system. It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like.


It should be borne in mind, however, that all of these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise as apparent from the following discussions, it is appreciated that throughout the present invention, discussions utilizing terms such as “processing” or “accessing” or “writing” or “storing” or “replicating” or the like, refer to the action and processes of a computer system, or similar electronic computing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system's registers and memories and other computer readable media into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission or display devices.


Embodiments of the present invention utilize a common global front end scheduler, a plurality of segmented register files, and a memory subsystem to implement fragmented address spaces for multiple cores of a multicore processor. In one embodiment, fragmentation enables the scaling of microprocessor performance by allowing additional virtual cores (e.g., soft cores) to cooperatively execute instruction sequences comprising one or more threads. The fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the common register file). The fragmentation hierarchy divides the address space into fragments using address bits, where the address bits are used such that the fragments are identified by bits that are above cache line boundaries and below page boundaries. Each fragment is configured to utilize a multiport bank structure for storage. Embodiments of the present invention are further described with reference to FIGS. 1A and 1B below.
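The fragment selection described above can be illustrated with a minimal sketch. The function name, the 64-byte cache line, the 4 KiB page and the choice of the lowest bits above the line offset are all assumptions made for illustration; the specification only requires that the selecting bits lie above cache line boundaries and below page boundaries.

```python
def fragment_index(address: int, num_fragments: int = 4,
                   cache_line_bytes: int = 64, page_bytes: int = 4096) -> int:
    """Pick a fragment from address bits above the cache-line offset.

    Hypothetical helper: all sizes are assumed powers of two, and the
    lowest eligible bits are used as the fragment selector.
    """
    line_bits = cache_line_bytes.bit_length() - 1   # 6 for 64-byte lines
    page_bits = page_bytes.bit_length() - 1         # 12 for 4 KiB pages
    frag_bits = num_fragments.bit_length() - 1      # 2 for 4 fragments
    # The selector must sit above the line offset and below the page boundary.
    if line_bits + frag_bits > page_bits:
        raise ValueError("fragment bits would cross the page boundary")
    return (address >> line_bits) & (num_fragments - 1)
```

Under these assumptions, all bytes of one cache line map to the same fragment, and consecutive cache lines rotate through the fragments within a page.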



FIG. 1A shows an overview diagram of a processor in accordance with one embodiment of the present invention. As depicted in FIG. 1A, the processor includes a global front end fetch and scheduler 10 and a plurality of partitionable engines 11-14.



FIG. 1A shows an overview of the manner in which the global front end generates code blocks and inheritance vectors to support the execution of code sequences on their respective partitionable engines. Each of the code sequences 20-23 can belong to the same logical core/thread or to different logical cores/threads, depending upon the particular virtual core execution mode. The global front end fetch and scheduler will process the code sequences 20-23 to generate code blocks and inheritance vectors. These code blocks and inheritance vectors are allocated to the particular partitionable engines 11-14 as shown.


The partitionable engines implement virtual cores, in accordance with a selected mode. A partitionable engine includes a segment, a fragment and a number of execution units. The resources within the partitionable engines can be used to implement virtual cores that have multiple modes. As provisioned by the virtual core mode, one soft core, or many soft cores, can be implemented to support one logical core/thread. In the FIG. 1A embodiment, depending on the selected mode, the virtual cores can support one logical core/thread or four logical cores/threads. In an embodiment where the virtual cores support four logical cores/threads, the resources of each virtual core are spread across each of the partitionable engines. In an embodiment where the virtual cores support one logical core/thread, the resources of all the engines are dedicated to that core/thread. The engines are partitioned such that each engine provides a subset of the resources that comprise each virtual core. In other words, a virtual core will comprise a subset of the resources of each of the engines 11-14. Communication between the resources of each of the engines 11-14 is provided by a global interconnection structure 30 in order to facilitate this process. Alternatively, the engines 11-14 can be used to implement a physical mode where the resources of the engines 11-14 are dedicated to support the execution of a dedicated core/thread. In this manner, the soft cores implemented by the engines comprise virtual cores that have resources spread across each of the engines. The virtual core execution modes are further described in the following figures below.
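As a rough illustration of how the same engine resources can be grouped differently under each mode, the following sketch maps (engine, execution unit) pairs to logical cores. The mode names `physical`, `soft_many` and `soft_one`, the function name, and the static one-slice-per-engine split are hypothetical simplifications, not the allocation policy of any particular embodiment.

```python
def partition(num_engines: int, units_per_engine: int, mode: str) -> dict:
    """Return {logical_core: [(engine, unit), ...]} for an assumed mode name."""
    if mode == "physical":
        # Each engine's resources are dedicated to one core/thread.
        return {e: [(e, u) for u in range(units_per_engine)]
                for e in range(num_engines)}
    if mode == "soft_many":
        # Each virtual core takes a subset of the resources of every engine.
        return {c: [(e, c) for e in range(num_engines)]
                for c in range(units_per_engine)}
    if mode == "soft_one":
        # The resources of all engines support a single logical core/thread.
        return {0: [(e, u) for e in range(num_engines)
                    for u in range(units_per_engine)]}
    raise ValueError(f"unknown mode: {mode}")
```

For four engines with four units each, `partition(4, 4, "soft_many")[1]` yields one unit from each of the four engines, which mirrors the statement that a virtual core comprises a subset of the resources of each of the engines 11-14.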


It should be noted that in a conventional core implementation, only the resources within one core/engine are allocated to one logical thread/core. In contrast, in embodiments of the present invention, the resources of any engine/core can be partitioned to instantiate, collectively with other engine/core partitions, a virtual core that is allocated to one logical thread/core. Additionally, embodiments of the present invention can implement multiple virtual execution modes in which those same engines can be partitioned to support many dedicated cores/threads, many dynamically allocated cores/threads, or an embodiment where all of the resources of all engines support the execution of a single core/thread. These embodiments are further described in the descriptions below.



FIG. 1B shows an overview diagram of partitionable engines and their components, including segmented scheduler and register files, global interconnects and a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention. As depicted in FIG. 1B, four fragments 101-104 are shown. The fragmentation hierarchy is the same across each cache hierarchy (e.g., L1 cache, L2 cache, and the load store buffer). Data can be exchanged between each of the L1 caches, each of the L2 caches and each of the load store buffers via the memory global interconnect 110a.


The memory global interconnect comprises a routing matrix that allows a plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored at any point in the fragmented cache hierarchy (e.g., L1 cache, load store buffer and L2 cache). FIG. 1B also depicts the manner whereby each of the fragments 101-104 can be accessed by address calculation and execution units 121-124 via the memory global interconnect 110a.


The execution global interconnect 110b similarly comprises a routing matrix that allows the plurality of cores (e.g., the address calculation and execution units 121-124) to access data that may be stored in any of the segmented register files. Thus, the cores have access to data stored in any of the fragments and data stored in any of the segments via the memory global interconnect 110a or the execution global interconnect 110b. Additionally, it should be noted that in one embodiment, another global interconnect exists between each of the common partition fetch and schedulers. This is shown by the horizontal arrows between and connecting each common partition fetch and scheduler.



FIG. 1B further shows a global front end Fetch & scheduler 150 which has a view of the entire machine and which manages the utilization of the register file segments and the fragmented memory subsystem. Address generation comprises the basis for fragment definition. The global front end Fetch & scheduler functions by allocating instruction sequences to each segment's partition scheduler. The common partition scheduler then dispatches those instruction sequences for execution on the address calculation and execution units 121-124.


It should be noted that in one embodiment, the functionality of the common partition fetch and schedulers can be incorporated into the global front end scheduler 150. In such an embodiment, the segments would not include respective common partition fetch and schedulers, and there would be no need for an interconnect between them.


Additionally, it should be noted that the partitionable engines shown in FIG. 1A can be nested in a hierarchical way. In such an embodiment, a first level partitionable engine would include a local front end fetch and scheduler and multiple secondary partitionable engines connected to it.



FIG. 2 shows a scheduler flow diagram in accordance with one embodiment of the present invention. As depicted in FIG. 2, a bucket buffer is shown that includes Speculative Thread Bucket-pointers, Bucket Sources and destinations lists. The scheduler and execution buckets include a bucket dispatch selector and the virtual register match and read, including the possibility of a register hierarchy and a register cache. The Back end is where executed buckets are logged and exception ordering is enforced before retirement. The register hierarchy/cache also serves as an intermediate storage for the executed bucket results until they are non-speculative and can update the architecture state. The following discloses one possible implementation of the front end, the dispatch stage and the backend where executed buckets are logged.



FIG. 2 shows the manner in which the concept scales from a bucket buffer managing a small number of closely coupled threads into hardware circuits that manage multiple bucket buffers and threads. Those circuits, which can be expanded to process larger numbers of threads that might have less close interaction, are described as a global front end (e.g., global front end scheduler 150 shown in FIG. 1B).


The process starts by fetching a new thread Matrix/bucket/block, and the new thread bucket is then assigned to a vacant bucket slot in the bucket buffer. Each of the thread allocation pointers in the thread allocation pointer array 852 composes an interval of buckets in which the thread is physically allowed to place its blocks/buckets of instructions. Each of those threads keeps allocating buckets into the bucket buffer array inside its corresponding interval of contiguous space in round-robin fashion. The buckets/blocks inside each thread space get assigned a new number 852 that is incremented each time a new bucket/block gets assigned. Each valid source in the bucket 850 has a valid read bit “Rv” indicating that this source is needed for the instructions inside this bucket. By the same convention, each destination register that is to be written back by instructions in this bucket has a valid bit “Wv” set in the bucket and has a field in a destination inheritance vector 853. When a new bucket is to be fetched into the bucket buffer, it inherits the destination inheritance vector from the previously allocated bucket pointed at by the thread bucket allocation pointer 852. The inheritance vector is copied from the previously allocated bucket, and the new bucket then overwrites those valid destination fields that correspond to the registers which will be updated by its instructions. The valid destinations are labeled with the current bucket number, while the invalid destinations are copied from the corresponding inheritance vector inside the bucket. Then the thread bucket pointer is updated for the newly fetched bucket by incrementing its pointer (it wraps around within its interval).
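The allocation step can be sketched in software as follows. This is a minimal model under stated assumptions: the class and field names, the eight-register file and the use of `None` for a register not yet written are hypothetical, and the check that the target slot is vacant (i.e., already retired) is omitted for brevity.

```python
from dataclasses import dataclass

NUM_REGS = 8  # assumed number of architectural registers


@dataclass
class Bucket:
    number: int        # bucket number within the thread
    dests: set         # registers whose "Wv" bit would be set
    inheritance: list  # per register: number of the last bucket writing it


class ThreadInterval:
    """One thread's interval of contiguous slots in the bucket buffer."""

    def __init__(self, start: int, length: int):
        self.start, self.length = start, length
        self.alloc_ptr = start                  # thread allocation pointer
        self.next_number = 0                    # incremented per allocation
        self.last_vector = [None] * NUM_REGS    # vector of the previous bucket

    def allocate(self, buffer: list, dests) -> int:
        vector = list(self.last_vector)         # inherit from previous bucket
        for reg in dests:
            vector[reg] = self.next_number      # overwrite valid destinations
        slot = self.alloc_ptr
        buffer[slot] = Bucket(self.next_number, set(dests), vector)
        # Advance the pointer, wrapping around within the thread's interval.
        self.alloc_ptr = self.start + (slot - self.start + 1) % self.length
        self.next_number += 1
        self.last_vector = vector
        return slot
```

For example, a thread owning slots 2 to 4 of an eight-slot buffer places successive buckets in slots 2, 3, 4 and then wraps back to slot 2, while each bucket's vector records, per register, the most recent bucket that writes it.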


In the bucket dispatch and execute stage, whenever a bucket is executed without any exception handling, the bucket execution flag (containing the bucket number) 854 is set and broadcast throughout the bucket buffer, and it is latched/monitored within each bucket that has a source with that bucket number as a source. It is also possible to pass other related information along with the bucket number, such as information about virtual register locations. When all the execution flags of the source buckets are set within a bucket, then that bucket's ready bit 855 is set and the bucket is ready to be dispatched and executed. When the bucket executes without any exception and it is ready to update the architecture state in the sequential order of the program, the bucket is retired and the retirement thread pointer 857 is incremented to the next bucket in the array. The retired bucket location can be assigned to a new bucket.
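The broadcast-and-latch behavior can be approximated by the sketch below. The names `WaitingBucket` and `broadcast_executed` are hypothetical, and a loop over the buffer stands in for what would be a parallel comparison in hardware; exceptions and retirement ordering are not modeled.

```python
class WaitingBucket:
    """A bucket waiting on the execution flags of its source buckets."""

    def __init__(self, number: int, source_buckets):
        self.number = number
        self.pending = set(source_buckets)  # producers not yet executed

    @property
    def ready(self) -> bool:
        # Models the ready bit: set once every source bucket has executed.
        return not self.pending


def broadcast_executed(bucket_buffer, executed_number: int) -> list:
    """Broadcast one execution flag; return numbers of newly ready buckets."""
    newly_ready = []
    for bucket in bucket_buffer:
        if executed_number in bucket.pending:
            bucket.pending.discard(executed_number)  # latch the flag
            if bucket.ready:
                newly_ready.append(bucket.number)
    return newly_ready
```

A bucket that depends on buckets 1 and 2 becomes ready only after both numbers have been broadcast, in either order.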


Those closely related threads can all coexist inside the Matrix/bucket/block buffer; each thread will occupy an interval of consecutive buckets that belongs to that thread. The allocation pointer of that thread moves inside this interval of buckets in a round robin fashion fetching new instruction buckets and allocating them inside the thread interval in the described round robin fashion. With such interval sectioning, the whole bucket buffer is divided dynamically with different or equal interval lengths of buckets.


The concept of an inheritance vector is introduced here for the instruction bucket as well as for the thread. Each instruction matrix/block/bucket writes into particular registers among the architectural registers. Each new bucket, at the allocation stage, updates this inheritance vector by writing its own thread and bucket number into it, leaving unchanged the fields for the registers that it does not write into. This bucket inheritance vector B_iv 856 is forwarded from each bucket to the next one in program order. In FIG. 2 each matrix writes its own number into the architecture destination registers if the instructions in that matrix write into those registers; otherwise it inherits the value from the B_iv of the previous bucket in that thread.



FIG. 3 shows a diagram of exemplary hardware circuits that shows a segmented register file storing operands and results with an interconnect in accordance with one embodiment of the present invention. FIG. 3 shows an operand result buffer coupled via the execution global interconnect to a plurality of execution units.



FIG. 4 shows a diagram depicting a global front end scheduler in accordance with one embodiment of the present invention. The global front end scheduler is configured to process larger numbers of threads that might have less close interaction (e.g., global front end scheduler 150 shown in FIG. 1B). This diagram shows how a sequence of instructions from one logical core is distributed across many virtual cores. This process would be repeated for each logical core present in the machine. It should be noted that the FIG. 4 “Engine” comprises the components of a virtual core, where the register file is explicitly depicted to show aspects of inter virtual core communication at the register file level.


For example, as depicted in FIG. 4, the global front end scheduler can process a thread header 902 but does not need to process the actual instructions within the thread to enforce the dependency check across those distant threads. The header of the thread and the sub headers of its buckets contain only information about the architecture registers that those threads and buckets write into (the destination registers of those instructions); there is no need for those headers to include the actual instructions or the sources of those instructions. It is enough to list those destination registers, or to provide a bit vector in which an individual bit is set for each register that is a destination for an instruction. The header does not need to be physically placed as a header for the instructions; it can be any formatted packet or compact representation of the destination registers of the instructions within the threads, which may or may not be stored with the rest of the instruction information.
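A bit vector header of the kind mentioned above could be encoded as in the following sketch. The function names, the eight-register width and the convention that bit r stands for register r are assumptions chosen for illustration; the specification does not fix a header format.

```python
def destination_header(dest_registers, num_regs: int = 8) -> int:
    """Encode destination registers as a bit vector (bit r set for register r)."""
    bits = 0
    for reg in dest_registers:
        if not 0 <= reg < num_regs:
            raise ValueError(f"register {reg} out of range")
        bits |= 1 << reg
    return bits


def registers_in(header: int) -> list:
    """Decode a header back into the sorted list of destination registers."""
    return [reg for reg in range(header.bit_length()) if (header >> reg) & 1]
```

With this encoding, a bucket that writes registers 0, 3 and 5 is summarized by the single value `0b101001`, which carries no instruction or source information.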


This global front-end fetches only the headers of the threads/blocks in program order and generates dynamic thread and/or bucket inheritance vectors 901 (Tiv and/or Biv). Each time a new thread is allocated, those inheritance vectors are forwarded by keeping the old fields that the current thread bucket will not write to or update, as shown by 903. Those inheritance vectors are distributed to a large number of engines/cores or processors 904, each of which might include a local front-end and a fetch unit (which will fetch and store the actual instructions produced by the dependency vector for each bucket) and a local matrix/block/bucket buffer with local register files 905. The local front-ends then fetch the actual instructions and use the information from the inheritance vectors obtained from the global front end to fill in the dependency information for the instruction sources of the instructions that are brought into those engines for execution. FIG. 4 illustrates a global front-end implementation and the way it disseminates the inheritance vectors to the different engines 904 using only concise information about the instructions (e.g., just the registers that those instructions write into). Other information that is helpful to place in the header is information about a change in the control path within or across the threads. A global branch predictor can be used to predict the flow of control across those threads, so such headers can include the branching destinations and offsets. In addition to using the branch predictor to determine control flow, the hardware/compiler can decide to dispatch independent threads across the two control paths of a branch. In such a case it will later merge the execution of those two paths using the inheritance vector. FIG. 4 also shows the forwarding process when a header of a new thread is fetched by the global front end. Thread 2 (906), for example, will update the corresponding inheritance vector 901 that is forwarded to it, resulting in vector 910 where registers 1, 2, 3, 4, 6, 0 and 7 are updated with T2 labels. Note that in 910 register 5 was not written by T2 buckets and thus its label was inherited from a previous inheritance vector.
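The Thread 2 forwarding example can be reproduced with a short sketch. The function name is hypothetical, and the incoming labels are an assumption: every register is taken to have been last written by an earlier thread labeled "T1", since the actual contents of vector 901 are shown only in the figure.

```python
def forward_inheritance(incoming: list, written_registers, label: str) -> list:
    """Return the vector forwarded after a thread's destination writes."""
    outgoing = list(incoming)      # keep fields the thread does not write
    for reg in written_registers:
        outgoing[reg] = label      # relabel registers the thread writes
    return outgoing


# Assumed incoming vector: all eight registers last written by "T1".
incoming_901 = ["T1"] * 8
vector_910 = forward_inheritance(incoming_901, [1, 2, 3, 4, 6, 0, 7], "T2")
```

Under that assumption, every entry of `vector_910` carries the T2 label except register 5, which keeps the inherited label, and the incoming vector itself is left unmodified for any other consumer.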


One interesting observation is that the register files allow cross communication among the cores/engines. An early request (to reduce the access latency) for the registers that are needed from other engines can be placed as soon as the instruction buckets of the thread are fetched and allocated in the local bucket buffer. At that time the source dependency information is populated, such that cross engine thread references can be issued, probably long before the actual instructions are dispatched for execution. In any case, the instruction will not be dispatched until the cross referenced source has been forwarded and has arrived. This cross referenced source can be stored in the local multi-threaded register file or register cache. Alternatively, this cross referenced source can be stored in a buffer similar to the load store buffer (it can reuse the load store buffer physical storage and dependency check mechanisms, but as a register load instead of a memory load). Many topologies can be used to connect the register files across the engines/cores, such as a ring topology, a cross bar topology or a mesh routed interconnect.


The following discussion illustrates how register file segmentation can be used inside an engine and also across engines. When the bucket is dispatched, its sources are sent (simultaneously or sequentially) to both the register file and the register cache. If the register file is physically unified and has direct support for threading, then the operand is read directly from the corresponding thread register section. If the register file is a virtual register, including a physically segmented register file that uses tags, then a tag match has to be done as a part of the virtual register read. If the tag matches, then the read happens from the segmented register file.


Disclosed is a register architecture that supports software threads, hardware generated threads, VLIW execution, SIMD & MIMD execution as well as emulation of out-of-order super-scalar execution. Although it is physically segmented, it appears as a unified architecture resource. This segmented register is part of the virtual register file which might include a register hierarchy and a register cache as well as mechanisms to store and check register tags. The tag access can be eliminated if we use a location based scheme that takes advantage of the dependency inheritance vector. The scheme works such that when the executed bucket number is broadcast during the dispatch stage, all the sources of subsequent instructions perform a CAM (content addressable match) which compares their source buckets with the just dispatched/executed bucket to set the ready flag for that source. Here the physical location of where that bucket executed can also be propagated along with the register number so that any ambiguity is resolved.


For example, consider an implementation where there are 4 register file segments, each containing 16 registers. Upon dispatching a bucket #x to section 2, the bucket number x is broadcast to the bucket buffer, and the segment #2 is broadcast with it, such that all sources that have a dependency on bucket x will record that it wrote all its registers in segment 2. When the time comes to dispatch those instructions, they know that they need to read their register from segment 2 and not any other segment, even though the same register number exists in the other segments. This also applies to the register cache to avoid using tags. We can extend this concept to the global front end where, in addition to the thread information, the inheritance vector can specify in which engine the instruction bucket writing to this register was allocated.
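The location based scheme in this example can be sketched as follows. This is an illustrative model only: the class names `Source` and `BucketBuffer`, the bucket numbers 5 and 6, and the stored register values are hypothetical, and the CAM match is modeled as a simple loop over waiting sources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    """One source operand waiting in the bucket buffer."""
    register: int
    producer_bucket: int
    ready: bool = False
    segment: Optional[int] = None  # recorded from the dispatch broadcast


@dataclass
class BucketBuffer:
    waiting: List[Source] = field(default_factory=list)

    def broadcast(self, bucket: int, segment: int) -> None:
        """Model the CAM match: every source that depends on `bucket`
        becomes ready and records the segment that bucket wrote into."""
        for src in self.waiting:
            if src.producer_bucket == bucket:
                src.ready = True
                src.segment = segment


def read_operand(segments: List[List[int]], src: Source) -> int:
    """Read a ready source from its recorded segment, with no tag lookup."""
    if not src.ready or src.segment is None:
        raise RuntimeError("source has not been produced yet")
    return segments[src.segment][src.register]


# 4 register file segments, each containing 16 registers, as in the text.
segments = [[0] * 16 for _ in range(4)]
segments[0][7] = 11  # a stale copy of register 7 in another segment
segments[2][7] = 99  # the value written by hypothetical bucket 5 in segment 2

src_a = Source(register=7, producer_bucket=5)
src_b = Source(register=3, producer_bucket=6)
buffer = BucketBuffer(waiting=[src_a, src_b])

buffer.broadcast(bucket=5, segment=2)
value = read_operand(segments, src_a)
```

Because the segment number travels with the bucket number, the dependent source reads register 7 from segment 2 even though a register 7 also exists in the other segments.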



FIG. 5 shows an alternative implementation of the distribution of instructions across many virtual cores in accordance with one embodiment of the present invention. FIG. 5 shows a runtime optimizer scheduler 550 functioning by distributing inheritance vectors and code segments to the virtual cores. In one embodiment, the optimizer looks at a number of code blocks of instructions and reschedules instructions across all of the code blocks to create code segments and inheritance vectors. The goal of the optimizer is to maximize the efficiency of the overlapped execution of code segments on their respective virtual cores.



FIG. 6 shows a plurality of register segments with a corresponding plurality of register files and operand result buffers in accordance with one embodiment of the present invention. As depicted in FIG. 6, an execution global interconnect connects each register segment to a plurality of address calculation and execution units.


The register segments of FIG. 6 can be used to implement one of 3 execution modes. In the first mode, the segments are grouped together by the compiler/programmer to form a MIMD super instruction matrix. In the second mode, each matrix is executed independently in a threaded mode where separate threads execute simultaneously on each of the 4 hardware sections. The last execution mode is the ability to dynamically execute 4 different instruction matrices from a single thread, using a hardware dependency check to ensure no dependency exists between those different matrices that execute simultaneously on the 4 different hardware sections.


The register files in FIG. 6 may be alternately configured depending upon the execution mode. In one mode, the register files are viewed as either an MIMD sectioned register file serving a MIMD width of 4 sections or they serve as 4 individual register files, each serving a separate thread. The register files can also support a dynamic execution mode where the 4 sections are one unified register file where data written to any register in a particular section is accessible by all units in the other sections. Switching between those modes can be seamless as different executing modes can alternate between individual thread baseline instruction matrix and MIMD super instruction matrix threads.


In a multithread execution mode, each register file and its execution unit that executes a thread is totally independent of other register files and their threads. This is similar to each thread having its own register state. However, dependency between those threads can be specified. Each matrix that belongs to a thread will execute in the execution unit of that thread's register file. If only one thread or non-threaded single program is executed on the hardware, then the following method is used to allow parallel matrices belonging to that single thread/program to be able to access the results written into the registers in the other sections. The way this is done is by allowing any matrix writing results into any one of the 4 register file sections to generate copies of those registers in the other register file sections. Physically this is done by extending the write ports of each section into the remaining sections. However, this is not scalable, as we cannot build an efficient register file with each memory cell having as many as 4 times the write ports as needed for one section alone. We present a mechanism where the register file is built such that it will not be impacted with such single thread register-broadcast extension.


It should be noted that additional aspects regarding register segments as used in embodiments of the present invention can be found in U.S. Pat. No. 8,677,105.



FIG. 7 shows a more detailed diagram of a fragmented memory subsystem for a multicore processor in accordance with one embodiment of the present invention. FIG. 7 shows a comprehensive scheme and implementation of the synchronization scheme among threads and/or among loads and stores in general. The scheme describes a preferred method for synchronization and disambiguation of memory references across load/store architectures and/or across memory references and/or threads' memory accesses. In FIG. 7, we show multiple segments of register files (address and/or data registers), execution units, address calculation units, and fragments of level 1 caches and/or load store buffers and level 2 caches and address register interconnects 1200 and address calculation unit interconnects 1201. Those fragmented elements could be constructed within one core/processor by fragmenting and distributing its centralized resources into several engines or they can be constructed from elements of different cores/processors in multi-core/multi-processor configurations. One of those fragments 1211 is shown in the figure as fragment number 1; the fragments can be scaled to a large number (in general to N fragments as shown in the figure).


This mechanism also serves as a coherency scheme for the memory architecture among those engines/cores/processors. This scheme starts with an address request from one of the address calculation units in one fragment/core/processor. For example, assume the address is requested by fragment 1 (1211). It can obtain and calculate its address using address registers that belong to its own fragment and/or registers across other fragments using the address interconnect bus 1200. After calculating the address it creates the reference address, either a 32-bit address or a 64-bit address, that is used to access caches and memory. This address is usually fragmented into a tag field and set and line fields. This particular fragment/engine/core will store the address into its load store buffer and/or L1 and/or L2 address arrays 1202; at the same time it will create a compressed version of the tag (with a smaller number of bits than the original tag field of the address) by using a compression technique.


Moreover, the different fragments/engines/cores/processors will use the set field or a subset of the set field as an index to identify which fragment/core/processor the address is maintained in. This indexing of the fragments by the address set field bits ensures exclusiveness of ownership of the address in a particular fragment/core/engine even though the memory data that corresponds to that address can live in another or multiple other fragments/engines/cores/processors. Even though the address CAM/tag arrays 1202/1206 are shown in each fragment to be coupled with the data arrays 1207, they might be only coupled in physical proximity of placement and layout or even by the fact that both belong to a particular engine/core/processor, but there is no relation between addresses kept in the address arrays and the data in the data arrays inside one fragment.



FIG. 8 shows a diagram depicting how bits of an address can be used by address generation to enumerate fragments in accordance with one embodiment of the present invention. In the present embodiment, fragments are defined by the address bits that are above page boundaries and that are below cache line boundaries, as depicted in FIG. 8. The present invention advantageously stays above the page boundaries to avoid causing TLB misses during the translation from virtual addresses to physical addresses. The process stays below the cache line boundary in order to have complete cache lines that correctly fit within the hardware cache hierarchy. For example, in a system that employs 64 byte cache lines, the fragment boundary would avoid the last six address bits. In comparison, in a system that employs 32 byte cache lines, the fragment boundary would avoid the last five bits. Once defined, the fragment hierarchy is the same across all cache hierarchies of the processor.
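The cache line arithmetic in this paragraph can be checked with a short sketch. The exact position of the fragment-selecting bits depends on FIG. 8, so it is left as a parameter here; the function names `line_offset_bits` and `fragment_index`, the power-of-two fragment count and the sample address are assumptions made for illustration.

```python
def line_offset_bits(line_bytes: int) -> int:
    """Number of low address bits that select a byte within one cache line."""
    if line_bytes <= 0 or line_bytes & (line_bytes - 1):
        raise ValueError("cache line size must be a power of two")
    return line_bytes.bit_length() - 1


def fragment_index(
    address: int, low_bit: int, num_fragments: int, line_bytes: int = 64
) -> int:
    """Select a fragment from address bits starting at `low_bit`.

    `low_bit` is a hypothetical parameter; the only constraint enforced is
    that the selecting bits avoid the byte-within-line bits, so that a
    complete cache line always maps to a single fragment.
    """
    if num_fragments <= 0 or num_fragments & (num_fragments - 1):
        raise ValueError("number of fragments must be a power of two")
    if low_bit < line_offset_bits(line_bytes):
        raise ValueError("fragment bits must avoid the cache line offset bits")
    return (address >> low_bit) & (num_fragments - 1)


bits_64 = line_offset_bits(64)  # six bits avoided for 64 byte lines
bits_32 = line_offset_bits(32)  # five bits avoided for 32 byte lines

# Two addresses inside the same 64 byte line land in the same fragment.
first = fragment_index(0x1C0, low_bit=6, num_fragments=4)
last = fragment_index(0x1C0 + 63, low_bit=6, num_fragments=4)
```

Keeping the selecting bits outside the line offset is what guarantees that every byte of a cache line is owned by one fragment in this sketch.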



FIG. 9 shows a diagram of how loads and stores are handled by embodiments of the present invention. As depicted in FIG. 9, each fragment is associated with its load store buffer and store retirement buffer. For any given fragment, loads and stores that designate an address range associated with that fragment or another fragment are sent to that fragment's load store buffer for processing. It should be noted that they may arrive out of order as the cores execute instructions out of order. Within each core, the core has access to not only its own register file but each of the other cores' register files.


Embodiments of the present invention implement a distributed load store ordering system. The system is distributed across multiple fragments. Within a fragment, local data dependency checking is performed by that fragment. This is because the fragment only loads and stores within the store retirement buffer of that particular fragment. This limits the need of having to look to other fragments to maintain data coherency. In this manner, data dependencies within a fragment are locally enforced.


With respect to data consistency, the store dispatch gate enforces store retirement in accordance with strict in-program order memory consistency rules. Stores arrive out of order at the load store buffers. Loads arrive out of order also at the load store buffers. Concurrently, the out of order loads and stores are forwarded to the store retirement buffers for processing. It should be noted that although stores are retired in order within a given fragment, as they go to the store dispatch gate they can be out of order from the multiple fragments. The store dispatch gate enforces a policy that ensures that even though stores may reside across store retirement buffers out of order, and even though the buffers may forward stores to the store dispatch gate out of order with respect to other buffers' stores, the dispatch gate ensures that they are forwarded to fragment memory strictly in order. This is because the store dispatch gate has a global view of stores retiring, and only allows stores to leave to the global visible side of the memory in order across all the fragments, e.g., globally. In this manner, the store dispatch gate functions as a global observer to make sure that stores ultimately return to memory in order, across all fragments.
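The ordering policy of the store dispatch gate can be sketched in a few lines. The sketch assumes each store carries a global program-order sequence number and that each store retirement buffer is already in order internally; the class name `StoreDispatchGate`, the tuple layout and the sample stores are hypothetical and are not taken from the figures.

```python
from collections import deque
from typing import Deque, List, Tuple

Store = Tuple[int, int, str]  # (global sequence number, address, value)


class StoreDispatchGate:
    """Releases stores to globally visible memory strictly in program order."""

    def __init__(self, retirement_buffers: List[Deque[Store]]) -> None:
        self.buffers = retirement_buffers  # one in-order buffer per fragment
        self.next_seq = 0  # next store allowed to become globally visible

    def drain(self) -> List[Store]:
        """Release every store that can leave now without breaking order."""
        released: List[Store] = []
        progress = True
        while progress:
            progress = False
            for buf in self.buffers:
                if buf and buf[0][0] == self.next_seq:
                    released.append(buf.popleft())
                    self.next_seq += 1
                    progress = True
        return released


# Two fragments whose stores interleave in program order; store 4 has not
# yet reached any retirement buffer, so store 5 must wait behind it.
fragment0 = deque([(0, 0x100, "a"), (3, 0x104, "d")])
fragment1 = deque([(1, 0x200, "b"), (2, 0x204, "c"), (5, 0x208, "f")])

gate = StoreDispatchGate([fragment0, fragment1])
released = gate.drain()
order = [seq for seq, _, _ in released]
```

Here the gate acts as the global observer: it alternates between buffers as needed and holds back a store whose predecessor has not yet arrived, even though that store sits at the head of its own fragment's buffer.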



FIG. 10 shows the manner in which fragments can be split into two or more domains in accordance with one embodiment of the present invention. Specifically, FIG. 10 shows how a single fragment can be split into multiple domains. Domain splitting can be implemented via the address generation process. Domain splitting changes the manner in which load store checks have to be done within a fragment, since in this case they only have to be done per domain, as opposed to across the entire fragment. Domain splitting is also advantageous in that it can enable single ported memory to behave like multiport memory, where the single port is accessed per different domain.



FIG. 11 shows a mode of operation of the processor wherein the hardware resources of the partitionable engines are used to function like logical cores in executing applications in accordance with one embodiment of the present invention. In this embodiment, the hardware resources of the engines of the virtual cores are configured as physical cores. In the FIG. 11 mode, each physical core is configured to function as a logical core. Multithreaded applications and multithreaded functionality is up to the threaded programmability of the software of the application.



FIG. 12 shows a mode of operation of the processor wherein soft cores are used to function like logical cores in executing applications in accordance with one embodiment of the present invention. In this embodiment, the partitionable engines of virtual cores will support a plurality of soft cores. In the FIG. 12 mode, each soft core is configured to function as a logical core. Multithreaded applications and multithreaded functionality is up to the threaded programmability of the software of the application.



FIG. 13 shows a mode of operation of the processor wherein the soft cores are used to function like a single logical core in executing applications in accordance with one embodiment of the present invention. In the FIG. 13 mode, the soft cores are configured to function together as a single logical core. In such an implementation, a single threaded application has its instruction sequence divided up and allocated among the virtual cores where they are cooperatively executed to achieve high single threaded performance. In this manner, single threaded performance can scale with the addition of soft cores.


A number of strategies can be used in choosing the operating mode of the processor. For a processor having a large number of engines (e.g., 8 engines, 12 engines, etc.), a number of soft cores can be configured to function as a single logical core, while the remaining cores can operate in the other modes. This attribute allows an intelligent partition of resources to ensure maximum utilization of the hardware and/or minimal wasted power consumption. For example, in one embodiment, cores (e.g., soft or logical) can be allocated on a per thread basis depending upon the type of application that is executing.



FIG. 14 shows an exemplary implementation of fragment segmentation being used to support logical core and virtual core functionality in accordance with one embodiment of the present invention. As described above, the fragment segmentation allows the processor to be configured to support different virtual core execution modes.


The global interconnect allows cores' threads to access any of the ports 1401. It should be noted that the term “thread” as used herein refers to either a representation of instruction sequences from different logical cores, instruction sequences from the same logical core, or some mixture of the two.


The manner in which the threads utilize one of the ports 1401 to access the load store buffer is adjustable in accordance with the policies of the arbiters, as shown. Thus, a thread using any one of the ports 1401 can have a greater or lesser amount of access to the load store buffer via the ports 1402. The size of the allocation and the manner in which the allocation is managed is controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with demands of a particular thread.


The load store buffer is configured to have a plurality of entries spread across the ports. Access to the load store buffer is controlled by the arbiter. In this way, the arbiter can dynamically allocate entries in the load store buffer to the different threads.



FIG. 14 also shows arbiters on the ports between load store buffer and the L1 cache. Thus, as with the load store buffer described above, a thread using any one of the ports 1403 can have a greater or lesser amount of access to the L1 cache via the ports 1404. The size of the allocation and the manner in which the allocation is managed is controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with demands of a particular thread.


The L1 cache is configured to have a plurality of ways spread across the ports. Access to the L1 cache is controlled by the arbiter. In this way, the arbiter can dynamically allocate entries in the L1 cache to the different threads.


In one embodiment, the arbiters are configured to function with a plurality of counters 1460 that are used for tracking functionality and a plurality of threshold limit registers 1450 that provide a limiting function. The limiting function specifies the maximum resource allocation percentage for a given thread. The tracking function tracks the actual resources allocated to a given thread at any given time. These tracking and limiting functionalities affect the allocation of the number of per thread entries, ways, or ports for the load store buffer, L1 cache, L2 cache or the global interconnects. For example, the total number of entries in the load store buffer allocated for each thread can be dynamically checked against a variable threshold. This variable threshold can be updated in accordance with a given thread's forward progress. For example, in one embodiment, threads that are slowed down (e.g., large number of L2 misses, etc.) are quantified as making slow forward progress, and thus, their respective resource allocation thresholds are lowered, including the entries thresholds, the ways thresholds and the ports thresholds.
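The interaction between the counters and the threshold limit registers can be sketched as follows for load store buffer entries. This is a simplified model under stated assumptions: the class name `EntryArbiter`, the 10 percent step, the 10 percent floor and the two example threads are hypothetical, and the text does not specify how far a threshold is lowered.

```python
from typing import Dict


class EntryArbiter:
    """Tracks per-thread entries and enforces a per-thread percentage limit."""

    def __init__(self, total_entries: int, limit_percent: Dict[str, int]) -> None:
        self.total_entries = total_entries
        # Models the threshold limit registers: maximum share per thread.
        self.limit_percent = dict(limit_percent)
        # Models the tracking counters: entries currently held per thread.
        self.allocated = {thread: 0 for thread in limit_percent}

    def limit_entries(self, thread: str) -> int:
        return self.total_entries * self.limit_percent[thread] // 100

    def try_allocate(self, thread: str) -> bool:
        """Grant one entry unless the buffer is full or the thread is at its limit."""
        free = self.total_entries - sum(self.allocated.values())
        if free == 0 or self.allocated[thread] >= self.limit_entries(thread):
            return False
        self.allocated[thread] += 1
        return True

    def release(self, thread: str) -> None:
        if self.allocated[thread] > 0:
            self.allocated[thread] -= 1

    def report_slow_progress(self, thread: str, step: int = 10, floor: int = 10) -> None:
        """Lower the threshold of a thread that is making slow forward progress."""
        self.limit_percent[thread] = max(floor, self.limit_percent[thread] - step)


arbiter = EntryArbiter(total_entries=48, limit_percent={"T0": 50, "T1": 50})
granted_t0 = sum(arbiter.try_allocate("T0") for _ in range(30))

# T1 is reported as slowed down (for example by many L2 misses), so its
# threshold drops from 50 to 40 percent before it requests entries.
arbiter.report_slow_progress("T1")
granted_t1 = sum(arbiter.try_allocate("T1") for _ in range(30))
```

The same structure could be reused for ways or ports by changing what the counters count; only the entries case is shown.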



FIG. 14 also shows a shared L2 cache. In the present embodiment, the shared L2 cache has a fixed port arrangement without any arbitration between accesses coming from the L1 cache. Threads executing on the processor would all share access to the L2 cache and the resources of the L2 cache.



FIG. 15 shows a fragment memory of an exemplary four fragment processor implementing a many physicals to many logicals mode in accordance with one embodiment of the present invention.


One example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 15. In the FIG. 11 mode of operation, the many physical cores to many logical cores mode, wherein the physical cores are used to function like logical cores in executing applications, each logical core will be configured to have a fixed ratio of the resources of the load store buffer and the L1 cache. The ports can be specifically assigned to each thread or core. Entries in the load store buffer can be specifically reserved per thread or core. Ways within the L1 cache can be specifically reserved per thread or core. Multithreaded applications and multithreaded functionality is up to the threaded programmability of the software of the application. This is shown by the one logical core having an allocated port and an allocated portion of the store buffer and the L1 cache of each of the fragments. In this manner, the logical core comprises a fixed allocated slice of the resources of each fragment.


In one embodiment, in the many physical cores to many logical cores mode, the four fragments can be partitioned in accordance with the number of ports (e.g., ports 1401) that access each fragment. For example, in an embodiment where there are 6 ports per fragment, the resources of each fragment, and hence the resources of each partitionable engine, can be divided in such a way as to support 6 physical cores across the 4 fragments and the 4 partitionable engines. Each partition can be allocated its own port. Similarly, the resources of the load store buffer and the L1 cache would be allocated in such a way as to support 6 physical cores. For example, in an embodiment where the load store buffer has 48 entries, the 48 entries can be allocated such that there are 12 entries per physical core to support a mode where 4 physical cores are implemented, or they can be allocated such that there are 8 entries per physical core where 6 physical cores are implemented.
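The entry arithmetic in this example can be written out as a small sketch. It assumes an even, contiguous split of the buffer; the function names `entries_per_core` and `partition_entries` are hypothetical, and the text does not say which entries go to which core.

```python
from typing import List


def entries_per_core(total_entries: int, num_cores: int) -> int:
    """Fixed number of load store buffer entries reserved for each physical core."""
    if num_cores <= 0 or total_entries % num_cores:
        raise ValueError("entries must divide evenly among the cores")
    return total_entries // num_cores


def partition_entries(total_entries: int, num_cores: int) -> List[range]:
    """Assign each core a contiguous block of entry indices (an assumed layout)."""
    size = entries_per_core(total_entries, num_cores)
    return [range(core * size, (core + 1) * size) for core in range(num_cores)]


four_core_share = entries_per_core(48, 4)  # 4 physical cores implemented
six_core_share = entries_per_core(48, 6)   # 6 physical cores implemented
six_core_layout = partition_entries(48, 6)
```

With 48 entries this gives 12 entries per core in the 4 core mode and 8 entries per core in the 6 core mode, as stated above.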



FIG. 16 shows a fragment memory of an exemplary four fragment processor implementing a many physicals to many logicals mode in accordance with an alternative embodiment of the present invention.


As with FIG. 15, the one example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 16. In the FIG. 11 mode of operation, the many physical cores to many logical cores mode, an entire partitionable engine is dedicated to support the execution of a single logical core. This is shown by the shading in FIG. 16. The physical resource of the engine is used to function like a logical core in executing applications.



FIG. 17 shows a fragment memory of an exemplary four fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.


One example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 17. In the FIG. 12 mode of operation, the many soft cores to many logicals mode, where virtual cores are used to function like logical cores in executing applications, the size of the allocation of the resources of the load store buffer and the manner in which the allocation is managed is controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with demands of a particular thread or core. Similarly, the size of the allocation of the resources of the L1 cache and the manner in which the allocation is managed is controlled by the arbiter. The arbiter can dynamically allocate access to the ports in accordance with demands of a particular thread or core. Thus, at any given instance the logical thread/core (e.g. shaded) can use different arbiters and different ports.


In this manner, access to the resources of the load store buffer and access to the resources of the L1 cache can be more policy driven and can be based more on the needs of individual threads or cores making forward progress. This is shown by the one logical core having a dynamically allocated port and a dynamically allocated portion of the store buffer and the L1 cache of each of the fragments. In this manner, the logical core comprises a non-fixed, dynamically allocated slice of the resources of each fragment.



FIG. 18 shows a fragment memory of an exemplary four fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.


In the FIG. 13 mode of operation, the many soft cores to one logical core mode, wherein the soft cores are used to function like a single logical core in executing applications, each of the soft cores is configured to function cooperatively with the other soft cores as a single logical core. A single thread or core has all the resources of the load store buffers and all of the resources of the L1 caches. In such an implementation, a single threaded application has its instruction sequence divided up and allocated among the soft cores where they are cooperatively executed to achieve high single threaded performance. In this manner, single threaded performance can scale with the addition of additional soft cores. This is shown in FIG. 18 where the one example logical core and its relationship with the resources of the processor is shown by the shading of all of the resources of the processor.



FIG. 19 shows address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four fragment processor implementing a many physicals to many logicals mode in accordance with one embodiment of the present invention.


One example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 19. In the FIG. 11 mode of operation, the many physical cores to many logical cores mode, wherein the physical cores are used to function like logical cores in executing applications, each logical core will be configured to have a fixed ratio of the resources of the address calculation units, operand/result buffers, threaded register files, and common partition scheduler. Multithreaded applications and multithreaded functionality is up to the threaded programmability of the software of the application. This is shown by the one logical core having an allocated address calculation and execution unit, an allocated threaded register file and an allocated common partition scheduler. In this manner, the logical core comprises a fixed allocated segment. However, in one embodiment, in this mode of operation, the address calculation and execution units can still be shared (e.g., meaning each of the address calculation and execution units would be un-shaded).



FIG. 20 shows an alternative implementation for the address calculation and execution units, operand/result buffers, threaded register files, and common partition schedulers of an exemplary four fragment processor to implement a many physicals to many logicals mode in accordance with one embodiment of the present invention.


One example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 20. In the FIG. 20 embodiment, however, the resources of a physical core are spread across each of the segments and each of the partitionable engines. This is shown by the one logical core having an allocated portion of the address calculation and execution units, an allocated portion of the threaded register files and an allocated portion of common partition schedulers across each of the segments. Additionally, FIG. 20 shows how the one logical core would have an allocated portion of the resources of each of the address calculation execution units. In this manner, the logical core comprises a fixed allocated portion of each of the segments.



FIG. 21 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four fragment processor implementing a many soft cores to many logical cores mode in accordance with one embodiment of the present invention.


One example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 21. In the FIG. 12 mode of operation, the many soft cores to many logical cores mode, wherein the soft cores are used to function like logical cores in executing applications, each logical core will be configured to have shared access to any one of the address calculation units, and a dynamically allocated portion of the operand/result buffers, threaded register files, and common partition scheduler. Multithreaded applications and multithreaded functionality is up to the threaded programmability of the software of the application.



FIG. 22 shows address calculation and execution units, register files, and common partition schedulers of an exemplary four fragment processor implementing a many soft cores to one logical core mode in accordance with one embodiment of the present invention.


One example logical core and its relationship with the resources of the processor is shown by the shading on FIG. 22. In the FIG. 13 mode of operation, the many soft cores to one logical core mode, wherein the soft cores are used to function like a single logical core in executing applications, each logical core will be configured to have a shared access to all of the address calculation units, and all of the operand/result buffers, threaded register files, and common partition schedulers. In such an implementation, a single threaded application has its instruction sequence divided up and allocated among the virtual cores where they are cooperatively executed to achieve high single threaded performance. In this manner, single threaded performance can scale with the addition of additional soft cores.



FIG. 23 shows a diagram of an exemplary microprocessor pipeline 2300 in accordance with one embodiment of the present invention. The microprocessor pipeline 2300 includes a fetch module 2301 that implements the functionality of the process for identifying and extracting the instructions comprising an execution, as described above. In the FIG. 23 embodiment, the fetch module is followed by a decode module 2302, an allocation module 2303, a dispatch module 2304, an execution module 2305 and a retirement module 2306. It should be noted that the microprocessor pipeline 2300 is just one example of the pipeline that implements the functionality of embodiments of the present invention described above. One skilled in the art would recognize that other microprocessor pipelines can be implemented that include the functionality of the decode module described above.


The foregoing description, for the purpose of explanation, has been described with reference to specific embodiments. However, the illustrated discussions above are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. Embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the invention and various embodiments with various modifications as may be suited to the particular use contemplated.

Claims
  • 1. A system for executing instructions, the system comprising: a plurality of virtual cores of a processor coupled to receive instruction threads allocated by a scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein resources of each partitionable engine are operable to be partitioned to instantiate a virtual core with partitioned resources of other partitionable engines, wherein communication between resources of each of said plurality of partitionable engines is supported by a global interconnection structure, wherein the instruction threads are executed using the partitionable engines in accordance with a virtual core execution mode; and a plurality of register file segments coupled to the partitionable engines for providing data storage, wherein the global interconnection structure links each of the plurality of register file segments to each of the plurality of partitionable engines wherein each register file segment further comprises: a respective common partition scheduler, a respective operand and result buffer and a respective threaded register file.
  • 2. The system of claim 1, wherein the plurality of register file segments implements an execution mode wherein a subset of physical resources of each register file segment are allocated to support execution of a single logical thread of a logical core, and wherein each logical core is configurable to have a fixed ratio of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of an associated register file segment.
  • 3. The system of claim 2, wherein each logical core is further configurable to have a fixed ratio of resources from address calculation and execution units of an associated partitionable engine.
  • 4. The system of claim 2, wherein each logical core is further configurable to share resources from address calculation and execution units of each of the plurality of partitionable engines.
  • 5. The system of claim 2, wherein each register file segment implements a portion of a plurality of logical cores, and wherein each logical core is further configurable to have an allocated portion of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of each of the plurality of register file segments.
  • 6. The system of claim 5, wherein each logical core is further configurable to have an allocated portion of resources from an address calculation and execution unit of each of the plurality of partitionable engines.
  • 7. The system of claim 2, wherein the plurality of register file segments implement an execution mode wherein the subset of physical resources of each register file segment are allocated to support execution of a single logical thread, and wherein the single logical thread will be configured to have a fixed allocated portion of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of each of the plurality of register file segments.
  • 8. The system of claim 7, wherein the single logical thread is configurable to have a fixed allocated portion of resources from an address calculation and execution units of each of the plurality of partitionable engines.
  • 9. The system of claim 1, wherein the plurality of register file segments implement an execution mode wherein physical resources of each register file segment are dynamically allocated in accordance with an adjustable threshold to support execution of a single logical thread of a single logical core and wherein each logical core is configurable to have a shared access of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of each of the plurality of register file segments.
  • 10. The system of claim 9, wherein each logical core is further configurable to share resources from address calculation and execution units of each of the plurality of partitionable engines.
  • 11. A processor for executing instructions, the processor comprising: a plurality of virtual cores coupled to receive instruction sequences allocated by a scheduler, wherein each virtual core comprises a respective subset of resources of a plurality of partitionable engines, wherein resources of each partitionable engine are operable to be partitioned to instantiate a virtual core with partitioned resources of other partitionable engines, wherein communication between resources of each of said plurality of partitionable engines is supported by a global interconnection structure, wherein the instruction sequences are executed using the partitionable engines in accordance with a virtual core execution mode; and a plurality of register file segments coupled to the partitionable engines for providing data storage, wherein each register file segment further comprises: a respective common partition scheduler, a respective operand and result buffer and a respective threaded register file.
  • 12. The processor of claim 11, wherein the plurality of register file segments implements an execution mode wherein a subset of physical resources of each register file segment are allocated to support execution of a single logical thread of a logical core, and wherein each logical core is configurable to have a fixed ratio of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of an associated register file segment.
  • 13. The processor of claim 12, wherein each logical core is further configurable to have a fixed ratio of resources from address calculation and execution units of an associated partitionable engine.
  • 14. The processor of claim 12, wherein each logical core is further configurable to share resources from address calculation and execution units of each of the plurality of partitionable engines.
  • 15. The processor of claim 12, wherein the plurality of register file segments implement an execution mode wherein the set of physical resources of each register file segment are allocated to support execution of a single logical thread, and wherein the single logical thread is configurable to have a fixed allocated portion of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of each of the plurality of register file segments.
  • 16. The processor of claim 15, wherein the single logical thread is configurable to have a fixed allocated portion of resources from an address calculation and execution units of each of the plurality of partitionable engines.
  • 17. The processor of claim 11, wherein each register file segment implements a portion of a plurality of logical cores, and wherein each logical core is configurable to have an allocated portion of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of each of the plurality of register file segments.
  • 18. The processor of claim 17, wherein each logical core is further configurable to have an allocated portion of resources from an address calculation and execution unit of each of the plurality of partitionable engines.
  • 19. The processor of claim 11, wherein the plurality of register file segments implement an execution mode wherein physical resources of each register file segment are dynamically allocated in accordance with an adjustable threshold to support execution of a single logical thread of a single logical core and wherein each logical core is configurable to have a shared access of resources from a common partition scheduler, an operand and result buffer, and a threaded register file of each of the plurality of register file segments.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 13/428,440 filed Mar. 23, 2012 (now U.S. Pat. No. 9,766,893 issued Sep. 19, 2017), which claims the benefit of U.S. Provisional Application No. 61/467,944 filed Mar. 25, 2011, which are hereby incorporated by reference. This application is related to U.S. application Ser. No. 12/296,919 which is the national stage of International Application No. PCT/US2007/066536 filed Apr. 12, 2007 (now U.S. Pat. No. 8,327,115 issued Dec. 4, 2012), which is hereby incorporated by reference. This application is related to U.S. application Ser. No. 12/514,303 which is the national stage of International Application No. PCT/US2007/084710 filed Nov. 14, 2007 (now U.S. Pat. No. 8,677,105 issued Mar. 18, 2014), which is hereby incorporated by reference.

International Preliminary Report on Patentability for Application No. PCT/US2014/024608, dated Sep. 24, 2015, 8 pages.
International Preliminary Report on Patentability for Application No. PCT/US2014/024722, dated Sep. 24, 2015, 6 pages.
International Search Report and Written Opinion for Application No. PCT/US2007/084710, dated May 22, 2008, 6 pages.
International Search Report and Written Opinion for Application No. PCT/US2011/051992, dated Mar. 28, 2012, 11 pages.
International Search Report and Written Opinion for Application No. PCT/US2011/061953, dated Jul. 24, 2012, 8 pages.
International Search Report and Written Opinion for Application No. PCT/US2012/30360, dated Oct. 29, 2012, 7 pages.
International Search Report and Written Opinion for Application No. PCT/US2014/024608, dated Jul. 31, 2014, 9 pages.
International Search Report and Written Opinion for Application No. PCT/US2014/024722, dated Jun. 27, 2014, 7 pages.
International Search Report for Application No. PCT/US2007/066536, dated Jul. 30, 2008, 2 pages.
International Search Report for Application No. PCT/US2012/038711, dated Nov. 28, 2012, 3 pages.
International Search Report for Application No. PCT/US2012/038713, dated Oct. 29, 2012, 3 pages.
Kozyrakis et al., “A New Direction for Computer Architecture Research,” IEEE, Nov. 1, 1998, vol. 31 (11), pp. 24-32.
Mogul J.C., et al., “Potential benefits of delta encoding and data compression for HTTP,”, Oct. 1, 1997, ACM, SIGCOMM '97, pp. 181-194.
Nagarajan et al., “A Design Space Evaluation of Grid Processor Architectures,” 34th ACM/IEEE International Symposium, Piscataway, NJ, Dec. 1-5, 2001, pp. 40-51.
Nanda A.K., et al., “The Misprediction Recovery Cache,” International Journal of Parallel Programming, Plenum Publishing Corporation, 1998, vol. 26 (4), pp. 383-415.
Non-Final Office Action from U.S. Appl. No. 12/296,919, dated Apr. 7, 2011, 8 pages.
Non-Final Office Action from U.S. Appl. No. 12/296,919, dated Jun. 14, 2010, 7 pages.
Non-Final Office Action from U.S. Appl. No. 12/296,919, dated Mar. 7, 2012, 7 pages.
Non-Final Office Action from U.S. Appl. No. 12/514,303, dated Jun. 27, 2013, 7 pages.
Non-Final Office Action from U.S. Appl. No. 12/514,303, dated May 10, 2012, 9 pages.
Non-Final Office Action from U.S. Appl. No. 13/428,440, dated Jun. 18, 2014, 16 pages.
Non-Final Office Action from U.S. Appl. No. 13/428,440, dated May 4, 2015, 13 pages.
Non-Final Office Action from U.S. Appl. No. 13/475,708, dated Feb. 11, 2015, 27 pages.
Non-Final Office Action from U.S. Appl. No. 13/475,708, dated Feb. 12, 2016, 37 pages.
Non-Final Office Action from U.S. Appl. No. 13/475,739, dated May 21, 2015, 14 pages.
Non-Final Office Action from U.S. Appl. No. 13/475,739, dated Sep. 2, 2014, 15 pages.
Non-Final Office Action from U.S. Appl. No. 13/691,609, dated Jan. 15, 2014, 5 pages.
Non-Final Office Action from U.S. Appl. No. 13/691,609, dated Sep. 5, 2013, 11 pages.
Non-Final Office Action from U.S. Appl. No. 14/194,589, dated Nov. 19, 2015, 12 pages.
Non-Final Office Action from U.S. Appl. No. 14/212,203, dated Sep. 8, 2016, 52 pages.
Non-Final Office Action from U.S. Appl. No. 14/212,533, dated Sep. 22, 2016, 52 pages.
Non-final Office Action from U.S. Appl. No. 14/213,692, dated Aug. 24, 2015, 30 pages.
Non-final Office Action from U.S. Appl. No. 14/213,692, dated Jun. 13, 2016, 17 pages.
Non-final Office Action from U.S. Appl. No. 14/360,284, dated Oct. 21, 2016, 32 pages.
Notice of Allowance from foreign counterpart Chinese Patent Application No. 200780046679, dated Feb. 6, 2017, 8 pages.
Notice of Allowance from U.S. Appl. No. 12/296,919, dated Jul. 27, 2012, 6 pages.
Notice of Allowance from U.S. Appl. No. 12/514,303, dated Oct. 25, 2013, 12 pages.
Notice of Allowance from U.S. Appl. No. 13/428,440, dated Feb. 26, 2016, 10 pages.
Notice of Allowance from U.S. Appl. No. 13/428,440, dated Nov. 16, 2016, 9 pages.
Notice of Allowance from U.S. Appl. No. 13/428,440, dated Nov. 20, 2015, 9 pages.
Notice of Allowance from U.S. Appl. No. 13/475,708, dated Oct. 13, 2016, 17 pages.
Notice of Allowance from U.S. Appl. No. 13/475,739, dated Aug. 3, 2016, 11 pages.
Communication pursuant to Rules 161(2) and 162 EPC for Application No. 12763717, dated Nov. 22, 2013, 3 pages.
Communication pursuant to Rules 70(2) and 70a(2) EPC for Application No. 12763717, dated Oct. 10, 2014, 1 page.
Decision to Grant a Patent for Korean Patent Application No. 20137027842, dated Mar. 31, 2016, 2 pages.
Examination Report for European Application No. 12763717, dated Nov. 28, 2016, 5 pages.
Extended European Search Report for Application No. EP12763717, dated Sep. 24, 2014, 5 pages.
Extended European Search Report for Application No. EP12764838, dated Jul. 10, 2014, 5 pages.
Final Office Action from U.S. Appl. No. 13/428,438, dated Dec. 24, 2014, 17 pages.
Final Office Action from U.S. Appl. No. 13/428,452, dated Dec. 24, 2014, 20 pages.
Final Office Action from U.S. Appl. No. 14/213,115, dated Feb. 3, 2015, 11 pages.
Final Office Action from U.S. Appl. No. 14/213,115, dated Mar. 7, 2016, 18 pages.
Final Office Action from U.S. Appl. No. 14/213,854, dated Nov. 9, 2015, 19 pages.
Final Office Action from U.S. Appl. No. 14/214,045, dated Aug. 29, 2016, 14 pages.
First Office Action from foreign counterpart Chinese Patent Application No. 201280024054.0, dated May 30, 2016, 24 pages.
First Office Action from foreign counterpart Chinese patent application No. 201280024095, dated May 26, 2016, 32 pages.
International Preliminary Report on Patentability for Application No. PCT/US2012/030383, dated Oct. 10, 2013, 6 pages.
International Preliminary Report on Patentability for Application No. PCT/US2012/030409, dated Oct. 10, 2013, 6 pages.
International Preliminary Report on Patentability for Application No. PCT/US2014/024276, dated Sep. 24, 2015, 6 pages.
International Search Report and Written Opinion for Application No. PCT/US2012/030383, dated Oct. 25, 2012, 8 pages.
International Search Report and Written Opinion for Application No. PCT/US2012/030409, dated Oct. 12, 2012, 7 pages.
International Search Report and Written Opinion for Application No. PCT/US2014/024276, dated Jul. 31, 2014, 7 pages.
Non-Final Office Action from U.S. Appl. No. 13/428,438, dated Apr. 24, 2014, 15 pages.
Non-Final Office Action from U.S. Appl. No. 13/428,438, dated May 1, 2015, 12 pages.
Non-Final Office Action from U.S. Appl. No. 13/428,452, dated Apr. 23, 2015, 11 pages.
Non-Final Office Action from U.S. Appl. No. 13/428,452, dated Apr. 28, 2014, 21 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,115, dated Oct. 19, 2015, 24 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,115, dated Sep. 22, 2014, 19 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,854, dated Apr. 29, 2016, 13 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,854, dated Jun. 19, 2015, 23 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,045, dated Apr. 1, 2016, 61 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,045, dated Dec. 19, 2016, 88 pages.
Notice of Allowance from foreign counterpart Korean Patent Application No. 20137027841, dated Mar. 31, 2016, 2 pages.
Notice of Allowance from U.S. Appl. No. 13/428,438, dated Aug. 10, 2016, 9 pages.
Notice of Allowance from U.S. Appl. No. 13/428,438, dated Dec. 29, 2016, 9 pages.
Notice of Allowance from U.S. Appl. No. 13/428,438, dated Mar. 4, 2016, 14 pages.
Notice of Allowance from U.S. Appl. No. 13/428,438, dated Nov. 6, 2015, 9 pages.
Notice of Allowance from U.S. Appl. No. 13/428,440, dated Mar. 10, 2017, 52 pages.
Notice of Allowance from U.S. Appl. No. 13/428,452, dated Oct. 21, 2015, 9 pages.
Notice of Allowance from U.S. Appl. No. 13/475,708, dated Mar. 17, 2017, 55 pages.
Notice of Allowance from U.S. Appl. No. 14/213,115, dated Jun. 27, 2016, 9 pages.
Notice of Allowance from U.S. Appl. No. 14/213,115, dated Oct. 3, 2016, 20 pages.
Notice of Allowance from U.S. Appl. No. 14/213,854, dated Oct. 7, 2016, 19 pages.
Notification of Reason for Refusal from Foreign Counterpart Korean Patent Application No. 10-2013-7027842, dated Sep. 18, 2015, 7 pages.
Notification of Reason for Refusal from foreign counterpart Korean Patent Application No. 20137027841, dated Sep. 18, 2015, 10 pages.
Office Action from foreign counterpart Chinese patent application No. 201280024012, dated Feb. 3, 2017, 20 pages.
Office Action from foreign counterpart Chinese patent application No. 201280024095, dated Feb. 4, 2017, 31 pages.
Office Action from foreign counterpart European Patent Application No. EP12764838, dated Oct. 4, 2016, 4 pages.
Second Office Action from foreign counterpart Chinese Patent Application No. 201280024054.0, dated Dec. 26, 2016, 11 pages.
Extended European Search Report for Application No. 16196777.3, dated Mar. 20, 2017, 6 pages.
Final Office Action from U.S. Appl. No. 14/213,218, dated Jul. 6, 2016, 24 pages.
Final Office Action from U.S. Appl. No. 14/213,218, dated Jul. 22, 2015, 16 pages.
Final Office Action from U.S. Appl. No. 14/214,176, dated Aug. 29, 2016, 17 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,218, dated Apr. 22, 2015, 22 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,218, dated Feb. 2, 2016, 17 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,176, dated Jan. 6, 2017, 36 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,176, dated Mar. 25, 2016, 25 pages.
Barham P., et al., “Xen and the Art of Virtualization,” Proceedings of the ACM Symposium on Operating Systems Principles, XP002298786, Oct. 2003, pp. 164-177.
Communication pursuant to Rules 70(2) and 70a(2) EPC for European Application No. 11876314.3, dated Jul. 1, 2016, 1 page.
Cooperman G.,“Cache Basics,” 2003, pp. 1-3, URL: http://www.ccs.neu.edu/course/com3200/parent/NOTES/cache-basics.html.
Extended European Search Report for Application No. 11876314.3, dated Jun. 14, 2016, 6 pages.
Final Office Action from U.S. Appl. No. 14/216,493, dated Aug. 30, 2016, 21 pages.
Final Office Action from U.S. Appl. No. 14/360,282, dated Feb. 16, 2017, 10 pages.
First Office Action and Search report from foreign counterpart China Patent Application No. 201180076248.0, dated Mar. 17, 2016, 25 pages.
Garmany J., “The Power of Indexing,” archieved on Mar. 9, 2009, 7 pages.
Grant of Patent for Korean Application No. 10-2014-7016763, dated Oct. 31, 2016, 2 pages.
International Preliminary Report on Patentability for Application No. PCT/US2011/061957, dated Jun. 5, 2014, 7 pages.
International Search Report and Written Opinion for Application No. PCT/US2011/061957, dated Jul. 20, 2012, 8 pages.
Jacobson et al., “Path-based Next Trace Prediction,” IEEE, 1997, pp. 14-23.
Non-Final Office Action from U.S. Appl. No. 14/215,633, dated Oct. 22, 2015, 18 pages.
Non-final Office Action from U.S. Appl. No. 14/216,493, dated Apr. 4, 2016, 26 pages.
Non-Final Office Action from U.S. Appl. No. 14/216,493, dated Mar. 29, 2017, 18 pages.
Non-Final Office Action from U.S. Appl. No. 14/360,282, dated Oct. 21, 2016, 13 pages.
Non-final Office Action from U.S. Appl. No. 15/257,593, dated Apr. 7, 2017, 37 pages.
Notice of Allowance from U.S. Appl. No. 14/215,633, dated Mar. 23, 2017, 17 pages.
Notice of Allowance from U.S. Appl. No. 14/215,633, dated May 23, 2016, 9 pages.
Notice of Allowance from U.S. Appl. No. 14/215,633, dated Sep. 29, 2016, 13 pages.
Notification of Reason for Refusal from Korean Application No. 10-2014-7016763, dated Apr. 29, 2016, 11 pages.
Restriction Requirement from U.S. Appl. No. 14/360,282, dated Jul. 28, 2016, 4 pages.
Rotenberg E., et al.,“Trace Cache: A Low Latency Approach to High Bandwidth Instruction Fetching,” Apr. 11, 1996, 48 pages.
Second Office Action from foreign counterpart China Patent Application No. 201180076248.0, dated Oct. 20, 2016, 25 pages.
Extended European Search Report for Application No. 14769411.1, dated Apr. 5, 2017, 8 pages.
Final Office Action from U.S. Appl. No. 14/213,730, dated May 11, 2016, 14 pages.
Final Office Action from U.S. Appl. No. 14/216,859, dated Jun. 9, 2016, 16 pages.
First Office Action from foreign counterpart China Patent Application No. 201280024012.7, dated May 30, 2016, 21 pages.
International Preliminary Report on Patentability for Application No. PCT/US2014/024677, dated Sep. 24, 2015, 7 pages.
International Preliminary Report on Patentability for Application No. PCT/US2014/024828, dated Sep. 24, 2015, 8 pages.
International Search Report and Written Opinion for Application No. PCT/US2014/024677, dated Jun. 30, 2014, 9 pages.
International Search Report and Written Opinion for Application No. PCT/US2014/024828, dated Jul. 28, 2014, 9 pages.
Non-Final Office Action from U.S. Appl. No. 14/212,533, dated Apr. 20, 2017, 116 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,730, dated Jan. 7, 2016, 27 pages.
Non-Final Office Action from U.S. Appl. No. 14/216,859, dated Jan. 28, 2016, 33 pages.
Notice of Allowance from U.S. Appl. No. 13/428,438, dated Apr. 13, 2017, 62 pages.
Notice of Allowance from U.S. Appl. No. 13/428,440, dated Apr. 20, 2017, 46 pages.
Notice of Allowance from U.S. Appl. No. 14/213,730, dated Oct. 27, 2016, 16 pages.
Notice of Allowance from U.S. Appl. No. 14/214,045, dated Apr. 18, 2017, 88 pages.
Notice of Allowance from U.S. Appl. No. 14/216,859, dated Jan. 24, 2017, 17 pages.
Notification of Reason for Refusal from foreign counterpart Korean Patent Application No. KR1020157029107, dated Oct. 13, 2016, 11 pages.
Partial Supplementary European Search Report for Application No. EP14770976.0, dated Mar. 24, 2017, 7 pages.
Advisory Action from U.S. Appl. No. 14/214,280, dated May 15, 2017, 3 pages.
Communication pursuant to Article 94(3) EPC for European Application No. 12788989.7, dated Jun. 22, 2017, 6 pages.
Decision to grant a patent from foreign counterpart Korean Patent Application No. 10-2014-7016774, dated Jul. 25, 2016, 2 pages.
Decision to Grant a Patent from foreign counterpart Korean Patent Application No. KR1020157029107, dated Apr. 25, 2017, 2 pages.
Extended European Search Report for Application No. 14770976.0, dated Jul. 3, 2017, 9 pages.
Extended European Search Report for Application No. EP11876130, dated Jun. 1, 2016, 7 pages.
Final Office Action from U.S. Appl. No. 14/213,135, dated Oct. 26, 2015, 20 pages.
Final Office Action from U.S. Appl. No. 14/216,493, dated May 22, 2017, 17 pages.
First Office Action and Search Report from foreign counterpart China Patent Application No. 201480024463.X, dated Apr. 1, 2017, 31 pages. (Translation available only for office action).
First Office Action and Search report from foreign counterpart Chinese Patent Application No. 201180076244.2, dated Mar. 22, 2016, 18 pages (Translation available only for Office Action).
First Office Action from foreign counterpart China Patent Application No. 201180076245.7, dated Mar. 21, 2016, 10 pages.
Intel “Programming on Intel® Platform,” The edition team of Intel® Software College course book, Shanghai Jiao Tong University Press, published Jan. 31, 2011, pp. 175-180.
International Preliminary Report on Patentability for Application No. PCT/US2011/061940, dated Jun. 5, 2014, 6 pages.
International Search Report and Written Opinion for Application No. PCT/US2011/061940, dated Jul. 20, 2012, 8 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,135, dated May 14, 2015, 23 pages.
Non-Final Office Action from U.S. Appl. No. 14/213,135, dated May 31, 2016, 14 pages.
Non-Final Office Action from U.S. Appl. No. 14/360,280, dated Feb. 23, 2017, 34 pages.
Non-Final Office Action from U.S. Appl. No. 14/360,282, dated Jun. 22, 2017, 97 pages.
Non-Final Office Action from U.S. Appl. No. 14/733,827, dated Apr. 28, 2017, 99 pages.
Non-Final Office Action from U.S. Appl. No. 15/219,063, dated May 30, 2017, 102 pages.
Notice of Allowance from U.S. Appl. No. 13/475,708, dated Jun. 16, 2017, 51 pages.
Notice of Allowance from U.S. Appl. No. 13/475,708, dated Jun. 30, 2017, 53 pages.
Notice of Allowance from U.S. Appl. No. 14/213,135, dated Oct. 17, 2016, 17 pages.
Notice of Allowance from U.S. Appl. No. 14/213,218, dated Jun. 16, 2017, 89 pages.
Notice of Allowance from U.S. Appl. No. 14/214,176, dated May 10, 2017, 88 pages.
Notice of Allowance from U.S. Appl. No. 14/214,280, dated Jun. 29, 2017, 86 pages.
Notice of Allowance from U.S. Appl. No. 14/215,633, dated Jun. 30, 2017, 83 pages.
Notice of Allowance from U.S. Appl. No. 14/360,284, dated Jun. 14, 2017, 100 pages.
Notice of Allowance from U.S. Appl. No. 15/019,920, dated Jul. 14, 2017, 100 pages.
Notice of Allowance from U.S. Appl. No. 15/082,867, dated Jul. 7, 2017, 98 pages.
Notice of Final Rejection from foreign counterpart Korean Patent Application No. 10-2015-7029262, dated May 23, 2017, 7 pages.
Notice of Preliminary Rejection from foreign counterpart Korean Patent Application No. 10-2016-7017150, dated Apr. 20, 2017, 5 pages.
Notification of Reason for Refusal from foreign counterpart Korean Patent Application No. 10-2014-7016774, dated Jan. 28, 2016, 4 pages.
Notification of Reason for Refusal from foreign counterpart Korean Patent Application No. 10-2015-7029262, dated Oct. 17, 2016, 12 pages.
Notification of Reason for Refusal from foreign counterpart Korean Patent Application No. 10-2017-7002379, dated May 26, 2017, 6 pages.
Notification of Reason for Refusal from foreign counterpart Korean Patent Application No. 10-2017-7002473, dated May 26, 2017, 7 pages.
Notification to Grant Patent Right for Invention from foreign counterpart China Patent Application No. 201180076245.7, dated Nov. 2, 2016, 3 pages.
Office Action and Search Report from foreign counterpart Taiwan Patent Application No. 101117854, dated Mar. 30, 2017, 7 pages.
Third Office Action and Search report from foreign counterpart China Patent Application No. 201180076248.0, dated May 2, 2017, 27 pages.
Third Office Action from foreign counterpart Chinese Patent Application No. 201180076244.2, dated May 2, 2017, 20 pages.
Extended European Search Report for Application No. 14769450.9, dated Feb. 21, 2017, 16 pages.
Final Office Action from U.S. Appl. No. 14/214,280, dated Mar. 6, 2017, 12 pages.
Final Office Action from U.S. Appl. No. 14/214,280, dated Mar. 11, 2016, 18 pages.
Final Office Action from U.S. Appl. No. 14/214,280, dated Oct. 24, 2016, 20 pages.
Grant of Patent for Korean Application No. 10-2015-7028745, dated Nov. 30, 2016, 2 pages.
International Preliminary Report on Patentability for Application No. PCT/US2014/024775, dated Sep. 24, 2015, 6 pages.
International Search Report and Written Opinion for Application No. PCT/US2014/024775, dated Jun. 2, 2014, 7 pages.
Non-Final Office Action from U.S. Appl. No. 14/212,203, dated Mar. 24, 2017, 68 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,280, dated Jan. 6, 2017, 14 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,280, dated Jun. 16, 2016, 12 pages.
Non-Final Office Action from U.S. Appl. No. 14/214,280, dated Sep. 18, 2015, 18 pages.
Notification of Reason for Refusal from Korean Application No. 10-2015-7028745, dated May 23, 2016, 8 pages.
Office Action from foreign counterpart Taiwan Patent Application No. 20140109479, dated Nov. 28, 2016, 4 pages.
Partial SupplementaryEuropean Search Report for Application No. 14769450.9, dated Oct. 11, 2016, 8 pages.
Final Office Action from U.S. Appl. No. 15/219,063, dated Nov. 20, 2017, 27 pages.
Final Office Action from U.S. Appl. No. 15/354,742, dated Nov. 29, 2017, 20 pages.
Final Office Action from U.S. Appl. No. 15/354,857, dated Nov. 28, 2017, 23 pages.
Fourth Office Action and Search report from foreign counterpart China Patent Application No. 201180076248.0, dated Nov. 9, 2017, 38 pages. (Translation available only for office action).
Non-Final Office Action from U.S. Appl. No. 14/360,280, dated Dec. 14, 2017, 25 pages.
Notice of Allowance and Search Report from foreign counterpart Taiwan Patent Application No. 103109479, dated Nov. 30, 2017, 4 pages. (Translation available only for Search report).
Notice of Allowance from foreign counterpart Korean Patent Application No. 10-2017-7002379, dated Dec. 20, 2017, 3 pages.
Notice of Allowance from foreign counterpart Korean Patent Application No. 10-2017-7002473, dated Dec. 20, 2017, 6 pages.
Notice of Allowance from U.S. Appl. No. 13/475,708, dated Nov. 27, 2017, 24 pages.
Notice of Allowance from U.S. Appl. No. 14/216,859, dated Dec. 1, 2017, 113 pages.
Notice of Allowance from U.S. Appl. No. 15/019,920, dated Dec. 6, 2017, 24 pages.
Notice of Allowance from U.S. Appl. No. 15/082,867, dated Dec. 22, 2017, 25 pages.
Notice on Grant of Patent Right for Invention from foreign counterpart Chinese Patent Application No. 201280024054.0, dated Dec. 1, 2017, 4 pages.
Notice on Grant of Patent Right for Invention from foreign counterpart Chinese Patent Application No. 201280024095.X, dated Nov. 7, 2017, 6 pages.
Second Office Action from foreign counterpart China Patent Application No. 201480024463.X, dated Nov. 14, 2017, 34 pages.
Third Office Action from foreign counterpart Chinese Patent Application No. 201280024012.7, dated Nov. 6, 2017, 8 pages.
Final Office Action from U.S. Appl. No. 14/212,203, dated Sep. 12, 2017, 84 pages.
Final Office Action from U.S. Appl. No. 14/212,533, dated Sep. 8, 2017, 69 pages.
Final Office Action from U.S. Appl. No. 14/360,282, dated Oct. 4, 2017, 22 pages.
Non-Final Office Action from U.S. Appl. No. 15/354,857, dated Sep. 12, 2017, 111 pages.
Notice of Allowance from foreign counterpart Korean Patent Application No. 10-2015-7029262, dated Aug. 31, 2017, 3 pages.
Notice of Allowance from foreign counterpart Korean Patent Application No. 10-2016-7017150, dated Oct. 30, 2017, 3 pages.
Notice of Allowance from U.S. Appl. No. 14/213,692, dated Sep. 28, 2017, 112 pages.
Notice of Allowance from U.S. Appl. No. 14/213,730, dated Aug. 31, 2017, 96 pages.
Notice of Allowance from U.S. Appl. No. 14/214,045, dated Oct. 6, 2017, 137 pages.
Notice of Allowance from U.S. Appl. No. 14/214,176, dated Oct. 19, 2017, 25 pages.
Notice of Allowance from U.S. Appl. No. 14/215,633, dated Oct. 4, 2017, 15 pages.
Notice of Allowance from U.S. Appl. No. 14/733,827, dated Sep. 22, 2017, 30 pages.
Notice of Allowance from U.S. Appl. No. 15/257,593, dated Oct. 11, 2017, 95 pages.
Notice on Grant of Patent Right for Invention from foreign counterpart China Application No. 201180076244.2, dated Aug. 28, 2017, 4 pages.
Office Action and Search Report from foreign counterpart Taiwan Patent Application No. 100142885, dated Jan. 23, 2017, 12 pages.
Third Office Action from foreign counterpart China Patent Application No. 201280024054.0, dated Jul. 28, 2017, 8 pages.
Final Office Action from U.S. Appl. No. 14/360,280, dated Aug. 10, 2017, 103 pages.
Non-final Office Action from U.S. Appl. No. 15/354,742, dated Aug. 25, 2017, 152 pages.
Non-final Office Action from U.S. Appl. No. 15/357,943, dated Aug. 25, 2017, 111 pages.
Notice of Allowance from U.S. Appl. No. 13/428,438, dated Aug. 7, 2017, 42 pages.
Notice of Allowance from U.S. Appl. No. 14/213,135, dated Aug. 3, 2017, 103 pages.
Notice of Allowance from U.S. Appl. No. 14/216,493, dated Aug. 4, 2017, 95 pages.
First Office Action and Search Report from foreign counterpart Chinese Patent Application No. 201480024528.0, dated Jan. 26, 2018, 19 pages.
First Office Action and Search Report from foreign counterpart Chinese Patent Application No. 201480024832.5, dated Feb. 6, 2018, 15 pages. (Translation available only for office action).
Advisory Action from U.S. Appl. No. 14/212,203, dated Apr. 5, 2018, 8 pages.
Non-Final Office Action from U.S. Appl. No. 15/862,496, dated Apr. 5, 2018, 151 pages.
Notice of Allowance from U.S. Appl. No. 14/216,493, dated Apr. 2, 2018, 22 pages.
Advisory Action from U.S. Appl. No. 14/360,282, dated Jan. 23, 2018, 2 pages.
Corrected Notice of Allowance from U.S. Appl. No. 13/475,708, dated Feb. 14, 2018, 27 pages.
Corrected Notice of Allowance from U.S. Appl. No. 13/475,708, dated Feb. 26, 2018, 31 pages.
Corrected Notice of Allowance from U.S. Appl. No. 15/357,943, dated Apr. 2, 2018, 4 pages.
Non-Final Office Action from U.S. Appl. No. 13/824,013, dated Feb. 7, 2018, 141 pages.
Notice of Allowance from U.S. Appl. No. 15/219,063, dated Mar. 19, 2018, 28 pages.
Notice of Allowance from U.S. Appl. No. 15/357,943, dated Jan. 16, 2018, 16 pages.
Communication pursuant to Article 94(3) EPC for Application No. 14770976.0, dated Mar. 16, 2018, 4 pages.
Communication pursuant to Article 94(3) EPC for Application No. 11876128.7, dated Feb. 5, 2018, 9 pages.
Communication pursuant to Article 94(3) EPC for Application No. 11876130.3, dated Feb. 5, 2018, 9 pages.
Communication pursuant to Article 94(3) EPC for Application No. 11876314.3, dated Feb. 5, 2018, 8 pages.
Communication pursuant to Article 94(3) EPC for Application No. 12789667.8, dated Feb. 21, 2018, 4 pages.
Intention to grant from foreign counterpart European Patent Application No. 12788989.7, dated Feb. 23, 2018, 47 pages.
Notice on Grant of Patent Right for Invention from foreign counterpart Chinese Patent Application No. 201180076248.1, mailed Feb. 27, 2018, 6 pages.
Notice on Grant of Patent Right for Invention from foreign counterpart Chinese Patent Application No. 201280024012.7, mailed Mar. 12, 2018, 4 pages.
Related Publications (1)
Number Date Country
20160210145 A1 Jul 2016 US
Provisional Applications (1)
Number Date Country
61467944 Mar 2011 US
Continuations (1)
Number Date Country
Parent 13428440 Mar 2012 US
Child 15082359 US