A central processing unit (CPU) or graphics processing unit (GPU) of a computer may include a microprocessor. The microprocessor may be configured to execute code compiled to its native instruction-set architecture (ISA) in addition to certain non-native ISAs.
When the microprocessor encounters non-native instructions, blocks of the non-native instructions may be converted to native instructions and may also be optimized—e.g., to increase speed of execution. Optimized blocks of native instructions corresponding to the original non-native instructions may be stored in an instruction cache for future use. However, code optimization may require significant computational effort. Optimizing every code block encountered by the microprocessor may present an unacceptable performance overhead in some systems.
Aspects of this disclosure will now be described by example and with reference to the illustrated embodiments listed above. Components that may be substantially the same in one or more embodiments are identified coordinately and are described with minimal repetition. It will be noted, however, that elements identified coordinately may also differ to some degree. The claims appended to this description uniquely define the subject matter claimed herein. The claims are not limited to the example structures or numerical ranges set forth below, nor to implementations that address the herein-identified problems or disadvantages of the current state of the art.
Instruction memory 14 and data memory 16 may each be readable and writable by the microprocessor through a hierarchical memory cache system. In the illustrated embodiment, the memory cache system includes an off-core, level-three (L3) cache 20 and an on-core, level-two (L2) cache 22, in addition to instruction- and data-specific level-one (L1) caches, as described below. In other embodiments, the memory cache system may include any number of levels, with the levels residing on- or off-chip. The memory cache system may be operatively coupled to a memory controller (not shown in the drawings) which can also be on- or off-chip. Embodied in random-access memory of any suitable variant, the instruction and data memories may correspond to different physical memory structures or to different parts of the same physical memory structure. In some embodiments, the instruction and data memories may also include read-only memory (ROM).
Continuing in
IFU 24 may be configured to retrieve instruction code of various forms. In addition to instructions natively executable by the execution units of core 18, the instruction fetch unit may also retrieve instructions compiled to a non-native ISA. Such non-native instructions may require decoding or translation into the native ISA before they can be recognized by the execution units. To this end, processing system 10 includes hardware decoder 34. When the IFU retrieves a non-native instruction, it routes that instruction to execution units 40 through the hardware decoder. When it retrieves a native instruction, that instruction is routed directly to the execution units, bypassing the hardware decoder. The execution units may include integer and/or floating-point componentry, for example.
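The routing decision described above can be summarized in the short C sketch below. The type and helper names are illustrative assumptions rather than structures defined in this disclosure; the point is only that native instructions bypass the decoder while non-native instructions pass through it.

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative instruction record: a raw encoding plus a flag telling
 * whether it already conforms to the native ISA. */
typedef struct {
    unsigned raw;
    bool     is_native;
} fetched_insn;

/* Stand-in for the execution units. */
static void execute_native(unsigned native_encoding) {
    printf("execute: %#x\n", native_encoding);
}

/* Stand-in for the hardware decoder: a real decoder parses op-codes,
 * operands, and addressing modes; here we only tag the encoding to show
 * it took the decoder path. */
static unsigned hardware_decode(unsigned non_native_encoding) {
    return non_native_encoding | 0x80000000u;
}

/* Routing performed conceptually by the IFU. */
static void route(fetched_insn i) {
    if (i.is_native)
        execute_native(i.raw);                   /* bypass the decoder */
    else
        execute_native(hardware_decode(i.raw));  /* decode first       */
}

int main(void) {
    route((fetched_insn){ .raw = 0x1234, .is_native = true  });
    route((fetched_insn){ .raw = 0x5678, .is_native = false });
    return 0;
}
```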
Hardware decoder 34 is a logic structure arranged in core 18 and configured to selectively decode instructions for execution in the core. In particular, the hardware decoder decodes non-native instructions retrieved by IFU 24. The hardware decoder parses op-codes, operands, and addressing modes of the non-native instructions, and creates a functionally equivalent, but non-optimized set of native instructions.
Continuing in
As instructions are executed in the execution units of core 18, a sequence of logical and/or arithmetic results evolves therein. The write-back logic of the execution units stores these results in the appropriate registers of the core. In some embodiments, memory access 42 may have the exclusive task of enacting store and load operations to and from data memory 16, via L1 data cache 46.
The basic functionality of processing system 10 can be represented in the form of a processing pipeline.
In some scenarios, pipeline 50 may process only one instruction at a time. The instruction being processed may occupy only one stage of the pipeline, leaving the remaining stages unused during one or more clock cycles. For increased instruction throughput, two or more stages of the pipeline may be used simultaneously to process two or more instructions. In ideal ‘scalar’ execution, a first instruction may be fetched, a second instruction decoded, a result of a third instruction computed, that of a fourth instruction committed to memory, and that of a fifth instruction written back to the register file, all in the same clock cycle. No aspect of
As noted above, processing system 10 may be configured to execute instructions conforming to one or more non-native ISAs in addition to the native ISA of microprocessor 12. One illustrative example of a non-native ISA that processing system 10 may be configured to execute is the 64-bit Advanced RISC Machine (ARM) instruction set; another is the x86 instruction set. Indeed, the full range of non-native ISAs here contemplated includes reduced instruction-set computing (RISC) and complex instruction-set computing (CISC) ISAs, very long instruction-word (VLIW) ISAs, and the like. The ability to execute selected non-native instructions provides a practical advantage for the processing system, in that it may be used to execute code compiled for pre-existing processing systems.
Returning now to
Optionally and selectively, translator 62 may optimize as well as translate a specified block 63 of non-native instructions. In particular, the non-native instructions may be converted into a functionally equivalent block 64 of native instructions, optimized for speed of execution in processing system 10. Alternatively, or in addition, the translated instructions may be optimized to reduce power consumption. In the embodiments considered herein, various modes of optimization may be available to the translator. These include features common in so-called out-of-order processing systems, such as register renaming and instruction re-ordering, in which individual instructions of the optimized block are resequenced relative to corresponding instructions of the non-native block. These features are set forth as non-limiting examples; the translator may employ a wide variety of techniques to produce optimized native translations. Moreover, it will be noted that the term ‘block’ as used herein can refer to a sequence of instructions of virtually any length; it is not limited to the so-called ‘basic block’ as known in the art.
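As a concrete illustration of one of the optimization modes named above, the following C sketch performs a minimal form of register renaming on a toy instruction block: each new destination receives a fresh physical register, and later readers are rewritten through the mapping, removing false dependences. The instruction format and register counts are assumptions of the sketch; instruction re-ordering and the translator's other techniques are not shown.

```c
#include <stdio.h>

/* Toy three-operand instruction: dst = src1 + src2 (registers as small ints). */
typedef struct { int dst, src1, src2; } insn;

#define NARCH 8   /* architectural registers r0..r7 (assumption for the sketch) */

/* Minimal register renaming: every new destination gets a fresh physical
 * register, and later readers are rewritten through the current mapping. */
static void rename_block(insn *code, int n) {
    int map[NARCH];
    int next_phys = NARCH;               /* physical names start after r7 */
    for (int r = 0; r < NARCH; r++) map[r] = r;

    for (int i = 0; i < n; i++) {
        code[i].src1 = map[code[i].src1];       /* read through the map     */
        code[i].src2 = map[code[i].src2];
        map[code[i].dst] = next_phys++;         /* fresh name for new value */
        code[i].dst = map[code[i].dst];
    }
}

int main(void) {
    /* r1 = r2 + r3 ; r1 = r4 + r5 : after renaming, the second write no
     * longer clobbers the first, so the two could be scheduled freely. */
    insn block[] = { {1, 2, 3}, {1, 4, 5} };
    rename_block(block, 2);
    for (int i = 0; i < 2; i++)
        printf("p%d = p%d + p%d\n", block[i].dst, block[i].src1, block[i].src2);
    return 0;
}
```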
In some embodiments, translation manager 48 may be configured to store the translated and optimized code block 64 in trace cache 66. In the embodiment illustrated in
More particularly, IFU 24, on retrieving a non-native instruction, may supply the address of that instruction to THASH 32. The THASH correlates the address of the non-native instruction with the address of the corresponding optimized translation, if such a translation exists. If there is a hit in the THASH, the address of the optimized translation is returned to the IFU, which in turn retrieves the optimized translation from trace cache 66 using that address. The translation is then piped through for execution in the execution units of core 18 without use of hardware decoder 34. At the boundaries of each block of optimized, native code, the translation manager makes available to the programmer a fully compatible set of architectural state.
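As an illustration of the lookup just described, the sketch below models a direct-mapped THASH in C: it maps the address of a non-native instruction to the trace-cache address of its optimized translation, reporting a hit or a miss. The table size, hash function, and field names are assumptions of the sketch rather than details taken from this disclosure.

```c
#include <stdint.h>
#include <stdio.h>

#define THASH_ENTRIES 256   /* arbitrary size chosen for the sketch */

typedef struct {
    uint64_t guest_addr;    /* address of the non-native instruction */
    uint64_t trace_addr;    /* address of its optimized translation  */
    int      valid;
} thash_entry;

static thash_entry thash[THASH_ENTRIES];

static unsigned thash_index(uint64_t guest_addr) {
    return (unsigned)((guest_addr >> 2) & (THASH_ENTRIES - 1));
}

/* Install a correlation: guest address -> trace-cache address. */
static void thash_install(uint64_t guest, uint64_t trace) {
    thash_entry *e = &thash[thash_index(guest)];
    e->guest_addr = guest;
    e->trace_addr = trace;
    e->valid = 1;
}

/* On a hit, the IFU would redirect fetch to the returned trace address;
 * on a miss it falls back to the hardware-decoder path. */
static int thash_lookup(uint64_t guest, uint64_t *trace_out) {
    thash_entry *e = &thash[thash_index(guest)];
    if (e->valid && e->guest_addr == guest) {
        *trace_out = e->trace_addr;
        return 1;   /* hit  */
    }
    return 0;       /* miss */
}

int main(void) {
    uint64_t trace;
    thash_install(0x40001000, 0x90000080);
    if (thash_lookup(0x40001000, &trace))
        printf("hit: fetch translation at %#llx\n", (unsigned long long)trace);
    if (!thash_lookup(0x40002000, &trace))
        printf("miss: use hardware decoder\n");
    return 0;
}
```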
At 72 of method 70, a block of non-native instruction code is retrieved from instruction memory 14 through the IFU of a microprocessor core. In one embodiment, the instructions retrieved may comprise a code block starting at a branch-target address. At 74 it is determined whether hardware decoding is desired for this block of code. Hardware decoding may be preferred over software translation when the optimization aspect of the software translation is expected to provide relatively little improvement in overall performance. For example, hardware decoding may be preferred if it is predicted that the code block will be executed infrequently, or that there will be an especially high overhead associated with optimized translation. If hardware decoding is desired, then the method advances to 76, where the block retrieved is passed through hardware decoder 34 and decoded for execution in core 18. However, if hardware decoding is not desired, then the method advances to 78, where the block is submitted to translation manager 48. Following appropriate conversion in either the hardware decoder or the translation manager, the native code corresponding to the retrieved block of non-native code is executed, at 80. Thus, instructions translated by the translator are executed without further processing by the hardware decoder. The method then returns to 72, where a subsequent block of non-native code is retrieved.
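The following C fragment restates method 70 as straight-line code, purely to make the decision at 74 explicit. The predicate used to choose hardware decoding is an arbitrary stand-in; the disclosure leaves the actual policy open (e.g., predicted execution frequency or translation overhead).

```c
#include <stdbool.h>
#include <stdio.h>

/* Stand-in policy for the decision at 74: in practice this could reflect
 * predicted execution frequency or expected translation overhead. */
static bool hardware_decoding_desired(unsigned block_addr) {
    return (block_addr & 1u) != 0;   /* arbitrary placeholder predicate */
}

static void hardware_decode_and_execute(unsigned block_addr) {    /* 76, 80 */
    printf("decode block %#x in hardware and execute\n", block_addr);
}

static void submit_to_translation_manager(unsigned block_addr) {  /* 78, 80 */
    printf("translate block %#x in software and execute\n", block_addr);
}

static void process_block(unsigned block_addr) {                  /* 72 */
    if (hardware_decoding_desired(block_addr))                     /* 74 */
        hardware_decode_and_execute(block_addr);
    else
        submit_to_translation_manager(block_addr);
}

int main(void) {
    unsigned branch_targets[] = { 0x1000, 0x1001, 0x2000 };
    for (int i = 0; i < 3; i++)
        process_block(branch_targets[i]);
    return 0;
}
```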
Naturally, there is a performance overhead associated with creating an optimized translation using the translation manager, which is a software structure. Further, the performance benefit of any optimization may scale with the frequency with which the optimized code is executed in lieu of slower, non-optimized code. It may be advantageous, therefore, to submit frequently executed code for optimization and to decode infrequently executed code in hardware, without optimization.
In principle, a software data structure such as an array may be used to keep track of the frequency of execution of the various blocks of non-native code in instruction memory 14. This array could be stored in off-core memory and could pair commonly encountered branch-target addresses with data representing how often the corresponding branches were taken. However, this approach is costly at runtime, because every branch instruction encountered by the microprocessor could potentially require redirection to the translation manager merely to update the array.
To address this issue and provide still other advantages, processing system 10 includes, as shown in
The registers of BCT 84A are addressable for reading and writing by translation manager 48. Each register 88A is addressable for reading via a read index and for writing via a write index. As noted above, each non-native code block to be translated and optimized may start at a branch-target address. The various registers of the BCT are addressable, accordingly, through one or more hashed forms of the branch-target address. The hashing feature enables mapping of a manageable number of BCT registers to a much larger number of branch target addresses. In the embodiment illustrated in
When a read-enabled register receives a signal indicating that a branch is being taken—e.g., from a common clock line—that signal causes the contents of the register to be output, via a DOUT bus, to decrement unit 96. The decrement unit subtracts one from the tally received therein and provides the decremented tally to the DIN bus. The decremented tally is rewritten back to the same register when that register is write-enabled via selection logic 94. If the decremented value equals zero, then interrupt-on-zero unit 98 generates an interrupt in core 18, which is received by translation manager 48 and may trigger subsequent action, as further described herein. A tally of zero is reached when the branch corresponding to that register has been taken a desired number of times—i.e., when the branch is ‘saturated’. In this manner, the BCT may be configured to raise an interrupt in microprocessor 12 when any of the registers reach zero, and to make the branch-target address of the saturating branch available to an interrupt-trapping translation-manager procedure.
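To make the counting mechanism concrete, the following C sketch models the behavior just described: a small file of down-counting registers indexed by a hash of the branch-target address, with an interrupt-style callback raised when a tally reaches zero. The table size, hash function, and initial tally are assumptions made for the sketch, not values given in this disclosure.

```c
#include <stdint.h>
#include <stdio.h>

#define BCT_REGS      64   /* number of BCT registers (assumption)            */
#define INITIAL_TALLY 16   /* a branch 'saturates' after this many decodes    */

static int bct[BCT_REGS];

/* Hashes the branch-target address down to a register index. */
static unsigned bct_index(uint64_t branch_target) {
    return (unsigned)((branch_target ^ (branch_target >> 6)) & (BCT_REGS - 1));
}

/* Tallies seeded by the translation manager in the system described. */
static void bct_init(void) {
    for (int i = 0; i < BCT_REGS; i++)
        bct[i] = INITIAL_TALLY;
}

/* Stand-in for the interrupt-on-zero unit: in hardware this raises an
 * interrupt that is trapped by the translation manager. */
static void raise_saturation_interrupt(uint64_t branch_target) {
    printf("BCT saturated for %#llx: request optimized translation\n",
           (unsigned long long)branch_target);
}

/* Called each time a branch to 'target' is taken and decoded in hardware:
 * read the tally, decrement it, write it back, and signal on zero. */
static void bct_branch_taken(uint64_t target) {
    unsigned i = bct_index(target);
    if (bct[i] > 0 && --bct[i] == 0)
        raise_saturation_interrupt(target);
}

int main(void) {
    bct_init();
    for (int n = 0; n < INITIAL_TALLY; n++)
        bct_branch_taken(0x40001000);   /* saturates on the final iteration */
    return 0;
}
```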
Returning now to
At 108 of method 78 it is determined whether the branch-target address is already stored in THASH 32. If the branch-target address is already stored in the THASH, then the method advances to 110, where the IP is redirected to the optimized translation corresponding to the non-native code block received. In this manner, the optimized native code block is executed by the processing system without further use of the hardware decoder.
However, if the branch-target address is not already stored in the trace cache, then the method advances to 112. At 112 various bits of the branch-target address are hashed to create a read index and a write index to the appropriate register of BCT 84. In one embodiment, the read index and the write index may be different indices. At 114 an entry corresponding to the read index is read from the BCT. At 116 this entry is decremented in value—e.g., decremented by one. At 118 it is determined whether the decremented value is equal to zero. The presence of a zero value in the register may be a condition for causing the BCT to invoke the translator. In one non-limiting example, the translator may be invoked by raising an interrupt in the processing system.
In the illustrated example, if the decremented value is equal to zero, then the method advances to 120; otherwise the method advances to 122, where the decremented value is written back to the BCT at the write index computed at 112. Then, at 123, the non-native block of code is decoded in the hardware decoder and executed. In this manner, an appropriate register of the BCT tallies how many times the hardware decoder has decoded the code block.
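The sketch below restates method 78 as straight-line C, with stubs standing in for the THASH, the BCT, and the translator. The step numbers in the comments refer to the description above; the initial tally of three is an arbitrary value chosen for the sketch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static int bct_regs[64];                         /* 0 means 'not yet armed' here */

static bool thash_has_translation(uint64_t t)    { (void)t; return false; }
static void redirect_ip_to_translation(uint64_t t) { (void)t; printf("run optimized translation\n"); }
static unsigned bct_read_index(uint64_t t)       { return (unsigned)(t & 63); }
static unsigned bct_write_index(uint64_t t)      { return (unsigned)(t & 63); }
static int  bct_read(unsigned i)                 { return bct_regs[i] ? bct_regs[i] : 3; }
static void bct_write(unsigned i, int v)         { bct_regs[i] = v; printf("BCT[%u] <- %d\n", i, v); }
static void invoke_translator(uint64_t t)        { printf("interrupt: translate %#llx\n", (unsigned long long)t); }
static void hw_decode_and_execute(uint64_t t)    { printf("hardware decode %#llx\n", (unsigned long long)t); }

static void method_78(uint64_t branch_target) {
    if (thash_has_translation(branch_target)) {   /* 108 */
        redirect_ip_to_translation(branch_target); /* 110 */
        return;
    }
    unsigned ri = bct_read_index(branch_target);   /* 112 */
    unsigned wi = bct_write_index(branch_target);
    int tally = bct_read(ri) - 1;                  /* 114, 116 */
    if (tally == 0) {                              /* 118 */
        invoke_translator(branch_target);          /* 120 */
    } else {
        bct_write(wi, tally);                      /* 122 */
        hw_decode_and_execute(branch_target);      /* 123 */
    }
}

int main(void) {
    for (int n = 0; n < 3; n++)                    /* third call saturates */
        method_78(0x40001000);
    return 0;
}
```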
It will be noted that the operational details of method 78 should not be understood in a limiting sense, for numerous variations are contemplated as well. At 116, for instance, the contents of the register may be incremented instead of decremented. In some examples, accordingly, underflow or overflow of the register may be a condition for invoking the translator.
Continuing in
At 126 it is determined whether translation and optimization of the code block is already in progress. If translation and optimization are already in progress, then the method advances to 123, where the hardware decoder is invoked to avoid having to wait for the optimization to be completed; otherwise, the method advances to 128. In this and other embodiments, the decision at 126 may reflect other conditions that influence whether a non-native code block should or should not be translated/optimized at this point.
At 128 the code block is translated and optimized to generate an optimized, native code block using translator 62 (
At 130 the optimized native code block is stored in trace cache 66 for subsequent execution in the processing system. From this point, execution of the method continues at 110, where the IP is redirected to the optimized native code.
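Finally, a sketch of the translation-manager side of this flow, covering the decisions at 126 through 130 and the redirect at 110. The helper names, the pretend trace-cache address, and the explicit THASH installation are assumptions made for illustration; in the system described, this logic runs as software trapped on the BCT interrupt.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static bool translation_in_progress(uint64_t target) {
    (void)target;
    return false;                            /* stub: nothing in progress */
}

static uint64_t translate_and_optimize(uint64_t target) {     /* step 128 */
    printf("translate and optimize block at %#llx\n", (unsigned long long)target);
    return 0x90000000ull | (target & 0xffffu);   /* pretend trace-cache address */
}

static void store_in_trace_cache(uint64_t trace_addr) {        /* step 130 */
    printf("store optimized block at %#llx\n", (unsigned long long)trace_addr);
}

static void install_thash_entry(uint64_t guest, uint64_t trace) {
    /* assumed step: record guest -> trace so future fetches hit at 108 */
    printf("THASH: %#llx -> %#llx\n",
           (unsigned long long)guest, (unsigned long long)trace);
}

static void redirect_ip(uint64_t trace_addr) {                 /* step 110 */
    printf("redirect IP to %#llx\n", (unsigned long long)trace_addr);
}

static void hardware_decode_and_execute(uint64_t target) {     /* step 123 */
    (void)target;
    printf("translation already under way: fall back to hardware decode\n");
}

static void on_translator_invocation(uint64_t branch_target) {
    if (translation_in_progress(branch_target)) {              /* step 126 */
        hardware_decode_and_execute(branch_target);
        return;
    }
    uint64_t trace = translate_and_optimize(branch_target);
    store_in_trace_cache(trace);
    install_thash_entry(branch_target, trace);
    redirect_ip(trace);
}

int main(void) {
    on_translator_invocation(0x40001000);
    return 0;
}
```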
It will be understood that the systems and methods described hereinabove are embodiments of this disclosure—non-limiting examples for which numerous variations and extensions are contemplated as well. Accordingly, this disclosure includes all novel and non-obvious combinations and sub-combinations of such systems and methods, as well as any and all equivalents thereof.