The present invention is related to processing systems and processors, and more specifically to a pipelined processor core that includes execution slices having a recirculating load-store queue.
In present-day processor cores, pipelines are used to execute multiple hardware threads corresponding to multiple instruction streams, so that more efficient use of processor resources can be provided through resource sharing and by allowing execution to proceed even while one or more hardware threads are waiting on an event.
In existing processor cores, and in particular processor cores that are divided into multiple execution slices, instructions are dispatched to the execution slice(s) and are retained in the issue queue until issued to an execution unit. Once an issue queue is full, additional operations typically cannot be dispatched to a slice. Since the issue queue contains not only operations but also operands and state/control information, issue queues are resource-intensive, requiring significant power and die area to implement.
It would therefore be desirable to provide a processor core having reduced issue queue requirements.
The invention is embodied in a processor core, an execution unit circuit and a method. The method is a method of operating the processor core, which includes the execution unit circuit.
The execution unit circuit includes an issue queue that receives a stream of instructions including functional operations and load-store operations, and multiple execution pipelines including a load-store pipeline that computes effective addresses of load operations and store operations and issues the load operations and store operations to a cache unit. The execution unit circuit also includes a recirculation queue that stores entries corresponding to the load operations and the store operations, and control logic for controlling the issue queue, the load-store pipeline and the recirculation queue. The control logic operates so that, after the load-store pipeline has computed the effective address of a load operation or a store operation, the effective address is written to the recirculation queue and the load operation or store operation is removed from the issue queue. If one of the load operations or store operations is rejected by the cache unit, it is subsequently reissued to the cache unit from the recirculation queue.
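The control flow described above can be illustrated with a minimal behavioral sketch in Python. It is not a hardware description: the class and method names (`LoadStoreSliceModel`, `issue_next`, `reissue_rejected`) are hypothetical, the cache unit's accept/reject decision is modeled as a caller-supplied function, and the effective address is assumed to be a simple base-plus-offset sum. The sketch shows the key ordering: the issue-queue entry is freed as soon as the EA is computed, and a rejected operation is retried from the recirculation queue.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Optional


@dataclass
class IssueQueueEntry:
    """Issue-queue entry: holds the operands (base, offset) and any store data."""
    tag: int
    is_store: bool
    base: int
    offset: int
    store_value: Optional[int] = None


@dataclass
class RecircEntry:
    """Recirculation-queue entry: only the EA, plus the store value for stores."""
    tag: int
    is_store: bool
    ea: int
    store_value: Optional[int] = None


class LoadStoreSliceModel:
    """Hypothetical behavioral model of the issue queue / recirculation queue flow."""

    def __init__(self, cache_accepts: Callable[[RecircEntry], bool]) -> None:
        self.issue_queue: Deque[IssueQueueEntry] = deque()
        self.recirc_queue: List[RecircEntry] = []
        self.cache_accepts = cache_accepts  # stand-in for the cache unit's accept/reject
        self.completed: List[int] = []

    def dispatch(self, entry: IssueQueueEntry) -> None:
        self.issue_queue.append(entry)

    def issue_next(self) -> None:
        """Compute the EA, free the issue-queue entry, and send the op to the cache."""
        op = self.issue_queue.popleft()  # issue-queue entry is released here
        ea = op.base + op.offset  # assumed base + offset EA computation
        entry = RecircEntry(op.tag, op.is_store, ea, op.store_value)
        self.recirc_queue.append(entry)
        self._send_to_cache(entry)

    def reissue_rejected(self) -> None:
        """Retry every pending op from the recirculation queue, never the issue queue."""
        for entry in list(self.recirc_queue):
            self._send_to_cache(entry)

    def _send_to_cache(self, entry: RecircEntry) -> None:
        if self.cache_accepts(entry):
            self.recirc_queue.remove(entry)
            self.completed.append(entry.tag)
        # On a reject, the entry simply stays in recirc_queue for a later reissue.


if __name__ == "__main__":
    attempts: dict = {}

    def reject_first_try(entry: RecircEntry) -> bool:
        """Toy cache unit that rejects the first attempt of every operation."""
        attempts[entry.tag] = attempts.get(entry.tag, 0) + 1
        return attempts[entry.tag] > 1

    model = LoadStoreSliceModel(reject_first_try)
    model.dispatch(IssueQueueEntry(tag=1, is_store=False, base=0x1000, offset=8))
    model.dispatch(IssueQueueEntry(tag=2, is_store=True, base=0x2000, offset=4, store_value=42))
    model.issue_next()
    model.issue_next()
    print(len(model.issue_queue), [hex(e.ea) for e in model.recirc_queue])  # 0 ['0x1008', '0x2004']
    model.reissue_rejected()
    print(model.completed, model.recirc_queue)  # [1, 2] []
```

After both operations are rejected once, the issue queue is already empty; only the compact recirculation-queue entries remain until the reissue succeeds.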
The foregoing and other objectives, features, and advantages of the invention will be apparent from the following, more particular, description of the preferred embodiment of the invention, as illustrated in the accompanying drawings.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of the invention when read in conjunction with the accompanying Figures, wherein like reference numerals indicate like components, and:
The present invention relates to an execution slice for inclusion in a processor core that manages an internal issue queue by moving load/store (LS) operation entries to a recirculation queue once the effective address (EA) of the LS operation has been computed. The LS operations are issued to a cache unit, and if they are rejected, they are subsequently reissued from the recirculation queue rather than from the original issue queue entry. Since recirculation queue entries require storage only for the EA of load operations, and for the EA and store value of store operations, power and area requirements are reduced for a given number of pending LS operations in the processor. In contrast, issue queue entries are costly in terms of area and power because they must store operands, relative addresses and other fields, such as conditional flags, that are not needed for executing the LS operations once the EA is resolved.
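To make the storage argument concrete, the following sketch compares the bits needed to hold pending LS operations in full issue-queue entries versus recirculation-queue entries. Every field width is a hypothetical value chosen only for illustration; the description gives no widths, and a practical recirculation-queue entry would also carry some tag and valid bits that this sketch omits.

```python
from typing import Dict, Tuple

# Hypothetical field widths in bits, for illustration only.
ISSUE_QUEUE_ENTRY: Dict[str, int] = {
    "opcode_and_control": 32,
    "operand_a": 64,
    "operand_b": 64,
    "store_data_operand": 64,
    "relative_address": 16,
    "condition_flags": 8,
}
# Per the description: loads keep only the EA; stores keep the EA and the store value.
RECIRC_LOAD_ENTRY: Dict[str, int] = {"effective_address": 64}
RECIRC_STORE_ENTRY: Dict[str, int] = {"effective_address": 64, "store_data": 64}


def entry_bits(fields: Dict[str, int]) -> int:
    """Total width of one queue entry."""
    return sum(fields.values())


def pending_storage_bits(num_loads: int, num_stores: int) -> Tuple[int, int]:
    """Bits to hold the pending LS ops as issue-queue vs. recirculation-queue entries."""
    in_issue_queue = (num_loads + num_stores) * entry_bits(ISSUE_QUEUE_ENTRY)
    in_recirc_queue = (num_loads * entry_bits(RECIRC_LOAD_ENTRY)
                       + num_stores * entry_bits(RECIRC_STORE_ENTRY))
    return in_issue_queue, in_recirc_queue


if __name__ == "__main__":
    print(pending_storage_bits(16, 16))  # (7936, 3072)
```

With these assumed widths, 16 pending loads and 16 pending stores occupy 7,936 bits as issue-queue entries but 3,072 bits as recirculation-queue entries; the actual ratio depends entirely on the widths of a given design.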
Referring now to
Referring now to
The load-store portion of the instruction execution cycle (i.e., the operations performed to maintain cache consistency, as opposed to internal register reads/writes) is performed by a plurality of load-store (LS) slices LS0-LS7, which manage load and store operations between instruction execution slices ES0-ES7 and a cache memory formed by a plurality of cache slices CS0-CS7, which are partitions of a lowest-order cache memory. In the depicted embodiment, cache slices CS0-CS3 are assigned to partition CLA and cache slices CS4-CS7 are assigned to partition CLB, and each of load-store slices LS0-LS7 manages access to a corresponding one of cache slices CS0-CS7 via a corresponding one of dedicated memory buses 40. In other embodiments, there may not be a fixed partitioning of the cache, and individual cache slices CS0-CS7, or sub-groups of the entire set of cache slices, may be coupled to more than one of load-store slices LS0-LS7 by implementing memory buses 40 as a shared memory bus or buses. Load-store slices LS0-LS7 are coupled to instruction execution slices ES0-ES7 by a write-back (result) routing network 37 for returning result data from corresponding cache slices CS0-CS7, such as in response to load operations. Write-back routing network 37 also communicates write-back results between instruction execution slices ES0-ES7. The handling of load/store (LS) operations between instruction execution slices ES0-ES7, load-store slices LS0-LS7 and cache slices CS0-CS7 is described in further detail below with reference to
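The slice-to-cache topology of the depicted embodiment can be summarized as a small lookup, sketched below in Python. The function names are hypothetical, and the `shared_groups` parameter is an illustrative assumption standing in for the shared-bus embodiments, whose grouping the description does not specify.

```python
from typing import List, Optional, Sequence

NUM_SLICES = 8  # LS0-LS7 and CS0-CS7 in the depicted embodiment


def cache_partition(cs: int) -> str:
    """Depicted embodiment: CS0-CS3 form partition CLA, CS4-CS7 form partition CLB."""
    if not 0 <= cs < NUM_SLICES:
        raise ValueError(f"no cache slice CS{cs}")
    return "CLA" if cs < 4 else "CLB"


def reachable_cache_slices(
    ls: int, shared_groups: Optional[Sequence[Sequence[int]]] = None
) -> List[int]:
    """Cache slices that load-store slice LS<ls> can access.

    Without shared_groups, each LSn has a dedicated memory bus to CSn.
    With shared_groups (a hypothetical model of shared memory buses), LSn can
    reach every cache slice in the group containing index n; an ungrouped
    slice falls back to its dedicated bus.
    """
    if not 0 <= ls < NUM_SLICES:
        raise ValueError(f"no load-store slice LS{ls}")
    if shared_groups is None:
        return [ls]
    for group in shared_groups:
        if ls in group:
            return sorted(group)
    return [ls]
```

For example, `reachable_cache_slices(5)` returns `[5]` for the dedicated-bus case, while `reachable_cache_slices(5, [[0, 1, 2, 3], [4, 5, 6, 7]])` returns `[4, 5, 6, 7]` under the assumed grouping.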
Instruction execution slices ES0-ES7 may issue internal instructions concurrently to multiple pipelines, e.g., an instruction execution slice may simultaneously perform an execution operation and a load/store operation and/or may execute multiple arithmetic or logical operations using multiple internal pipelines. The internal pipelines may be identical, or may be of discrete types, such as floating-point, scalar, load/store, etc. Further, a given execution slice may have more than one port connection to write-back routing network 37; for example, one port connection may be dedicated to load-store connections to load-store slices LS0-LS7, or may provide the function of AGEN bus 38 and/or store data bus 39, while another port may be used to communicate values to and from other slices, such as special-purpose slices or other instruction execution slices. Write-back results are scheduled from the various internal pipelines of instruction execution slices ES0-ES7 to the write-back port(s) that connect instruction execution slices ES0-ES7 to write-back routing network 37. Cache slices CS0-CS7 are coupled to a next higher-order level of cache or system memory via I/O bus 41, which may be integrated within, or external to, processor core 20. While the illustrated example shows a matching number of load-store slices LS0-LS7 and execution slices ES0-ES7, in practice a different number of each type of slice can be provided according to the resource needs of a particular implementation.
Within processor core 20, an instruction sequencer unit (ISU) 30 includes an instruction flow and network control block 57 that controls dispatch routing network 36, write-back routing network 37, AGEN bus 38 and store data bus 39. Network control block 57 also coordinates the operation of execution slices ES0-ES7 and load-store slices LS0-LS7 with the dispatch of instructions from dispatch queues Disp0-Disp7. In particular, instruction flow and network control block 57 selects between configurations of execution slices ES0-ES7 and load-store slices LS0-LS7 within processor core 20 according to one or more mode control signals that allocate the use of execution slices ES0-ES7 and load-store slices LS0-LS7 by a single thread in one or more single-threaded (ST) modes, and by multiple threads in one or more multi-threaded (MT) modes, which may be simultaneous multi-threaded (SMT) modes. For example, in the configuration shown in
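As an illustration of mode-dependent allocation, the sketch below assigns execution slices and load-store slices to threads. The even, contiguous split (for example ES0-ES3 and LS0-LS3 to one thread and ES4-ES7 and LS4-LS7 to the other in a two-thread mode) is a hypothetical policy chosen for the example, not the configuration of any particular embodiment; the function names are likewise hypothetical. Separate slice counts are accepted because the number of load-store slices need not match the number of execution slices.

```python
from typing import Dict, List


def split_evenly(names: List[str], num_threads: int) -> List[List[str]]:
    """Split a list of slice names into num_threads contiguous, equal groups."""
    if num_threads < 1 or len(names) % num_threads != 0:
        raise ValueError("num_threads must evenly divide the number of slices")
    size = len(names) // num_threads
    return [names[i * size:(i + 1) * size] for i in range(num_threads)]


def allocate_slices(
    num_threads: int, num_es: int = 8, num_ls: int = 8
) -> Dict[int, Dict[str, List[str]]]:
    """Hypothetical mode-based allocation of execution and load-store slices.

    num_threads == 1 models a single-threaded (ST) mode, in which one thread
    uses every slice; larger values model multi-threaded (MT/SMT) modes.
    """
    es_groups = split_evenly([f"ES{i}" for i in range(num_es)], num_threads)
    ls_groups = split_evenly([f"LS{i}" for i in range(num_ls)], num_threads)
    return {t: {"ES": es_groups[t], "LS": ls_groups[t]} for t in range(num_threads)}


if __name__ == "__main__":
    for threads in (1, 2, 4):
        print(threads, allocate_slices(threads))
```

A policy that does not divide the slices evenly, such as three threads over eight slices, is rejected here with a `ValueError`; a real control block could of course support uneven allocations.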
Referring now to
Referring now to
Referring now to
Execution slice 42AA includes multiple internal execution pipelines 74A-74C and 72 that support out-of-order and simultaneous execution of instructions for the instruction stream corresponding to execution slice 42AA. The instructions executed by execution pipelines 74A-74C and 72 may be internal instructions implementing portions of instructions received over dispatch routing network 32, or may be instructions received directly over dispatch routing network 32, i.e., the pipelining of the instructions may be supported by the instruction stream itself, or the decoding of instructions may be performed upstream of execution slice 42AA. Execution pipeline 72 is a load-store (LS) pipeline that executes LS instructions, i.e., computes effective addresses (EAs) from one or more operands. A recirculation queue (DARQ) 78 is controlled according to logic as illustrated above with reference to
Referring now to
While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention.
The present Application is a Continuation of U.S. patent application Ser. No. 16/049,038, filed on Jul. 30, 2018 and claims priority thereto under 35 U.S.C. § 120. The disclosure of the above-referenced parent U.S. Patent Application is incorporated herein by reference.
Prior publication data:

Number | Date | Country |
---|---|---|
20210406023 A1 | Dec 2021 | US |

Related U.S. application data:

Relation | Number | Date | Country |
---|---|---|---|
Parent | 16049038 | Jul 2018 | US |
Child | 17467882 | | US |
Parent | 14595635 | Jan 2015 | US |
Child | 16049038 | | US |