1. Field of the Invention
The present invention is related to processing systems and processors, and more specifically to a pipelined processor core with dynamic instruction stream mapping.
2. Description of Related Art
In present-day processor cores, pipelines are used to execute multiple hardware threads corresponding to multiple instruction streams, so that more efficient use of processor resources can be provided through resource sharing and by allowing execution to proceed even while one or more hardware threads are waiting on an event.
In existing systems, specific resources and pipelines are typically allocated for execution of the different instruction streams and multiple pipelines allow program execution to continue even during conditions when a pipeline is busy. However, resources are still tied up for pipelines that are busy, and when all the pipeline(s) assigned to an instruction stream are busy, the instruction stream is stalled, reducing the potential throughput of the processor core.
It would therefore be desirable to provide a method for processing program instructions that provides improved flexibility and throughput.
The invention is embodied in a method of operation of a processor core.
The processor core includes multiple parallel instruction execution slices for executing multiple instruction streams in parallel and multiple dispatch queues coupled by a dispatch routing network to the execution slices. The method controls the dispatch routing network such that the relationship between the dispatch queues and the instruction execution slices is dynamically varied according to execution requirements for the instruction streams and the availability of resources in the instruction execution slices. In some embodiments, the instruction execution slices may be dynamically reconfigured as between single-instruction-multiple-data (SIMD) instruction execution and ordinary instruction execution on a per-instruction basis, permitting the mixture of those instruction types. In other embodiments, instructions having an operand width greater than the width of a single instruction execution slice may be processed by multiple instruction execution slices dynamically configured to act in concert for the particular instructions requiring greater operand width. In other embodiments, when an instruction execution slice is busy processing one or more previously accepted instructions for one of the streams, another instruction execution slice can be selected to perform execution of a next instruction for the stream, permitting an instruction stream to proceed with execution even while one of the instruction execution slices is stalled.
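The last behavior summarized above, selecting another instruction execution slice when the one previously used by a stream is busy, can be illustrated with a minimal software sketch. It is not the disclosed hardware: the names `ExecutionSlice`, `DispatchQueue`, `route`, and the `preferred` field are hypothetical, and the selection policy (try the previously used slice first, then any free slice in index order) is an assumption made only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExecutionSlice:
    name: str
    busy: bool = False  # True while the slice is occupied by earlier instructions


@dataclass
class DispatchQueue:
    stream_id: int
    preferred: int  # hypothetical: index of the slice this stream used last
    pending: List[str] = field(default_factory=list)


def route(queue: DispatchQueue, slices: List[ExecutionSlice]) -> Optional[int]:
    """Pick a slice for the stream's next instruction.

    Returns the index of the selected slice, or None when the queue is empty
    or every slice is busy (the stream then waits).
    """
    if not queue.pending:
        return None
    # Assumed policy: previously used slice first, then the others in order.
    order = [queue.preferred] + [
        i for i in range(len(slices)) if i != queue.preferred
    ]
    for i in order:
        if not slices[i].busy:
            queue.pending.pop(0)   # the instruction leaves the dispatch queue
            queue.preferred = i    # remember where the stream now executes
            return i
    return None
```

In this sketch a stream whose usual slice is stalled continues on any free slice, and stalls only when no slice at all is available.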
The foregoing and other objectives, features, and advantages of the invention will be apparent from the following, more particular, description of the preferred embodiment of the invention, as illustrated in the accompanying drawings.
The novel features believed characteristic of the invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives, and advantages thereof, will best be understood by reference to the following detailed description of the invention when read in conjunction with the accompanying Figures, wherein like reference numerals indicate like components, and:
The present invention relates to methods of operation in processors and processing systems in which conventional pipelines are replaced with execution slices that can be assigned arbitrarily to execute instructions, in particular when a slice executing a current instruction for a stream is busy, and in which slices can be combined on-the-fly to execute either wider instructions or single-instruction-multiple-data (SIMD) instructions requiring multiple slices to handle the multiple data. Multiple dispatch queues are provided to receive multiple instruction streams and the dispatch queues are coupled to the instruction execution slices via a dispatch routing network so that the dispatch routing network can be controlled to perform the above dynamic reconfiguration of the relationship between the instruction execution slices and the dispatch queues according to the availability of the instruction execution slices and/or the requirements for instruction processing. A plurality of cache slices are coupled to the instruction execution slices via a result routing network so that the cache slices can also be varied in relationship with the instruction execution slices according to availability or according to other criteria. The result routing network provides communication of results and operands needed for further processing by instruction execution slices and/or cache slices.
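The combining of slices for wider or SIMD instructions described above can be sketched arithmetically. The following is only an illustrative model, not the disclosed circuitry: the 64-bit slice width, the function names `slices_needed` and `add_on_slices`, and the idea that slices acting in concert pass a carry between neighbors are all assumptions. For a wide add the carry is chained from one slice to the next; for a SIMD add each slice handles an independent element and the carry is discarded.

```python
SLICE_WIDTH = 64  # assumed width of one execution slice, in bits


def slices_needed(operand_width: int, slice_width: int = SLICE_WIDTH) -> int:
    """Number of slices that must be combined for one operand (ceiling division)."""
    return -(-operand_width // slice_width)


def add_on_slices(a: int, b: int, operand_width: int,
                  simd: bool = False, slice_width: int = SLICE_WIDTH) -> int:
    """Add a and b using one slice-wide adder per lane.

    simd=False: lanes form one wide operand, so each carry feeds the next lane.
    simd=True:  lanes are independent elements, so carries are dropped.
    The carry out of the last lane is discarded in both modes.
    """
    mask = (1 << slice_width) - 1
    result = 0
    carry = 0
    for lane in range(slices_needed(operand_width, slice_width)):
        shift = lane * slice_width
        lane_sum = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (lane_sum & mask) << shift
        carry = 0 if simd else lane_sum >> slice_width
    return result
```

Under these assumptions a 128-bit operand occupies two slices, and the same two slices can instead execute a two-element SIMD add simply by not forwarding the carry.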
The load-store portion of the instruction execution cycle (i.e., the operations performed to maintain cache consistency, as opposed to internal register reads/writes) is performed by a plurality of cache slices LS0-LS7, which are coupled to instruction execution slices ES0-ES7 by a write-back (result) routing network 24. In the depicted embodiment, any of cache slices LS0-LS7 can be used to perform the load-store operation portion of an instruction for any of instruction execution slices ES0-ES7, but that is not a requirement of the invention. Instruction execution slices ES0-ES7 may issue internal instructions concurrently to multiple pipelines, e.g., an instruction execution slice may simultaneously perform an execution operation and a load/store operation and/or may execute multiple arithmetic or logical operations using multiple internal pipelines. The internal pipelines may be identical, or may be of discrete types, such as floating-point, scalar, load/store, etc. Further, a given execution slice may have more than one port connection to write-back routing network 24. For example, one port connection may be dedicated to load-store connections to cache slices LS0-LS7, while another port may be used to communicate values to and from other slices, such as special-purpose slices or other instruction execution slices. Write-back results are scheduled from the various internal pipelines of instruction execution slices ES0-ES7 to the write-back port(s) that connect instruction execution slices ES0-ES7 to write-back routing network 24. A load-store routing network 28 couples cache slices LS0-LS7 to provide conversion transfers for execution of SIMD instructions, processing of instructions with a data width greater than the width of cache slices LS0-LS7, and other operations requiring translation or re-alignment of data between cache slices LS0-LS7.
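The scheduling of write-back results from several internal pipelines onto a slice's write-back ports can be pictured with a short sketch. Everything here is hypothetical and for illustration only: the names `Result` and `WriteBackScheduler`, the port names, the destination kinds `"cache"` and `"slice"`, and the rule that each port carries at most one result per cycle in oldest-first order are assumptions, not details taken from the description.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class Result:
    pipeline: str   # internal pipeline that produced the value
    dest_kind: str  # "cache" for a cache slice, "slice" for another slice
    value: int


class WriteBackScheduler:
    """Assigns pending results to ports, one result per port per cycle (assumed)."""

    def __init__(self, ports: Dict[str, Set[str]]):
        self.ports = ports      # port name -> destination kinds it can reach
        self.pending = deque()  # results waiting for a port, oldest first

    def submit(self, result: Result) -> None:
        self.pending.append(result)

    def cycle(self) -> Dict[str, Result]:
        granted: Dict[str, Result] = {}
        waiting = deque()
        for result in self.pending:
            # First port that reaches the destination and is still free.
            port = next(
                (name for name, kinds in self.ports.items()
                 if result.dest_kind in kinds and name not in granted),
                None,
            )
            if port is None:
                waiting.append(result)  # retry in a later cycle
            else:
                granted[port] = result
        self.pending = waiting
        return granted
```

With one port dedicated to cache slices and another to slice-to-slice transfers, two results bound for cache slices are sent in successive cycles, while a result bound for another slice can leave in the same cycle as the first of them.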
An I/O routing network 26 couples cache slices LS0-LS7 to a pair of translation slices XS0, XS1 that provide access to a next higher-order level of cache or system memory that may be integrated within, or external to, processor core 20. While the illustrated example shows a matching number of cache slices LS0-LS7 and execution slices ES0-ES7, in practice, a different number of each type of slice can be provided according to resource needs for a particular implementation. As mentioned above, dispatch routing network 22 is a unidirectional network, but can also take the form of a cross-point network as shown, as may load-store routing network 28 and I/O routing network 26.
While the invention has been particularly shown and described with reference to the preferred embodiments thereof, it will be understood by those skilled in the art that the foregoing and other changes in form and details may be made therein without departing from the spirit and scope of the invention.
The present application is a Continuation of U.S. patent application Ser. No. 14/274,927, filed on May 12, 2014 and claims priority thereto under 35 U.S.C. 120. The disclosure of the above-referenced parent U.S. patent application is incorporated herein by reference.
Number | Date | Country |
---|---|---|
20150324206 A1 | Nov 2015 | US |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 14274927 | May 2014 | US |
Child | 14300563 | | US |