The present invention is related to the following commonly-owned, co-pending United States Patent Applications filed on even date herewith, the entire contents and disclosure of each of which is expressly incorporated by reference herein as if fully set forth herein. U.S. patent application Ser. No. 11/768,777, for “A SHARED PERFORMANCE MONITOR IN A MULTIPROCESSOR SYSTEM”; U.S. patent application Ser. No. 11/768,645, for “OPTIMIZED COLLECTIVES USING A DMA ON A PARALLEL COMPUTER”; U.S. Pat. No. 7,694,035, for “DMA SHARED BYTE COUNTERS IN A PARALLEL COMPUTER”; U.S. Pat. No. 7,788,334, for “MULTIPLE NODE REMOTE MESSAGING”; U.S. patent application Ser. No. 11/768,697, for “A METHOD AND APPARATUS OF PREFETCHING STREAMS OF VARYING PREFETCH DEPTH”; U.S. Pat. No. 7,827,391, for “METHOD AND APPARATUS FOR SINGLE-STEPPING COHERENCE EVENTS IN A MULTIPROCESSOR SYSTEM UNDER SOFTWARE CONTROL”; U.S. Pat. No. 7,669,012, for “INSERTION OF COHERENCE EVENTS INTO A MULTIPROCESSOR COHERENCE PROTOCOL”; U.S. patent application Ser. No. 11/768,791, for “METHOD AND APPARATUS TO DEBUG AN INTEGRATED CIRCUIT CHIP VIA SYNCHRONOUS CLOCK STOP AND SCAN”; U.S. Pat. No. 7,802,025, for “DMA ENGINE FOR REPEATING COMMUNICATION PATTERNS”; U.S. Pat. No. 7,680,971, for “METHOD AND APPARATUS FOR A CHOOSE-TWO MULTI-QUEUE ARBITER”; U.S. patent application Ser. No. 11/768,800, for “METHOD AND APPARATUS FOR EFFICIENTLY TRACKING QUEUE ENTRIES RELATIVE TO A TIMESTAMP”; U.S. Pat. No. 7,701,846, for “BAD DATA PACKET CAPTURE DEVICE”; U.S. patent application Ser. No. 11/768,593, for “EXTENDED WRITE COMBINING USING A WRITE CONTINUATION HINT FLAG”; U.S. Pat. No. 7,793,038, for “A SYSTEM AND METHOD FOR PROGRAMMABLE BANK SELECTION FOR BANKED MEMORY SUBSYSTEMS”; U.S. Pat. No. 7,761,687, for “AN ULTRASCALABLE PETAFLOP PARALLEL SUPERCOMPUTER”; U.S. patent application Ser. No. 11/768,810, for “SDRAM DDR DATA EYE MONITOR METHOD AND APPARATUS”; U.S. Pat. No. 7,797,503, for “A CONFIGURABLE MEMORY SYSTEM AND METHOD FOR PROVIDING ATOMIC COUNTING OPERATIONS IN A MEMORY DEVICE”; U.S. patent application Ser. No. 11/768,559, for “ERROR CORRECTING CODE WITH CHIP KILL CAPABILITY AND POWER SAVING ENHANCEMENT”; U.S. patent application Ser. No. 11/768,552, for “STATIC POWER REDUCTION FOR MIDPOINT-TERMINATED BUSSES”; U.S. patent application Ser. No. 11/768,527, for “COMBINED GROUP ECC PROTECTION AND SUBGROUP PARITY PROTECTION”; U.S. patent application Ser. No. 11/768,669, for “A MECHANISM TO SUPPORT GENERIC COLLECTIVE COMMUNICATION ACROSS A VARIETY OF PROGRAMMING MODELS”; U.S. patent application Ser. No. 11/768,813, for “MESSAGE PASSING WITH A LIMITED NUMBER OF DMA BYTE COUNTERS”; U.S. Pat. No. 7,738,443, for “ASYNCRONOUS BROADCAST FOR ORDERED DELIVERY BETWEEN COMPUTE NODES IN A PARALLEL COMPUTING SYSTEM WHERE PACKET HEADER SPACE IS LIMITED”; U.S. patent application Ser. No. 11/768,682, for “HARDWARE PACKET PACING USING A DMA IN A PARALLEL COMPUTER”; and U.S. patent application Ser. No. 11/768,752, for “POWER THROTTLING OF COLLECTIONS OF COMPUTING ELEMENTS”.
1. Field of the Invention
This invention generally relates to multiprocessor computer systems, and more specifically, to coherent, shared memory multiprocessor computer systems. Even more specifically, the preferred embodiment of the invention relates to a method and system for flexible and programmable coherence traffic partitioning for multiprocessor systems.
2. Background Art
To achieve high performance computing, multiple individual processors have been interconnected to form multiprocessor computer systems capable of parallel processing. Multiple processors can be placed on a single chip, or several chips—each containing one or several processors—interconnected into a multiprocessor computer system.
Processors in a multiprocessor computer system use private cache memories because of their short access time and to reduce the number of memory requests to the main memory. However, managing caches in a multiprocessor system is complex. Multiple private caches introduce the multi-cache coherency problem (or stale data problem) due to multiple copies of main memory data that can concurrently exist in the caches of the multiprocessor system.
Multi-cache coherency can be maintained in a multiprocessor computer system by use of an appropriate coherence protocol. The protocols that maintain the coherence between multiple processors generally rely on coherence events sent between caches. For example, MESI is a common coherence protocol where every hardware cache line can be in one of four states: modified (M), exclusive (E), shared (S), or invalid (I). Line states are changed by memory references issued by the processors.
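By way of illustration only, the four MESI line states and the way memory references move a line between them can be sketched in Python as follows. The transition rules are the commonly taught simplified MESI transitions, not a description of any particular hardware; the event names and the `others_have_copy` parameter are assumptions introduced for this sketch.

```python
from enum import Enum


class State(Enum):
    M = "modified"
    E = "exclusive"
    S = "shared"
    I = "invalid"


def next_state(state, event, others_have_copy=False):
    """Return the new state of one cache line after a memory reference.

    event is one of "local_read", "local_write", "remote_read" or
    "remote_write" (hypothetical names). others_have_copy is only
    consulted when a local read misses (state I).
    """
    if event == "local_read":
        if state is State.I:
            # A read miss loads the line shared or exclusive.
            return State.S if others_have_copy else State.E
        return state
    if event == "local_write":
        # The writer ends up with the only, modified copy.
        return State.M
    if event == "remote_read":
        # Any valid copy is downgraded to shared; invalid stays invalid.
        return State.I if state is State.I else State.S
    if event == "remote_write":
        # Another processor's store destroys the local copy.
        return State.I
    raise ValueError(f"unknown event: {event}")
```

For example, a line that is loaded exclusively, written locally, and then written by another processor passes through E, M, and I in turn.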
In a coherent multiprocessor system, a memory reference issued by one processor can affect the caches of other processors. For example, when a processor stores to a line, the coherence mechanism must ensure that eventually all caches either have the new data or have no data for that line at all. This generally involves a good deal of inter-processor communication for testing the state of the line in the various caches and changing the state, if necessary. Commonly, such inter-processor communication is conducted by passing packets containing coherence protocol actions and responses between processors.
One group of cache coherence protocols is referred to as snooping. In a snooping approach, each cache keeps the sharing status of a block of physical memory locally. The caches are usually on a shared memory bus, and all cache controllers snoop (monitor) the bus to determine whether they have a copy of a requested data block.
A common hardware coherence protocol is based on invalidations. In this protocol, any number of caches can contain a read-only line, but these copies must be destroyed when any processor stores to the line. To do this, the cache corresponding to the storing processor sends invalidations to all the other caches before storing the new data into the line. If the caches are write-through, then the store also goes to main memory where all caches can see the new data. Otherwise a more complicated protocol is required when some other cache reads the line with the new data.
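The write-through case described above can be sketched in a few lines of Python. This is a minimal model under stated assumptions: caches are plain dictionaries, uninitialized memory reads as 0, and the class and method names are hypothetical.

```python
class WriteThroughSystem:
    """Toy model of invalidation-based coherence with write-through caches."""

    def __init__(self, num_caches):
        self.memory = {}                                # main memory: address -> value
        self.caches = [dict() for _ in range(num_caches)]

    def load(self, cpu, addr):
        """Read addr through cpu's cache, filling from memory on a miss."""
        cache = self.caches[cpu]
        if addr not in cache:
            cache[addr] = self.memory.get(addr, 0)      # assumed default of 0
        return cache[addr]

    def store(self, cpu, addr, value):
        """Invalidate every other copy, then store locally and to memory."""
        for other, cache in enumerate(self.caches):
            if other != cpu:
                cache.pop(addr, None)                   # invalidation
        self.caches[cpu][addr] = value
        self.memory[addr] = value                       # write-through
```

After a store by one processor, a later load by any other processor misses in its own cache and fetches the new value from main memory, which is why no further protocol is needed in the write-through case.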
As multiprocessor systems scale both in size and speed, bus-based interconnects between processors become a limiting factor. A common replacement for a bus is a point-to-point network, where every processor has a dedicated communication channel to every other processor.
Also, as multiprocessor systems scale, it is desirable to share the capability of the system among multiple applications running simultaneously. In some cases, this can be done by running separate processes with a shared operating system. But in other cases, this sharing raises security concerns. In these cases, it is desirable to run multiple applications on dedicated processors, each with its own operating system. What is needed is a mechanism for partitioning the processors into separate groups such that they can operate independently from one another. Ideally, the size of the groups and their number should be adjustable based on the needs of the applications. The coherence mechanism should partition along with the processors so that every group of processors remains consistent and operates just like a smaller version of the whole multiprocessor.
An object of this invention is to provide a method and system for partitioning the processors of a multiprocessor system into groups such that the groups can operate independently from one another.
Another object of the present invention is to partition the processors of a multiprocessor system into groups, where the sizes and number of the groups are adjustable.
A further object of the invention is to partition the processors of a multiprocessor system having a cache coherence mechanism into groups, and to partition the cache coherency mechanism along with the processors so that every group of processors remains consistent and operates just like a smaller version of the whole multiprocessor system.
Another object of this invention is to use a cache coherency mechanism to partition the processors of a multiprocessor system into logical groups.
These and other objectives are attained with a multiprocessor computing system and a method of logically partitioning a multiprocessor computing system. The multiprocessor computing system comprises a multitude of processing units and a multitude of snoop units. Each of the processing units includes a local cache, and the snoop units are provided for supporting cache coherency in the multiprocessor system. Each of the snoop units is connected to a respective one of the processing units and to all of the other snoop units. The multiprocessor computing system further includes a partitioning system for using the snoop units to partition the multitude of processing units into a plurality of independent, adjustable-size, memory-consistent processing groups. Preferably, when the processing units are partitioned into these processing groups, the partitioning system configures the snoop units to maintain cache coherency within each of said groups.
In the operation of the preferred multiprocessor computing system, data packets specifying memory references are sent from the processing units to the snoop units, and each of the snoop units includes a packet processor for processing said data packets. Also, the partitioning system includes a multitude of control mechanisms, and each of said control mechanisms is associated with one of the snoop units. Each of the control mechanisms blocks the associated snoop unit from processing selected ones of said data packets; and in particular, blocks the associated snoop unit from processing data packets coming from processing units outside the processing group to which the snoop unit belongs. In this way, the control mechanisms effect the desired logical partitioning of the processing units.
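As a rough illustration of this blocking behavior, the following Python sketch derives, from an assignment of processing units to groups, which incoming links each snoop unit processes. The function names and the dictionary representation are assumptions made for the sketch; the specification itself describes per-snoop-unit control mechanisms, not software tables.

```python
def build_link_enables(group_of):
    """Map (receiver, sender) -> True when the receiver's snoop unit should
    process packets from the sender, i.e. both are in the same group.

    group_of[i] is the (hypothetical) group label of processing unit i.
    """
    n = len(group_of)
    return {
        (rcv, snd): group_of[rcv] == group_of[snd]
        for rcv in range(n)
        for snd in range(n)
        if rcv != snd
    }


def receivers_of(enables, sender, n):
    """List the processing units whose snoop units process a packet from sender."""
    return [rcv for rcv in range(n) if rcv != sender and enables[(rcv, sender)]]


# Four processing units split into two groups of two.
enables = build_link_enables(["A", "A", "B", "B"])
same_group = receivers_of(enables, 0, 4)
```

Changing the group labels and rebuilding the table changes the sizes and number of the groups, which mirrors the adjustable partitioning described above.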
Further benefits and advantages of this invention will become apparent from a consideration of the following detailed description, given with reference to the accompanying drawings, which specify and show preferred embodiments of the invention.
Referring now to drawings, and more particularly to
To implement the memory coherence protocol, a snoop unit 140a, . . . , 140n is provided for each respective processor core 100a, . . . , 100n in the multiprocessor system 10. For transferring coherence requests, the preferred embodiment does not use the system bus 150, as typically found in prior art systems, but rather implements a point-to-point interconnection 160 whereby each processor's associated snoop unit is directly connected with each snoop unit associated with every other processor in the system. Thus, coherence requests are decoupled from all other memory requests transferred via the system local bus, reducing congestion on the bus, which is often a system bottleneck. All coherence requests directed to a given processor are forwarded to that processor's snoop unit 140a, . . . , 140n. The snoop units may optionally include one or more snoop filters that process incoming snoop requests and present only a fraction of all requests to the processors.
In order to achieve the partition of
The preferred embodiment of this invention uses a token flow-control protocol, which works as follows. The receiving end of the link can buffer some number of packets, and there is a token for every buffer slot. Initially, the sending end of the link holds all the tokens. The sender consumes a token for every packet that it sends. The receiver buffers the packets and then returns a token to the sender every time it has completed processing a packet and freed a packet buffer. An advantage of the token-based flow control protocol is that the link can remain completely busy as long as the tokens are returned at the same rate that packets are sent, and as long as there are enough tokens to last until the first token is returned (i.e., to cover the cumulative time of the packet transfer, the packet processing, and the token return).
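The token protocol described above can be modeled with a single counter and a queue. The following Python sketch is illustrative only; the class name, method names, and the use of a software queue for the receive buffer are assumptions, and timing (transfer, processing, and token-return latency) is not modeled.

```python
from collections import deque


class TokenLink:
    """Toy model of one link with token-based flow control."""

    def __init__(self, buffer_slots):
        self.tokens = buffer_slots   # the sender initially holds every token
        self.buffer = deque()        # receiver-side packet buffer

    def send(self, packet):
        """Sender side: consume one token per packet; refuse if none remain."""
        if self.tokens == 0:
            return False
        self.tokens -= 1
        self.buffer.append(packet)
        return True

    def process_one(self):
        """Receiver side: finish one packet, free its slot, return a token."""
        if not self.buffer:
            return None
        packet = self.buffer.popleft()
        self.tokens += 1
        return packet


link = TokenLink(buffer_slots=2)
accepted = [link.send(p) for p in ("a", "b", "c")]   # third send finds no token
```

In this model the number of tokens held by the sender plus the number of buffered packets always equals the number of buffer slots, so the receive buffer can never overflow.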
Tokens are typically implemented as counters at the sending and receiving ends of a link. Data packets implicitly carry a token from the sender to the receiver, and there are various ways to return tokens from the receiver to the sender. One way is to send a special, dedicated packet with the token. Another way is to piggyback the token on a data packet going in the opposite direction, if the link is full-duplex.
Normally, the register 502 driving the select signal 504 is programmed with the value 1 so that the multiplexer 506 selects the token signal 510 coming from the packet processing logic 512 as the token_return signal 406, and allows the valid signal 404 to go through the AND gate 514 to the packet processing logic 512. In order to virtually “cut” the link, the register 502 is programmed with the value 0 so that the multiplexer 506 selects the valid signal 404 as the token_return signal 406, and the valid signal 404 is blocked by the AND gate 514 from affecting the packet processing logic 512. Effectively, the valid signal 404 is looped back as the token_return signal 406 and the packet processing logic 512 assumes that no packets are ever received.
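The behavior of the select register, multiplexer, and AND gate can be summarized as a small combinational function. The Python sketch below treats each signal as a single bit (0 or 1); the function name is hypothetical, and it is assumed that the AND gate 514 combines the valid signal 404 with the select signal 504, which is consistent with the description above.

```python
def link_control(select, valid, token_from_logic):
    """Model one receive-side link control as single-bit logic.

    select           -- value of register 502 (1 = normal, 0 = link "cut")
    valid            -- incoming valid signal 404
    token_from_logic -- token signal 510 from the packet processing logic 512

    Returns (valid_to_logic, token_return).
    """
    # AND gate 514: the packet processing logic sees valid only when select is 1.
    valid_to_logic = valid & select
    # Multiplexer 506: normal mode returns the logic's token; cut mode loops
    # the valid signal straight back as the token_return signal 406.
    token_return = token_from_logic if select else valid
    return valid_to_logic, token_return
```

With select set to 0, every packet the sender transmits is immediately answered with a token, so the sender never stalls even though the receiver ignores the packet.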
The invention shown in
Those skilled in the art will recognize that a completely partitionable multiprocessor system can be formed by combining this invention with a physically-partitionable memory. For example, every memory request could carry a unique “coherence domain identifier” that could be used to determine which physical memory partition it would be directed to.
Those skilled in the art will recognize that this invention works equally well when the Snoop Rcv units 204 contain one or more snoop filters for eliminating unnecessary coherence requests, as this functionality is orthogonal to the operation of the coherence protocol. The preferred embodiment of the invention works so long as a token is returned for every coherence request, regardless of whether the request is presented to the processor or not.
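The requirement that a token be returned for every coherence request, whether or not the request reaches the processor, can be sketched as follows. The filter here is a deliberately simple one that tracks the set of lines the local cache may hold; that filtering policy, the class name, and the attribute names are assumptions for illustration and are not taken from the specification.

```python
class FilteringSnoopReceiver:
    """Toy receive unit with a snoop filter and per-request token return."""

    def __init__(self, possibly_cached_lines):
        self.possibly_cached = set(possibly_cached_lines)
        self.presented_to_processor = []   # requests that passed the filter
        self.tokens_returned = 0

    def receive(self, line_address):
        """Handle one incoming coherence request."""
        if line_address in self.possibly_cached:
            # Only requests that may hit the local cache reach the processor.
            self.presented_to_processor.append(line_address)
        # A token goes back for every request, filtered or not.
        self.tokens_returned += 1


receiver = FilteringSnoopReceiver(possibly_cached_lines={0x100, 0x140})
for address in (0x100, 0x200, 0x140, 0x300):
    receiver.receive(address)
```

Because the token count depends only on the number of requests received, the flow control on the link behaves the same with or without the filter.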
As will be readily apparent to those skilled in the art, the present invention or aspects of the invention can be realized in hardware, or a combination of hardware and software. Any kind of computer/server system(s)—or other apparatus adapted for carrying out the methods described herein—is suited. A typical combination of hardware and software could be a general-purpose computer system with a computer program that, when loaded and executed, carries out methods described herein. Alternatively, a specific use computer, containing specialized hardware for carrying out one or more of the functional tasks of the invention, could be utilized.
The present invention or aspects of the invention can also be embodied in a computer program product, which comprises all the respective features enabling the implementation of the methods described herein, and which—when loaded in a computer system—is able to carry out these methods. Computer program, software program, program, or software, in the present context mean any expression, in any language, code or notation, of a set of instructions intended to cause a system having an information processing capability to perform a particular function either directly or after either or both of the following: (a) conversion to another language, code or notation; and/or (b) reproduction in a different material form.
While it is apparent that the invention herein disclosed is well calculated to fulfill the objects stated above, it will be appreciated that numerous modifications and embodiments may be devised by those skilled in the art, and it is intended that the appended claims cover all such modifications and embodiments as fall within the true spirit and scope of the present invention.
This invention was made with Government support under Contract No. B554331, awarded by the Department of Energy. The Government has certain rights to this invention.
Number | Date | Country |
---|---|---|
20090006769 A1 | Jan 2009 | US |