The present invention is related to the following commonly-owned, co-pending United States patent applications filed on even date herewith, the entire contents and disclosure of each of which are expressly incorporated by reference herein as if fully set forth herein: U.S. patent application Ser. No. 11/768,777, for “A SHARED PERFORMANCE MONITOR IN A MULTIPROCESSOR SYSTEM”; U.S. patent application Ser. No. 11/768,645, for “OPTIMIZED COLLECTIVES USING A DMA ON A PARALLEL COMPUTER”; U.S. patent application Ser. No. 11/768,781, for “DMA SHARED BYTE COUNTERS IN A PARALLEL COMPUTER”; U.S. patent application Ser. No. 11/768,784, for “MULTIPLE NODE REMOTE MESSAGING”; U.S. patent application Ser. No. 11/768,697, for “A METHOD AND APPARATUS OF PREFETCHING STREAMS OF VARYING PREFETCH DEPTH”; U.S. patent application Ser. No. 11/768,532, for “PROGRAMMABLE PARTITIONING FOR HIGH-PERFORMANCE COHERENCE DOMAINS IN A MULTIPROCESSOR SYSTEM”; U.S. patent application Ser. No. 11/768,857, for “METHOD AND APPARATUS FOR SINGLE-STEPPING COHERENCE EVENTS IN A MULTIPROCESSOR SYSTEM UNDER SOFTWARE CONTROL”; U.S. patent application Ser. No. 11/768,547, for “INSERTION OF COHERENCE REQUEST FOR DEBUGGING A MULTIPROCESSOR”; U.S. patent application Ser. No. 11/768,791, for “METHOD AND APPARATUS TO DEBUG AN INTEGRATED CIRCUIT CHIP VIA SYNCHRONOUS CLOCK STOP AND SCAN”; U.S. patent application Ser. No. 11/768,795, for “DMA ENGINE FOR REPEATING COMMUNICATION PATTERNS”; U.S. patent application Ser. No. 11/768,799, for “METHOD AND APPARATUS FOR GRANTING PROCESSORS ACCESS TO A RESOURCE”; U.S. patent application Ser. No. 11/768,572, for “BAD DATA PACKET CAPTURE DEVICE”; U.S. patent application Ser. No. 11/768,593, for “EXTENDED WRITE COMBINING USING A WRITE CONTINUATION HINT FLAG”; U.S. patent application Ser. No. 11/768,805, for “A SYSTEM AND METHOD FOR PROGRAMMABLE BANK SELECTION FOR BANKED MEMORY SUBSYSTEMS”; U.S. patent application Ser. No. 11/768,905, for “AN ULTRASCALABLE PETAFLOP PARALLEL SUPERCOMPUTER”; U.S. patent application Ser. No. 11/768,810, for “DATA EYE MONITOR METHOD AND APPARATUS”; U.S. patent application Ser. No. 11/768,812, for “A CONFIGURABLE MEMORY SYSTEM AND METHOD FOR PROVIDING ATOMIC COUNTING OPERATIONS IN A MEMORY DEVICE”; U.S. patent application Ser. No. 11/768,559, for “ERROR CORRECTING CODE WITH CHIP KILL CAPABILITY AND POWER SAVING ENHANCEMENT”; U.S. patent application Ser. No. 11/768,552, for “STATIC POWER REDUCTION FOR MIDPOINT-TERMINATED BUSSES”; U.S. patent application Ser. No. 11/768,527, for “COMBINED GROUP ECC PROTECTION AND SUBGROUP PARITY PROTECTION”; U.S. patent application Ser. No. 11/768,669, for “A MECHANISM TO SUPPORT GENERIC COLLECTIVE COMMUNICATION ACROSS A VARIETY OF PROGRAMMING MODELS”; U.S. patent application Ser. No. 11/768,813, for “MESSAGE PASSING WITH A LIMITED NUMBER OF DMA BYTE COUNTERS”; U.S. patent application Ser. No. 11/768,619, for “ASYNCRONOUS BROADCAST FOR ORDERED DELIVERY BETWEEN COMPUTE NODES IN A PARALLEL COMPUTING SYSTEM WHERE PACKET HEADER SPACE IS LIMITED”; U.S. patent application Ser. No. 11/768,682, for “HARDWARE PACKET PACING USING A DMA IN A PARALLEL COMPUTER”; and U.S. patent application Ser. No. 11/768,752, for “POWER THROTTLING OF COLLECTIONS OF COMPUTING ELEMENTS”.
1. Field of the Invention
The present invention generally relates to computer systems having multiprocessor architectures and, more particularly, to a novel multiprocessor computer system for processing memory access requests.
2. Description of the Prior Art
To achieve high performance computing, multiple individual processors have been interconnected to form multiprocessor computer systems capable of parallel processing. Multiple processors can be placed on a single chip, or several chips, each containing one or several processors, can be interconnected to form a multiprocessor computer system.
Processors in a multiprocessor computer system use private cache memories because of their short access time (a cache is local to a processor and provides fast access to data) and to reduce the number of memory requests to the main memory. However, managing caches in a multiprocessor system is complex. Multiple private caches introduce the multi-cache coherency problem (or stale data problem) due to multiple copies of main memory data that can concurrently exist in the multiprocessor system.
Small-scale shared memory multiprocessing systems have processors (or groups thereof) interconnected by a single bus. However, with the increasing speed of processors, the feasible number of processors that can share the bus effectively decreases.
The protocols that maintain the coherence between multiple processors are called cache coherence protocols. Cache coherence protocols track any sharing of data blocks between the processors. Depending upon how data sharing is tracked, cache coherence protocols can be grouped into two classes: directory based and snooping.
In a multiprocessor system with coherent cache memory, consistency is maintained by a coherence protocol that generally relies on coherence events sent between caches. A common hardware coherence protocol is based on invalidations. In this protocol, any number of caches can include a read-only line, but these copies must be destroyed when any processor stores to the line. To do this, the cache corresponding to the storing processor sends invalidations to all the other caches before storing the new data into the line. If the caches are write-through, then the store also goes to main memory where all caches can see the new data. Otherwise, a more complicated protocol is required when some other cache reads the line with the new data.
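By way of illustration only, the following C++ sketch models the invalidation step described above at a purely behavioral level. The names used (SimpleCache, store_line, and the like) are hypothetical and do not correspond to the hardware described herein.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <unordered_set>
#include <vector>

// Behavioral model of a write-invalidate step: before a processor stores to a
// line, every other cache's read-only copy of that line is destroyed.
struct SimpleCache {
    std::unordered_set<std::uint64_t> lines;  // addresses of lines held by this cache

    void invalidate(std::uint64_t addr) { lines.erase(addr); }
    bool holds(std::uint64_t addr) const { return lines.count(addr) != 0; }
};

// The storing processor's cache sends invalidations to all other caches before
// the new data is stored into the line.
void store_line(std::size_t writer, std::uint64_t addr, std::vector<SimpleCache>& caches) {
    for (std::size_t i = 0; i < caches.size(); ++i)
        if (i != writer) caches[i].invalidate(addr);
    caches[writer].lines.insert(addr);  // the writer now holds the modified line
}

int main() {
    std::vector<SimpleCache> caches(4);
    for (auto& c : caches) c.lines.insert(0x1000);  // every cache holds a read-only copy
    store_line(0, 0x1000, caches);                  // processor 0 stores to the line
    for (std::size_t i = 0; i < caches.size(); ++i)
        std::cout << "cache " << i << ": "
                  << (caches[i].holds(0x1000) ? "holds" : "invalidated") << "\n";
}
```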
In a cache-coherent multiprocessor system, there may be bursts of activity that cause coherence actions, such as invalidations, to arrive at a cache faster than the cache can process them. In this case, the coherence actions are generally stored in first-in, first-out (FIFO) queues, thereby absorbing the burst of activity. FIFO queues are a very common structure in computer systems. They are used to store information that must wait, commonly because the destination of the information is busy. For example, requests to utilize a shared resource often wait in FIFO queues until the resource becomes available. Another example is packet-switched networks, where packets often wait in FIFO queues until a link they need becomes available.
A common operation in a multiprocessor is memory synchronization, which ensures that all memory accesses and their related coherence protocol events started before some point in time have completed. For example, memory synchronization can be used before initiating a DMA transfer of data prepared in memory. The synchronization ensures that the memory is completely consistent before the DMA transfer begins.
Before a multiprocessor memory synchronization can complete, all coherence protocol events that were initiated prior to the synchronization must be processed. Some of these events could be stored in FIFO queues in the coherence logic of the multiprocessor. One way to make sure all such events have been processed is to drain all of the FIFO queues before completing the memory synchronization. However, this is inefficient because coherence events that arrived after the memory synchronization began are unnecessarily processed, causing a delay in the completion of the synchronization. A second problem with this approach is that processors must be prevented from generating new coherence actions or else the queues will continue to fill, potentially causing a livelock. Stopping all of the processors is necessary for the complete draining approach, but inefficient.
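The drawbacks of this drain-everything approach can be summarized in the following hedged sketch, in which the names (CoherenceQueue, drain_all_sync) are invented purely for illustration and do not describe the hardware disclosed herein.

```cpp
#include <deque>
#include <vector>

// Hypothetical software model of a per-processor coherence event queue.
struct CoherenceQueue {
    std::deque<int> events;
    void process_one() { if (!events.empty()) events.pop_front(); }
    bool empty() const { return events.empty(); }
};

// Naive synchronization: wait until every queue is completely empty.
// Two drawbacks, as noted above:
//   1) events enqueued after the synchronization began are also processed
//      before the loop exits, delaying completion unnecessarily; and
//   2) unless the processors are stopped from generating new events, the
//      queues can keep refilling and the loop may never terminate (livelock).
void drain_all_sync(std::vector<CoherenceQueue>& queues) {
    bool any_nonempty = true;
    while (any_nonempty) {
        any_nonempty = false;
        for (auto& q : queues) {
            q.process_one();
            if (!q.empty()) any_nonempty = true;
        }
    }
}

int main() {
    std::vector<CoherenceQueue> queues(4);
    queues[0].events = {1, 2, 3};
    drain_all_sync(queues);  // returns only once every queue has been emptied
}
```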
What is needed is a mechanism for tracking queue entries that existed prior to the memory synchronization, and completing the synchronization when those entries have been processed. Ideally, the memory system should be allowed to continue generating new coherence protocol events while the events prior to the synchronization are draining.
It would thus be highly desirable to provide a system and method for tracking queue entries that existed prior to the memory synchronization, and completing the synchronization when those entries have been processed.
Further, it would be desirable to provide a system and method for tracking queue entries wherein the memory system is allowed to continue generating new coherence protocol events while the events prior to the synchronization are draining.
It is therefore an object of the present invention to provide a novel system and method for tracking coherence event queue entries that existed prior to a memory synchronization operation performed by a processor in a multiprocessor system architecture, and completing the synchronization when those entries have been processed.
It is a further object of the invention to provide a system and method for tracking queue entries wherein the memory system is allowed to continue generating new coherence protocol events while the events prior to the synchronization are draining (i.e., being dequeued).
That is, the present invention teaches an apparatus and method for tracking event signals transmitted by processors in a multiprocessor system. According to a first aspect of the invention, the apparatus comprises a queue structure for storing said event signals transmitted in said system; a logic device associated with the queue structure for controlling enqueuing and dequeuing of received event signals at the structure; and, a counting mechanism for tracking a number of event signals remaining enqueued in the queue structure and dequeued since receipt of a timestamp signal. The counting mechanism generates an output signal indicating that all of the event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued, i.e., that all events present when the timestamp was asserted have completed.
Further to this embodiment, the logic device generates an enqueue signal for receipt at the queue structure for controlling input of the event signal in the queue structure, and a dequeue signal for controlling the dequeuing of an event signal from the queue structure.
Further to this embodiment, the counting mechanism includes a first counter device responsive to assertion of said enqueue signal and dequeue signal for counting a number of enqueued event signals in said queue structure.
The counting mechanism is further responsive to receipt of the timestamp signal for receiving a count signal representing the number of enqueued signals in the queue structure, the counting mechanism counting down from the number in response to each dequeue signal asserted when each of the enqueued event signals is dequeued from the queue structure, the counting mechanism generating the output signal when it counts down to zero.
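A minimal behavioral sketch of this counting mechanism is given below in C++, assuming one counter that follows the queue occupancy and one countdown register loaded when the timestamp signal is asserted; the class name TimestampCounter and its member names are illustrative only and are not part of the described hardware.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of the counting mechanism: one counter follows the queue
// occupancy, and a second register snapshots that occupancy when the
// timestamp signal is asserted. Because the queue is first-in, first-out,
// every dequeue that occurs while the snapshot is non-zero removes one of the
// entries that was present when the timestamp was received.
class TimestampCounter {
public:
    void on_enqueue() { ++occupancy_; }

    void on_dequeue() {
        assert(occupancy_ > 0);
        --occupancy_;
        if (remaining_ > 0) --remaining_;  // one tagged entry has drained
    }

    // Timestamp assertion: capture how many entries are currently enqueued.
    void on_timestamp() { remaining_ = occupancy_; }

    // Output signal: asserted once every entry that was present at the time
    // of the timestamp has been dequeued.
    bool timestamp_done() const { return remaining_ == 0; }

private:
    std::uint32_t occupancy_ = 0;  // first counter: number of enqueued event signals
    std::uint32_t remaining_ = 0;  // second counter: tagged entries not yet dequeued
};

int main() {
    TimestampCounter c;
    c.on_enqueue(); c.on_enqueue();     // two events are waiting
    c.on_timestamp();                   // tag the two existing entries
    c.on_enqueue();                     // a later event must not delay completion
    c.on_dequeue();                     // drains the first tagged entry
    c.on_dequeue();                     // drains the second tagged entry
    return c.timestamp_done() ? 0 : 1;  // done even though one untagged entry remains
}
```

Note that an event enqueued after the timestamp (the third on_enqueue call above) does not delay assertion of the output signal.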
In one additional advantageous embodiment of this invention, flexibility is provided by enabling the counting mechanism to respond to an assertion of a second timestamp signal for tracking a number of events remaining in the queue structure since receipt of that second timestamp signal, issued independently of the assertion of the first timestamp signal.
In this additional advantageous embodiment, the counting mechanism, in response to assertion of said second timestamp signal, receives the count signal representing the number of enqueued signals in the queue structure, counts down from that number in response to each dequeue signal asserted when each of the enqueued event signals is dequeued from the queue structure, and further generates a second output signal when it counts down to zero.
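The single-timestamp sketch above extends naturally to this embodiment by duplicating only the snapshot register, one per timestamp; again, the names below are illustrative assumptions rather than the hardware interface.

```cpp
#include <cassert>
#include <cstdint>

// The single-timestamp sketch, extended with a second, independently asserted
// timestamp: only the snapshot register is duplicated.
class DualTimestampCounter {
public:
    void on_enqueue() { ++occupancy_; }

    void on_dequeue() {
        if (occupancy_ > 0) --occupancy_;
        for (auto& r : remaining_)
            if (r > 0) --r;  // a tagged entry drained for every pending timestamp
    }

    // 'which' selects timestamp 0 or 1; each snapshots the occupancy independently.
    void on_timestamp(int which) {
        assert(which == 0 || which == 1);
        remaining_[which] = occupancy_;
    }

    bool timestamp_done(int which) const { return remaining_[which] == 0; }

private:
    std::uint32_t occupancy_ = 0;
    std::uint32_t remaining_[2] = {0, 0};
};

int main() {
    DualTimestampCounter c;
    c.on_enqueue(); c.on_timestamp(0);  // first synchronization tags one entry
    c.on_enqueue(); c.on_timestamp(1);  // second synchronization tags two entries
    c.on_dequeue();                     // the first synchronization completes here
    bool first_done = c.timestamp_done(0);   // true
    bool second_done = c.timestamp_done(1);  // false: one tagged entry remains
    return (first_done && !second_done) ? 0 : 1;
}
```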
In the embodiments described, the multiprocessor system performs a memory synchronization operation, such that this output signal is used as part of a completion condition for the memory synchronization operation in the multiprocessor.
According to a further aspect of the invention, there is provided a method for tracking event signals transmitted in a multiprocessor system, the method comprising:
intercepting an event signal in the multiprocessor system;
enqueuing and dequeuing the intercepted event signals at a queue structure of a plurality of queue structures;
counting a number of event signals enqueued in a respective queue structure and dequeued from the queue structure since receipt of a timestamp signal; and,
generating an output signal indicating that all of the event signals present in the queue structure at the time of receipt of the timestamp signal have been dequeued.
Further to this aspect of the invention, the method further comprises:
controlling said enqueuing and dequeuing of said intercepted event signals by generating a respective enqueue signal for receipt at said queue structure for controlling input of said intercepted event signal in said queue structure and a dequeue signal for receipt at said queue structure for controlling said dequeuing of said event signal from said queue structure.
Further to this aspect of the invention, counting a number of the event signals enqueued in and dequeued from each respective queue structure includes:
implementing a first counter device responsive to assertion of the enqueue signal and dequeue signal for counting a number of enqueued event signals in the queue structure;
implementing a second counter device responsive to receipt of the timestamp signal for receiving a count signal representing the number of enqueued event signals presently in the queue structure; and,
counting down from the number in response to assertion of each dequeue signal asserted when removing the enqueued event signal from the queue structure, the second counter device generating the output signal when it counts down to zero.
Further according to this further aspect of the invention, the method comprises:
performing a memory synchronization operation by asserting said timestamp inputs for all queue structures in said system, and then waiting until all output signals are asserted before completing the memory synchronization operation.
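A hedged software model of this synchronization sequence is sketched below; the structure QueueSyncPort and the function memory_sync are hypothetical stand-ins for the hardware timestamp and done signals described herein.

```cpp
#include <algorithm>
#include <vector>

// Minimal model of one queue structure's timestamp interface (names are illustrative).
struct QueueSyncPort {
    int occupancy = 0;  // coherence events currently enqueued
    int remaining = 0;  // tagged entries not yet dequeued
    void timestamp() { remaining = occupancy; }
    bool timestamp_done() const { return remaining == 0; }
    void dequeue() { if (occupancy > 0) --occupancy; if (remaining > 0) --remaining; }
};

// Memory synchronization: assert the timestamp input of every queue structure
// in the system, then wait until every timestamp_done output is asserted.
void memory_sync(std::vector<QueueSyncPort>& queues) {
    for (auto& q : queues) q.timestamp();
    while (!std::all_of(queues.begin(), queues.end(),
                        [](const QueueSyncPort& q) { return q.timestamp_done(); })) {
        // In hardware the queues drain on their own; this model simply
        // dequeues one entry from each non-empty queue per iteration.
        for (auto& q : queues)
            if (q.occupancy > 0) q.dequeue();
    }
}

int main() {
    std::vector<QueueSyncPort> queues(4);
    queues[1].occupancy = 3;  // one queue has pending coherence events
    memory_sync(queues);      // returns once the tagged entries have drained
}
```

In the hardware embodiments described later herein, the per-queue done outputs are combined into a single completion signal rather than being polled in software.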
Further according to this further aspect of the invention, the method comprises:
counting a number of event signals enqueued in a respective queue structure and dequeued from the queue structure since receipt of a second timestamp signal; and,
responding to assertion of the second timestamp signal by tracking a number of events remaining in the queue structure since receipt of the second timestamp signal.
In each of the embodiments described, the multiprocessor system may further include an arbitration unit responsive to receipt of said generated output signals associated with a respective queue structure for implementing logic to generate an arbitration signal for input to a processor cache.
In each of the embodiments described, the event signals may comprise coherence event signals, and the queue structure may be included in a coherence logic unit associated with each processor of the multiprocessor system. The multiprocessor system may further include one or more snoop filter units, associated with each coherence logic unit, that process incoming coherence invalidation events and present a reduced number of coherence events to a processor.
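One simple way such a snoop filter can reduce the number of events presented to a processor is to discard invalidations for lines the local cache cannot be holding; the C++ sketch below assumes that particular scheme purely for illustration and is not intended to describe the filter of the referenced design.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Illustrative snoop filter: it tracks which line addresses the local cache
// may be holding and forwards only invalidations for those lines; all other
// invalidations are dropped, reducing the number of coherence events the
// processor has to process. (This particular filtering scheme is assumed
// here purely for illustration.)
class SnoopFilter {
public:
    // Record that the local cache may now hold this line.
    void note_local_fill(std::uint64_t addr) { possibly_cached_.insert(addr); }

    // Return the subset of incoming invalidations that must be forwarded.
    std::vector<std::uint64_t> filter(const std::vector<std::uint64_t>& invalidations) {
        std::vector<std::uint64_t> forwarded;
        for (std::uint64_t addr : invalidations) {
            if (possibly_cached_.erase(addr) != 0)
                forwarded.push_back(addr);  // the line may be present locally
            // otherwise the local cache cannot hold the line; the event is dropped
        }
        return forwarded;
    }

private:
    std::unordered_set<std::uint64_t> possibly_cached_;
};

int main() {
    SnoopFilter f;
    f.note_local_fill(0x100);
    auto forwarded = f.filter({0x100, 0x200, 0x300});  // only 0x100 is forwarded
    return forwarded.size() == 1 ? 0 : 1;
}
```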
Advantageously, while the invention is described in the context of a microprocessor chip, the invention can be broadly applied to many other digital circuits and systems.
The objects, features and advantages of the present invention will become apparent to one skilled in the art, in view of the following detailed description taken in combination with the attached drawings.
In one embodiment, when a processor desires to write new data to a cache line, that processor device 100a, . . . , 100d issues its respective coherence event signal, e.g., an invalidation request signal 130a, . . . , 130d. These invalidation request signals are broadcast from the respective processors, and particularly their associated caches, to every other processor cache in the system.
Further associated with each processor, as shown in the figures, is a respective Coherence Logic unit 125a, . . . , 125d.
If the inputs and outputs of the Coherence Logic operate at the same speed, then the Coherence Logic units 125a, . . . , 125d can receive invalidation requests at four times the rate at which they can output them to the processor caches. Therefore, according to the invention, the invalidation requests are stored in queues as shown and described herein.
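The rate mismatch can be illustrated with a small numerical model: if up to four invalidation requests arrive per cycle while only one request per cycle is delivered to the processor cache, a short burst builds a backlog that then drains one entry per cycle. The toy simulation below illustrates only this arithmetic and is not a model of the actual hardware timing.

```cpp
#include <cstddef>
#include <iostream>
#include <vector>

// Toy rate-mismatch model: during a burst, up to four requests arrive per
// cycle while only one request per cycle is delivered to the processor cache,
// so a backlog builds up and then drains one entry per cycle.
int main() {
    std::vector<std::size_t> arrivals = {4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0};
    std::size_t backlog = 0;
    for (std::size_t cycle = 0; cycle < arrivals.size(); ++cycle) {
        backlog += arrivals[cycle];   // requests enqueued this cycle
        if (backlog > 0) --backlog;   // at most one request delivered per cycle
        std::cout << "cycle " << cycle << ": backlog = " << backlog << "\n";
    }
}
```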
An arbitration unit 220 controls the timing with which snoop signal requests 225 are output from the FIFO queues. Details concerning the operation of the arbitration unit 220 are found in commonly-owned United States patent application Ser. No. 11/768,799, the whole contents and disclosure of which is incorporated by reference as if fully set forth herein. In the preferred embodiment, the queues are emptied (drained) as controlled by the arbiter unit. In another embodiment, there is no arbiter or synchronization circuit to synchronize draining of the queues.
A coherence event signal (e.g., an invalidation request) is enqueued to the tail of the FIFO queue by placing it on the data_in input 140 of the timestamp queue 250 and pulsing the enqueue input 280 synchronous to the clock input. The coherence event at the head of the queue is always available at the data_out output of the queue. The coherence event at the head of the queue is dequeued, or discarded, by pulsing the dequeue input 290 synchronous to the clock signal. When the timestamp input 275 is pulsed synchronous to the clock signal, all queue entries present at that time are tagged. Once the last of those entries has been dequeued, the timestamp_done output 265 asserts. Therefore, a memory synchronization operation can ensure that all coherence protocol events have completed by pulsing the timestamp inputs of all the FIFO queues in the system, and then waiting until all of the timestamp_done outputs assert before completing the synchronization.
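The behavior just described can be captured in a compact behavioral C++ model, in which the clock is abstracted away (each method call corresponds to pulsing the named input for one cycle) and the class and member names are descriptive stand-ins rather than the hardware interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <stdexcept>

// Behavioral model of the timestamp queue: enqueue and dequeue behave as an
// ordinary FIFO, timestamp() tags every entry currently in the queue, and
// timestamp_done() asserts once the last tagged entry has been dequeued.
class TimestampQueue {
public:
    // Pulse of the enqueue input: the event on data_in joins the tail.
    void enqueue(std::uint32_t event) { fifo_.push_back(event); }

    // data_out: the coherence event at the head of the queue.
    std::uint32_t data_out() const {
        if (fifo_.empty()) throw std::runtime_error("queue is empty");
        return fifo_.front();
    }

    // Pulse of the dequeue input: the head entry is discarded.
    void dequeue() {
        if (fifo_.empty()) throw std::runtime_error("queue is empty");
        fifo_.pop_front();
        if (tagged_remaining_ > 0) --tagged_remaining_;
    }

    // Pulse of the timestamp input: tag all entries present at this time.
    void timestamp() { tagged_remaining_ = fifo_.size(); }

    // The timestamp_done output: asserted once all tagged entries have drained.
    bool timestamp_done() const { return tagged_remaining_ == 0; }

private:
    std::deque<std::uint32_t> fifo_;
    std::size_t tagged_remaining_ = 0;
};

int main() {
    TimestampQueue q;
    q.enqueue(1); q.enqueue(2);         // two events present
    q.timestamp();                      // tag them
    q.enqueue(3);                       // arrives after the timestamp
    q.dequeue(); q.dequeue();           // the tagged entries drain
    return q.timestamp_done() ? 0 : 1;  // asserted although event 3 is still queued
}
```

No per-entry tag bit is needed in such a model: because the queue is strictly first-in, first-out, it suffices to record how many entries were present when the timestamp input was pulsed and to decrement that count on every subsequent dequeue, which mirrors the counter-based mechanism summarized above.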
The up/down counter tracks the number of coherence events currently stored in the timestamp queue 250: it is incremented each time the enqueue input 280 is pulsed and decremented each time the dequeue input 290 is pulsed, so its value always equals the number of enqueued entries.
The NOR gate 360 detects when the remaining count reaches zero; because the output of a NOR gate is asserted only when all of its inputs are zero, it asserts the timestamp_done output 265 once the last entry tagged by the timestamp has been dequeued.
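For completeness, the zero-detect performed by a NOR gate across the counter bits is equivalent to comparing the remaining count with zero, as the small self-checking example below illustrates (the eight-bit width is an arbitrary assumption).

```cpp
#include <cassert>
#include <cstdint>

// A NOR across all bits of a counter is asserted exactly when every bit is
// zero, i.e. exactly when the remaining count equals zero.
bool nor_of_bits(std::uint8_t count) {
    bool any_bit_set = false;
    for (int i = 0; i < 8; ++i)
        any_bit_set = any_bit_set || (((count >> i) & 1u) != 0);
    return !any_bit_set;  // NOR: true only if no input bit is set
}

int main() {
    for (unsigned c = 0; c < 256; ++c)
        assert(nor_of_bits(static_cast<std::uint8_t>(c)) == (c == 0));
    return 0;
}
```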
In the multiprocessor environment 10 of the preferred embodiment, the timestamp_done outputs of all the timestamp queues can be combined with a logical AND (not shown) to produce a single signal indicating that all coherence events present when timestamp was asserted (i.e., when the memory synchronization began) have completed. This signal can then be used as part of the completion condition for the memory synchronization.
The timestamp queue can also be extended to track a second, independently asserted timestamp signal, so that the entries present at each of two different points in time are tracked concurrently, as described above.
While there has been shown and described what is considered to be preferred embodiments of the invention, it will, of course, be understood that various modifications and changes in form or detail could readily be made without departing from the spirit of the invention. It is therefore intended that the invention be not limited to the exact forms described and illustrated, but should be construed to cover all modifications that may fall within the scope of the appended claims.
The U.S. Government has a paid-up license in this invention and the right in limited circumstances to require the patent owner to license others on reasonable terms as provided for by the terms of Contract No. B554331 awarded by the Department of Energy.