Claims
- 1. A multiprocessor wherein each processor in the multiprocessor shares resources, comprising:
a hardware locking device that provides support for synchronization between the multiple processors in the multiprocessor and for the orderly sharing of the resources, wherein a processor has permission to access a resource only when it owns the lock associated with that resource, and an attempt by a processor to own a lock requires only a single load operation, rather than a traditional atomic load followed by a store, such that the processor performs only a read operation and the hardware locking device, rather than the processor, performs the subsequent write operation.
- 2. The multiprocessor of claim 1, wherein the hardware locking device is implemented in a weakly-ordered multiprocessor system.
- 3. The multiprocessor of claim 1, wherein the hardware locking device comprises a test-and-set lock which relies upon a memory read by a processor for its operation, and automatically writes a value to the lock and returns the value that was stored in the lock before the write.
- 4. The multiprocessor of claim 1, wherein the hardware locking device is implemented as part of a chip.
- 5. The multiprocessor of claim 1, wherein the hardware locking device is implemented in a stand-alone device.
- 6. The multiprocessor of claim 1, wherein each processor has separate write and read buses, the operations of which are relatively independent.
- 7. The multiprocessor of claim 1, wherein the multiprocessor comprises a multiprocessor system-on-chip (SOC) structure.
- 8. The multiprocessor of claim 7, wherein the multiprocessor system includes two CPU cores, each having an internal first-level cache, and each processor has separate write and read buses which are relatively independent.
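The lock-acquisition scheme of claims 1 and 3 can be illustrated with a small software model. This is only a sketch under assumed semantics: the type and function names (`hw_lock_t`, `hw_lock_read`, `hw_lock_release`) are invented here, and the device's side-effecting write is emulated in software.

```c
#include <stdint.h>

/* Hypothetical model of the claimed hardware lock: a single read of
 * the lock's memory-mapped address returns the previous value, and
 * the lock device (emulated here) writes 1 as a side effect.  A
 * returned 0 therefore means the reader now owns the lock; no
 * atomic load-then-store by the processor is needed. */
typedef struct {
    uint32_t value; /* 0 = free, 1 = held */
} hw_lock_t;

/* Models the single load issued by the processor; the subsequent
 * write is performed by the lock device, not the processor. */
uint32_t hw_lock_read(hw_lock_t *lock) {
    uint32_t previous = lock->value; /* the processor's only operation */
    lock->value = 1;                 /* side effect done by the device */
    return previous;
}

/* Releasing the lock is an ordinary store of 0 to the lock address. */
void hw_lock_release(hw_lock_t *lock) {
    lock->value = 0;
}
```

In this model a processor spins on `hw_lock_read` until it returns 0, at which point the hardware has already marked the lock held on its behalf.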
- 9. A method of prefetching non-contiguous data structures, comprising:
embedding pointers in each data structure to point to and indicate the access order of the non-contiguous data structures; and prefetching the target data structures based on the access order indicated by the pointers.
- 10. The method of claim 9, further comprising prefetching large data structures that are stored non-contiguously but accessed repeatedly in the same order.
- 11. The method of claim 9 wherein the prefetching is based on memory lines, where a memory line is an aligned section of contiguous physical memory locations, such that the memory is divided into lines, where some portion of the lines may be stored in at least one cache at any given time, and wherein each memory line is redefined so that in addition to the normal physical memory data, each memory line includes a pointer sufficiently large to point to any other memory line in the memory.
- 12. The method of claim 11, wherein each memory line includes, in addition to the data and the pointer, additional bits to implement an algorithm that sets and uses the pointers automatically.
- 13. The method of claim 12, wherein each memory line includes two bits to indicate the status of the pointer.
- 14. The method of claim 11, wherein the pointers are set by the following mechanism: when a memory line is accessed and the memory line has no valid pointer, the memory line is promoted to parent status, and the next memory line that is demand fetched is considered to be its child; once the child memory line is known, the pointer of the parent memory line is modified to point to the child memory line; in addition, a probation bit is set to indicate that the pointer may not be useful; and the mechanism is recursive in that a child memory line is also a parent memory line of another subsequent memory line.
- 15. The method of claim 14, wherein when a memory line with a probation pointer set is referenced, the memory line is considered to be a parent memory line, but the child memory line is not fetched; once the next memory line is requested, its address is compared with the pointer of the probationary child memory line: if they match, then the parent's pointer is promoted to known status, and if they do not match, then the parent's pointer is marked invalid; and when a memory line with a known pointer is referenced, the child memory line that is pointed to is immediately prefetched, and if the next memory line to be referenced is not that child memory line, then the parent's pointer status is downgraded to probationary status.
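The pointer-status transitions described in claims 13 through 15 form a small state machine. The following is a minimal sketch of that machine; the enum and function names are illustrative, not taken from the patent, and the mapping of states to the two status bits of claim 13 is assumed.

```c
#include <stdint.h>

/* Hypothetical encoding of the two pointer-status bits of claim 13. */
typedef enum { PTR_INVALID, PTR_PROBATIONARY, PTR_KNOWN } ptr_status_t;

typedef struct {
    uint32_t child;      /* address of the child memory line */
    ptr_status_t status; /* state of the parent's pointer */
} line_ptr_t;

/* Claim 14: a line accessed with no valid pointer becomes a parent;
 * the next demand-fetched line becomes its child, on probation. */
void on_first_access(line_ptr_t *parent, uint32_t next_fetched) {
    if (parent->status == PTR_INVALID) {
        parent->child = next_fetched;
        parent->status = PTR_PROBATIONARY;
    }
}

/* Claim 15: only a known pointer triggers an immediate prefetch. */
int should_prefetch_child(const line_ptr_t *parent) {
    return parent->status == PTR_KNOWN;
}

/* Claim 15: when the next reference is observed, a probationary
 * pointer is promoted on a match and invalidated on a mismatch; a
 * known pointer is downgraded to probationary on a mismatch. */
void on_next_reference(line_ptr_t *parent, uint32_t next_addr) {
    if (parent->status == PTR_PROBATIONARY)
        parent->status = (parent->child == next_addr)
                             ? PTR_KNOWN : PTR_INVALID;
    else if (parent->status == PTR_KNOWN && parent->child != next_addr)
        parent->status = PTR_PROBATIONARY;
}
```

A pointer thus only drives prefetching after it has predicted the access order correctly at least once, which is the safeguard the probation bit provides.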
- 16. The method of claim 11, wherein the memory includes a plurality of different level caches, and a memory line and its associated pointer remain together as the memory line is moved up the memory hierarchy between different level caches, so any other caches in cache levels below the one implementing prefetching would simply cache the pointer along with the memory line data.
- 17. The method of claim 14, further including a content-addressable prefetch table that keeps track of parent-child relationships, and the table has three fields: parent, child, and status.
- 18. The method of claim 17, wherein when a memory line is fetched and it has an invalid pointer, the next memory line to be fetched is considered its child, the parent's pointer is set to the child memory line, and the pointer status is set to probationary; when a line is fetched or prefetched with a known pointer, the child memory line pointed to is immediately prefetched; when a memory line with a probationary or known pointer is fetched and it is not found as a parent memory line in the prefetch table, it is entered into the table along with its child pointer and pointer status; and when a memory line is referenced, the address of the memory line is compared to all the child pointers in the prefetch table, and if a match is found, the associated parent's status is updated and a probationary pointer is upgraded to known status; in addition, the matching entry in the prefetch table is removed, and if more than one child entry is matched, each child entry is handled in the same manner.
- 19. The method of claim 17, wherein when an entry is evicted from the prefetch table due to a lack of capacity, the parent's pointer status is updated: a valid pointer is made probationary and a probationary pointer is invalidated; and when a memory line is evicted from the cache and that memory line is found as a parent in the prefetch table, the table entry is also invalidated and treated as a prefetch table eviction.
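The content-addressable prefetch table of claims 17 through 19 can be sketched as follows. The table size, all identifiers, and the choice to report matches by count are assumptions for illustration; in hardware, a match would also upgrade the parent memory line's pointer to known status.

```c
#include <stdint.h>

/* Hypothetical model of the prefetch table of claims 17-19, with
 * the three claimed fields: parent, child, and status. */
#define PF_TABLE_SIZE 4

typedef enum { ST_PROBATIONARY, ST_KNOWN } pf_status_t;

typedef struct {
    int valid;
    uint32_t parent;    /* address of the parent memory line */
    uint32_t child;     /* child pointer recorded in the parent */
    pf_status_t status; /* pointer status carried with the entry */
} pf_entry_t;

typedef struct {
    pf_entry_t e[PF_TABLE_SIZE];
} pf_table_t;

/* Claim 18: enter a fetched line with a probationary or known
 * pointer into the table; returns 0 when the table is full (a real
 * design would evict an entry per claim 19). */
int pf_insert(pf_table_t *t, uint32_t parent, uint32_t child,
              pf_status_t status) {
    for (int i = 0; i < PF_TABLE_SIZE; i++) {
        if (!t->e[i].valid) {
            t->e[i] = (pf_entry_t){1, parent, child, status};
            return 1;
        }
    }
    return 0;
}

/* Claim 18: compare a referenced address against every child
 * pointer (the content-addressable lookup); each matched entry is
 * removed, and every match is handled the same way.  Returns the
 * number of entries matched. */
int pf_on_reference(pf_table_t *t, uint32_t addr) {
    int matches = 0;
    for (int i = 0; i < PF_TABLE_SIZE; i++) {
        if (t->e[i].valid && t->e[i].child == addr) {
            t->e[i].valid = 0; /* remove the matched entry */
            matches++;
        }
    }
    return matches;
}
```

Because the lookup is by child address rather than by index, every pending parent-child prediction that the reference confirms is resolved in one step.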
CROSS-REFERENCE
[0001] The present invention claims the benefit of commonly-owned, co-pending U.S. Provisional Patent Application Serial No. 60/271,124 filed Feb. 24, 2001 entitled MASSIVELY PARALLEL SUPERCOMPUTER, the whole contents and disclosure of which is expressly incorporated by reference herein as if fully set forth herein. This patent application is additionally related to the following commonly-owned, co-pending United States patent Applications filed on even date herewith, the entire contents and disclosure of each of which is expressly incorporated by reference herein as if fully set forth herein. U.S. patent application Serial No. (YOR920020027US1, YOR920020044US1 (15270)), for "Class Networking Routing"; U.S. patent application Serial No. (YOR920020028US1 (15271)), for "A Global Tree Network for Computing Structures"; U.S. patent application Serial No. (YOR920020029US1 (15272)), for "Global Interrupt and Barrier Networks"; U.S. patent application Serial No. (YOR920020030US1 (15273)), for "Optimized Scalable Network Switch"; U.S. patent application Serial No. (YOR920020031US1, YOR920020032US1 (15258)), for "Arithmetic Functions in Torus and Tree Networks"; U.S. patent application Serial No. (YOR920020033US1, YOR920020034US1 (15259)), for "Data Capture Technique for High Speed Signaling"; U.S. patent application Serial No. (YOR920020035US1 (15260)), for "Managing Coherence Via Put/Get Windows"; U.S. patent application Serial No. (YOR920020036US1, YOR920020037US1 (15261)), for "Low Latency Memory Access And Synchronization"; U.S. patent application Serial No. (YOR920020038US1 (15276)), for "Twin-Tailed Fail-Over for Fileservers Maintaining Full Performance in the Presence of Failure"; U.S. patent application Serial No. (YOR920020039US1 (15277)), for "Fault Isolation Through No-Overhead Link Level Checksums"; U.S. patent application Serial No. (YOR920020040US1 (15278)), for "Ethernet Addressing Via Physical Location for Massively Parallel Systems"; U.S. patent application Serial No. (YOR920020041US1 (15274)), for "Fault Tolerance in a Supercomputer Through Dynamic Repartitioning"; U.S. patent application Serial No. (YOR920020042US1 (15279)), for "Checkpointing Filesystem"; U.S. patent application Serial No. (YOR920020043US1 (15262)), for "Efficient Implementation of Multidimensional Fast Fourier Transform on a Distributed-Memory Parallel Multi-Node Computer"; U.S. patent application Serial No. (YOR9-20010211US2 (15275)), for "A Novel Massively Parallel Supercomputer"; and U.S. patent application Serial No. (YOR920020045US1 (15263)), for "Smart Fan Modules and System".
PCT Information
| Filing Document | Filing Date | Country | Kind |
| --- | --- | --- | --- |
| PCT/US02/05575 | 2/25/2002 | WO | |