Claims
- 1. A processing cluster for use in a system with a main memory, said processing cluster comprising:
a set of cache memory adapted for coupling to said main memory; and a set of compute engines, wherein each compute engine in said set of compute engines is coupled to said set of cache memory and includes a central processing unit coupled to a coprocessor, wherein said coprocessor is coupled to said set of cache memory.
- 2. The processing cluster of claim 1, wherein said coprocessor includes:
a sequencer coupled to said central processing unit; and a set of application engines coupled to said sequencer.
- 3. The processing cluster of claim 2, wherein said set of application engines includes a streaming input engine coupled to said set of cache memory to retrieve data from said set of cache memory.
- 4. The processing cluster of claim 3, wherein said streaming input engine is coupled to at least one application engine in said set of application engines to provide data retrieved from said set of cache memory.
- 5. The processing cluster of claim 4, wherein said streaming input engine retrieves a full cache line of data from said set of cache memory and provides said cache line of data to said at least one application engine in transfers including a programmable number of bytes.
- 6. The processing cluster of claim 5, wherein said streaming input engine retrieves said full cache line in a plurality of data retrievals from said set of cache memory.
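Claims 3 through 6 describe a streaming input engine that assembles a full cache line through several smaller retrievals from cache, then hands the line to an application engine in transfers of a programmable number of bytes. A minimal illustrative sketch of that data flow follows; the 64-byte line, the 16-byte fetch width, and all function names are assumptions for illustration, not taken from the claims:

```python
# Illustrative model of claims 3-6: a streaming input engine that
# assembles a full cache line from a plurality of narrower retrievals,
# then forwards it in transfers of a programmable number of bytes.
# The 64-byte line and 16-byte fetch width are assumed values.

CACHE_LINE_BYTES = 64
FETCH_WIDTH = 16  # bytes moved per retrieval from the cache


def fetch_cache_line(cache: bytes, line_addr: int) -> bytes:
    """Retrieve one full cache line as a series of narrower reads."""
    line = bytearray()
    for offset in range(0, CACHE_LINE_BYTES, FETCH_WIDTH):
        start = line_addr + offset
        line += cache[start:start + FETCH_WIDTH]
    return bytes(line)


def stream_to_engine(line: bytes, transfer_bytes: int):
    """Yield the line to the application engine transfer_bytes at a time."""
    for offset in range(0, len(line), transfer_bytes):
        yield line[offset:offset + transfer_bytes]
```

A caller would fetch a line once, then iterate over `stream_to_engine(line, n)` with whatever transfer width the application engine was programmed to accept.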
- 7. The processing cluster of claim 2, wherein said set of application engines includes a streaming output engine coupled to said set of cache memory to store data in said set of cache memory.
- 8. The processing cluster of claim 7, wherein said streaming output engine is coupled to at least one application engine in said set of application engines to store data from said at least one application engine in said set of cache memory.
- 9. The processing cluster of claim 8, wherein said streaming output engine retrieves a programmable number of bytes from a full cache line of data from said at least one application engine and transfers a full cache line of data retrieved from said at least one application engine to said set of cache memory.
- 10. The processing cluster of claim 9, wherein said streaming output engine transfers said full cache line of data retrieved from said at least one application engine to said set of cache memory using a plurality of transfer operations.
- 11. The processing cluster of claim 2, wherein said set of application engines includes:
a streaming input engine coupled to said set of cache memory to retrieve data from said set of cache memory; and a transmission media access controller coupled to said streaming input engine to receive said data and provide said data to a communications network.
- 12. The processing cluster of claim 2, wherein said set of application engines includes:
a streaming output buffer coupled to said set of cache memory to store data in said set of cache memory; and a reception media access controller coupled to said streaming output buffer to provide network data received from a communications network.
- 13. The processing cluster of claim 2, wherein said central processing unit provides instructions to said sequencer identifying an operation to perform.
- 14. The processing cluster of claim 13, wherein said sequencer instructs an application engine in said set of application engines to perform said operation in response to the instructions provided by said central processing unit.
- 15. The processing cluster of claim 1, further including:
a memory management unit coupled to at least 2 compute engines in said set of compute engines.
- 16. The processing cluster of claim 15, wherein said memory management unit provides address translations for said at least 2 compute engines on a rotating basis.
- 17. The processing cluster of claim 16, wherein said memory management unit is coupled to said coprocessors in said at least 2 compute engines.
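Claims 15 through 17 describe a single memory management unit shared by at least two compute engines, serving their address translations "on a rotating basis." One natural reading of a rotating basis is round-robin arbitration among the engines; the hypothetical sketch below models that reading. The class, method names, and toy page size are mine, not the patent's:

```python
# Hypothetical sketch of claims 15-17: one MMU shared by several
# compute engines, granting translation service in round-robin order.
# All names and the 4 KiB page size are assumptions for illustration.

PAGE_SIZE = 4096


class SharedMMU:
    def __init__(self, num_engines: int, page_table: dict):
        self.num_engines = num_engines
        self.page_table = page_table   # virtual page number -> physical page number
        self.next_engine = 0           # whose turn comes first
        self.pending = {}              # engine id -> virtual address awaiting translation

    def request(self, engine_id: int, vaddr: int) -> None:
        """A compute engine (or its coprocessor) posts a translation request."""
        self.pending[engine_id] = vaddr

    def step(self):
        """Serve at most one engine per cycle, rotating through the engines."""
        for _ in range(self.num_engines):
            eid = self.next_engine
            self.next_engine = (self.next_engine + 1) % self.num_engines
            if eid in self.pending:
                vaddr = self.pending.pop(eid)
                page, offset = divmod(vaddr, PAGE_SIZE)
                return eid, self.page_table[page] * PAGE_SIZE + offset
        return None  # no engine had a request pending this cycle
```

The rotation pointer advances past each engine whether or not it had a request, so no single engine can monopolize the shared unit.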
- 18. The processing cluster of claim 1, wherein said coprocessor includes internal translation buffers for converting virtual addresses to physical addresses.
- 19. The processing cluster of claim 1, wherein said set of cache memory includes a set of first tier caches coupled to a second tier cache adapted for coupling to said main memory.
- 20. The processing cluster of claim 19, wherein each coprocessor is coupled to a first tier data cache in said set of first tier caches.
- 21. The processing cluster of claim 20, wherein each central processing unit is coupled to a first tier data cache in said set of first tier caches and a first tier instruction cache in said set of first tier caches.
- 22. The processing cluster of claim 1, wherein said set of cache memory and said set of compute engines are formed together on a single integrated circuit.
- 23. The processing cluster of claim 1, wherein said set of compute engines consists of 1 compute engine.
- 24. The processing cluster of claim 1, wherein said set of compute engines consists of 4 compute engines.
- 25. A multi-processor system comprising:
a main memory; a set of processing clusters, wherein each processing cluster in said set of processing clusters includes:
a set of cache memory coupled to said main memory, and a set of compute engines, wherein each compute engine in said set of compute engines is coupled to said set of cache memory and includes a central processing unit coupled to a coprocessor, wherein said coprocessor is coupled to said set of cache memory; and a snoop controller coupled to said sets of cache memory to receive and place memory requests for transferring data between said processing clusters and said main memory.
- 26. The multi-processor system of claim 25, wherein said coprocessor includes:
a sequencer coupled to said central processing unit; and a set of application engines coupled to said sequencer.
- 27. The multi-processor system of claim 26, wherein said set of application engines includes a streaming input engine coupled to said set of cache memory to retrieve data from said set of cache memory.
- 28. The multi-processor system of claim 27, wherein said streaming input engine is coupled to at least one application engine in said set of application engines to provide data retrieved from said set of cache memory.
- 29. The multi-processor system of claim 26, wherein said set of application engines includes a streaming output engine coupled to said set of cache memory to store data in said set of cache memory.
- 30. The multi-processor system of claim 29, wherein said streaming output engine is coupled to at least one application engine in said set of application engines to store data from said at least one application engine in said set of cache memory.
- 31. The multi-processor system of claim 25, wherein said snoop controller is coupled to said sets of cache memory via a snoop ring for issuing snoop requests.
- 32. The multi-processor system of claim 31, wherein said snoop controller is coupled to said sets of cache memory via point-to-point links for receiving memory requests.
- 33. The multi-processor system of claim 32, wherein processing clusters in said set of processing clusters are coupled together via a data ring.
- 34. The multi-processor system of claim 33, wherein said data ring is coupled to said sets of cache memory.
- 35. The multi-processor system of claim 34, wherein said sets of cache memory each include a set of first tier caches coupled to a second tier cache coupled to said snoop controller and said data ring.
- 36. The multi-processor system of claim 25, wherein said set of processing clusters and said snoop controller are formed together on a single integrated circuit.
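Claims 31 through 35 arrange the clusters on two rings: a snoop ring on which the snoop controller issues snoop requests, and a data ring coupling the clusters' second tier caches for data transfers. On a ring, a message travels from stop to stop until the addressed cluster claims it. A toy sketch of that traversal follows; the cluster count and function shape are assumptions, not details from the claims:

```python
# Toy model of the ring topologies in claims 31-35: a message placed
# on the ring visits each cluster in order until it reaches its
# destination. The four-cluster count in the test is an assumed value.

def ring_deliver(num_clusters: int, src: int, dst: int) -> list:
    """Return the sequence of cluster ids a ring message traverses
    while travelling from src until it arrives at dst."""
    hops = []
    node = src
    while node != dst:
        node = (node + 1) % num_clusters  # advance to the next ring stop
        hops.append(node)
    return hops
```

The hop count grows with ring distance, which is why such designs pair the ring with point-to-point links (claim 32) for latency-sensitive request traffic.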
- 37. A computer system including:
a set of cache memory; a memory management unit; and a set of compute engines coupled to said memory management unit and said set of cache memory, wherein each compute engine includes a central processing unit coupled to a coprocessor, wherein said set of compute engines includes at least 2 compute engines.
- 38. The computer system of claim 37, wherein each compute engine in said set of compute engines is coupled to said memory management unit.
- 39. The computer system of claim 38, wherein each coprocessor in said set of compute engines is coupled to said memory management unit.
- 40. A processing cluster for use in a system with a main memory, said processing cluster comprising:
a set of cache memory adapted for coupling to said main memory, wherein said set of cache memory includes:
a set of first tier cache memories, and a second tier cache memory coupled to said set of first tier cache memories and adapted for coupling to said main memory; a set of compute engines, wherein each compute engine in said set of compute engines is coupled to said set of cache memory and includes:
a central processing unit coupled to cache memory in said set of first tier cache memories, and a coprocessor coupled to said central processing unit and a cache memory in said set of first tier cache memories; and a memory management unit coupled to at least 2 compute engines in said set of compute engines.
- 41. A multi-processor system comprising:
a main memory; a set of processing clusters, wherein each processing cluster in said set of processing clusters includes:
a set of cache memory coupled to said main memory, said set of cache memory including:
a set of first tier cache memories, and a second tier cache memory coupled to said set of first tier cache memories; a set of compute engines, wherein each compute engine in said set of compute engines includes:
a central processing unit coupled to a cache memory in said set of first tier cache memories, and a coprocessor coupled to said central processing unit and a cache memory in said set of first tier cache memories; and a snoop controller coupled to said second tier cache memory for receiving and placing memory requests for transferring data between said processing clusters and said main memory.
CROSS-REFERENCE TO RELATED APPLICATIONS
[0001] This application is a continuation of U.S. patent application Ser. No. 09/900,481, entitled “Multi-Processor System,” filed on Jul. 6, 2001, which is incorporated herein by reference.
[0002] This Application is related to the following Applications:
[0003] “Coprocessor Including a Media Access Controller,” by Frederick Gruner, Robert Hathaway, Ramesh Panwar, Elango Ganesan and Nazar Zaidi, Attorney Docket No. NEXSI-01021US0, filed the same day as the present application;
[0004] “Application Processing Employing A Coprocessor,” by Frederick Gruner, Robert Hathaway, Ramesh Panwar, Elango Ganesan, and Nazar Zaidi, Attorney Docket No. NEXSI-01201US0, filed the same day as the present application;
[0005] “Compute Engine Employing A Coprocessor,” by Robert Hathaway, Frederick Gruner, and Ricardo Ramirez, Attorney Docket No. NEXSI-01202US0, filed the same day as the present application;
[0006] “Streaming Input Engine Facilitating Data Transfers Between Application Engines And Memory,” by Ricardo Ramirez and Frederick Gruner, Attorney Docket No. NEXSI-01203US0, filed the same day as the present application;
[0007] “Streaming Output Engine Facilitating Data Transfers Between Application Engines And Memory,” by Ricardo Ramirez and Frederick Gruner, Attorney Docket No. NEXSI-01204US0, filed the same day as the present application;
[0008] “Transferring Data Between Cache Memory And A Media Access Controller,” by Frederick Gruner, Robert Hathaway, and Ricardo Ramirez, Attorney Docket No. NEXSI-01211US0, filed the same day as the present application;
[0009] “Processing Packets In Cache Memory,” by Frederick Gruner, Elango Ganesan, Nazar Zaidi, and Ramesh Panwar, Attorney Docket No. NEXSI-01212US0, filed the same day as the present application;
[0010] “Bandwidth Allocation For A Data Path,” by Robert Hathaway, Frederick Gruner, and Mark Bryers, Attorney Docket No. NEXSI-01213US0, filed the same day as the present application;
[0011] “Ring-Based Memory Requests In A Shared Memory Multi-Processor,” by Dave Hass, Frederick Gruner, Nazar Zaidi, Ramesh Panwar, and Mark Vilas, Attorney Docket No. NEXSI-01281US0, filed the same day as the present application;
[0012] “Managing Ownership Of A Full Cache Line Using A Store-Create Operation,” by Dave Hass, Frederick Gruner, Nazar Zaidi, and Ramesh Panwar, Attorney Docket No. NEXSI-01282US0, filed the same day as the present application;
[0013] “Sharing A Second Tier Cache Memory In A Multi-Processor,” by Dave Hass, Frederick Gruner, Nazar Zaidi, and Ramesh Panwar, Attorney Docket No. NEXSI-01283US0, filed the same day as the present application; and
[0014] “Ring Based Multi-Processing System,” by Dave Hass, Mark Vilas, Fred Gruner, Ramesh Panwar, and Nazar Zaidi, Attorney Docket No. NEXSI-01028US0, filed the same day as the present application.
[0015] Each of these related Applications is incorporated herein by reference.
Continuations (1)

| | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | 09900481 | Jul 2001 | US |
| Child | 10105732 | Mar 2002 | US |