Computing systems are increasingly integrating large numbers of different types of components on a single chip or on multi-chip modules. The complexity and power consumption of a system increase with the number of different types of components. Often, these components are connected together via switches, routers, communication buses, bridges, buffers, controllers, coherent devices, and other links. The combination of these interconnecting components is referred to herein as a “communication fabric”, or “fabric” for short.
Generally speaking, the fabric facilitates communication by routing messages between a plurality of components on an integrated circuit (i.e., chip) or multi-chip module. Examples of messages communicated over a fabric include memory access requests, status updates, data transfers, coherency probes, coherency probe responses, system messages, and the like. The system messages can include messages indicating when different types of events occur within the system. These events include agents entering or leaving a low-power state, shutdown events, commitment of transactions to long-term storage, thermal events, bus locking events, translation lookaside buffer (TLB) shootdowns, and so on. With a wide variety of messages to process and with increasing numbers of clients on modern system on chips (SoCs) and integrated circuits (ICs), determining how to route the messages through the fabric can be challenging.
The advantages of the methods and mechanisms described herein may be better understood by referring to the following description in conjunction with the accompanying drawings, in which:
In the following description, numerous specific details are set forth to provide a thorough understanding of the methods and mechanisms presented herein. However, one having ordinary skill in the art should recognize that the various implementations may be practiced without these specific details. In some instances, well-known structures, components, signals, computer program instructions, and techniques have not been shown in detail to avoid obscuring the approaches described herein. It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements.
Various systems, apparatuses, and methods for processing multi-cast messages are disclosed herein. In one implementation, a system includes one or more processing units, one or more memory controllers, and a communication fabric coupled to the processing unit(s) and the memory controller(s). The communication fabric includes a plurality of crossbars which connect various agents within the system. When a multi-cast message is received by a crossbar, the crossbar extracts a message type indicator and a recipient type indicator from the message. The crossbar uses the message type indicator to determine which set of masks to look up using the recipient type indicator. Then, the crossbar determines which one or more masks to extract from the selected set of masks based on values of the recipient type indicator. The crossbar combines the one or more masks with a multi-cast route to create a port vector for determining on which ports to forward the multi-cast message. It is noted that while the term “crossbar” is used in the following discussion, various implementations need not be fully connected or otherwise have a particular design. Rather, the term “crossbar” contemplates any type of switching structure with multiple input/output ports that is configured to receive data via one or more ports and selectively convey corresponding data via one or more ports.
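By way of illustration only, the sequence described above (extract the indicators, select a mask set by message type, select masks by recipient type, and combine them with the multi-cast route) can be sketched in software. All table contents, names, and bit widths below are hypothetical and are not taken from any particular implementation.

```python
# Hypothetical sketch of a crossbar's multi-cast routing decision.
# Table contents and bit widths are illustrative only.

# One mask set per message type; each set maps a recipient-type bit
# position to a mask of crossbar ports.
MASK_SETS = {
    "coherence_probe":   {0: 0b0011, 1: 0b0100},
    "system_management": {0: 0b1000, 1: 0b0110},
}

# Multi-cast routing table: source ID -> multi-cast route (port bit vector).
ROUTING_TABLE = {7: 0b1111, 9: 0b0111}

def port_vector(source_id, message_type, recipient_type):
    route = ROUTING_TABLE[source_id]       # first list of ports (base route info)
    mask_set = MASK_SETS[message_type]     # selected by the message type indicator
    combined = 0
    for bit, mask in mask_set.items():     # one mask per set recipient-type bit
        if recipient_type & (1 << bit):
            combined |= mask
    return route & combined                # second, narrowed list of ports
```

A usage example: for source 7, a coherence probe with recipient-type bits `0b01` selects only the mask at bit 0, so the message is confined to the ports that both carry that mask and appear in the source's route.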
Referring now to
In other implementations, computing system 100 includes other components and/or computing system 100 is arranged differently. Processing units 110A-B are representative of any number and type of processing units. For example, in one implementation, processing unit 110A is a central processing unit (CPU) and processing unit 110B is a graphics processing unit (GPU). In other implementations, processing units 110A-B include other numbers and types of processing units (e.g., digital signal processor (DSP), field programmable gate array (FPGA), application specific integrated circuit (ASIC)).
Fabric 115 is representative of any communication interconnect and any protocol for communicating among the components of the system 100. Fabric 115 provides the data paths, switches, routers, multiplexers, controllers, and other logic that connect the processing units 110A-B, I/O interfaces 120, memory controller(s) 125, memory device(s) 130, and other device(s) 140 to each other. Fabric 115 handles the request, response, and data traffic, as well as probe traffic to facilitate coherency. Fabric 115 also handles interrupt request routing and configuration access paths to the various components of system 100. In various implementations, fabric 115 is bus-based, including shared bus configurations, crossbar configurations, and hierarchical buses with bridges. In various implementations, fabric 115 is packet-based, and is hierarchical with bridges, crossbar, point-to-point, or other interconnects. From the point of view of fabric 115, the other components of system 100 are referred to as “clients”. Fabric 115 processes requests generated by various clients and passes the requests on to other clients. In one implementation, fabric 115 includes a plurality of arbitration points and a plurality of masters, with each master abstracting one or more clients and generating or proxying requests into the fabric for the clients. The arbitration points are also referred to as crossbars, switches, or routers.
Memory controller(s) 125 are representative of any number and type of memory controllers accessible by core complexes 105A-N. Memory controller(s) 125 are coupled to any number and type of memory device(s) 130. Memory device(s) 130 are representative of any number and type of memory devices. For example, in various implementations, the type of memory in memory device(s) 130 includes Dynamic Random Access Memory (DRAM), Static Random Access Memory (SRAM), NAND flash memory, NOR flash memory, Ferroelectric Random Access Memory (FeRAM), or others. Memory device(s) 130 are accessible by processing units 110A-B, I/O interfaces 120, display controller 135, and other device(s) 140 via fabric 115 and memory controller(s) 125. I/O interfaces 120 are representative of any number and type of I/O interfaces (e.g., peripheral component interconnect (PCI) bus, PCI-Extended (PCI-X), PCIE (PCI Express) bus, gigabit Ethernet (GBE) bus, universal serial bus (USB)).
Various types of peripheral devices are coupled to I/O interfaces 120. Such peripheral devices include (but are not limited to) displays, keyboards, mice, printers, scanners, joysticks or other types of game controllers, media recording devices, external storage devices, network interface cards, and so forth. Other device(s) 140 are representative of any number and type of devices (e.g., multimedia device, video codec).
In various implementations, computing system 100 is a computer, laptop, mobile device, server, or any of various other types of computing systems or devices. It is noted that the number of components of computing system 100 varies from implementation to implementation. There can be more or fewer of each component than the number shown in
Turning now to
As shown in
As shown in
In one implementation, probe filter 235 is used to keep track of the cache lines that are currently in use by system 200. While only one probe filter 235 is shown in system 200, it should be understood that system 200 can have multiple probe filters, with each probe filter tracking cache lines for a given memory space or for one or more given memory device(s). A probe filter helps to reduce memory bandwidth and probe bandwidth by performing a memory request or probe request only when required. The operating principle of a probe filter is inclusivity (i.e., a line that is present in a cache within system 200 must be present in the probe filter). In one implementation, probe filter 235 includes cluster information to track a specific cluster in which a cache line resides. For example, each cluster includes a plurality of processing nodes, and the cluster information specifies which cluster, but not which processing node within the cluster, stores a particular cache line. In another implementation, probe filter 235 includes socket information to track a specific socket on which a cache line resides. A socket includes any number of clusters, with the number varying according to the implementation. In one implementation, system 200 includes two sockets, and probe filter 235 includes information on whether a particular cache line is shared by one socket or both sockets as well as on which cluster(s) the particular cache line resides. In some implementations, each probe filter may track independent cluster(s) for each socket in the system, or it may combine the cluster(s) for all the sockets in the system.
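By way of illustration only, a probe filter that tracks lines at cluster granularity can be sketched as follows. The class name, methods, and data layout are hypothetical; the sketch shows only the inclusivity bookkeeping and the bandwidth-saving lookup described above.

```python
# Hypothetical probe-filter sketch: tracks, per cache line address, which
# clusters may hold the line. Granularity is the cluster, not the
# individual processing node, matching the description above.

class ProbeFilter:
    def __init__(self):
        self.lines = {}                    # address -> set of cluster IDs

    def record_fill(self, addr, cluster):
        # Inclusivity: any line present in a cache must be present here.
        self.lines.setdefault(addr, set()).add(cluster)

    def record_evict(self, addr, cluster):
        clusters = self.lines.get(addr, set())
        clusters.discard(cluster)
        if not clusters:
            self.lines.pop(addr, None)     # no cluster holds the line; drop entry

    def clusters_to_probe(self, addr):
        # Only these clusters need a coherency probe; skipping the rest
        # saves probe bandwidth.
        return self.lines.get(addr, set())
```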
Referring now to
In one implementation, each source or destination in the fabric (e.g., fabric 115 of
Turning now to
The source field 405 is used as an index to look up multi-cast routing table 410 to retrieve a multi-cast route 430. Multi-cast route 430 is a bit vector which indicates which ports of the crossbar lead to any master or link interface unit for the specific source. Generally speaking, this represents a first list of ports (e.g., base route information) that is subsequently modified to create a second list of ports using one or more masks. The crossbar also utilizes the multi-cast mask field 415 as an index into master-type mask table 420. In one implementation, the crossbar interprets multi-cast mask field 415 based on the value of a message type field retrieved from the message. An example of interpreting multi-cast mask field 415 based on the value of a message type field is described in further detail below in the discussion of
Referring now to
It should be understood that the example of crossbar 505 and functional units on each port is exemplary only and specific to one particular implementation. Other implementations can have other numbers of ports with other numbers and types of functional units on the paths leading out of these ports. It is noted that the functional units listed as being on the path leading out of a given port do not need to be directly connected to the given port. Rather, there can be any number of other functional units, crossbars, or other components in between crossbar 505 and these listed functional units. It is also noted that a cluster (e.g., cluster A, cluster B) refers to a coherency cluster of multiple processing nodes. It is further noted that an external socket refers to a socket which is external to the socket containing crossbar 505 and which also includes multiple processing nodes. In some cases, the external socket includes a plurality of coherency clusters.
In one implementation, when crossbar 505 receives a multi-cast message, crossbar 505 retrieves a source ID (e.g., fabric ID), a message type indicator, and a recipient type indicator from the message. Crossbar 505 uses the source ID to select a route from multi-cast routing table 520. While two routes are shown in table 520 (first source route 540 and second source route 542), it should be understood that table 520 also includes any number of other entries. Crossbar 505 interprets the recipient type indicator based on the message type indicator. An example of an interpretation, in accordance with one implementation, is described further below in
In one implementation, crossbar 505 combines the masks selected from table 530 into a single master mask. This single master mask is then combined, using a bitwise AND operation, with the route selected from table 520. The output from the bitwise AND operation is a port vector which specifies the ports on which the multi-cast message should be routed.
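By way of illustration only, the OR-then-AND combination described above can be worked through with hypothetical four-port values:

```python
# Hypothetical worked example on a four-port crossbar.
mask_a = 0b0011      # ports leading toward one recipient type
mask_b = 0b1000      # ports leading toward another recipient type
combined = mask_a | mask_b        # bitwise OR -> single master mask: 0b1011

route = 0b1110       # multi-cast route selected from the routing table
port_vec = combined & route       # bitwise AND -> port vector: 0b1010

# The message is forwarded on each port whose bit is set: ports 1 and 3.
ports = [p for p in range(4) if port_vec & (1 << p)]
```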
Turning now to
When a crossbar receives a multi-cast message, the crossbar extracts the message type field 652 and the recipient type field 658 from the message. The crossbar then interprets the recipient type field 658 according to the fields shown in multi-cast mask table 605A if the message is a coherence probe (e.g., message type field 652=1), or according to the fields shown in multi-cast mask table 605B if the message is a system management message (e.g., message type field 652=0). In other words, if the message is a coherence probe, then the first bit of the recipient type field 658 indicates if the message should be sent to coherency cluster 610A, the Nth bit indicates if the message should be sent to coherency cluster 610N, the subsequent bit indicates if the message should be sent to socket 615A, and the last bit indicates if the message should be sent to socket 615N. If the message is a system management message, then the first bit of the recipient type field 658 indicates if the message should be sent to agent type 620A and the last bit indicates if the message should be sent to agent type 620N. The bits in between indicate if the message should be sent to the other agent types. It is to be understood that the ordering of bits in mask table 605A is exemplary only. Other ways of organizing the data in mask tables 605A and 605B are possible and are contemplated.
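By way of illustration only, the two interpretations of the same recipient type bits can be sketched as follows. The field names and widths are hypothetical stand-ins for the entries of tables 605A and 605B.

```python
# Hypothetical decode of the recipient type field under the two message
# types: the same bits name clusters/sockets for a coherence probe, and
# agent types for a system management message.

PROBE_FIELDS = ["cluster_A", "cluster_N", "socket_A", "socket_N"]   # per 605A
MGMT_FIELDS = ["agent_type_A", "agent_type_B",
               "agent_type_C", "agent_type_N"]                      # per 605B

def decode_recipients(message_type_bit, recipient_type):
    # message_type_bit = 1 -> coherence probe, 0 -> system management message
    fields = PROBE_FIELDS if message_type_bit == 1 else MGMT_FIELDS
    return [name for i, name in enumerate(fields)
            if recipient_type & (1 << i)]
```

Note how the same bit pattern yields two different recipient lists depending on the message type indicator, which is why the crossbar must consult the message type before interpreting the recipient type field.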
In one implementation, the fields 610A-N and 615A-N of table 605A specify the types of recipients which should receive the multi-cast message when the multi-cast message is a coherence probe. The fields 620A-N of table 605B specify the agent types that should receive the multi-cast message when the multi-cast message is a system management message. For example, in one implementation, agent type 620A is a coherent agent, agent type 620N is a graphics device, and the other agent types can correspond to other types of devices or functional units. In other implementations, agent types can be assigned in other manners. The agent which generates the multi-cast message populates the recipient type field in the message.
Once the agent(s) that should receive the multi-cast message are identified using table 605A or table 605B, then a lookup of a master-type mask table (e.g., master-type mask table 530 of
Referring now to
An agent generates a multi-cast message to send to a plurality of recipients via a communication fabric (block 705). The agent is any of various types of agents, such as a processor (e.g., CPU, GPU), I/O device, or otherwise. As used herein, a “multi-cast message” is defined as a message which specifically identifies two or more potential recipients. A “multi-cast message” is generally contrasted with a “broadcast message” which does not specifically identify particular recipients but is simply broadcast to all listeners. In some implementations, specifically identifying potential recipients may be accomplished by including information in the message that identifies classes or types of recipients rather than individually identifying each specific potential recipient. Various such embodiments are possible and are contemplated. The agent encodes a message type indicator in the multi-cast message (block 710). For example, in one implementation, the message type indicator is a single bit. In this implementation, the message type indicator specifies whether the message is a coherency probe (e.g., indicator=1) or a system management message (e.g., indicator=0). In other implementations, other numbers of message types are employed and the message type indicator includes other numbers of bits to specify the message type. In some implementations, a slave coupled to the agent which generated the multi-cast message encodes the message type indicator in the message.
Also, the agent encodes a recipient type indicator in the multi-cast message, wherein the recipient type indicator specifies which type(s) of recipients should receive the multi-cast message (block 715). For example, in one implementation, if the message is a coherence probe, then the recipient type indicator specifies to which coherency clusters the message should be sent. In one implementation, the agent determines to which coherency clusters to send the message by performing a lookup of a probe filter (e.g., probe filter 235 of
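By way of illustration only, the encoding performed by the agent can be sketched as a simple header packing. The field positions and widths are hypothetical; only the single-bit message type indicator follows the implementation described above.

```python
# Hypothetical packing of the indicators an agent encodes into a multi-cast
# message header. Field positions and widths are illustrative only.

SOURCE_SHIFT = 8      # source ID occupies the upper bits
MSG_TYPE_SHIFT = 7    # single-bit message type indicator
RECIP_MASK = 0x7F     # recipient type indicator in the low 7 bits

def encode_header(source_id, is_probe, recipient_type):
    # message type indicator: 1 = coherency probe, 0 = system management
    return ((source_id << SOURCE_SHIFT)
            | (int(is_probe) << MSG_TYPE_SHIFT)
            | (recipient_type & RECIP_MASK))

def decode_header(header):
    return (header >> SOURCE_SHIFT,
            (header >> MSG_TYPE_SHIFT) & 1,
            header & RECIP_MASK)
```

The decode side is what a crossbar would perform on receipt, recovering the source ID, message type indicator, and recipient type indicator from the packed word.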
Turning now to
Referring now to
After blocks 920 and 930, the crossbar performs a bitwise OR operation to combine the retrieved masks into a combined mask (block 935). Next, the crossbar performs a bitwise AND operation on the combined mask, a multi-cast route, and a base route to create a port vector (block 940). Next, the crossbar forwards the multi-cast message on ports indicated by bits that are set in the port vector (block 945). After block 945, method 900 ends. It is noted that in other implementations, the message type field includes more than one bit to specify more than two different types of messages. In these implementations, the message type field specifies which set of masks to lookup using the recipient type field.
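By way of illustration only, blocks 935 through 945 of method 900 can be sketched as follows, assuming a hypothetical eight-port crossbar; the values and names are illustrative only.

```python
# Hypothetical sketch of method 900, blocks 935-945: OR the retrieved
# masks, AND the result with both the multi-cast route and the base
# route, then forward on every port whose bit is set.

def forward_ports(masks, multicast_route, base_route, num_ports=8):
    combined = 0
    for m in masks:
        combined |= m                                    # block 935: bitwise OR
    port_vec = combined & multicast_route & base_route   # block 940: bitwise AND
    # block 945: forward on ports indicated by set bits
    return [p for p in range(num_ports) if port_vec & (1 << p)]
```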
Turning now to
Next, the crossbar accesses a routing table to identify a first list of ports of the crossbar on which to convey the message (block 1010). Then, the crossbar generates a mask based on the mask data (block 1015). Next, the crossbar modifies the first list of ports based on the mask to create a second list of ports, wherein the second list of ports includes fewer ports than the first list of ports (block 1020). Then, the crossbar conveys the message via the second list of ports (block 1025). After block 1025, method 1000 ends.
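By way of illustration only, blocks 1010 through 1020 of method 1000 can be viewed as a list transformation, where the mask narrows a first list of ports to a second, smaller list. The helper name is hypothetical.

```python
# Hypothetical list-based view of method 1000: the routing table yields a
# first list of ports, and the generated mask narrows it to a second list.

def second_port_list(first_ports, mask):
    # Keep only ports whose corresponding mask bit is set; the second
    # list can never contain more ports than the first.
    return [p for p in first_ports if mask & (1 << p)]
```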
In various implementations, program instructions of a software application are used to implement the methods and/or mechanisms described herein. For example, program instructions executable by a general or special purpose processor are contemplated. In various implementations, such program instructions are represented by a high-level programming language. In other implementations, the program instructions are compiled from a high-level programming language to a binary, intermediate, or other form. Alternatively, program instructions are written that describe the behavior or design of hardware. Such program instructions are represented by a high-level programming language, such as C. Alternatively, a hardware design language (HDL) such as Verilog is used. In various implementations, the program instructions are stored on any of a variety of non-transitory computer readable storage mediums. The storage medium is accessible by a computing system during use to provide the program instructions to the computing system for program execution. Generally speaking, such a computing system includes one or more memories and one or more processors configured to execute program instructions.
It should be emphasized that the above-described implementations are only non-limiting examples of implementations. Numerous variations and modifications will become apparent to those skilled in the art once the above disclosure is fully appreciated. It is intended that the following claims be interpreted to embrace all such variations and modifications.
Published as US 20200099993 A1, Mar. 2020, US.