Shared memory subsystems are implemented in a variety of networking and other computing applications. In such a configuration, several initiators may have access to a common memory. To serve a large number of initiators, the shared memory can have a substantial size (e.g., over 10 megabytes (MB)).
Example embodiments include a memory array circuit configured to route packet data through the array. A memory array may have a plurality of memory devices (also referred to as memory structures) arranged in a plurality of rows and columns and an interconnect for routing packets to the devices via a plurality of passthrough channels connecting non-adjacent memory devices. Each of the memory devices may include a memory configured to store packet data, and a packet router configured to interface with at least one non-adjacent memory device of the memory array. The packet router may be configured to: 1) determine a routing direction for a packet, and 2) based on the routing direction, selectively forward the packet to the non-adjacent memory device via a passthrough channel of the plurality of passthrough channels. A memory interface may be configured to route the packet from a source to the memory array, the memory interface selectively forwarding the packet to one of the plurality of memory devices based on a destination address of the packet.
The packet router may be further configured to forward the packet to the memory in response to determining a match between the memory device and the destination address. The plurality of passthrough channels may be integral to the plurality of memory devices, and may connect memory devices separated by a single memory device of the memory array. The memory interface may be further configured to 1) identify the destination address based on a packet header of the packet, the destination address indicating one of the plurality of memory devices, and 2) append the destination address to the packet prior to forwarding the packet to the memory array. The appended destination address may include a target identifier (ID) that indicates XY coordinates within the memory array for the one of the plurality of memory devices.
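The address translation described above can be illustrated with a brief sketch. The array dimensions, per-device address range, and function names below are hypothetical and are not limiting; they merely show one way a memory interface might derive XY coordinates from a destination address and append the resulting target ID to a packet.

```python
# Hypothetical sketch: a memory interface maps a destination address to XY
# coordinates within an assumed 4x4 memory array, then appends the target ID.
ROWS, COLS = 4, 4                 # assumed array dimensions (illustrative)
DEVICE_SIZE = 1 << 20             # assumed address range per device (1 MB)

def target_id(dest_addr: int) -> tuple[int, int]:
    """Translate a destination address to the XY coordinates of one device."""
    device_index = dest_addr // DEVICE_SIZE   # which device holds the address
    x, y = device_index % COLS, device_index // COLS
    return x, y

def append_target_id(packet: dict, dest_addr: int) -> dict:
    """Append the target ID to the packet prior to forwarding to the array."""
    tagged = dict(packet)
    tagged["target_xy"] = target_id(dest_addr)
    return tagged
```

In this sketch, the upper bits of the destination address select the device, so the translation reduces to a divide and a modulo; a hardware implementation could realize the same mapping with bit slicing.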
The memory interface may be configured to selectively forward the packet to one of a subset of the memory devices, the subset including 1) two adjacent memory devices in a first column, and 2) two non-adjacent memory devices in a second column distinct from the first column. Each of the plurality of passthrough channels may connect memory devices separated by multiple memory devices. The circuit may further include a plurality of memory interfaces including the memory interface, each of the plurality of memory interfaces configured to route the packet from one or more respective sources to one of a subset of the memory devices coupled to the memory interface.
The packet router may be further configured to select the non-adjacent memory device for forwarding the packet based on a history of at least one memory device to which the packet router forwarded a previous packet. The packet router may select the non-adjacent memory device for forwarding the packet based on a location of the memory interface relative to the memory array. The packet router may select the non-adjacent memory device for forwarding the packet based on a command or other signal issued by the memory interface.
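The history-based selection described above can be sketched as a simple rotation: the router records which device received the previous packet and forwards the next packet to a different candidate, spreading traffic across passthrough channels. The class and method names below are illustrative assumptions, not taken from the specification.

```python
# Minimal sketch of history-based forwarding: the packet router alternates
# among candidate non-adjacent devices so consecutive packets take different
# passthrough channels (a round-robin rotation over the candidates).
class PacketRouter:
    def __init__(self, candidates):
        self.candidates = candidates  # devices reachable via passthrough channels
        self.last = -1                # history: index of the last device used

    def select_next(self):
        """Select the next device based on the forwarding history."""
        self.last = (self.last + 1) % len(self.candidates)
        return self.candidates[self.last]
```

Comparable selection policies driven by the interface's location, or by a command issued by the interface, would replace the rotation with a lookup keyed on that location or command.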
Further embodiments include a method of routing packet data. At a memory device of a memory array having a plurality of memory devices arranged in a plurality of rows and columns and a plurality of passthrough channels connecting non-adjacent memory devices, 1) a destination address for a packet may be determined; and 2) based on the destination address, the packet may be selectively forwarded to a non-adjacent memory device via a passthrough channel of the plurality of passthrough channels. At a memory interface, the packet may be selectively forwarded to one of the plurality of memory devices based on the destination address.
Further embodiments include a node array circuit configured to route packet data through the array. A node array may have a plurality of node devices arranged in a plurality of rows and columns and a plurality of passthrough channels connecting non-adjacent node devices. Each of the node devices may include a node configured to store packet data, and a packet router configured to interface with at least one non-adjacent node device of the node array. The packet router may be configured to: 1) determine a destination address for a packet, and 2) based on the destination address, selectively forward the packet to the non-adjacent node device via a passthrough channel of the plurality of passthrough channels. A node interface may be configured to route the packet from a source to the node array, the node interface selectively forwarding the packet to one of the plurality of node devices based on the destination address.
The foregoing will be apparent from the following more particular description of example embodiments, as illustrated in the accompanying drawings in which like reference characters refer to the same parts throughout the different views. The drawings are not necessarily to scale, emphasis instead being placed upon illustrating embodiments.
A description of example embodiments follows.
A transfer of data between two devices of the array 105 may be referred to as a “hop,” and each hop may involve the transfer of data of any size between the devices. For example, a data transfer may be a parallel transmission of several bytes (e.g., 16, 32, or 64 bytes) of data between the devices. Such transfers occur both for requests from a source 180a and for responses from a target device 106, and each access operation can involve several such transfers between adjacent devices. As represented by the arrows within the array 105, each device 106 may include several channels for such transfers, including channels for 2-way communications with each adjacent device. The bandwidth of the array 105 for such transfers, therefore, is a function of the bandwidth of these inter-device channels. Further, the latency of a transfer operation is defined by two variables: 1) the number of “hops” (devices traversed) for the data to reach its target (e.g., device 106 or source 180a), and 2) the time taken by each device in the path of the transfer operation to forward the data to the next device in the path.
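The two-variable latency relation above reduces to a simple product, sketched below. The cycle counts are assumptions for illustration only; a request/response access incurs the hop latency in both directions.

```python
# Illustrative latency model: latency is the number of hops multiplied by the
# per-device forwarding time (variables 1 and 2 in the description above).
def transfer_latency(hops: int, per_hop_cycles: int) -> int:
    """Latency of one transfer: devices traversed times per-hop forward time."""
    return hops * per_hop_cycles

def round_trip_latency(hops: int, per_hop_cycles: int) -> int:
    """An access operation: request to the target plus response to the source."""
    return 2 * transfer_latency(hops, per_hop_cycles)
```

Under this model, reducing the hop count (e.g., via passthrough channels that skip intermediate devices) lowers latency linearly, which is the motivation for the mesh optimizations described below.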
Example embodiments, described below, provide a mesh circuit that optimizes the routing of data between external sources and destinations internal to the array. As a result, such embodiments significantly reduce the latency exhibited by data transfer operations within the mesh circuit, thereby reducing the number of pending operations (e.g., outstanding requests) and thus the logic required to manage pending operations, saving power and reducing the circuit's required area. Such embodiments also minimize data traffic congestion within the circuit due to faster routing of data, thereby allowing a reduction in the size of buffers (e.g., first in, first out (FIFO) buffers) used to queue data and requests at each device. Although example embodiments below include memory subsystems, further embodiments may be implemented as any other subsystem comprising a mesh network of nodes, such as a distributed processing system comprising an array of processors and/or other devices.
The mesh interface 320 and node 330 may be configured comparably to the memory interface 220a and memory device 230a described above. In particular, the node 330 may include a packet router 332 and a processor 334. The packet router 332 may communicate with the mesh interface 320 via an adjacent channel 336 for communications related to tasks for execution by the processor 334. For example, the mesh interface 320 may forward a data packet (indicating, for example, a request to complete a processing task) to the packet router 332 via the adjacent channel 336. The packet router 332 may parse the data packet to determine whether the data packet is addressed to the node 330. If so, then the packet router 332 may access its processor 334 to execute a corresponding task. Otherwise, the packet router 332 may forward the packet to another node toward its destination, as described in further detail below. Further, a passthrough channel 338 may extend from the mesh interface 320, through or around the node 330, terminating at another node, thereby enabling the mesh interface 320 to communicate with the other node. The passthrough channel 338 may be integral to the node 330 (e.g., a wire extending through the node), or may extend above, below, or around the perimeter of the node 330.
Further, as indicated by the arrows representing adjacent and passthrough channels, each of the devices 230a-d is coupled to at least one other device of the same group via a passthrough channel. For example, device 230a is connected to device 230e, a member of the same “odd row, odd column” (“odd/odd”) group, via a passthrough channel. As a result, the memory interface 220a can access any device of the array 205 through one of the four devices 230a-d to which it is directly connected. In order to forward a data packet (e.g., request, data for processing/storage, command) from the source 280a to a target device addressed by the packet, the memory interface 220a may first forward the packet to one of the devices 230a-d belonging to the same group as the target device, which, in turn, may forward the packet along a path toward its destination.
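The group-based first hop described above can be sketched as a parity lookup: each of the four directly connected devices represents one (row parity, column parity) group, and the interface hands the packet to the device in the same group as the target. The device labels and function name below are illustrative assumptions.

```python
# Hypothetical sketch of the memory interface's first-hop selection: the
# packet is forwarded to the one of four directly connected devices belonging
# to the same row/column-parity group as the target device.
def first_hop(target_xy, connected):
    """connected maps (row_parity, col_parity) -> a directly connected device."""
    x, y = target_xy
    return connected[(y % 2, x % 2)]
```

Because every device in the array belongs to exactly one of the four parity groups, this single lookup guarantees that the interface can reach any target through one of its four directly connected devices, which then forwards the packet within its own group.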
Upon receipt of the packet at the device 230c, the packet router of the device 230c may determine the destination address for the packet (515). Depending on the information conveyed by the memory interface 220a, the packet router may do so by reading the XY coordinates provided by the memory interface 220a, or by performing an address translation comparable to that performed by the memory interface 220a. Based on the destination address, the device 230c selectively forwards the packet to a non-adjacent memory device via a passthrough channel (520). As shown in
While example embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the scope of the embodiments encompassed by the appended claims.
This application claims the benefit of U.S. Provisional Application No. 63/423,800, filed on Nov. 8, 2022. The entire teachings of the above application are incorporated herein by reference.
Number | Name | Date | Kind |
---|---|---|---|
5920699 | Bare | Jul 1999 | A |
7143219 | Chaudhari et al. | Nov 2006 | B1 |
7739436 | Meyer | Jun 2010 | B2 |
8467213 | Channabasappa | Jun 2013 | B1 |
9569369 | Sedlar | Feb 2017 | B2 |
10141034 | Zitlaw | Nov 2018 | B1 |
10824505 | Swarbrick | Nov 2020 | B1 |
10936486 | Swarbrick | Mar 2021 | B1 |
11769832 | Han | Sep 2023 | B2 |
11923995 | Raleigh | Mar 2024 | B2 |
20030028713 | Khanna et al. | Feb 2003 | A1 |
20090245257 | Comparan | Oct 2009 | A1 |
20090316700 | White | Dec 2009 | A1 |
20100013517 | Manohar et al. | Jan 2010 | A1 |
20120215976 | Inoue | Aug 2012 | A1 |
20130121178 | Mainaud | May 2013 | A1 |
20130329546 | Wijnands | Dec 2013 | A1 |
20140215185 | Danielsen | Jul 2014 | A1 |
20140226634 | Voigt | Aug 2014 | A1 |
20150006776 | Liu et al. | Jan 2015 | A1 |
20150188847 | Chopra | Jul 2015 | A1 |
20150205530 | Eilert et al. | Jul 2015 | A1 |
20170193136 | Prasad | Jul 2017 | A1 |
20170195295 | Tatlicioglu | Jul 2017 | A1 |
20180301201 | Kantipudi | Oct 2018 | A1 |
20200026684 | Swarbrick | Jan 2020 | A1 |
20210385164 | Parmar | Dec 2021 | A1 |
20220150044 | Xiang | May 2022 | A1 |
20230315898 | Alaeddini | Oct 2023 | A1 |
20230316334 | Vankayala | Oct 2023 | A1 |
20240048508 | Viego et al. | Feb 2024 | A1 |
Entry |
---|
Fusella, et al., “Understanding Turn Models for Adaptive Routing: the Modular Approach,” Design, Automation and Test in Europe (2018) 1489-1492. |
Glass, et al., “The Turn Model for Adaptive Routing,” Advanced Computer Systems Laboratory, 1992, 278-287. |
Khan, et al., “Design of a Round Robin Arbiter On Resource Sharing,” Proceedings of 8th IRF International Conference, May 4, 2014, Pune, India. |
Lee, et al., “Probabilistic Distance-based Arbitration: Providing Equality of Service for Many-core CMPs,” 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture, pp. 509-519. |
Mandal, et al., “Theoretical Analysis and Evaluation of NoCs with Weighted Round-Robin Arbitration,” Dept. of ECE, University of Wisconsin-Madison, Aug. 21, 2021. |
Merchant, “The Design and Performance Analysis of an Arbiter for a Multi-Processor Shared-Memory System,” Aug. 1984, Laboratory for Information and Decision Systems, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139. |
Next Hop Definition, Created Nov. 17, 2005, Retrieved from the Internet at http://www.linfo.org/next_hop.html on Sep. 15, 2022, The Linux Information Project. |
WikiChip, “Mesh Interconnect Architecture—Intel,” Retrieved from the Internet on Nov. 23, 2022 at https://en.wikichip.org/wiki/Intel/mesh_interconnect_architecture. |
Wikipedia, “Turn restriction routing,” Retrieved from the Internet on Nov. 23, 2022 at https://en.wikipedia.org/wiki/Turn_restriction_routing. |
Number | Date | Country | |
---|---|---|---|
63423800 | Nov 2022 | US |