Computer networks and systems have become indispensable tools for modern business. Today terabytes or more of information on virtually every subject imaginable are stored and accessed across networks. Some applications, such as telecommunication network applications, mobile advertising, social media applications, etc., demand short response times for their data. As a result, new memory-based implementations of programs, such as in-memory databases, are being employed in an effort to provide the desired faster response times. These memory-intensive programs primarily rely on large amounts of directly addressable physical memory (e.g., random access memory) for storing terabytes of data rather than hard drives to reduce response times.
The following description illustrates various examples.
For simplicity and illustrative purposes, the principles of this disclosure are described by referring mainly to examples thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the examples. It is apparent that the examples may be practiced without limitation to all the specific details. Also, the examples may be used together in various combinations.
A node-based computing device, according to an example, includes a processor node and memory nodes. The processor node may be communicatively coupled to the memory nodes via interconnects, such as point-to-point links. Further, the memory nodes may also be communicatively coupled to each other via interconnects, such as point-to-point links. Each memory node may be a memory subsystem including a memory controller and memory to store data. Each memory node may also include routing logic to route message data to a destination, which may be another memory node, processor node, or an input/output (“I/O”) port in the node-based computing device. Collectively, the memory nodes may provide a main memory address space for processor nodes.
Examples may use the memory nodes and the point-to-point links of a node-based computing device as a messaging fabric to communicate different protocol types carrying different types of messages, such as cache coherency messages, memory access command messages, and I/O messages, to given memory nodes, I/O ports, or processors of the node-based computing device.
In an example discussed herein, a node-based computing device may include processor nodes and memory nodes. Each memory node may include local memory. The local memory from the memory nodes may collectively form a main memory address space of the node-based computing device. Point-to-point links may communicatively couple the memory nodes to one of the processor nodes, the memory nodes to another processor node, and the memory nodes to each other. One of the processor nodes may include a processor-side memory controller. The processor-side memory controller may establish a virtual circuit between the processor node and the other processor node. The virtual circuit may dedicate a path through the memory nodes. The processor-side memory controller may communicate a cache coherency message to the other processor node using the path dedicated through the virtual circuit.
In another example, a processor node may detect a high use memory node. A high use memory node may be a memory node from memory nodes communicatively coupled to the processor node. The memory nodes may form an addressable memory space for the processor node. The processor node may establish a virtual circuit that dedicates a communication path from the processor node to the high use memory node. The processor node may also communicate subsequent messages through the virtual circuit. The memory nodes may forward the subsequent messages according to the dedicated path.
These and other examples are now described.
The processor nodes 110a,b may, in some cases, be connected to each other with a direct processor-to-processor link 150, which may be a point-to-point link that provides a direct communication channel between the processor nodes 110a,b. In some cases, the processor nodes 110a,b may be configured to use the direct processor-to-processor link 150 to communicate high priority message data that is destined for each other. For example, the processor node 110a may send a cache coherency message destined for the processor node 110b through the direct processor-to-processor link 150 to avoid multiple hops through the memory nodes 130a-i.
The node-based computing device 100 may include memory nodes 130a-i that may also be connected together via point-to-point links 131, which are inter-node point-to-point links. Each memory node can operate as a destination of message data if the data to be accessed is stored at the memory node, and as a router that forwards message data along a path to an appropriate destination, such as another memory node, one of the processor nodes 110a,b, or the I/O port 112. For example, the processor-side memory controllers 111a,b can send memory access command messages, e.g., read, write, copy, etc., to the memory nodes 130a-i to perform memory access operations for the processor nodes 110a,b. Each memory node receiving message data may execute the command if that memory node is the destination or route the command to its destination memory node. The node-based computing device 100 may provide memory scalability through the point-to-point links 131 and through the ability to add memory nodes as needed, which may satisfy the memory capacity requirements of big-data workloads. Scaling up memory capacity in the node-based computing device 100 may involve, in some cases, cascading additional memory nodes.
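The dual role described above — a memory node serving commands addressed to its own local memory while forwarding everything else one hop toward its destination — can be sketched as follows. This is an illustrative sketch only; the class and field names are hypothetical and not taken from the disclosure.

```python
from dataclasses import dataclass

@dataclass
class Message:
    dest_node: int   # destination memory node id
    address: int     # memory address the command targets
    command: str     # e.g., "read", "write", "copy"

class MemoryNode:
    """Sketch of a memory node acting as both destination and router."""

    def __init__(self, node_id, local_base, local_size, next_hop):
        self.node_id = node_id
        self.local_base = local_base    # start of locally mapped address range
        self.local_size = local_size    # size of locally mapped address range
        self.next_hop = next_hop        # static routing table: dest node -> neighbor
        self.local_memory = {}          # sparse stand-in for local memory

    def handle(self, msg):
        """Serve the message locally, or name the neighbor to forward it to."""
        if self.local_base <= msg.address < self.local_base + self.local_size:
            if msg.command == "read":
                return ("served", self.local_memory.get(msg.address, 0))
            self.local_memory[msg.address] = msg.command  # placeholder write
            return ("served", None)
        # Not the destination: route toward it along a point-to-point link.
        return ("forward", self.next_hop[msg.dest_node])
```

In this sketch, scaling up capacity amounts to adding `MemoryNode` instances and extending each node's `next_hop` table, mirroring the cascading of additional memory nodes described above.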
The node-based computing device 100 may establish virtual circuits to provide quality of service (“QoS”) provisions for messages communicated through the node-based computing device 100. For example, the node-based computing device 100 may establish a virtual circuit, such as the virtual circuit 160, to provide performance bounds (and thereby, latency bounds) and bandwidth allotments for given types of messages communicated through the memory nodes 130a-i. The virtual circuit 160 may be based on connection-oriented packet switching, meaning that data may be delivered along the same memory node path. A possible advantage of a virtual circuit over connectionless packet switching is that in some cases bandwidth reservation during the connection establishment phase is supported, making guaranteed QoS possible. For example, a constant bit rate QoS class may be provided, resulting in emulation of circuit switching. Further, in some cases, less overhead may be used, since the packets (e.g., messages) are not routed individually and complete addressing information is not provided in the header of each data packet. Instead, a virtual channel identifier is included in each packet. Routing information may be transferred to the memory nodes during the connection establishment phase.
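The two phases above — an establishment phase that pushes routing state into each node on the path, and a forwarding phase in which packets carry only a small virtual channel identifier — can be sketched as below. The function and attribute names are hypothetical, not drawn from the disclosure.

```python
class Node:
    """Sketch of a fabric node holding per-virtual-circuit forwarding state."""

    def __init__(self, name):
        self.name = name
        self.vc_table = {}   # vc_id -> next node on the path (None = endpoint)

def establish_virtual_circuit(vc_id, path):
    """Connection establishment: install forwarding state for vc_id along
    an ordered list of nodes, so later packets need only carry vc_id."""
    for node, nxt in zip(path, path[1:] + [None]):
        node.vc_table[vc_id] = nxt

def send(src, vc_id):
    """Forwarding phase: follow the dedicated path hop by hop using only the
    virtual channel identifier; returns the names of the nodes visited."""
    visited, node = [], src
    while node is not None:
        visited.append(node.name)
        node = node.vc_table[vc_id]
    return visited
```

Because each packet carries only `vc_id` rather than complete addressing information, per-hop lookup is a single table access, which is the overhead reduction the passage above describes.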
In terms of routing, the memory-side memory controller 132g (or any other memory-side memory controller) may receive message data, determine whether the message data relates to a memory address mapped to the local memory of the memory node 130g, and, if so, fetch the data from the local memory. If the memory node 130g is not the destination, the memory-side memory controller 132g may send the message data to a next hop in the node-based computing device 100 toward the destination along one of the point-to-point links 131.
The method 200 may begin at operation 202 when the processor-side memory controller 111a establishes the virtual circuit 160. The virtual circuit 160 may include a path within the memory nodes between the processor node 110a and the processor node 110b. In some cases, the virtual circuit 160 may act as a dedicated path (e.g., memory nodes 130g-i, and corresponding point-to-point links) between the processor nodes 110a,b which may be used to communicate messages of a given type. Cache coherency messages are an example of a message type for which the virtual circuit 160 may be used. In some cases, establishing the virtual circuit 160 may involve the processor-side memory controller 111a reserving performance properties for the virtual circuit, such as a bandwidth or a priority. With virtual circuits, the processor-side memory controller 111a can also apply dynamic voltage and frequency scaling (“DVFS”) on different virtual channels in the node-based computing device to favorably deliver power to the memory nodes and links with high priority virtual channels, ensuring, in some cases, that messages are delivered in time to meet the QoS goals. In an example, the node-based computing device 100 can have a power budget, and DVFS can be applied to speed up the virtual circuits by increasing the voltage and/or frequency of the point-to-point links and memory nodes in those circuits. The power budget is maintained by adjusting (e.g., decreasing) the voltage and/or frequency of other paths (e.g., the point-to-point links and memory nodes) in the node-based computing device 100. Thus, the speed of the connections in the node-based computing device can vary while maintaining an overall energy budget. Note that applying DVFS in memory nodes 130a-i and point-to-point links 131 may lead to asynchronous network designs.
To address this asynchrony, memory nodes can include buffers that provide additional packet/message buffering at each node to compensate for the slower rates.
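The power-budget trade described above — boosting the links of a prioritized virtual circuit while scaling the remaining links down so the total stays within budget — can be illustrated with a simplified model. This sketch assumes power is proportional to link frequency, which understates real DVFS behavior (power scales roughly with frequency times voltage squared); the function and parameter names are hypothetical.

```python
def rebalance(link_freqs, circuit_links, boost, budget):
    """Return new per-link frequencies after boosting the links of a
    high-priority virtual circuit by `boost`x, scaling all other links down
    uniformly so the (frequency-proportional) power total stays in budget."""
    freqs = dict(link_freqs)
    for link in circuit_links:
        freqs[link] *= boost
    spent = sum(freqs[l] for l in circuit_links)
    remaining = budget - spent
    if remaining < 0:
        raise ValueError("boosted circuit alone exceeds the power budget")
    others = [l for l in freqs if l not in circuit_links]
    other_total = sum(freqs[l] for l in others)
    scale = remaining / other_total if other_total else 1.0
    for l in others:
        freqs[l] *= scale   # slow the non-circuit paths to keep the budget
    return freqs
```

The slowed non-circuit links are exactly where the buffering described above matters: downstream nodes behind a scaled-down link see a reduced drain rate and need extra packet/message buffering to absorb bursts.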
At operation 204, once the virtual circuit 160 between the processor nodes 110a,b is established, the processor node 110a may communicate a cache coherency message to the processor node 110b using the virtual circuit 160. As discussed above, the virtual circuit 160 may be used to provide a dedicated path through the memory nodes 130a-i for a given type of message. Accordingly, the cache coherency message 140 may travel along the virtual circuit 160 from the processor-side memory controller 111a to the memory-side memory controller 132g, to the memory-side memory controller 132h, to the memory-side memory controller 132i, and, finally, to the processor-side memory controller 111b.
As discussed above, the method 200 may be performed by the node-based computing device 100 during system start-up to reserve or otherwise establish a virtual circuit usable to provide performance bounds (and thereby, latency bounds) and bandwidth allotments for given types of messages communicated through the memory nodes 130a-i. In additional or alternative cases, a virtual circuit may be established dynamically during runtime, such as when a processor node is likely to access a given memory node (e.g., the memory node is known to have data relevant to the processor node). For example, a virtual circuit can be established between a processor node executing a Hadoop worker compute node and the memory node holding its associated map/reduce data. The benefits of these virtual circuits may be to provide dedicated routing paths (and thereby, latency bounds) and bandwidth allotments for specific traffic.
The method 300 may begin at operation 302 when the processor-side memory controller 111a detects that a memory node may be a high use memory node. A high use memory node may refer to a memory node that is likely to be the destination of subsequent memory access messages. A number of techniques can be used to signal that a communication path will be a high use path. In some cases, the instruction set architecture (“ISA”) of the processor node 110a can provide explicit instructions for a programmer/compiler to signal that memory access to a given region (e.g., memory address, data structure, memory node) should be optimized by the processor-side memory controller and the node-based computing device. The ISA may also have an instruction to clear the high use designation for a memory address.
In other cases, the node-based computing device 100 can predict when a region or path to a memory node should be optimized. For example, the processor-side memory controller can make such predictions using performance counters: a virtual circuit is created after a rate of activity within a time frame exceeds a threshold amount, and disabled after a rate of inactivity within a time frame exceeds a threshold amount. The performance counters may be specific to messages being sent to a given address (or range of addresses) or a given memory node. These predictions can detect so-called hot zones and cold zones within the node-based computing device in a manner that does not involve programmer or compiler assistance.
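The counter-driven prediction just described can be sketched as a small state machine: per-node activity counters are sampled at the end of each time window, a circuit is created for a node whose count crosses a "hot" threshold, and torn down after enough consecutive idle windows. All names and the windowing scheme here are hypothetical illustrations, not details from the disclosure.

```python
class CircuitPredictor:
    """Sketch of performance-counter-based hot/cold zone detection."""

    def __init__(self, hot_threshold, cold_windows):
        self.hot_threshold = hot_threshold  # accesses/window to create a circuit
        self.cold_windows = cold_windows    # idle windows before teardown
        self.counts = {}                    # node -> accesses this window
        self.idle = {}                      # node -> consecutive idle windows
        self.circuits = set()               # nodes with an active circuit

    def record_access(self, node):
        self.counts[node] = self.counts.get(node, 0) + 1

    def end_window(self):
        """Close the time window: create circuits for hot nodes, retire
        circuits whose nodes have been cold for enough windows."""
        for node, n in self.counts.items():
            if n >= self.hot_threshold:
                self.circuits.add(node)
                self.idle[node] = 0
        for node in list(self.circuits):
            if self.counts.get(node, 0) == 0:
                self.idle[node] = self.idle.get(node, 0) + 1
                if self.idle[node] >= self.cold_windows:
                    self.circuits.discard(node)
        self.counts = {}
```

Keeping separate create and retire thresholds (a form of hysteresis) avoids thrashing: a node hovering near the activity threshold does not cause circuits to be repeatedly set up and torn down.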
Upon detecting the high use memory node, the processor-side memory controller 111a may then, at operation 304, establish a virtual circuit between the processor node 110a and the high use memory node. With virtual circuits, the processor-side memory controller 111a can also apply DVFS on different virtual channels in the node-based computing device to ensure that power is favorably delivered to the memory nodes and links with high priority virtual channels, ensuring that important messages are delivered in time to meet the QoS goals. In an example, the node-based computing device can have a power budget, and DVFS can be applied to speed up the virtual circuits by boosting the voltage and/or frequency of the links and nodes in those circuits. The power budget is maintained by adjusting the voltage and/or frequency of other paths in the node-based computing device. Thus, the speed of the connections in the node-based computing device can vary while maintaining an overall energy budget. Note that applying DVFS in memory nodes and point-to-point links may lead to asynchronous network designs. To solve this problem, memory nodes can include buffers to allow for additional packet/message buffering at each node to compensate for the slower rates.
As described above, in some cases applying DVFS may lead to asynchronous network designs. Also described above, some implementations of the memory-side memory controllers may include storage buffers to allow for the memory-side memory controllers to buffer incoming messages at varying (e.g., slower) rates.
An additional option to ease the asynchronous challenges is to partition the node-based computing device into different voltage and frequency domains. Approaches adopting voltage and frequency domains can reduce the degree of asynchrony that each channel of a point-to-point link could potentially observe. In these designs, DVFS is applied to a voltage and frequency domain rather than the node-based computing device as a whole.
Some voltage and frequency domains may include buffers that are optimized for a range of DVFS values. Thus, one domain may include buffers of greater sizes than the buffers found in other domains to accommodate a greater step down in speed. Furthermore, other domains may not allow DVFS and are therefore optimized for a single frequency/voltage model. This hybrid/nonhomogeneous configuration can balance runtime flexibility and design time ease.
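One way to picture the hybrid configuration above is a per-domain buffer-sizing rule: domains that permit a wide DVFS range get deeper boundary buffers to absorb the worst-case rate mismatch, while fixed-frequency domains keep a minimal buffer. The sizing heuristic here (base depth scaled by the ratio of maximum to minimum frequency) and all names are assumptions for illustration, not from the disclosure.

```python
BASE_DEPTH = 4  # packets of buffering assumed sufficient for a matched link

def buffer_depth(f_min, f_max):
    """Deeper buffers for wider DVFS ranges; a single-frequency domain
    (f_min == f_max) keeps only the base depth."""
    if f_min <= 0 or f_max < f_min:
        raise ValueError("invalid frequency range")
    return int(BASE_DEPTH * f_max / f_min)

# Hypothetical domain partitioning of a node-based computing device.
domains = {
    "dvfs_wide": {"f_min": 0.5, "f_max": 2.0},  # aggressive scaling allowed
    "dvfs_mild": {"f_min": 1.0, "f_max": 1.5},  # limited scaling
    "fixed":     {"f_min": 1.0, "f_max": 1.0},  # no DVFS; single design point
}

depths = {name: buffer_depth(d["f_min"], d["f_max"])
          for name, d in domains.items()}
```

Under this rule the wide-range domain pays for its runtime flexibility with 4x the buffering of the fixed domain, which is the design-time versus runtime trade the passage above describes.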
The processor 510 may be a central processing unit (CPU), a semiconductor-based microprocessor, a graphics processing unit (GPU), other hardware devices or circuitry suitable for retrieval and execution of instructions stored in the computer-readable storage device 520, or combinations thereof. For example, the processor 510 may include multiple cores on a chip, multiple cores across multiple chips, multiple cores across multiple devices, or combinations thereof. The processor 510 may fetch, decode, and execute one or more of the virtual circuit memory controller instructions 522 to implement the methods and operations discussed above.
Computer-readable storage device 520 may be any electronic, magnetic, optical, or other physical storage device that contains or stores executable instructions. Thus, the computer-readable storage device 520 may be, for example, Random Access Memory (RAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a storage drive, a Compact Disc Read Only Memory (CD-ROM), non-volatile memory, and the like. As such, the computer-readable storage device 520 can be non-transitory. As described in detail herein, the computer-readable storage device 520 may be encoded with a series of executable instructions for communicating message data through a node-based computing device using a virtual circuit.
As used herein, the term “computer system” may refer to one or more computer devices, such as the computer device 500.
While this disclosure makes reference to some examples, various modifications to the described examples may be made without departing from the scope of the claimed features.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/US2014/047703 | 7/22/2014 | WO | 00