Embodiments of the invention relate to computing architectures. More particularly, embodiments of the invention relate to partitioning of computing platforms.
Logical partitions may be created on a computer system to divide processors, memory and/or other resources into multiple sets of resources that may be operated independently of each other. Each partition may have its own instance of an operating system and applications. Partitions may be used for different purposes; for example, one partition may support a database operation while another partition on the same computer system supports a client/server operation.
In general, there are currently two categories of partitioning: hard physical partitioning and software partitioning. Platforms that implement hard physical partitioning schemes transparently support multiple operating systems at a coarse granularity. Platforms that implement software partitioning schemes such as logical partitioning require operating system changes to redefine the boundary between the operating system and the platform, which may not be practical in many situations. Platforms that implement software partitioning schemes such as virtual partitioning require a significantly complex, fragile and often expensive software layer to create virtual partitions.
Embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings in which like reference numerals refer to similar elements.
FIG. 3a is a conceptual illustration of one embodiment of a message header that may carry a partition identifier in an address field.
FIG. 3b is a conceptual illustration of one embodiment of a message header having a field to carry a partition identifier.
In the following description, numerous specific details are set forth. However, embodiments of the invention may be practiced without these specific details. In other instances, well-known circuits, structures and techniques have not been shown in detail in order not to obscure the understanding of this description.
Described herein are various architectures that may support firm partitioning, which may extend the concept of hard physical partitioning to finer granularity levels in a transparent fashion without requiring operating system changes or a complex software layer. Firm partitioning may allow support for more hardware partitions in a given system or may allow hardware partitioning in platforms with a limited number of distinct components, as is the case in low-end server or client platforms. This technique becomes increasingly important as the industry transitions to multi-core processors that incorporate sufficient processing resources on a single die to readily support multiple operating system instances.
As described in greater detail below, a single component may assign a portion of its resources to different partitions. Accordingly, a large number of partitions may be supported in a given platform independent of the number of distinct components that comprise the platform. Therefore, the number of hardware partitions supported may be increased for high-end platforms, and/or hardware partitioning may be provided in platforms such as low-end servers or client devices.
Firm partitioning, as described below, is a technique by which a system interconnect may support partitioning in a platform with point-to-point links. Prior art techniques have not provided support for finer forms of partitioning without operating system modifications or a complex virtualization layer. Prior art hardware partitioning schemes, on the other hand, are not able to allocate resources at the granularity of cores or I/O ports.
Conceptually, firm partitioning may be considered a form of hardware partitioning. Firm partitioning may offer the same programming model to system software as a hard physical partition or an unpartitioned platform. Distinctions may only be visible to configuration firmware and system management. Firm partitioning may rely more on configuration firmware than hard physical partitions do. For example, while hard physical partitions may be configured by a service processor or configuration firmware, firm partitions may require configuration firmware to ensure programming model isolation (e.g., independent partition reset may not be fully supported by hardware).
In one embodiment, firm partitioning may result in an execution environment that the operating system cannot distinguish from the full platform and that provides programming model isolation. In one embodiment, an operating system running on one firm partition may not be able to affect the operation of an operating system running on another firm partition. Each firm partition may be able to boot an operating system independently of other firm partitions.
In one embodiment, any number of resources (e.g., processing elements, input/output hubs) may be interconnected via point-to-point links that may be used to transport coherent and/or non-coherent requests and responses. In one embodiment, a link protocol may be used to communicate the coherent and/or non-coherent requests and responses.
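Purely for purposes of illustration, the following C sketch shows one way such link messages might be represented; the structure, field names and field widths are hypothetical assumptions and are not drawn from any particular link protocol.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical message classes for a point-to-point link protocol
     * that carries both coherent and non-coherent traffic. */
    enum msg_class {
        MSG_COHERENT_REQUEST,
        MSG_COHERENT_SNOOP,
        MSG_COHERENT_RESPONSE,
        MSG_NONCOHERENT_REQUEST,
        MSG_NONCOHERENT_RESPONSE
    };

    /* A link message carries the destination node identifier used by
     * protocol routers, the address of the block it concerns, and a
     * partition identifier (discussed later in this description). */
    struct link_msg {
        enum msg_class cls;
        uint16_t       dest_node_id;
        uint16_t       src_node_id;
        uint8_t        partition_id;
        uint64_t       address;
    };

    int main(void)
    {
        /* Example: a coherent read request from node 1 to node 4,
         * tagged with partition identifier 5. */
        struct link_msg m = { MSG_COHERENT_REQUEST, 4, 1, 5, 0x1000 };
        printf("class=%d dest=%u partition=%u\n",
               m.cls, (unsigned)m.dest_node_id, (unsigned)m.partition_id);
        return 0;
    }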
The example of FIG. 1 includes module 100, which may include any number of processing elements (e.g., 105, 107), which may be processing cores, co-processors, or any other type of processing resource. The processing elements may be coupled with interconnect 110, which may function to couple the processing elements with protocol engine 115.
Protocol engine 115 may operate to translate requests and responses between the coherency protocol utilized by interconnect 110 and the coherency protocol utilized by the point-to-point links that may be used to interconnect multiple modules. In one embodiment, protocol engine 115 may be coupled with protocol router 120, which may forward messages based on external protocol destination node identifiers included in the messages.
In one embodiment, routing by interconnect 110 may be performed using the destination node identifier that may be included in request, snoop and/or response messages. In one embodiment, a processor, input/output hub, or other module component may have multiple node identifiers and routing tables that may be configured to forward messages with different node identifiers to the same destination. In one embodiment, protocol router 120 may also be coupled with memory controller 130 and coordinate coherency protocol actions for cache lines stored in memory 135.
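As an illustrative sketch only, a routing table in which multiple node identifiers resolve to the same destination might be organized as follows; the table layout and names are hypothetical.

    #include <stdint.h>
    #include <stdio.h>

    #define MAX_NODE_IDS 16

    /* Hypothetical routing table: indexed by destination node
     * identifier, each entry names the link (or internal port) on
     * which a message should be forwarded.  Several node identifiers
     * may resolve to the same link, so one physical component can
     * appear under multiple identifiers. */
    static uint8_t route_table[MAX_NODE_IDS];

    static int route(uint16_t dest_node_id)
    {
        if (dest_node_id >= MAX_NODE_IDS)
            return -1;               /* unknown node identifier */
        return route_table[dest_node_id];
    }

    int main(void)
    {
        /* Node identifiers 4 and 5 both resolve to link 2, so
         * messages addressed to either identifier reach the same
         * destination component. */
        route_table[4] = 2;
        route_table[5] = 2;
        printf("node 4 -> link %d\n", route(4));
        printf("node 5 -> link %d\n", route(5));
        return 0;
    }

Configuring multiple identifiers to resolve to the same link is what may allow a single physical component to participate in more than one partition.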
Similarly, module 140 may, for example, include any number of processing elements (e.g., 145, 147), which may be processing cores, co-processors, or any other type of processing resource. The processing elements may be coupled with interconnect 150, which may function to couple the processing elements with protocol engine 155.
Protocol engine 155 may operate to translate requests and responses between the coherency protocol utilized by interconnect 150 and the coherency protocol utilized by the point-to-point links that may be used to interconnect multiple modules. In one embodiment, protocol engine 155 may be coupled with protocol router 160, which may forward messages based on external protocol destination node identifiers included in the messages.
As described above, routing by interconnect 150 may be performed using the destination node identifier that may be included in request, snoop and/or response messages. In one embodiment, protocol router 160 may also be coupled with memory controller 165 and coordinate coherency protocol actions for cache lines stored in memory 170. Protocol router 160 may be coupled with protocol router 120 via a point-to-point link.
In one embodiment, module 180 may include protocol engine 185 that may be coupled with protocol router 120 via a first point-to-point link. Protocol engine 185 may also be coupled with protocol router 160 via a second point-to-point link. Protocol engine 185 may be coupled with interconnect 190, which may operate in a similar manner as interconnect 110 and interconnect 150 discussed above. Interconnect 190 may be coupled with any number of ports (195, 197), which may include, for example, PCI or PCI Express ports. Interconnect 190 may also be coupled with any number of integrated devices (e.g., 187), which may include, for example, integrated circuits.
PCI refers to the Peripheral Component Interconnect system that allows system components to be interconnected. Various PCI standards documents are available from the PCI Special Interest Group of Portland, Oregon. The various characteristics of PCI interfaces are well known in the art.
As described in greater detail below, the resources illustrated in FIG. 1 may be divided among multiple firm partitions, and resources of a single component may be assigned to different partitions.
Other embodiments may also be supported. For example, each processing element may have a corresponding protocol engine, with the multiple protocol engines coupled with a protocol router. As another example, a single, centrally connected protocol router may be coupled with multiple protocol routers and/or protocol engines to provide a centralized routing configuration.
Partitioning allows a set of computing resources to be isolated and dedicated to a corresponding operating system instance. Using firm partitioning as described herein, a resource may serve more than one partition, which is not possible using previously available hard and soft partitioning techniques.
In one embodiment, protocol routers may be configured so that components of a partition are not physically connected with each other. In order for these components to communicate with each other, traffic may flow through routers that may be located on dies of resources corresponding to a different partition. In one embodiment, firm partitioning may be supported in which resources (e.g., processing cores, memory, PCI Express ports, integrated devices) of a component may be assigned to different partitions.
In one embodiment, firm partitioning is supported by associating sufficient information with messages flowing over the internal and external interconnects to logically isolate the messages of each partition. The following example, illustrated in FIG. 2, describes a cache coherent request. Other types of messages may be supported in a similar manner.
In one embodiment, a processing element (e.g., processing element 105) may issue a request message over the interconnect, with the request including an internal partition identifier. The request message may result in a snoop of the private caches of the processing elements coupled with the interconnect, 220. In one embodiment, only processing elements that share the internal partition identifier are snooped. In one embodiment having shared cache banks, the internal partition identifier may be included in the cache tag.
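The following C sketch illustrates, with a hypothetical tag layout and hypothetical names, how a snoop might match only cache lines installed by the requesting partition when the internal partition identifier is kept in the cache tag:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical tag for a shared cache bank: the internal
     * partition identifier is stored alongside the usual tag bits. */
    struct cache_tag {
        uint64_t tag;
        uint8_t  partition_id;   /* internal partition identifier */
        bool     valid;
    };

    /* A snoop matches only if the address tag matches and the line
     * belongs to the requesting partition; the same address issued
     * by different partitions refers to different memory and must
     * not hit. */
    static bool snoop_hit(const struct cache_tag *line,
                          uint64_t addr_tag, uint8_t req_partition)
    {
        return line->valid &&
               line->tag == addr_tag &&
               line->partition_id == req_partition;
    }

    int main(void)
    {
        struct cache_tag line = { 0xABCD, 3, true };
        /* Partition 3 hits; partition 1 does not, despite presenting
         * the same address tag. */
        printf("%d %d\n", snoop_hit(&line, 0xABCD, 3),
                          snoop_hit(&line, 0xABCD, 1));
        return 0;
    }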
If the requested data is retrieved via the local snoop, 225, the requested data may be returned to the source using a source internal partition identifier. In one embodiment, if the requested data is not found in a cache of a processing element coupled with the interconnect, 225, a request may be generated to the protocol engine corresponding to the requesting processing element (e.g., protocol engine 115), 230. In one embodiment, the request to the protocol engine also includes the internal partition identifier.
In one embodiment, a protocol engine (e.g., protocol engine 115) may decode an address corresponding to the request message to determine a node identifier for a resource (e.g., memory controller) that “owns” the memory block corresponding to the request, 240. The protocol engine may use the internal partition identifier to identify a different destination per source because the same address may be used by different sources that belong to different partitions to refer to different physical addresses.
In one embodiment, a protocol request message may be generated by a protocol engine and forwarded to a protocol router (e.g., protocol router 120). The protocol engine may transform the internal partition identifier to an external partition identifier. The request message with the external partition identifier may be routed to a destination resource, 250. The protocol request message may pass through any number of protocol routers (e.g., protocol router 120 and protocol router 160), depending on the system configuration, before reaching the destination.
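The decode and translation steps described above might, as a hypothetical sketch (the table layout, identifier widths and names are assumptions rather than any actual implementation), be expressed as:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical decode table: the same address issued by sources
     * in different partitions may refer to different physical memory,
     * so the lookup is keyed by both the internal partition identifier
     * and the address range. */
    struct decode_entry {
        uint8_t  int_partition;   /* internal partition identifier     */
        uint64_t base, limit;     /* address range owned by the target */
        uint16_t owner_node_id;   /* node id of the owning controller  */
    };

    /* Hypothetical per-engine translation applied as the request
     * leaves the module: internal identifiers become external ones. */
    static const uint8_t int_to_ext[4] = { 2, 5, 1, 7 };

    static int decode_and_translate(const struct decode_entry *tbl, int n,
                                    uint8_t int_part, uint64_t addr,
                                    uint16_t *dest_node, uint8_t *ext_part)
    {
        for (int i = 0; i < n; i++) {
            if (tbl[i].int_partition == int_part &&
                addr >= tbl[i].base && addr < tbl[i].limit) {
                *dest_node = tbl[i].owner_node_id;
                *ext_part  = int_to_ext[int_part & 3];
                return 0;
            }
        }
        return -1;   /* no owner found: configuration error */
    }

    int main(void)
    {
        /* Two partitions own the same address range at different
         * nodes; the same request address decodes differently. */
        struct decode_entry tbl[] = {
            { 0, 0x0000, 0x8000, 10 },
            { 1, 0x0000, 0x8000, 12 },
        };
        uint16_t node; uint8_t ext;
        if (decode_and_translate(tbl, 2, 1, 0x4000, &node, &ext) == 0)
            printf("dest node %u, external partition %u\n",
                   (unsigned)node, (unsigned)ext);
        return 0;
    }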
In one embodiment, a receiving memory controller, or other resource, (e.g., memory controller 165) may transmit snoop requests to all resources that may be snooped for a copy of the requested data, 260. In one embodiment, snoop requests are transmitted to all memory controllers of a partition and may also be transmitted to all input/output hubs that have the ability to cache data blocks.
In one embodiment, the protocol engine may use the external partition identifier to identify the memory block that corresponds to the request address for the partition. In such an embodiment, the external partition identifier may be included in the snoop request messages. In one embodiment, the receiving protocol engines and/or input/output hubs use the external partition identifier to determine the caches or cache banks that belong to the partition and should be snooped. In one embodiment, the external partition identifier may be transformed to an internal partition identifier upon the snoop request being received by an interconnect (e.g., interconnect 150).
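A receive-side sketch, again with hypothetical tables and names, might translate the identifier and select the cache banks to snoop as follows:

    #include <stdint.h>
    #include <stdio.h>

    #define NUM_BANKS 4

    /* Hypothetical receive-side configuration: the owning external
     * partition of each local cache bank, and the mapping back to
     * the internal identifier used on this module's interconnect. */
    static uint8_t ext_to_int_id[8];
    static uint8_t bank_partition[NUM_BANKS];

    /* On receiving a snoop, translate the external identifier to the
     * internal one and snoop only the banks that belong to the
     * partition named in the message. */
    static void handle_snoop(uint8_t ext_part, uint64_t addr,
                             void (*snoop_bank)(int, uint8_t, uint64_t))
    {
        uint8_t int_part = ext_to_int_id[ext_part & 7];
        for (int b = 0; b < NUM_BANKS; b++)
            if (bank_partition[b] == ext_part)
                snoop_bank(b, int_part, addr);
    }

    static void print_snoop(int bank, uint8_t int_part, uint64_t addr)
    {
        printf("snoop bank %d (internal partition %u) addr %#llx\n",
               bank, (unsigned)int_part, (unsigned long long)addr);
    }

    int main(void)
    {
        ext_to_int_id[5] = 1;    /* external 5 -> internal 1       */
        bank_partition[0] = 5;   /* banks 0 and 2 belong to        */
        bank_partition[2] = 5;   /* external partition 5           */
        handle_snoop(5, 0x2000, print_snoop);
        return 0;
    }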
Snoop responses corresponding to the snoop request(s) may be collected by a memory controller (e.g., memory controller 165), 270. The snoop responses may be routed to the originating protocol router (e.g., protocol router 120) through any number of protocol routers depending on the configuration of the host system.
When the snoop responses are received by the originating protocol engine (e.g., protocol engine 115), the external response messages may be translated to internal interconnect response messages with the corresponding internal partition identifier, 290. The translated messages may be transmitted to the requesting resource (e.g., processing element 105).
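The return path might, hypothetically, be sketched as a translation between an external response message and an internal one; the message shapes and names below are illustrative assumptions only:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical message shapes on either side of the protocol
     * engine: the external response as received from a point-to-point
     * link, and the internal response as delivered on the module
     * interconnect. */
    struct ext_response {
        uint16_t dest_node_id;
        uint8_t  ext_partition;
        uint64_t address;
        uint64_t data;
    };

    struct int_response {
        uint8_t  int_partition;   /* internal partition identifier */
        uint8_t  dest_element;    /* requesting processing element */
        uint64_t address;
        uint64_t data;
    };

    static uint8_t ext_to_int_id[8];   /* configured by firmware */

    /* Translate an external response into the internal form,
     * restoring the internal partition identifier for delivery to
     * the requesting element. */
    static struct int_response translate_response(const struct ext_response *r,
                                                  uint8_t requester)
    {
        struct int_response out = {
            .int_partition = ext_to_int_id[r->ext_partition & 7],
            .dest_element  = requester,
            .address       = r->address,
            .data          = r->data,
        };
        return out;
    }

    int main(void)
    {
        ext_to_int_id[3] = 0;
        struct ext_response r = { 2, 3, 0x1000, 42 };
        struct int_response out = translate_response(&r, 105);
        printf("internal partition %u, data %llu\n",
               (unsigned)out.int_partition,
               (unsigned long long)out.data);
        return 0;
    }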
The example of FIG. 2 is described in terms of specific system components for simplicity of description; however, other components and configurations may be supported in a similar manner.
In one embodiment, protocol routers and/or other system components may include routing tables that may be used to route messages as described above. The routing tables may allow multiple identifiers to correspond to a single component. This may support sharing of resources between multiple partitions.
For example, a memory controller may belong to multiple partitions identified by different partition identifiers. The protocol router or the protocol engine may translate a system address (e.g., <node identifier, physical address>) into a unique target device address. The physical address may not be unique across multiple partitions.
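Folding the partition identifier into the lookup might, as an illustrative sketch with hypothetical names, look like the following, in which two partitions may present the same physical address and reach disjoint regions of a shared device:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical translation: a system address <node identifier,
     * physical address> is not unique across partitions, so the
     * partition identifier is folded into the lookup to obtain a
     * unique address at the target device. */
    struct region {
        uint8_t  partition_id;
        uint64_t part_base;    /* base of the partition's view    */
        uint64_t dev_base;     /* base within the shared device   */
        uint64_t size;
    };

    static int64_t target_address(const struct region *map, int n,
                                  uint8_t partition, uint64_t phys)
    {
        for (int i = 0; i < n; i++)
            if (map[i].partition_id == partition &&
                phys >= map[i].part_base &&
                phys <  map[i].part_base + map[i].size)
                return (int64_t)(map[i].dev_base +
                                 (phys - map[i].part_base));
        return -1;   /* not mapped for this partition */
    }

    int main(void)
    {
        /* Partitions 0 and 1 both use physical address 0x1000 but
         * reach disjoint regions of the shared memory controller. */
        struct region map[] = {
            { 0, 0x0000, 0x00000, 0x10000 },
            { 1, 0x0000, 0x10000, 0x10000 },
        };
        printf("%lld %lld\n",
               (long long)target_address(map, 2, 0, 0x1000),
               (long long)target_address(map, 2, 1, 0x1000));
        return 0;
    }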
In one embodiment, a protocol packet header may carry a partition identifier that may be used for routing of messages. In one embodiment, the upper four address bits may be used to indicate the partition identifier, as illustrated in FIG. 3a.
In an alternate embodiment, illustrated in FIG. 3b, the message header may include a dedicated field to carry the partition identifier so that the full address field remains available.
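Both header formats might, purely as a hypothetical sketch with assumed field widths, be handled as follows:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical handling of the two header formats: in the first,
     * the partition identifier occupies the upper four bits of a
     * 64-bit address field; in the second, the header carries a
     * dedicated field and the full address width remains usable. */
    static uint8_t part_id_from_address(uint64_t addr_field)
    {
        return (uint8_t)((addr_field >> 60) & 0xF);
    }

    struct header_with_field {
        uint64_t address;
        uint8_t  partition_id;   /* dedicated field */
    };

    int main(void)
    {
        uint64_t addr_a = ((uint64_t)0x5 << 60) | 0x1000;
        struct header_with_field hb = { 0x1000, 0x5 };
        printf("format a: partition %u\n",
               (unsigned)part_id_from_address(addr_a));
        printf("format b: partition %u\n", (unsigned)hb.partition_id);
        return 0;
    }

Carrying the identifier in a dedicated field may leave the full address width available, at the cost of a slightly wider header.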
Reference in the specification to “one embodiment” or “an embodiment” means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase “in one embodiment” in various places in the specification are not necessarily all referring to the same embodiment.
While the invention has been described in terms of several embodiments, those skilled in the art will recognize that the invention is not limited to the embodiments described, but can be practiced with modification and alteration within the spirit and scope of the appended claims. The description is thus to be regarded as illustrative instead of limiting.