Multi-core processors were introduced to advance the processor technology space at a point when, for the most part, silicon process capabilities exceeded a single-core processor's ability to effectively utilize the available die area. Unlike a single-core processor, which generally includes a single processor core in a single integrated circuit (IC), a multi-core processor generally includes two or more processor cores in a single IC. For example, a dual-core processor comprises two processor cores in a single IC, and a quad-core processor comprises four processor cores in a single IC.
Regardless of the number of processor cores in the IC, the benefit of the multi-core architecture is typically the same: enhanced performance and/or efficient simultaneous processing of multiple tasks (i.e., parallel processing). Consumer and enterprise devices such as desktops, laptops, and servers take advantage of these benefits to improve response time when running processor-intensive processes, such as antivirus scans, ripping/burning media, file searching, servicing multiple external requests, and the like.
Example embodiments are described in the following detailed description and in reference to the drawings, in which:
Various embodiments of the present disclosure are directed to a multi-core processor architecture. More specifically, various embodiments are directed to a multi-core processor architecture wherein each processor core is allocated to one of a plurality of system images, and the plurality of system images share a common memory component by utilizing an address translation gasket to maintain separation between memory regions assigned to each of the plurality of system images. As described in greater detail below, this novel and previously unforeseen approach provides for more efficient and effective utilization of a single processor socket.
By way of background, there has been recognition that processor densities achievable with current technologies are beyond what a single system image requires for many applications. For these applications, more cores, and in some cases special processing units, do not add value proportional to their incremental costs. Rather, the processing power associated with each core in multi-core processors is often underutilized, if utilized at all. While solutions such as “virtualization” and “physicalization” have been introduced to address these inefficiencies, such solutions have their own respective drawbacks. Moreover, they do not squarely address the issue of how to efficiently and effectively utilize each processor core in a multi-core processor. For example, virtualization software (e.g., VMware) is generally designed to share multiple high-performance processors in a server among multiple system images running under a hypervisor. This software is beneficial because it makes information technology (IT) infrastructure more flexible and simpler to manage. Moreover, it reduces hardware and energy costs by consolidating to a smaller number of highly utilized servers. However, the virtualization software is often associated with high licensing fees, and the associated hypervisor may be considered a large fault zone or single point of failure. In addition, the virtualization software imposes a performance overhead on the host system. Therefore, while there are various benefits associated with virtualization solutions, there are also various disadvantages associated with such solutions.
Physicalization, by contrast, is positioned at the other end of the spectrum from virtualization. Physicalization utilizes multiple light-weight servers comprising lower-performance processors in a dense architecture. The general goal is to achieve maximum value, performance, and/or performance per watt by picking the right size processor for each “microserver” node. The benefit of this approach is that it reduces operating costs by eliminating the need for costly virtualization software, and further by focusing on system packaging efficiency. The drawback, however, is that duplicate components are utilized in each microserver node. For example, input/output components, memory, and/or memory interfaces are redundantly included in each microserver node. Moreover, the “one server, one application” physicalization model is often inflexible and difficult to manage.
Various embodiments of the present application address at least the foregoing by utilizing hardware and/or firmware mechanisms that allow multiple system images to share a single processor socket. Stated differently, various embodiments configure a processor socket to run multiple smaller system images rather than one big system image. While each smaller system image may behave as though it owns an entire processor socket, in actuality, each system image may be running on a portion of the processor socket and sharing processor components with other system images.
This inventive architecture is realized, in part, by implementing an address translation gasket between the processor cores and a memory interface component. The address translation gasket is configured to maintain separation between the system images, and to allow sharing of a common memory while at the same time preventing access to unauthorized regions of memory. The inventive architecture is further realized by allocating processor cores to different system images, and by sharing high-cost and often underutilized components such as input/output and memory among the different system images. As a result, the cost per system image may be reduced, processor cores and associated components may be efficiently utilized, and risk may be mitigated. For example, when compared to virtualization solutions, hypervisor licensing fees and the large fault domain may be eliminated. When compared to physicalization, inflexible provisioning and redundant components may be eliminated. Hence, the architecture addresses drawbacks associated with virtualization and physicalization, while at the same time advancing processor efficiency to a level previously unforeseen. This inventive architecture is described further below with reference to various example embodiments and various figures.
In one example embodiment of the present disclosure, a processor is provided. The processor comprises a plurality of processor core components, a memory interface component, and an address translation gasket. Each processor core component is assigned to one of a plurality of system images, and the plurality of system images share a common memory component by at least utilizing the address translation gasket to maintain separation between memory regions assigned to each of the plurality of system images. The memory interface component is shared by the plurality of independent system images. The address translation gasket is configured to intercept transactions bound for the memory interface component comprising a system image identifier and a target address, generate a translation address based at least in part on the system image identifier and the target address, and send the translation address to the memory interface component.
In a further example embodiment of the present disclosure, another processor is provided. The processor comprises a plurality of processor core components and an address translation gasket. The plurality of processor core components are each assigned to one of a plurality of system images, and the plurality of system images share a common memory component by at least utilizing the address translation gasket to maintain separation between memory regions assigned to each of the plurality of system images. The address translation gasket is configured to intercept transactions bound for a memory interface component from the plurality of processor core components, and generate translation addresses for the transactions based at least in part on a system image identifier and address associated with the transactions. The address translation gasket is further configured to intercept transactions bound for the plurality of processor core components from the memory interface component, and generate translation addresses for these transactions.
In yet another example embodiment of the present disclosure, a further processor is provided. The processor comprises a plurality of processor core components, a memory interface component, and an address translation gasket. The plurality of processor core components are each assigned to one of a plurality of system images, and the plurality of system images share a common memory component by at least utilizing an address translation gasket to maintain separation between memory regions assigned to each of the plurality of system images. The memory interface component is shared by the plurality of independent system images. The address translation gasket is configured to intercept transactions bound for the memory interface component from the plurality of processor core components, wherein the transactions each comprise a system image identifier and a target address, and wherein the address translation gasket is configured to generate a translation address based at least in part on the system image identifier and the target address by at least one of (i) treating the system image identifier as one or more additional address bits and concatenating the one or more additional address bits with the target address to produce the translation address; (ii) mapping the system image identifier to a fixed address offset, and adding the fixed address offset to the target address to produce the translation address; and (iii) mapping the system image identifier and at least a portion of the target address to an assigned portion of memory.
As used herein, a “system image” is intended to refer to a single computing node running a single operating system (OS) and/or hypervisor instance, and comprising at least one processor core, allocated memory, and an allocated input/output component.
Each processor core (110-140) is a processing device configured to read and execute program instructions. Each core (110-140) may comprise, for example, a control unit (CU) and an arithmetic logic unit (ALU). The CU may be configured to locate, analyze, and/or execute program instructions. The ALU may be configured to conduct calculation, comparison, arithmetic, and/or logical operations. As a whole, each core may conduct operations such as fetch, decode, execute, and/or writeback. While only four processor cores are shown in
The memory interface component 150 is configured to interface with one or more memory components (not shown in
The address translation gasket 160 is configured to intercept transactions bound for the memory interface component 150, and to obtain a target address and a system image identifier from each transaction. The address translation gasket 160 may use the system image identifier to identify a memory region assigned to the system image. This may be accomplished, for example, by applying an offset, or by providing a lookup function to map blocks of the system image's address space to locations in the shared memory pool. The address translation gasket 160 may then generate a translation address and check to ensure that the translation address does not reach beyond the memory range allocated to the system image before transmitting the translation address to the memory interface component 150. The memory interface component 150 may operate solely on the translated transaction it receives. Since addresses associated with different system images are not allowed to overlap, coherency flows work naturally in this environment. Once accesses are processed in the memory interface, the address translation gasket 160 also provides a reverse address translation to convert addresses bound for the processor cores (110-140) back to values the cores (110-140) expect.
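As one concrete illustration, the forward and reverse translations described above can be sketched in software (a minimal sketch only; the class name, region bases, and region sizes are hypothetical assumptions, and an actual gasket would be implemented in hardware and/or firmware):

```python
# Minimal sketch of an address translation gasket using per-system-image
# offsets into a shared memory pool. All names and sizes are illustrative.

class AddressTranslationGasket:
    def __init__(self, regions):
        # regions: {system_image_id: (base_offset, size)} in shared memory
        self.regions = regions

    def translate(self, system_image_id, target_address):
        """Forward path: processor core -> memory interface."""
        base, size = self.regions[system_image_id]
        translated = base + target_address
        # Check that the result stays inside the region allocated to
        # this system image before forwarding it.
        if not (base <= translated < base + size):
            raise ValueError("address outside region assigned to system image")
        return translated

    def reverse_translate(self, system_image_id, translated_address):
        """Reverse path: memory interface -> core, restoring the expected value."""
        base, _ = self.regions[system_image_id]
        return translated_address - base

# Example: three system images sharing one 1 GiB memory pool.
gasket = AddressTranslationGasket({
    0: (0x00000000, 0x20000000),   # system image #0: 512 MiB
    1: (0x20000000, 0x10000000),   # system image #1: 256 MiB
    2: (0x30000000, 0x10000000),   # system image #2: 256 MiB
})
```

Because each region is disjoint, a transaction from one system image can never produce a translated address inside another image's region, which is what preserves isolation in this sketch.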
Each input/output component (170-190) is configured to provide for the data flow between the processor's other internal components (e.g., the processor cores) and components outside of the processor on the board (e.g., a video card). Example input/output components may be, for example, configured in accordance with peripheral component interconnect (PCI), PCI-extended (PCI-X), and/or PCI-express (PCIe). Such input/output components may serve as motherboard-level interconnects, connecting the processor 100 with both integrated peripherals (e.g., processor-mounted integrated circuits) and add-on peripherals (e.g., expansion cards). Similar to what is described above with respect to the processor cores, it should be understood that the input/output components (170-190) on the processor 100 do not have to be identical, and each can vary in terms of capabilities, for example.
In various embodiments, the plurality of processor core components (110-140), the memory interface component 150, the address translation gasket 160, and the plurality of input/output components (170-190) may be integrated onto a single integrated circuit die. Alternatively, in various embodiments, the plurality of processor core components (110-140), the memory interface component 150, the address translation gasket 160, and the plurality of input/output components (170-190) may be integrated onto multiple integrated circuit dies in a single chip package. Regardless of the implementation, the plurality of processor core components (110-140), the memory interface component 150, the address translation gasket 160, and the plurality of input/output components (170-190) may be communicatively coupled via one or more communication busses.
Turning now to the operation of the processor 100, various embodiments of the present disclosure deploy multiple system images on the single processor 100. The system images may be independent insofar as one system image may not be influenced by, controlled by, and/or dependent upon another system image. The system images may be isolated insofar as each system image may be separated from the others such that information with respect to one system image may not be accessible by another system image. For example, a system image with a first company's data may not be influenced by or accessible to a system image with a second company's data, even though both run on a single processor. This may be accomplished, in part, by operations conducted at the address translation gasket 160. In particular, the address translation gasket 160 is configured to intercept transactions bound for the memory interface 150 from the processor cores (110-140). The address translation gasket 160 obtains at least a target address and a system image identifier from each intercepted transaction, and generates a translation address based on the target address and/or the system image identifier (e.g., by mapping the target address and/or the system image identifier to an assigned address range in a physical memory). The address translation gasket 160 then provides this translation address to the memory interface 150. As a result, the address translation gasket 160 is able to act as an intermediary between the processor cores (110-140) and the memory interface 150, and thereby control which portion of memory the processor cores (110-140) access, as well as ensure that the processor cores (110-140) are not accessing portions of memory outside the portion(s) allocated to the respective processor core(s). The address translation gasket 160 provides similar reverse translation functions for transactions from the memory interface 150 bound for the processor cores (110-140); in this direction, the address translation gasket 160 reverse translates each transaction such that the processor cores (110-140) receive the expected transaction values.
With respect to allocation between processor cores (110-140) and system images, each of the plurality of processor cores (110-140) may be allocated to a different independent and isolated system image. Alternatively or in addition, a group of processor cores (110-140) may be allocated to an independent and isolated system image. For example, as shown in
Other processor components may be similarly allocated or shared by one or more of the system images. For example, as shown in
Management logic may be configured to allocate the processor cores (110-140), the memory interface components (150-160), and/or the input/output components (170-190) to the various system images. In some embodiments, one or a group of processor cores may be designated as the “monarch,” and configured to execute the management logic to provide for the allocations. That is, one or a group of processor cores may be responsible for allocating the plurality of processor core components, as well as the memory interface and input/output components, to the various system images. In addition, the monarch may be responsible for, e.g., enabling/disabling the processor core components, allocating shared memory capacity to the system images (discussed in greater detail with respect to
In alternative embodiments, a separate management component may be included in the processor 100 to conduct the above-mentioned functionality of the monarch processor core(s) via management logic. Therefore, in that implementation, a monarch processor core or group of processor cores may not be utilized.
The processor 100 is similar to the processor described above with respect to
The system images (e.g., system image #0, system image #1, and system image #2) and their respective cores (110-140) may share the capacity of the memory 210. That is, a portion of the memory capacity of the memory 210 may be assigned to each of the plurality of independent and isolated system images. For example, as shown in
In some embodiments, the memory 210 may be partitioned based on address ranges. For example, system image #0 may be assigned address range 0-200, system image #1 may be assigned address range 201-300, and system image #2 may be assigned address range 301-400. While only one memory is shown (i.e., memory 210), it should be understood that in various embodiments the system images utilize multiple memories that may be different in terms of type, size, speed, and/or other parameters. For example, the system images may utilize a first and second memory with different storage capacities. Furthermore, while
More specifically, each system image is assigned a system image identifier. In the system shown in
In one implementation of this embodiment, if not all of the system ID combinations are in use, the memory space associated with an unused system ID may be effectively given to another system image by choosing to provide an extra address bit in place of a system ID bit. For example, suppose system images “00,” “10,” and “11” are assigned, and half of the total memory space is to go to system image “00.” When a transaction from system image “00” is provided to the address translation gasket, the address translation gasket may not force use of the second bit of the system ID (i.e., “0”), but may instead allow one more bit of the address to be utilized, and therefore the resulting value that is concatenated with the address may be “01” rather than “00.” Put another way, the most significant bit of the system ID may be used, and the next bit may be determined by an address bit instead of the second system ID bit.
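Numerically, the concatenation scheme and the borrowed-bit variant described above might look as follows (a sketch with a hypothetical 2-bit system ID and 30-bit per-image addresses; none of the constants come from the disclosure):

```python
# Sketch of treating the system image ID as additional high-order address
# bits. The 2-bit ID width and 30-bit address width are assumptions.

ADDR_BITS = 30  # per-image address width (hypothetical)

def concat_translate(system_id, target_address):
    # Baseline scheme: the full 2-bit system ID supplies the top two
    # bits of the translated address.
    return (system_id << ADDR_BITS) | target_address

def concat_translate_borrowed_bit(system_id_msb, target_address):
    # Borrowed-bit variant: only the most significant ID bit is used,
    # so the image keeps one extra address bit and owns twice the space.
    return (system_id_msb << (ADDR_BITS + 1)) | target_address
```

In this sketch, system image “00” using the borrowed bit reaches the space that the unused ID “01” would otherwise have occupied, matching the “'01' rather than '00'” behavior described above.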
More precisely, each system image is assigned a system image identifier. In the system shown in
In particular, each system image is assigned a system image identifier. In the system shown in
The process may begin at block 610, when the address translation gasket 160 receives a transaction comprising a system image identifier and a target address. The address translation gasket 160 may then proceed to translate the target address at block 620 by treating the system image identifier as one or more additional address bits, and by concatenating the one or more additional address bits with the target address to produce the translation address. Alternatively, the address translation gasket 160 may translate the target address at block 630 by mapping the system image identifier to a fixed address offset value, and at block 640 adding the fixed address offset value to the target address to produce the translation address. Alternatively, the address translation gasket 160 may translate the target address at block 650 by mapping the system image identifier and at least a portion of the target address to a memory block, and at block 660 generating the translation address based at least in part on the memory block. Regardless of the manner utilized to translate the address and obtain the translation address, at block 670, the address translation gasket 160 checks the translation address to confirm that the translation address is within the address range assigned to the particular system image. Once this is confirmed, at block 680, the translation address is sent from the address translation gasket 160 to the memory interface 150.
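The three alternatives of blocks 620, 630-640, and 650-660, together with the range check of block 670, can be sketched as follows (the mode names, block size, and example tables are hypothetical assumptions for illustration only):

```python
# Sketch of the three address translation alternatives and the range check.
# All constants and table contents below are illustrative assumptions.

ADDR_BITS = 30                         # per-image address width (block 620)
BLOCK_SHIFT = 20                       # 1 MiB lookup blocks (blocks 650-660)

OFFSETS = {0: 0x00000000, 1: 0x20000000}      # system image ID -> fixed offset
BLOCK_MAP = {(0, 0): 0x100, (1, 0): 0x200}    # (ID, image block) -> memory block

def translate(system_id, target_address, mode, assigned_range):
    if mode == "concat":               # block 620: ID bits prepended to address
        translated = (system_id << ADDR_BITS) | target_address
    elif mode == "offset":             # blocks 630-640: fixed offset added
        translated = OFFSETS[system_id] + target_address
    elif mode == "lookup":             # blocks 650-660: block-granular mapping
        block = target_address >> BLOCK_SHIFT
        base = BLOCK_MAP[(system_id, block)] << BLOCK_SHIFT
        translated = base | (target_address & ((1 << BLOCK_SHIFT) - 1))
    else:
        raise ValueError("unknown translation mode")
    lo, hi = assigned_range            # block 670: confirm assigned range
    if not (lo <= translated < hi):
        raise ValueError("translation address outside assigned range")
    return translated                  # block 680: sent to the memory interface
```

The assigned range is passed in explicitly here because each alternative implies a different memory layout; in hardware, the equivalent check might be a comparator against per-image limit registers.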
The present disclosure has been shown and described with reference to the foregoing exemplary embodiments. It is to be understood, however, that other forms, details, and embodiments may be made without departing from the spirit and scope of the disclosure that is defined in the following claims.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/US2012/035776 | 4/30/2012 | WO | 00 | 9/24/2014 |