Not applicable.
Not applicable.
Not applicable.
Data centers may comprise large clusters of servers. Data center servers may accept requests from users and respond to such requests. For example, servers may host data and transmit such data to a user upon request. A server may also be configured to host processes that perform various functionalities. As such, a user may transmit a request to a server to perform a functionality, the server may perform the functionality by executing a process, and the server may then respond to the user with the results of the functionality. A server may comprise computing components, data storage components, communication components, and other components to process user requests and communicate with the user. Such components may be interconnected using various networking devices and techniques.
In one embodiment, the disclosure includes a network element (NE) comprising a processor configured to receive a resource request via a Peripheral Component Interconnect (PCI) Express (PCI-e) network from a first device, wherein the first device is external to the NE, and query an access control list to determine whether the first device has permission to access a resource.
In another embodiment, the disclosure includes an apparatus comprising a memory comprising instructions, and a processor configured to execute the instructions by allocating a resource of a shared device for use by an external device over a PCI-e network by updating a resource allocation table.
In another embodiment, the disclosure includes a method comprising determining a first resource allocation for an external device by receiving data from a resource allocation table, and transmitting a first resource request to a first resource allocated to the external device, wherein the first resource request is transmitted to the first resource via a PCI-e network.
These and other features will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings and claims.
For a more complete understanding of this disclosure, reference is now made to the following brief description, taken in connection with the accompanying drawings and detailed description, wherein like reference numerals represent like parts.
It should be understood at the outset that, although illustrative implementations of one or more embodiments are provided below, the disclosed systems and/or methods may be implemented using any number of techniques, whether currently known or in existence. The disclosure should in no way be limited to the illustrative implementations, drawings, and techniques illustrated below, including the exemplary designs and implementations illustrated and described herein, but may be modified within the scope of the appended claims along with their full scope of equivalents.
In a server, processors and/or processor clusters may be interconnected with resources, such as process accelerators, memory storage devices, and input/output (I/O) devices, by a PCI-e architecture and/or related protocol(s) as set forth in PCI Special Interest Group (PCI-SIG) document PCI-e Base Specification Revision 3.0, which is hereby incorporated by reference. PCI-e may be employed to allow processors at the server level (e.g. within a single server) to share resources. With the increasing emphasis on cloud computing, there has been interest in expanding PCI-e to interconnect components at the rack level and/or data center level (e.g. interconnection between a plurality of servers and a shared blade server).
Disclosed herein is an architecture for sharing resource(s) between a plurality of servers over a PCI-e network without requiring Single-Root I/O Virtualization (SR-IOV), Multi-Root I/O Virtualization (MR-IOV), and/or other virtualization support. The architecture may comprise a plurality of computing nodes, a management entity, and one or more shared resources, which may be interconnected via the PCI-e network. Each server may comprise at least one processor. The shared device may comprise an access control list and/or a resource allocation table, which may comprise permissions and resource allocations for each processor, respectively. The management entity may manage resource sharing by managing the access control list and/or the resource allocation table. As an example, the management entity may assign a unique bus identifier (BID) to each processor and associate permissions and/or resource allocations with the processor via the BID. The management entity may be positioned on the device comprising the shared resource or may be positioned in a separate component. Resources may be shared amongst processors and/or dynamically provisioned for exclusive use by a specific processor as needed. The servers and/or processors may be unaware of the sharing, which may promote security and isolation and may allow legacy devices to connect to the shared resources without attendant upgrades. The shared resources may comprise process accelerators such as application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), graphics processing units (GPUs), digital signal processors (DSPs), etc., memory storage devices such as cache, long term storage, etc., and network communication devices such as Ethernet controllers, transmitters, receivers, transceivers, etc.
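By way of illustration only, the access control list and resource allocation table described above may be modeled as simple tables keyed by BID. The following Python sketch is not part of the disclosure; the class, method, and permission-bit names are hypothetical assumptions:

```python
# Minimal sketch of the per-BID tables on a shared device; all names are
# illustrative assumptions, not part of the disclosure.
READ, WRITE, EXECUTE = 0x1, 0x2, 0x4  # example permission bits

class SharedDeviceTables:
    def __init__(self):
        self.access_control = {}       # BID -> permission bitmask
        self.resource_allocation = {}  # BID -> list of allocated resource IDs

    def set_permissions(self, bid, mask):
        self.access_control[bid] = mask

    def allocate(self, bid, resources):
        self.resource_allocation[bid] = list(resources)

    def has_permission(self, bid, needed):
        # True only if every needed permission bit is set for this BID.
        return (self.access_control.get(bid, 0) & needed) == needed

    def resources_for(self, bid):
        return self.resource_allocation.get(bid, [])
```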
Servers 110 may be configured to host processes and/or data and to respond to user and/or administrator requests. Servers 110 may comprise processor(s) 115, which may execute commands to perform the functions which may be required of the server 110. Processors 115 may use multithreading and/or other technologies to process a plurality of requests substantially simultaneously. Processors 115 may comprise a single processor, a processor cluster, and/or groups of processor clusters. Processors 115 may receive input, process requests, and generate output. In order to perform functions, processors 115 may require access to resources which may transmit data to and from the processor (e.g. I/O), perform part of a process (e.g. process accelerators), and/or store data. Some resources may be located inside a server 110 and may be dedicated for the use of the processors 115 of that server 110. Other resources may be located in other components (e.g. shared device 120) and may be shared by a plurality of processors 115 in a plurality of servers 110.
Processors 115 may transmit resource requests to the shared device 120. The shared device may comprise a plurality of resources and may respond to such resource requests. For example, a shared device may comprise process accelerators such as application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), graphics processing units (GPUs), digital signal processors (DSPs), etc. Process accelerators may be optimized for a specific task and may perform such specific tasks more quickly and/or efficiently than a general processing unit (e.g. processors 115). A processor 115 wishing to offload all or part of a particular process may transmit a resource request to shared device 120, and shared device 120 may employ process accelerators to complete the process and transmit resulting data back to the requesting processor 115. As another example, shared device 120 may comprise memory storage devices such as cache (e.g. temporary storage) and long term storage (e.g. disk drives, solid state drives, redundant array of independent disks (RAID), etc.). A shared device 120 comprising a memory storage device may store data from a processor 115 and return such data to the processor 115 on request. As another example, shared device 120 may comprise network communication devices such as an Ethernet card, an optical interconnect, an Open Systems Interconnection (OSI) model layer 1 communications device, and/or any other transmitter, receiver, and/or transceiver. A shared device 120 comprising a network communication device may communicate with other components across a transport network and relay data to/from the associated processor 115 across the PCI-e network 130. It should be noted that shared device 120 may be dedicated to a single resource, a single type of resource, or may comprise a plurality of unrelated resources. It should also be noted that, while only one shared device 120 is shown for reasons of clarity, a plurality of shared devices 120 may be employed in network 100.
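The disclosure does not define a wire format for such resource requests; purely as an assumption for the sketches that follow, a request may be modeled as a record carrying the requester's BID, the permission the operation needs, and an opaque payload:

```python
# Hypothetical request shape, reusing the permission bits from the prior
# sketch; the actual PCI-e transaction format is not specified here.
def make_request(bid, needed_permission, payload):
    return {"bid": bid, "needed_permission": needed_permission, "payload": payload}

# Examples: an accelerator job, a storage write, and a network transmit.
accel_req = make_request(bid=1, needed_permission=EXECUTE, payload=b"dct-block")
store_req = make_request(bid=1, needed_permission=WRITE, payload=b"record-42")
tx_req = make_request(bid=1, needed_permission=READ, payload=b"frame-bytes")
```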
The management entity 140 may manage permissions and/or resource allocations for one or more shared devices 120. The management entity 140 may assign a BID to each server 110, each processor 115, each shared device 120, and/or combinations thereof. The management entity 140 may also dynamically assign permissions and/or allocate resources to servers 110 and/or processors 115 by indicating changes to the permissions of a BID and/or indicating which resources are allocated to a BID, respectively, at a specified time. The management entity 140 may communicate with processors 115 and/or servers 110 and may maintain an awareness of the resource needs of network 100, a subportion of network 100, a particular server 110, and/or particular processors 115 so that resource allocations may be changed as the needs of the network 100 and associated components change.
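Continuing the illustrative sketch (all names remain hypothetical), the management entity may be modeled as the sole writer of the shared device's tables, assigning BIDs and rewriting allocations as needs change:

```python
import itertools

class ManagementEntity:
    """Illustrative management entity: assigns a unique BID to each
    processor and updates the shared device's tables as demand changes."""

    def __init__(self, tables):
        self.tables = tables                # SharedDeviceTables from the earlier sketch
        self._next_bid = itertools.count(1)
        self.bids = {}                      # processor name -> BID

    def register_processor(self, name, permissions, resources):
        bid = next(self._next_bid)
        self.bids[name] = bid
        self.tables.set_permissions(bid, permissions)
        self.tables.allocate(bid, resources)
        return bid

    def reallocate(self, name, resources):
        # Dynamic re-provisioning: only the table changes; the processor
        # itself remains unaware of the new allocation.
        self.tables.allocate(self.bids[name], resources)
```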
Servers 110, shared device 120, and/or management entity 140 may each be positioned on a separate NE (e.g. network node) and be interconnected via PCI-e network 130. PCI-e network 130 may comprise a switched fabric of parallel and/or serial buses, which may interconnect the components of network 100. For example, PCI-e network 130 may connect to servers 110 and/or connect directly to processors 115 and interconnect the servers 110/processors 115 with shared devices 120 and/or management entity 140. PCI-e network 130 may transport communications between the network 100 components using switching mechanisms and associated data communication protocols.
Network 100 may be implemented such that servers 110 and/or processors 115 may request resources of shared device 120 without knowledge of the associated resource allocation. As such, sharing may be extended to legacy servers 110 and/or processors 115 without attendant upgrades (e.g. Virtualization Technology for directed I/O (VT-d), etc.). The management entity 140 may maintain awareness of the resource needs of the network 100, assign a BID to each processor 115, and manage the resource allocations and permissions of the shared device 120. When a shared device 120 receives a resource request from a server 110/processor 115, the shared device may provide access to appropriate resources based on the allocations and permissions set by the management entity. The resource sharing of network 100 may be implemented in a simple manner in comparison to more complex resource sharing implementations such as SR-IOV and/or MR-IOV. Unlike SR-IOV and/or MR-IOV, which may require standardization to allow servers 110 and/or processors 115 to be aware of the resources of the shared device 120, the resource sharing of network 100 may not require any modification of software operating on the servers 110 and/or processors 115. As such, servers 110 and/or processors 115 may not be required to specify the resources to be allocated as part of a resource request.
Logic unit 350 may be a general processing unit, an ASIC, or other device configured to process requests. Logic unit 350 may receive a resource request via the PCI-e port 371 from a processor such as processor 115. The resource request may be received at the registers 351. The request may comprise a BID associated with the processor. The logic unit 350 may query the ACL 352 to determine what permissions (e.g. read, write, execute, etc.) are associated with the BID. If the BID does not have permission to perform an action associated with the resource request, the logic unit 350 may drop the request and/or transmit an error to the processor. Otherwise, the logic unit 350 may query the resource allocation table 354 to determine the resource allocation associated with the BID. Once the logic unit 350 determines the resource allocation for the processor, the logic unit 350 may transmit the request to the state management device 353. The state management device 353 may maintain the state of the PCI-e connections and interpret the resource request in light of the resource allocation. The state management device 353 may generate a packet for the resource(s) allocated to the processor (e.g. resource 381, 382, and/or 383) in a format that may be understood by the allocated resource. The state management device 353 may also maintain a queue and transmit the packets to the resources 381-383 based on the availability of a specified resource 381-383.
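The request-handling flow of logic unit 350 and state management device 353 may be sketched as follows, building on the earlier sketches; the error reporting and one-queue-per-resource structure are assumptions rather than requirements of the disclosure:

```python
from collections import deque

class StateManagementDevice:
    """Toy stand-in for state management device 353: one FIFO per resource,
    drained as the corresponding resource becomes available."""
    def __init__(self):
        self.queues = {}

    def enqueue(self, resource, packet):
        self.queues.setdefault(resource, deque()).append(packet)

def handle_resource_request(tables, state_mgmt, request):
    """Mirror of the flow above: ACL check, then allocation lookup,
    then hand-off to the state management queue."""
    bid = request["bid"]
    if not tables.has_permission(bid, request["needed_permission"]):
        # The logic unit may instead silently drop the request.
        return {"status": "error", "reason": "permission denied"}
    allocated = tables.resources_for(bid)
    if not allocated:
        return {"status": "error", "reason": "no resource allocated"}
    # Queue a packet in a format the allocated resource understands.
    state_mgmt.enqueue(resource=allocated[0], packet=request["payload"])
    return {"status": "queued", "resource": allocated[0]}
```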
Resources 381-383 may be process accelerators, data storage devices, network communication devices, or combinations thereof. Resources 381-383 may receive the requests from the state management device 353, interpret the requests, and respond based on the request and the nature of the resource 381-383. For example, a process accelerator may perform the requested process and transmit the results back to the requesting processor via logic unit 350 and PCI-e port 371, through I/O ports 381-383, or combinations thereof. As another example, a data storage device may store and/or retrieve data to/from storage positioned inside the shared device 300 or connected to the shared device 300 via I/O ports 381-383, e.g. via a serial advanced technology attachment (SATA) connection. As another example, a network communication device may receive a packet for transmission from the state management device 353 and transmit the packet via I/O ports 381-383, for example using an Ethernet connection via a Serial Gigabit Media Independent Interface (SGMII). As another example, a network communication device may receive a packet from I/O ports 381-383 and forward the packet to the logic unit 350 for transmission to the appropriate processor via PCI-e port 371.
Resources 381-383 may be dedicated to a particular processor at a specified time or may be shared (e.g. based on the queue at the state management device 353) based on the resource allocation in the resource allocation table 354. Resources may therefore be reallocated dynamically by updating the resource allocation table 354 and/or the permissions at the access control list 352. As discussed above, the resource allocation table 354 and/or the access control list 352 may be managed by a management entity such as management entity 140, 240, and/or 241. In the case of management entity 240 and/or 241, the management entity may be implemented on logic unit 350. In the case of management entity 140, the management entity may update the resource allocation table 354 and/or the access control list 352 by a communication with logic unit 350 via PCI-e port 371, I/O ports 381, 382, and/or 383, and/or by some other connection.
At a later time, the management entity may determine to update the resource allocation for the processor. The management entity may send an update message 440 to the resource allocation table. The update message 440 may comprise data indicating the BID of the processor and that resource 2 is allocated to the BID. The processor may send a resource request 411 to the shared device, in a substantially similar manner to resource request 410. The shared device may send a query 423 to the resource allocation table and receive a reply 424 indicating that resource 2 is allocated to the BID associated with the processor. Based on the updated resource allocation received in reply 424, the shared device may convert the resource request into a packet 431 that can be received by resource 2 and may transmit the packet 431 to resource 2. By implementing method 400, a management entity may dynamically change a resource allocation for a particular processor without the processor being aware of the allocation. Method 400 may allow resource 1 and/or resource 2 to be shared between multiple processors without requiring the processors to manage or even be aware of such sharing.
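An end-to-end run of the method 400 exchange, reusing the preceding sketches, may look as follows; the processor name, resource names, and payload are hypothetical:

```python
# Illustrative walk-through of method 400 using the sketches above.
tables = SharedDeviceTables()
state_mgmt = StateManagementDevice()
mgmt = ManagementEntity(tables)

# The management entity assigns a BID and initially allocates resource 1.
bid = mgmt.register_processor("proc-0",
                              permissions=READ | WRITE | EXECUTE,
                              resources=["resource-1"])

# Resource request 410 analogue: routed to resource 1 per the table.
req = make_request(bid, needed_permission=EXECUTE, payload=b"job-1")
print(handle_resource_request(tables, state_mgmt, req))
# -> {'status': 'queued', 'resource': 'resource-1'}

# Update message 440 analogue: the table now points the BID at resource 2.
mgmt.reallocate("proc-0", ["resource-2"])

# Resource request 411 analogue: an identical request now lands on
# resource 2, without the requesting processor noticing the change.
print(handle_resource_request(tables, state_mgmt, req))
# -> {'status': 'queued', 'resource': 'resource-2'}
```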
It is understood that by programming and/or loading executable instructions onto the NE 500, at least one of the processor 530, downstream ports 520, Tx/Rxs 510, memory 532, and/or upstream ports 550 are changed, transforming the NE 500 in part into a particular machine or apparatus, e.g., a multi-core forwarding architecture, having the novel functionality taught by the present disclosure. It is fundamental to the electrical engineering and software engineering arts that functionality that can be implemented by loading executable software into a computer can be converted to a hardware implementation by well-known design rules. Decisions between implementing a concept in software versus hardware typically hinge on considerations of stability of the design and numbers of units to be produced rather than any issues involved in translating from the software domain to the hardware domain. Generally, a design that is still subject to frequent change may be preferred to be implemented in software, because re-spinning a hardware implementation is more expensive than re-spinning a software design. Generally, a design that is stable that will be produced in large volume may be preferred to be implemented in hardware, for example in an ASIC, because for large production runs the hardware implementation may be less expensive than the software implementation. Often a design may be developed and tested in a software form and later transformed, by well-known design rules, to an equivalent hardware implementation in an application specific integrated circuit that hardwires the instructions of the software. In the same manner as a machine controlled by a new ASIC is a particular machine or apparatus, likewise a computer that has been programmed and/or loaded with executable instructions may be viewed as a particular machine or apparatus.
At least one embodiment is disclosed and variations, combinations, and/or modifications of the embodiment(s) and/or features of the embodiment(s) made by a person having ordinary skill in the art are within the scope of the disclosure. Alternative embodiments that result from combining, integrating, and/or omitting features of the embodiment(s) are also within the scope of the disclosure. Where numerical ranges or limitations are expressly stated, such express ranges or limitations should be understood to include iterative ranges or limitations of like magnitude falling within the expressly stated ranges or limitations (e.g., from about 1 to about 10 includes 2, 3, 4, etc.; greater than 0.10 includes 0.11, 0.12, 0.13, etc.). For example, whenever a numerical range with a lower limit, Rl, and an upper limit, Ru, is disclosed, any number falling within the range is specifically disclosed. In particular, the following numbers within the range are specifically disclosed: R=Rl+k*(Ru−Rl), wherein k is a variable ranging from 1 percent to 100 percent with a 1 percent increment, i.e., k is 1 percent, 2 percent, 3 percent, 4 percent, 5 percent, . . . , 70 percent, 71 percent, 72 percent, . . . , 95 percent, 96 percent, 97 percent, 98 percent, 99 percent, or 100 percent. Moreover, any numerical range defined by two R numbers as defined in the above is also specifically disclosed. The use of the term “about” means ±10% of the subsequent number, unless otherwise stated. Use of the term “optionally” with respect to any element of a claim means that the element is required, or alternatively, the element is not required, both alternatives being within the scope of the claim. Use of broader terms such as comprises, includes, and having should be understood to provide support for narrower terms such as consisting of, consisting essentially of, and comprised substantially of. Accordingly, the scope of protection is not limited by the description set out above but is defined by the claims that follow, that scope including all equivalents of the subject matter of the claims. Each and every claim is incorporated as further disclosure into the specification and the claims are embodiment(s) of the present disclosure. The discussion of a reference in the disclosure is not an admission that it is prior art, especially any reference that has a publication date after the priority date of this application. The disclosures of all patents, patent applications, and publications cited in the disclosure are hereby incorporated by reference, to the extent that they provide exemplary, procedural, or other details supplementary to the disclosure.
While several embodiments have been provided in the present disclosure, it may be understood that the disclosed systems and methods might be embodied in many other specific forms without departing from the spirit or scope of the present disclosure. The present examples are to be considered as illustrative and not restrictive, and the intention is not to be limited to the details given herein. For example, the various elements or components may be combined or integrated in another system or certain features may be omitted, or not implemented.
In addition, techniques, systems, and methods described and illustrated in the various embodiments as discrete or separate may be combined or integrated with other systems, modules, techniques, or methods without departing from the scope of the present disclosure. Other items shown or discussed as coupled or directly coupled or communicating with each other may be indirectly coupled or communicating through some interface, device, or intermediate component whether electrically, mechanically, or otherwise. Other examples of changes, substitutions, and alterations are ascertainable by one skilled in the art and may be made without departing from the spirit and scope disclosed herein.