Modular hardware components can be removably installed into peripheral component slots of a computing device to provide the computing device with additional capabilities and/or resources. These modular hardware components can include, for instance, network interface cards (“NICs”), computer-readable media (e.g., solid state storage devices, random-access memory), universal serial bus (“USB”) cards, graphics cards, compute cards, accelerator cards, etc. Each peripheral component slot includes electrical contact(s) that electrically couple an installed modular hardware component (often referred to as a “card”) to a processor and/or other components of the computing device, e.g., over one or more shared buses provided by a peripheral component bridge. In many cases these peripheral component slots are designed and manufactured according to a standard such as the peripheral component interconnect (“PCI”) or PCI express (“PCIe”) standards, and the peripheral component bridge takes the form of a “host bridge” (for PCI) or “root complex” (for PCIe).
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements.
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to an example thereof. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Additionally, it should be understood that the elements depicted in the accompanying figures may include additional components and that some of the components described in those figures may be removed and/or modified without departing from scopes of the elements disclosed herein. It should also be understood that the elements depicted in the figures may not be drawn to scale and thus, the elements may have different sizes and/or configurations other than as shown in the figures.
Different modular hardware components may have different capabilities and/or configurations. For example, one modular hardware component may include four electrical bus lanes (a.k.a. “x4” in PCIe parlance), another modular hardware component may include eight electrical bus lanes (“x8”), and so on. Also, different modular hardware components may have different transmission speed capabilities. For example, one PCIe component may belong to the PCIe 4.0 generation, and therefore may be capable of up to 16 gigatransfers per second (GT/s) per lane. Another PCIe component may belong to the PCIe 5.0 generation, and therefore may be capable of up to 32 GT/s per lane, and so on.
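As a rough illustration of how lane count and generation combine, the following minimal Python sketch multiplies the per-lane rates quoted above by a link width. The table name `GT_PER_LANE` and the function `raw_link_rate` are hypothetical names introduced only for this illustration, and the result is a raw transfer rate that ignores encoding and protocol overhead.

```python
# Per-lane raw transfer rates in GT/s; both figures are taken from the text above.
GT_PER_LANE = {"4.0": 16, "5.0": 32}


def raw_link_rate(generation: str, lanes: int) -> int:
    """Return the aggregate raw rate in GT/s for a link of the given width."""
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return GT_PER_LANE[generation] * lanes


# An x8 PCIe 4.0 link and an x4 PCIe 5.0 link have the same raw rate.
print(raw_link_rate("4.0", 8), raw_link_rate("5.0", 4))
```

Under these assumptions, halving the lane count while moving up one generation leaves the raw rate unchanged.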
In most computing devices, electrical bus lanes or “routes” between peripheral component slots and other resources of the computing device (e.g., processor) tend to be static. For example, a computing device may include a mixture of x1, x4, x8, and x16 peripheral slots. Additionally, like modular hardware components, the peripheral component slots can also have limited transmission speed capabilities. Consequently, many users may struggle to install modular hardware components in a manner that maximizes bandwidth between one or more of the modular hardware components and other resources of the computing device.
Accordingly, examples are described herein for dynamically allocating a number of shared bus lanes to a plurality of peripheral component slots, and more particularly, to the respective pluralities of peripheral bus lanes of the plurality of peripheral component slots. This dynamic allocation may be performed with various goals in mind, such as maximizing bandwidth utilization between each of the plurality of installed modular components and a peripheral component bridge, or between a selected one of the plurality of installed modular components (e.g., a component heavily utilized by a user) and the peripheral component bridge.
In various examples, the dynamic allocation may be based on a variety of different signals or factors. In some examples, the dynamic allocation may be based on information about peripheral hardware components that are installed in peripheral component slots. This information may include, for instance, a usable range of bus lanes by the modular component or a transmission speed capability of the modular component.
A modular component's usable range is a range between a maximum number of lanes that the modular component can use at full capacity and a minimum number of lanes that the modular component can use and still operate (at least to some threshold standard). For example, a NIC may be designed to operate as an x8 device in optimal conditions and can also operate as low as an x4 device. A transmission speed capability of a modular component may represent a maximum bit transfer rate. Examples of transmission speed capabilities were described previously with regard to different generations of PCIe components.
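One way to picture a usable range is as a pair of bounds from which the permissible link widths follow. The sketch below is illustrative only: the class `UsableRange` and the tuple `PCIE_WIDTHS` are hypothetical, and the set of widths is an assumption limited to the widths mentioned in this description (x1, x4, x8, x16) plus x2.

```python
from dataclasses import dataclass
from typing import List

# Assumed set of link widths considered by this sketch.
PCIE_WIDTHS = (1, 2, 4, 8, 16)


@dataclass(frozen=True)
class UsableRange:
    """Minimum and maximum lane counts a modular component can operate with."""
    min_lanes: int
    max_lanes: int

    def widths(self) -> List[int]:
        """List every assumed link width that falls inside the usable range."""
        return [w for w in PCIE_WIDTHS if self.min_lanes <= w <= self.max_lanes]


# The NIC from the example above: x8 in optimal conditions, as low as x4.
nic = UsableRange(min_lanes=4, max_lanes=8)
print(nic.widths())  # [4, 8]
```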
In some examples, the dynamic allocation may be based on aspects of the peripheral component slot. For example, some PCIe slots may be rated for higher transmission speeds than others, e.g., because they are closer to the PCIe root complex, resulting in less channel loss. Other examples of limited transmission speeds may be related to limitations of a given root complex, the presence of signal conditioners (e.g., redrivers or retimers), or backplane or cable limitations.
In some examples, the dynamic allocation may be based on historical usage data associated with each of the modular components installed in a computing system and/or with usage data of the computing system based on a user's unique workflow. For example, a graphic designer may rely heavily on the full processing capabilities of a graphics PCIe card, and may be less concerned that a PCIe NIC achieve peak performance (maybe there is a network bottleneck elsewhere that limits the NIC's performance regardless). These preferences may be set by the graphic designer manually, or may be detected over time (e.g., the graphic designer tends to utilize the PCIe graphics card far more frequently and heavily than the NIC). In either case, allocation of PCIe lanes may be prioritized to the graphics PCIe card over the NIC.
In some examples, the dynamic allocation may be based on a priori and/or a posteriori knowledge about performance of the modular component. A priori knowledge about the modular component may include, for instance, knowledge about how the modular component is intended to work, e.g., based on knowledge about how the component's manufacturer designed the component to operate under particular circumstances. This a priori knowledge may be obtained from, for instance, technical specifications, reverse engineering, and/or other information associated with the modular component. A posteriori knowledge, by contrast, may be obtained through observation of and/or experience with a modular component, and may include, for instance, observations about the modular component's performance, empirical data gathered about the modular component's performance, and so forth, which may in some instances be informed through performing a specific workload.
For example, a priori and/or a posteriori knowledge may reveal that a particular modular component operates in particular ways depending on how many peripheral component lanes it is allocated. Suppose there is a surplus of shared bus lanes available, and hence, a flexible range of shared bus lanes is available for allocation to the modular component. That modular component may be known to fully utilize the allocated lanes so that it can perform optimally and also have headroom for taking on transient workload influxes. However, given fewer lanes, the modular component may still be able to provide similar performance, albeit with less headroom to take on transient workload influxes. In some cases these transient workload influxes may be uncommon for the component or the system in which it is installed, and so the smaller allocation may represent a reasonable compromise.
In
A plurality of peripheral component slots in the forms of PCIe slots 1121-N (N being a positive integer) are also provided in
In various examples, multiplexor 108 multiplexes the x16 shared bus lanes 106 to multiple different PCIe slots 112. Consequently, and as is evident in
In various examples, various types of circuitry 122 may be provided to dynamically allocate the x16 shared bus lanes 106 across multiple PCIe slots 1121-N. In some examples, circuitry 122 may be implemented using hardware, and may take the form of, for instance, a field-programmable gate array (“FPGA”) or an application-specific integrated circuit (“ASIC”).
In other examples, such as in
In some examples, circuitry 122 may, e.g., by way of an interrogation engine 124, “interrogate” or “train” each of the PCIe slots 1121-N to obtain information about a modular component 130 installed in the PCIe slot 112. As used herein, to “interrogate” a PCIe slot 112 means to interact with it electrically to ascertain information about it. In some examples the interrogation of multiple PCIe slots 112 may be accomplished by configuring multiplexor 108 to iterate through each PCIe slot 112, supplying all (x16) shared bus lanes 106 at once during each iteration. A link width (i.e., the number of PCIe lanes electrically connected) and transmission speed capability (i.e., the PCIe generation supported, for example, PCIe G3 at 8 GT/s, PCIe G4 at 16 GT/s, and so on) can be ascertained. In other words, the BIOS learns what link types can be supplied to the modular component 130 to maximize its bandwidth. The above process may be repeated for each PCIe slot 112.
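The iteration described above can be sketched in Python as a loop that offers every shared lane to one slot at a time and records what the link trains to. This is a simplified model under stated assumptions, not the disclosed circuitry: `interrogate_slots`, the `train_link` callback, and `fake_train_link` (with its table of installed components) are all hypothetical stand-ins for the multiplexor configuration and electrical link training.

```python
from typing import Callable, Dict, Iterable, Tuple


def interrogate_slots(
    slot_ids: Iterable[int],
    train_link: Callable[[int, int], Tuple[int, int]],
    shared_lanes: int = 16,
) -> Dict[int, Dict[str, int]]:
    """Offer all shared lanes to each slot in turn and record the trained link."""
    results = {}
    for slot in slot_ids:
        # Hypothetical callback: routes `shared_lanes` lanes to `slot`, trains
        # the link, and reports (link width, GT/s per lane).
        width, rate = train_link(slot, shared_lanes)
        results[slot] = {"link_width": width, "gt_per_lane": rate}
    return results


def fake_train_link(slot: int, lanes_offered: int) -> Tuple[int, int]:
    """Stand-in for real link training, backed by a made-up component table."""
    installed = {1: (4, 16), 2: (4, 16), 3: (16, 16)}  # slot -> (max width, GT/s)
    width, rate = installed[slot]
    return min(width, lanes_offered), rate


print(interrogate_slots([1, 2, 3], fake_train_link))
```

Passing the training step in as a callback keeps the loop independent of how a real platform would drive the multiplexor.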
The information about the modular component 130 learned through this interrogation may include, for instance, a usable range of bus lanes, a transmission speed capability of the modular component 130, a priori and/or a posteriori knowledge about performance of the modular component 130, and/or historical usage of the modular component 130 within a system, by a particular user, and/or across a population of users. Based on this information, circuitry 122 may, e.g., by way of an allocation engine 126, cause multiplexor 108 to dynamically allocate the number (x16 in
In
In various examples, circuitry 222 may interrogate the PCIe slots 1121-3 to obtain this information about modular components 2301-3. In some examples, when system power is first applied, a power on self-test (“POST”) is initiated. During the POST, circuitry 222 may discover and configure modular components that are removably installed in PCIe slots 1121-3. In some examples, the POST process may be controlled, for instance, by the BIOS.
In some examples, each PCIe slot 112 may be individually interrogated (or “trained”) to ascertain the capabilities of the modular component 230 installed therein. With the information gained through this interrogation, circuitry 222 may allocate the sixteen (x16) shared bus lanes 106. There are sixteen total shared bus lanes 106 available between PCIe root complex 104 and multiplexor 108, and any one of PCIe slots 1121-3 can use as many as sixteen lanes locally. Assuming the graphics card installed in third PCIe slot 1123 has a usable range from sixteen lanes (x16) to, say, eight lanes (x8), in some examples, circuitry 222 may cause the sixteen shared bus lanes 106 to be allocated as follows. Four shared lanes (x4) may be allocated to each of first PCIe slot 1121 and second PCIe slot 1122. This enables the NVMe 2301 and NIC 2302 to operate at full capacity.
Meanwhile, third PCIe slot 1123 is allocated eight lanes (x8), rather than its possible sixteen, because while likely not optimal, the graphics card 2303 may be run using as few as eight lanes (x8). Assuming the user of system 200 places higher priority on reducing local storage and network communication bottlenecks, the resulting degradation in graphics card performance may be acceptable.
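The x4/x4/x8 split described above can be reproduced with a simple priority-ordered greedy pass, sketched below in Python. This is one possible approach under stated assumptions rather than the disclosed method: the function `allocate_by_priority` is hypothetical, and the minimum lane counts used for the NVMe device and the NIC are illustrative values not given in the text.

```python
# Assumed link widths, tried from widest to narrowest.
WIDTHS = (16, 8, 4, 2, 1)


def allocate_by_priority(budget, components):
    """Greedy allocation; `components` is a list of (name, min_lanes, max_lanes)
    ordered from highest to lowest priority."""
    allocation = {}
    remaining = budget
    for i, (name, lo, hi) in enumerate(components):
        # Keep enough lanes in reserve for the minimums of lower-priority parts.
        reserve = sum(c[1] for c in components[i + 1:])
        for width in WIDTHS:
            if lo <= width <= hi and width <= remaining - reserve:
                allocation[name] = width
                remaining -= width
                break
        else:
            raise ValueError(f"not enough shared lanes for {name}")
    return allocation


# Storage and networking are prioritized over graphics, as in the example above.
parts = [("nvme", 1, 4), ("nic", 1, 4), ("gpu", 8, 16)]
print(allocate_by_priority(16, parts))  # {'nvme': 4, 'nic': 4, 'gpu': 8}
```

Reversing the priority order in this sketch would instead hand the graphics card more lanes and leave the storage and network devices near their assumed minimums.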
Not all peripheral component slots (e.g., 1121-3) are the same. In many examples, different peripheral component slots support different numbers of lanes locally, and/or may support different transmission speeds.
A first PCIe slot 3121 is closest physically to PCIe root complex 104. Consequently, first PCIe slot 3121 is able to support the fastest transmission speeds of all three PCIe slots 3121-3. For example, it can be assumed that first PCIe slot 3121 supports up to fifth generation PCIe speeds of 32 GT/s per lane. By contrast, second and third PCIe slots 3122-3 may only support up to fourth generation PCIe speeds of 16 GT/s per lane.
Also, different modular components are installed in PCIe slots 3121-3 than in previous examples. In
In the scenario of
The remaining twelve shared bus lanes 106 can be allocated amongst the two graphics cards 3301, 3 to optimize graphical performance. For example, the PCIe slot 3121 in which first graphics card 3301 is installed is faster (fifth generation) than the PCIe slot 3123 (fourth generation) in which second graphics card 3303 is removably installed. Accordingly, circuitry 222 allocates more lanes to third PCIe slot 3123 to make up for the difference in transmission speed capabilities. In particular, third PCIe slot 3123 is allocated eight lanes (x8), whereas first PCIe slot 3121 is allocated four lanes (x4). This may maximize the total graphics performance given the limited number (x16) of shared bus lanes 106 after four lanes are allocated to PCIe slot 3122. Both graphics cards thus receive an aggregate 128 GT/s of bandwidth (4 lanes at 32 GT/s and 8 lanes at 16 GT/s, respectively).
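The balancing described above can be checked with a small brute-force search, sketched below in Python. It assumes, as one plausible reading of the example, that the goal is to maximize the lower of the two cards' aggregate rates; the function `balance_two_slots` and the set of candidate widths are hypothetical.

```python
from itertools import product

# Assumed candidate link widths.
WIDTHS = (1, 2, 4, 8, 16)


def balance_two_slots(budget, rate_a, rate_b):
    """Pick widths (a, b) within `budget` that maximize the slower link's
    aggregate rate, breaking ties by total rate."""
    best_key, best_pair = None, None
    for wa, wb in product(WIDTHS, repeat=2):
        if wa + wb > budget:
            continue
        key = (min(wa * rate_a, wb * rate_b), wa * rate_a + wb * rate_b)
        if best_key is None or key > best_key:
            best_key, best_pair = key, (wa, wb)
    return best_pair


# Twelve lanes split between a 32 GT/s slot and a 16 GT/s slot.
print(balance_two_slots(12, rate_a=32, rate_b=16))  # (4, 8)
```

With twelve lanes available, x4 on the faster slot and x8 on the slower slot is the only split in this search in which both links reach 128 GT/s.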
As noted previously, in some examples, a priori and/or a posteriori knowledge about installed modular components' performances may be leveraged to determine how to dynamically allocate shared bus lanes among them.
Similar to
In addition, it is known that first graphics card 4301, by virtue of its being a type “A” graphics card, utilizes resources more efficiently than second graphics card 4303 (which is type “B”). This knowledge may be retrieved from, for instance, a lookup table that includes weighted utilization scores for various modular components. In some examples, circuitry 222 may iterate through and calculate a total performance score for each different allocation permutation of graphics cards 4301 and 4303. Circuitry 222 may then configure multiplexor 108 with the permutation that resulted in the greatest performance score. For example, in
In addition to or instead of transmission speed capabilities, usable ranges of lanes, a priori and/or a posteriori knowledge about modular components' performances, in some examples, historical usage of modular components in a particular computer system, by a particular user workload, and/or across a population of user workloads may be considered. In
At block 502, the system may interrogate each of a plurality of peripheral slots to obtain information about a modular component installed in the peripheral component slot. As noted previously, this interrogation may occur in some examples during POST. In various examples, this information about the modular component may include a usable range of bus lanes and/or a transmission speed capability.
At block 504, the system may obtain a priori and/or a posteriori knowledge about the installed modular components' performances. For example, circuitry 122/222 may consult a lookup table that includes weighted utilization scores for at least some of the installed modular components. From these weighted utilization scores, the system may calculate total performance scores for a variety of different permutations and/or combinations of the installed modular components.
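The lookup-and-score step can be sketched in Python as an exhaustive search over candidate lane assignments. Everything numeric here is a made-up assumption for illustration: the `UTILIZATION` weights, the candidate widths, and the scoring formula (weight times lanes times per-lane rate) are not specified in this description, and `best_allocation` is a hypothetical name.

```python
from itertools import product

# Hypothetical lookup table of weighted utilization scores by component type.
UTILIZATION = {"A": 0.9, "B": 0.6}
# Assumed candidate widths for each card.
WIDTHS = (4, 8)


def best_allocation(cards, budget):
    """Score every width assignment that fits `budget`; return the best one.

    `cards` is a list of (name, component_type, gt_per_lane) tuples.
    """
    best_score, best_alloc = -1.0, None
    for widths in product(WIDTHS, repeat=len(cards)):
        if sum(widths) > budget:
            continue
        # Assumed score: utilization weight times raw rate, summed over cards.
        score = sum(UTILIZATION[kind] * w * rate
                    for (_, kind, rate), w in zip(cards, widths))
        if score > best_score:
            best_score = score
            best_alloc = {name: w for (name, _, _), w in zip(cards, widths)}
    return best_alloc, best_score


cards = [("gpu1", "A", 16), ("gpu2", "B", 16)]
print(best_allocation(cards, budget=12)[0])  # {'gpu1': 8, 'gpu2': 4}
```

Under these assumed weights, the more efficient type "A" card receives the wider link because each of its lanes contributes more to the total score.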
At block 506, the system may obtain historical usage information about the installed modular components, the computing system in which they are installed, and/or the user that operates the computing system. For example, resource utilization may be tracked over time for a computing system using an initial allocation of shared bus lanes. Eventually it may become clear that in the system in question, some modular components are relied on more extensively than others. In some such cases (and where permissible), the more relied-upon modular components may be allocated additional shared bus lanes, e.g., at the expense of lesser-relied-upon modular components.
Similar historical usage information may be applied on a user-by-user basis as well. For example, suppose a particular workstation with a particular combination of modular components installed is used by two users, a robot technician and a data scientist. The robot technician may more heavily utilize network bandwidth in order to interact with robot(s), whereas the data scientist may rely upon graphics cards (“GPUs”) to perform artificial intelligence-based computation. Accordingly, when the robot technician is logged in, dynamic allocation may favor a NIC modular component. When the data scientist is logged in, dynamic allocation may favor a graphics card, particularly one that is known (e.g., from a priori and/or a posteriori knowledge obtained at block 504) to be well-suited for artificial intelligence (e.g., “deep learning”) calculations.
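Per-user historical usage could be tracked with something as simple as a running tally of how long each component was busy under each login. The Python sketch below is a hypothetical illustration: the class `UsageHistory`, its methods, and the use of busy seconds as the usage metric are assumptions, not details from this description.

```python
from collections import defaultdict


class UsageHistory:
    """Accumulates per-user, per-component busy time (an assumed metric)."""

    def __init__(self):
        self._busy = defaultdict(lambda: defaultdict(float))

    def record(self, user, component, busy_seconds):
        """Add one observation of how long `component` was busy for `user`."""
        self._busy[user][component] += busy_seconds

    def favored(self, user):
        """Return the component this user has relied on most, or None."""
        per_component = self._busy.get(user)
        if not per_component:
            return None
        return max(per_component, key=per_component.get)


history = UsageHistory()
history.record("robot_technician", "nic", 5400.0)
history.record("robot_technician", "gpu", 300.0)
history.record("data_scientist", "gpu", 7200.0)
history.record("data_scientist", "nic", 600.0)

print(history.favored("robot_technician"), history.favored("data_scientist"))
```

The favored component could then be placed first in whatever priority ordering the allocation step uses when that user logs in.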
At block 508, the system may obtain user preference information. User preference information may include explicitly-defined user preferences for how shared bus lanes should be allocated amongst a plurality of peripheral component slots. In some examples, user preference information may be the strongest signal as to how shared bus lanes should be dynamically allocated.
At block 510, the system may cause multiplexor 108 to dynamically allocate a number of shared bus lanes 106 to the respective pluralities 110 of peripheral bus lanes of the peripheral component slots. This dynamic allocation may be based on, for instance, the usable ranges and transmission speed capabilities of the installed modular components obtained at block 502, the a priori and/or a posteriori knowledge about the modular components obtained at block 504, the historical usage information obtained at block 506, and/or the user preference information obtained at block 508. Alternatively, in some examples, the system may generate audible and/or visual output that is provided to a user to recommend particular allocations. The user can then accept or reject these recommendations.
In some examples, shared bus lanes 106 may be dynamically allocated using other parameters as well. In some examples, shared bus lanes may be allocated on a first-come-first-served basis. Circuitry 122/222 may interrogate each slot as described previously. The first interrogated slot may end up with a certain number of lanes allocated (e.g., via multiplexor 108), and then circuitry 122/222 may move on to interrogating the next slot with the remaining budget of shared bus lanes. This may continue until all slots have been interrogated, or until the shared bus lanes have been allocated.
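A first-come-first-served pass might look like the following Python sketch, in which each slot takes the widest link it can use from whatever budget remains. The function `first_come_first_served`, the candidate widths, and the sample slot names are hypothetical, and the real interrogation step is replaced here by a dictionary of already-known maximum widths.

```python
# Assumed candidate link widths.
WIDTHS = (1, 2, 4, 8, 16)


def first_come_first_served(budget, max_lanes_by_slot):
    """Grant lanes slot by slot, in interrogation order, until none remain.

    `max_lanes_by_slot` maps slot name -> widest link its component can use;
    dictionary order stands in for the order in which slots are interrogated.
    """
    allocation = {}
    remaining = budget
    for slot, wanted in max_lanes_by_slot.items():
        if remaining == 0:
            break  # shared lanes exhausted; later slots receive nothing
        limit = min(wanted, remaining)
        granted = max((w for w in WIDTHS if w <= limit), default=0)
        allocation[slot] = granted
        remaining -= granted
    return allocation


slots = {"slot1": 8, "slot2": 16, "slot3": 4}
print(first_come_first_served(16, slots))  # {'slot1': 8, 'slot2': 8}
```

In this run the third slot receives no lanes, which illustrates the main drawback of this policy compared with allocations that weigh every component before committing.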
In some examples, circuitry 122/222 can identify when modular components are not installed in slots, and can optimize shared bus lane utilization accordingly. For instance, there may be both slot(s) that share PCIe resources from multiplexor 108 and slots that have statically assigned resources. When a modular component is installed in a slot that has a static resource allocation, but there is a slot with shared resources that could provide better performance, circuitry 122/222 may alert a user of a potential configuration change.
Instruction(s) 602 causes processor 102 to interrogate each PCIe slot of a plurality of PCIe slots for information about a PCIe component removably installed in the PCIe slot. In various examples, the information about the PCIe component includes a usable range of bus lanes and a transmission speed capability.
Instruction(s) 604 causes processor 102 to dynamically allocate a number of shared PCIe lanes of a PCIe root complex across the plurality of PCIe slots based on transmission speed capabilities or historical usage of the installed modular components. In various examples, the number of shared PCIe lanes are dynamically allocated by rerouting a multiplexor.
In various examples, a system such as a PCA or computing device may include: a peripheral component bridge that provides a number of shared bus lanes; a multiplexor operably coupled with the bridge via the shared bus lanes; a plurality of peripheral component slots, each operably coupled with the multiplexor via a respective plurality of peripheral bus lanes, wherein the multiplexor multiplexes the shared bus lanes to multiple different peripheral component slots; and circuitry to: interrogate each of the peripheral slots to obtain information about a modular component installed in the peripheral component slot, wherein the information about the modular component includes a usable range of bus lanes and a transmission speed capability; and cause the multiplexor to dynamically allocate the number of shared bus lanes to the respective pluralities of peripheral bus lanes of the peripheral component slots based on the usable ranges and transmission speed capabilities of the installed modular components.
In some examples, the peripheral component bridge, multiplexor, and at least some of the peripheral component slots are disposed on a printed circuit assembly. In some examples, the peripheral component bridge comprises a PCIe root complex, and the plurality of peripheral component slots comprise PCIe slots. In some examples, the circuitry is to cause the multiplexor to dynamically allocate the number of shared bus lanes further based on respective transmission speed capabilities of the plurality of peripheral component slots.
In some examples, the circuitry is to cause the multiplexor to dynamically allocate the number of shared bus lanes further based on historical usage data associated with each of the modular components. In some examples, the circuitry is to cause the multiplexor to dynamically allocate the number of shared bus lanes further based on respective roles of each of the modular components.
In some examples, a goal of the dynamic allocation is to maximize bandwidth utilization between each of the plurality of installed modular components and the peripheral component bridge. In some examples, a goal of the dynamic allocation is to maximize bandwidth utilization between a selected one of the plurality of installed modular components and the peripheral component bridge. In some examples, to interrogate a given peripheral slot for information, the circuitry is to temporarily allocate all of the shared bus lanes to the given peripheral slot and apply power to the temporarily-allocated shared lanes to count a number of electrical connections made with the modular component installed in the given peripheral slot.
Additionally, in some examples, an apparatus or system may include: a number of communal peripheral component interface express (“PCIe”) lanes; a plurality of PCIe slots that each includes a plurality of local PCIe lanes, wherein a total number of local PCIe lanes across the plurality of PCIe slots exceeds the number of communal PCIe lanes; and routing circuitry to dynamically allocate the communal PCIe lanes amongst the plurality of PCIe slots based on historical usage data associated with hardware components removably inserted into the plurality of PCIe slots. In some examples, the routing circuitry is to dynamically allocate the communal PCIe lanes amongst the plurality of PCIe slots further based on respective transmission speed capabilities of the hardware components removably inserted into the plurality of PCIe slots.
In some examples, the routing circuitry is to dynamically allocate the communal PCIe lanes amongst the plurality of PCIe slots further based on respective ranges of local PCIe lanes usable by the hardware components removably inserted into the plurality of PCIe slots. In some examples, the routing circuitry includes a processor and a multiplexor.
In another aspect, a computer-readable (transitory or non-transitory) medium may include instructions that, in response to execution of the instructions by a processor, cause the processor to: interrogate each peripheral component interface express (“PCIe”) slot of a plurality of PCIe slots for information about a PCIe component removably installed in the PCIe slot, wherein the information about the PCIe component includes a usable range of bus lanes and a transmission speed capability; and dynamically allocate a number of shared PCIe lanes of a PCIe root complex across the plurality of PCIe slots based on transmission speed capabilities or historical usage of the installed modular components. In some such examples, the number of shared PCIe lanes are dynamically allocated by rerouting a multiplexor.
Although described specifically throughout the entirety of the instant disclosure, representative examples of the present disclosure have utility over a wide range of applications, and the above discussion is not intended and should not be construed to be limiting, but is offered as an illustrative discussion of aspects of the disclosure.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/US2020/018569 | 2/18/2020 | WO | |