Embodiments of the present disclosure relate to the field of clock control, and in particular, to a clock architecture and a processing assembly.
To increase the computing speed of a system, high-speed computing assemblies have emerged. Each computing module in such an assembly can compute independently and execute its own tasks, which shortens the completion time of a computing task. However, communication between different modules in the high-speed computing assembly imposes a clock synchronization requirement: when the phase deviation between the communication clocks is too large, a correctable error and/or an uncorrectable error may occur during communication.
Consequently, the requirements on communication clock settings in the high-speed computing assembly are relatively strict: once the clock topology is fixed, it can no longer be expanded. The topology and computing power of the computing modules of the high-speed computing assembly are limited as well, so the clock cannot be flexibly adjusted in the high-speed computing assembly, and the computing power of the entire computing assembly remains below what is desired.
No effective solution to the above technical problems in the related art has yet been proposed.
In view of this, an object of embodiments of the present disclosure is to provide a clock architecture and a processing assembly which are more flexible and may provide higher computing power support. The solution is as follows:
Optionally, the external clock signal of the clock module in the highest clock module layer is provided by a host server.
Optionally, an output terminal of each clock buffer circuit is connected to a next-stage module one by one, and the next-stage module includes a non-clock module and/or the clock module at a next clock module layer.
Optionally, when the next-stage module is the clock module at the next clock module layer, the output terminal of the corresponding clock buffer circuit is connected to the second input terminal of the clock module at the next clock module layer.
Optionally, each clock module further includes:
Optionally, the clock architecture further includes a hub;
Optionally, the non-clock module includes a computing module and/or a communication module and/or a storage module, and each computing module is respectively connected to one output terminal of the clock buffer circuit.
Optionally, the computing module includes a Field Programmable Gate Array (FPGA) circuit, and/or a Complex Programmable Logic Device (CPLD) circuit, and/or a Graphics Processing Unit (GPU) circuit;
Optionally, the communication module includes: a communication unit and/or a communication card slot, and a clock terminal of the communication module is independently connected to one output terminal of the clock buffer circuit.
Optionally, when the next-stage module is the clock module at the next clock module layer, the output terminal of the corresponding clock buffer circuit is connected to the second input terminal of the clock module at the next clock module layer via one communication card slot.
Optionally, a maximum allowable number of layers of clock module layers in the clock architecture is determined by the maximum clock jitter limit.
Optionally, the process that the maximum allowable number of layers of clock module layers is determined by the maximum clock jitter limit, includes:
Optionally, the process that the maximum allowable number of layers in the clock architecture is determined according to the jitter value and the maximum clock jitter limit, includes:
Optionally, the process that the jitter value of the clock link is calculated according to the jitter value of each element of the current clock architecture, includes:
Optionally, a General Purpose Input/Output (GPIO) terminal of the BMC circuit is connected to the enable terminal of the selection switch circuit, and the GPIO terminal is configured to send the enable signal to the enable terminal.
Optionally, the process of enabling all the output terminals to output the local clock signal, or enabling all the output terminals to output the external clock signal, according to the enable signal includes:
Optionally, the storage circuit includes a memory bank and a storage hard disk.
Optionally, the maximum clock jitter limit is determined according to a communication protocol used.
Correspondingly, some embodiments of the present disclosure further disclose a processing assembly, including:
Optionally, the processing assembly is a high-speed computing module, and clocks of all units in the high-speed computing module are correspondingly provided by the clock architecture.
Embodiments of the present disclosure disclose a clock architecture; the selection switch circuit in each clock module may select the local clock signal or the external clock signal as an output clock, such that regulation and control of clock in a processing assembly using the clock architecture, such as a high-speed computing assembly, is more flexible; and the characteristics of the clock architecture being scalable and the clock being selectable provide a reliable basis for improving the accurate operation of the processing assembly.
To describe the technical solutions in the embodiments of the present disclosure or in the related art more clearly, the accompanying drawings needed for the embodiments or the related art are briefly introduced below. The accompanying drawings in the following description relate merely to embodiments of the present disclosure, and a person of ordinary skill in the art may obtain other accompanying drawings from the provided ones without any inventive effort.
Hereinafter, the technical solutions in embodiments of the present disclosure will be described clearly and completely with reference to the accompanying drawings of the embodiments of the present disclosure. Obviously, the embodiments as described are only some of the embodiments of the present disclosure, and are not all of the embodiments of the present disclosure. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present disclosure without involving any inventive effort shall all fall within the scope of protection of the embodiments of the present disclosure.
The requirements on communication clock settings in a high-speed computing assembly are relatively strict: once the clock topology is fixed, it can no longer be expanded. The topology and computing power of the computing modules of the high-speed computing assembly are limited as well, so the clock cannot be flexibly adjusted in the high-speed computing assembly, and the computing power of the entire computing assembly remains below what is desired.
Embodiments of the present disclosure disclose a clock architecture; a selection switch circuit in each clock module may select a local clock signal or an external clock signal as an output clock, such that regulation and control of clock in a processing assembly using the clock architecture, such as a high-speed computing assembly, is more flexible; and the characteristics of the clock architecture being scalable and the clock being selectable provide a reliable basis for improving the accurate operation of the processing assembly.
Embodiments of the present disclosure disclose a clock architecture. The clock architecture includes one or more clock module layers; wherein each clock module layer includes one or more clock modules M. Refer to
It may be understood that the external clock signal clk_h of the clock module M in the highest clock module layer is provided by a host server.
It may be understood that an output terminal of each clock buffer circuit clk buffer is connected to a next-stage module one by one, and the next-stage module includes a non-clock module and/or a clock module M at a next clock module layer. Optionally, when the next-stage module is the clock module M at the next clock module layer, the output terminal of the corresponding clock buffer circuit clk buffer is connected to the second input terminal of the clock module M at the next clock module layer.
Optionally, the clock module M at each layer further includes: a Baseboard Management Controller (BMC) circuit, configured to be connected to the enable terminal of the selection switch circuit MUX and generate the enable signal. It may be understood that usually, a GPIO terminal of the BMC circuit is connected to the enable terminal SEL pin of the MUX, and sends the enable signal to the enable terminal SEL pin.
It may be understood that the two input terminals of the selection switch circuit MUX receive two different clocks: the local clock signal clk_m and the external clock signal clk_h; according to the characteristics of the selection switch circuit MUX, all the output terminals of the selection switch circuit MUX output the same output clock; and according to a relationship between a level of the enable signal and configuration, all the output terminals of the selection switch circuit MUX may simultaneously output the local clock signal clk_m, or all the output terminals of the selection switch circuit MUX may simultaneously output the external clock signal clk_h. By selecting the output of the selection switch circuit MUX in the current clock module M, a corresponding clock is provided for the next-stage module in the current clock module M, so as to ensure that the next-stage module operates according to the clock.
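The selection behaviour described above can be sketched as a small model. This is a minimal illustration, not the circuit itself: the class name `ClockMux`, its fields, and the choice that a high enable level selects the external clock are assumptions made for the example, since the text only states that the mapping depends on the level of the enable signal and the configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ClockMux:
    """Toy model of the selection switch circuit MUX in one clock module."""

    num_outputs: int
    # ASSUMPTION: by default a high enable level selects the external clock.
    # The polarity is configurable because the source leaves it open.
    high_selects_external: bool = True

    def outputs(self, clk_m: str, clk_h: str, enable: bool) -> List[str]:
        """Return the clock present on every output terminal."""
        use_external = enable == self.high_selects_external
        selected = clk_h if use_external else clk_m
        # All output terminals carry the same selected clock.
        return [selected] * self.num_outputs


mux = ClockMux(num_outputs=4)
print(mux.outputs("clk_m", "clk_h", enable=True))
print(mux.outputs("clk_m", "clk_h", enable=False))
```

In this model, flipping `high_selects_external` stands in for the "configuration" part of the relationship between the enable level and the selected clock.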
It may be understood that the non-clock module includes a computing module and/or a communication module and/or a storage module, and each computing module is respectively connected to one output terminal of the clock buffer circuit clk buffer.
It may be understood that the detailed configuration of the non-clock module may be adjusted according to the actual type of the processing assembly to which the clock architecture is applied. Hereinafter, a detailed description is given taking a high-speed computing assembly as an example of the processing assembly:
In some optional embodiments, the computing module includes a Field-Programmable Gate Array (FPGA) circuit, and/or a Complex Programmable Logic Device (CPLD) circuit, and/or a Graphics Processing Unit (GPU) circuit; the computing module further includes a storage circuit, the storage circuit being connected to the FPGA circuit or the CPLD circuit or the GPU circuit. It may be understood that generally, the storage circuit and the FPGA circuit may form one computing unit, i.e. a Computing Module, and a plurality of computing units may form one high-speed computing assembly; clocks of all the units in the high-speed computing assembly are correspondingly provided by the clock architecture in the present embodiment. As the clock supply of the clock architecture in the present embodiment is flexible and the architecture is scalable, clock support may be provided for computing modules with higher computing power. The actual type of the computing module depends on the internal structure of the high-speed computing module to be served by the clock architecture.
Optionally, the storage circuit includes a memory bank and a storage hard disk, wherein the memory bank may be Dual Inline Memory Modules (DIMMs), and the storage hard disk may be a Solid State Disk (SSD) or another form of storage hard disk. Similarly, the actual type of storage circuit depends on the internal structure of the high-speed computing module to be served by the clock architecture.
Optionally, the communication module includes: a communication unit and/or a communication card slot, and a clock terminal of the communication module is independently connected to one output terminal of the clock buffer circuit clk buffer. It may be understood that the communication unit and the communication card slot may be determined according to a communication protocol; the PCIe (Peripheral Component Interconnect Express) protocol, a high-speed serial computer expansion bus standard, is usually selected. Correspondingly, the communication unit includes but is not limited to a PCIe switch, and the communication card slot includes a PCIe slot.
Taking the single-layer clock module M shown in
Optionally, in
Similarly, in
Similarly, in
Similarly, in
It may be understood that when the next-stage module of a clock module M is a non-clock module, its actual form may be determined according to the internal structure of the high-speed computing module to be served by the clock architecture; and when the next-stage module of the clock module M is a clock module M at a next clock module layer, adjacent clock modules M are connected in series. Optionally, each clock module M has an independent local clock signal clk_m, generated by an internal local clock generator clk gen, and an external clock signal clk_h. The external clock signal clk_h of the clock module M in the highest clock module layer is provided by the host server, and the external clock signals clk_h of the clock modules M of the other clock module layers are provided by the clock modules M of the previous layers: one output terminal of the selection switch circuit MUX in the clock module M of the previous layer is connected to an input terminal of one clock buffer circuit clk buffer, and an output terminal of that clock buffer circuit clk buffer is connected to the second input terminal of the clock module M of the next clock module layer and sends the external clock signal clk_h to it.
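The series connection of clock modules can be illustrated by tracing which generator ultimately drives the output of a given module. The sketch below is only a model under stated assumptions: the class `ClockModule`, its field `select_external`, and the labels `host_server` and `<name>.clk_gen` are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClockModule:
    """Toy model of one clock module M in a layered clock architecture."""

    name: str
    select_external: bool  # hypothetical flag: True if the MUX passes clk_h
    parent: Optional["ClockModule"] = None  # module of the previous layer, if any

    def output_source(self) -> str:
        """Name the generator whose clock this module distributes."""
        if not self.select_external:
            return f"{self.name}.clk_gen"  # local clock signal clk_m
        if self.parent is None:
            return "host_server"  # clk_h of the highest clock module layer
        # clk_h of a lower layer is whatever the previous layer outputs.
        return self.parent.output_source()


m1 = ClockModule("M1", select_external=True)
m2 = ClockModule("M2", select_external=True, parent=m1)
m3 = ClockModule("M3", select_external=False, parent=m2)
print(m2.output_source())
print(m3.output_source())
```

Here M2 forwards the host server clock received through M1, whereas M3 distributes the clock of its own local clock generator.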
It may be understood that when the next-stage module is the clock module M at the next clock module layer, the output terminal of the corresponding clock buffer circuit clk buffer is connected to the second input terminal of the clock module M at the next clock module layer via one communication card slot.
As shown in
It may be understood that in the PCIe standard, one PCIe lane includes two differential pairs, one for sending and one for receiving, and the total data bandwidth of a PCIe connection may be extended by adding lanes. This flexibility makes PCIe ubiquitous in applications such as servers, network attached storage, network switches, routers, and TV set-top boxes. The strict timing budgets of these applications and the challenges of system design impose stringent performance requirements on PCIe clocks. Generally, PCIe specifies a 100 MHz external reference clock, i.e. Refclk, which has an accuracy within ±300 ppm and is used to coordinate data transmission between two PCIe devices. The PCIe standard supports three clock distribution schemes: a common clock architecture, a data clocked architecture, and a separate clock architecture. All three schemes require a clock accuracy of ±300 ppm.
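As a worked example of the accuracy requirement, a tolerance of 300 ppm in either direction on a 100 MHz reference corresponds to an offset of 30 kHz. The helper below is a minimal sketch; the function name `refclk_window_hz` is an assumption made for the example.

```python
from typing import Tuple


def refclk_window_hz(nominal_hz: float, tolerance_ppm: float) -> Tuple[float, float]:
    """Return the lowest and highest frequency allowed by a ppm tolerance."""
    offset_hz = nominal_hz * tolerance_ppm / 1_000_000
    return nominal_hz - offset_hz, nominal_hz + offset_hz


low_hz, high_hz = refclk_window_hz(100e6, 300)
print(low_hz, high_hz)  # 99970000.0 100030000.0
```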
Optionally, a common clock architecture (Common Clock) is as shown in
Optionally, a separate clock architecture (Separate Reference Clock) is as shown in
It may be understood that a PCIe connection is configured to transfer large amounts of data from a transmitter to a receiver while ensuring a high success rate of data transmission. To achieve this, the receiver must sample the data sent by the transmitter at the center of each bit, away from the adjacent bits; a clock/data recovery (CDR) block in the receiver generates a sampling clock, and the data is periodically sampled into a latch. In this process, various phase jitter sources cause the sampling instant to fluctuate. As the sampling position deviates from the ideal position, the bit error rate increases, causing correctable or uncorrectable errors when PCIe is in operation.
Correspondingly, in this embodiment, the clocks in the clock architecture are selectable: the clock architecture may supply clocks to the high-speed computing assembly in a common clock architecture or in a separate clock architecture. The clock architecture supports automatic switching between the two clock architectures, and also supports spread spectrum clocking (SSC) and clock jitter budget control.
Optionally, the maximum allowable number of clock module layers in the clock architecture is determined by the maximum clock jitter limit. Generally, the maximum clock jitter limit is determined according to the communication protocol used; the PCI-SIG specifications define different clock jitter limits for different PCIe protocol versions, as shown in Table 1 below:
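The link between jitter and sampling errors can be made concrete with a simplified model. The sketch below assumes, purely for illustration, that the timing error of the sampling instant is zero-mean Gaussian with a given RMS value and that a sample is at risk once the error exceeds a fixed timing margin; neither assumption comes from the text, and the function name is hypothetical.

```python
import math


def prob_sample_outside_margin(margin_ps: float, rms_jitter_ps: float) -> float:
    """Two-sided Gaussian tail: probability that |timing error| exceeds margin_ps."""
    if rms_jitter_ps <= 0:
        raise ValueError("rms_jitter_ps must be positive")
    return math.erfc(margin_ps / (rms_jitter_ps * math.sqrt(2)))


# Hypothetical figures: the same 10 ps margin with 1 ps and 2 ps RMS jitter.
print(prob_sample_outside_margin(10.0, 1.0))
print(prob_sample_outside_margin(10.0, 2.0))
```

Under this model, doubling the RMS jitter for a fixed margin raises the probability of a badly placed sample by many orders of magnitude, which is why the jitter of the clock link has to be budgeted.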
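As an illustration of the two modes, a link can be classified by whether both of its ends run from the same reference clock. The sketch below rests on an assumption that is a reading of the text rather than something it states: a clock module that selects the external clock clk_h shares its reference with the host side (common clock), while a module that selects its local clock clk_m runs from a separate reference. The function name and the source labels are hypothetical.

```python
def link_clocking(upstream_source: str, downstream_source: str) -> str:
    """Classify a link as 'common' or 'separate' from its two clock sources."""
    return "common" if upstream_source == downstream_source else "separate"


# Hypothetical source labels for a host and one clock module M1.
print(link_clocking("host_server", "host_server"))
print(link_clocking("host_server", "M1.clk_gen"))
```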
Optionally, in the clock architecture, the calculation of the clock jitter uses element jitter as a calculation parameter, and the jitter value of the clock link with the longest communication path serves as the clock jitter value of the current clock architecture. Optionally, the process that the maximum allowable number of layers of clock module layers is determined by the maximum clock jitter limit is as shown in
In some optional embodiments, the process that the maximum allowable number of layers in the clock architecture is determined according to the jitter value and the maximum clock jitter limit, includes:
In some optional embodiments, the process that the jitter value of the clock link is calculated according to the jitter value of each element of the current clock architecture, includes:
Optionally, taking
Taking
Optionally, the selected model in
Optionally, for applying the selected model in
It may be understood that the maximum allowable number of layers of the clock architecture herein does not represent the number of all clock modules M in the clock architecture, but refers to the number of clock module layers in the clock architecture and corresponds to the number of clock modules M in the longest communication link; for example, M2 and M2-1 in
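The distinction between the number of layers and the number of modules, and the way a jitter limit bounds the number of layers, can be sketched as follows. Several points are assumptions made for the example: the topology, the helper names, the jitter figures, and in particular the root-sum-square combination of element jitters, which is a common rule for uncorrelated RMS jitter and is not necessarily the calculation used in this disclosure.

```python
import math
from typing import Dict, List


def layer_count(children: Dict[str, List[str]], root: str) -> int:
    """Count clock module layers: modules on the longest chain starting at root."""
    deepest_child = max(
        (layer_count(children, child) for child in children.get(root, [])),
        default=0,
    )
    return 1 + deepest_child


def link_jitter_ps(element_jitters_ps: List[float]) -> float:
    """Combine element jitters. ASSUMPTION: uncorrelated RMS values, root-sum-square."""
    return math.sqrt(sum(j * j for j in element_jitters_ps))


def max_layers(source_ps: float, per_layer_ps: List[float], limit_ps: float) -> int:
    """Largest number of layers whose longest link stays within limit_ps."""
    if not any(j > 0 for j in per_layer_ps):
        raise ValueError("each layer must add some jitter")
    layers = 0
    # Try one more layer at a time until the combined jitter exceeds the limit.
    while link_jitter_ps([source_ps] + per_layer_ps * (layers + 1)) <= limit_ps:
        layers += 1
    return layers


# Hypothetical topology: M1 feeds M2 and M3, and M2 feeds M2-1.
topology = {"M1": ["M2", "M3"], "M2": ["M2-1"]}
print(layer_count(topology, "M1"))  # 3 layers, although there are 4 modules

# Hypothetical figures in picoseconds: source clock 0.2, MUX 0.1 and buffer 0.1
# per layer, protocol limit 0.5.
print(max_layers(0.2, [0.1, 0.1], 0.5))
```

Only the longest chain matters for the budget in this sketch, so adding further modules in parallel at an existing layer does not change the result.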
In some optional embodiments, the BMC circuits may also communicate with the host server; refer to
Embodiments of the present disclosure disclose a clock architecture; the selection switch circuit in each clock module may select the local clock signal or the external clock signal as an output clock, such that regulation and control of clock in a processing assembly using the clock architecture, such as a high-speed computing assembly, is more flexible; and the characteristics of the clock architecture being scalable and the clock being selectable provide a reliable basis for improving the accurate operation of the processing assembly.
Correspondingly, embodiments of the present disclosure further disclose a processing assembly, including:
Optionally, the clock architecture in the processing assembly includes one or more clock module layers; wherein each clock module layer includes one or more clock modules M. Refer to
It may be understood that the external clock signal clk_h of the clock module M in the highest clock module layer is provided by a host server.
It may be understood that an output terminal of each clock buffer circuit clk buffer is connected to a next-stage module one by one, and the next-stage module includes a non-clock module and/or a clock module M at a next clock module layer. Optionally, when the next-stage module is the clock module M at the next clock module layer, the output terminal of the corresponding clock buffer circuit clk buffer is connected to the second input terminal of the clock module M at the next clock module layer.
Optionally, the clock module M at each layer further includes: a BMC circuit, configured to be connected to the enable terminal of the selection switch circuit MUX and generate the enable signal. It may be understood that usually, a General Purpose Input/Output (GPIO) terminal of the BMC circuit is connected to the enable terminal SEL pin of the MUX, and sends the enable signal to the enable terminal SEL pin.
It may be understood that the two input terminals of the selection switch circuit MUX receive two different clocks: the local clock signal clk_m and the external clock signal clk_h; according to the characteristics of the selection switch circuit MUX, all the output terminals of the selection switch circuit MUX output the same output clock; and according to a relationship between a level of the enable signal and configuration, all the output terminals of the selection switch circuit MUX may simultaneously output the local clock signal clk_m, or all the output terminals of the selection switch circuit MUX may simultaneously output the external clock signal clk_h. By selecting the output of the selection switch circuit MUX in the current clock module M, a corresponding clock is provided for the next-stage module in the current clock module M, so as to ensure that the next-stage module operates according to the clock.
It may be understood that the non-clock module includes a computing module and/or a communication module and/or a storage module, and each computing module is respectively connected to one output terminal of the clock buffer circuit clk buffer.
It may be understood that the configuration of the non-clock module may be adjusted according to the type of the processing assembly to which the clock architecture is applied. Hereinafter, a description is given taking a high-speed computing assembly as an example of the processing assembly:
In some optional embodiments, the computing module includes an FPGA circuit, and/or a CPLD circuit, and/or a GPU circuit; the computing module further includes a storage circuit, the storage circuit being connected to the FPGA circuit or the CPLD circuit or the GPU circuit. It may be understood that generally, the storage circuit and the FPGA circuit may form one computing unit, i.e. a Computing Module, and a plurality of computing units may form one high-speed computing assembly; clocks of all the units in the high-speed computing assembly are correspondingly provided by the clock architecture in the present embodiment. As the clock supply of the clock architecture in the present embodiment is flexible and the architecture is scalable, clock support may be provided for computing modules with higher computing power. The type of the computing module depends on the internal structure of the high-speed computing module to be served by the clock architecture.
Optionally, the storage circuit includes a memory bank and a storage hard disk, wherein the memory bank may be selected as Dual Inline Memory Modules (DIMMs), and the storage hard disk may be selected from an SSD or other forms of storage hard disks. Similarly, the type of storage circuit depends on the internal structure of the high-speed computing module to be served by the clock architecture.
Optionally, the communication module includes: a communication unit and/or a communication card slot, and a clock terminal of the communication module is independently connected to one output terminal of the clock buffer circuit clk buffer. It may be understood that the communication unit and the communication card slot may be determined according to a communication protocol; a PCIe protocol is usually selected. Correspondingly, the communication unit includes but is not limited to a PCIe switch and the communication card slot includes a PCIe slot.
Taking the single-layer clock module M shown in
Optionally, in
Similarly, in
Similarly, in
Similarly, in
It may be understood that when the next-stage module of a clock module M is a non-clock module, its form may be determined according to the internal structure of the high-speed computing module to be served by the clock architecture; and when the next-stage module of the clock module M is a clock module M at a next clock module layer, adjacent clock modules M are connected in series. Optionally, each clock module M has an independent local clock signal clk_m, generated by an internal local clock generator clk gen, and an external clock signal clk_h. The external clock signal clk_h of the clock module M in the highest clock module layer is provided by the host server, and the external clock signals clk_h of the clock modules M of the other clock module layers are provided by the clock modules M of the previous layers: one output terminal of the selection switch circuit MUX in the clock module M of the previous layer is connected to an input terminal of one clock buffer circuit clk buffer, and an output terminal of that clock buffer circuit clk buffer is connected to the second input terminal of the clock module M of the next clock module layer and sends the external clock signal clk_h to it.
It may be understood that when the next-stage module is the clock module M at the next clock module layer, the output terminal of the corresponding clock buffer circuit clk buffer is connected to the second input terminal of the clock module M at the next clock module layer via one communication card slot.
As shown in
It may be understood that a PCIe connection is configured to transfer large amounts of data from a transmitter to a receiver while ensuring a high success rate of data transmission. To achieve this, the receiver must sample the data sent by the transmitter at the center of each bit, away from the adjacent bits; a clock/data recovery (CDR) block in the receiver generates a sampling clock, and the data is periodically sampled into a latch. In this process, various phase jitter sources cause the sampling instant to fluctuate. As the sampling position deviates from the ideal position, the bit error rate increases, causing correctable or uncorrectable errors when PCIe is in operation.
Correspondingly, in this embodiment, the clocks in the clock architecture are selectable: the clock architecture may supply clocks to the high-speed computing assembly in a common clock architecture or in a separate clock architecture. The clock architecture supports automatic switching between the two clock architectures, and also supports spread spectrum clocking (SSC) and clock jitter budget control.
Optionally, the maximum allowable number of clock module layers in the clock architecture is determined by the maximum clock jitter limit. Generally, the maximum clock jitter limit is determined according to the communication protocol used; the PCI-SIG specifications define different clock jitter limits for different PCIe protocol versions, as shown in Table 1.
Optionally, in the clock architecture, the calculation of the clock jitter uses element jitter as a calculation parameter, and the jitter value of the clock link with the longest communication path serves as the clock jitter value of the current clock architecture. Optionally, the process that the maximum allowable number of layers of clock module layers is determined by the maximum clock jitter limit is as shown in
In some optional embodiments, the process that the maximum allowable number of layers in the clock architecture is determined according to the jitter value and the maximum clock jitter limit, includes:
In some optional embodiments, the process that the jitter value of the clock link is calculated according to the jitter value of each element of the current clock architecture, includes:
Optionally, taking
Taking
Optionally, the selected model in
Optionally, for applying the selected model in
It may be understood that the maximum allowable number of layers of the clock architecture herein does not represent the number of all clock modules M in the clock architecture, but refers to the number of clock module layers in the clock architecture and corresponds to the number of clock modules M in the longest communication link; for example, M2 and M2-1 in
In some optional embodiments, the BMC circuits may also communicate with the host server; refer to
In the clock architecture of embodiments of the present disclosure, the selection switch circuit in each clock module may select the local clock signal or the external clock signal as an output clock, such that regulation and control of clock in a processing assembly using the clock architecture, such as a high-speed computing assembly, is more flexible; and the characteristics of the clock architecture being scalable and the clock being selectable provide a reliable basis for improving the accurate operation of the processing assembly.
Finally, it should also be noted that in the present text, relational terms such as first and second, etc. are only used to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any actual relationship or sequence between these entities or operations. Furthermore, the terms “include”, “including”, or any other variations thereof are intended to cover a non-exclusive inclusion, so that a process, a method, an article, or a device that includes a series of elements not only includes those elements, but also includes other elements that are not explicitly listed, or further includes inherent elements of the process, the method, the article, or the device. Without further limitation, an element defined by a sentence “including a . . . ” does not exclude other same elements existing in the process, the method, the article, or the device that includes the element.
Hereinabove, the clock architecture and the processing assembly provided in the embodiments of the present disclosure have been described in detail. The principle of embodiments of the present disclosure and the embodiments have been described herein by applying optional examples, and the illustration of the embodiments above is only used to help understand the method and core ideas of embodiments of the present disclosure; meanwhile, a person of ordinary skill in the art may make modifications to the optional embodiments and application ranges according to the idea of embodiments of the present disclosure. In conclusion, the content of the present description shall not be construed as limitation to the embodiments of the present disclosure.
| Number | Date | Country | Kind |
|---|---|---|---|
| 202211518351.2 | Nov 2022 | CN | national |
The present application is a National Stage Application of PCT International Application No.: PCT/CN2023/093323 filed on May 10, 2023, which claims priority to Chinese Patent Application 202211518351.2, filed in the China National Intellectual Property Administration on Nov. 30, 2022, the disclosure of which is incorporated herein by reference in its entirety.
| Filing Document | Filing Date | Country | Kind |
|---|---|---|---|
| PCT/CN2023/093323 | 5/10/2023 | WO | |