As data center computing environments strive to increase the amount of inbound request traffic that a data center can respond to, system managers are increasingly focused on efficiently delivering the inbound request traffic to its respective destination(s) within the data center.
HTTP/2 is a communication protocol that supports multiple, separate “request/response” sessions (“streams”) over a same network connection (e.g., a Layer 4 Transmission Control Protocol (TCP) connection) between two endpoints.
Referring to
An HTTP/2 instance 106 is located between the client's TCP socket 104 and the client applications C1-C5. Likewise, on the data center side, a corresponding HTTP/2 instance 107 is located between the data center's TCP socket 105 and the data center services S1-S5 that the client side applications C1-C5 are engaged with. The HTTP/2 instances 106, 107 execute the specific protocol actions that are specific to HTTP/2 upon the packets that are sent over connection 103.
The HTTP/2 instances 106, 107 present respective application program interfaces (APIs) to the applications (Cx) and services (Sx). The applications/services pass outbound header and payload content through the APIs. In response, the HTTP/2 instances 106, 107 form a respective outbound message that is submitted to its corresponding TCP instance for transmission over the connection 103. Likewise, in the receive direction, the TCP instances 104, 105 pass HTTP/2 messages to their corresponding HTTP/2 instances 106, 107. The HTTP/2 instances 106, 107 extract the header and payload content from the messages and pass them to their corresponding application/service.
A request message from any client application (e.g., C1) typically includes a specific command (e.g., “GET”) that the client application C1, through the request message, asks a data center service (e.g., S1) to perform. Upon receipt of the message, the data center service S1 performs the action specified by the command. The performance of the action by the data center service S1 typically generates a response (e.g., a specific item of data that the client application C1 asked the service S1 to GET). The data center service S1 then sends the response to the requesting client application C1.
The header of a request message typically includes the requested action (e.g., GET, PUT) along with other information that pertains to the message (e.g., the target of the message, a timestamp, the size of the message and/or payload, etc.), while, the payload of the request message typically includes an operand pertaining to the requested action (e.g., an address or other information that identifies the information that is to be retrieved in response to a GET request, the data to be PUT on a server, etc.).
The header of a response message typically includes information that identifies the message as a response message (e.g., POST) along with other information that pertains to the message (e.g., the target of the message, a timestamp, the size of the message and/or payload, etc.), while the payload of a response message typically includes the information that was requested in an earlier request message (e.g., the data requested by an earlier GET message).
If the client device 101 is executing different applications C1-C5 that are concurrently engaged with different, respective data center services S1-S5, the HTTP/2 instances 106, 107 allow different logical sessions (“streams”) to concurrently exist over the same TCP connection 103 between the engaged application/service pairs. That is, individual request/response cycles can be carried out between: 1) C1 and S1 (“stream 1”); 2) C2 and S2 (“stream 2”); 3) C3 and S3 (“stream 3”); 4) C4 and S4 (“stream 4”); 5) C5 and S5 (“stream 5”).
In order to enhance the throughput of all streams through the same TCP connection 103, both the client-side and data center-side HTTP/2 instances 106, 107 break down their respective outbound messages into smaller frames.
Referring to
Although a lengthy, textual list of information can exist in an HTTP/2 message header, an HTTP/2 instance will compress the list to reduce the size/footprint of the header information for the outbound message.
The HTTP/2 instance also breaks down the message payload into one or more DATA frames 123 until all of the payload has been formatted into frames. The frames are then presented to the HTTP/2 instance's corresponding TCP socket in order (initial HEADERS frame, followed by one or more CONTINUATION frames (if any), followed by one or more DATA frames).
For inbound message, an HTTP/2 instance will perform the reverse of the above described process including the decompression of the header content that was received in the HEADERS 121 frame and any CONTINUATION frames 122.
Notably, if multiple request messages are concurrently generated by the multiple client applications C1-C5, the client-side HTTP/2 instance 106 breaks down the multiple request messages into their constituent frames and then multiplexes the frames into the respective payload of one or more packets that are provided to the TCP instance 104 for transmission over connection 103. Thus, one or more packets are sent over connection 103 whose payload contains respective frames from multiple request messages (multiple streams).
Likewise, if multiple response messages are concurrently generated by the data center applications S1-S5, the data center HTTP/2 instance 107 breaks down the multiple response messages into their constituent frames and multiplexes the frames into the respective payload of one or more packets that are sent to the client 101 over the TCP connection 103. Thus, one or more packets are sent over connection 103 whose payload contains respective frames from multiple response messages (multiple streams).
A challenge is the reassembling of received frames back to their original message when large numbers of streams are concurrently in flight over the same connection 103 (the HTTP/2 specification recommends a maximum number of streams per connection configuration setting of no less than 100 streams).
In a simplistic approach, in the inbound direction, an HTTP/2 instance will simply queue in memory all received frames for all inbound messages all frames for any particular message have been completely received. After all frames for a message have been received, the message's header content is decompressed. The decompressed header information and message payload are then sent to the application/service that is the target of the message.
Unfortunately, if large numbers of concurrently existing streams are allowed, there can correspondingly be large numbers of messages that are in-flight at any moment in time. In that case, large amounts of memory will be needed to queue all of the frames for all of the messages.
Here, a targeted application/service expects to receive the substantive content of any inbound message that is directed to it. As such, the targeted/application should be designed with enough memory space to hold, e.g., the complete HTTP/2 message. By passing the payload content of the frames to each targeted application/service, e.g., piecemeal, the targeted application/service is essentially responsible for an inbound message's reassembly from its constituent frames.
Importantly, substantial memory efficiency is achieved because the HTTP/2 instance does not require an amount of memory space necessary to keep all respective inbound frames of all in-flight messages until completion. Instead, such memory space is spread across the memory allocated to the absolute end-points (the targeted application/services).
As observed in
The HTTP/2 instance then begins to process the series of frames in the same order in which they were arranged in the packet payload (alternatively, frames belonging to a same stream can be batch processed so long as the frames within the same stream are processed in the order in which they were received). Here, referring to
As such, as observed in
Handling the received frames 213 in order, referring to
As described above, the message header content within the payload of a HEADERS frames is compressed and (typically) identifies the intended endpoint target for the message (in the instant example, the intended endpoint for the S3 message is client application C3). As such, the HTTP/2 instance decompresses the payload of the HEADERS frame 213-1 and sends 218 the decompressed frame payload to the C3 endpoint (if the target endpoint is not identified in the payload of the HEADERS frame 213-1, the payload can be locally stored as a form of intermediate content until the target is identified from the processing of a subsequent CONT frame for the S3 message).
As will become more clear in the discussions below, the S3 meta data record 215 can be used for a number of supporting functions used to process the inbound sequence of frames that are carrying the S3 message, such as, monitoring the state of the S3 message's frame reception sequence; handling intermediate frame payload content across packet boundaries (described further below); recording the intended endpoint target for the S3 message, etc. Here, with the HTTP/2 instance processing the payload of the HEADERS frame 213-1 which carries header information for the S3 message, the HTTP/2 instance can glean pertinent information about the S3 message which can be used, e.g., to monitor the state of the S3 message's frame reception sequence (e.g., the size of the S3 message header, the size of the S3 message payload, the size of the S3 message, etc.). For instance, based on the size of the S3 message header, the HTTP/2 instance can determine when the S3 message header has been fully received.
In the specific example of the packet 212 of
For the sake of initial explanation, there is no intermediate content is any of the frames 213 in the packet 212 of
Referring to
With the answer to intermediate content inquiries 220 being “no”, the HTTP/2 instance decompresses the payload of the CONT frame 213-2 and sends 223 to the C3 endpoint. As a follow-up procedure to the look-up 219 into the S3 meta data 215, the HTPP/2 instance can update (“write-back”) the meta data 215 to include progress information on the processing of the S3 message. For instance, the S3 meta data 215 can be updated to indicate that a first CONT frame has just been received for the S3 message, that a first DATA frame is next expected for the S3 message (e.g., based on the message header size and the size of the respective payloads of frames 213-1, 213-2, etc.).
Referring to
Referring to
With the answer to inquiries 231_1, 231_2 being “no”, the HTTP/2 instance sends 234 the complete frame with S3 message payload data to C3. Moreover, as a follow-up procedure to the look-up 230 into the S3 meta data 215, the HTPP/2 instance can update the meta data 215 to include progress information on the processing of the S3 message. For instance, the S3 meta data 215 can be updated to indicate that a first DATA frame has just been received for the S3 message, how many more DATA frames are next expected for the S3 message (e.g., based on the message size and/or message payload size and the content of the respective payloads of frames 213-1, 213-2 which contained the S3 message's header information).
Referring to
With the answer to inquiries 236_1, 236_2 being “no”, the HTTP/2 instance sends 239 the complete frame with S5 message payload data to C5. Moreover, as a follow-up procedure to the look-up 235 into the S5 meta data 226, the HTPP/2 instance can update the meta data 226 to include progress information on the processing of the S5 message. For instance, the S5 meta data 226 can be updated to indicate that a first DATA frame has just been received for the S5 message, how many more DATA frames are next expected for the S5 message (e.g., based on the message size and/or message payload size).
Regardless, as observed in
When a frame is divided and intermediate content is created as a result, the answer to one of the intermediate content inquiries described above becomes “yes”. Which inquiry becomes “yes” depends on whether leading or trailing content was received in the newly received frame portion.
Thus, referring back to
With respect to
With respect to
With respect to
With respect to
In a more complex scenarios, a message's payload as carried by the respective payload of one or more DATA frames is compressed or encoded in some other way (e.g., encrypted). If so, the message's header information (and subsequently its meta data) can identify the type of encoding so that the HTTP/2 instance understands what type of decompression, decoding, etc. is to be applied to the message's payload.
In the case of encryption, the type of encryption could be left out of the message header (so as not to assist any unwelcome message listeners). Here, for instance, the type of encryption to be applied can be established as part of the setup of the connection and then stored locally on the receiving side. In this case, there is a default type of encryption for the connection and the HTTP/2 layer need not discover it in the header of a received message. As such, the message header need only indicate whether the message payload is encrypted and, if so, the HTTP/2 instance records that decryption is required in the message's meta data so that it is applied with the message's DATA frames are received.
Importantly, processes 218, 223, 229, 234, 239 send newly received frame content to their destination. As such, queueing of all frames belonging to a same in-flight HTTP/2 message in memory that is local to the HTTP/2 instance is avoided as evidenced by the rapid decrease of stored frames in memory 212 from
Although the example of
In various embodiments, the meta data described above for the processes of
Although embodiments above have stressed TCP as the transport layer protocol, it is conceivable that the teachings above could be applied at two different, e.g., IP address, network endpoints where another protocol, such as User Data Protocol (UDP), is used at the transport layer. For example, certain implementations can use the QUIC protocol between UDP and a text transfer protocol (e.g., HTTP/2, HTTP/3, etc.) that employs the message receiving teachings provided at length above. Here, UDP with QUIC (“UDP+QUIC”) provide a connection based transport layer protocol.
The HTTP/2 protocol is specified in RFC 7540, “Hypertext Transfer Protocol 2 (HTTP/2)” published by the HTTP Working Group of the Internet Engineering Task Force (IETF). Although the teachings above have been focused on HTTP/2 teachings specifically, it is important to point that the teachings above can be applied to other text/document/file transportation protocols (to the extent they sends messages in frames and/or multiplex the respective frames of multiple in-flight messages) besides HTTP/2 such as, future generations of the Hypertext Transfer Protocol (HTTP) beyond HTTP/2 (e.g., HTTP/3), other versions of HTTP and/or their future generations (e.g., HTTP Secure (HTTPS), current and/or future versions of Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), Network News Transfer Protocol (NNTP), File Transfer Protocol (FTP), Network Time Protocol (NTP), etc.
The teachings above are believed to be particularly useful in a data center environment, or other complex computing environment, that relies upon one or more infrastructure processing units (IPUs) to offload underlying networking functions, such as the TCP and HTTP/2 protocols, from the processors that execute the service endpoints (e.g., S1-S5 in
Networked based computer services, such as those provided by cloud services and/or large enterprise data centers, commonly execute application software programs for remote clients. Here, the application software programs typically execute a specific (e.g., “business”) end-function (e.g., customer servicing, purchasing, supply-chain management, email, etc.). Remote clients invoke/use these applications through temporary network sessions/connections that are established by the data center between the clients and the applications. A recent trend is to strip down the functionality of at least some of the applications into more finer grained, atomic functions (“microservices”) that are called by client programs as needed. Microservices typically strive to charge the client/customers based on their actual usage (function call invocations) of a microservice application. Microservice function calls and associated operands can be formatted according to various syntaxes and/or protocols such as Remote Procedure Call (RPC), gRPC, Cap'n Proto, Apache Thrift, Apache Avro, JSON-RPC, XML-RPC, etc.
In order to support the network sessions and/or the applications' functionality, however, certain underlying computationally intensive and/or trafficking intensive functions (“infrastructure” functions) are performed.
Examples of infrastructure functions include transport layer protocol functions (e.g., TCP), hypertext transfer communication protocol functions (such as HTTP/2), encryption/decryption for secure network connections, compression/decompression for smaller footprint data storage and/or network communications, virtual networking between clients and applications and/or between applications, packet processing, ingress/egress queuing of the networking traffic between clients and applications and/or between applications, ingress/egress queueing of the command/response traffic between the applications and mass storage devices, error checking (including checksum calculations to ensure data integrity), distributed computing remote memory access functions, etc.
Traditionally, these infrastructure functions have been performed by the CPU units “beneath” their end-function applications. However, the intensity of the infrastructure functions has begun to affect the ability of the CPUs to perform their end-function applications in a timely manner relative to the expectations of the clients, and/or, perform their end-functions in a power efficient manner relative to the expectations of data center operators.
As such, as observed in
As observed in
Notably, each pool 301, 302, 303 has an IPU 307_1, 307_2, 307_3 on its front end or network side. Here, each IPU 307 performs pre-configured infrastructure functions on the inbound (request) packets it receives from the network 304 before delivering the requests to its respective pool's end function (e.g., executing software in the case of the CPU pool 301, memory in the case of memory pool 302 and storage in the case of mass storage pool 303). As the end functions send certain communications into the network 304, the IPU 307 performs pre-configured infrastructure functions on the outbound communications before transmitting them into the network 304. The communication 312 between the IPU 307_1 and the CPUs in the CPU pool 301 can transpire through a network (e.g., a multi-nodal hop Ethernet network) and/or more direct channels such as Compute Express Link (CXL), Advanced Extensible Interface (AXI), Open Coherent Accelerator Processor Interface (OpenCAPI), Gen-Z, etc.
Depending on implementation, one or more CPU pools 301, memory pools 302, mass storage pools 303 and network 304 can exist within a single chassis, e.g., as a traditional rack mounted computing system (e.g., server computer). In a disaggregated computing system implementation, one or more CPU pools 301, memory pools 302, and mass storage pools 303 are separate rack mountable units (e.g., rack mountable CPU units, rack mountable memory units (M), rack mountable mass storage units (S)).
In various embodiments, the software platform on which the applications 305 are executed include a virtual machine monitor (VMM), or hypervisor, that instantiates multiple virtual machines (VMs). Operating system (OS) instances respectively execute on the VMs and the applications execute on the OS instances. Alternatively or combined, container engines (e.g., Kubernetes container engines) respectively execute on the OS instances. The container engines provide virtualized OS instances and containers respectively execute on the virtualized OS instances. The containers provide isolated execution environment for a suite of applications which can include applications for microservices.
Comparing
As such, for inbound HTTP/2 messages, an IPU 307_1 sends individual frames that belong to a same inbound message piecemeal to a targeted microservice (e.g., any of S1-S5), or, e.g., packs frames that the IPU 307_1 has processed and are destined for a same CPU into a same packet that is sent to the CPU. Importantly, this reduces the amount of local IPU memory (memory 429 as described further below with respect to
The IPU 407 can be implemented with: 1) e.g., a single silicon chip that integrates any/all of cores 411, FPGAs 412, ASIC blocks 413 on the same chip; 2) a single silicon chip package that integrates any/all of cores 411, FPGAs 412, ASIC blocks 413 on more than chip within the chip package; and/or, 3) e.g., a rack mountable system having multiple semiconductor chip packages mounted on a printed circuit board (PCB) where any/all of cores 411, FPGAs 412, ASIC blocks 413 are integrated on the respective semiconductor chips within the multiple chip packages.
The processing cores 411, FPGAs 412 and ASIC blocks 413 represent different tradeoffs between versatility/programmability, computational performance and power consumption. Generally, a task can be performed faster in an ASIC block and with minimal power consumption, however, an ASIC block is a fixed function unit that can only perform the functions its electronic circuitry has been specifically designed to perform.
The general purpose processing cores 411, by contrast, will perform their tasks slower and with more power consumption but can be programmed to perform a wide variety of different functions (via the execution of software programs). The general purpose processing cores can be implemented as reduced instruction set (RISC) processors, complex instruction set (CISC) processors, a combination of RISC and CISC processors, etc.
The FPGA(s) 412 provide for more programming capability than an ASIC block but less programming capability than the general purpose cores 411, while, at the same time, providing for more processing performance capability than the general purpose cores 411 but less than processing performing capability than an ASIC block.
The IPU 407 also includes multiple memory channel interfaces 428 to couple to external memory 429 that is used to store instructions for the general purpose cores 411 and input/output data for the IPU cores 411 and each of the ASIC blocks 421-426. The IPU includes multiple PCIe physical interfaces and an Ethernet Media Access Control block 430, and/or more direct channel interfaces (e.g., CXL and or AXI over PCIe) 431, to support communication to/from the IPU 407. Here, for example, interfaces 430 can be viewed as one or more network side interfaces because these interfaces 430 interface with network 304 in
As mentioned above, the IPU 407 can be a semiconductor chip, a plurality of semiconductor chips integrated within a same chip package, a plurality of semiconductor chips integrated in multiple chip packages that components of a module or card (e.g., a NIC), etc.
In the case of authorization, the IPU 507_1 determines whether or not a sending entity (e.g., any of C1-05 in
Here, S5* corresponds to a first mode in which the IPU 507_1 is configured to offload the CPU that is supporting microservice S5 for, e.g., certain functions. For example, if a function call is made to microservice S5 to decompress a data item identified within the function call by an address in the memory pool 502, the IPU 507_1, having a decompression ASIC block (e.g., block 427), can intercept the message, retrieve the data item and decompress the data item on behalf of microservice S5 (the CPU that supports microservice S5 did not execute the function call).
Here, some updating of communication state from the IPU 507_1 to microservice S5 may be necessary so that microservice S5 understands that the IPU 507_1 intercepted and performed a function call on its behalf (e.g., if microservice S5 is to perform the next process after the function performed by the IPU 507_1). Depending on the semantics associated with the function call, the IPU 507_1 can respond to the calling customer (C5), and/or, invoke another next microservice function that is appropriate for the calling customer's microserviced process flow.
In another mode of operation, the IPU 507_1 is the endpoint for a microservice S6 rather than one of the CPUs within the CPU pool 501. Here, for example, the software to fully implement microservice S6 executes on one or more of the general purpose processing cores of the IPU 507_1. The S6 microservice can be, e.g., similar to any of microservices S1-S5 in that all the microservices S1-S6 receive function calls from certain users to perform certain functions, and such functional calls are performed/resolved with the corresponding underlying hardware (CPUs for microservices S1-S5 and IPU for microservice S6). This can include not only substantive processing of data, but also, depending on the associated semantics, responding to a calling customer, and/or, invoking another next microservice function that is appropriate for a larger microserviced process flow.
In various embodiments, as suggested above, a motivation for implementing partial or full microservice support at an IPU is the presence of the acceleration hardware on the IPU (e.g., ASIC, FPGA). For example, referring briefly back to
Candidate microservices and/or microservice endpoint function calls that can be performed at an IPU include, to name a few: 1) compression/decompression (e.g., with compression/decompression ASIC, FPGA and/or software functionality integrated with the IPU); 2) artificial intelligence (AI) inferencing (e.g., with neural network ASIC, FPGA and/or software functionality integrated with the IPU); 3) video transcoding (e.g., with video transcoding ASIC, FPGA and/or software functionality integrated with the IPU); 4) video annotation (e.g., with video annotation ASIC, FPGA and/or software functionality integrated with the IPU); 5) encryption/decryption (e.g., with video annotation ASIC, FPGA and/or software functionality integrated with the IPU); 6) intrusion detection (e.g., with security ASIC, FPGA and/or software functionality integrated with the IPU); etc.
Embodiments of the invention may include various processes as set forth above. The processes may be embodied in program code (e.g., machine-executable instructions). The program code, when processed, causes a general-purpose or special-purpose processor to perform the program code's processes. Alternatively, these processes may be performed by specific/custom hardware components that contain hard wired interconnected logic circuitry (e.g., application specific integrated circuit (ASIC) logic circuitry) or programmable logic circuitry (e.g., field programmable gate array (FPGA) logic circuitry, programmable logic device (PLD) logic circuitry) for performing the processes, or by any combination of program code and logic circuitry.
Elements of the present invention may also be provided as a machine-readable medium for storing the program code. The machine-readable medium can include, but is not limited to, floppy diskettes, optical disks, CD-ROMs, and magneto-optical disks, FLASH memory, ROMs, RAMs, EPROMs, EEPROMs, magnetic or optical cards or other type of media/machine-readable medium suitable for storing electronic instructions.
In the foregoing specification, the invention has been described with reference to specific exemplary embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the invention as set forth in the appended claims. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.