Embodiments relate generally to cloud computing environments, and more particularly, to protecting multiple tenants when sharing access to an accelerator.
In most modern cloud computing environments, the computing infrastructure is shared between multiple users, commonly referred to as tenants. Since each tenant has its own programs (e.g., code) and data, the program execution environment and the memory storing this code and data must be strictly isolated such that one tenant is not able to read or modify the code and/or data of another tenant. This deters theft of the tenant's code and/or data and deters a potentially malicious tenant from subverting the use of the computing resources of another tenant. This isolation is often achieved by virtualizing the computing resources of the cloud computing environment such that each tenant is mapped to a specific virtual machine (VM). Hardware mechanisms embodied within the processor, memory, and input/output (I/O) systems enforce these isolation boundaries, with a software component known as a hypervisor establishing and managing these boundaries. The hypervisor runs at a higher privilege than other software in the computing infrastructure and is trusted by virtue of its implementation simplicity (as compared to a traditional operating system (OS)), which stems in part from its limited functionality of establishing and managing isolation boundaries.
This approach works well on centralized computing systems such as those found in typical client and server systems. However, when a compute task of a tenant is offloaded to a compute accelerator that is connected via an interconnect to the central computing system (often called the host computing system), maintaining this isolation becomes problematic.
So that the manner in which the above recited features of the present embodiments can be understood in detail, a more particular description of the embodiments, briefly summarized above, may be had by reference to embodiments, some of which are illustrated in the appended drawings. It is to be noted, however, that the appended drawings illustrate only typical embodiments and are therefore not to be considered limiting of its scope. The figures are not to scale. In general, the same reference numbers will be used throughout the drawings and accompanying written description to refer to the same or like parts.
Embodiments described herein provide an efficient way to isolate code and/or data of an application executing within a host computing system when at least a portion of the code and data is offloaded for processing by an attached accelerator computing device. This is achieved at least in part by using cryptographically secure communications between the host computing system and accelerator, an Isolated Memory Regions (IMRs) infrastructure and a Trusted Execution Environment (TEE) in the accelerator, and secure compute zones in the accelerator associated with selected tenants.
Host computing system 102 communicates with accelerator 116 over bus 110. In an embodiment, bus 110 is a peripheral component interconnect express (PCI-e) high speed serial computer bus as described at pcisig.com. In other embodiments, other busses may be used. In one embodiment, communication over bus 110 is protected by transport layer security (TLS) (e.g., TLS over PCI-e), a cryptographic protocol to provide communications security over a computer network.
Accelerator 116 is used to offload at least some processing tasks (also known as workloads) from host computing system 102 to improve the overall efficiency of system 100. Accelerator 116 comprises any current or future developed single- or multi-core processor or microprocessor, such as one or more systems on a chip (SOCs), central processing units (CPUs), digital signal processors (DSPs), graphics processing units (GPUs), application-specific integrated circuits (ASICs), programmable logic units, field programmable gate arrays (FPGAs), and the like. In an embodiment, accelerator 116 is a processing system designed to efficiently compute tasks relating to artificial intelligence (AI), machine learning (ML), deep learning, inference processing, and/or image processing. Although in
In this example, accelerator 116 comprises four compute zones: compute zone 0 118, compute zone 1 138, compute zone 2 158, and compute zone 3 178. As used herein, a compute zone includes data processing circuitry for performing one or more computing tasks offloaded from host computing system 102. In other examples, any number of compute zones may be included in accelerator 116. Compute zones operate in parallel in the accelerator to efficiently perform computing tasks. In embodiments, each compute zone is isolated from the other compute zones; that is, one compute zone cannot access or affect the processing and/or data of other compute zones.
In one embodiment wherein bus 110 is a PCI-e bus, the PCI-e bus provides eight physical PCI-e functions (PFs), labeled 112, 114, 132, 134, 152, 154, 172, and 174 in
In an embodiment, memory 250 comprises a dynamic random-access memory (DRAM), and temporary memory 252 comprises a high-speed ‘near’ static random-access memory (SRAM). Access to memory 250 and temporary memory 252 by the compute zones is provided by memory controllers (MCs) MC 0 206, MC 1 216, MC 2 226, and MC 3 236. Each compute zone uses an MC to access the memories. For example, compute zone 0 118 accesses the memories using MC 0 206, compute zone 1 138 accesses the memories using MC 1 216, compute zone 2 158 accesses the memories using MC 2 226, and compute zone 3 178 accesses the memories using MC 3 236.
Media engines 202, 212, 222, and 232 provide media processing operations such as encoding video data, decoding video data, compressing video data, and decompressing video data.
Inference engines 204, 214, 224, and 234 provide one or more artificial intelligence (AI), machine learning, and/or deep learning data processing operations. These operations include object detection, object tracking, object classification, labelling, etc. For example, a data processing operation could include a process that tracks a specific red vehicle as it moves across the field of view of a surveillance camera. Another example would be the ability to detect the location of a particular vehicle using a license plate detection process.
Crypto engines 208, 218, 228, and 238 provide cryptographic processing operations in hardware. These operations may include encryption, decryption, hashing, integrity checking, authentication, signing, and/or signature verification.
Selected regions of memory 250 and temporary memory 252 associated with each compute zone are isolated using Isolated Memory Region (IMR) registers. IMRs are fence registers which are securely configured to only allow memory read/write accesses from a specific compute zone (and related entities, e.g., other bus masters in the system such as PCIe DMA engines (in 242), generic DMA engines and other peripherals (in 242) and accelerator processor subsystem 240). This prevents access by one compute zone to data from another compute zone. Thus, the data stored in a compute zone's protected region of memory 250 is isolated from other data of other compute zones as well as other HW devices in the accelerator such as a PCIe controller (in 242) and accelerator processor subsystem 240. This increases the security provided by the accelerator.
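The access rule enforced by the IMR fence registers can be modeled as a base/limit address range plus a set of initiators allowed to read or write inside it. The sketch below is a minimal software model under stated assumptions: the names `ImrRegister`, `access_permitted`, and the initiator strings are hypothetical, the real register layout and bus-master encoding are not described here, and addresses outside every fence are assumed to be unprotected.

```python
from dataclasses import dataclass

# Hypothetical initiator IDs; the real bus-master encoding is not specified in the text.
ZONE0, ZONE1, PCIE_DMA, PROCESSOR_SUBSYSTEM = "zone0", "zone1", "pcie_dma", "processor_subsystem"


@dataclass(frozen=True)
class ImrRegister:
    """Hypothetical model of one isolated memory region fence."""
    base: int           # first byte address covered by the fence
    limit: int          # last byte address covered by the fence (inclusive)
    allowed: frozenset  # initiators permitted to read/write inside the region

    def covers(self, addr: int) -> bool:
        return self.base <= addr <= self.limit


def access_permitted(imrs, initiator: str, addr: int) -> bool:
    """Return True if `initiator` may access `addr` under the configured fences.

    The first fence that covers the address decides; addresses outside every
    fence are treated as unprotected (an assumption of this model).
    """
    for imr in imrs:
        if imr.covers(addr):
            return initiator in imr.allowed
    return True


# Example: a protected region reserved for compute zone 0 only.
imrs = [ImrRegister(base=0x1000_0000, limit=0x1FFF_FFFF, allowed=frozenset({ZONE0}))]
```

In hardware the same comparison is applied to every memory transaction; the model only illustrates the allow/deny decision, not how the fences are securely programmed.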
Accelerator 116 includes bus subsystem 244 for communicating with host computing system 102 over bus 110, and peripheral subsystem 242 for communicating with any peripherals attached to accelerator 116 (not shown in
Accelerator processor subsystem 240 includes one or more processors to execute code for accelerator 116. In an embodiment, the one or more processors comprise an ARM-based compute complex (according to a specification by ARM, Ltd.) that supports the ARM TrustZone Trusted Execution Environment (TEE) for secure computing operations, including the setting of IMRs. ARM TrustZone technology is a system-wide approach to security for system-on-chip (SoC) and central processing unit (CPU) designs that uses hardware-enforced isolation to establish secure end points and a device root of trust. This compute complex operates as a ‘control-plane’ for the ‘data-plane’ processing performed by the compute zones and controls the overall processing of accelerator 116.
Each VM 404 runs at least one tenant application 406 and a guest OS 410. Guest OS 410 includes bus driver 412 to control communications over bus 110 to one or more compute zones on accelerator 116. Each VM 404 that interacts with one or more compute zones on the accelerator includes a compute zone driver 408 to control communications between the tenant's application 406 and the assigned compute zone(s). The compute zone driver is also responsible for the confidentiality and integrity of data exchanged between application 406 and accelerator 116 over PCIe interconnect 110.
At block 502, during initialization of host computing system 102, accelerator resource manager 402 on the host computing system detects each attached accelerator 116, detects the compute zones (e.g., 118, 138, 158, and 178) in each accelerator, and assigns at least one PF for each compute zone (e.g., PFs 112, 114, 132, 134, 152, 154, 172, and 174). At block 504, a user of host computing system 102 requests one or more compute zones to be assigned to a tenant. In an embodiment, the request is read from a configuration file on the host computing system that maps PFs to VMs before the VMs are started by the host. In another embodiment, the request is received over a command line interface from a user (for example, from a system administrator of a cloud computing environment). In response, at block 506 accelerator resource manager 402 assigns the requested compute zone (if available) to the tenant (and to the tenant's VM). In an embodiment, a static configuration is used to map compute zones to tenants for a host computing system. In another embodiment, the mapping of compute zones to tenants is dynamic and may be changed during runtime. In an embodiment, a VM 404 is started as an empty shell and, once up and running, a tenant is provisioned into the VM.
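The bookkeeping in blocks 502 through 506 can be sketched as a small table that maps each detected compute zone to its PFs and records which tenant, if any, owns it. This is a hypothetical model: `AcceleratorResourceManager` and its methods are illustrative names, and pairing the PFs two per zone (112 and 114 for compute zone 0, and so on) is an assumption drawn from the PF labels used in the PCI-e example.

```python
class AcceleratorResourceManager:
    """Hypothetical bookkeeping for blocks 502-506: zones, PFs, and tenant assignment."""

    def __init__(self, zone_pfs):
        # zone_pfs maps each detected compute zone to the PFs assigned to it at block 502.
        self.zone_pfs = dict(zone_pfs)
        self.owner = {zone: None for zone in self.zone_pfs}  # zone -> tenant, or None if free

    def is_available(self, zone):
        return zone in self.owner and self.owner[zone] is None

    def assign(self, tenant, zone):
        """Block 506: assign `zone` to `tenant` if it is available and return the zone's PFs."""
        if zone not in self.owner:
            raise KeyError(f"unknown compute zone {zone!r}")
        if self.owner[zone] not in (None, tenant):
            raise RuntimeError(f"compute zone {zone!r} is already assigned to another tenant")
        self.owner[zone] = tenant
        return self.zone_pfs[zone]

    def release(self, tenant, zone):
        """Return a zone to the free pool (dynamic mapping); True if it was released."""
        if tenant is not None and self.owner.get(zone) == tenant:
            self.owner[zone] = None
            return True
        return False


# Two PFs per zone (assumed pairing of the PF labels from the PCI-e example).
rm = AcceleratorResourceManager({
    "zone0": (112, 114), "zone1": (132, 134),
    "zone2": (152, 154), "zone3": (172, 174),
})
```

Keeping a `release` operation lets the same structure serve either the static mapping (assign once at start-up) or the dynamic mapping (reassign at runtime).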
When a persistent memory (such as an embedded MultiMediaCard (eMMC) or other temporary memory 252) is not present on accelerator 116 (e.g., the accelerator is “flash-less”), host computing system 102 sends a link certificate and encrypted private configuration assets to TrustZone TEE 306 in accelerator processor subsystem 240. In some accelerators, this information resides in the persistent memory (e.g., temporary memory 252). The link certificate and encrypted private configuration assets are used by the accelerator to establish a secure communications link with the host computing system.
Accelerator resource manager 402 searches for available resources and assigns the PFs associated with the requested compute zone to the tenant (and thus also to the VM). At block 508, accelerator resource manager 402 creates and starts a VM for the tenant. At block 510, the accelerator resource manager starts the tenant software within the VM. At block 512, compute zone driver 408 within the tenant's VM detects the one or more assigned PFs and instructs the accelerator to initialize the compute zone(s) assigned to the tenant, thereby causing the initialization to be performed. Trusted loader 308 sets up the tenant boundaries in memory 250 and temporary memory 252 to prevent other tenants from accessing any data within the tenant's protected (and isolated) memory (for example, protected regions 260 and 262 of memory 250 and temporary memory 252, respectively, for compute zone 0 118). At block 514, the tenant executes a cryptographic key exchange protocol with key exchange function 310 in TrustZone TEE 304 in accelerator 116, and both sides of the key exchange protocol derive the same unique session key. At block 516, the trusted loader programs the newly derived session key, specific to this tenant/compute zone combination, into the cryptographic engine of the compute zone (for example, crypto engine 0 208 of compute zone 0 118 for communication with tenant 0 108 in VM 0 106).
All communications between the VM (for example, VM 0 106) on host computing system 102 and the compute zone (for example, compute zone 0 118) on accelerator 116 over the assigned PFs (e.g., 112, 114) are encrypted with this session key. Since the session key is known only to the tenant within the VM and the assigned compute zone, no other entity, either hardware (HW) or software (SW), in the host computing system, in the accelerator, or in the communications path between them can read (e.g., steal) the contents of communications encrypted with this session key. In an embodiment, once programmed into the crypto engine, the session key cannot be read back out by any entity (either HW or SW) on accelerator 116 or host computing system 102. Processing then continues at block 518 on
At block 518, the tenant downloads an encrypted workload to the assigned compute zone (for example, tenant 0 108 downloads an encrypted workload to compute zone 0 118) via the assigned PFs (e.g., 112 or 114) over the encrypted communications link. At block 520, the compute zone decrypts the workload (for example, using crypto engine 0 208 in compute zone 0 118 and the embedded session key) and starts executing the workload. The workload can be any one or more data processing tasks. Once the workload is running and ready to process data, at block 522 the tenant sends an encrypted data stream to the compute zone running the decrypted workload. In one embodiment, the data stream comprises a video data stream. The data stream has been previously encrypted by the tenant with the same session key used to encrypt the workload. At block 524, the crypto engine in the compute zone uses this session key (embedded in the crypto engine) to decrypt the received encrypted data stream and stores the decrypted (e.g., plaintext) data stream in the protected region (e.g., 260) of memory 250 allocated to the compute zone. While in the protected region, the decrypted data stream cannot be accessed by other compute zones or by untrusted software executing in accelerator processor subsystem 240 (e.g., untrusted apps 316).
At block 526, the compute zone processes the decrypted data stream to produce metadata. In an embodiment, metadata produced by the compute zone is stored in the protected region of memory 250 (e.g., protected region 260 for compute zone 0 118). During processing, the compute zone may store temporary data in the compute zone's protected region of temporary memory 252 (e.g., area 262 for compute zone 0 118). In an embodiment, this temporary data is metadata. In an embodiment, the one or more inference engines of the compute zone (for example, inference engines 0 204 of compute zone 0 118) are applied to the decrypted data stream. In an embodiment, the one or more inference engines comprise one or more machine learning (ML) models.
In an embodiment, the compute zone uses functions provided by a media engine (for example, media engine 0 202 of compute zone 0 118) to process the data stream before or after processing by the one or more inference engines. At block 528, the crypto engine in the compute zone (for example, crypto engine 0 208 of compute zone 0 118) encrypts the metadata using the embedded session key. At block 530, the compute zone sends the encrypted metadata over the encrypted communications link from the accelerator to the tenant on the host computing system. At block 532, the tenant decrypts the encrypted metadata. The tenant can then use the metadata (that is, the results of the accelerator's computation of the offloaded workload) for any purpose as needed.
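The text does not name the key exchange protocol used at block 514. As one illustration, the sketch below uses finite-field Diffie-Hellman followed by HKDF-SHA256 (RFC 5869), built only from the Python standard library, so that the tenant and key exchange function 310 each derive the same 32-byte session key. The group parameters (`P = 2**127 - 1`, `G = 3`) are toy values chosen for readability and are far too small for real use; a production design would use a standardized group or an elliptic-curve exchange and would authenticate the exchange, which is omitted here. The `Party` class and the binding of tenant and compute zone identifiers into the HKDF `info` field are likewise assumptions.

```python
import hashlib
import hmac
import secrets

# Toy group parameters (assumption): a 127-bit Mersenne prime. Far too small for real use.
P = 2**127 - 1
G = 3


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


class Party:
    """One side of the exchange: the tenant, or the key exchange function in the TEE."""

    def __init__(self):
        self._private = secrets.randbelow(P - 3) + 2  # private exponent in [2, P - 2]
        self.public = pow(G, self._private, P)         # value sent to the peer

    def session_key(self, peer_public: int, context: bytes) -> bytes:
        if not 1 < peer_public < P - 1:
            raise ValueError("peer public value out of range")
        shared = pow(peer_public, self._private, P)    # identical on both sides
        return hkdf_sha256(shared.to_bytes(16, "big"), salt=b"", info=context)


tenant, tee = Party(), Party()
context = b"tenant0|zone0"  # hypothetical binding of the key to one tenant/zone pair
tenant_key = tenant.session_key(tee.public, context)
tee_key = tee.session_key(tenant.public, context)
```

Because the context string is mixed into the derivation, the same Diffie-Hellman secret would yield a different key for a different tenant/compute zone combination.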
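The round trip in blocks 516 through 532 can be illustrated with a software stand-in for the crypto engines. The text does not name the cipher, so under that assumption the sketch below uses an encrypt-then-MAC construction assembled from the Python standard library: a SHA-256 counter-mode keystream for confidentiality and HMAC-SHA256 for integrity. It is for illustration only; real hardware would typically use a standard authenticated cipher such as AES-GCM. `CryptoEngine`, `program_key`, and the placeholder metadata are hypothetical, and the class only models the write-only key property (Python cannot truly prevent read-back). The tenant side reuses the same class for brevity, although in the described system it is the compute zone driver software in the VM.

```python
import hashlib
import hmac
import secrets

NONCE_LEN, TAG_LEN = 16, 32


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream built from SHA-256 (illustration only)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


class CryptoEngine:
    """Hypothetical model of a crypto engine whose session key is write-only."""

    def __init__(self):
        self.__enc_key = None
        self.__mac_key = None

    def program_key(self, session_key: bytes) -> None:
        """Load the key once; there is deliberately no method to read it back."""
        if self.__enc_key is not None:
            raise RuntimeError("key already programmed")
        # Derive separate encryption and MAC keys from the single session key.
        self.__enc_key = hmac.new(session_key, b"enc", hashlib.sha256).digest()
        self.__mac_key = hmac.new(session_key, b"mac", hashlib.sha256).digest()

    def encrypt(self, plaintext: bytes) -> bytes:
        if self.__enc_key is None:
            raise RuntimeError("no key programmed")
        nonce = secrets.token_bytes(NONCE_LEN)
        stream = _keystream(self.__enc_key, nonce, len(plaintext))
        ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
        tag = hmac.new(self.__mac_key, nonce + ciphertext, hashlib.sha256).digest()
        return nonce + ciphertext + tag  # encrypt-then-MAC

    def decrypt(self, blob: bytes) -> bytes:
        if self.__enc_key is None:
            raise RuntimeError("no key programmed")
        if len(blob) < NONCE_LEN + TAG_LEN:
            raise ValueError("ciphertext too short")
        nonce, ciphertext, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
        expected = hmac.new(self.__mac_key, nonce + ciphertext, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("integrity check failed")
        stream = _keystream(self.__enc_key, nonce, len(ciphertext))
        return bytes(a ^ b for a, b in zip(ciphertext, stream))


session_key = secrets.token_bytes(32)  # stands in for the key derived at block 514
tenant, zone0 = CryptoEngine(), CryptoEngine()
tenant.program_key(session_key)
zone0.program_key(session_key)

frame = zone0.decrypt(tenant.encrypt(b"encoded video frame"))  # blocks 522-524
metadata = b'{"label": "car"}'                                  # block 526 (placeholder result)
result = tenant.decrypt(zone0.encrypt(metadata))                # blocks 528-532
```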
In an embodiment, the tenant may then request to release the compute zone (thereby allowing the compute zone to be used by another tenant). In another embodiment, the tenant keeps the allocation of the compute zone for use with another workload as long as the tenant is running on the host computing system. In embodiments, the processing of
One or more inference engines (such as inference engines 0 204) read the one or more decoded frames 636 from the protected region of memory 250 over logical data path 664. In an embodiment, the one or more inference engines apply a machine learning model to the decoded frames and generate region of interest (ROI) metadata 638, which is stored in the protected region of memory 250 over logical data path 666. The one or more inference engines write object (obj) class metadata 640 to the protected region of memory 250 over logical data path 668. In an embodiment, an inference control 620 portion of untrusted OS kernel 322 controls the inferencing operations performed by the one or more inference engines. In an embodiment, inference control 620 is an application 316 that controls and/or directs the processing of inference engine(s) 204 without having access to sensitive tenant data 634, 636, 638, and 640. In one embodiment, the processing performed by the one or more inference engines is video data stream processing. In other embodiments, the processing may be related to voice data processing, voice recognition, two-dimensional or three-dimensional image classification, pattern recognition, detectors, and the like. In various embodiments, the data being processed may be radar data, acoustic data, sensor data, or any other suitable data.
The crypto engine (such as crypto engine 0 208) reads object class metadata 640 from the protected region of memory 250 over logical data path 670 and encrypts the metadata. The crypto engine stores the encrypted metadata 644 in memory 250 over logical data path 672. Accelerator 116 sends encrypted metadata 644 over bus 110 to host computing system 102 over logical data path 674. Decrypt function 614 on the host decrypts the encrypted metadata and forwards the decrypted metadata over logical data path 676 to application 602. Application 602 can then use the decrypted metadata as needed.
Decode plugin 608 controls media engine 202, ensuring that the media engine is able to correctly decode encoded frame 634, without having direct access to encoded frame 634 or decoded frame 636. Object detection function 610 triggers inference engine(s) 204 to detect objects present in decoded frame 636, resulting in ROI Metadata 638, without having direct access to decoded frame 636 or ROI Metadata 638. Object classification function 612 also triggers inference engine(s) 204 to classify objects (car, dog, cat, etc.) present in decoded frame 636, resulting in “Label” ROI metadata 638 (such as “car”, “dog”, “cat”), without having direct access to decoded frame 636 or “Label” ROI metadata 638.
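The separation described above, in which control functions direct the engines without touching tenant data, can be sketched as a handle-based design: control-plane code passes opaque handles to the engines, and only the engines dereference them inside the protected region. The model below is hypothetical: `ProtectedRegion`, `MediaEngine`, `InferenceEngine`, and `control_plane` are illustrative names, and the decode and classification steps are placeholders. In the accelerator this separation is enforced by the IMRs, not by Python scoping.

```python
import itertools


class ProtectedRegion:
    """Hypothetical model of a compute zone's protected memory (e.g., region 260)."""

    def __init__(self):
        self._buffers = {}
        self._ids = itertools.count(1)

    def store(self, data) -> int:
        handle = next(self._ids)
        self._buffers[handle] = data
        return handle  # opaque handle; reveals nothing about the data

    def load(self, handle: int):
        return self._buffers[handle]


class MediaEngine:
    def __init__(self, region):
        self._region = region

    def decode(self, encoded_handle: int) -> int:
        frame = self._region.load(encoded_handle)
        decoded = frame.upper()  # placeholder for real video decoding
        return self._region.store(decoded)


class InferenceEngine:
    def __init__(self, region):
        self._region = region

    def classify(self, frame_handle: int) -> int:
        frame = self._region.load(frame_handle)
        label = "car" if "CAR" in frame else "unknown"  # placeholder model
        return self._region.store({"label": label})


def control_plane(media, inference, encoded_handle: int) -> int:
    """Sequences decode and classification using handles only, never the data itself."""
    decoded_handle = media.decode(encoded_handle)
    return inference.classify(decoded_handle)


region = ProtectedRegion()
media, inference = MediaEngine(region), InferenceEngine(region)
encoded = region.store("frame with a car")  # in practice written by the crypto engine
label_handle = control_plane(media, inference, encoded)
```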
The isolation techniques of embodiments are described above with reference to cloud computing and multi-tenancy scenarios but are also applicable to any distributed processing environments and to a plurality of processing contexts where the contexts trust each other but still need isolation for confidentiality or privacy reasons.
In some embodiments, at least some of host computing system 102 and/or accelerator 116 is hosted by or is part of firmware of graphics processing unit (GPU) 714. In yet other embodiments, at least some of host computing system 102 and/or accelerator 116 is hosted by or is part of firmware of central processing unit (“CPU” or “application processor”) 712.
In yet another embodiment, at least some of host computing system 102 and/or accelerator 116 is hosted as software or firmware logic by operating system (OS) 706. In yet a further embodiment, at least some of host computing system 102 and/or accelerator 116 is partially and simultaneously hosted by multiple components of computing device 700, such as one or more of GPU 714, GPU firmware (not shown in
Throughout the document, the term “user” may be interchangeably referred to as “viewer”, “observer”, “person”, “individual”, “end-user”, and/or the like. It is to be noted that throughout this document, terms like “graphics domain” may be referenced interchangeably with “graphics processing unit”, “graphics processor”, or simply “GPU” and similarly, “CPU domain” or “host domain” may be referenced interchangeably with “computer processing unit”, “application processor”, or simply “CPU”.
Computing device 700 may include any number and type of communication devices, such as large computing systems (e.g., server computers, desktop computers, etc.), and may further include set-top boxes (e.g., Internet-based cable television set-top boxes, etc.), global positioning system (GPS)-based devices, etc. Computing device 700 may include mobile computing devices serving as communication devices, such as cellular phones including smartphones, personal digital assistants (PDAs), tablet computers, laptop computers, e-readers, smart televisions, television platforms, wearable devices (e.g., glasses, watches, bracelets, smartcards, jewelry, clothing items, etc.), media players, etc. For example, in one embodiment, computing device 700 may include a mobile computing device employing a computer platform hosting an integrated circuit (“IC”), such as a system on a chip (“SoC” or “SOC”), integrating various hardware and/or software components of computing device 700 on a single chip.
As illustrated, in one embodiment, computing device 700 may include any number and type of hardware and/or software components, such as (without limitation) GPU 714, a graphics driver (also referred to as “GPU driver”, “graphics driver logic”, “driver logic”, user-mode driver (UMD), UMD, user-mode driver framework (UMDF), UMDF, or simply “driver”) (not shown in
Computing device 700 may include operating system (OS) 706 serving as an interface between hardware and/or physical resources of computing device 700 and a user. It is contemplated that CPU 712 may include one or more processors, such as processor(s) 702 of
It is to be noted that terms like “node”, “computing node”, “server”, “server device”, “cloud computer”, “cloud server”, “cloud server computer”, “machine”, “host machine”, “device”, “computing device”, “computer”, “computing system”, and the like, may be used interchangeably throughout this document. It is to be further noted that terms like “application”, “software application”, “program”, “software program”, “package”, “software package”, and the like, may be used interchangeably throughout this document. Also, terms like “job”, “input”, “request”, “message”, and the like, may be used interchangeably throughout this document.
It is contemplated that some processes of the graphics pipeline as described herein are implemented in software, while the rest are implemented in hardware. A graphics pipeline may be implemented in a graphics coprocessor design, where CPU 712 is designed to work with GPU 714 which may be included in or co-located with CPU 712. In one embodiment, GPU 714 may employ any number and type of conventional software and hardware logic to perform the conventional functions relating to graphics rendering as well as novel software and hardware logic to execute any number and type of instructions.
Memory 708 may include a random-access memory (RAM) comprising application database having object information. A memory controller hub (not shown
Processed data is stored in a buffer in the hardware graphics pipeline, and state information is stored in memory 708. The resulting image is then transferred to I/O sources 704, such as a display component for displaying of the image. It is contemplated that the display device may be of various types, such as Cathode Ray Tube (CRT), Thin Film Transistor (TFT), Liquid Crystal Display (LCD), Organic Light Emitting Diode (OLED) array, etc., to display information to a user.
Memory 708 may comprise a pre-allocated region of a buffer (e.g., frame buffer); however, it should be understood by one of ordinary skill in the art that the embodiments are not so limited, and that any memory accessible to the lower graphics pipeline may be used. Computing device 700 may further include an input/output (I/O) control hub (ICH) (not shown in
CPU 712 may include one or more processors to execute instructions in order to perform whatever software routines the computing system implements. The instructions frequently involve some sort of operation performed upon data. Both data and instructions may be stored in system memory 708 and any associated cache. Cache is typically designed to have shorter latency times than system memory 708; for example, cache might be integrated onto the same silicon chip(s) as the processor(s) and/or constructed with faster static RAM (SRAM) cells whilst the system memory 708 might be constructed with slower dynamic RAM (DRAM) cells. By tending to store more frequently used instructions and data in the cache as opposed to the system memory 708, the overall performance efficiency of computing device 700 improves. It is contemplated that in some embodiments, GPU 714 may exist as part of CPU 712 (such as part of a physical CPU package) in which case, memory 708 may be shared by CPU 712 and GPU 714 or kept separated.
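The performance benefit of caching can be quantified with the standard average memory access time relation, where $t_{\text{hit}}$ is the cache hit time, $m$ is the miss rate, and $t_{\text{miss}}$ is the additional penalty for fetching from system memory 708. The numbers in the worked example are hypothetical and only illustrate the arithmetic.

$$
t_{\text{avg}} = t_{\text{hit}} + m \cdot t_{\text{miss}}
$$

$$
t_{\text{avg}} = 1\,\text{ns} + 0.05 \times 100\,\text{ns} = 6\,\text{ns}
$$

With a 1 ns hit time, a 5% miss rate, and a 100 ns miss penalty, the average access takes about 6 ns, compared with roughly 100 ns if every access had to go to DRAM.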
System memory 708 may be made available to other components within computing device 700. For example, any data (e.g., input graphics data) received from various interfaces to computing device 700 (e.g., keyboard and mouse, printer port, Local Area Network (LAN) port, modem port, etc.) or retrieved from an internal storage element of computing device 700 (e.g., hard disk drive) is often temporarily queued into system memory 708 before being operated upon by the one or more processor(s) in the implementation of a software program. Similarly, data that a software program determines should be sent from computing device 700 to an outside entity through one of the computing system interfaces, or stored into an internal storage element, is often temporarily queued in system memory 708 before it is transmitted or stored.
Further, for example, an ICH may be used for ensuring that such data is properly passed between the system memory 708 and its appropriate corresponding computing system interface (and internal storage device if the computing system is so designed) and may have bi-directional point-to-point links between itself and the observed I/O sources/devices 704. Similarly, an MCH may be used for managing the various contending requests for system memory 708 accesses amongst CPU 712 and GPU 714, interfaces and internal storage elements that may proximately arise in time with respect to one another.
I/O sources 704 may include one or more I/O devices that are implemented for transferring data to and/or from computing device 700 (e.g., a networking adapter) or for large-scale non-volatile storage within computing device 700 (e.g., a hard disk drive). A user input device, including alphanumeric and other keys, may be used to communicate information and command selections to GPU 714. Another type of user input device is cursor control, such as a mouse, a trackball, a touchscreen, a touchpad, or cursor direction keys, to communicate direction information and command selections to GPU 714 and to control cursor movement on the display device. Camera and microphone arrays of computing device 700 may be employed to observe gestures, record audio and video, and receive and transmit visual and audio commands.
Computing device 700 may further include network interface(s) to provide access to a network, such as a LAN, a wide area network (WAN), a metropolitan area network (MAN), a personal area network (PAN), Bluetooth, a cloud network, a mobile network (e.g., 3rd Generation (3G), 4th Generation (4G), etc.), an intranet, the Internet, etc. Network interface(s) may include, for example, a wireless network interface having antenna, which may represent one or more antenna(e). Network interface(s) may also include, for example, a wired network interface to communicate with remote devices via network cable, which may be, for example, an Ethernet cable, a coaxial cable, a fiber optic cable, a serial cable, or a parallel cable.
Network interface(s) may provide access to a LAN, for example, by conforming to IEEE 802.11b and/or IEEE 802.11g standards, and/or the wireless network interface may provide access to a personal area network, for example, by conforming to Bluetooth standards. Other wireless network interfaces and/or protocols, including previous and subsequent versions of the standards, may also be supported. In addition to, or instead of, communication via the wireless LAN standards, network interface(s) may provide wireless communication using, for example, Time Division Multiple Access (TDMA) protocols, Global System for Mobile Communications (GSM) protocols, Code Division Multiple Access (CDMA) protocols, and/or any other type of wireless communications protocols.
Network interface(s) may include one or more communication interfaces, such as a modem, a network interface card, or other well-known interface devices, such as those used for coupling to the Ethernet, token ring, or other types of physical wired or wireless attachments for purposes of providing a communication link to support a LAN or a WAN, for example. In this manner, the computer system may also be coupled to a number of peripheral devices, clients, control surfaces, consoles, or servers via a conventional network infrastructure, including an Intranet or the Internet, for example.
It is to be appreciated that a lesser or more equipped system than the example described above may be preferred for certain implementations. Therefore, the configuration of computing device 700 may vary from implementation to implementation depending upon numerous factors, such as price constraints, performance requirements, technological improvements, or other circumstances. Examples of the electronic device or computer system 700 may include (without limitation) a mobile device, a personal digital assistant, a mobile computing device, a smartphone, a cellular telephone, a handset, a one-way pager, a two-way pager, a messaging device, a computer, a personal computer (PC), a desktop computer, a laptop computer, a notebook computer, a handheld computer, a tablet computer, a server, a server array or server farm, a web server, a network server, an Internet server, a work station, a mini-computer, a main frame computer, a supercomputer, a network appliance, a web appliance, a distributed computing system, multiprocessor systems, processor-based systems, consumer electronics, programmable consumer electronics, television, digital television, set top box, wireless access point, base station, subscriber station, mobile subscriber center, radio network controller, router, hub, gateway, bridge, switch, machine, or combinations thereof.
Embodiments may be implemented as any or a combination of: one or more microchips or integrated circuits interconnected using a parent board, hardwired logic, software stored by a memory device and executed by a microprocessor, firmware, an application specific integrated circuit (ASIC), and/or a field programmable gate array (FPGA). The term “logic” may include, by way of example, software or hardware and/or combinations of software and hardware.
Embodiments may be provided, for example, as a computer program product which may include one or more tangible non-transitory machine-readable media having stored thereon machine-executable instructions that, when executed by one or more machines such as a computer, network of computers, or other electronic devices, may result in the one or more machines carrying out operations in accordance with embodiments described herein. A tangible non-transitory machine-readable medium may include, but is not limited to, floppy diskettes, optical disks, CD-ROMs (Compact Disc-Read Only Memories), and magneto-optical disks, ROMs, RAMs, EPROMs (Erasable Programmable Read Only Memories), EEPROMs (Electrically Erasable Programmable Read Only Memories), magnetic or optical cards, flash memory, or other type of media/machine-readable medium suitable for storing machine-executable instructions.
Moreover, embodiments may be downloaded as a computer program product, wherein the program may be transferred from a remote computer (e.g., a server) to a requesting computer (e.g., a client) by way of one or more data signals embodied in and/or modulated by a carrier wave or other propagation medium via a communication link (e.g., a modem and/or network connection).
During operation, media engines 802, crypto engines 804, and inference engines 806 can work in concert to accelerate computer vision operations or other video data stream processing. Media engines 802 enable low-latency decoding of multiple high-resolution (e.g., 4K, 8K) video streams. The decoded video streams can be written to a buffer in the on-chip memory 805. The media engines can then parse the decoded video and perform preliminary processing operations on the frames of the decoded video in preparation for processing the frames using a trained image recognition model (e.g., in inference engines 806). For example, inference engines 806 can accelerate convolution operations for a convolutional neural network (CNN) that is used to perform image recognition on the high-resolution video data, while back-end model computations are performed by processor subsystem 808.
Processor subsystem 808 can include control logic to assist with sequencing and synchronization of data transfers and shared memory operations performed by media engines 802, crypto engines 804, and inference engines 806. Processor subsystem 808 can also function as an application processor to execute software applications that make use of the inferencing compute capabilities of the inference engines 806.
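The decode, preprocess, and inference flow described above can be sketched in Python. This is a minimal, hypothetical model: the function names, the `Frame` type, the use of a `queue.Queue` as a stand-in for the buffer in on-chip memory 805, and the threshold "model" are assumptions made for illustration and do not reflect any actual engine interface.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass
class Frame:
    """Hypothetical decoded frame written by a media engine to the shared buffer."""
    stream_id: int
    index: int
    pixels: list  # stand-in for decoded image data, normalized to [0, 1]


def media_decode(stream_id, encoded_frames, buffer):
    """Media engine stand-in: 'decodes' each frame and writes it to the buffer."""
    for i, payload in enumerate(encoded_frames):
        buffer.put(Frame(stream_id, i, [b / 255.0 for b in payload]))


def media_preprocess(frame):
    """Preliminary processing before inference (here: mean subtraction)."""
    mean = sum(frame.pixels) / len(frame.pixels)
    return [p - mean for p in frame.pixels]


def inference(features):
    """Inference engine stand-in: a trivial threshold 'model' returning a label."""
    return "object" if max(features) > 0.25 else "background"


def run_pipeline(streams):
    """Processor-subsystem stand-in: sequences decode, preprocessing, and inference."""
    buffer = Queue()  # stand-in for the buffer in on-chip memory
    for stream_id, frames in streams.items():
        media_decode(stream_id, frames, buffer)
    results = []
    while not buffer.empty():
        frame = buffer.get()
        label = inference(media_preprocess(frame))
        results.append((frame.stream_id, frame.index, label))
    return results
```

For example, `run_pipeline({0: [bytes([0, 0, 255, 255])], 1: [bytes([10, 10, 10, 10])]})` labels the high-contrast frame of stream 0 as `"object"` and the uniform frame of stream 1 as `"background"`. A real design would run the stages concurrently on separate engines; the sequential loop here only shows the ordering that the control logic enforces.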
Flowcharts representative of example hardware logic, machine readable instructions, hardware implemented state machines, and/or any combination thereof for implementing computing device 700, for example, are shown in
The machine-readable instructions described herein may be stored in one or more of a compressed format, an encrypted format, a fragmented format, a compiled format, an executable format, a packaged format, etc. Machine-readable instructions as described herein may be stored as data (e.g., portions of instructions, code, representations of code, etc.) that may be utilized to create, manufacture, and/or produce machine executable instructions. For example, the machine-readable instructions may be fragmented and stored on one or more storage devices and/or computing devices (e.g., servers). The machine-readable instructions may require one or more of installation, modification, adaptation, updating, combining, supplementing, configuring, decryption, decompression, unpacking, distribution, reassignment, compilation, etc. in order to make them directly readable, interpretable, and/or executable by a computing device and/or another machine. For example, the machine-readable instructions may be stored in multiple parts, which are individually compressed, encrypted, and stored on separate computing devices, wherein the parts when decrypted, decompressed, and combined form a set of executable instructions that implement a program such as that described herein.
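As a minimal sketch of the fragmented, compressed storage described above, the following Python splits a program image into parts, compresses each part independently with `zlib`, and reassembles them with a SHA-256 integrity check. Encryption of the parts is omitted because the Python standard library provides no block cipher; the function names and the digest check are illustrative assumptions.

```python
import hashlib
import zlib


def split_and_compress(program: bytes, parts: int) -> list:
    """Split a program image into `parts` fragments and compress each one.

    Each compressed fragment could be stored on a separate device.
    """
    size = -(-len(program) // parts)  # ceiling division
    fragments = [program[i:i + size] for i in range(0, len(program), size)]
    return [zlib.compress(f) for f in fragments]


def reassemble(compressed_parts: list, expected_sha256: str) -> bytes:
    """Decompress and concatenate fragments in order, then verify the result."""
    program = b"".join(zlib.decompress(p) for p in compressed_parts)
    if hashlib.sha256(program).hexdigest() != expected_sha256:
        raise ValueError("reassembled program does not match expected digest")
    return program
```

Usage: compute `hashlib.sha256(program).hexdigest()` before splitting, distribute the list returned by `split_and_compress`, and pass the collected parts (in their original order) together with that digest to `reassemble`.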
In another example, the machine-readable instructions may be stored in a state in which they may be read by a computer, but require addition of a library (e.g., a dynamic link library (DLL)), a software development kit (SDK), an application programming interface (API), etc. in order to execute the instructions on a particular computing device or other device. In another example, the machine-readable instructions may be configured (e.g., settings stored, data input, network addresses recorded, etc.) before the machine-readable instructions and/or the corresponding program(s) can be executed in whole or in part. Thus, the disclosed machine-readable instructions and/or corresponding program(s) are intended to encompass such machine-readable instructions and/or program(s) regardless of the particular format or state of the machine-readable instructions and/or program(s) when stored or otherwise at rest or in transit.
The machine-readable instructions described herein can be represented by any past, present, or future instruction language, scripting language, programming language, etc. For example, the machine-readable instructions may be represented using any of the following languages: C, C++, Java, C#, Perl, Python, JavaScript, HyperText Markup Language (HTML), Structured Query Language (SQL), Swift, etc.
As mentioned above, the example process of
“Including” and “comprising” (and all forms and tenses thereof) are used herein as open-ended terms. Thus, whenever a claim employs any form of “include” or “comprise” (e.g., comprises, includes, comprising, including, having, etc.) as a preamble or within a claim recitation of any kind, it is to be understood that additional elements, terms, etc. may be present without falling outside the scope of the corresponding claim or recitation. As used herein, when the phrase “at least” is used as the transition term in, for example, a preamble of a claim, it is open-ended in the same manner as the terms “comprising” and “including” are open-ended.
The term “and/or” when used, for example, in a form such as A, B, and/or C refers to any combination or subset of A, B, C such as (1) A alone, (2) B alone, (3) C alone, (4) A with B, (5) A with C, (6) B with C, and (7) A with B and with C. As used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing structures, components, items, objects and/or things, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. As used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A and B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B. Similarly, as used herein in the context of describing the performance or execution of processes, instructions, actions, activities and/or steps, the phrase “at least one of A or B” is intended to refer to implementations including any of (1) at least one A, (2) at least one B, and (3) at least one A and at least one B.
As used herein, singular references (e.g., “a”, “an”, “first”, “second”, etc.) do not exclude a plurality. The term “a” or “an” entity, as used herein, refers to one or more of that entity. The terms “a” (or “an”), “one or more”, and “at least one” can be used interchangeably herein. Furthermore, although individually listed, a plurality of means, elements or method actions may be implemented by, e.g., a single unit or processor. Additionally, although individual features may be included in different examples or claims, these may possibly be combined, and the inclusion in different examples or claims does not imply that a combination of features is not feasible and/or advantageous.
Descriptors “first,” “second,” “third,” etc. are used herein when identifying multiple elements or components which may be referred to separately. Unless otherwise specified or understood based on their context of use, such descriptors are not intended to impute any meaning of priority, physical order or arrangement in a list, or ordering in time but are merely used as labels for referring to multiple elements or components separately for ease of understanding the disclosed examples. In some examples, the descriptor “first” may be used to refer to an element in the detailed description, while the same element may be referred to in a claim with a different descriptor such as “second” or “third.” In such instances, it should be understood that such descriptors are used merely for ease of referencing multiple elements or components.
The following examples pertain to further embodiments.
Example 1 is an accelerator. The accelerator of Example 1 includes a memory; a first compute zone to receive an encrypted workload downloaded from a tenant application running in a virtual machine on a host computing system attached to the accelerator; and a processor subsystem to execute a cryptographic key exchange protocol with the tenant application to derive a session key for the first compute zone and to program the session key into the first compute zone. The first compute zone is to decrypt the encrypted workload using the session key, receive an encrypted data stream from the tenant application, decrypt the encrypted data stream using the session key, and process the decrypted data stream by executing the workload to produce metadata.
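The description does not specify which key exchange protocol or cipher is used. The following Python is therefore only a sketch under stated assumptions: it uses unauthenticated finite-field Diffie-Hellman with toy parameters, derives a per-zone session key with HKDF (RFC 5869) built from `hmac` and `hashlib`, and substitutes a toy HMAC-SHA256 counter-mode keystream for a real authenticated cipher such as AES-GCM. None of this is secure as written; a real protocol would use vetted parameters and would authenticate the exchange to prevent man-in-the-middle attacks. All function names and the `compute-zone-<id>` context label are hypothetical.

```python
import hashlib
import hmac
import secrets

# TOY parameters (Mersenne prime 2**127 - 1, generator 3), chosen only for brevity.
# They are NOT secure; a real design would use a vetted group or curve.
P = 2**127 - 1
G = 3


def keypair():
    """Generate a Diffie-Hellman private/public key pair."""
    private = secrets.randbelow(P - 2) + 1
    return private, pow(G, private, P)


def hkdf_sha256(ikm: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF extract-and-expand (RFC 5869) using HMAC-SHA256."""
    prk = hmac.new(salt, ikm, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def derive_session_key(own_private: int, peer_public: int, zone_id: int) -> bytes:
    """Both sides compute the same shared secret, then bind it to one compute zone."""
    shared = pow(peer_public, own_private, P).to_bytes(16, "big")
    return hkdf_sha256(shared, salt=b"", info=b"compute-zone-%d" % zone_id)


def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """TOY cipher: XOR with an HMAC-SHA256 counter-mode keystream.

    Encryption and decryption are the same operation. The nonce must be unique
    per message under a given key (e.g., distinct for workload and data stream).
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hmac.new(key, nonce + offset.to_bytes(8, "big"), hashlib.sha256).digest()
        out += bytes(a ^ b for a, b in zip(data[offset:offset + 32], block))
    return bytes(out)
```

In this sketch the tenant application and the processor subsystem each call `keypair()`, exchange public values, and call `derive_session_key` with their own private key and the peer's public value; the processor subsystem would then program the resulting key into the first compute zone, which decrypts the workload and the data stream with `keystream_xor` using distinct nonces.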
In Example 2, the subject matter of Example 1 can optionally include wherein the tenant application communicates with the first compute zone over a physical function of a bus coupling the host computing system and the accelerator.
In Example 3, the subject matter of Example 1 can optionally include wherein the accelerator comprises a plurality of compute zones and the first compute zone is isolated from other compute zones in the accelerator.
In Example 4, the subject matter of Example 1 can optionally include wherein the accelerator comprises a plurality of compute zones and data stored in a protected region of the memory assigned to the first compute zone is isolated from access by other compute zones in the accelerator.
In Example 5, the subject matter of Example 4 can optionally include wherein the first compute zone stores the decrypted data stream and the metadata in the protected region of the memory assigned to the first compute zone.
In Example 6, the subject matter of Example 4 can optionally include wherein the protected region of the memory is assigned to the first compute zone by setting one or more isolated memory region (IMR) registers in the processor subsystem.
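The isolation described in Examples 3 through 6 can be modeled as a range check performed on each memory access. The sketch below is hypothetical: `IsolatedMemoryRegion`, `MemoryController`, the half-open `[base, limit)` convention, and the policy of allowing access to addresses outside every IMR are assumptions for illustration, not the register layout of any real processor subsystem.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IsolatedMemoryRegion:
    """Hypothetical model of one IMR register set: [base, limit) owned by one zone."""
    base: int
    limit: int
    owner_zone: int


class MemoryController:
    """Checks every access against the programmed IMRs."""

    def __init__(self):
        self.imrs = []

    def program_imr(self, base: int, limit: int, owner_zone: int) -> None:
        # Overlapping regions would let two compute zones share protected memory.
        for r in self.imrs:
            if base < r.limit and r.base < limit:
                raise ValueError("IMR overlaps an existing region")
        self.imrs.append(IsolatedMemoryRegion(base, limit, owner_zone))

    def check_access(self, zone: int, address: int) -> bool:
        for r in self.imrs:
            if r.base <= address < r.limit:
                # Only the owning compute zone may touch a protected region.
                return zone == r.owner_zone
        return True  # assumption: memory outside every IMR is unprotected
```

For example, after `program_imr(0x1000, 0x2000, 1)`, an access by zone 1 to address `0x1800` is allowed, the same access by zone 2 is denied, and an attempt to program an overlapping region for zone 2 is rejected.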
In Example 7, the subject matter of Example 1 can optionally include wherein the first compute zone encrypts the metadata using the session key and sends the encrypted metadata to the tenant application.
In Example 8, the subject matter of Example 1 can optionally include wherein the processor subsystem operates in a trusted execution environment.
In Example 9, the subject matter of Example 1 can optionally include wherein the first compute zone comprises one or more cryptographic engines to perform cryptographic operations on the encrypted workload and the encrypted data stream; one or more media engines to perform media operations on the decrypted data stream, and one or more inference engines to execute the decrypted workload to process the decrypted data stream.
In Example 10, the subject matter of Example 9 can optionally include wherein the one or more inference engines comprise one or more machine learning models.
In Example 11, the subject matter of Example 1 can optionally include an accelerator embodying the memory, the first compute zone, and the processor subsystem as a system on a chip (SoC) attached to the host computing system over one or more physical functions of a bus.
In Example 12, the subject matter of Example 11 can optionally include wherein the host computing system comprises a resource manager to detect one or more compute zones in the accelerator, assign at least one physical function to each of the one or more detected compute zones, receive a request to assign the first compute zone to the tenant application, assign the first compute zone to the virtual machine of the tenant application, start the virtual machine, and start the tenant application in the virtual machine.
In Example 13, the subject matter of Example 12 can optionally include wherein the virtual machine comprises a compute zone driver to detect the physical function coupled to the first compute zone and to cause the accelerator to initialize the first compute zone.
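The host-side flow of Examples 12 and 13 can be sketched as follows. All class names, the `PF<n>` naming of physical functions, the one-physical-function-per-zone mapping, and the check that refuses to assign a zone that is already owned are hypothetical choices made for illustration; a real resource manager would pass physical functions through to virtual machines via its hypervisor.

```python
from dataclasses import dataclass, field


@dataclass
class VirtualMachine:
    name: str
    assigned_pfs: list = field(default_factory=list)
    running: bool = False
    apps: list = field(default_factory=list)


class ComputeZoneDriver:
    """Hypothetical in-VM driver (Example 13): finds its PFs and initializes the zones."""

    def __init__(self, vm: VirtualMachine):
        self.vm = vm

    def probe(self) -> list:
        return [f"{pf}:initialized" for pf in self.vm.assigned_pfs]


class ResourceManager:
    """Hypothetical host resource manager following the steps of Example 12."""

    def __init__(self, detected_zones):
        # Detect compute zones and assign a physical function to each of them.
        self.zone_to_pf = {zone: f"PF{i}" for i, zone in enumerate(detected_zones)}
        self.zone_owner = {}

    def assign_zone(self, zone, vm: VirtualMachine, tenant_app: str) -> list:
        # Handle a request to assign a zone; refuse zones already owned by a VM.
        if zone not in self.zone_to_pf:
            raise KeyError(f"unknown compute zone {zone}")
        if zone in self.zone_owner:
            raise RuntimeError(f"zone {zone} already assigned to {self.zone_owner[zone]}")
        # Assign the zone (via its physical function) to the tenant's VM.
        self.zone_owner[zone] = vm.name
        vm.assigned_pfs.append(self.zone_to_pf[zone])
        # Start the VM, then start the tenant application inside it.
        vm.running = True
        vm.apps.append(tenant_app)
        # The in-VM compute zone driver then detects the PF and initializes the zone.
        return ComputeZoneDriver(vm).probe()
```

For example, a resource manager that detects `["zone0", "zone1"]` maps them to `PF0` and `PF1`; assigning `"zone1"` to a tenant VM returns `["PF1:initialized"]`, and a second request for `"zone1"` from another VM is refused.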
Example 14 is a method. The method includes receiving, by a first compute zone of an accelerator, an encrypted workload downloaded from a tenant application running in a virtual machine on a host computing system attached to the accelerator; executing, by a processor subsystem of the accelerator, a cryptographic key exchange protocol with the tenant application to derive a session key for the first compute zone and to program the session key into the first compute zone; decrypting, by the first compute zone, the encrypted workload using the session key; receiving, by the first compute zone, an encrypted data stream from the tenant application; decrypting, by the first compute zone, the encrypted data stream using the session key; and processing, by the first compute zone, the decrypted data stream by executing the workload to produce metadata.
In Example 15, the subject matter of Example 14 can optionally include wherein the accelerator comprises a plurality of compute zones, and further comprising isolating, by the accelerator, data stored in a protected region of a memory assigned to the first compute zone from access by other compute zones in the accelerator.
In Example 16, the subject matter of Example 14 can optionally include storing, by the first compute zone, the decrypted data stream and the metadata in a protected region of a memory assigned to the first compute zone.
In Example 17, the subject matter of Example 14 can optionally include wherein the first compute zone encrypts the metadata using the session key and sends the encrypted metadata to the tenant application.
Example 18 is at least one non-transitory machine-readable storage medium comprising instructions that, when executed, cause at least one processor to perform receiving, by a first compute zone of an accelerator, an encrypted workload downloaded from a tenant application running in a virtual machine on a host computing system attached to the accelerator; executing, by a processor subsystem of the accelerator, a cryptographic key exchange protocol with the tenant application to derive a session key for the first compute zone and to program the session key into the first compute zone; decrypting, by the first compute zone, the encrypted workload using the session key; receiving, by the first compute zone, an encrypted data stream from the tenant application; decrypting, by the first compute zone, the encrypted data stream using the session key; and processing, by the first compute zone, the decrypted data stream by executing the workload to produce metadata.
In Example 19, the subject matter of Example 18 can optionally include wherein the accelerator comprises a plurality of compute zones and wherein the instructions further include instructions for isolating, by the accelerator, data stored in a protected region of a memory assigned to the first compute zone from access by other compute zones in the accelerator.
In Example 20, the subject matter of Example 19 can optionally include wherein the instructions further include instructions for storing, by the first compute zone, the decrypted data stream and the metadata in a protected region of a memory assigned to the first compute zone.
The foregoing description and drawings are to be regarded in an illustrative rather than a restrictive sense. Persons skilled in the art will understand that various modifications and changes may be made to the embodiments described herein without departing from the broader spirit and scope of the features set forth in the appended claims.
This application is a continuation of co-pending International Patent Application No. PCT/CN2021/082931 filed Mar. 25, 2021, the full disclosure of which is incorporated herein by reference.
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/CN2021/082931 | Mar 2021 | US |
| Child | 17569488 | | US |