This application claims priority to Korean Patent Application No. 10-2023-0058605 (filed on May 4, 2023), which is hereby incorporated by reference in its entirety.
The present patent application has been filed as a research project as described below.
The present disclosure relates to an artificial intelligence (AI) neural network framework technology, and more particularly, to an AI device based on a trust environment, capable of safely accelerating the execution of an artificial neural network in a trust environment.
A deep neural network (DNN) is an artificial neural network including several hidden layers between an input layer and an output layer. DNNs have been widely used in mobile and embedded applications. In particular, DNNs are useful for applications that perform biometric authentication using users' biological characteristics (e.g., fingerprint, iris, face, etc.) to verify the users' identities.
Since DNN execution handles a large amount of sensitive user data, mobile and embedded devices should implement a secure DNN execution environment that may safely protect user and DNN data from security attacks.
Previously, it was proposed to run DNNs in a trusted execution environment via TrustZone, a hardware-based security technology available in advanced RISC machine (ARM) processors. TrustZone protects important information by placing an independent secure zone in the processor. However, running a DNN in TrustZone alone does not fully protect data, because TrustZone has limited memory protection. TrustZone allows DNN execution to be isolated from other processes by partitioning hardware and software resources into a secure zone and a normal zone, and access to data in the secure zone from the normal zone is prevented by hardware. However, because TrustZone keeps data unencrypted in volatile memory, physical security attacks, such as cold boot attacks, may obtain sensitive user and DNN data despite running the DNN in TrustZone.
To protect data from physical attacks, data may be selectively encrypted and decrypted only in secure on-chip memory. This approach not only isolates DNN execution from other processes by using TrustZone, but also protects user and DNN data from physical attacks through encryption. However, keeping the data in memory encrypted significantly increases the number of slow memory accesses and, due to the high data encryption/decryption overhead imposed on the processor, significantly increases the DNN execution time.
Therefore, there is a need for a new secure DNN framework that not only protects sensitive user and DNN data from physical attacks, but also reduces the DNN execution time by overcoming slow memory access and high data encryption/decryption overhead.
An embodiment of the present disclosure is to provide an artificial intelligence (AI) device based on a trust environment capable of safely accelerating the execution of an artificial neural network in a trust environment.
An embodiment of the present disclosure is to provide an AI device based on a trust environment capable of strengthening security against physical attacks by encrypting data in a trust space and reducing the number of memory accesses by performing direct convolution-based neural network computation.
An embodiment of the present disclosure is to provide an AI device based on a trust environment capable of fully utilizing the limited processor resources for neural network execution by offloading data encryption and decryption to cryptographic hardware, and of shortening the artificial neural network execution time by overlapping the neural network computation with data encryption and decryption through intra-layer pipelining.
According to embodiments of the present disclosure, an artificial intelligence (AI) device based on a trust environment includes: a first type memory configured to transmit encrypted input data and receive encrypted output data; and a trust AI processing unit configured to operate in a trust space and perform AI computation on the encrypted input and output data, wherein the trust AI processing unit includes: a cryptographic processing front-end processor configured to generate decrypted input data through decryption of the encrypted input data and perform encryption of non-encrypted output data to generate the encrypted output data; a second type memory configured to provide a buffer for the decrypted input data and the non-encrypted output data; and a processor configured to perform a neural network computation based on the decrypted input data to generate the non-encrypted output data.
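Purely as an illustration of how these components relate, the C sketch below mirrors the recited architecture; every type and field name is hypothetical and is not part of the claimed device.

```c
/* Illustrative sketch only: hypothetical types mirroring the components
 * recited above (first type memory, cryptographic front-end, second type
 * memory, processor). Names are assumptions, not an actual implementation. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t *base;      /* e.g., DRAM region holding encrypted data          */
    size_t   capacity;  /* large, relatively slow                            */
} first_type_memory_t;

typedef struct {
    uint8_t *base;      /* e.g., secure on-chip SRAM inside the trust space  */
    size_t   capacity;  /* small, relatively fast                            */
} second_type_memory_t;

typedef struct {
    /* decrypts encrypted input data into the second type memory and
     * encrypts non-encrypted output data back to the first type memory */
    int (*decrypt)(const uint8_t *src_dram, uint8_t *dst_sram, size_t len);
    int (*encrypt)(const uint8_t *src_sram, uint8_t *dst_dram, size_t len);
} crypto_frontend_t;

typedef struct {
    first_type_memory_t  *dram;    /* encrypted input/output data            */
    second_type_memory_t *sram;    /* buffers for decrypted/plain data       */
    crypto_frontend_t    *crypto;  /* cryptographic processing front-end     */
    /* the processor runs the neural network computation on decrypted inputs */
    int (*run_layer)(const uint8_t *in_sram, uint8_t *out_sram);
} trust_ai_unit_t;
```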
The cryptographic processing front-end processor may be configured to receive an encrypted input activation and an encrypted filter, as the encrypted input data, from the first type memory and store a decrypted input activation and a decrypted filter in the second type memory.
The cryptographic processing front-end processor may be configured to receive an on-demand request from the processor in the course of the AI computation and access the first type memory to import the encrypted input data.
The second type memory may have a relatively faster operating speed and a smaller storage capacity than the first type memory.
The processor may be configured to perform a direct convolution-based neural network computation to reduce the number of accesses to the first type memory.
The processor may be configured to store a decrypted input activation in the second type memory in a regular manner and store a decrypted filter and a non-encrypted output activation in a circular queue manner.
The processor may be configured to perform data transmission and reception with the first and second type memories through interrupt-driven offloading of the cryptographic processing front-end processor.
The processor may be configured to perform data transmission and reception with the cryptographic processing front-end processor and the first and second type memories through direct memory access (DMA)-driven offloading of a DMA controller.
The processor may be configured to implement intra-layer pipelining by performing the neural network computation to overlap with the encryption and decryption operations performed by the cryptographic processing front-end processor.
The processor may be configured to perform the neural network computation seamlessly by allowing the cryptographic processing front-end processor to perform a decryption operation of the encrypted input data in the middle of performing the neural network computation.
The processor may be configured to perform the neural network computation seamlessly by subdividing the computation into a data decryption operation, a calculation operation, and a data encryption operation for the intra-layer pipelining.
According to embodiments of the present disclosure, an artificial intelligence (AI) device based on a trust environment includes: a first type memory configured to transmit encrypted input data; and a trust AI processing unit configured to operate in a trust space and perform an AI computation on the encrypted input data, wherein the trust AI processing unit includes: a cryptographic processing front-end processor configured to generate decrypted input data through decryption of the encrypted input data; a second type memory configured to provide a buffer for the decrypted input data and non-encrypted output data; and a processor configured to perform a neural network computation based on the decrypted input data to generate the non-encrypted output data.
The processor may be configured to reduce the number of accesses to the first type memory by performing a direct convolution-based neural network computation.
The processor may be configured to store a decrypted input activation in the second type memory in a regular manner and store a decrypted filter and a non-encrypted output activation in a circular queue manner.
The processor may be configured to implement intra-layer pipelining by performing the neural network computation to overlap with the encryption and decryption operations performed by the cryptographic processing front-end processor.
The disclosed technology may have the following effects. However, this does not mean that a specific embodiment must include all of the following effects or only the following effects, and the scope of the disclosed technology should not be understood as being limited thereby.
The artificial intelligence (AI) device based on a trust environment according to the present disclosure may safely accelerate the execution of an artificial neural network in a trust environment.
The AI device based on a trust environment according to the present disclosure may strengthen security against physical attacks by encrypting data in a trust space and reduce the number of memory accesses by performing direct convolution-based neural network computations.
The AI device based on a trust environment according to the present disclosure may fully utilize the limited processor resources for neural network execution by offloading data encryption and decryption to cryptographic hardware, and may shorten the artificial neural network execution time by overlapping the neural network computation with data encryption and decryption through intra-layer pipelining.
Description of the present disclosure is merely an embodiment for structural or functional explanation, so the scope of the present disclosure should not be construed as being limited to the embodiments described herein. That is, since the embodiments may be implemented in several forms without departing from the characteristics thereof, it should also be understood that the above-described embodiments are not limited by any of the details of the foregoing description, unless otherwise specified, but rather should be construed broadly within the scope defined in the appended claims. Therefore, various changes and modifications that fall within the scope of the claims, or equivalents of such scope, are intended to be embraced by the appended claims.
Terms described in the present disclosure may be understood as follows.
While terms, such as “first” and “second,” etc., may be used to describe various components, such components are not to be understood as being limited by the above terms. For example, a first component may be named a second component and, similarly, the second component may also be named the first component.
It will be understood that when an element is referred to as being “connected to” another element, it may be directly connected to the other element or intervening elements may also be present. In contrast, when an element is referred to as being “directly connected to” another element, no intervening elements are present. In addition, unless explicitly described to the contrary, the word “comprise” and variations, such as “comprises” or “comprising,” will be understood to imply the inclusion of stated elements but not the exclusion of any other elements. Meanwhile, other expressions describing relationships between components, such as “˜ between”, “immediately ˜ between” or “adjacent to ˜” and “directly adjacent to ˜” may be construed similarly.
Singular forms “a”, “an” and “the” in the present disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that terms, such as “including” or “having,” etc., are intended to indicate the existence of the features, numbers, operations, actions, components, parts, or combinations thereof disclosed in the specification, and are not intended to preclude the possibility that one or more other features, numbers, operations, actions, components, parts, or combinations thereof may exist or may be added.
Identification letters (e.g., a, b, c, etc.) in respective operations are used for the sake of explanation and do not describe order of respective operations. The respective operations may be changed from a mentioned order unless specifically mentioned in context. Namely, respective operations may be performed in the same order as described, may be substantially simultaneously performed, or may be performed in reverse order.
Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meanings as those generally understood by those with ordinary knowledge in the field of art to which the present disclosure belongs. Such terms as those defined in a generally used dictionary are to be interpreted to have the meanings equal to the contextual meanings in the relevant field of art, and are not to be interpreted to have ideal or excessively formal meanings unless clearly defined in the present application.
Referring to
The first type memory 110 may transmit encrypted input data and receive encrypted output data. Here, the first type memory 110 may include dynamic random access memory (DRAM), but is not necessarily limited thereto.
The trust AI processing unit 130 may operate in a trust space and may perform AI computation of encrypted input and output data. To this end, the trust AI processing unit 130 may include a cryptographic processing front-end processor 131, a second type memory 133, and a processor 135.
The cryptographic processing front-end processor 131, as cryptographic hardware, may generate decrypted input data through decryption of encrypted input data and perform encryption of non-encrypted output data to generate encrypted output data. The cryptographic processing front-end processor 131 may receive an encrypted input activation and an encrypted filter as the encrypted input data from the first type memory 110 and store a decrypted input activation and a decrypted filter in the second type memory 133. The cryptographic processing front-end processor 131 may receive an on-demand request from the processor 135 in the course of the AI computation and access the first type memory 110 to import the encrypted input data.
The second type memory 133 may provide a buffer for the decrypted input data and the non-encrypted output data. Here, the second type memory 133 may have a faster operating speed and a smaller storage capacity than the first type memory 110. For example, when the first type memory 110 includes DRAM, the second type memory 133 may include static random access memory (SRAM). When data is transmitted and received between the first and second type memories 110 and 133, the cryptographic processing front-end processor 131 may encrypt and decrypt the data to ensure correctness and security.
The processor 135 may generate the non-encrypted output data by performing a neural network computation based on the decrypted input data. The processor 135 may perform a direct convolution-based neural network computation to reduce the number of accesses to the first type memory 110. That is, the processor 135 may minimize the working set size of the neural network computation by using direct convolution; owing to the reduced working set size, accesses to the first type memory 110, which is relatively slower than the second type memory 133, and calls for data encryption and decryption may be reduced while the neural network computation is performed.
The processor 135 may store decrypted input activations in the second type memory 133 in a regular manner and store a decrypted filter and a non-encrypted output activation in a circular queue manner. The processor 135 may efficiently manage the second type memory 133 based on the access patterns of the input activations, filters, and output activations of a convolution layer when performing a neural network computation. The processor 135 may store the input activations in a regular manner because doing so tends to increase data reuse; may store the filters in a circular queue manner because the filters are much smaller than the input activations and, although the same filter is used multiple times when generating an output channel, each filter is required to generate only one output channel; and may store the output activations in a circular queue manner because an output activation is write-once data with high spatial locality.
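As an illustration only, the following C sketch shows one way such a layout might be managed; the structure names, the wrap-around policy, and the split of SRAM into a fixed input-activation region plus two circular queues are assumptions made for the example, not the claimed implementation.

```c
/* Minimal sketch (assumption, not the actual implementation): managing the
 * second type memory (SRAM). Input activations get a fixed, regularly
 * laid-out region to maximize reuse; filters and output activations live in
 * small circular queues because each filter is needed for only one output
 * channel and each output activation is write-once data. */
#include <stdint.h>
#include <stddef.h>

typedef struct {
    uint8_t *buf;     /* backing storage carved out of SRAM */
    size_t   size;    /* total bytes in the circular region */
    size_t   head;    /* next byte to overwrite             */
} circ_queue_t;

/* Reserve 'len' bytes in the circular queue, wrapping when the end is hit;
 * older entries (already-consumed filters / already-encrypted outputs) are
 * simply overwritten. */
static uint8_t *circ_alloc(circ_queue_t *q, size_t len)
{
    if (len > q->size)
        return NULL;                /* entry does not fit at all      */
    if (q->head + len > q->size)
        q->head = 0;                /* wrap around to the beginning   */
    uint8_t *slot = q->buf + q->head;
    q->head += len;
    return slot;
}

typedef struct {
    uint8_t     *input_region; /* regular region: decrypted input activations */
    size_t       input_bytes;
    circ_queue_t filters;      /* circular queue: decrypted filters           */
    circ_queue_t outputs;      /* circular queue: non-encrypted outputs       */
} sram_layout_t;
```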
The processor 135 may transmit and receive data to and from the first and second type memories 110 and 133 through interrupt-driven offloading of the cryptographic processing front-end processor 131. In addition, the processor 135 may transmit and receive data to and from the cryptographic processing front-end processor 131 and the first and second type memories 110 and 133 through direct memory access (DMA)-driven offloading of a DMA controller. The processor 135 may select between interrupt-driven offloading and DMA-driven offloading: interrupt-driven offloading is suitable for latency-critical operations, while DMA-driven offloading is suitable for processing larger amounts of data at once. Both offloading mechanisms may accelerate the neural network computation.
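A minimal sketch of how the choice between the two offloading mechanisms might look, assuming hypothetical driver entry points; the threshold value and all function names are illustrative, and memcpy merely stands in for the hardware so the sketch is self-contained.

```c
/* Illustrative only: choosing between interrupt-driven and DMA-driven
 * offloading. The threshold and function names are assumptions. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DMA_THRESHOLD_BYTES 4096   /* hypothetical cut-over point */

/* Stand-in back-ends: real hardware drivers would go here; memcpy is used
 * only so the sketch compiles and runs. */
static void crypto_decrypt_irq(const uint8_t *s, uint8_t *d, size_t len)
{ memcpy(d, s, len); /* small transfer, completion signaled by interrupt */ }

static void crypto_decrypt_dma(const uint8_t *s, uint8_t *d, size_t len)
{ memcpy(d, s, len); /* large transfer described to the DMA controller   */ }

static void offload_decrypt(const uint8_t *dram_src, uint8_t *sram_dst,
                            size_t len, bool latency_critical)
{
    if (latency_critical || len < DMA_THRESHOLD_BYTES)
        crypto_decrypt_irq(dram_src, sram_dst, len);   /* interrupt-driven */
    else
        crypto_decrypt_dma(dram_src, sram_dst, len);   /* DMA-driven       */
}
```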
The processor 135 may perform the neural network computation so as to overlap with the encryption and decryption operations performed by the cryptographic processing front-end processor 131, thereby implementing intra-layer pipelining. The processor 135 allows the cryptographic processing front-end processor 131 to perform a decryption operation on the encrypted input data in the middle of the neural network computation, so that the neural network computation may be performed seamlessly. For intra-layer pipelining, the processor 135 may subdivide the computation into a data decryption step, a calculation step, and a data encryption step, and may parallelize these steps through intra-layer pipelining.
In
The AI device 100 performs a neural network computation based on the decrypted input data to generate non-encrypted output data (step S230). Here, the AI device 100 may perform a direct convolution-based neural network computation through the processor 135, thereby reducing the number of times the cryptographic processing front-end processor 131 accesses the first type memory 110 to import the encrypted input data. The AI device 100 may store the non-encrypted output data in the second type memory 133.
The AI device 100 performs encryption on the non-encrypted output data to generate encrypted output data (step S250). The AI device 100 may perform the encryption on the non-encrypted output data through the cryptographic processing front-end processor 131. The AI device 100 may output the encrypted output data to the first type memory 110.
The AI device 100 may overlap the neural network computation of the processor 135 with the encryption and decryption operations of the cryptographic processing front-end processor 131, implementing intra-layer pipelining with a data decryption step, a calculation step, and a data encryption step, thereby accelerating the neural network computation.
Hereinafter, the AI device based on a trust environment according to the present disclosure will be described in detail with reference to
Referring to
To provide a TrustZone-enabled TEE that may be used in mobile and embedded devices, TrustZone implements a secure processor mode that allows the CPU to operate exclusively in either the TEE or the REE at a given point in time. Transitions and interactions between the TEE and the REE are managed by a secure monitor that may be invoked by secure monitor calls (SMCs). In addition, TrustZone divides DRAM into secure and non-secure regions, and does not allow the REE to access the secure region, in order to protect sensitive data in the TEE.
An existing DNN framework that runs in TrustZone-enabled TEEs {circle around (1)} receives input data (e.g., a fingerprint image from a fingerprint sensor) from a peripheral device and starts in the REE. {circle around (2)} After executing some initial layers of the DNN in the REE, {circle around (3)} the DNN framework encrypts an output activation and transmits it to the TEE through an SMC. Thereafter, the DNN framework {circle around (4)} decrypts the transmitted activation within the TEE, and {circle around (5)} executes the remaining layers using the decrypted activation and a pre-transmitted filter. When the execution of the remaining layers is completed, the DNN framework {circle around (6)} returns the prediction made by the DNN to the REE through an SMC.
In this manner, the execution of the DNN may be isolated and may be protected from several security attacks.
In the case of an existing DNN framework of
As shown in
As for the existing DNN framework, the number of slow DRAM accesses may increase due to the limited capacity of the embedded SRAM, and a performance bottleneck caused by the high data encryption and decryption overhead imposed on the CPU may affect the DNN execution speed. Specifically, the existing DNN framework utilizes embedded SRAM as a secure on-chip memory into which the TEE may safely load encrypted DRAM data and decrypt the loaded data. However, since the capacity of the embedded SRAM (hundreds of kilobytes) is typically smaller than the on-chip CPU cache of mobile and embedded devices, using the SRAM as a secure on-chip buffer reduces the effective on-chip memory size by an order of magnitude. This increases the number of slow DRAM accesses when executing the DNN, which slows down the DNN execution speed.
As shown in
When exchanging data between the embedded SRAM and the encrypted DRAM, the data should be encrypted and decrypted by the CPU to ensure functional correctness and high security. However, CPU-based encryption/decryption is not only slow, but also consumes a large portion of the limited CPU bandwidth; because that bandwidth is shared between the computationally intensive DNN execution and the encryption and decryption, the DNN execution is slowed down.
As shown in
Accordingly, the present disclosure proposes GuardiaNN, a fast and secure DNN framework for mobile and embedded devices, which solves the slow DNN execution problem of the existing DNN frameworks. The GuardiaNN proposed in the present disclosure has the following characteristics.
Referring to
Compared to the working model of the existing DNN framework in
The present disclosure may significantly reduce slow DRAM access during DNN execution by reducing the working set size of the convolution layer by using direct convolution and maximizing reuse of SRAM data by using DNN-friendly SRAM management.
Existing mobile and embedded DNN frameworks (e.g., DarkneTZ, TensorFlow Lite) perform time-consuming convolutional layers using im2col (image to column) convolution. Im2col refers to a function that converts multi-dimensional data into a matrix so that matrix operations can be performed; the convolution of multi-dimensional data is equal to the dot product of the data converted to a matrix through im2col. Im2col convolution may achieve fast convolutional layer execution by flattening each patch (i.e., a set of input activations whose elements are mapped to elements of a filter) into a two-dimensional matrix and performing matrix multiplication between the flattened patches and the filter, as shown in
In order to minimize the working set size of the convolution layer, direct convolution may be used instead of im2col convolution in the present disclosure. In direct convolution, the filter of the convolution layer is correlated with the input, and the correlation values computed over all regions using a sliding-window method are output. Direct convolution does not increase the working set size because the necessary input activations are imported on demand, as shown in
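For concreteness, the following sketch shows a direct (sliding-window) convolution for a single output channel; the CHW layout, stride 1, and lack of padding are assumptions of the example, and the code is not taken from the GuardiaNN implementation. Unlike im2col, no flattened patch matrix is materialized, so the working set stays at the size of the input, one filter, and one output channel.

```c
/* Sketch of direct convolution for one output channel: each output element
 * is computed by sliding the filter over the input, so input activations
 * are read on demand and no im2col matrix is built. CHW layout, stride 1,
 * no padding; all names are illustrative. */
static void direct_conv2d_one_channel(
    const float *in,   /* input activations, size C*H*W   */
    const float *filt, /* one filter, size C*K*K          */
    float       *out,  /* one output channel, size OH*OW  */
    int C, int H, int W, int K)
{
    const int OH = H - K + 1, OW = W - K + 1;
    for (int oy = 0; oy < OH; ++oy) {
        for (int ox = 0; ox < OW; ++ox) {
            float acc = 0.0f;
            /* only the C*K*K window needed for this output element is read */
            for (int c = 0; c < C; ++c)
                for (int ky = 0; ky < K; ++ky)
                    for (int kx = 0; kx < K; ++kx)
                        acc += in[(c * H + oy + ky) * W + ox + kx] *
                               filt[(c * K + ky) * K + kx];
            out[oy * OW + ox] = acc;
        }
    }
}
```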
As shown in
As shown in
In the present disclosure, data encryption and decryption may be performed using cryptographic hardware, so that the limited CPU resources may be fully used for DNN execution. By doing so, it is possible not only to fully dedicate the CPU resources to the DNN execution, but also to utilize the high performance of the cryptographic hardware to achieve fast DNN execution. Data encryption and decryption take place whenever SRAM loads data from, or stores data to, the encrypted DRAM, and they consume a significant amount of the limited CPU resources. To overcome the high overhead of data encryption and decryption, the encryption and decryption operations may be performed using the cryptographic hardware. By offloading data encryption and decryption, which have high overhead, to the cryptographic hardware, the limited CPU resources may be fully dedicated to DNN execution and the DNN execution may be accelerated (refer to
The present disclosure may select from two hardware offloading mechanisms of
Cryptographic hardware may accelerate DNN execution by allowing the limited CPU resources to be fully dedicated to DNN execution, which also enables an additional performance optimization: DNN execution and data encryption/decryption are executed on the CPU and the cryptographic hardware, respectively. One property of the convolutional layer is that the operations generating different output channels are independent of each other. The convolutional layer generates one output channel using one filter, and the input activations are shared as read-only data among all output channels. This property also applies to a pooling layer, where each output channel requires only the corresponding input channel, so the output channels may be generated in parallel. Based on this, the execution of a bulk of output channels of the convolution layer may be divided into three pipeline steps: a data decryption step, a calculation step, and a data encryption step. The three steps of different bulks of output channels may then be pipelined to achieve faster DNN execution. This is called intra-layer pipelining because the output channel operations within a single convolution layer are connected in a pipeline.
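A minimal sketch of such a pipelining loop, assuming hypothetical non-blocking helpers for the cryptographic front-end; the bulk granularity, function names, and buffering details are illustrative rather than the actual GuardiaNN runtime.

```c
#include <stdio.h>

/* Hypothetical helpers standing in for the cryptographic front-end and for
 * the CPU-side convolution over one bulk of output channels. */
static void crypto_decrypt_async(int bulk) { printf("decrypt data of bulk %d\n", bulk); }
static void crypto_encrypt_async(int bulk) { printf("encrypt outputs of bulk %d\n", bulk); }
static void crypto_wait_decrypt(int bulk)  { (void)bulk; /* block until that decryption is done */ }
static void crypto_wait_all(void)          { /* block until the crypto engine is idle */ }
static void compute_bulk(int bulk)         { printf("compute bulk %d\n", bulk); }

/* Intra-layer pipelining: while the CPU computes bulk b, the cryptographic
 * hardware decrypts the data for bulk b+1 and encrypts already finished bulks. */
static void run_conv_layer_pipelined(int num_bulks)
{
    crypto_decrypt_async(0);                 /* prime the pipeline            */
    for (int b = 0; b < num_bulks; ++b) {
        crypto_wait_decrypt(b);              /* inputs of bulk b are ready    */
        if (b + 1 < num_bulks)
            crypto_decrypt_async(b + 1);     /* data decryption step (next)   */
        compute_bulk(b);                     /* calculation step (current)    */
        crypto_encrypt_async(b);             /* data encryption step (done)   */
    }
    crypto_wait_all();                       /* drain pending encryptions     */
}

int main(void) { run_conv_layer_pipelined(4); return 0; }
```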
In
However, applying intra-layer pipelining to convolutional layers consumes more SRAM than layer execution without pipelining: a larger capacity is required to ensure the functional correctness of intra-layer pipelining. In order to overlap the calculation step of one bulk of output channels with the data decryption step of the next bulk, SRAM buffers for two bulks should be allocated at the same time. Because larger SRAM buffers are required, the output channels of the layer are grouped into a larger number of smaller bulks, which may slow down DNN execution. However, the performance benefits of intra-layer pipelining may outweigh the potential performance degradation from fewer output channels per bulk. Therefore, intra-layer pipelining for convolutional and pooling layers is activated by default; for mobile and embedded devices with very small embedded SRAM, however, intra-layer pipelining may be deactivated to avoid potential performance degradation.
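As a rough sizing illustration in our own notation (not a formula from the disclosure), the SRAM demand under intra-layer pipelining can be approximated as

$$\text{SRAM}_{\text{required}} \approx S_{\text{input}} + 2\,\bigl(S_{\text{filter bulk}} + S_{\text{output bulk}}\bigr),$$

where $S_{\text{input}}$ is the buffer for the decrypted input activations shared by all output channels, and the factor of two reflects holding the buffers of two bulks simultaneously so that the calculation step of one bulk overlaps the data decryption step of the next.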
In
Direct convolution may be implemented entirely within the GuardiaNN runtime, whereas the GuardiaNN runtime should interact with a trusted OS (e.g., OP-TEE) to allocate SRAM and to use the cryptographic hardware and DMA. To implement DNN-friendly SRAM management and intra-layer pipelining, the GuardiaNN runtime should allocate buffers in SRAM that are not swapped out to DRAM. To this end, TEE_Malloc(), a TEE Internal Core API function for TEE memory allocation, is extended to take an additional input argument called isSRAM. If the value of isSRAM is true, the trusted OS allocates an SRAM buffer and prevents the buffer from being swapped out to DRAM. The memory manager of the trusted OS (e.g., Pager) is extended so that SRAM buffers allocated by calling TEE_Malloc() with isSRAM set to true are never swapped out to DRAM. The default value of isSRAM is set to false to ensure functional correctness for existing trusted applications that are not aware of isSRAM. For example, when TEE_Malloc(1024, hint, true) is called, a 1 KB SRAM buffer that cannot be swapped out to DRAM is allocated. The hint argument provides information about the nature of the buffer (e.g., fill it with zeros).
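The following sketch illustrates how a trusted application might call the extended allocator; the three-argument prototype is the extension described above (the stock GlobalPlatform TEE_Malloc() takes only a size and a hint), and the wrapper functions are hypothetical.

```c
/* Sketch of using the extended allocator described above. Assumptions: the
 * three-argument prototype below is the GuardiaNN extension provided by the
 * modified trusted OS; it is not a stock GlobalPlatform API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TEE_MALLOC_FILL_ZERO 0x00000000   /* standard hint: zero-fill the buffer */

/* extended prototype provided by the modified trusted OS */
void *TEE_Malloc(size_t size, uint32_t hint, bool isSRAM);

void *alloc_pinned_sram_buffer(size_t size)
{
    /* isSRAM = true: allocate in secure embedded SRAM and prevent the
     * pager from swapping the buffer out to DRAM */
    return TEE_Malloc(size, TEE_MALLOC_FILL_ZERO, true);
}

void *alloc_regular_tee_buffer(size_t size)
{
    /* isSRAM = false (the default): behaves like the original API */
    return TEE_Malloc(size, TEE_MALLOC_FILL_ZERO, false);
}
```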
For DMA-driven data encryption and decryption offloading, the GuardiaNN runtime invokes custom system calls defined by a DMA device driver of the trusted OS. The GuardiaNN implementation extends the trusted OS to provide two custom system calls, EncryptData() and DecryptData(). The EncryptData() system call takes as input the encryption context (including the encryption type, key size, etc.), the SRAM start address, the DRAM start address, and the data size. The system call then flushes the CPU cache to write dirty cache lines back to SRAM, reads the unencrypted SRAM data from the SRAM start address, encrypts the data using the encryption context and the cryptographic hardware, and stores the encrypted data to DRAM at the DRAM start address. Similarly, the DecryptData() system call takes as input four arguments: the cryptographic context, the DRAM start address, the SRAM start address, and the data size. Then, following a procedure similar to that of the EncryptData() system call, the encrypted DRAM data is read, the data is decrypted, and the decrypted data is stored in SRAM.
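The exact prototypes of these two system calls are not given in the text, so the following C declarations are only an assumed rendering of the listed arguments (cryptographic context, SRAM/DRAM start addresses, and data size).

```c
/* Assumed interface sketch for the two custom system calls described above;
 * the struct layout and return type are illustrative, not the actual ABI. */
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t cipher;    /* e.g., AES                       */
    uint32_t mode;      /* e.g., ECB or CBC                */
    uint32_t key_bits;  /* e.g., 128 or 256                */
    /* key material / IV handles would live here */
} crypto_ctx_t;

/* Flush dirty cache lines, read plaintext from SRAM, encrypt with the
 * cryptographic hardware, and store the ciphertext to DRAM. */
long EncryptData(const crypto_ctx_t *ctx,
                 const void *sram_src, void *dram_dst, size_t size);

/* Read ciphertext from DRAM, decrypt with the cryptographic hardware,
 * and store the plaintext into SRAM. */
long DecryptData(const crypto_ctx_t *ctx,
                 const void *dram_src, void *sram_dst, size_t size);
```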
Using two custom system calls along with the extended TEE_Malloc( ) API function and the existing GlobalPlatform API, the present disclosure may be faithfully implemented on mobile and embedded devices.
To evaluate the effectiveness of GuardiaNN for fast and secure DNN execution, GuardiaNN was prototyped on top of the STM32MP157C-DK2 development board, and its DNN execution speed and energy consumption were compared with those of a basic secure DNN framework. The development board is officially supported by OP-TEE, an open-source TrustZone-based TEE implementation, and reflects the typical hardware configuration of modern embedded devices. It includes a dual-core ARM Cortex-A7 CPU, 256 KB of secure embedded SRAM, cryptographic hardware, and 512 MB of DDR3L DRAM. As the trusted OS, OP-TEE v3.11.0 is used together with Pager, supporting GlobalPlatform TEE Client API v1.1 and Internal Core API v1.0. The implementations of GuardiaNN and the basic framework use only one CPU core because OP-TEE currently does not support multithreading within a single TEE instance. The DNN layer implementation of DarkneTZ is extended with ARM NEON single-instruction multiple-data (SIMD) instructions to take advantage of the data parallelism in each DNN layer's tasks, and the extended DNN layer implementation is applied to both GuardiaNN and the basic framework. It is assumed that the trusted OS allocates all of the SRAM (except for the 4 KB reserved by OP-TEE as shared memory between the TEE and the REE) to both GuardiaNN and the basic framework. For the energy consumption comparison, a Monsoon High Voltage Power Monitor (HVPM) is used and the energy consumption of the entire device is measured.
As benchmarks, eight quantized DNNs using 8-bit integer quantization were selected, covering five representative mobile and embedded application domains that handle sensitive user and DNN data: image classification, face recognition, fingerprint recognition, eye tracking, and emotion recognition. There are various DNNs for each domain; however, a representative lightweight DNN with a reasonably short execution latency is selected in each domain. For example, ResNet-18, a DNN for image classification, is not included as a benchmark because the basic framework took 192 seconds to execute it on the STM32MP157C-DK2 development board. Table 2 below lists the eight DNNs and their characteristics.
First, the DNN execution speed of GuardiaNN is evaluated by measuring the execution latency of all selected DNNs. To analyze the contribution of each proposed technique, the DNN execution latency is measured while gradually applying each technique, starting from the basic framework. Incrementally applying direct convolution, DNN-friendly SRAM management, cryptographic hardware offloading, and intra-layer pipelining provides a total of five configurations, indicated by the five bars in
(a) of
As shown in Table 3, the GuardiaNN proposed in the present disclosure may accelerate the execution of a wide range of DNNs without compromising security guarantees.
The energy consumption of DNN execution is examined in all configurations for each DNN in the benchmark. First, the average power across the device in the idle state and during DNN execution is measured, and the two values are subtracted to calculate the average power increase due to DNN execution. The energy consumption of the DNN execution is then calculated by multiplying the average power increase by the execution latency. The normalized results are shown in (b) of
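In other words, using our own symbols (not notation from the disclosure), the reported energy figure follows from

$$E_{\text{DNN}} = \bigl(P_{\text{exec}} - P_{\text{idle}}\bigr) \times t_{\text{latency}},$$

where $P_{\text{exec}}$ and $P_{\text{idle}}$ are the average device power during DNN execution and in the idle state, and $t_{\text{latency}}$ is the measured DNN execution latency.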
To study the effect of the bulk size of intra-layer pipelining on DNN execution speed, DNN execution latency of GuardiaNN is measured with four bulk sizes (4, 8, 16, and 32 output channels).
The DMA-capable cryptographic hardware used by GuardiaNN supports several block ciphers and operation modes. GuardiaNN uses AES for enhanced security, and two operation modes, AES-ECB and AES-CBC, are compared here. The throughput of AES-ECB and AES-CBC encryption and decryption with various key sizes is measured using (a) the cryptographic hardware and (b) LibTomCrypt, the default CPU-based cryptographic library of OP-TEE, executed on the CPU.
As interest in executing DNNs on mobile and embedded devices has increased, various technologies have emerged to accelerate DNN execution on those devices. However, many privacy issues still remain for the data processed within the devices, even with approaches such as federated learning based on differential privacy. Therefore, utilizing a trusted execution environment for DNNs is a reasonable direction. For example, SecureTF is a distributed secure machine learning framework based on TensorFlow that utilizes a trusted execution environment. PPFL accelerates secure federated learning by utilizing a trusted execution environment for local training and aggregation, and for multi-party machine learning. Chiron and Myelin enable machine learning as a service within a trusted execution environment.
DarkneTZ, Infenclave, and Slalom all propose executing parts of DNNs within a trusted execution environment. However, such approaches either leave the remaining layers exposed to attacks or incur a significant amount of performance overhead when applied naively to the entire DNN. To solve this problem, HybridTEE proposes requesting DNN execution from the trusted execution environment of a remote server. However, the acceleration effect of HybridTEE is not significant because it does not optimize the DNN execution in the local trusted execution environment.
The present disclosure not only proposes complete protection of DNNs in a local trusted execution environment, but may also provide remarkable speed-up to enable execution in real mobile and embedded environments.
Another direction for securing DNNs is to encrypt the DNN data. For example, SecureML utilized secure multiparty computation to build a scalable, privacy-preserving DNN framework. SoftME provides a reliable environment and executes trusted operations, including encryption, decryption, and computational operations; executing a DNN with SoftME guarantees confidentiality, but it uses the CPU for data encryption and decryption and incurs a large performance overhead. Among these directions, homomorphic encryption may be promising because it enables operations on encrypted data. CryptoNets demonstrates that such an idea is feasible, and follow-up works have extended the idea. MiniONN proposes a technique to transform a pre-trained DNN into an oblivious neural network. A secure federated transfer learning protocol utilizing homomorphic encryption has also been proposed. However, homomorphic encryption has low computational throughput and is often considered impractical for mobile and embedded devices.
ARM TrustZone and Intel SGX are popular commercial TEE implementations, and TEEs are attracting attention due to their high security guarantees. Compared with software-based security solutions, they have been successfully employed to protect various applications. However, there are many threats targeting TEE systems, such as attacks exploiting cache architectures, dual-instanced applications, or nested applications, and several recent proposals aim at strengthening their security. In addition, many works have been proposed to alleviate the difficulties of utilizing TEEs: one approach builds a minimal kernel to address the TEE's limited memory problem, CoSMIX allows application-level secure page fault handlers, and TEEMon is a performance monitoring framework for TEEs.
The AI device based on a trust environment according to the present disclosure isolates the execution of an artificial neural network in a trusted execution environment and encrypts data stored in DRAM, which has a slow operating speed, to enhance security. In addition, the number of DRAM accesses may be reduced through direct convolution and SRAM management, and data encryption and decryption may be offloaded to cryptographic hardware, with pipelining implemented so that neural network computations overlap with data encryption and decryption operations, thereby accelerating the execution of artificial neural networks.
While the present disclosure has been described with reference to the embodiments, it is to be understood that the present disclosure may be variously modified and changed by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Number | Date | Country | Kind
---|---|---|---
10-2023-0058605 | May 4, 2023 | KR | National