This application claims priority of China application No. 202210608385.4, filed on May 31, 2022, which is incorporated by reference in its entirety.
The present application relates to a computing accelerator, and more particularly, to a computing accelerator capable of accelerating the linear computation of homomorphic encryption.
Since artificial intelligence (AI) models, such as neural network models, can analyze huge amounts of data and extract meaningful information from them, they can be useful to many industries. However, AI models often require large amounts of expensive computing hardware resources that not every company or research institute can afford; therefore, in order to allow more industries to benefit from the data analysis capabilities of AI, some server providers have started to provide remote computing services. In other words, users can upload the data they want to calculate or analyze to the cloud, and the server providers can provide the service of computing the data remotely, and eventually transmit the calculation results back to the users.
However, the data provided by the user may be confidential and therefore such a service may have security issues. Homomorphic encryption has been introduced to improve the security of data during such services. Homomorphic encryption allows the provider of computing services to perform a specific form of algebraic operation on the encrypted ciphertext, and the encrypted data obtained from the algebraic operation, when decrypted, may be the same as the result of the same algebraic operation on the plaintext data. In other words, the computing service provider can directly use the ciphertext to perform a specific form of computation, such as linear computation, without knowing the contents of the plaintext data, thus improving the security of the service. However, the ciphertext generated by homomorphic encryption often has a more complex format, which requires more time or hardware resources for the computing service provider to complete the computation. Therefore, how to improve the computational performance of homomorphic encryption has become an urgent issue in the related field.
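As a concrete illustration of the homomorphic property described above, the following is a hedged toy using textbook (unpadded) RSA, which is multiplicatively homomorphic: the product of two ciphertexts decrypts to the product of the two plaintexts. The scheme and the tiny parameters are chosen only for illustration and are unrelated to the RLWE-based scheme used later in this disclosure.

```python
# Toy demonstration of the homomorphic property: the server multiplies
# ciphertexts without seeing the plaintexts, and the decrypted result
# equals the product of the plaintexts. Tiny, insecure parameters.

p, q = 61, 53          # toy primes (insecure, for illustration only)
n = p * q              # 3233
e = 17                 # public exponent
d = 2753               # private exponent, the inverse of e mod (p-1)(q-1)

def encrypt(m):
    return pow(m, e, n)

def decrypt(c):
    return pow(c, d, n)

a, b = 7, 9
c_product = (encrypt(a) * encrypt(b)) % n   # computed entirely on ciphertexts
assert decrypt(c_product) == a * b          # 63
```

Real homomorphic encryption services rely on lattice-based schemes such as RLWE rather than RSA, but the essential idea, computing on ciphertexts so that decryption yields the computed plaintext result, is the same.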
One embodiment of the present disclosure discloses a computing accelerator. The computing accelerator is configured to perform computations on a plurality of input polynomials of homomorphic encryption to generate an output polynomial, wherein the plurality of input polynomials are ciphertexts generated from a plaintext data after ring learning with errors (RLWE) encryption, and the output polynomial corresponds to a result of performing a linear computation on the plaintext data. The computing accelerator includes a polynomial multiplication unit, a coefficient extraction unit and a ciphertext wrapping unit. The polynomial multiplication unit is configured to multiply a first input polynomial and a second input polynomial in the plurality of input polynomials to generate a first intermediate polynomial, wherein the first input polynomial corresponds to a plurality of first plaintext values in the plaintext data, the second input polynomial corresponds to a plurality of second plaintext values in the plaintext data, and the first intermediate polynomial is a ciphertext encrypted using RLWE. The coefficient extraction unit is configured to convert the first intermediate polynomial into a first target polynomial of a learning with errors (LWE) ciphertext according to a first target coefficient in a plurality of coefficients of the first intermediate polynomial, wherein the first target coefficient corresponds to a result of performing the linear computation on the plurality of first plaintext values and the plurality of second plaintext values. The ciphertext wrapping unit is configured to generate the output polynomial according to at least the first target polynomial, wherein the output polynomial is a ciphertext encrypted using RLWE.
A further embodiment of the present disclosure provides a data processor. The data processor is configured to convert a plaintext data into a plurality of input polynomials of homomorphic encryption and transmit the plurality of input polynomials to a remote computing accelerator so that the computing accelerator performs a linear computation required by the plaintext data. The data processor includes an encoding unit and an encryption unit. The encoding unit is configured to encode a plurality of values in the plaintext data into a plurality of plaintext input polynomials according to a type of the linear computation. The encryption unit is configured to encrypt the plurality of plaintext input polynomials to generate the plurality of input polynomials according to RLWE.
A further embodiment of the present disclosure provides a method for performing computations on a plurality of input polynomials of homomorphic encryption to achieve a linear computation of a plaintext data. The method includes: using a computing accelerator to receive the plurality of input polynomials, wherein the plurality of input polynomials are ciphertexts generated from a plaintext data after RLWE encryption; using the computing accelerator to multiply a first input polynomial and a second input polynomial of the plurality of input polynomials to generate a first intermediate polynomial, wherein the first input polynomial corresponds to a plurality of first plaintext values in the plaintext data, the second input polynomial corresponds to a plurality of second plaintext values in the plaintext data, and the first intermediate polynomial is a ciphertext encrypted using RLWE; using the computing accelerator to convert the first intermediate polynomial into a first target polynomial according to a first target coefficient in a plurality of coefficients of the first intermediate polynomial, wherein the first target coefficient corresponds to a result of performing the linear computation on the plurality of first plaintext values and the plurality of second plaintext values, and the first target polynomial is a ciphertext encrypted using learning with errors (LWE); using the computing accelerator to generate an output polynomial according to at least the first target polynomial, wherein the output polynomial is a ciphertext encrypted using RLWE; and using the computing accelerator to output the output polynomial to a data processor.
The data processor, computing accelerator and calculation method of homomorphic encryption provided in the embodiments of the present application can encode values in plaintext data into polynomials based on the type of linear computation. As a result, after homomorphic encryption is performed on the polynomials, the computing accelerator only needs to multiply the corresponding polynomials to generate intermediate polynomials and obtain the ciphertext corresponding to the calculation results of the plaintext data from the coefficients of specific terms of the intermediate polynomials, thereby reducing the computational complexity required for the computing accelerator and achieving the efficacy of accelerated computation.
The following disclosure provides various different embodiments or examples for implementing different features of the present disclosure. Specific examples of components and arrangements are described below to simplify the present disclosure. These are, of course, merely examples and are not intended to be limiting. For example, the formation of a first feature over or on a second feature in the description that follows may include embodiments in which the first and second features are formed in direct contact and may also include embodiments in which additional features may be formed between the first and second features, such that the first and second features may not be in direct contact. In addition, the present disclosure may repeat reference numerals and/or letters in the various embodiments. This repetition is for the purpose of simplicity and clarity and does not in itself dictate a relationship between the various embodiments and/or configurations discussed.
Notwithstanding that the numerical ranges and parameters setting forth the broad scope of the invention are approximations, the numerical values set forth in the specific examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in the respective testing measurements. Also, as used herein, the term “about” generally means within 10%, 5%, 1%, or 0.5% of a given value or range. Alternatively, the term “generally” means within an acceptable standard error of the mean when considered by one of ordinary skill in the art. As could be appreciated, other than in the operating/working examples, or unless otherwise expressly specified, all of the numerical ranges, amounts, values, and percentages (such as those for quantities of materials, duration of times, temperatures, operating conditions, portions of amounts, and the like) disclosed herein should be understood as modified in all instances by the term “generally.” Accordingly, unless indicated to the contrary, the numerical parameters set forth in the present disclosure and attached claims are approximations that can vary as desired. At the very least, each numerical parameter should be construed in light of the number of reported significant digits and by applying ordinary rounding techniques. Here, ranges can be expressed herein as from one endpoint to another endpoint or between two endpoints. All ranges disclosed herein are inclusive of the endpoints, unless specified otherwise.
In the present embodiment, the computing accelerator 120 can be disposed at a server terminal or a service terminal configured to provide computation services, whereas the data processor 110 can be disposed at a data provider terminal or a user terminal that can use the computation services. In other words, the data processor 110 can use the computing accelerator 120 at a remote server terminal to perform the linear computation of the plaintext data D1, thereby reducing the hardware requirements for the user terminal where the data processor 110 is located.
In the embodiments of the present disclosure, the homomorphic encryption calculation system 100 can use the ring learning with errors (RLWE) technique to perform homomorphic encryption. Since the RLWE encryption uses polynomials as the format of input data, the data processor 110 can first encode the plaintext data D1 to be computed into plaintext polynomials, and then encrypt the plaintext polynomials to generate ciphertext polynomials of RLWE.
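The encode-then-encrypt flow can be illustrated with a minimal, noise-free RLWE toy. All names and parameters (N, q, delta) are this sketch's own choices; real RLWE encryption additionally adds a small random error polynomial, which decryption rounds away.

```python
import random

# Toy RLWE pipeline: encode plaintext values as polynomial coefficients,
# encrypt as a ciphertext pair (a, b), then decrypt. The noise term is
# omitted for clarity; real RLWE includes it for security.

N, q, delta = 8, 12289, 64   # ring degree, ciphertext modulus, plaintext scaling

def negacyclic_mul(f, g):
    """Multiply two polynomials modulo x^N + 1 and q (x^N wraps to -1)."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k = (i + j) % N
            sign = -1 if i + j >= N else 1
            res[k] = (res[k] + sign * f[i] * g[j]) % q
    return res

def keygen():
    """Small ternary secret polynomial s(x)."""
    return [random.randint(-1, 1) for _ in range(N)]

def encrypt(m, s):
    """Toy RLWE encryption of coefficient vector m: ct = (a, a*s + delta*m)."""
    a = [random.randrange(q) for _ in range(N)]
    b = [(p + delta * mi) % q for p, mi in zip(negacyclic_mul(a, s), m)]
    return a, b

def decrypt(ct, s):
    """Recover m from b - a*s; exact here because the noise term is omitted."""
    a, b = ct
    diff = [(bi - pi) % q for bi, pi in zip(b, negacyclic_mul(a, s))]
    return [di // delta for di in diff]

s = keygen()
m = [1, 2, 3, 4, 0, 0, 0, 0]     # encoded plaintext values as coefficients
assert decrypt(encrypt(m, s), s) == m
```

The scaling factor delta separates the plaintext from where the (omitted) noise would live, which is why decryption can recover the exact values.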
As shown in
Since RLWE can only ensure that the result of linear computations on the ciphertext remains homomorphic, but not the results of other forms of computations, the computing accelerator 120 is generally used to perform the linear computations on the ciphertext required in neural network or artificial intelligence models, such as, but not limited to, vector dot product, matrix multiplication, and convolution. Since these linear computations and polynomial multiplication are mainly dot product calculations, if the data processor 110 encodes the values in the plaintext data D1 into polynomials in an appropriate order during the encoding stage, the computing accelerator 120 can multiply the corresponding values through polynomial multiplication so as to quickly complete the linear computation of the plaintext data D1, effectively reducing the computational burden of the computing accelerator 120.
For example, the data processor 110 can encode the vector v=[p0, p1, p2, p3, p4] in the plaintext data D1 into a polynomial P(x) according to an encoding order and can encode the vector u=[q0, q1, q2, q3, q4] in the plaintext data D1 into a polynomial Q(x) according to an inverse encoding order, as shown in Equation (1) and Equation (2).
P(x) = p0 + p1x + p2x^2 + p3x^3 + p4x^4  Equation (1)
Q(x) = q4 + q3x + q2x^2 + q1x^3 + q0x^4  Equation (2)
In such case, the result, polynomial R(x), of multiplying the polynomial P(x) with the polynomial Q(x) can be shown as Equation (3).
R(x) = P(x)·Q(x) = p0q4 + (p1q4+p0q3)x + (p2q4+p1q3+p0q2)x^2 + (p3q4+p2q3+p1q2+p0q1)x^3 + (p4q4+p3q3+p2q2+p1q1+p0q0)x^4 + … + p4q0x^8  Equation (3)
In Equation (3), the coefficient (p4q4+p3q3+p2q2+p1q1+p0q0) of the fourth-degree term of the polynomial R(x) is equivalent to the dot product of the vector v and the vector u. In other words, after the computing accelerator 120 receives the ciphertext polynomials of the polynomial P(x) and the polynomial Q(x) after homomorphic encryption, it only needs to multiply the two ciphertext polynomials and extract the coefficient of the fourth-degree term thereof so as to obtain the dot product of the vector v and the vector u; it is not necessary to decompose the components of each of the vector v and the vector u from the ciphertext polynomials and then perform the corresponding multiplication calculations. Therefore, the calculation complexity of the computing accelerator 120 can be effectively reduced.
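The encoding trick of Equations (1) to (3) can be checked with a short plaintext-level sketch: v is encoded in order, u in reverse, and the coefficient of x^4 in the product equals the dot product of the two vectors. (The arithmetic here is on plaintext coefficients; in the actual system, the same multiplication is carried out on the RLWE ciphertexts.)

```python
# Encode v in order and u in reverse; the x^4 coefficient of the product
# polynomial is the dot product of v and u, as in Equation (3).

def poly_mul(f, g):
    """Schoolbook polynomial multiplication; index k holds the x^k coefficient."""
    res = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            res[i + j] += fi * gj
    return res

v = [1, 2, 3, 4, 5]          # p0..p4
u = [6, 7, 8, 9, 10]         # q0..q4

P = v                        # P(x) = p0 + p1*x + ... + p4*x^4
Q = u[::-1]                  # Q(x) = q4 + q3*x + ... + q0*x^4 (reverse order)
R = poly_mul(P, Q)

dot = sum(vi * ui for vi, ui in zip(v, u))   # 6 + 14 + 24 + 36 + 50 = 130
assert R[4] == dot           # the x^4 coefficient is the dot product
```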
In the present embodiment, the encoding unit 112 can encode the values in the plaintext data D1 into coefficients of the plaintext input polynomials PP1 to PPX in an appropriate order according to the type of the linear computation to be performed, such as, but not limited to, vector dot product, matrix multiplication, and convolution, so that the computing accelerator 120, after receiving the ciphertexts of the plaintext input polynomials PP1 to PPX (i.e., the input polynomials IP1 to IPX), can perform polynomial multiplication on the input polynomials IP1 to IPX, and obtain the desired computational result from the multiplied polynomial coefficients quickly, thereby reducing the computational effort of the computing accelerator 120.
In Step S210, the encoding unit 112 of the data processor 110 can encode a plurality of values in the plaintext data D1 into the plaintext input polynomials PP1 to PPX according to the type of the linear computation. For example, the encoding unit 112 can encode the aforementioned vector u and vector v into the polynomials P(x) and Q(x) as the plaintext input polynomials PP1 and PP2. However, the present disclosure is not limited thereto; in some other embodiments, the homomorphic encryption calculation system 100 may impose a specific requirement on the number of terms of the plaintext input polynomials PP1 and PP2, such as, but not limited to, 1024 terms; in such case, the encoding unit 112 may encode the values in the plaintext data correspondingly. Further, the plaintext data D1 may correspond to a plurality of matrices in matrix multiplication or to an input image and a convolutional kernel of a convolutional computation; in these cases, the encoding unit 112 may encode the values of the plaintext data D1 according to the type of calculation to be performed to generate the plaintext input polynomials PP1 and PP2.
After the encoding unit 112 completes encoding, in Step S220, the encryption unit 114 in the data processor 110 may encrypt plaintext input polynomials PP1 to PPX according to RLWE to generate the input polynomials IP1 to IPX. Next, in Step S230, the computing accelerator 120 can receive the input polynomials IP1 to IPX generated by the data processor 110, and in Step S240 to Step S260, it may perform computations on the input polynomials IP1 to IPX to generate an output polynomial OP1 corresponding to the calculation result.
Further, since polynomial multiplication is rather complex, in the present embodiment, the polynomial multiplication unit 122 may adopt the number-theoretic transform (NTT) to simplify the calculation of the polynomial multiplication.
In the present embodiment, the number-theoretic transform unit 1221 can perform a number-theoretic transform or a fast Fourier transform (FFT) on the input polynomials IP1 and IP2; in such case, the multiplication unit 1222 only needs to multiply each coefficient in the transformed polynomial CP1 with the corresponding coefficient in the transformed polynomial CP2 so as to generate the intermediate transformed polynomial MCP. In other words, in the case where the degree of each of the input polynomials IP1 and IP2 is N, wherein N is an integer greater than 1, the multiplication of the transformed polynomials CP1 and CP2 requires only on the order of (N+1) coefficient multiplications. In contrast, without the NTT or FFT, the multiplication of the input polynomials IP1 and IP2 would require (N+1)^2 coefficient multiplications. Therefore, by performing the NTT or FFT, the complexity of polynomial multiplication can be reduced. After multiplying the transformed polynomials CP1 and CP2 to generate the intermediate transformed polynomial MCP, the inverse number-theoretic transform unit 1223 can perform an inverse number-theoretic transform or an inverse fast Fourier transform on the intermediate transformed polynomial MCP to generate the intermediate polynomial MP1.
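The transform-then-pointwise-multiply flow can be sketched with a toy NTT. The modulus 257, primitive root 3, and transform size are this sketch's own illustrative choices; the coefficients of the true product must stay below the modulus for the result to be exact.

```python
# Toy NTT-based polynomial multiplication: forward transform both inputs,
# multiply coefficient-wise, then inverse transform. 257 is prime with
# 256 | 257-1, so power-of-two transform sizes up to 256 are available.

MOD = 257      # small NTT-friendly prime (illustrative only)
ROOT = 3       # primitive root modulo 257

def ntt(a, invert=False):
    """Recursive radix-2 number-theoretic transform over Z_257 (len = power of 2)."""
    n = len(a)
    if n == 1:
        return a[:]
    w_n = pow(ROOT, (MOD - 1) // n, MOD)
    if invert:
        w_n = pow(w_n, MOD - 2, MOD)   # modular inverse via Fermat's little theorem
    even = ntt(a[0::2], invert)
    odd = ntt(a[1::2], invert)
    res, w = [0] * n, 1
    for k in range(n // 2):
        t = w * odd[k] % MOD
        res[k] = (even[k] + t) % MOD
        res[k + n // 2] = (even[k] - t) % MOD
        w = w * w_n % MOD
    return res

def poly_mul_ntt(f, g, size):
    """Multiply two polynomials: transform, pointwise multiply, inverse transform."""
    fa = ntt(f + [0] * (size - len(f)))
    ga = ntt(g + [0] * (size - len(g)))
    prod = [x * y % MOD for x, y in zip(fa, ga)]     # one multiply per coefficient
    res = ntt(prod, invert=True)
    inv_size = pow(size, MOD - 2, MOD)               # undo the transform's scaling
    return [c * inv_size % MOD for c in res]

# (1 + 2x + 3x^2)(4 + 5x) = 4 + 13x + 22x^2 + 15x^3
assert poly_mul_ntt([1, 2, 3], [4, 5], 8) == [4, 13, 22, 15, 0, 0, 0, 0]
```

The pointwise step is where the savings come from: after the transforms, the product costs one coefficient multiplication per slot instead of the quadratic cost of schoolbook multiplication.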
In the present embodiment, the input polynomial IP1 can correspond to a plurality of first plaintext values in the plaintext data D1, for example, but not limited to, the component values p0, p1, p2, p3 and p4 of the vector v, and the input polynomial IP2 can correspond to a plurality of second plaintext values in the plaintext data D1, for example, but not limited to, the component values q0, q1, q2, q3 and q4 of the vector u. In such case, the input polynomial IP1 and the input polynomial IP2 are equivalent to the RLWE ciphertexts of the polynomials P(x) and Q(x) of Equation (1) and Equation (2), whereas as shown in Equation (3), the coefficient of each term of the intermediate polynomial MP1 will be related to each dot product result of the first plaintext values p0, p1, p2, p3 and p4 and second plaintext values q0, q1, q2, q3 and q4.
In the present embodiment, if the linear computation that the homomorphic encryption calculation system 100 intends to perform on the plaintext data D1 is the dot product of the vector v and the vector u, then the coefficient of the fourth-degree term of the intermediate polynomial MP1 will correspond to the result of the dot product calculation of the first plaintext values p0, p1, p2, p3 and p4 and the second plaintext values q0, q1, q2, q3 and q4. Thus, in Step S250, the coefficient extraction unit 124 can take the coefficient of the fourth-degree term of the intermediate polynomial MP1 as its target coefficient, and convert the intermediate polynomial MP1 of the RLWE ciphertext into a ciphertext encrypted with learning with errors (LWE), i.e., the target polynomial TP1, according to the target coefficient.
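The extraction step can be sketched with a noise-free toy: from an RLWE ciphertext (a(x), b(x)) with b = a·s + delta·m in Z_q[x]/(x^N + 1), the k-th plaintext coefficient can be pulled out as an LWE ciphertext (a_vec, b_k) under the secret vector (s_0, …, s_{N-1}). All names and parameters are this sketch's own; real conversions must also track the noise.

```python
import random

# Toy "sample extraction" from an RLWE ciphertext to an LWE ciphertext.
# Coefficient k of a(x)*s(x) mod (x^N + 1) equals the inner product of a
# rearranged, sign-flipped copy of a's coefficients with s's coefficients.

N, q, delta = 8, 12289, 64   # illustrative toy parameters

def negacyclic_mul(f, g):
    """Multiply two polynomials modulo x^N + 1 and q (x^N wraps to -1)."""
    res = [0] * N
    for i in range(N):
        for j in range(N):
            k = (i + j) % N
            sign = -1 if i + j >= N else 1
            res[k] = (res[k] + sign * f[i] * g[j]) % q
    return res

def rlwe_encrypt(m, s):
    """Noise-free toy RLWE: ct = (a, a*s + delta*m)."""
    a = [random.randrange(q) for _ in range(N)]
    prod = negacyclic_mul(a, s)
    return a, [(p + delta * mi) % q for p, mi in zip(prod, m)]

def extract_lwe(ct, k):
    """Build the LWE ciphertext whose phase is coefficient k of the RLWE phase."""
    a, b = ct
    a_vec = [a[k - j] if j <= k else (-a[N + k - j]) % q for j in range(N)]
    return a_vec, b[k]

def lwe_decrypt(lwe_ct, s):
    a_vec, b_k = lwe_ct
    phase = (b_k - sum(ai * si for ai, si in zip(a_vec, s))) % q
    return phase // delta    # exact here because no noise was added

s = [random.randint(-1, 1) for _ in range(N)]   # small ternary secret
m = [3, 1, 4, 1, 5, 9, 2, 6]
ct = rlwe_encrypt(m, s)
assert lwe_decrypt(extract_lwe(ct, 4), s) == m[4]   # recovers the x^4 coefficient
```

Notably, the extraction needs no key material: it merely rearranges and sign-flips the existing ciphertext coefficients, which is why it is cheap for the coefficient extraction unit to perform.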
However, the present disclosure is not limited to performing vector dot product computation on the plaintext data; in some embodiments, the purpose of the user terminal may be to perform a linear computation other than the vector dot product computation on the plaintext data D1. For example, the plaintext data D1 can include an input image and a convolutional kernel, and the purpose of the user terminal is to obtain a feature image after performing convolutional computation on the input image and the convolutional kernel. In such case, in Step S210, the data processor 110 may, for example, encode a plurality of first plaintext values corresponding to at least a portion of the input image in the plaintext data D1 into the plaintext input polynomial PP1, and encode a plurality of second plaintext values corresponding to the convolutional kernel in the plaintext data D1 into the plaintext input polynomial PP2. In this way, after the data processor 110 performs the appropriate encoding, the computing accelerator 120 may generate the intermediate polynomial MP1 by multiplying the input polynomials IP1 and IP2 and obtain the target coefficient corresponding to at least a portion of the feature image from the coefficients of a plurality of terms of the intermediate polynomial MP1.
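The convolution encoding can be sketched at the plaintext level: multiplying the polynomial carrying the input samples by the polynomial carrying the reversed kernel places each sliding-window dot product in one coefficient of the product. A 1-D example keeps the idea visible; the 2-D image case flattens rows with appropriate gaps. Names and values here are illustrative.

```python
# 1-D convolution (CNN-style sliding dot product) via polynomial
# multiplication: reverse the kernel, multiply, and read the "valid"
# outputs from consecutive coefficients of the product.

def poly_mul(f, g):
    """Schoolbook polynomial multiplication; index k holds the x^k coefficient."""
    res = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            res[i + j] += fi * gj
    return res

signal = [1, 2, 3, 4, 5]
kernel = [1, 0, -1]

R = poly_mul(signal, kernel[::-1])   # reverse the kernel, as with the vectors above

# Coefficients len(kernel)-1 .. len(signal)-1 hold the sliding-window results.
valid = R[len(kernel) - 1 : len(signal)]
expected = [sum(signal[i + t] * kernel[t] for t in range(len(kernel)))
            for i in range(len(signal) - len(kernel) + 1)]
assert valid == expected
```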
In addition, in some other embodiments, the plaintext data D1 may include a first matrix and a second matrix, and the purpose of the user terminal is to obtain a third matrix by multiplying the two matrices. In such case, in Step S210, the data processor 110 may, for example, encode a plurality of elements of the first row in the first matrix into the plaintext input polynomial PP1 and encode a plurality of elements of the first column in the second matrix into the plaintext input polynomial PP2. In such case, the target coefficient of the intermediate polynomial MP1 may, for example, correspond to the matrix element at the first row and the first column in the third matrix.
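A plaintext-level sketch of the matrix-multiplication encoding: one vector of matrix entries is encoded in order, the other in reverse, and a single coefficient of the product polynomial yields one element of the result matrix. The matrices and the row/column convention here are illustrative only.

```python
# One element of a matrix product via the reversed polynomial encoding:
# C[0][0] = dot(row 0 of A, column 0 of B) appears as one coefficient.

def poly_mul(f, g):
    """Schoolbook polynomial multiplication; index k holds the x^k coefficient."""
    res = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            res[i + j] += fi * gj
    return res

A = [[1, 2, 3],
     [4, 5, 6]]
B = [[7, 8],
     [9, 10],
     [11, 12]]

row = A[0]                           # first row of the first matrix
col = [B[k][0] for k in range(3)]    # first column of the second matrix

R = poly_mul(row, col[::-1])         # reversed encoding, as before
c00 = sum(r * c for r, c in zip(row, col))   # 1*7 + 2*9 + 3*11 = 58
assert R[2] == c00                   # the x^2 coefficient holds C[0][0]
```

Repeating this for every row/column pair yields all elements of the third matrix, each as the target coefficient of its own intermediate polynomial.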
In some embodiments, to ensure that in Step S250 the computing accelerator 120 can extract the corresponding target coefficient, the data processor 110 may determine the encoding means for the plaintext data D1 according to the type of the linear computation to be performed, and may transmit a message to inform the computing accelerator 120 of the encoding means that should be adopted or the degree of the term that corresponds to the target coefficient.
Furthermore, based on the contents of the plaintext data D1 and the needs of the linear computation to be performed, the data processor 110 may generate more than two input polynomials IP1 to IPX in Step S210 to Step S220, so that in Step S240, the computing accelerator 120 may also perform multiple rounds of polynomial multiplications on the input polynomials IP1 to IPX, and generate a plurality of intermediate polynomials MP1 to MPY correspondingly. Taking the aforementioned convolution computation and matrix multiplication as examples, the intermediate polynomials MP1 to MPY may each correspond, for example, to a part of the output feature image or to one of the elements in the third matrix, so that after the coefficient extraction unit 124 converts the intermediate polynomials MP1 to MPY into the target polynomials TP1 to TPY according to the target coefficients of the intermediate polynomials MP1 to MPY, the ciphertext wrapping unit 126 of the computing accelerator 120 may further, in Step S260, wrap the target polynomials TP1 to TPY into an output polynomial OP1 of the RLWE ciphertext. Consequently, the data processor 110 can obtain a more complete calculation result according to the output polynomial OP1, such as the full feature image or all elements in the third matrix.
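The multi-round flow can be sketched conceptually on plaintext values: each intermediate polynomial contributes one target value, and the values are packed as coefficients of a single output polynomial so that only one result needs to be sent back. In the actual system, the ciphertext wrapping unit performs the analogous packing on LWE ciphertexts; the vector pairs below are illustrative.

```python
# Several dot products, each computed via the reversed polynomial
# encoding, with the extracted target values packed into the coefficient
# list of one output polynomial.

def poly_mul(f, g):
    """Schoolbook polynomial multiplication; index k holds the x^k coefficient."""
    res = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            res[i + j] += fi * gj
    return res

pairs = [([1, 2, 3], [4, 5, 6]),     # several (v, u) vector pairs
         ([7, 8, 9], [1, 0, 1])]

targets = []
for v, u in pairs:
    R = poly_mul(v, u[::-1])         # reversed encoding, as before
    targets.append(R[len(v) - 1])    # the x^(n-1) coefficient is the dot product

# "Wrap" the extracted values into one output polynomial's coefficients.
output_poly = targets
assert output_poly == [sum(a * b for a, b in zip(v, u)) for v, u in pairs]
```

Packing many results into one ciphertext is what keeps the return transmission from the accelerator to the data processor compact.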
It should be noted that in the method 200, the computing accelerator 120 may perform two rounds of conversions in ciphertext formats on the polynomials it computes; the first round is in Step S250, where the computing accelerator 120 performs the procedure of converting an RLWE ciphertext into an LWE ciphertext, and the second round is in Step S260, where the computing accelerator 120 performs the procedure of converting a plurality of LWE ciphertexts into an RLWE ciphertext. In the present embodiment, these two conversions in the ciphertext format may be done according to the principles of RLWE and LWE as well as any known conversion method.
For example, Step S260 may be performed in two parts, wherein in the first part, a plurality of LWE ciphertexts are combined, and in the second part, the combined ciphertext is converted.
The ciphertext combining circuit 1261 can combine the target polynomials TP1 to TPY into a combined polynomial CMP1. As shown in
After the computing accelerator 120 generates the output polynomial OP1 according to the target polynomials TP1 to TPY, in Step S270, the computing accelerator 120 can output the output polynomial OP1 to the data processor 110.
As shown in
In summary, the data processors, computing accelerators and calculation methods for homomorphic encryption provided in the embodiments of the present application can encode values in the plaintext data into polynomials based on the type of the linear computation to be performed, so that after performing homomorphic encryption on the polynomials, the computing accelerator only needs to multiply the corresponding polynomials to generate the intermediate polynomials and obtain the ciphertext corresponding to the calculation results of the plaintext data from the coefficients of specific terms of the intermediate polynomials. As a result, the computational complexity required for the computing accelerator can be reduced, thereby achieving the efficacy of accelerated computation. Further, because the computing accelerator can convert each intermediate polynomial into a target polynomial according to the coefficient of a specific term of that intermediate polynomial, and then wrap a plurality of target polynomials into one output polynomial, the transmission between the data processor and the computing accelerator can be more efficient.
The foregoing description briefly sets forth the features of some embodiments of the present application so that persons having ordinary skill in the art more fully understand the various aspects of the disclosure of the present application. It may be apparent to those having ordinary skill in the art that they can easily use the disclosure of the present application as a basis for designing or modifying other processes and structures to achieve the same purposes and/or benefits as the embodiments herein. It should be understood by those having ordinary skill in the art that these equivalent implementations still fall within the spirit and scope of the disclosure of the present application and that they may be subject to various variations, substitutions, and alterations without departing from the spirit and scope of the present disclosure.