Secure search of secret data in a semi-trusted environment using homomorphic encryption

Information

  • Patent Grant
  • 11764940
  • Patent Number
    11,764,940
  • Date Filed
    Friday, January 10, 2020
  • Date Issued
    Tuesday, September 19, 2023
Abstract
A system and method for secure searching in a semi-trusted environment by comparing first and second data (query and target data). A first data provider may map first secret data to a first plurality of tokens using a token codebook, concatenate the first plurality of tokens to generate a first token signature, and homomorphically encrypt the first token signature. A second data provider may map second data to a second plurality of tokens using the token codebook, concatenate the second plurality of tokens to generate a second token signature, and compare the homomorphically encrypted first token signature and an unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison. A trusted party may decrypt the homomorphically encrypted comparison, using a secret homomorphic decryption key, to determine if the token signatures match or not respectively indicating the search query is found or not in the target data.
Description
FIELD OF THE INVENTION

Embodiments of the invention are directed to data privacy, security, and encryption of secret data. Embodiments of the invention include systems and methods to encrypt secret data to safely share it with an external or third party, which can then execute queries, searches, or other computations only on the encrypted data, without decrypting and exposing the underlying secret data. In particular, embodiments of the invention are directed to fast and efficient searching of homomorphically encrypted (“HE”) secret data.


BACKGROUND OF THE INVENTION

Today, massive amounts of data reside in many organizations, with barriers between them erected by mistrust, economic incentives, and regulatory hurdles. When secret data, such as personal, medical, or financial data, is involved, privacy becomes a major concern for all parties, as that information can be used to identify or exploit individuals.


To encourage collaboration, while still protecting data secrecy, cryptosystems have been developed that allow parties to operate on encrypted data (i.e., ciphertexts) in an encrypted domain:


Fully Homomorphic Encryption (FHE) cryptosystems allow a third party to evaluate any computation on encrypted data without learning anything about it, such that only the legitimate recipient of the homomorphic calculation will be able to decrypt it using the recipient's secret key. Although FHE can theoretically work on any data, practically, FHE is too computationally burdensome and unrealistic to use in most real-world settings, especially when large amounts of data and complex computations are involved.


Functional Encryption (FE) cryptosystems allow authorized third parties who cannot decrypt, to evaluate selective authorized computations on encrypted data, without decrypting first. Such authorized third parties receive a different secret key for each computation, which enables the calculation of the computation on the data without decryption. In secret-key functional encryption schemes, both decryption and encryption require knowing a secret-key. In public-key functional encryption, decryption requires knowing a secret key, whereas encryption can be performed without knowing a secret-key and does not compromise security.


Proxy re-encryption (PRE) cryptosystems transform data encrypted in one key to data encrypted in another key. PRE may be used in settings involving two or more parties each holding a secret key to a different encryption scheme, and for classical encryption schemes.


However, these cryptosystems are often inefficient, adding extra layers of computations. Further, because the data being operated on is encrypted, it is difficult to find and target specific data. Current operations to search for specific data are often performed across an entire encrypted data set, which becomes prohibitively inefficient, especially when the datasets are large.


Accordingly, there is a need in the art for a fast and efficient technique to search for and target specific data within a ciphertext in the encrypted domain. There is also a need to be able to perform fast and efficient secret searches, such as financial fraud or other types of criminal investigations, in the encrypted domain, on cleartext or ciphertext, which does not compromise the secret search.


SUMMARY OF THE INVENTION

To overcome the aforementioned limitations inherent in the art, embodiments of the invention may provide a fast and efficient targeted search in the encrypted domain, where at least one, or both, of the search query and the targeted data are homomorphically encrypted.


In an embodiment of the invention, a system and method is provided for securely searching data in a semi-trusted environment by comparing first and second data. The first data element may be the search query and the second data element the target data to be searched, or the second data element may be the search query and the first data element the target data to be searched. A first data provider comprising one or more first processors may be configured to map a first data element comprising secret data to a first plurality of tokens using a codebook of tokens to represent data elements, concatenate the first plurality of tokens to generate a first token signature comprising the first plurality of tokens that uniquely represents the first data element, and homomorphically encrypt the first token signature using a public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element. A second data provider comprising one or more second processors may be configured to map a second data element to a second plurality of tokens using the token codebook, wherein one of the first and second data elements is a search query and the other is target data being searched, concatenate the second plurality of tokens to generate a second token signature comprising the second plurality of tokens that uniquely represent the second data element, and compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison of the first and second token signatures. The comparison of the first and second token signatures may be a binary indication of whether or not the search query is found in the target data, or a matching score indicating a frequency or certainty with which the search query is found in the target data. 
A trusted party comprising one or more processors may be configured to decrypt the homomorphically encrypted comparison, using a private homomorphic decryption key, to determine if the first and second token signatures match or not respectively indicating that the search query is found or not in the target data. The first data provider may operate in a trusted environment, the second data provider may operate in a trusted or semi-trusted environment, and the trusted party may operate in a trusted environment. The trusted party may be the first data provider, the second data provider, or a distinct third party system.


In an embodiment of the invention, the first and second data elements may be mapped to tokens by dividing the data element into one or more atomic data units, searching the codebook for a plurality of tokens matching each instance of each atomic data unit, and generating an ordered set of the plurality of tokens for the plurality of atomic units. The codebook of tokens may be dynamically updated by adding new tokens to the codebook and deleting preexisting tokens from the codebook, wherein the updated codebook is simultaneously available to both the first and second data providers.


In an embodiment of the invention, a first data provider is provided for securely searching data in a semi-trusted environment. The first data provider may comprise one or more memories configured to store a first data element comprising secret data, a codebook of tokens to represent data elements, and a public homomorphic encryption key. The first data provider may comprise one or more processors configured to map the first data element comprising secret data to a first plurality of tokens using the token codebook, concatenate the first plurality of tokens to generate a first token signature comprising the first plurality of tokens that uniquely represents the first data element, homomorphically encrypt the first token signature using the public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element, transmit the homomorphically encrypted first token signature to a second data provider to compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature representing a second data element and generate a homomorphically encrypted comparison of the first and second token signatures, wherein one of the first and second data elements is a search query and the other is target data being searched, and receive the result of decrypting the homomorphically encrypted comparison at a trusted device, using a private homomorphic decryption key, to determine if the first and second token signatures match or not respectively indicating that the search query is found or not in the target data.


In an embodiment of the invention, a second data provider is provided for securely searching data in a semi-trusted environment. The second data provider may comprise one or more memories configured to store a second data element, and a codebook of tokens to represent data elements. The one or more memories may be further configured to store the public homomorphic encryption key when the second data provider generates the homomorphically encrypted second token signature. The second data provider may comprise one or more processors configured to map the second data element to a second plurality of tokens using the token codebook, concatenate the second plurality of tokens to generate a second token signature comprising the second plurality of tokens that uniquely represent the second data element, receive, from a first data provider, a homomorphically encrypted first token signature that is a homomorphic encryption of a concatenation of a first plurality of tokens uniquely representing a first data element comprising secret data according to the codebook of tokens, wherein one of the first and second data elements is a search query and the other is target data being searched, compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison of the first and second token signatures, and transmit the homomorphically encrypted comparison to a trusted device to decrypt the homomorphically encrypted comparison, using a private homomorphic decryption key, to determine if the first and second token signatures match or not respectively indicating that the search query is found or not in the target data.





BRIEF DESCRIPTION OF THE DRAWINGS

The subject matter regarded as the invention is particularly pointed out and distinctly claimed in the concluding portion of the specification. The invention, however, both as to organization and method of operation, together with objects, features, and advantages thereof, may best be understood by reference to the following detailed description when read with the accompanying drawings in which:



FIG. 1 is a schematic illustration of a multi-party system and workflow for performing a fast and secure targeted search in a semi-trusted environment using homomorphic encryption, according to an embodiment of the invention;



FIG. 2 is a schematic illustration of a multi-party system comprising a trusted first homomorphic encryption (HE) data provider (e.g., of a HE search query), a semi-trusted or untrusted second data provider (e.g., of HE or unencrypted target data to be searched), and a trusted third party (e.g., to homomorphically decrypt the search results), according to an embodiment of the invention;



FIG. 3 is a schematic illustration of a multi-party system for securely searching data in a semi-trusted environment, according to an embodiment of the invention; and



FIG. 4 is a flowchart of a method for securely searching data in a semi-trusted environment, according to an embodiment of the invention.





It will be appreciated that for simplicity and clarity of illustration, elements shown in the figures have not necessarily been drawn to scale. For example, the dimensions of some of the elements may be exaggerated relative to other elements for clarity. Further, where considered appropriate, reference numerals may be repeated among the figures to indicate corresponding or analogous elements.


DETAILED DESCRIPTION OF THE INVENTION

Embodiments of the invention enable fast and efficient targeted searches, in the homomorphic encryption encrypted domain, where the search query (e.g., search keywords) and/or target data being searched (e.g., database, document, image, video, or any other type of file or data repository, or metadata thereof) are homomorphically encrypted.


By utilizing homomorphic encryption, ciphertext can be compared to ciphertext, or ciphertext can be compared to plaintext, such that a data provider or query provider may be able to compare (1) a homomorphically encrypted search query and homomorphically encrypted target data being searched; (2) a homomorphically encrypted search query and unencrypted target data being searched; or (3) an unencrypted search query and homomorphically encrypted target data being searched. As long as at least one of the search query or target data is homomorphically encrypted, the comparison therebetween in all three scenarios is performed under homomorphic encryption, i.e., in the homomorphic encryption domain, without exposing the underlying search query or target data being searched. This is because homomorphic encryption provides an injective or one-to-one (1:1) mapping between operations on plaintext and operations on ciphertext. Accordingly, a search that is a comparison between the query and target data, performed between a homomorphically encrypted ciphertext and a plaintext (or between two ciphertexts), generates a homomorphically encrypted comparison. It is therefore secure to search in a semi-trusted domain by comparing two terms where at least one is homomorphically encrypted, because neither the underlying homomorphically encrypted term(s) nor the homomorphically encrypted comparison can be decrypted or exposed without the associated secret homomorphic decryption key. The secret homomorphic decryption key is only stored by or accessible to a trusted party.


When used herein, the term trusted may refer for example to an entity or system which is sufficiently trusted to correctly perform computations or operations such as a search (e.g., a comparison between a search query and target data to be searched), and also trusted to keep private data secret. When used herein, the term semi-trusted may refer for example to an entity or system which is sufficiently trusted to correctly perform computations or operations such as a search (e.g., a comparison between a search query and target data to be searched), but not trusted to access or keep private data secret. When used herein, the term untrusted may refer for example to an entity or system which is not sufficiently trusted to correctly perform computations or operations such as a search (e.g., a comparison between a search query and target data to be searched), and not trusted to access or keep private data secret.


The homomorphically encrypted search result or comparison may then be transmitted to a secure environment, e.g., the query provider, the data provider, or a third party, which securely stores the private homomorphic decryption key, to decrypt and analyze the results of the comparison, e.g., to determine if the search comparison renders a match or not.


Scenarios (1)-(3) above may be applied depending on the application, for example, depending on the secrecy or security level of the data (e.g., queries and/or target data with sensitive or proprietary information are generally encrypted, while data that is less or not sensitive such as public data is generally not encrypted), whether the devices, parties or environments storing the data are trusted, semi-trusted, or untrusted (e.g., data is generally encrypted in untrusted and semi-trusted environments and unencrypted in trusted environments), whether the communication channel is trusted to be secret, and/or whether the computation is likely to be compromised in the future.


Examples of the search comparison in the aforementioned three scenarios (1)-(3) are:


In scenario (1), in which both the query and the target data being searched are homomorphically encrypted, e.g., as shown in FIG. 1, the homomorphically encrypted comparison may be a difference of a value of a query, value1, plus noise generated from homomorphically encrypting the query, noise1, and of a value of the target data being searched, value2, plus noise generated from homomorphically encrypting the target data, noise2, for example as:

HEC=(value1+noise1)−(value2+noise2)  EQN. 1A

Because homomorphic encryption provides a 1:1 mapping between operations on plaintext and operations on ciphertext, the homomorphically encrypted comparison may be equivalently re-written as:

HEC=(value1−value2)+(noise1−noise2)  EQN. 1B


In scenario (2), in which the query is homomorphically encrypted, but the target data being searched is not homomorphically encrypted, the homomorphically encrypted comparison may be a difference of a value of a query, value1, plus noise generated from homomorphically encrypting the query, noise1, and of a value of the target data being searched, value2, for example as:

HEC=(value1+noise1)−(value2)  EQN. 2A

The homomorphically encrypted comparison may be equivalently re-written as:

HEC=(value1−value2)+(noise1)  EQN. 2B


In scenario (3), in which the query is unencrypted but the target data being searched is homomorphically encrypted, the homomorphically encrypted comparison may be a difference of a value of a query, value1, and of a value of the target data being searched, value2, plus noise generated from homomorphically encrypting the target data being searched, noise2, for example as:

HEC=(value1)−(value2+noise2)  EQN. 3A

The homomorphically encrypted comparison may be equivalently re-written as:

HEC=(value1−value2)−(noise2)  EQN. 3B


The homomorphically encrypted comparison in each of the above equations 1B, 2B, and 3B is a search result that combines a comparison or difference between the query and target data (value1−value2), where 0 indicates a match (value1−value2=0, or equivalently, value1=value2) and a nonzero difference indicates no match (value1−value2≠0, or equivalently, value1≠value2), with the homomorphic encryption noise (noise1 and/or noise2). Because each comparison is homomorphically encrypted, it appears as a ciphertext, or random string, from which it is impossible or impractical for an unverified observer to decipher or learn anything without the private homomorphic decryption key.


The homomorphically encrypted comparison may be defined by a binary indicator of whether or not the query matches the target data for any of the keywords or concatenations of keywords, as above, or may be defined by a more sophisticated “matching score” based on the frequency or certainty of the matching words or concatenations of keywords in the document in the target data being searched. A matching score may enable the system to order the target data being searched by relevance, e.g., sequentially listed in descending order from highest to lowest matching score.
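
As a rough illustration of the ranking logic only, the following Python sketch computes a frequency-based matching score in the clear and lists targets in descending order. The function names and the scoring rule (a count of occurrences of the query's signatures) are assumptions made for illustration; the text above does not fix a formula, and this sketch does not show how such a score would be evaluated under homomorphic encryption.

```python
from collections import Counter


def matching_score(query_signatures, target_signatures) -> int:
    """Count how often any query signature occurs among the target's signatures."""
    counts = Counter(target_signatures)
    return sum(counts[sig] for sig in set(query_signatures))


def rank_targets(query_signatures, targets: dict) -> list:
    """Return (target_id, score) pairs in descending order of matching score."""
    scored = [
        (target_id, matching_score(query_signatures, signatures))
        for target_id, signatures in targets.items()
    ]
    # sorted() is stable, so ties keep their original relative order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

A score of 0 corresponds to the binary "no match" case, so the binary indicator can be read as the special case `score > 0`.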


Decrypting the homomorphically encrypted comparison with the private homomorphic decryption key removes the noise generated from the homomorphically encrypted comparison (e.g., noise1 from homomorphically encrypting the query and/or noise2 from homomorphically encrypting the target data being searched) to get an unencrypted search result, which may be a difference of a value of the search query and a value of the target data being searched:

UR=value1−value2  EQN. 4

where UR is the unencrypted result of the homomorphically encrypted comparison of each of EQNs. 1-3 in scenarios (1)-(3), respectively.
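
The three scenarios can be sketched with a toy additively homomorphic scheme. The Python example below swaps in textbook Paillier encryption, which is an assumption made for illustration and not necessarily the scheme contemplated here: Paillier has no explicit additive noise term, and the fresh randomness `r` plays the masking role that noise1 and noise2 play in EQNs. 1-3. The primes are far too small to be secure, all function names are hypothetical, and Python 3.8 or later is assumed for `pow` with a negative exponent.

```python
import math
import secrets

# Toy Paillier key material built from two known Mersenne primes.
# Illustration only: these parameters are far too small to be secure.
P = 2**31 - 1
Q = 2**61 - 1
N = P * Q                    # public encryption key (modulus)
N2 = N * N
PHI = (P - 1) * (Q - 1)      # private decryption key
MU = pow(PHI, -1, N)         # private helper: inverse of PHI modulo N


def encrypt(m: int) -> int:
    """Enc(m) = (1 + m*N) * r^N mod N^2, with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return ((1 + (m % N) * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    """Recover m (mod N) from a ciphertext using the private values."""
    return (((pow(c, PHI, N2) - 1) // N) * MU) % N


def sub_cipher_cipher(c1: int, c2: int) -> int:
    """Scenario (1): Enc(value1), Enc(value2) -> Enc(value1 - value2)."""
    return (c1 * pow(c2, -1, N2)) % N2


def sub_cipher_plain(c1: int, value2: int) -> int:
    """Scenario (2): Enc(value1), plaintext value2 -> Enc(value1 - value2)."""
    return (c1 * (1 + ((-value2) % N) * N)) % N2


def sub_plain_cipher(value1: int, c2: int) -> int:
    """Scenario (3): plaintext value1, Enc(value2) -> Enc(value1 - value2)."""
    return ((1 + (value1 % N) * N) * pow(c2, -1, N2)) % N2


def is_match(encrypted_comparison: int) -> bool:
    """Trusted party: decrypt and test whether the difference is zero (EQN. 4)."""
    return decrypt(encrypted_comparison) == 0
```

In this sketch only `is_match` and `decrypt` touch the private values, so a semi-trusted party can run any of the three `sub_*` functions with the public modulus alone and forward the result to the trusted party.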


The homomorphically encrypted comparison may also be defined by other equations, for example, derived from, depending on, or permuting the terms in equations 1-3. For example, instead of the difference (value1−value2) where 0 indicates a match and a nonzero difference indicates no match, a ratio (value1/value2) may be used where a 1 indicates a match and any other ratio indicates no match, or a sum (value1+value2) where 2×query (when the query is known) indicates a match and any other sum indicates no match.


Scenarios (2) and (3), where one of the data elements (e.g., the search query or the target data being searched) is unencrypted, may be executed faster and with fewer computations than scenario (1) where both data elements are homomorphically encrypted, while providing the same benefit of secrecy by encrypting the search result.


In all scenarios (1)-(3), further optimization may be implemented using token-based searching to speed up computations in the homomorphic encryption domain which is typically slow and cumbersome in real-world settings, especially when large amounts of data are involved. The search environment may be optimized by using a codebook to create the token-based system that allows fast and efficient searching in the homomorphic encryption domain. Embodiments of the invention may tokenize the search query and each of a plurality of data entries or units of the target data being searched to create a token-signature for each data element. In some embodiments, the token-signature may be a concatenated string of a plurality of tokens representing a plurality of respective data, properties, parameters, or characteristics of each data element. Each concatenated token-signature is a concise manner of combining all or multiple aspects of each data element to avoid redundant searches for duplicative or similar terms. This may simplify and reduce the number of independent searches that need to be performed in the homomorphic encryption domain. For example, instead of running multiple independent searches of an address for boulevard and similar terms road and street, embodiments of the invention concatenate tokens for boulevard, road, and street into a single token-signature which requires a single HE comparison.


In order to achieve the goal of fast and efficient targeted searches, in the encrypted domain, the data elements may be initially processed to generate normalized data entries or units of the target data being searched by tokenizing a data element, such as a search query or target data being searched, by splitting it into a sequence of tokens. For example, information in the data element, such as words, numbers, or pixel values, may be mapped to tokens based on a codebook of tokens. The target data being searched may be a file to be searched itself, such as documents, images, or videos, or may refer to data located within such files, such as a field, column, or row within a document, or may be metadata of any of these data.


The data elements may be transformed into a set of tokens, for example as follows. Initially, the original data elements may be used directly or transformed into corresponding metadata elements representing information extracted from the data element using rule-based or machine learning classification. Data processing standardizes the data elements, e.g., by removing all common separators, operators, punctuation, and non-printable characters, and by stemming and/or lemmatization to obtain the stem of a word, i.e., its morphological root, by removing the suffixes that present grammatical or lexical information about the word. The data elements may then be divided into one or more discrete atomic data units, such as a number, a phoneme, or a discrete data block (e.g., a row of a table, or a pixel block of an image). For example, an address, 11 Allen Street, may be broken down into the atomic data units “eleven,” “allen,” and “street.” The codebook of tokens may be searched for a plurality of tokens matching each instance of each atomic data unit, such as a token representing something with the same meaning or representing a synonym. An instance may be a token mapped to the same or similar meaning as the atomic data unit. For example, the number “11” in “11 Allen Street” may be mapped to tokens in the codebook for the number “11”, the word “eleven”, and the meaning “address number.” An ordered set of the plurality of tokens for the plurality of atomic units may be generated based on the predefined token numbering, e.g., in the codebook. In some embodiments, one or more processors of all data providers utilizing the system may be configured to order the plurality of tokens in a set way, such that the ordered lists generated by different data providers for the same atomic unit have the same order. Alternatively, the lists may not be ordered in a specific way and, instead, all permutations of the ordering of the tokens in the list may be searched.
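
The mapping steps above can be sketched in Python as follows. The codebook contents, the token numbers, and the function names are hypothetical, and the standardization step is reduced to lowercasing and stripping punctuation; stemming and lemmatization are omitted for brevity.

```python
import re

# Hypothetical codebook: atomic data unit -> list of (token_id, meaning).
ADDRESS_NUMBER_TOKENS = [(101, "11"), (102, "eleven"), (900, "address number")]
CODEBOOK = {
    "11": ADDRESS_NUMBER_TOKENS,
    "eleven": ADDRESS_NUMBER_TOKENS,
    "allen": [(350, "allen")],
    "street": [(410, "street")],
    "st": [(410, "street")],
    "str": [(410, "street")],
}


def atomic_units(data_element: str) -> list:
    """Standardize a data element and divide it into atomic data units."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", data_element.lower())
    return cleaned.split()


def map_to_tokens(data_element: str) -> list:
    """Return the de-duplicated token IDs for a data element, in codebook order."""
    token_ids = set()
    for unit in atomic_units(data_element):
        # Units missing from the codebook are skipped in this sketch.
        for token_id, _meaning in CODEBOOK.get(unit, []):
            token_ids.add(token_id)
    # Ascending token ID stands in for the predefined token numbering.
    return sorted(token_ids)
```

With this codebook, `map_to_tokens("11 Allen Street")` and `map_to_tokens("Eleven Allen St.")` yield the same ordered token list, which is what allows two providers to compare like with like.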


Using the codebook of tokens to map multiple different representations of the same or substantially similar information to a single token may reduce the number of comparisons needed to search. For example, “street”, “st.”, and “str.”, which all refer to the word “street”, may be mapped to the same token, reducing the number of independent HE comparisons by a factor of three. Because homomorphic encryption requires multiple computations per comparison, reducing the number of HE comparisons provides a speedup that is superlinear in the factor by which the number of comparisons is reduced (e.g., more than a three-fold speed-up in the above comparison).


The use of the codebook of tokens allows embodiments of the invention to incorporate new tokens as they are created, creating a dynamic codebook that evolves as new data elements are added or preexisting data elements are deleted. For example, when a new data element is incorporated into the system and the data element contains a new word that is not already mapped to a token in the codebook, a new token and/or a new mapping to preexisting tokens may be created to map that new word to multiple relevant tokens in the codebook. Similarly, when words or data elements are deleted, one or more preexisting related tokens and/or their associated mappings may be deleted. The updated codebook should be simultaneously available to both the first and second data providers, so that both generate the same token string for the same data elements. When codebooks are locally stored at the data providers, a new updated codebook or only the changes with respect to the last version may be transmitted to and/or locally stored by the first and second data providers. When codebooks are remotely stored, both data providers may access the same copy or two copies may be simultaneously updated. Accordingly, each data provider may have access to the same or an identical version of the codebook so that, for example, the same data elements are mapped to the same token string. In other words, this ensures that the same data element is not mapped to a different token string by using different data providers' token codebooks.
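
One way to keep locally stored codebooks identical is to version every change and transmit only the changes since a provider's last known version. The Python sketch below illustrates that idea; the `Codebook` class, its change log, and its method names are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Hypothetical versioned codebook: word -> token ID, plus a change log."""
    tokens: dict = field(default_factory=dict)
    version: int = 0
    log: list = field(default_factory=list)  # entries: (version, op, word, token_id)

    def add(self, word: str, token_id: int) -> None:
        """Add or remap a word and record the change under a new version."""
        self.version += 1
        self.tokens[word] = token_id
        self.log.append((self.version, "add", word, token_id))

    def delete(self, word: str) -> None:
        """Delete a preexisting mapping and record the change."""
        self.version += 1
        token_id = self.tokens.pop(word)
        self.log.append((self.version, "delete", word, token_id))

    def changes_since(self, version: int) -> list:
        """Delta to transmit to a provider holding an older version."""
        return [entry for entry in self.log if entry[0] > version]

    def apply(self, changes: list) -> None:
        """Replay a received delta so the local copy matches the sender's."""
        for version, op, word, token_id in changes:
            if op == "add":
                self.tokens[word] = token_id
            else:
                self.tokens.pop(word, None)
            self.version = version
            self.log.append((version, op, word, token_id))
```

A provider would call `apply(master.changes_since(local.version))` before tokenizing, and two providers holding the same version number can then expect to map the same data element to the same token string.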


The codebook of tokens may also or alternatively be created and updated using publicly available information outside of the database being searched, such as one of the standard dictionaries for any language (e.g., Webster dictionary for English), a list of standard names, a list of phone numbers from a phonebook, and/or a list of street names, counties, states, and/or countries, and updated as new words, names, or numeric identifiers are found in data entries. Embodiments of the invention may utilize dictionaries, lists of abbreviations, and other information to determine words that have the same meaning and that should be mapped to the same token. In some embodiments of the invention, machine learning is utilized to create and update the codebook of tokens.


Embodiments of the invention may further speed up processing by reducing the number of comparisons in the homomorphic encryption domain by generating token signatures that are a concatenated string of a plurality of tokens that uniquely represent all tokens in the codebook associated with each data element, such as a query or target data being searched. Comparison of a single concatenated string of tokens thus replaces multiple distinct individual comparisons of each individual token in the string. For example, an address, 11 Allen Street, may be mapped to a concatenated string of tokens representing “address,” “11,” “eleven,” “number,” “street,” “userID,” “username,” etc., which are all the tokens in the codebook associated with a particular user's address. Concatenating thus reduces the number of independent searches in this example by a factor of seven. As discussed, because homomorphic encryption requires multiple computations per comparison, reducing the number of HE comparisons provides a speedup that is superlinear in the factor by which the number of comparisons is reduced (e.g., more than a seven-fold speed-up in the above comparison).


Concatenation may be performed based on known patterns of information, such as words, names, and numbers, that may be grouped together, such as how an address may group together a house number, street name, city name, state name, and zip code. Concatenation may reduce the number of searches and comparisons in the homomorphic encryption domain by searching for a token signature rather than searching separately for each term or token. For example, a conventional query meant to search for a particular user by name and phone number may be mapped to a single token signature for the user including a string of concatenated tokens representing all relevant token(s) in the codebook including name, phone number, address, and all other identifying information for the user. A single search may be performed for the query's token signature rather than performing multiple conventional searches separately for the user's name and phone number.


Since the token strings are indecipherable in the encrypted domain, the tokens may be concatenated in the same order by both the query provider and the target data provider to ensure like objects are being compared. In one example, each token may have a rank, order or unique identifier (e.g., the order in which it is listed in the codebook) and may be concatenated in that order (e.g., in ascending or descending order). Additionally or alternatively, tokens may be concatenated in a logical or rules-based order. For example, tokens for an address may be concatenated to have the tokens for the house number, street name, city name, state name, and zip code in a specific order. Alternatively, the tokens for the query and target data may be concatenated in random or different orders, and the system may be configured to compare each encrypted permutation of tokens in the signatures for the query and target data to determine if there is a match.
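
A minimal Python sketch of order-consistent concatenation follows, assuming numeric token IDs and ascending-ID ordering. The fixed token width, the leading "1" used when converting a signature to an integer, and the function names are illustrative assumptions rather than details taken from the specification.

```python
TOKEN_WIDTH = 4  # hypothetical fixed width so the concatenation is unambiguous


def token_signature(token_ids) -> str:
    """Concatenate token IDs in ascending order into a single signature string."""
    ordered = sorted(set(token_ids))
    for token_id in ordered:
        if not 0 <= token_id < 10**TOKEN_WIDTH:
            raise ValueError(f"token ID {token_id} does not fit in {TOKEN_WIDTH} digits")
    return "".join(f"{token_id:0{TOKEN_WIDTH}d}" for token_id in ordered)


def signature_as_int(signature: str) -> int:
    """Map a signature to one integer that could be encrypted as a single value.

    The leading "1" preserves leading zeros, so distinct signatures map to
    distinct integers.
    """
    return int("1" + signature)
```

Because both providers sort by token ID before concatenating, the order in which tokens were discovered in the query or the target data does not affect the signature, so a single equality comparison suffices.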


Some embodiments of the invention may store each token signature generated from a data element in a metadata file associated with the data element. The metadata file may additionally include a frequency of each mapping to the token signature in the data element.
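As one possible illustration of such a metadata file, the following Python sketch counts how often each token signature occurs in a data element and serializes the result. The JSON layout, the field names, and the example signatures are assumptions made for this sketch only.

```python
import json
from collections import Counter


def build_metadata(element_id, signatures):
    """Count how often each token signature occurs in one data element."""
    counts = Counter(signatures)
    return {"element_id": element_id, "signatures": dict(counts)}


# Hypothetical signatures extracted from one stored record.
meta = build_metadata(
    "record-001", ["000000010004", "00050006", "000000010004"]
)
metadata_json = json.dumps(meta, sort_keys=True)
```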


One application of embodiments of the invention is for a first data provider to search the data of a second semi-trusted data provider, while keeping the search query and results secret from the second data provider. In one example, the first data provider may be a law enforcement agency conducting a financial fraud investigation, where the investigation is not public, and so the agency cannot reveal its query (e.g., the person targeted by the investigation) to the second party (e.g., a bank holding the person's financial records). The second party is considered semi-trusted because it can perform the search, but cannot access the query or results without compromising the investigation. In such a situation, the second semi-trusted data provider may perform the search on a homomorphically encrypted query, and may return a homomorphically encrypted search comparison, without decrypting or exposing the query or search results.


In some embodiments, the first data provider (e.g., a query provider) may be configured to map a first data element comprising secret data (e.g., a secret query) to a first plurality of tokens using a codebook of tokens to represent data elements. The first data provider may be configured to concatenate the first plurality of tokens to generate a first token signature comprising the first plurality of tokens that uniquely represents the first data element. The first data provider may homomorphically encrypt the first token signature using a public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element, and may transmit the homomorphically encrypted first token signature to a second data provider (e.g., a semi-trusted database or cloud/file management system storing target data being searched).


Similarly, the second data provider may be configured to map a second data element (e.g., the target data being searched) to a second plurality of tokens using the same token codebook, and concatenate the second plurality of tokens to generate a second token signature comprising the second plurality of tokens that uniquely represents the second data element.


The second data provider, or another device or party, may be configured to compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison of the first and second token signatures. At least the first data element is homomorphically encrypted, so its comparison with the second data element (in unencrypted plaintext or also homomorphically encrypted) is likewise homomorphically encrypted, and so cannot be decrypted by the second data provider, which is only semi-trusted and so does not have access to the secret homomorphic decryption key. The second data provider may be configured to transmit the homomorphically encrypted comparison to a trusted third party, which may be the first data provider, or to another external party or device, to decrypt the homomorphically encrypted comparison.


In some embodiments of the invention, the trusted third party receives the homomorphically encrypted comparison, and the third party decrypts the homomorphically encrypted comparison, using the private homomorphic decryption key. The decrypted comparison may indicate if the first and second token signatures match or not respectively indicating that the search query is found or not found in the target data. Additionally or alternatively, the decrypted comparison may include a matching score indicating a frequency or certainty with which the search query is found in the target data.
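The encrypt, compare, and decrypt flow described above may be illustrated, purely as a sketch, with the additively homomorphic Paillier scheme. Paillier is used here only as a stand-in and is not stated to be the scheme of the embodiments; the blinded-difference equality test, the toy primes (which are far too small to be secure), and all function names are assumptions made for this example.

```python
import secrets
from math import gcd

# Toy parameters only: two Mersenne primes, far too small for real security.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q                                      # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)   # private key part
MU = pow(LAM, -1, N)                           # private key part


def random_unit():
    """Pick a random value in [1, N) that is invertible modulo N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if gcd(r, N) == 1:
            return r


def encrypt(m):
    """Paillier encryption with generator g = N + 1."""
    return (1 + (m % N) * N) * pow(random_unit(), N, N2) % N2


def decrypt(c):
    """Paillier decryption using the private values LAM and MU."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def compare_encrypted(enc_first, second_sig):
    """Semi-trusted side: return Enc(r * (first - second)), never decrypting."""
    enc_diff = enc_first * encrypt(-second_sig) % N2  # Enc(first - second)
    return pow(enc_diff, random_unit(), N2)           # blind the difference


def is_match(enc_comparison):
    """Trusted side: a decrypted value of 0 means the signatures are equal."""
    return decrypt(enc_comparison) == 0


first_sig = int("000000010004")   # token signature read as an integer below N
enc_query = encrypt(first_sig)    # sent to the second data provider
found = is_match(compare_encrypted(enc_query, 10004))
not_found = is_match(compare_encrypted(enc_query, 10005))
```

In this sketch the second data provider only handles ciphertexts and its own plaintext signature, while the party holding `LAM` and `MU` learns only whether the blinded difference is zero.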


In some embodiments, the second data provider may also homomorphically encrypt the second token signature using a public homomorphic encryption key, e.g., when the second data element, such as the target data being searched, comprises secret data, or could be used to derive secret data, such as a hospital record or a bank record. In some embodiments of the invention, the public homomorphic encryption keys used by the first and second data providers are the same. In some embodiments of the invention, the public homomorphic encryption keys used by the first and second data providers are different. In some embodiments, when two different encryption keys are used, the homomorphically encrypted comparison may be decrypted by two different corresponding decryption keys, e.g., both stored at one device, or each stored at a different decryption device, both of which together decrypt the data. In some embodiments, the second public homomorphic encryption key may be a proxy re-encryption (PRE) key, which may switch encryptions from a first encryption key to a second encryption key. In this case, a single decryption key associated with the second key may decrypt the proxy re-encrypted HE comparison.


In some embodiments of the invention, the second data provider may not homomorphically encrypt the second token signature, e.g., when the second data element contains only publicly accessible information, such as data entries stored on a publicly accessible database, or the second data provider is trusted.


The private key for decrypting the homomorphically encrypted comparison may only be stored on or accessible to a trusted device or system, such as the first data provider (e.g., the query provider) or a trusted third party (and not a semi-trusted device, such as the second data provider). Accordingly, only the trusted device can decrypt the homomorphically encrypted comparison and access the results to determine if the search query is found or not in the target data.


In some embodiments of the invention, the privacy of the search query may not need to be protected and, instead, only the privacy of the target data being searched needs to be protected. In such embodiments, the first data provider may be the target data provider, such as a data warehouse or cloud/file management system, providing the first data element as the target data being searched, and the second data provider may be the query provider providing the second data element as the search query.


In some embodiments of the invention, the first data provider may both be the query provider and own the target data being searched, which is stored at the second data provider, such as a semi-trusted database, semi-trusted cloud service or a semi-trusted file system. In such embodiments, the first data provider may search for its own data that is stored remotely. The first data provider may request that the second data provider perform encrypted searches of the first data provider's target data, without accessing the first data provider's target data.


For example, a first data provider, such as a hospital or bank, may store a plurality of second data elements, such as medical records or bank records, in one or more memories of the second data provider. The first data provider may need the second data provider to perform a search for certain second data elements without the second data provider being able to access those second data elements or being able to derive that information. Therefore, in such an application, the second data elements are encrypted at the second data provider, and the decryption key may only be present on a trusted device or system, such as the first data provider. This protocol ensures that the data stored at the second data provider, such as a semi-trusted database, is encrypted and cannot be unlocked since the secret decryption key is not shared. This protocol also ensures that it is possible to search target data without disclosing the target data. In such an application, the first data element and the first token signature itself may not be encrypted, for example, when the search query itself does not contain sensitive information that needs to be protected. Alternatively, the first data element and token signature may be homomorphically encrypted, for example, when the search query does contain sensitive information that needs to be protected.


In such embodiments, the first data provider may encrypt the second data element using any standard encryption, and transmit the encrypted second data element to the second data provider for storage. Instead of the second data provider, the first data provider may map the second data element to a second plurality of tokens using the token codebook, concatenate the second plurality of tokens to generate the second token signature, homomorphically encrypt the second token signature using the public homomorphic encryption key to generate the homomorphically encrypted second token signature, and transmit the homomorphically encrypted second token signature to the second data provider to store with a correlation to the encrypted second data element.


Reference is made to FIG. 1 which is a schematic illustration of a multi-party system and workflow for providing a fast and secure targeted search in a semi-trusted environment using homomorphic encryption, according to an embodiment of the invention. The multi-party system of FIG. 1 comprises a trusted first homomorphic encryption (HE) data provider 140, a semi-trusted or untrusted second data provider 150, and a trusted third party 125 (e.g., to homomorphically decrypt the search results).


In scenario (1), shown in FIG. 1, both data providers 140 and 150 homomorphically encrypt their data. In scenarios (2) and (3), only the first data provider 140 homomorphically encrypts its first data element, while the second data provider 150 leaves its second data element unencrypted in plaintext. In scenario (2), the first data provider 140 is the query provider (generating a HE query) and the second data provider 150 is the target data provider (generating unencrypted target data). In scenario (3), the first data provider 140 is the target data provider (generating HE target data) and the second data provider 150 is the query data provider (generating an unencrypted query).


The second data provider 150 may store at least one second data element 111, such as target data being searched. The first data provider 140 may be, for example, a trusted query provider requesting that the second data provider 150 determine if any of the at least one second data element 111 matches the first data element 101 without exposing the first data element 101 to the second data provider 150. In some embodiments of the invention, the first data provider 140 may be, for example, a trusted query provider storing first data element 101, such as a search query, and the second data provider 150 may be, for example, a semi-trusted database or cloud/file management system storing second data element 111, such as target data being searched, or vice versa.


The first data provider 140 may map the first data element 101 to a first plurality of tokens 103 using a codebook of tokens to represent first data element 101. The first data provider 140 may concatenate the first plurality of tokens to generate a first token signature 105 comprising the first plurality of tokens that uniquely represents the first data element 101. The first data provider 140 may homomorphically encrypt the first token signature 105 to generate a homomorphically encrypted first token signature 107 and transmit the homomorphically encrypted first token signature 107 to a device for comparison (e.g., the second data provider 150 or another external device or party).


The second data provider 150 may map the second data element 111 to at least one second plurality of tokens 113 using the token codebook. The second data provider 150 may concatenate the at least one second plurality of tokens to generate at least one second token signature 115. In scenario (1), e.g., when the second data element 111 contains secret data, the second data provider 150 may be configured to homomorphically encrypt the second token signature 115 using a public homomorphic encryption key to generate a homomorphically encrypted second token signature 117 representing the second data element. The public homomorphic encryption keys used by the first data provider 140 and second data provider 150 may be the same or different. In scenarios (2) and (3), e.g., when the second data element 111 does not contain secret data, homomorphic encryption may be skipped, and the second token signature 115 may be left unencrypted. When an external device performs the search comparison (e.g., trusted party 125), the second data provider 150 may transmit the unencrypted second token signature 115 or the homomorphically encrypted second token signature 117 to the external device or party. Otherwise, the second data provider 150 may keep and store the unencrypted second token signature 115 or the homomorphically encrypted second token signature 117 in memory.


The second data provider 150 or an external device (e.g., trusted party 125) may be configured to perform a homomorphic search by comparing the homomorphically encrypted first token signature 107 representing the first data element 101 to each of at least one unencrypted or homomorphically encrypted second token signatures 117 representing the second data element 111 to generate at least one homomorphically encrypted comparison 119 of the first and second token signatures. Examples of computations for generating the homomorphically encrypted comparison 119 are defined in equations (1)-(3). The homomorphically encrypted comparison 119 may be transmitted to (or remain in) a trusted device for decryption and analysis.


The trusted device, e.g., the first data provider 140 or trusted party 125, may decrypt the homomorphically encrypted comparison 119, using a private homomorphic decryption key, to generate an unencrypted comparison 121. The unencrypted comparison 121 may indicate whether or not the first and second token signatures match respectively indicating that the search query is found or not in the target data. The unencrypted comparison 121 may additionally or alternatively indicate a matching score defining a frequency or certainty with which the search query is found in the target data.


The operations that generate data structures 101-121, although shown in FIG. 1 as being performed by particular devices, may be performed by any one or more, or any combination, of first data provider(s) 140, second data provider(s) 150, and trusted part(ies) 125, or by other external devices or third party systems.


Reference is made to FIG. 2, which schematically illustrates a multi-party system comprising a trusted first homomorphic encryption (HE) data provider 140 (e.g., of a HE search query), a semi-trusted or untrusted second data provider 150 (e.g., of HE or unencrypted target data to be searched), and a trusted third party 125 (e.g., to homomorphically decrypt the search results), according to an embodiment of the invention. In various embodiments, the third party 125 may be the first data provider 140, the second data provider 150, or a distinct external system or party.


The first data provider may be, for example, a query provider which includes one or more memories and one or more processors. The one or more memories of the first data provider may be configured to store a first data element comprising secret data, a codebook of tokens to represent data elements, and a first public homomorphic encryption key. The first data provider may be configured to map the first data element to a plurality of tokens using the token codebook, concatenate the plurality of tokens to generate a first token signature comprising the plurality of tokens that uniquely represent the first data element, and homomorphically encrypt the first token signature by utilizing the first public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element.


The second data provider may be, for example, a database or cloud/file management system which includes one or more memories and one or more processors. The one or more memories of the second data provider may be configured to store a second data element comprising secret or public data, and the same codebook of tokens as the first data provider. In some embodiments of the invention, the one or more memories of the second data provider also store a second public homomorphic encryption key, which may be the same key as, or a different key from, the first public homomorphic encryption key. The second data provider may be configured to map the second data element to a plurality of tokens using the token codebook, and concatenate the plurality of tokens to generate a second token signature comprising the plurality of tokens that uniquely represent the second data element. The second data provider may be configured to transmit the second token signature to the trusted third party 125, or may be configured to homomorphically encrypt the second token signature by utilizing the second public homomorphic encryption key to generate a homomorphically encrypted second token signature representing the second data element and transmit the homomorphically encrypted second token signature to the trusted third party 125.


The trusted third party 125 may be a computation host configured as one or more centralized server(s) or part(ies), which may offer services, such as performing search and retrieval of secure or encrypted data, to a variety of users, such as the first data provider and the second data provider. In scenarios where the second data provider is untrusted and should not perform the comparison between the homomorphically encrypted first token signature and the unencrypted or homomorphically encrypted second token signature, the first data provider and the second data provider may transmit the homomorphically encrypted first and second token signatures to the trusted or semi-trusted third party 125 to perform the comparison.


The third party may be configured to compare the homomorphically encrypted first token signature representing the first data element and the unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison of the first and second token signatures. If the third party is trusted, the third party may be configured to decrypt the homomorphically encrypted comparison, using a private homomorphic decryption key, to determine if the first and second token signatures match or not respectively indicating that the search query is found or not in the target data. If the second token signature utilized for the comparison was not homomorphically encrypted or the second public homomorphic encryption key is the same as the first public homomorphic encryption key such that the private homomorphic decryption key also corresponds to the second public homomorphic encryption key, then the third party may be configured to decrypt the homomorphically encrypted comparison utilizing the private homomorphic decryption key. If the second token signature utilized for the comparison was homomorphically encrypted utilizing the second public homomorphic encryption key, and the second public homomorphic encryption key is not the same as the first public homomorphic encryption key, then the third party may be configured to decrypt the homomorphically encrypted comparison utilizing a first and second private homomorphic decryption key (alone or in combination with another device possessing the second private key). If the third party is not trusted, but semi-trusted, the third party may transmit the homomorphically encrypted comparison to a trusted device, such as, the first data provider to decrypt.
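The key-selection and routing rules in the preceding paragraph may be summarized, as a hedged sketch only, by the following Python functions. The key identifiers, function names, and return values are hypothetical and serve only to make the branching explicit.

```python
def required_decryption_keys(second_encrypted, first_key_id, second_key_id=None):
    """Return the IDs of the private keys needed to decrypt the comparison."""
    # One key suffices if the second signature is plaintext or shares the key.
    if not second_encrypted or second_key_id == first_key_id:
        return [first_key_id]
    # Otherwise both private keys are needed, possibly held by two devices.
    return [first_key_id, second_key_id]


def decryption_site(third_party_trusted):
    """Decide which party decrypts the homomorphically encrypted comparison."""
    # A semi-trusted third party forwards the comparison to a trusted device.
    return "third_party" if third_party_trusted else "first_data_provider"
```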


If the second data provider is semi-trusted and the third party is trusted, the first data provider may be configured to transmit the homomorphically encrypted first token signature to the second data provider to compare to the unencrypted or homomorphically encrypted second token signature, and the second data provider may transmit the homomorphically encrypted comparison to the third party to decrypt.


The first data provider may be the query provider and the second data provider may be the target data provider, or vice versa. The first and second data providers 140 and 150 and trusted party 125 in FIG. 2 may be the same as or different from those in FIG. 1.


Reference is made to FIG. 3, which schematically illustrates a multi-party system 300 for securely searching data in a semi-trusted environment according to an embodiment of the invention. The systems described in reference to FIGS. 1 and 2 may include devices and/or components of system 300 of FIG. 3. The devices of system 300 may be operated by one of the parties disclosed herein including, for example, a query provider, a target data provider such as a database or cloud/file management system that stores the data to be searched, and/or one or more third parties, such as a trusted server.


Multi-party system 300 comprises one or more first computer(s) 340 (e.g., operated by first data provider(s) 140 of FIGS. 1 and 2), one or more second computer(s) 350 (e.g., operated by second data provider(s) 150 of FIGS. 1 and 2), and one or more third party server(s) 310 (e.g., operated by third part(ies) 125 of FIGS. 1 and 2). In one example, the first computer 340 may be operated by a query provider and the second computer 350 may be operated by a database or cloud/file management system that stores the target data being searched. Other parties may also operate these devices in accordance with other embodiments of the invention. Computer(s) 340 and 350 and third party server(s) 310 may be connected via one or more wired or wireless communication networks 320 (e.g., network 120 of FIG. 1).


The first data provider computer 340 may store, in memory unit 348, a first data element (e.g., 101 of FIG. 1) comprising secret data, a codebook of tokens representing data elements, and a first public homomorphic encryption key. The first data provider computer 340 may use the codebook to map the first data element to at least one first token signature (e.g., 105 of FIG. 1). The first data provider computer 340 may use the first public homomorphic encryption key to encrypt the first token signature to generate a homomorphically encrypted first token signature (e.g., 107 of FIG. 1) representing the first data element. The first data provider computer 340 may transmit the homomorphically encrypted first token signature to a search device, such as second data provider computer 350 or third party computation host server(s) 310.


The second data provider computer 350 may store, in memory unit 358, a second data element (e.g., 111 of FIG. 1) comprising either secret or non-secret data and the same token codebook as stored on the first data provider computer 340. In some embodiments of the invention, e.g., where the second data element comprises secret data, the second data provider computer 350 may store a public homomorphic encryption key that is the same as or different from the first public homomorphic encryption key. The second data provider computer 350 may use the codebook to map the second data element to at least one second token signature (e.g., 115 of FIG. 1). The second data provider computer 350 may leave the second token signature unencrypted or use its public homomorphic encryption key to encrypt the second token signature to generate a homomorphically encrypted second token signature (e.g., 117 of FIG. 1) representing the second data element. The second data provider computer 350 may perform the comparison locally or transmit the second token signature to an external comparison device, such as third party computation host server(s) 310.


Second data provider computer 350 or third party computation host server(s) 310 may host computations or tests, such as comparing the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature, e.g., according to EQNs. (1)-(3), to generate a homomorphically encrypted comparison (e.g., 119 of FIG. 1) of the first and second token signatures. In embodiments of the invention where the second data provider computer 350 is configured to perform the comparison, the second data provider computer 350 may transmit the homomorphically encrypted comparison to an external trusted decryption device, such as first data provider computer 340 or third party computation host server(s) 310. In embodiments of the invention where the third party computation host server(s) 310 is configured to perform the comparison, the third party computation host server(s) 310 may keep the HE comparison locally or transmit it to an external trusted decryption device, such as first data provider computer 340 or another third party device.


The third party computation host server(s) 310, the first data provider computer 340, or another trusted external device, may store a private homomorphic decryption key for decrypting the homomorphically encrypted comparison to generate an unencrypted comparison of the first and second token signatures (e.g., 121 of FIG. 1). In embodiments where only the first, but not the second, token signature is encrypted, or both token signatures are encrypted using the same key, decryption may be executed with a single corresponding private homomorphic decryption key. In embodiments where the first and second token signatures are encrypted using two different respective encryption keys, two separate corresponding private homomorphic decryption keys may be used to decrypt by one or more of the third party computation host server(s) 310, the first data provider computer 340, and/or another trusted external device.


Third party computation host server(s) 310 may include a separate secure memory 315 storing secret data 317. Secret data 317 may include the first and/or second private homomorphic decryption key(s) and/or the unencrypted comparison (e.g., 121 of FIG. 1). Secure memory 315 may be internal or external to one or more of the third party computation host server(s) 310 and may be connected thereto by a local or remote and a wired or wireless connection. In alternate embodiments, secret data 317 may be stored in an alternate location separate from secure memory 315, e.g., memory unit(s) 318.


Data provider computers 340 and 350 and third party computation host server(s) 310 may be servers, personal computers, desktop computers, mobile computers, laptop computers, and notebook computers or any other suitable device such as a cellular telephone, personal digital assistant (PDA), video game console, etc., and may include wired or wireless connections or modems to connect to network 320. Data provider computers 340 and 350 may include one or more input devices 342 and 352, respectively, for receiving input from a user (e.g., via a pointing device, click-wheel or mouse, keys, touch screen, recorder/microphone, other input components). Data provider computers 340 and 350 may include one or more output devices 344 and 354 (e.g., a monitor or screen) for displaying data to a user provided by or for computation host server(s) 310.


Network 320, which connects third party computation host server(s) 310 and data provider computers 340 and 350, may be any public or private network such as the Internet. Access to network 320 may be through wire line, terrestrial wireless, satellite or other systems well known in the art.


Third party computation host server(s) 310 and data provider computers 340 and 350 may include one or more controller(s) or processor(s) 316, 346, and 356, respectively, for executing operations according to embodiments of the invention and one or more memory unit(s) 315/318, 348, and 358, respectively, for storing data (e.g., data elements, token signatures, homomorphic encryption keys and decryption keys, encrypted token signatures, and homomorphically encrypted and decrypted comparisons) and/or instructions (e.g., software for mapping data elements to pluralities of tokens, concatenating pluralities of tokens into token signatures, applying test computations or calculations, and applying keys to encrypt, decrypt or re-encrypt data according to embodiments of the invention) executable by the processor(s). Processor(s) 316, 346, and/or 356 may include, for example, a central processing unit (CPU), a digital signal processor (DSP), a microprocessor, a controller, a chip, a microchip, an integrated circuit (IC), or any other suitable multi-purpose or specific processor or controller. Memory unit(s) 318, 348, and/or 358 may include, for example, a random access memory (RAM), a dynamic RAM (DRAM), a flash memory, a volatile memory, a non-volatile memory, a cache memory, a buffer, a short term memory unit, a long term memory unit, or other suitable memory units or storage units. Secure memory 315 may be specialized memory, physically separate from general memor(ies), that validates prescribed security configurations, such as the Intel™ SGX product. Secure memory 315 may allow computation on a Trusted Platform Module (TPM).


Some embodiments may be provided in a computer program product that may include a non-transitory machine-readable medium having stored thereon instructions, which may be used to program a computer, or other programmable devices, to perform methods as disclosed herein. Embodiments of the invention may include an article such as a computer or processor non-transitory readable medium, or a computer or processor non-transitory storage medium, such as for example a memory, a disk drive, or a USB flash memory, encoding, including or storing instructions, e.g., computer-executable instructions, which when executed by a processor or controller, carry out methods disclosed herein. The storage medium may include, but is not limited to, any type of disk including floppy disks, optical disks, compact disk read-only memories (CD-ROMs), rewritable compact disks (CD-RWs), and magneto-optical disks, semiconductor devices such as read-only memories (ROMs), random access memories (RAMs), such as a dynamic RAM (DRAM), erasable programmable read-only memories (EPROMs), flash memories, electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, or any type of media suitable for storing electronic instructions, including programmable storage devices.


Reference is made to FIG. 4, which is a flowchart of a method for securely searching data in a semi-trusted environment, according to an embodiment of the invention. Operations of FIG. 4 may be performed using the devices, architectures, data structures, and/or workflows described in reference to FIGS. 1-3. Other devices and configurations may also be used.


In operation 401, a first data provider (e.g., 140 of FIGS. 1 and 2 or 340 of FIG. 3) may map a first data element (e.g., 101 of FIG. 1) comprising secret data to a first plurality of tokens (e.g., 103 of FIG. 1) using a codebook of tokens to represent data elements.


In operation 403, the first data provider may concatenate the first plurality of tokens to generate a first token signature (e.g., 105 of FIG. 1) comprising the first plurality of tokens that uniquely represents the first data element.


In operation 405, the first data provider may homomorphically encrypt the first token signature using a first public homomorphic encryption key to generate a homomorphically encrypted first token signature (e.g., 107 of FIG. 1) representing the first data element.


The first data provider may be configured to transmit the homomorphically encrypted first token signature to a second data provider (e.g., 150 of FIGS. 1 and 2 or 350 of FIG. 3) or a third party (e.g., the trusted party 125 of FIGS. 1 and 2 or 310 of FIG. 3) for comparison.


In operation 407, the second data provider may map a second data element (e.g., 111 of FIG. 1) to a second plurality of tokens (e.g., 113 of FIG. 1) using the token codebook. One of the first and second data elements (e.g., 101 or 111 of FIG. 1) may be a search query and the other (e.g., 111 or 101 of FIG. 1, respectively) may be target data to be searched.


In operation 409, the second data provider may concatenate the second plurality of tokens to generate a second token signature (e.g., 115 of FIG. 1).


If the second data element includes secret data, in operation 411 the second data provider may homomorphically encrypt the second token signature using a second public homomorphic encryption key to generate a homomorphically encrypted second token signature (e.g., 117 of FIG. 1) representing the second data element. In embodiments of the invention where the second data element does not contain secret data, operation 411 may be skipped, and the second token signature may be left unencrypted.


In embodiments of the invention where the third party is configured to generate the homomorphically encrypted comparison, the second data provider may be configured to transmit the homomorphically encrypted or unencrypted second token signature to the third party.


In operation 413, the second data provider or third party may compare the homomorphically encrypted first token signature representing the first data element and the unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison (e.g., 119 of FIG. 1) of the first and second token signatures. The second data provider or third party may transmit the homomorphically encrypted comparison to the first data provider or the third party. In some embodiments of the invention in which the third party both performs the comparison of operation 413 and decrypts the result, this transmission may be omitted.
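The comparison of an encrypted first signature with an unencrypted second signature can be sketched with a blinded equality test over a toy Paillier scheme. This is one well-known technique chosen for illustration, not necessarily the comparison used in the embodiments; the key size is insecure, and all names (`encrypt`, `decrypt`, `compare`, `random_unit`) and example values are assumptions.

```python
import math
import secrets

# Toy Paillier parameters (insecure size; illustration only).
P, Q = 2**31 - 1, 2**61 - 1                         # two known Mersenne primes
N = P * Q                                           # public key
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)   # private key part
MU = pow(LAM, -1, N)                                # private key part


def random_unit():
    """Random value in [1, N-1] that is coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            return r


def encrypt(m):
    """First data provider: encrypt a signature under the public key."""
    return (1 + (m % N) * N) * pow(random_unit(), N, N2) % N2


def decrypt(c):
    """Trusted party: decrypt with the private key."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def compare(enc_first, second_signature):
    """Second data provider: compute Enc(k * (first - second)) for a random k.

    Uses only the public key; the first signature is never decrypted here.
    """
    enc_neg_second = (1 - (second_signature % N) * N) % N2  # Enc(-second)
    enc_diff = enc_first * enc_neg_second % N2              # Enc(first - second)
    return pow(enc_diff, random_unit(), N2)                 # blind the difference


enc_comparison = compare(encrypt(170042), 170042)
```

Under these assumptions the encrypted comparison decrypts to 0 exactly when the two signatures are equal; otherwise it decrypts to a value blinded by the random factor, so the trusted party learns that the signatures differ but not by how much.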


In operation 415, the trusted first data provider or third party may decrypt the homomorphically encrypted comparison, utilizing a private homomorphic decryption key, to expose the unencrypted comparison (e.g., 121 of FIG. 1). The unencrypted comparison may be a binary indication of whether or not the first and second token signatures match, respectively indicating that the search query is found or not found in the target data, and/or a matching score (e.g., a continuous value or value in a range of three or more numbers) indicating a frequency or certainty with which the search query is found in the target data. In some embodiments of the invention, when the first and second token signatures are encrypted with two different encryption keys, in operation 415, the trusted first data provider and/or third party may decrypt the homomorphically encrypted comparison utilizing both corresponding decryption keys.
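The two forms of unencrypted comparison described in operation 415 can be illustrated with a short sketch. It assumes, purely for illustration, that the query was compared against several target records and that each decrypted value is 0 when the signatures match and nonzero otherwise; the function name `interpret_comparison` and the sample values are hypothetical.

```python
def interpret_comparison(decrypted_values):
    """Turn decrypted per-record comparison values into (found, score).

    Assumes a decrypted value of 0 marks a matching record.
    """
    if not decrypted_values:
        return False, 0.0
    matches = sum(1 for v in decrypted_values if v == 0)
    found = matches > 0                        # binary indication
    score = matches / len(decrypted_values)    # frequency of matches
    return found, score


found, score = interpret_comparison([0, 981, 0, 5])
```

Here the binary indication reports whether the query was found at all, while the score reports the fraction of compared records in which it was found.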


In the foregoing description, various aspects of the present invention are described. For purposes of explanation, specific configurations and details are set forth in order to provide a thorough understanding of the present invention. However, it will also be apparent to one of ordinary skill in the art that the present invention may be practiced without the specific details presented herein. Furthermore, well known features may be omitted or simplified in order not to obscure the present invention.


Unless specifically stated otherwise, as apparent from the foregoing discussion, it is appreciated that throughout the specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical, such as electronic, quantities within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.


It should be recognized that embodiments of the present invention may solve one or more of the objectives and/or challenges described in the background, and that embodiments of the invention need not meet every one of the above objectives and/or challenges to come within the scope of the present invention. While certain features of the invention have been particularly illustrated and described herein, many modifications, substitutions, changes, and equivalents may occur to those of ordinary skill in the art. It is, therefore, to be understood that the appended claims are intended to cover all such modifications and changes in form and details as fall within the true spirit of the invention.


In the above description, an embodiment is an example or implementation of the inventions. The various appearances of “one embodiment,” “an embodiment” or “some embodiments” do not necessarily all refer to the same embodiments.


Although various features of the invention may be described in the context of a single embodiment, the features may also be provided separately or in any suitable combination. Conversely, although the invention may be described herein in the context of separate embodiments for clarity, the invention may also be implemented in a single embodiment.


Reference in the specification to “some embodiments”, “an embodiment”, “one embodiment” or “other embodiments” means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least some embodiments, but not necessarily all embodiments, of the inventions.


It is to be understood that the phraseology and terminology employed herein are not to be construed as limiting and are for descriptive purposes only.


The principles and uses of the teachings of the present invention may be better understood with reference to the accompanying description, figures and examples.


It is to be understood that the details set forth herein do not constitute a limitation to an application of the invention.


Furthermore, it is to be understood that the invention can be carried out or practiced in various ways and that the invention can be implemented in embodiments other than the ones outlined in the description above.


It is to be understood that the terms “including”, “comprising”, “consisting” and grammatical variants thereof do not preclude the addition of one or more components, features, steps, or integers or groups thereof and that the terms are to be construed as specifying components, features, steps or integers.


If the specification or claims refer to “an additional” element, that does not preclude there being more than one of the additional element.


It is to be understood that where the claims or specification refer to “a” or “an” element, such reference is not to be construed to mean that there is only one of that element.


It is to be understood that where the specification states that a component, feature, structure, or characteristic “may”, “might”, “can” or “could” be included, that particular component, feature, structure, or characteristic is not required to be included.


Where applicable, although state diagrams, flow diagrams or both may be used to describe embodiments, the invention is not limited to those diagrams or to the corresponding descriptions. For example, flow need not move through each illustrated box or state, or in exactly the same order as illustrated and described.


Methods of the present invention may be implemented by performing or completing manually, automatically, or a combination thereof, selected steps or tasks.


The descriptions, examples, methods and materials presented in the claims and the specification are not to be construed as limiting but rather as illustrative only.


Technical and scientific terms used herein are to be understood as they commonly would be by one of ordinary skill in the art to which the invention belongs, unless otherwise defined. The present invention may be implemented in the testing or practice with methods and materials equivalent or similar to those described herein.


While the invention has been described with respect to a limited number of embodiments, these should not be construed as limitations on the scope of the invention, but rather as exemplifications of some of the preferred embodiments. Other possible variations, modifications, and applications are also within the scope of the invention. Accordingly, the scope of the invention should not be limited by what has thus far been described, but by the appended claims and their legal equivalents.

Claims
  • 1. A system for securely searching data in a semi-trusted environment, the system comprising: a first data provider comprising one or more first processors configured to: map a first data element comprising secret data to a first plurality of tokens by searching a codebook storing unencrypted tokens and associated mappings to represent data elements, wherein searching the codebook associates the first plurality of tokens that match the same or substantially similar information as the first data element; concatenate the first plurality of tokens to generate a first token signature comprising the first plurality of tokens that uniquely represents the first data element, and homomorphically encrypt the first token signature using a public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element; a second data provider comprising one or more second processors configured to: map a second data element to a second plurality of tokens by searching the token codebook, wherein searching the codebook associates the second plurality of tokens that match the same or substantially similar information as the second data element, and wherein one of the first and second data elements is a search query and the other is target data being searched, concatenate the second plurality of tokens to generate a second token signature comprising the second plurality of tokens that uniquely represent the second data element, and compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison of the first and second token signatures; and a trusted party comprising one or more processors configured to: decrypt the homomorphically encrypted comparison, using a private homomorphic decryption key, to determine that the first and second token signatures match or not respectively indicating that the search query is found or not in the target data.
  • 2. The system of claim 1, wherein the first data element is the search query and the second data element is the target data to be searched, or the second data element is the search query and the first data element is the target data to be searched.
  • 3. The system of claim 1, wherein the first data provider operates in a trusted environment, the second data provider operates in a trusted or semi-trusted environment, and the trusted party operates in a trusted environment.
  • 4. The system of claim 1, wherein the trusted party is the first data provider, the second data provider, or a distinct third party system.
  • 5. The system of claim 1, wherein the one or more first and second processors of the first and second data providers are configured to map the first and second data elements, respectively, to a plurality of tokens, comprising for each data element: divide the data element into one or more atomic data units, search the codebook for a plurality of tokens matching each instance of each atomic data unit, and generate an ordered set of the plurality of tokens for the plurality of atomic units.
  • 6. The system of claim 1, wherein the codebook of tokens is dynamically updated by adding new tokens to the codebook and deleting preexisting tokens from the codebook, wherein the updated codebook is simultaneously available to both the first and second data providers.
  • 7. The system of claim 1, wherein the comparison of the first and second token signatures is a binary indication of whether or not the search query is found in the target data.
  • 8. The system of claim 1, wherein the comparison of the first and second token signatures is a matching score indicating a frequency or certainty with which the search query is found in the target data.
  • 9. A first data provider for securely searching data in a semi-trusted environment, the first data provider comprising: one or more memories configured to store a first data element comprising secret data, a codebook of unencrypted tokens and associated mappings to represent data elements, and a public homomorphic encryption key; and one or more processors configured to: map the first data element comprising secret data to a first plurality of tokens by searching the token codebook, wherein searching the token codebook associates the first plurality of tokens that match the same or substantially similar information as the first data element; concatenate the first plurality of tokens to generate a first token signature comprising the first plurality of tokens that uniquely represents the first data element; homomorphically encrypt the first token signature using the public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element; transmit the homomorphically encrypted first token signature to a second data provider to compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature representing a second data element and generate a homomorphically encrypted comparison of the first and second token signatures, wherein one of the first and second data elements is a search query and the other is target data being searched; and receive the result of decrypting the homomorphically encrypted comparison at a trusted device, using a private homomorphic decryption key, to determine that the first and second token signatures match or not respectively indicating that the search query is found or not in the target data.
  • 10. The first data provider of claim 9, wherein the first data provider operates in a trusted environment, the second data provider operates in a trusted or semi-trusted environment, and the trusted device operates in a trusted environment.
  • 11. A second data provider for securely searching data in a semi-trusted environment, the second data provider comprising: one or more memories configured to store a second data element, and a codebook of unencrypted tokens and associated mappings to represent data elements; and one or more processors configured to: map the second data element to a second plurality of tokens by searching the token codebook, wherein searching the codebook associates the second plurality of tokens that match the same or substantially similar information as the second data element, concatenate the second plurality of tokens to generate a second token signature comprising the second plurality of tokens that uniquely represent the second data element, receive, from a first data provider, a homomorphically encrypted first token signature that is a homomorphic encryption of a concatenation of a first plurality of tokens uniquely representing a first data element comprising secret data according to the codebook of tokens, wherein one of the first and second data elements is a search query and the other is target data being searched, compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature to generate a homomorphically encrypted comparison of the first and second token signatures, and transmit the homomorphically encrypted comparison to a trusted device to decrypt the homomorphically encrypted comparison, using a private homomorphic decryption key, to determine that the first and second token signatures match or not respectively indicating that the search query is found or not in the target data.
  • 12. The second data provider of claim 11, wherein the one or more memories are further configured to store the public homomorphic encryption key to generate the homomorphically encrypted second token signature.
  • 13. The second data provider of claim 11, wherein the first data provider operates in a trusted environment, the second data provider operates in a trusted or semi-trusted environment, and the trusted device operates in a trusted environment.
  • 14. A method for securely searching data in a semi-trusted environment, the method comprising, at a first data provider: mapping a first data element comprising secret data to a first plurality of tokens by searching a codebook storing unencrypted tokens and associated mappings to represent data elements, wherein searching the codebook associates the first plurality of tokens that match the same or substantially similar information as the first data element; concatenating the first plurality of tokens to generate a first token signature comprising the first plurality of tokens that uniquely represents the first data element; homomorphically encrypting the first token signature using a public homomorphic encryption key to generate a homomorphically encrypted first token signature representing the first data element; transmitting the homomorphically encrypted first token signature to a second data provider to compare the homomorphically encrypted first token signature representing the first data element and an unencrypted or homomorphically encrypted second token signature representing a second data element and generate a homomorphically encrypted comparison of the first and second token signatures, wherein one of the first and second data elements is a search query and the other is target data being searched; receiving the result of decrypting the homomorphically encrypted comparison at a trusted device, using a private homomorphic decryption key, to determine that the first and second token signatures match or not respectively indicating that the search query is found or not in the target data.
  • 15. The method of claim 14, wherein the first data element is the search query and the second data element is the target data to be searched, or the second data element is the search query and the first data element is the target data to be searched.
  • 16. The method of claim 14, wherein the first data provider operates in a trusted environment, and the second data provider operates in a trusted or semi-trusted environment.
  • 17. The method of claim 14, wherein mapping the first element to the first plurality of tokens comprises: dividing the data element into one or more atomic data units; searching the codebook for a plurality of tokens matching each instance of each atomic data unit; and generating an ordered set of the plurality of tokens for the plurality of atomic units.
  • 18. The method of claim 14, further comprising dynamically updating the codebook of tokens by adding new tokens to the codebook and deleting preexisting tokens from the codebook, wherein the updated codebook is simultaneously available to both the first and second data providers.
  • 19. The method of claim 14, wherein the comparison of the first and second token signatures is a binary indication of whether or not the search query is found in the target data.
  • 20. The method of claim 14, wherein the comparison of the first and second token signatures is a matching score indicating a frequency or certainty with which the search query is found in the target data.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application Ser. No. 62/790,696, filed Jan. 10, 2019, which is hereby incorporated by reference in its entirety.

Related Publications (1)
Number Date Country
20200228308 A1 Jul 2020 US
Provisional Applications (1)
Number Date Country
62790696 Jan 2019 US