The present invention relates to data security, and more particularly to data anonymization.
There are many situations in which sensitive data needs to be obfuscated. Usually, some aspects of the data must be hidden while other aspects remain visible. For example, data items consisting of national identifiers (IDs) and user addresses may need to be available to a user, while the mapping between the IDs and the addresses must stay hidden. Conventional data security techniques encode a primary key, a foreign key, or both. Before encoding, the conventional techniques may physically order the data by the primary key and the foreign key.
In one embodiment, the present invention provides a computer-implemented method of encrypting data. The method includes encrypting, by one or more processors and using an encryption function, values of keys in a database table ordered by the keys in a relational database management system. The keys are primary keys in a first database table or foreign keys in a second database table. The method further includes determining, by the one or more processors, that the encryption function is homomorphic to sorting operators. The method further includes determining, by the one or more processors, that a decryption function that decrypts the encrypted keys in the database table is homomorphic to sorting operators. The method further includes, in response to the encryption and decryption functions being determined to be homomorphic, selecting, by the one or more processors, a merge join operation. The merge join operation operates on the first and second database tables and includes the decryption function in a joining condition. The method further includes optimizing, by the one or more processors and using the selected merge join operation, an execution of a query that accesses one or more data items in the first or second database table.
In another embodiment, the present invention provides a computer program product for encrypting data. The computer program product includes a computer readable storage medium. Computer readable program code is stored in the computer readable storage medium. The computer readable storage medium is not a transitory signal per se. The computer readable program code is executed by a central processing unit (CPU) of a computer system to cause the computer system to perform a method. The method includes the computer system encrypting, using an encryption function, values of keys in a database table ordered by the keys in a relational database management system. The keys are primary keys in a first database table or foreign keys in a second database table. The method further includes the computer system determining that the encryption function is homomorphic to sorting operators. The method further includes the computer system determining that a decryption function that decrypts the encrypted keys in the database table is homomorphic to sorting operators. The method further includes, in response to the encryption and decryption functions being determined to be homomorphic, the computer system selecting a merge join operation. The merge join operation operates on the first and second database tables and includes the decryption function in a joining condition. The method further includes the computer system optimizing, using the selected merge join operation, an execution of a query that accesses one or more data items in the first or second database table.
In another embodiment, the present invention provides a computer system including a central processing unit (CPU); a memory coupled to the CPU; and a computer readable storage medium coupled to the CPU. The computer readable storage medium contains instructions that are executed by the CPU via the memory to implement a method of encrypting data. The method includes the computer system encrypting, using an encryption function, values of keys in a database table ordered by the keys in a relational database management system. The keys are primary keys in a first database table or foreign keys in a second database table. The method further includes the computer system determining that the encryption function is homomorphic to sorting operators. The method further includes the computer system determining that a decryption function that decrypts the encrypted keys in the database table is homomorphic to sorting operators. The method further includes, in response to the encryption and decryption functions being determined to be homomorphic, the computer system selecting a merge join operation that operates on the first and second database tables and that includes the decryption function in a joining condition. The method further includes the computer system optimizing, using the selected merge join operation, an execution of a query that accesses one or more data items in the first or second database table.
A conventional data security technique that physically orders data in two tables by primary key and foreign key and subsequently encodes the primary key or foreign key column typically does not change the physical order of the data. Because the encoding function is not homomorphic to sorting operators, however, the encoded values no longer follow that physical order. In this case, the SQL engine is no longer able to perform a merge join on the encoded columns, which degrades the performance of queries that join the two tables.
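To illustrate why a non-homomorphic encoding blocks a merge join, the following minimal sketch encodes a key column that is already physically sorted, once with XOR against a secret and once by adding the secret. XOR is only an arbitrary stand-in for a non-order-preserving encoding (it is not taken from the source), and the key values and secret are illustrative. The physical order is the same in both cases, but only the additive encoding leaves the column sorted by value.

```python
# Hypothetical key values, physically ordered by key as in a clustered table.
keys = [3, 5, 8, 12, 20, 33]

SECRET = 0b101101  # illustrative secret (45), not from the source


def xor_encode(x: int, secret: int = SECRET) -> int:
    """Encoding that is NOT homomorphic to sorting: XOR scrambles value order."""
    return x ^ secret


def shift_encode(x: int, secret: int = SECRET) -> int:
    """Encoding that IS homomorphic to sorting: adding a constant keeps order."""
    return x + secret


def is_sorted(values: list) -> bool:
    return all(a <= b for a, b in zip(values, values[1:]))


# Encode in place: the rows keep their physical positions.
xor_column = [xor_encode(k) for k in keys]
shift_column = [shift_encode(k) for k in keys]

print(is_sorted(xor_column))    # False: a merge join would first need a sort
print(is_sorted(shift_column))  # True: the physical order is still usable
```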
Embodiments of the present invention address the performance impact of the conventional data security techniques by using an encryption algorithm that is homomorphic to sorting (i.e., comparison) operators, so that the encrypted values keep the relative order of the original values. Making the database engine aware that the encryption is homomorphic allows an optimizer to choose a merge join for queries that have a decryption function in the joining condition. For example, by using the novel techniques disclosed herein, the following query can be joined by a merge join algorithm (i.e., merge join operator):
SELECT * FROM table1 t1 JOIN table2 t2 ON decryptF1(t1.id, secret1) = decryptF2(t2.id, secret2), where decryptF1( ) and decryptF2( ) are user-defined functions (UDFs) that decrypt the values of the respective keys.
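The effect of such a query can be sketched outside a database engine. The following minimal sketch assumes that both UDFs invert a simple order-preserving shift cipher (x + secret); decrypt_f1, decrypt_f2, the secrets, and the rows are hypothetical stand-ins, and merge_join is a textbook sort-merge join, not engine code. Because decryption preserves order, inputs stored in encrypted-key order are already sorted by the decrypted join key, so the merge can run without an extra sort.

```python
from typing import Callable, Iterator, Sequence, Tuple

Row = Tuple[int, str]  # (encrypted key, payload)


def decrypt_f1(value: int, secret: int) -> int:
    """Hypothetical UDF: inverse of the order-preserving cipher x + secret."""
    return value - secret


def decrypt_f2(value: int, secret: int) -> int:
    """Hypothetical UDF for the second table, with its own secret."""
    return value - secret


def merge_join(
    left: Sequence[Row],
    right: Sequence[Row],
    left_key: Callable[[Row], int],
    right_key: Callable[[Row], int],
) -> Iterator[Tuple[Row, Row]]:
    """Merge join of two inputs already sorted by left_key / right_key."""
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left_key(left[i]), right_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key.
            j_end = j
            while j_end < len(right) and right_key(right[j_end]) == lk:
                j_end += 1
            # Pair every left row with this key against that run.
            while i < len(left) and left_key(left[i]) == lk:
                for r in right[j:j_end]:
                    yield left[i], r
                i += 1
            j = j_end


SECRET1, SECRET2 = 100, 7  # illustrative secrets

# Rows are physically ordered by the encrypted key; since decryption is
# order-preserving, they are also ordered by the decrypted key.
table1 = [(101, "a"), (102, "b"), (104, "c")]       # decrypted ids 1, 2, 4
table2 = [(8, "x"), (9, "y"), (9, "z"), (10, "w")]  # decrypted ids 1, 2, 2, 3

rows = list(merge_join(
    table1, table2,
    left_key=lambda row: decrypt_f1(row[0], SECRET1),
    right_key=lambda row: decrypt_f2(row[0], SECRET2),
))
print([(lt[1], rt[1]) for lt, rt in rows])  # [('a', 'x'), ('b', 'y'), ('b', 'z')]
```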
For database tables that are in a one-to-one relationship and that have encrypted pairs of foreign keys and primary keys, an attacker can reverse engineer the relation defined by the encrypted pairs: because the encryption preserves order, the k-th smallest encrypted primary key corresponds to the k-th smallest encrypted foreign key, so the attacker can link records without decrypting any key. In one or more embodiments, a noising system hinders or prevents such an attacker from reverse engineering the relation by (i) adding noise to values of the keys or (ii) noising a database table that has an encrypted key by adding extra duplicates of records that have fake keys.
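The rank-alignment attack on a one-to-one relationship can be illustrated with a short sketch. The two ciphers, the tables, and their contents are hypothetical; the only assumption that matters is that each key column is encrypted with some order-preserving function unknown to the attacker. Sorting both leaked tables by ciphertext and pairing rows by rank recovers the relation.

```python
import random


def encrypt_pk(x: int) -> int:
    """Hypothetical order-preserving cipher for the primary key (secret to the attacker)."""
    return 3 * x + 1000


def encrypt_fk(x: int) -> int:
    """A different hypothetical order-preserving cipher for the foreign key."""
    return 5 * x + 17


# Hypothetical one-to-one relationship: each person has exactly one passport.
persons = {10: "Alice", 20: "Bob", 30: "Carol"}      # pk -> name
passports = {10: "P-889", 20: "P-123", 30: "P-456"}  # fk -> passport number

# What the attacker sees: encrypted keys beside plaintext columns, rows shuffled.
leaked_persons = [(encrypt_pk(k), v) for k, v in persons.items()]
leaked_passports = [(encrypt_fk(k), v) for k, v in passports.items()]
random.shuffle(leaked_persons)
random.shuffle(leaked_passports)

# Attack: sort both tables by ciphertext and pair rows by rank.
recovered = {
    p[1]: q[1]
    for p, q in zip(sorted(leaked_persons), sorted(leaked_passports))
}
print(recovered)  # {'Alice': 'P-889', 'Bob': 'P-123', 'Carol': 'P-456'}
```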
Merge join enablement system 106 selects a merge join operation 116 that operates on first database table 108 and second database table 110. Merge join operation 116 includes decryption function 114 in a joining condition. A query optimizer 118 accesses and uses merge join operation 116 to optimize execution of a query 120 of RDBMS 104 that accesses data item(s) in first database table 108 and/or second database table 110.
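The selection step can be pictured as a planner rule. The following is a simplified sketch, not the optimizer of any real RDBMS: Udf, JoinInput, choose_join, and the function name decodeF are hypothetical, and a real query optimizer 118 would also weigh costs and could insert an explicit sort before a merge join. The rule only captures the idea that a merge join becomes available when both join inputs arrive sorted by the join key, which holds when each side is a sorted column wrapped in a function declared homomorphic to sorting operators.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Udf:
    """Hypothetical catalog entry for a user-defined function."""
    name: str
    order_preserving: bool  # declared homomorphic to sorting (comparison) operators


@dataclass(frozen=True)
class JoinInput:
    column: str
    sorted_by_column: bool     # e.g., table physically ordered by this key
    udf: Optional[Udf] = None  # function wrapped around the column in the ON clause


def delivers_sorted_keys(side: JoinInput) -> bool:
    # A sorted column stays sorted under an order-preserving function;
    # any other function may scramble the order.
    return side.sorted_by_column and (side.udf is None or side.udf.order_preserving)


def choose_join(left: JoinInput, right: JoinInput) -> str:
    """Simplified rule: merge join only when both inputs arrive sorted by the join key."""
    if delivers_sorted_keys(left) and delivers_sorted_keys(right):
        return "MERGE JOIN"
    return "HASH JOIN"


decrypt_f1 = Udf("decryptF1", order_preserving=True)
decrypt_f2 = Udf("decryptF2", order_preserving=True)
opaque = Udf("decodeF", order_preserving=False)

print(choose_join(JoinInput("t1.id", True, decrypt_f1), JoinInput("t2.id", True, decrypt_f2)))  # MERGE JOIN
print(choose_join(JoinInput("t1.id", True, opaque), JoinInput("t2.id", True, decrypt_f2)))      # HASH JOIN
```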
In one embodiment, merge join enablement system 106 includes noising system 122, which adds noise to values of keys in first database table 108 and/or second database table 110 or which adds noise to a database table by adding extra duplicates of records having fake keys into first database table 108 and/or second database table 110. The addition of noise by noising system 122 prevents an attacker from reverse engineering a relation defined by encrypted value pairs, where each pair includes a primary key value in first database table 108 and a foreign key value in second database table 110. In an alternate embodiment, merge join enablement system 106 does not include noising system 122 and merge join enablement system 106 does not perform the noising of the values of the keys or the noising of the database table.
In one embodiment, RDBMS 104 includes a SQL engine (not shown), which includes merge join enablement system 106, query optimizer 118, and a database that includes first database table 108 and second database table 110.
In one embodiment, merge join enablement system 106 includes query optimizer 118.
The functionality of the components shown in
In step 204, merge join enablement system 106 (see
In step 206, merge join enablement system 106 (see
In step 208, based on encryption function 112 (see
In step 210, using merge join operation 116 (see
After step 210, the process of
A challenge resulting from using the homomorphic encryption function 112 (see
In step 304, based on the relationship determined in step 302, merge join enablement system 106 (see
If merge join enablement system 106 (see
Returning to step 304, if merge join enablement system 106 (see
In step 308, merge join enablement system 106 (see
In one embodiment, noising system 122 (see
In step 310, using encryption function 112 (see
In step 312, merge join enablement system 106 (see
In step 314, merge join enablement system 106 (see
In step 316, based on encryption function 112 (see
In step 318, query optimizer 118 (see
After step 318, the process of
In one embodiment, the process of
In step 404, based on the relationship determined in step 402, merge join enablement system 106 (see
If merge join enablement system 106 (see
Returning to step 404, if merge join enablement system 106 (see
In step 408, merge join enablement system 106 (see
In step 410, merge join enablement system 106 (see
In one embodiment, noising system 122 (see
In step 412, using encryption function 112 (see
In step 414, merge join enablement system 106 (see
In step 416, merge join enablement system 106 (see
In step 418, based on encryption function 112 (see
In step 420, query optimizer 118 (see
After step 420, the process of
In one embodiment, because the values of the keys are encrypted, an unauthorized party cannot discover which records are fake and which records are genuine. During the retrieval or joining process, the data items that are in the fake records can be skipped based on a detection of the extra byte(s) that indicate the fake values of the keys.
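The source does not specify how the extra byte(s) are laid out, so the following sketch assumes one hypothetical marker byte appended as the lowest byte of each key before encryption (0x00 for genuine, 0x01 for fake). Placing it in the lowest byte keeps the ordering by the original key, so the column stays usable for a merge join. An authorized reader decrypts, skips rows whose marker says fake, and strips the byte; an attacker pairing rows by rank no longer gets a one-to-one alignment. Note that with the naive cipher x + key used here, a fake's ciphertext sits exactly 1 above its genuine twin, so this sketch shows only the mechanics of marking and skipping; keeping the fakes indistinguishable requires an encryption scheme that does not leak such differences.

```python
SECRET = 4242               # illustrative key for the naive cipher
GENUINE, FAKE = 0x00, 0x01  # hypothetical marker byte values


def encode(x: int, marker: int) -> int:
    # Append the marker as the lowest byte; ordering by x is unchanged.
    return (x << 8) | marker


def encrypt(v: int) -> int:
    return v + SECRET  # naive order-preserving cipher


def decrypt(c: int) -> int:
    return c - SECRET


def add_fake_duplicates(rows: list, dup_indexes: list) -> list:
    """rows: (plain key, payload) pairs. Returns encrypted rows plus fake
    duplicates, physically ordered by the encrypted key."""
    out = [(encrypt(encode(k, GENUINE)), p) for k, p in rows]
    for i in dup_indexes:
        k, p = rows[i]
        out.append((encrypt(encode(k, FAKE)), p))
    return sorted(out)


def genuine_rows(enc_rows: list):
    """Authorized reader: decrypt, skip rows marked FAKE, strip the marker byte."""
    for c, p in enc_rows:
        v = decrypt(c)
        if v & 0xFF == GENUINE:
            yield v >> 8, p


passports = [(10, "P-889"), (20, "P-123"), (30, "P-456")]
noised = add_fake_duplicates(passports, dup_indexes=[0, 2])

print(list(genuine_rows(noised)) == passports)  # True

# Rank-by-rank pairing against three person rows no longer lines up.
persons_sorted = ["Alice", "Bob", "Carol"]
naive_pairing = dict(zip(persons_sorted, (p for _, p in noised)))
print(naive_pairing)  # {'Alice': 'P-889', 'Bob': 'P-889', 'Carol': 'P-123'}
```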
In an alternate embodiment that selects the process of
SELECT birth_date, name as city_name FROM cities INNER JOIN citizens on id=city_id
Applying merge join operation 116 (see
e(x,key)=x+key,
where key is 7 in this example.
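The naive function can be checked directly. This sketch uses key 7 as in the example; the plaintext values are hypothetical, and d is simply the inverse of e (subtracting the key). The checks confirm that e preserves comparisons between any two values. The scheme is deliberately weak: differences between ciphertexts equal differences between plaintexts, and a single known plaintext/ciphertext pair reveals the key.

```python
KEY = 7  # the key used in the example


def e(x: int, key: int = KEY) -> int:
    """Naive encryption e(x, key) = x + key; homomorphic to <, =, >."""
    return x + key


def d(y: int, key: int = KEY) -> int:
    """Matching decryption: d(e(x, key), key) == x."""
    return y - key


city_ids = [1, 2, 3]  # hypothetical plaintext key values
encrypted = [e(x) for x in city_ids]
print(encrypted)  # [8, 9, 10]

# Order is preserved for every pair: a < b exactly when e(a) < e(b).
assert all((a < b) == (e(a) < e(b)) for a in range(20) for b in range(20))
assert all(d(e(x)) == x for x in city_ids)
```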
After encoding with the naïve homomorphic encryption function 112 (see
In this example, the decryption function 114 (see
SELECT birth_date, name as city_name FROM cities INNER JOIN citizens_d on d(city_id)=id.
n(x)=x*64+rand(64),
where rand(64) is a random value between 0 and 63 generated by a pseudo-random number generator or a hardware random number generator.
Given the noise function presented above, encryption function 112 (see
e1(x,key)=e(n(x),key)
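The noised encryption e1(x,key)=e(n(x),key) can be sketched as follows, again with the naive e(x,key)=x+key and key 7. The foreign key values are hypothetical, and the generator is seeded only so the demo is repeatable (secrets.randbelow(64) would be an obvious choice for an unpredictable rand(64)). Equal foreign keys, such as repeated city_id values, usually receive different ciphertexts, yet ordering by the original key survives: n(x) always lies in [64x, 64x+63], so the ranges for different keys never overlap.

```python
import random

KEY = 7  # key from the running example


def e(x: int, key: int = KEY) -> int:
    """Naive order-preserving cipher e(x, key) = x + key."""
    return x + key


def n(x: int, rng: random.Random) -> int:
    """Noise n(x) = x*64 + rand(64), where rand(64) is uniform in 0..63."""
    return x * 64 + rng.randrange(64)


def e1(x: int, rng: random.Random, key: int = KEY) -> int:
    """Noised encryption e1(x, key) = e(n(x), key)."""
    return e(n(x, rng), key)


rng = random.Random(0)         # seeded only so the demo is repeatable
city_ids = [1, 1, 1, 2, 2, 3]  # hypothetical foreign keys; city 1 repeats
noised = [e1(x, rng) for x in city_ids]
print(noised)

# Sorting by ciphertext still groups and orders rows by the original key.
print([(v - KEY) // 64 for v in sorted(noised)])  # [1, 1, 1, 2, 2, 3]
```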
Examples of the noised and encoded foreign key values are included in the city_id column in citizens database table 580 in
Given the noise function and the encryption function presented in the example relative to
d1(x,key)=dn(d(x,key)),
where d(x,key) is the decryption function discussed above
relative to
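dn(x)=floor(x/64), i.e., integer division by 64, which discards the random term rand(64) added by the noise function n(x)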
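The decryption d1(x,key)=dn(d(x,key)) can be verified with a short sketch under the same assumptions (naive cipher, key 7, non-negative keys; the sample keys and noise values are hypothetical). Every ciphertext that e1 can produce for a given key decrypts back to that key. d1 is non-decreasing rather than strictly increasing, which is still enough for a merge join: a column sorted by ciphertext is also sorted by d1 of that column.

```python
KEY = 7  # key from the running example


def d(y: int, key: int = KEY) -> int:
    """Inverse of the naive cipher e(x, key) = x + key."""
    return y - key


def dn(v: int) -> int:
    """Denoise: floor division by 64 discards the rand(64) term added by n(x)."""
    return v // 64


def d1(y: int, key: int = KEY) -> int:
    """d1(x, key) = dn(d(x, key))."""
    return dn(d(y, key))


# Every ciphertext e(n(5), KEY) can take (one per noise value r) decrypts to 5.
assert all(d1(5 * 64 + r + KEY) == 5 for r in range(64))

# d1 is non-decreasing, so a column sorted by ciphertext is also sorted by d1,
# which is what a merge join on a condition such as d1(city_id)=id relies on.
ciphertexts = sorted(x * 64 + r + KEY for x in (1, 2, 3) for r in (0, 31, 63))
keys = [d1(c) for c in ciphertexts]
print(keys)  # [1, 1, 1, 2, 2, 2, 3, 3, 3]
```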
Memory 604 includes a known computer readable storage medium, which is described below. In one embodiment, cache memory elements of memory 604 provide temporary storage of at least some program code (e.g., program code 614) in order to reduce the number of times code must be retrieved from bulk storage while instructions of the program code are executed. Moreover, similar to CPU 602, memory 604 may reside at a single physical location, including one or more types of data storage, or be distributed across a plurality of physical systems in various forms. Further, memory 604 can include data distributed across, for example, a local area network (LAN) or a wide area network (WAN).
I/O interface 606 includes any system for exchanging information to or from an external source. I/O devices 610 include any known type of external device, including a display, keyboard, etc. Bus 608 provides a communication link between each of the components in computer 102, and may include any type of transmission link, including electrical, optical, wireless, etc.
I/O interface 606 also allows computer 102 to store information (e.g., data or program instructions such as program code 614) on and retrieve the information from computer data storage unit 612 or another computer data storage unit (not shown). Computer data storage unit 612 includes a known computer readable storage medium, which is described below. In one embodiment, computer data storage unit 612 is a non-volatile data storage device, such as a magnetic disk drive (i.e., hard disk drive) or an optical disc drive (e.g., a CD-ROM drive which receives a CD-ROM disk).
Memory 604 and/or storage unit 612 may store computer program code 614 that includes instructions that are executed by CPU 602 via memory 604 to encrypt data for merge join enablement. Although
Further, memory 604 may include an operating system (not shown) and may include other systems not shown in
As will be appreciated by one skilled in the art, in a first embodiment, the present invention may be a method; in a second embodiment, the present invention may be a system; and in a third embodiment, the present invention may be a computer program product.
Any of the components of an embodiment of the present invention can be deployed, managed, serviced, etc. by a service provider that offers to deploy or integrate computing infrastructure with respect to encrypting data for merge join enablement. Thus, an embodiment of the present invention discloses a process for supporting computer infrastructure, where the process includes providing at least one support service for at least one of integrating, hosting, maintaining and deploying computer-readable code (e.g., program code 614) in a computer system (e.g., computer 102) including one or more processors (e.g., CPU 602), wherein the processor(s) carry out instructions contained in the code causing the computer system to encrypt data for merge join enablement. Another embodiment discloses a process for supporting computer infrastructure, where the process includes integrating computer-readable program code into a computer system including a processor. The step of integrating includes storing the program code in a computer-readable storage device of the computer system through use of the processor. The program code, upon being executed by the processor, implements a method of encrypting data for merge join enablement.
While it is understood that program code 614 for encrypting data for merge join enablement may be deployed by manually loading directly in client, server and proxy computers (not shown) via loading a computer readable storage medium (e.g., computer data storage unit 612), program code 614 may also be automatically or semi-automatically deployed into computer 102 by sending program code 614 to a central server or a group of central servers. Program code 614 is then downloaded into client computers (e.g., computer 102) that will execute program code 614. Alternatively, program code 614 is sent directly to the client computer via e-mail. Program code 614 is then either detached to a directory on the client computer or loaded into a directory on the client computer by a button on the e-mail that executes a program that detaches program code 614 into a directory. Another alternative is to send program code 614 directly to a directory on the client computer hard drive. In a case in which there are proxy servers, the process selects the proxy server code, determines on which computers to place the proxy servers' code, transmits the proxy server code, and then installs the proxy server code on the proxy computer. Program code 614 is transmitted to the proxy server and then it is stored on the proxy server.
Another embodiment of the invention provides a method that performs the process steps on a subscription, advertising and/or fee basis. That is, a service provider can offer to create, maintain, support, etc. a process of encrypting data for merge join enablement. In this case, the service provider can create, maintain, support, etc. a computer infrastructure that performs the process steps for one or more customers. In return, the service provider can receive payment from the customer(s) under a subscription and/or fee agreement, and/or the service provider can receive payment from the sale of advertising content to one or more third parties.
The present invention may be a system, a method, and/or a computer program product at any possible technical detail level of integration. The computer program product may include a computer readable storage medium (or media) (e.g., memory 604 and computer data storage unit 612) having computer readable program instructions 614 thereon for causing a processor (e.g., CPU 602) to carry out aspects of the present invention.
The computer readable storage medium can be a tangible device that can retain and store instructions (e.g., program code 614) for use by an instruction execution device (e.g., computer 102). The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
Computer readable program instructions (e.g., program code 614) described herein can be downloaded to respective computing/processing devices (e.g., computer 102) from a computer readable storage medium or to an external computer or external storage device (e.g., computer data storage unit 612) via a network (not shown), for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, switches, firewalls, gateway computers and/or edge servers. A network adapter card (not shown) or network interface (not shown) in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
Computer readable program instructions (e.g., program code 614) for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, configuration data for integrated circuitry, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like, and procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
Aspects of the present invention are described herein with reference to flowchart illustrations (e.g.,
These computer readable program instructions may be provided to a processor (e.g., CPU 602) of a general purpose computer, special purpose computer, or other programmable data processing apparatus (e.g., computer 102) to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium (e.g., computer data storage unit 612) that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions (e.g., program code 614) may also be loaded onto a computer (e.g. computer 102), other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the Figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts or carry out combinations of special purpose hardware and computer instructions.
While embodiments of the present invention have been described herein for purposes of illustration, many modifications and changes will become apparent to those skilled in the art. Accordingly, the appended claims are intended to encompass all such modifications and changes as fall within the true spirit and scope of this invention.