The subject matter of this application is related to the subject matter in a co-pending non-provisional application by Richard R. Wessman entitled, “Method and Apparatus for Automatic Database Encryption,” having Ser. No. 09/680,599, and filing date 6 Oct. 2000.
The present invention relates to database security. More specifically, the present invention relates to a method and an apparatus for transparently encrypting and decrypting data on a column-by-column basis within a database.
Database security is an important feature in many database systems. In database systems, security is often achieved by encrypting data within the database system. Currently, there are two primary approaches for encrypting data stored in a database system. The first approach can be characterized as “bulk encryption” and performs cryptographic operations on entire database files. The second approach selectively applies cryptographic operations to specific sensitive columns within a database.
Bulk encryption typically entails encryption of the entire database because sensitive data is not just stored inside a particular table. Sensitive data may also appear in other database objects. For example, sensitive data may appear in an index, in change records of undo and redo logs, and in temporary sorting areas. Since these database objects are designed to be shared by the entire database system, it is not practical to separate data within these database objects so that some data is encrypted and some is not.
While bulk encryption is relatively simple to implement and is transparent to an application accessing the database, it has significant drawbacks. Chief among these is degraded system performance: encrypting or decrypting an entire database file takes a long time. In such a system, a rekey operation can involve decrypting and then re-encrypting the entire database file. These operations can take a large amount of time, which makes this solution unfit for large on-line transaction processing deployments. Also, the security of the system can be compromised because database records are exposed in shared memory as plain text after they are decrypted from the files.
The second approach limits encryption to only the sensitive columns within the database, which can theoretically reduce the overhead involved in performing cryptographic operations. However, currently available systems that use this approach suffer from some major drawbacks. The encrypt and decrypt operations must be explicitly applied to every reference to an encrypted column. For example, an application desiring to issue a command to retrieve the credit card number of a customer whose social security number is ‘123456789’ might issue the command:
Therefore, in this second approach the encryption and decryption operations are not transparent to application developers, despite claims to the contrary by database system vendors. When a sensitive column is accessed, the encrypt or decrypt functions must explicitly be applied to the column data. To make such runtime function execution transparent to the user and secure, the application schema objects must be significantly altered. For example, a table with sensitive columns must be turned into a view in order to hide the cryptographic functions. This also means that the base object must be renamed, because views and tables are in the same name space and cannot share a name. Triggers need to be created so that an insert or update on the view causes the data in the base table to be encrypted implicitly. Moreover, index support is limited because the server can build an index only with encrypted data, which has lost its lexicographical order. This is so because encrypt and decrypt operations cannot be integrated with the index processing layers.
Hence, what is needed is a method and an apparatus for transparently encrypting and decrypting data on a column-by-column basis within a database system.
One embodiment of the present invention provides a system that facilitates encryption of data within a column of a database. The system operates by first receiving a command to perform a database operation. Next, the system parses the command to create a parse tree. The system then examines the parse tree to determine if a column referenced in the parse tree is an encrypted column. If a column referenced in the parse tree is an encrypted column, the system implicitly transforms the parse tree to include one or more cryptographic operations to facilitate accessing the encrypted column while performing the database operation.
In a variation of this embodiment, if the database operation includes a reference operation on the encrypted column, the system transforms the parse tree to decrypt data retrieved from the encrypted column during the reference operation to provide clear text.
In a further variation, if the command includes an update operation to the encrypted column, the system transforms the parse tree to encrypt data being updated in the encrypted column during the update operation to place encrypted data in the database.
In a further variation, if a column referenced in the parse tree is encrypted, the system identifies a cryptographic key for the column. The key is recovered only once for all accesses to the column for each command.
In a further variation, examining the parse tree involves determining if the user command is an explicit request to encrypt a presently unencrypted column in the database. If so, the system encrypts the column.
In a further variation, examining the parse tree involves determining if the user command is an explicit request to change an encryption key for a column. If so, the system decrypts the column with the current encryption key, and encrypts the column with a new encryption key.
In a further variation, examining the parse tree involves determining if the user command is an explicit request to decrypt an encrypted column in the database. If so, the system decrypts the column.
In a further variation, examining the parse tree involves determining if the user command is an explicit request to change an encryption algorithm for a column. If so, the system decrypts the column with a current encryption algorithm, and encrypts the column with a new encryption algorithm.
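By way of illustration only, the four explicit column maintenance requests described above (encrypt, change key, decrypt, change algorithm) can be sketched as follows. The `Column` class, its method names, and `toy_cipher` are hypothetical and do not appear in this disclosure; `toy_cipher` is an insecure XOR stream that merely stands in for a real algorithm, with the hash name playing the role of the algorithm identifier.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes, alg: str) -> bytes:
    """Toy XOR stream cipher (NOT secure). `alg` names the hash that drives
    the keystream and stands in for a real algorithm choice."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.new(alg, key + counter.to_bytes(4, "big")).digest()
        counter += 1
    # XOR is its own inverse, so the same call encrypts and decrypts.
    return bytes(d ^ s for d, s in zip(data, stream))


class Column:
    """Hypothetical in-memory column holding byte-string values."""

    def __init__(self, values):
        self.values = list(values)
        self.key = None   # None means the column is stored in plain text
        self.alg = None

    def encrypt(self, key: bytes, alg: str) -> None:
        if self.key is not None:
            raise ValueError("column is already encrypted")
        self.values = [toy_cipher(key, v, alg) for v in self.values]
        self.key, self.alg = key, alg

    def decrypt(self) -> None:
        if self.key is None:
            raise ValueError("column is not encrypted")
        self.values = [toy_cipher(self.key, v, self.alg) for v in self.values]
        self.key = self.alg = None

    def rekey(self, new_key: bytes) -> None:
        # Decrypt with the current key, then encrypt with the new key.
        alg = self.alg
        self.decrypt()
        self.encrypt(new_key, alg)

    def change_algorithm(self, new_alg: str) -> None:
        # Decrypt with the current algorithm, then encrypt with the new one.
        key = self.key
        self.decrypt()
        self.encrypt(key, new_alg)
```

In this sketch the key and algorithm changes are both expressed as a decrypt followed by an encrypt, mirroring the variations described above; a real server would perform these steps against stored column data rather than an in-memory list.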
The following description is presented to enable any person skilled in the art to make and use the invention, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present invention. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
The data structures and code described in this detailed description are typically stored on a computer readable storage medium, which may be any device or medium that can store code and/or data for use by a computer system. This includes, but is not limited to, magnetic and optical storage devices such as disk drives, magnetic tape, CDs (compact discs) and DVDs (digital versatile discs or digital video discs), and computer instruction signals embodied in a transmission medium (with or without a carrier wave upon which the signals are modulated). For example, the transmission medium may include a communications network, such as the Internet.
Database System
Server 104 can generally include any computational node including a mechanism for servicing requests from a client for computational and/or data storage resources. Server 104 communicates with one or more clients and provides services to each client. This communication is typically across a network (not shown) such as the Internet or a corporate intranet. Server 104 may be implemented as a cluster of servers acting in concert to supply computational and database services.
Database 106 can include any type of system for storing data in non-volatile storage. This includes, but is not limited to, systems based upon magnetic, optical, and magneto-optical storage devices, as well as storage devices based on flash memory and/or battery-backed memory. Database 106 can be directly coupled to server 104 or can be accessed across a network such as a corporate intranet or the Internet.
During operation, client 102 sends database commands to server 104. These commands are typically in a database language such as structured query language (SQL) and can include reference and update operations on database 106. If any of these reference or update operations include operations on encrypted columns, the operations are processed as described below in conjunction with
Server
Command parser 204 parses the command into the individual elements (operands, operators, etc.) that comprise the command. Command parsing is well known in the art and will not be discussed further in this description.
Command transformer 206 examines the parsed elements of the command to locate any reference or update operations related to encrypted columns within database 106. Upon locating a reference or update operation related to an encrypted column, command transformer 206 transforms the operation to include the necessary cryptographic operations to access the encrypted column. These transforming operations are described in detail in conjunction with
Cryptographic unit 208 performs cryptographic operations such as key management, encryption, and decryption. Any of a large number of standard key management systems can be used with this system. Encryption and decryption can be performed using any acceptable algorithm, such as the data encryption standard (DES), triple DES, or the advanced encryption standard (AES). Additionally, these encryption algorithms can be combined with integrity techniques such as secure hash algorithm 1 (SHA-1) or message digest 5 (MD5).
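As a rough sketch of combining encryption with an integrity technique, the following example appends a SHA-1 digest to the plain text before encryption and verifies it after decryption. The function names `seal` and `unseal` and the `toy_cipher` helper are hypothetical; `toy_cipher` is an insecure XOR stream used only because the Python standard library does not provide DES or AES, and the digest-then-encrypt layout shown is one possible arrangement rather than the arrangement used by cryptographic unit 208.

```python
import hashlib
import hmac

DIGEST_LEN = hashlib.sha1().digest_size  # 20 bytes for SHA-1


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for DES/AES (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt the plain text together with its SHA-1 digest."""
    digest = hashlib.sha1(plaintext).digest()
    return toy_cipher(key, plaintext + digest)


def unseal(key: bytes, ciphertext: bytes) -> bytes:
    """Decrypt, then reject the value if the stored digest does not match."""
    blob = toy_cipher(key, ciphertext)
    plaintext, digest = blob[:-DIGEST_LEN], blob[-DIGEST_LEN:]
    if not hmac.compare_digest(hashlib.sha1(plaintext).digest(), digest):
        raise ValueError("integrity check failed")
    return plaintext
```

For example, `unseal(key, seal(key, value))` returns `value`, while altering a byte of the sealed value causes `unseal` to raise `ValueError`.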
Database interface 210 includes mechanisms for accessing database 106. These accessing operations can include retrieving data from database 106 and storing or updating data within database 106. Note that transformation of a command and execution of the command may not happen in the same sequence of events. Execution of the command may happen at a later time.
Transforming Database Operations
After locating a referenced column, the system determines if the column is encrypted (step 308). If so, the system transforms the operation on this encrypted column to include cryptographic operations (step 310). Note that this transforming process is transparent to the user.
If the column is not encrypted at step 308 or after transforming the command to include cryptographic operations at step 310, the system performs the operations specified in the command thereby completing the command (step 312). Note that transformation of a command and execution of the command may not happen in the same sequence of events. Execution of the command may happen at a later time.
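The check-and-transform flow of steps 308 and 310 can be sketched, under simplifying assumptions, as a recursive rewrite of a parse tree. The tuple-based tree layout, the `ENCRYPTED_COLUMNS` metadata dictionary, and the node labels `DECRYPT` and `GET_KEY` as used here are hypothetical illustrations, not the server's internal representation.

```python
# Hypothetical column metadata: only "credit_card" is marked as encrypted.
ENCRYPTED_COLUMNS = {
    "credit_card": {"alg_id": "AES", "key_id": "ck-1"},
}


def transform(node):
    """Return a copy of the parse tree in which every reference to an
    encrypted column is wrapped in a DECRYPT node."""
    if not isinstance(node, tuple):
        return node  # a literal such as '123456789'
    if node[0] == "col":
        meta = ENCRYPTED_COLUMNS.get(node[1])
        if meta is None:
            return node  # step 308: column is not encrypted, leave as-is
        # Step 310: add the cryptographic operation around the reference.
        return ("DECRYPT", node, meta["alg_id"], ("GET_KEY", meta["key_id"]))
    # Any other operator: keep the operator, transform its operands.
    return (node[0],) + tuple(transform(child) for child in node[1:])


# A hypothetical parsed query: select credit_card where ssn = '123456789'
tree = ("select", ("col", "credit_card"), ("=", ("col", "ssn"), "123456789"))
transformed = transform(tree)
```

The application that issued the query never sees the `DECRYPT` node; it is added after parsing, which is what makes the transformation transparent in this sketch.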
Parse Trees
After the DO has decrypted the “sal” column and the results have been multiplied by 1.01 by the “*” operator, the encryption operator “EO” encrypts the results and passes the results back to the database for storage in the “sal” column. The inputs to the EO operator are the results of the “*” operator, the algorithm identifier “alg_id” and the results of the “GK” operator. Note that since the key has not changed, the results of the “GK” operator for the decryption are shared with the EO for encryption. In fact, the “GK” operator is invoked only once to update the entire “sal” column.
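The evaluation order described above (decrypt, multiply, encrypt, with a single key retrieval shared by both cryptographic operators) might be sketched as follows. The `GetKey` class, the function names, and the `toy_cipher` helper are hypothetical; `toy_cipher` is an insecure XOR stream that ignores the algorithm identifier and only stands in for a real cipher.

```python
import hashlib
from decimal import Decimal


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for a real algorithm (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class GetKey:
    """Stand-in for the "GK" operator; counts how often it is evaluated."""

    def __init__(self, key_store, key_id):
        self.key_store, self.key_id = key_store, key_id
        self.calls = 0

    def __call__(self) -> bytes:
        self.calls += 1
        return self.key_store[self.key_id]


def decrypt_op(ciphertext: bytes, alg_id: str, key: bytes) -> bytes:  # "DO"
    return toy_cipher(key, ciphertext)


def encrypt_op(plaintext: bytes, alg_id: str, key: bytes) -> bytes:   # "EO"
    return toy_cipher(key, plaintext)


def update_sal(encrypted_sal, alg_id, gk):
    """Apply sal = sal * 1.01 to every row of an encrypted column."""
    key = gk()  # evaluated once; the result is shared by DO and EO
    updated = []
    for ciphertext in encrypted_sal:
        sal = Decimal(decrypt_op(ciphertext, alg_id, key).decode())
        raised = sal * Decimal("1.01")
        updated.append(encrypt_op(str(raised).encode(), alg_id, key))
    return updated
```

After a call to `update_sal` over any number of rows, `gk.calls` is 1, reflecting that the key is retrieved once for the whole column update rather than once per row.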
Commands Involving Cryptography for a Referenced Column
As an example, if the command is a command to change the encryption key, such as:
The present invention provides encryption of data at the granularity of a column or column attribute (in the case of an Object database). This encryption is transparent to the applications that access the encrypted columns within the database.
Instead of relying on built-in or user defined encrypt and decrypt functions, the secrecy of a column is supported as part of the column properties. Like any other column properties, such as constraints or data type, the cryptographic characteristics of the column can be defined and altered at any time using data definition language (DDL) commands. The following are examples of typical administrative tasks that define and alter encrypted column properties.
The sensitive data can be re-encrypted with a different encryption algorithm using a statement such as:
If a decision is made to make the encrypted data available in plaintext instead, the following command can be used:
A column can also be declared as encrypted when a table is created. The following DDL command gives an example of encrypting the SSN and salary fields of an employee table during creation of the employee table:
When a column is specified as encrypted, all data in that column is encrypted with a column encryption key. This key is wrapped by one or more master keys before being stored in the server's metadata table. Retrieval of the master keys depends upon the selected key management scheme and the storage location of the master keys. This means that for any column cryptographic operation, the server must first find the master key and then use the master key to decrypt the encrypted column encryption key.
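The two-level key arrangement described above can be illustrated with a minimal sketch. The dictionaries `master_keys` and `metadata`, the function names, and the `toy_cipher` helper are all assumptions made for illustration; `toy_cipher` is an insecure XOR stream, whereas a real system would use a proper key-wrapping algorithm.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for real key wrapping (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


# Hypothetical stores: master keys are held outside the database,
# while only wrapped column keys are kept in the server's metadata.
master_keys = {"mk-1": secrets.token_bytes(32)}
metadata = {}  # column name -> (master key id, wrapped column key)


def create_encrypted_column(name: str, master_key_id: str) -> bytes:
    """Generate a column encryption key and store it only in wrapped form."""
    column_key = secrets.token_bytes(32)
    wrapped = toy_cipher(master_keys[master_key_id], column_key)
    metadata[name] = (master_key_id, wrapped)
    return column_key


def recover_column_key(name: str) -> bytes:
    """Find the master key first, then use it to unwrap the column key."""
    master_key_id, wrapped = metadata[name]
    return toy_cipher(master_keys[master_key_id], wrapped)
```

Because the metadata holds only the wrapped key, reading the metadata table alone does not reveal the column encryption key in this sketch.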
Runtime support of transparent cryptographic operations is based on the introduction of three internal operators on the server. They are (1) column encryption key retrieval, (2) encrypt data, and (3) decrypt data.
The key retrieval operator has arguments to accept the column encryption key identity, the master key identity, and the key management type. This operator returns the column encryption key in plain text.
The encrypt and decrypt operators have arguments for identifying the encrypted column data, the column encryption key, and the encryption algorithm identity. The key retrieval operator is separated from the encrypt/decrypt operators because it is desirable for the key retrieval operator to be evaluated only once per statement execution.
At statement parse time, a decrypt operator can be implicitly added around column attributes, which will receive encrypted data from the server. After the transformation, a typical reference of a column alone in an expression is equivalent to the following as if the decrypt function was explicitly applied.
Note that the arguments to the DECRYPT and ENCRYPT commands, except for the encrypted column data, are known at parse time because these values are included in the metadata that is maintained whenever a DDL command affects the encrypted column. These arguments are part of the statement context in the shared memory. However, these arguments reveal no sensitive information. The encryption key itself is retrieved only once at execution time and will appear only in the user session's per-execution memory. The algorithm for key encryption can be a system-wide configurable parameter, or it can be supplied as an optional argument added to the GET_KEY command.
At execution time, because of the implicit transformation on the statement context as described above, the encrypted data is decrypted before an expression evaluation. The plain text is encrypted after expression evaluation for inserted and updated values going into the persistent store. This also guarantees that the column's native data type format is preserved during encryption and decryption. Therefore, existing implementations for expression evaluation are not affected.
For example, assume that the salary column “sal” in the employee table is encrypted. The following update statement for a pay raise:
The key retrieval operator is capable of supporting multiple key management schemes where every master key or column encryption key has its own identity, which is universally unique. Note that the column encryption key protecting a particular column can have multiple copies, with each copy being wrapped by a different master key.
The following is an example showing the flexibility of the system. Assume that the system has a key management type identified by the variable “SERVER_HELD.” In this scheme, all of the column encryption keys are wrapped by a single master key kept in the server's wallet. The administrator may use the wallet manager to generate any number of master keys. The server, however, will pick only one for the database when a SQL command such as:
Here, my_db_ms_key is the key's external name. The server also creates a universally unique identity associated with the key. The server remembers only the current master key identity, while the key itself remains in the wallet. The adoption of a master key may also take place at database creation time because the wallet manager is not part of the database. Note that the above command also entails a re-key of all the column encryption keys in the database. The encrypted column data is not affected, however.
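The effect of adopting a new master key, namely that the wrapped column encryption keys are re-keyed while the encrypted column data is left untouched, can be sketched as follows. The function `adopt_master_key`, the dictionary layout, and the `toy_cipher` helper are hypothetical; `toy_cipher` is an insecure XOR stream used purely for illustration.

```python
import hashlib
import secrets


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for real key wrapping (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def adopt_master_key(wallet, wrapped_column_keys, current_id, new_id):
    """Re-wrap every column encryption key under the new master key.

    Only the wrapped keys change; the column keys themselves, and therefore
    the encrypted column data, stay exactly as they were.
    """
    old_master, new_master = wallet[current_id], wallet[new_id]
    for column, wrapped in list(wrapped_column_keys.items()):
        column_key = toy_cipher(old_master, wrapped)       # unwrap
        wrapped_column_keys[column] = toy_cipher(new_master, column_key)
    return new_id  # the server remembers only the current master key identity


# Hypothetical wallet holding two master keys generated by the administrator.
wallet = {"mk-old": secrets.token_bytes(32), "mk-new": secrets.token_bytes(32)}
sal_key = secrets.token_bytes(32)
wrapped_keys = {"sal": toy_cipher(wallet["mk-old"], sal_key)}
sal_data = toy_cipher(sal_key, b"1000")  # encrypted column data, never rewritten

current_master_id = adopt_master_key(wallet, wrapped_keys, "mk-old", "mk-new")
```

The cost of this operation scales with the number of encrypted columns rather than with the volume of data, since `sal_data` is never read or rewritten.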
When the server generates a new column encryption key or replaces an old column encryption key as a result of one of the DDLs which manage the encrypted column, the column encryption key is wrapped by the server master key. The column encryption key ID, the master key ID, and the encrypted column key information are used as parameters for the key retrieval operator GET_KEY as described above. Based on the “SERVER_HELD” key management type, the operator is able to find the server master key in the wallet through the wallet application programming interface (API) and thereby recover the plain text column encryption key used for both encrypt and decrypt operations.
Clearly, the logic of the key retrieval operator is driven by the key management type. The key retrieval always sees the column encryption key identity and the master key identity. New key management schemes can easily be plugged into the system without affecting the implementation of the transparent data conversion between clear text and cipher text. With these universally unique identities, the keys can be stored anywhere as long as the operator can find them at runtime. DDL commands may need to be enhanced or new DDL commands may need to be added to support different key management types.
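One way to picture a key retrieval operator whose logic is driven by the key management type is a registry of handlers keyed by type name, sketched below. The registry `KEY_MANAGERS`, the decorator `key_manager`, the stores `WALLET` and `WRAPPED_KEYS`, and the `toy_cipher` helper are all hypothetical; only the type name "SERVER_HELD" and the operator's three arguments are taken from the description above.

```python
import hashlib


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Toy XOR stream cipher standing in for real key wrapping (NOT secure)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


KEY_MANAGERS = {}  # key management type -> handler function


def key_manager(kind: str):
    """Register a handler for one key management type."""
    def register(fn):
        KEY_MANAGERS[kind] = fn
        return fn
    return register


# Hypothetical stores for the "SERVER_HELD" scheme.
WALLET = {"mk-1": b"master-key-material"}
WRAPPED_KEYS = {"ck-1": toy_cipher(WALLET["mk-1"], b"column-key-material")}


@key_manager("SERVER_HELD")
def _server_held(column_key_id: str, master_key_id: str) -> bytes:
    # Find the master key in the wallet and unwrap the column key with it.
    return toy_cipher(WALLET[master_key_id], WRAPPED_KEYS[column_key_id])


def get_key(column_key_id: str, master_key_id: str, key_mgmt_type: str) -> bytes:
    """Return the plain text column encryption key for the given identities."""
    try:
        handler = KEY_MANAGERS[key_mgmt_type]
    except KeyError:
        raise ValueError(f"unknown key management type: {key_mgmt_type}") from None
    return handler(column_key_id, master_key_id)
```

Under this arrangement, supporting another key management scheme amounts to registering one more handler; the encrypt and decrypt operators that consume the returned key are unchanged.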
The foregoing descriptions of embodiments of the present invention have been presented for purposes of illustration and description only. They are not intended to be exhaustive or to limit the present invention to the forms disclosed. Accordingly, many modifications and variations will be apparent to practitioners skilled in the art. Additionally, the above disclosure is not intended to limit the present invention. The scope of the present invention is defined by the appended claims.
Number | Date | Country |
---|---|---|
20040255133 A1 | Dec 2004 | US |