1. Field of the Invention
The present invention is generally related to digital data archiving systems and, in particular, to a system and methods of enabling the secure archiving and retrieval of digital data subject to access management and auditing controls.
2. Description of the Related Art
The desire and need for long term retention of personal and business data creates a complex set of problems that have not been adequately addressed to date. These problems are particularly acute for various business and scientific organizations that accumulate substantial volumes of data on a daily if not continuous basis and further expect to accumulate ever growing volumes going forward. Security concerns, particularly whenever personal data and critical business data are involved, and other factors, including regulatory and insurance requirements, impose significant complexities on the ongoing creation and maintenance of large scale data archives. Archives of comparably modest size are also subject to the same management requirements and thus encounter most if not all the same complexities.
Even beyond the complexities of organizing and controlling the ordered storage of large volumes of data, essentially arbitrary retrieval must be supported at any point within the lifetime of an archive. Particularly for business records, reliable access to archived data records is required for periods likely exceeding thirty years. Not only does the data need to be fully identifiable and recoverable, but the particular security concerns associated with particular data records, in place at the time of creation, need to be continuously maintained and enforced.
Given the size and structural diversity of business and scientific organizations, often reaching a global scope even when just considering data retention concerns, there are also fundamental requirements for archiving scalability and throughput performance. Where terabytes and more need to be archived in a matter of hours, organizations will typically implement automated tape library systems supporting the parallel striping of data to large tape drive arrays. Where the speed and capacity requirements outweigh cost issues, library systems utilizing disk drive arrays are commonly used.
Sophisticated, often proprietary backup application program and driver systems are used to manage these libraries. An inherent concern, however, is that if data security and retrievability are dependent on proprietary hardware or software, then that hardware and software must be maintainable for the full life of the archived data. A known but conventionally unmet desire is for archived data to be free of such storage system dependencies, yet without compromise of the data security originally employed by those systems in the creation of data archives.
Particularly in certain publishing, data mining, and similar industries, various segments of a data archive must be maintained readily accessible for analysis and other uses during the full lifetime of the archive. These types of data releases are often limited, if not precluded, due to the unavailability of automated mechanisms for auditing, authorizing, and securely controlling individual data release transactions.
Even where an archive access transaction is permitted, a related concern is securely controlling the scope of access permitted and keeping a clear and detailed audit trail of each access. Whenever a secure access key is released in some capacity to a third party, there are limited controls that prevent use of the key to access other data secured by the same key. Conventionally, security keys are periodically rotated to enforce a compartmentalization of the secure data. Key rotation, however, imposes an additional burden on the already complex problem of accurately and securely maintaining password keys for all of the data accumulated in a data archive. Given that many different entities, including owners of different data aspects, regulators, affiliates, licensees of divisible data rights, and various system operators, should have different and detailed access controls applied to their uses, conventional security systems are generally unable to define and maintain separate password keys for such fine grained access, even without achieving the further desire of supporting and enforcing key rotation.
Consequently, there is a fundamental need for a consistent data archiving, security, and auditing system that supports the creation and long term management of fundamentally portable data archives.
Thus, a general purpose of the present invention is to provide an efficient system and methods of creating and retrieving archive data in a secure, portable, and auditable manner.
This is achieved in the present invention by providing, on an archive server, a secure storage control layer interposed in the archive data stream between an archiving application and a storage device driver. The secure storage control layer includes an encryption engine providing for cipher processing of data segments transported by the stream. A secure policy controller is coupled to the secure storage control layer and, responsive to identifying information obtained from the stream, retrieves from a secure storage repository either a group of encryption keys, enabling the encryption engine to selectively encrypt data segments, or, preferably, a single encryption key, conditionally enabling the encryption engine to decrypt select data segments. For both encryption and decryption, the integrity of the stream is maintained, allowing operation of the secure storage control layer to be functionally transparent to the archiving application and storage device driver.
The two-level encryption is preferably implemented in the present invention in a process that operates on data units, each including a unit metadata header and a data segment, transferred as part of the archive data stream. For each of a series of archive data units, the process includes selecting a segment encryption key corresponding to a predetermined data unit, first encrypting the data segment of the predetermined data unit with the segment encryption key to produce an encrypted data segment, second encrypting the segment encryption key with each of a set of security control encryption keys and storing the segment encryption key, as encrypted, in a security metadata header, and packaging the unit metadata header, the security metadata header, and the encrypted data segment as a replacement data unit in the archive data stream.
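By way of a non-limiting illustration, the two-level encryption of a single data unit might be sketched as follows. The names `DataUnit`, `encrypt_unit`, and `toy_cipher`, and the dictionary layout of the security metadata header, are assumptions made for this sketch only. In particular, `toy_cipher` is a self-inverse XOR keystream used as a stand-in so the example runs with the standard library alone; it is not secure and is not the cipher of the described system.

```python
import hashlib
import secrets
from dataclasses import dataclass, field


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter keystream. Stand-in only, NOT secure.

    Applying the function twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        pad = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


@dataclass
class DataUnit:
    unit_header: dict   # unit metadata header, passed through unchanged
    segment: bytes      # data segment (clear text or encrypted)
    security_header: dict = field(default_factory=dict)  # hypothetical layout


def encrypt_unit(unit: DataUnit, control_keys: dict[str, bytes]) -> DataUnit:
    """Return a replacement data unit for the archive data stream."""
    # Select a segment encryption key for this data unit.
    segment_key = secrets.token_bytes(32)
    # First level: encrypt the data segment with the segment key.
    encrypted_segment = toy_cipher(segment_key, unit.segment)
    # Second level: encrypt the segment key once per security control key.
    wrapped = {name: toy_cipher(key, segment_key)
               for name, key in control_keys.items()}
    # Package the unit header, security header, and encrypted segment.
    return DataUnit(unit.unit_header, encrypted_segment,
                    {"wrapped_keys": wrapped})
```

Because the unit metadata header is carried through untouched and the segment length is preserved in this sketch, a downstream consumer that inspects only the unit header would see a structurally ordinary data unit.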
Access to the archive data is securely managed by selectively controlling the retrieval of any of the security control encryption keys that would allow decryption of the segment encryption key. For each of a series of archive data units, the process includes retrieving a security control encryption key from a secure repository, conditionally subject to a security policy that determines the user groups that may retrieve a corresponding security control encryption key, using the security control encryption key to decrypt from a security metadata header the corresponding segment encryption key, decrypting the corresponding encrypted data segment, and packaging the unit metadata header and the decrypted data segment as a replacement data unit in the archive data stream.
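The policy-conditioned retrieval of a security control encryption key might, purely as an illustrative sketch, look like the following. The class `SecureRepository`, its in-memory storage, and its method names are hypothetical and are not drawn from any actual repository interface; authentication of the requester is assumed to have happened elsewhere.

```python
class SecureRepository:
    """Hypothetical in-memory stand-in for a secure key repository."""

    def __init__(self) -> None:
        self._keys: dict[str, bytes] = {}
        self._allowed_groups: dict[str, set[str]] = {}

    def store(self, key_name: str, key: bytes, allowed_groups: set[str]) -> None:
        """Register a security control key and the groups that may retrieve it."""
        self._keys[key_name] = key
        self._allowed_groups[key_name] = set(allowed_groups)

    def revoke(self, key_name: str, group: str) -> None:
        """Remove a group from the policy; later retrievals by it are refused."""
        self._allowed_groups.get(key_name, set()).discard(group)

    def retrieve(self, key_name: str, user_group: str) -> bytes:
        """Return the key only if the policy admits the requesting group."""
        if user_group not in self._allowed_groups.get(key_name, set()):
            raise PermissionError(f"{user_group!r} may not retrieve {key_name!r}")
        return self._keys[key_name]
```

In this sketch, a refused retrieval leaves the requester without any key capable of decrypting the segment encryption key, so the encrypted data segment remains inaccessible.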
An advantage of the present invention is that archived data is reliably secured in a manner effectively transparent to the particular implementation of the archiving application and underlying archive driver and devices. Consequently, access, subject to long term maintenance of the archive data, can be assured. In addition, the security controls governing access to the archived data are flexible and allow for access by multiple security policy defined groups.
Another advantage of the present invention is that implementations of the present invention are readily adaptable to and support high performance, scalable data archiving system architectures. The security control driver layer as typically implemented by the present invention is easily installed and maintained in well-established conventional archiving system architectures. Once installed, subject to ordinary policy management maintenance, the operation of the present invention is very nearly if not fully automated.
A further advantage of the present invention is that the system supports and enforces security policy defined key management controls. Multiple security keys can be defined on an essentially per-storage-unit basis, allowing implementation of fine grained, cross-cutting concern security controls over access to the archived data. The policy defined key management controls also enable full key rotation for all keys automatically or by minimal, centralized management of the key policies.
Still another advantage of the present invention is that a variety of implementation architectures are supported, enabling use in a variety of configurations and controlled uses. The secure key repositories can be flexibly implemented as local and remote software-based modules or on a security control appliance. Access to archived data can be constrained to specific authenticated users or to defined user groups provided with a group authentication identifier. In the latter instance, an affiliate reader-only mode of use is supported, allowing a known generic group of users to securely access archive data, even though the specific identities of the users may not be known at the time of archive creation and do not subsequently require explicit user identification in the security policies to allow controlled access. Revocation of a user or group security policy identification effectively terminates all subsequent access to the archive data, thus ensuring continuing security control.
Yet another advantage of the present invention is that full auditing of archive data access is automatically supported through the required use of the secure key repositories. Each access of the repository to obtain an encryption key is subject to security policy evaluation and, concurrently, attempt and action logging by the repository server. This auditing allows comprehensive examination and management of the archive data use.
b provide state diagrams illustrating preferred processes of validating and enabling the encryption and decryption of content data segments in accordance with preferred embodiments of the present invention;
Given the volume of data conventionally required to be archived on a routine if not continuous basis, much of the architectural development of archiving systems has been directed to the development of fast, scalable, if not inherently large scale archive device libraries and correspondingly complex and frequently proprietary archiving control applications. Tape and disk libraries supporting terabytes of online storage and petabytes of robotically accessible, offline storage are not uncommon. The growth in archived data is generally matched by the increasing need to ensure future accessibility and secure control over those entities allowed to access the data.
Conventional archive data system architectures are generally of the form 10 shown in
A third-party archiving application 22, such as VERITAS NetBackup™, VERITAS Backup Exec™, Legato NetWorker™, CommVault® Galaxy™, IBM® Tivoli® Storage Manager, Computer Associates BrightStor®, and BakBone® NetVault™, is typically able to interface with one if not several of these de-facto standard tape library device drivers. These archiving applications 22, in various forms, support distributed agent modules 241-N that enable typically distributed client data systems 261-N to be accessed and transfer data for archiving to the host computer system 12. Data to be archived is typically collected and streamed over an internet or intranet network connection to the archive application 22.
As generally represented in
In accordance with the preferred embodiments of the present invention, referring again to
As generally illustrated in
Operation of the secure archive driver 28 is preferably controlled by a policy enforcement manager (PEM) 30. The underlying operation of the secure archive driver 28 is to selectively encrypt and decrypt the archive data stream transferred through the secure archive driver 28. The PEM 30 preferably operates to observe the transfer of data and qualify the ciphering operation of the secure archive driver 28, including as appropriate obtaining encryption keys from a secure repository server 32 for use by the secure archive driver 28, and to authenticate, directly or indirectly as available, the user or operator 54 of the archiving application 22. In the preferred embodiments of the present invention, the secure repository server 32 is used to store and qualify access to sets of encryption keys. The secure repository server 32 may be implemented on a remote server, as generally shown in
A clear text archive data stream 60, typically as presented to the secure archive driver 28 initially for processing, is illustrated in
In accordance with the present invention, an archive data stream 60 is modified to incorporate a security control identifier and to selectively encrypt the content segments 681-N. For the preferred embodiments of the present invention, the incorporation of the security control identifier is accomplished by including the identifier in an available session description field conventionally provided by the archive application 22. Typically, a session description field is an otherwise empty text field offered by the archive application 22 to allow an administrator to add a custom text string to describe the type or instance of the archive session. The archive application 22 directly transcribes this text string into an optionally used field within the archive session metadata header 62, or into each of the metadata headers 661-N, or both. Relative to the operation of the archive application, the text string is entirely non-functional in that the presence, absence, or content of the string has no effect on the operational function of the archive application 22; the content of the field is thus functionally transparent to the archive application 22. If an ordinary description field is not available, then any other functionally transparent field that occurs in the session metadata header 62 or in the metadata headers 661-N can be used. Alternately, if the archive application 22 is implemented in contemplation of use with the present invention, a dedicated field may be specifically provided, preferably in the session metadata header 62.
The security control identifier is preferably created by operation of the PEM 30. In the preferred embodiments, a GUI may be presented to the user 54 to assist in the creation of the identifier. Once created, the security control identifier is inserted into the chosen descriptive field within the session metadata header 62, as is preferred, or metadata headers 661-N, as received by the secure archive driver 28 from the archive application 22. As generally shown in
In the preferred embodiments, the individual archive units 641-N are processed by the secure archive driver 28 dependent on the security control identifier specified for the session that the archive units 641-N belong to and, optionally, the content source of the archive data contained in each of the archive units 641-N. Consequently, the system 10 implemented by the present invention is not only tolerant of, but fully supports, any interleaving of archive units 641-N belonging to different archive sessions by the archive application 22. Furthermore, the system 10 can potentially vary the security controls applied to the data being archived based on the particular source of the data, as defined in the metadata headers 661-N, typically in terms of a universal resource identifier (URI) or source filesystem.
The secure archive driver 28 preferably functions to encrypt and, optionally, compress the data contained in an archive unit 641-N. For example, considering an archive unit 641 as representative of the archive units 641-N, a content segment 681 is encrypted and replaced in the archive data stream 60 by the combination of an encryption metadata header 721 and encrypted content segment 741. For the preferred embodiments of the present invention, a symmetric encryption key is generated for the archive unit 641 and used to create the encrypted content segment 741. This symmetric key is then encrypted using the public encryption key members of a group of public key encryption key pairs. The multiple encrypted copies 761 (A-X) of the symmetric key for the encrypted content segment 741 are then stored in the encryption metadata header 721. The metadata header 661, encryption metadata header 721, and encrypted content segment 741 then constitute a replacement archive unit 641. The replacement archive units 641-N, including any selectively determined not to be processed, such as the archive unit 642, are substituted by the secure archive driver 28 to create the archive data stream 70.
In the preferred embodiments of the present invention, the archive units 641-N are discretely processed to accommodate the potential interleaving of archive units from different archive sessions in the archive stream and to allow differential encryption control based on source content identifiers or other qualifying information contained in the archive unit metadata headers 661-N. As generally illustrated in
The preferred process 80 of resolving a security control identifier for purpose of enabling the processing of the archive units 641-N is generally shown in
In the preferred embodiments of the present invention, the security control identifier is a string list of one or more names of security control groups predefined on the security repository server. For example, a security control identifier may be defined as “corpA-admin01, corpA-division04,” where the secure repository server stores, subject to authenticated access, one group of encryption keys associated with the identifier “corpA-admin01” and another group of encryption keys associated with the identifier “corpA-division04.” Each of these groups may contain one or more encryption keys.
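As an illustrative sketch only, resolving such a string list into the referenced key groups might proceed as follows. The function names and the dictionary standing in for the key groups held by the repository are assumptions of this example, and the comma is assumed to be the list separator, as in the example identifier above.

```python
def parse_security_control_identifier(identifier: str) -> list[str]:
    """Split a security control identifier into security control group names."""
    return [name.strip() for name in identifier.split(",") if name.strip()]


def collect_keys(identifier: str, key_groups: dict[str, list[bytes]]) -> list[bytes]:
    """Gather the keys of every group named by the identifier.

    `key_groups` is a hypothetical stand-in for the groups predefined on the
    repository. An undefined group name raises KeyError rather than being
    silently skipped, so a mistyped identifier cannot weaken the key set.
    """
    keys: list[bytes] = []
    for name in parse_security_control_identifier(identifier):
        keys.extend(key_groups[name])
    return keys
```

Treating an unknown group name as an error, rather than ignoring it, is a design choice of this sketch and is not stated by the description above.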
For a given archive unit 641-N, the authentication token 82, security control identifier 84, and, optionally, a content identifier 86 extracted from the corresponding metadata header 661-N and passed to the PEM 30 are presented as a request to the secure repository server 32. Provided the authentication token 82 is enabled, subject to the authentication rules implemented by the repository 32, the collected encryption keys 88 referenced by the security control identifier 84 are returned. These encryption keys 88 may be non-persistently cached by the PEM 30. On the implied confirmation that encryption is enabled for this given archive unit 641-N, the secure archive driver 28 generates a symmetric key 90. The corresponding content segment 681-N is encrypted with the symmetric key 90 and a corresponding encryption metadata header 721-N is created. The symmetric key 90 is encrypted with each of the keys contained in the returned group of keys 88, and stored in a slot data structure 761-N (A-X) within the corresponding encryption metadata header 721-N.
The preferred process 100 of resolving a security control identifier for the purpose of reverse processing the archive units 641-N is generally shown in
Notably, all attempts to access the content of a secure data session require access requests to be posted to and resolved by the secure repository server 32. Preferably, the secure repository server 32 implements an access request log to collect general and administrative operating information, such as system initialization, shutdown, and restart, network connects and disconnects between different client/server components, and backup and restore operation requests of critical security parameters (CSPs), including hosts, policies, and keys. Operational information related to individual and groups of access requests will also be logged, including the request time, the network identification of the system originating the request and the resulting response, and the requested backup and restore archive actions. Each logging event is preferably stored with a timestamp, event type identifier, severity value, subsystem identifier, success value, object (key, policy, host, etc.) accessed as part of the action, and an optional action description. Consequently, the present invention provides a well-defined auditing mechanism for all secured session data accesses, including both successful and failed requests.
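A logging event carrying the fields enumerated above might, as a minimal sketch, be represented as follows. The class names, field types, and the in-memory list used for storage are assumptions of this example; the description does not specify a record format or storage mechanism.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AuditEvent:
    event_type: str                    # event type identifier
    severity: int                      # severity value (scale is an assumption)
    subsystem: str                     # subsystem identifier
    success: bool                      # success value
    accessed_object: str               # key, policy, host, etc.
    description: Optional[str] = None  # optional action description
    timestamp: float = field(default_factory=time.time)


class AuditLog:
    """Hypothetical append-only log kept by the repository server."""

    def __init__(self) -> None:
        self.events: list[AuditEvent] = []

    def record(self, event: AuditEvent) -> None:
        # Successful and failed requests are both retained.
        self.events.append(event)

    def failures(self) -> list[AuditEvent]:
        """Return the events whose success value is False."""
        return [e for e in self.events if not e.success]
```

Recording refused requests alongside granted ones is what allows the log in this sketch to be examined for attempted as well as actual accesses.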
Where a decryption key is returned 102, the secure archive driver 28 decrypts a corresponding one of the encrypted symmetric keys 761-N (A-X). Preferably, the decryption key is applied sequentially to the encrypted symmetric keys 761-N(A-X) and the decryption verified preferably using an envelope encryption verification or other known-text verification technique. Once verified decryption of a symmetric key is achieved, the symmetric key is used to decrypt the corresponding content segment 681-N. The encryption metadata header 721-N is discarded, and the resulting clear text archive unit 641-N is substituted into the archive data stream.
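The sequential application of a decryption key to the stored slots, with a known-text check on each attempt, might be sketched as below. The marker `KNOWN_TEXT`, the function names, and the repeating-key XOR `placeholder_cipher` are all assumptions of this example. The XOR routine merely stands in for the private-key decryption of the preferred embodiments so that the sketch runs with the standard library; it is not secure.

```python
from itertools import cycle
from typing import Optional

KNOWN_TEXT = b"SEGKEY1:"  # hypothetical marker prepended before wrapping


def placeholder_cipher(key: bytes, data: bytes) -> bytes:
    """Repeating-key XOR. Placeholder only, NOT secure; it is self-inverse."""
    return bytes(d ^ k for d, k in zip(data, cycle(key)))


def wrap(control_key: bytes, symmetric_key: bytes) -> bytes:
    """Produce one slot: the marker plus the symmetric key, encrypted."""
    return placeholder_cipher(control_key, KNOWN_TEXT + symmetric_key)


def find_symmetric_key(decryption_key: bytes, slots: list[bytes]) -> Optional[bytes]:
    """Try the decryption key on each slot in turn; verify by the known text."""
    for slot in slots:
        candidate = placeholder_cipher(decryption_key, slot)
        if candidate.startswith(KNOWN_TEXT):
            return candidate[len(KNOWN_TEXT):]  # verified symmetric key
    return None  # no slot was encrypted for this decryption key
```

A return value of `None` in this sketch corresponds to the case in which none of the stored slots verifies under the supplied key, so the content segment is left encrypted.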
A preferred implementation 110 of the secure archive driver 28, relative to the processing of session metadata headers, is shown in
The reverse processing 130 of archive units 641-N through a preferred embodiment of the secure archive driver 28 is shown in
A preferred embodiment 140 of a secure repository server 32 is shown in
Upon receipt of a web service request, the secure web services daemon 142 qualifies the request against the authentication token. In the preferred embodiments of the present invention, the authentication token is verified against either a locally accessible smart card 144, or similar security device, or external security server 146 implementing an active directory or LDAP security service. Where the authentication token is verified, the request is considered. To process and secure a new archive session, a local key store 144 is accessed to retrieve the security control identifier determined encryption key groups. To recover a secure archive session, the private key member of the encryption key pair identified by the authentication token is retrieved from the local key store 144. Both the initial request and eventual response by the secure web services daemon 142 are transferred through a secure network connection with the requesting PEM 30.
The preparation of encryption key groups, for use in accordance with the present invention, is preferably performed on a secure archive management computer system that hosts the secure repository server 32 or that can securely connect to the secure repository server 32. An administrative process 150, as shown in
For the preferred embodiments of the present invention, a variety of information can be extracted from the host computer system 12 and archive data streams 60 that can be used to identify and qualify the use of discrete key groups 1561-N. Information identifying the host computer system 12, the archive application 22, and the content of an archive data stream 60 can be processed by the PEM 30, whether obtained directly by the PEM 30 or through the secure archive driver 28, to create an attribute set that is sent as part of a request to the secure repository server 32. Preferably, the attribute set includes the security control identifier, authentication token, the user name or ID of the process owner running the archive application, the IP address and DNS name assigned to the host computer system 12, the group user id (GUID) and hardware device identifier specified by the archive application 22, and information extracted from fields existing within the archive metadata header 62 and archive unit metadata headers 661-N, including descriptive keywords and the filesystem metadata identifying the archived content. The attribute set may also include an archive application identifier and the command line string used to invoke the archive application.
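One possible representation of such an attribute set, offered only as a sketch, is a simple record serialized for transmission with a request. The class name, field names, and the choice of JSON as the serialization are assumptions of this example; the description does not specify a wire format. The address and host name used in the test values are reserved documentation examples.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class AttributeSet:
    security_control_identifier: str
    authentication_token: str
    process_owner: str                  # user name or ID running the application
    host_ip: str
    host_dns_name: str
    group_user_id: Optional[str] = None
    device_identifier: Optional[str] = None
    keywords: list[str] = field(default_factory=list)   # from metadata headers
    application_id: Optional[str] = None                # optional attribute
    command_line: Optional[str] = None                  # optional attribute

    def to_request(self) -> str:
        """Serialize the attribute set; JSON is an assumption of this sketch."""
        return json.dumps(asdict(self), sort_keys=True)
```

Keeping the optional attributes as explicit `None` values in this sketch lets a receiving repository distinguish an attribute that was not available from one that was supplied empty.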
A preferred process 160 of selectively retrieving encryption key groups 1561-N for use in the encryption processing of an archive session is illustrated in
A secure archiving system constructed in accordance with the present invention can be distributed and operated in a variety of modes relative to the location and number of available secure repository servers 32. As generally shown in
Alternately or in addition, remote systems 1721-N, implemented in any combination of server computer systems and appliances, can support separate secure repository servers 32. These remote systems 1721-N are preferably accessible through secure network connections 174. For the preferred embodiments, each of these remote systems 1721-N can store the same or different sets of key groups 1561-N, providing generalized redundancy as well as allowing specialization as administratively determined appropriate for the combined network of remote systems 1721-N. Preferably, the PEM 30 maintains a persistent list of the remote systems 1721-N, administratively updateable or automatically updateable from any of the remote systems 1721-N, potentially whenever a connection is made to any of the remote systems 1721-N. This configuration allows the PEM 30 to search a variety of secure repository servers 32 for the necessary information to enable operation.
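A search across a list of repository servers might, under the assumptions stated here, be sketched as follows. Each repository is modeled as a hypothetical lookup callable that returns a key group, returns `None` when the group is not held, or raises `ConnectionError` when the server is unreachable; none of these conventions is taken from the description.

```python
from typing import Callable, Iterable, Optional

# Hypothetical lookup signature: key group name -> keys, or None if not held.
Lookup = Callable[[str], Optional[list[bytes]]]


def find_key_group(
    group_name: str,
    repositories: Iterable[tuple[str, Lookup]],
) -> Optional[tuple[str, list[bytes]]]:
    """Query each listed repository in order; return the first match."""
    for repository_name, lookup in repositories:
        try:
            keys = lookup(group_name)
        except ConnectionError:
            continue  # unreachable server: fall through to the next one
        if keys is not None:
            return repository_name, keys
    return None  # no listed repository holds the requested key group
```

Because the repositories in this sketch may hold the same or different key groups, the order of the list determines which server answers when more than one holds the requested group.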
Another secure archiving system configuration 180 is shown in
Thus, a system and methods for providing for the secure archiving of data have been described. While the present invention has been described particularly with reference to tape and hard disk-based storage media, the present invention is equally applicable to other forms of media and a corresponding variety of media control systems.
In view of the above description of the preferred embodiments of the present invention, many modifications and variations of the disclosed embodiments will be readily appreciated by those of skill in the art. It is therefore to be understood that, within the scope of the appended claims, the invention may be practiced otherwise than as specifically described above.