The present invention relates generally to data networking and, in particular, relates to a method and system for protecting privacy-sensitive information through redundant, encrypted and distributed storage of information.
These days, people have more and more electronic devices, such as personal computers (PCs), personal digital assistants (PDAs), cell phones, and other electronic devices that use traditional ways of storing and accessing information. Each device stores information at a different storage facility than the other devices so that information is not commonly accessible. Some examples of storage facilities include a file transfer protocol (FTP) server, e-mail accounts, shared network drives and other storage locations. Not all storage facilities have appropriate security mechanisms for storing privacy-sensitive information. Some storage facilities are not permanent so that the information stored there may not remain accessible.
Various deficiencies of the prior art are addressed by various exemplary embodiments of the present invention of a method and system for protecting privacy-sensitive information through redundancy, encryption and distribution of information.
One embodiment is a method for storing information. Information is received for storage from one of a number of user devices. The information is divided into a plurality of data units that are stored on a plurality of segments. The segments are fixed size data units used to store data. The segments are stored distributed across a set of storage facilities. The storage facilities in the set are nodes on a network and associated with the devices.
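By way of illustration only, dividing received information into fixed-size segments might be sketched as follows. The segment size, the zero-padding of the final segment, and the function names `split_into_segments` and `join_segments` are assumptions made for this sketch and are not taken from the embodiments described herein.

```python
SEGMENT_SIZE = 8  # bytes; hypothetical value, kept tiny for illustration


def split_into_segments(data: bytes, segment_size: int = SEGMENT_SIZE) -> list[bytes]:
    """Divide data into fixed-size segments, zero-padding the last one."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    segments = []
    for offset in range(0, len(data), segment_size):
        chunk = data[offset:offset + segment_size]
        # Every segment has the same size, so the final chunk is padded.
        segments.append(chunk.ljust(segment_size, b"\x00"))
    return segments


def join_segments(segments: list[bytes], original_length: int) -> bytes:
    """Reassemble the segments and drop the padding added by the split."""
    return b"".join(segments)[:original_length]
```

Because padding is added, the original length must be recorded somewhere (for example, alongside the segment locations) so that the information can be restored exactly.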
Another embodiment is a system for storing information, including a virtual disk device driver and a device. The virtual disk device driver represents a set of storage facilities and communicates with a master node on a network. The master node maintains an index table. The index table stores the location of the segments of a virtual disk. Each segment has an identifier and a version. The virtual disk is controlled by the virtual disk device driver. The device is capable of running an operating system and the virtual disk device driver. The device runs the virtual disk device driver to store information that is divided into the segments. The device is one of a number of devices that are associated with the storage facilities.
The teachings of the present invention can be readily understood by considering the following detailed description in conjunction with the accompanying drawings, in which:
The invention is primarily described within the general context of exemplary embodiments of a method and system for protecting privacy-sensitive information through redundancy, encryption and distribution of information. However, those skilled in the art and informed by the teachings herein will realize that the invention is applicable generally to all kinds of information whether private or not, data networking, virtual disks, and interconnecting computing devices, even if the optional redundancy and encryption are not used.
One exemplary embodiment is an abstraction layer for the user 102 that provides the user 102 with one uniform view on a set of storage facilities that are accessible by the user 102. This abstraction hides details from the user 102, such as communication with different types of storage facilities 114, 116, 118, 120, 122, where information is stored, and the like. From the perspective of the user 102, the abstraction layer is just another storage facility. The uniform view is a virtual storage system, implementable as a software layer, i.e., the abstraction layer. The user 102 uses the abstraction layer to store and retrieve information from the set of storage facilities in a way that is similar to saving and loading files. The abstraction layer divides the information, encrypts the information, and distributes it redundantly over the set of storage facilities. For several reasons, this protects the privacy of the information, even if the storage facilities cannot be completely trusted. First, the information is divided so that any one piece of information does not provide the whole information. Second, the information is encrypted, making it difficult for others to understand. Third, the information is distributed and it is difficult to find where the rest of the information is located.
This exemplary embodiment of the abstraction layer stores information distributed across different storage facilities 114, 116, 118, 120, 122. These storage facilities 114, 116, 118, 120, 122 are not necessarily the same type of storage facility. Storage facilities 114, 116, 118, 120, 122 can be of any type, such as e-mail accounts, web spaces, file systems, shared network drives, and other types of storage facilities. Storage devices in the set of storage facilities may be added or removed. The information stored is divided into different parts that are encrypted and redundantly distributed across the set of storage facilities. The information is accessible by a virtual storage system, which may be accessed by the user 102 from different locations and by various devices 104, 106, 108, 110, 112.
One exemplary embodiment includes an intermediate layer, i.e., a virtual disk device driver, which stores information to maintain privacy and availability. The information is divided into segments, encrypted, and stored redundantly. From the perspective of the user, the virtual disk device driver hides all of the details about the various storage facilities. The user can store a file to the virtual disk just as the user stores a file to any disk.
As shown in
During initialization (see
For example, in
One exemplary embodiment is a method of storing information using this intermediate layer, i.e., the virtual disk device driver 204. Information 302, such as a file, is received by the virtual disk device driver 204 along with information about the degree of privacy and availability needed by the user 102, e.g., low, medium, or high for each of privacy and availability. Based on this information, the intermediate layer determines how to store the information. How to store the information is determined by considering the appropriate redundancy to apply, the appropriate division to apply, and the appropriate encryption to apply, or whether to apply them at all. Determining the appropriate redundancy depends on whether and to what extent the segments should be replicated and where the segments should be stored. Determining the appropriate division of information depends on whether and to what extent the information should be divided and the size of the segments, among other things.
If availability is important, it is less desirable to divide the information and more desirable to store it redundantly over the set of storage facilities. If privacy is important, it is desirable to divide the file into many pieces and distribute them across the storage facilities, possibly encrypted, but it is less desirable to store information redundantly. A trade-off is made between privacy and availability, because they have competing interests with respect to dividing information and storing it redundantly. Encryption is orthogonal to this trade-off and can be used to compensate for dividing the information less. Table 1 below illustrates how the redundancy, encryption, and division of files are influenced by the privacy and availability needs in an exemplary embodiment.
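The trade-off described above might be sketched, purely as an illustration, as a function that maps the requested privacy and availability levels to a storage plan. The numeric formulas, the `StoragePlan` type, and the function name `plan_storage` are hypothetical choices for this sketch; they only reproduce the stated direction of the trade-off (more availability favors more copies and fewer pieces, more privacy favors more pieces and fewer copies) and do not reproduce the values of Table 1.

```python
from dataclasses import dataclass

LEVELS = {"low": 0, "medium": 1, "high": 2}


@dataclass(frozen=True)
class StoragePlan:
    replicas: int   # number of redundant copies of each piece
    pieces: int     # number of pieces the information is divided into
    encrypt: bool   # whether pieces are encrypted before storage


def plan_storage(privacy: str, availability: str) -> StoragePlan:
    """Hypothetical policy: availability adds copies, privacy adds pieces."""
    p = LEVELS[privacy]
    a = LEVELS[availability]
    # Availability pushes redundancy up; privacy pushes it back down.
    replicas = max(1, 1 + 2 * a - p)
    # Privacy pushes division up; availability pushes it back down.
    pieces = max(1, 1 + 2 * p - a)
    # Encryption is independent of the trade-off; assumed here to be
    # enabled whenever any privacy beyond "low" is requested.
    encrypt = p >= 1
    return StoragePlan(replicas=replicas, pieces=pieces, encrypt=encrypt)
```

For example, under these assumed formulas a request for high privacy and low availability yields many pieces and a single copy, whereas low privacy and high availability yields a single undivided piece stored in several copies.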
Another exemplary embodiment is a virtual hard disk that embodies the principle of distributed storage. In a conventional hard disk, the physical data storage area is divided into a number of fixed size sectors. When a single file is stored on the hard disk, it spans one or more sectors. A disk device driver in a computer is software that handles read and write access to the data on disk. The virtual hard disk also uses fixed size segments to store data. However, these segments are stored on other computer nodes in the network, rather than on one physical hard disk. A virtual disk device driver hides the location of the segments from the operating system. The operating system uses the virtual disk device driver to access the data on the virtual hard disk just as it would on a regular hard disk.
In this exemplary embodiment, a virtual disk device driver 204 uses a peer-to-peer network 216 to store its data segments 210, 212, 214 redundantly across nodes 218, 222, 224 in the peer-to-peer network 216. Peer-to-peer networks 216 have any number of nodes 218, 222, 224 that cooperate to store and locate data. These networks 216 are often used for file sharing. Peer-to-peer networks 216 are dynamic in the sense that nodes 218, 222, 224 can join or leave the network 216 at any time. Files are, therefore, typically stored as redundant copies across multiple nodes 218, 222, 224 to reduce the risk that a file cannot be found because the node storing the data is offline.
In this exemplary embodiment, the virtual disk device driver 204 stores multiple copies of segments 304, 306 across nodes in the peer-to-peer network 216. For security reasons, each segment is encrypted 306 before it is sent to the network 216. An encryption key may be configured in the virtual disk device driver 204 and provided by the user 102 during installation.
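As a minimal sketch of the behavior described above, each segment may be encrypted before it leaves the driver and then written to several nodes. The in-memory dictionaries standing in for nodes, the placement rule, and the names `keystream_xor`, `store_segment`, and `read_segment` are assumptions of this sketch. The SHA-256 based keystream is only a stand-in so that the example needs no external library; it is not presented as the cipher of any embodiment, and a vetted authenticated cipher would be used in practice.

```python
import hashlib


def keystream_xor(key: bytes, segment_id: int, data: bytes) -> bytes:
    """Toy symmetric cipher (illustration only): XOR with a SHA-256 keystream.

    Applying the function twice with the same key and segment_id
    returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(
            key + segment_id.to_bytes(8, "big") + counter.to_bytes(8, "big")
        ).digest()
        stream.extend(block)
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def store_segment(nodes: list[dict], segment_id: int, plaintext: bytes,
                  key: bytes, copies: int) -> list[int]:
    """Encrypt one segment and write `copies` replicas to distinct nodes."""
    if not 1 <= copies <= len(nodes):
        raise ValueError("copies must be between 1 and the number of nodes")
    ciphertext = keystream_xor(key, segment_id, plaintext)
    # Hypothetical placement rule: consecutive nodes starting at segment_id.
    chosen = [(segment_id + i) % len(nodes) for i in range(copies)]
    for n in chosen:
        nodes[n][segment_id] = ciphertext
    return chosen


def read_segment(nodes: list[dict], segment_id: int, key: bytes) -> bytes:
    """Return the decrypted segment from the first node that still holds it."""
    for node in nodes:
        if segment_id in node:
            return keystream_xor(key, segment_id, node[segment_id])
    raise KeyError(f"segment {segment_id} not found on any node")
```

With two or more copies, the segment remains readable when one of the nodes holding it leaves the network, which is the purpose of the redundancy.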
If it is not a new data segment at 606, then, at 608, the data segment version number is updated in the index table. At 610, data segment #x version #y is written to the nodes. At 612, data segment #x version #y-1 is removed from the nodes. At 614, the index table is encrypted. At 616, the new index table is written to the nodes.
If it is a new data segment at 606, then, at 618, data segment #x version number 1 is added to the index table. At 620, data segment #x version #y is written to the nodes. At 622, the index table is encrypted. At 624, the new index table is written to the nodes.
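The two branches of the write procedure (steps 606 through 624) might be sketched as follows. The in-memory dictionaries standing in for nodes, the JSON serialization of the index table, the `INDEX_KEY` name, and the `encrypt` callable passed in by the caller are all assumptions of this sketch rather than details of the described embodiment.

```python
import json
from typing import Callable

INDEX_KEY = "index"  # hypothetical key under which nodes hold the index table


def write_segment(index: dict[int, int], nodes: list[dict], segment_id: int,
                  data: bytes, encrypt: Callable[[bytes], bytes]) -> int:
    """Write one data segment and publish the updated, encrypted index table.

    `index` maps a segment identifier to its current version number.
    Returns the version number that was written.
    """
    if segment_id in index:                      # 606: not a new segment
        old_version = index[segment_id]
        version = old_version + 1
        index[segment_id] = version              # 608: update version in index
        for node in nodes:                       # 610: write version y
            node[(segment_id, version)] = data
        for node in nodes:                       # 612: remove version y-1
            node.pop((segment_id, old_version), None)
    else:                                        # 606: new segment
        version = 1
        index[segment_id] = version              # 618: add with version 1
        for node in nodes:                       # 620: write the segment
            node[(segment_id, version)] = data
    blob = encrypt(json.dumps(index).encode())   # 614 / 622: encrypt index
    for node in nodes:                           # 616 / 624: write new index
        node[INDEX_KEY] = blob
    return version
```

Writing the new version before removing the old one means that at least one version of the segment is present on the nodes throughout the update.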
There are many benefits to the various exemplary embodiments. The virtual disk device driver is transparent to the operating system. The operating system treats the virtual disk just as it treats any other disk. The data is stored redundantly across a number of nodes, reducing the risk of data loss. The redundancy factor can be set to balance storage needs with the risk of data loss. The virtual disk device driver concept can be applied to any operating system, independent of the file system used, by providing the appropriate virtual disk device driver for the operating system. Nodes in the peer-to-peer network do not store complete files, but only one or more data segments. Each segment contains a fixed amount of data, which can be either one or multiple files, or part of a larger file. Segment encryption can be used to provide data security.
In the prior art, some devices 104, 106, 108, 110, 112 may have ad hoc direct access to specific types of information, depending on the capabilities of a particular device 104, 106, 108, 110, 112. However, no particular device has access to all the different storage facilities 114, 116, 118, 120, 122 of the other devices or knowledge of the particular protocols or data formats necessary for information exchange. Prior art encryption mechanisms, such as encrypted file systems, file encryption standards, user logon procedures and the like, do not all support the highest level of security needed to store privacy-sensitive information. Prior art encryption mechanisms only apply to one local system within one administrative domain and do not apply to other environments.
Exemplary embodiments of the present invention have many advantages over the prior art, including securely storing privacy-sensitive information in a potentially untrusted environment. Exemplary embodiments combine storage of privacy-sensitive information in an infrastructure that cannot be completely trusted with integration of various types of data storage facilities into one virtual storage system. The virtual storage system can be dynamically extended with other storage facilities and remains accessible from any device, unlike the prior art. Exemplary embodiments provide users with a way to manage their information and with mechanisms to store privacy-sensitive information securely and reliably. Exemplary embodiments of the present invention can help prevent a user from losing data when, for example, a disk crashes on the user's home computer.
The processor 730 cooperates with conventional support circuitry 720 such as power supplies, clock circuits, cache memory and the like as well as circuits that assist in executing the software routines stored in the memory 740. As such, it is contemplated that some of the steps discussed herein as software methods may be implemented within hardware, for example, as circuitry that cooperates with the processor 730 to perform various method steps. The computer 700 also contains input/output (I/O) circuitry that forms an interface between the various functional elements communicating with the computer 700.
Although the computer 700 is depicted as a general purpose computer that is programmed to perform various functions in accordance with the present invention, the invention can be implemented in hardware as, for example, an application specific integrated circuit (ASIC) or field programmable gate array (FPGA). As such, the process steps described herein are intended to be broadly interpreted as being equivalently performed by software, hardware, or a combination thereof.
The present invention may be implemented as a computer program product wherein computer instructions, when processed by a computer, adapt the operation of the computer such that the methods and/or techniques of the present invention are invoked or otherwise provided. Instructions for invoking the inventive methods may be stored in fixed or removable media, transmitted via a data stream in a broadcast media or other signal bearing medium, and/or stored within a working memory within a computing device operating according to the instructions.
While the foregoing is directed to various embodiments of the present invention, other and further embodiments of the invention may be devised without departing from the basic scope thereof. As such, the appropriate scope of the invention is to be determined according to the claims, which follow.