1. Field of the Invention
The present invention relates to a system and method for distributing and storing files.
2. Description of Related Art
Conventional distributed file systems are known for allowing different subsets of files to be on different physical file systems. A given file or a specific replicated instance is stored in its entirety on a single underlying disk or physical file system.
A conventional file system accessed from a personal computer or a workstation appears as a tree of directories, each directory containing additional directories or files. A given file is typically stored entirely on a single hard-disk drive or on a cluster of drives in a scheme called Redundant Array of Independent Disks (RAID). Both types of file systems are prone to data loss and security vulnerabilities for the following reasons. File data that is not backed up onto a separate drive is lost when the underlying single-drive system fails. RAID systems protect against hard-drive failures by storing data that is replicated and striped over multiple drives, or striped and stored along with error-correction parity bytes. While this approach is more robust than single-drive systems, it has been shown that multiple disk-drive failures tend to be more frequent than would be expected from statistical and probabilistic analysis, somewhat negating the advantage of RAID systems. The reasons for the higher-than-expected correlation in failures are the identical aging profile and similar environmental factors (such as temperature, humidity, vibration and power fluctuations) experienced by the underlying physical drives when clustered in the same facility or cabinet. Traditional RAID systems employ sets of identical disks that are highly synchronized in terms of access times.
When file data is located in a conventional single physical file system, it is more vulnerable to theft as a potential intruder only needs to discover the security vulnerabilities of that particular physical file system.
It is desirable to provide an improved distributed storage scheme having reduced storage requirements while simultaneously improving security and availability by mitigating the impact of accidental loss or theft of one or more disks.
The present invention relates to a distributed storage scheme, referred to as Distributed Fragments File System (DFFS). The Distributed Fragments File System (DFFS) uses N (N>=1) conventional physical file systems, such as Network File System (NFS), Internet SCSI (iSCSI), Server Message Block (SMB), or Hypertext Transfer Protocol (HTTP) REST-based Cloud storage, to create a unified and, if desired, physically distributed file system. In one embodiment of the distributed storage scheme, every file is encrypted, interleaved and broken into N fragments, and the various fragments are stored on different constituent physical file systems. This is in contrast to conventional distributed file systems where different groups of files may be on different physical file systems, but a given file or its replicated instance is stored in its entirety on a single underlying disk or physical file system. The distribution of file fragments of the present invention, as opposed to distribution of entire files, provides the following advantages:
1. Intrinsic security: Distributed Fragments File System (DFFS) improves security as a given file is not found in its entirety on any specific physical file system, but is encrypted, interleaved, fragmented, and dispersed across a plurality (N) of file systems.
2. Fault tolerance: The distributed storage scheme supports redundancy in storage that tolerates one or more faults in the constituent physical storage systems. Redundancy is achieved without replicating file data. File access remains uninterrupted and potential failures in the underlying physical storage systems are transparent to the end-user while reducing storage requirements.
3. Reduced storage requirements: Instead of full replication of files that are distributed across N file systems (implying a storage expansion factor of N), Distributed Fragments File System (DFFS) employs a scheme with an N/(N−1) storage expansion factor, where N is the number of constituent physical file systems. Redundancy storage overhead necessary to tolerate, for example, loss of a single physical file system, decreases as the number of physical file systems increases. For example, when N is 3, the storage expansion factor is 1.5. When N is 4, the storage expansion factor is only 1.33. Both those cases compare very favorably with the respective expansion factors of 3 and 4 for conventional storage schemes.
4. Use of disparate file systems with no stringent timing synchronization requirements or equality of lower-level parameters such as block sizes.
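The storage expansion factors quoted in advantage 3 can be checked with a short calculation. The following is a minimal sketch only; the function names are hypothetical, and it assumes N >= 2, since N/(N−1) is undefined for a single physical file system.

```python
from fractions import Fraction


def dffs_expansion_factor(n: int) -> Fraction:
    """Expansion factor N/(N-1) for N physical file systems.

    Assumption: n >= 2, because the ratio is undefined for n == 1.
    """
    if n < 2:
        raise ValueError("N/(N-1) requires at least two physical file systems")
    return Fraction(n, n - 1)


def replication_expansion_factor(n: int) -> int:
    """Full replication across N file systems stores N complete copies."""
    return n


if __name__ == "__main__":
    for n in (3, 4, 8):
        dffs = float(dffs_expansion_factor(n))
        print(f"N={n}: DFFS {dffs:.2f} vs replication {replication_expansion_factor(n)}")
```

The overhead shrinks toward 1 as N grows, which is the trend described above.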
The Distributed Fragments File System (DFFS) of the present invention is broadly applicable, but is especially useful when storing data at facilities—such as a public cloud storage provider—that are not controlled or maintained by the creator of the data. Users of the Distributed Fragments File System (DFFS) can utilize multiple public cloud service providers to create a customized and virtualized cloud storage system with the following significant advantages:
No single public cloud storage provider (or physical file system) has the entire file. Only fragments derived by a sequential process involving encryption, interleaving and slicing of a file are stored at each public cloud provider or physical file system. The reverse process applies for recreating files from fragments.
The interleaving and fragmentation can be performed post-encryption. Security is enhanced as theft of a data fragment from any of the underlying file systems is inconsequential. Access to all other fragments residing at other cloud providers or constituent file systems is essential to reconstruct the original file.
When used in conjunction with the special error correction techniques described later, DFFS does not require replication of data for tolerating failures of the underlying physical file systems or cloud storage providers. This translates into lower expenses compared to traditional replication-based storage systems.
The invention will be more fully described by reference to the following drawings.
Reference will now be made in greater detail to an embodiment of the invention, an example of which is illustrated in the accompanying drawings. Wherever possible, the same reference numerals will be used throughout the drawings and the description to refer to the same or like parts.
The following terminology is used:
1. N is the number of underlying physical file systems, each capable of hosting a conventional file system structure comprised of directories and files.
2. Virtual file system (VFS) aggregates constituent physical file systems denoted PFS0, PFS1, . . . PFSN−1.
3. A file in VFS is denoted as F, and it is fragmented and stored as file fragments F0, F1, . . . FN−1 in the underlying physical file systems, PFS0, PFS1, . . . PFSN−1, respectively. Under this scheme, fragment F0 is a valid file under physical file system PFS0, fragment F1 is a valid file stored at PFS1, and so on.
4. A virtual file system controller (VFC) is a master controller which maintains and controls the virtual file system (VFS).
5. End-user PCs and workstations communicate with the virtual file system controller (VFC) and mount the virtual file system (VFS) using conventional schemes such as Network File System (NFS) and Server Message Block (SMB). The VFC in turn uses protocols specific to each of the physical file systems in a manner transparent to the end-user PCs and workstations. Physical File Systems PFS0, PFS1, . . . PFSN−1 can employ a myriad of different file system protocols such as NFS, iSCSI, SMB, or can use HTTP/HTTPS REST-based cloud storage or other protocol and storage mechanisms known in the art in accordance with the teachings of the present invention.
6. The virtual file system controller (VFC) supports conventional file operations such as open( ), close( ), read( ), write( ), delete( ) and stat( ).
7. The virtual file system controller (VFC) also supports directory operations such as mkdir( ), list( ) and rmdir( ).
8. The virtual file system controller (VFC) uses local dedicated storage for its own operations and, optionally, for caching of frequently used files. It relies primarily on physical file systems PFS0, . . . PFSN−1 for actual storage.
Physical file systems PFS0 14a and PFS1 14b and virtual file system controller (VFC) 16 are on Local Area Network (LAN) 13. Physical file systems PFS2 14c, . . . , PFSN−1 14n−1 are in other geographic areas connected via network 20. Network 20 can be the internet or a private IP network. PFS0 14a and PFS1 14b are used as examples to show flexibility of placement of physical file systems. It will be appreciated that PFS0 14a and PFS1 14b do not need to be local. Physical file systems PFS0 14a and PFS1 14b can be realized using the following different types of file systems: network-attached file system (NAS), network file system (NFS), server message block (SMB) and Cloud-based Storage. Suitable physical file systems include Netgear NFS.
Virtual file system (VFS) 12 in this embodiment is shown hosting a directory structure comprised of the root directory (/) 18, subdirectories D0 19a and D1 19b, and files F0 15a . . . F3 15d. The example directory structure is not intended to imply limitations in the number of files, directory structure depth, or file sizes. Distributed Fragments File System (DFFS) 10 scales to support extremely large file systems with directory depth levels and the number of files only limited by the amount of storage available across the constituent physical file systems.
Personal computer (PC) 21 mounts virtual file system (VFS) 12 from virtual file system controller (VFC) 16 using a drive mapping (Y:, for example) 24 and protocol 25. For example, virtual file system controller (VFC) 16 can use one of the standard protocols, such as network file system (NFS) or server message block (SMB), for the drive mapping. Workstation 22 mounts virtual file system (VFS) 12 from virtual file system controller (VFC) 16 using protocol 25. For example, workstation 22 can be a UNIX workstation. Personal computer (PC) 21 and workstation 22 mount root directory structure 18.
Virtual file system controller (VFC) 16 includes local cache 17. Local cache 17 can utilize a Least-Recently-Used (LRU) scheme for management of the overall cache. It will be appreciated that other cache management schemes known in the art can be used in accordance with the teachings of the present invention. Subdirectories 19a,b and files 15a-d are available to both personal computer (PC) 21 and workstation 22. Subdirectories 19a,b and files 15a-d shown for this embodiment are /D0/F0, /D0/F1, /D1/F2 and /D1/F3.
In block 53, encrypted bytes of file F0 15a are byte-interleaved and, optionally, subject to Forward Error Correction (FEC) coding as described below to create N equal-sized fragments denoted F0′0, F0′1, . . . F0′N−1. In block 54, each file fragment is optionally encrypted with a separate key.
In block 55, each file fragment is transmitted to its corresponding physical file system (PFS) 14a-14n for storage using its supported file system protocol. File fragment F0′k is stored on PFSk in the directory specified by the full pathname of the file. In block 56, virtual file system controller (VFC) 16 optionally also stores an encrypted version of the original file in local cache 17 for fast response to future read requests from other clients. In block 57, file fragment meta-information about file F0 15a is stored in local cache 17 of virtual file system controller (VFC) 16. For example, file fragment meta-information can include information about the physical file system (PFS) instances that store the fragments of the file; the order of distribution of fragments across physical file systems; whether FEC was used; access permissions; creation/modification times of the original file; and creation times of the respective fragments. File fragment meta-information can also include the size of the file and the key K0 used for the encryption of the overall file. Storage of other types of information in the meta headers is not precluded.
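The file fragment meta-information listed for block 57 could be held in a simple record. The sketch below is illustrative only: the class name, field names and field types are assumptions, and the specification does not prescribe any particular layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FragmentMetaInfo:
    """Hypothetical record for the per-file meta-information kept by the VFC."""

    pathname: str                  # full pathname of the file in the VFS
    pfs_instances: List[str]       # PFS instances that store the fragments
    fragment_order: List[int]      # order of distribution across the PFS instances
    fec_used: bool                 # whether FEC coding was applied
    access_permissions: int        # e.g. POSIX-style mode bits (assumption)
    created: float                 # creation time of the original file
    modified: float                # modification time of the original file
    fragment_created: List[float]  # creation time of each fragment
    size: int                      # size of the original file in bytes
    key_k0: bytes                  # key K0 used to encrypt the overall file
```

Keeping the file size in the record lets a reader strip any zero padding added during interleaving.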
In block 65, each file fragment F0′0, F0′1, . . . F0′N−1 is separately decrypted, if encrypted during the storage process. In block 66, each file fragment F0′0, F0′1, . . . F0′N−1 is processed through a de-interleaver and, optionally, decoded using FEC techniques as described below. The result is file F0′. In block 67, file F0′ is decrypted using the inverse of the encryption function used during the storage process. The decryption key K0 is obtained from the meta-file associated with file F0′. The virtual file system controller (VFC) responds to the requesting client with the decrypted contents of file F0.
An example interleaver is shown in
During interleaving, the bytes of the file are stored column-first in byte interleaver array 70. Bytes are stored in column 0 starting at row 0, and all rows except the last R-K rows (if FEC is enabled) are filled, where R is the total number of rows and K is the number of data rows. If FEC is not used, the last R-K rows contain regular data. Once column 0 is filled, the process is repeated for the remaining C-1 columns. Zero padding is used to fill out a partially filled array.
It will be appreciated that other interleaving and de-interleaving processes known in the art can be used in accordance with the teachings of the present invention. Rows of interleaver array 70 are assigned to N different file fragments in a round-robin fashion. Row q is assigned to fragment number (q mod N). Thus row 0 maps to fragment 0, row N−1 to fragment N−1, row N to fragment 0, row N+1 to fragment 1, and so on. All fragments that map to the same PFS instance are stored in a single file on that instance.
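The column-first fill and the round-robin row assignment described above can be sketched as follows. This is a minimal illustration and not the implementation of the embodiment: the function names are hypothetical, FEC is assumed to be disabled (so all R rows carry data), and the file is assumed to fit in a single R x C array.

```python
from typing import List


def interleave(data: bytes, rows: int, cols: int) -> List[bytearray]:
    """Fill a rows x cols array column-first, zero-padding the tail."""
    if len(data) > rows * cols:
        raise ValueError("data does not fit in one interleaver array")
    padded = data.ljust(rows * cols, b"\x00")
    array = [bytearray(cols) for _ in range(rows)]
    for index, value in enumerate(padded):
        col, row = divmod(index, rows)  # walk down column 0, then column 1, ...
        array[row][col] = value
    return array


def rows_to_fragments(array: List[bytearray], n: int) -> List[bytes]:
    """Assign row q to fragment (q mod n); rows are appended in row order."""
    fragments = [bytearray() for _ in range(n)]
    for q, row in enumerate(array):
        fragments[q % n] += row
    return [bytes(fragment) for fragment in fragments]


if __name__ == "__main__":
    array = interleave(b"ABCDEFGHIJKL", rows=4, cols=3)
    print(rows_to_fragments(array, n=2))
```

With 4 rows, 3 columns and N = 2, rows 0 and 2 go to fragment 0 and rows 1 and 3 go to fragment 1, so consecutive bytes of the file land in different fragments.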
For the de-interleaving process, file bytes received in file fragments are stored row-wise and read out column-wise. A file fragment received from a physical file system (PFSi) is unpacked and consecutive sets of C (the number of columns in the de-interleaver array) bytes are stored at row indices i, i+N, i+2*N . . . respectively. It will be appreciated that other fragmenting and unpacking of file fragment processes known in the art can be used in accordance with the teachings of the present invention.
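The de-interleaving step can be sketched in the same style. The function name and the explicit `size` argument are assumptions; the size would come from the stored file fragment meta-information. FEC is again assumed to be disabled, and each fragment is assumed to hold whole rows of C bytes.

```python
from typing import List


def deinterleave(fragments: List[bytes], rows: int, cols: int, size: int) -> bytes:
    """Store fragment bytes row-wise, then read the array out column-wise."""
    n = len(fragments)
    array = [bytearray(cols) for _ in range(rows)]
    for i, fragment in enumerate(fragments):
        # Consecutive sets of `cols` bytes go to rows i, i+n, i+2n, ...
        for offset in range(0, len(fragment), cols):
            row = i + (offset // cols) * n
            array[row] = bytearray(fragment[offset:offset + cols])
    out = bytearray()
    for col in range(cols):
        for row in range(rows):
            out.append(array[row][col])
    return bytes(out[:size])  # drop zero padding using the recorded file size


if __name__ == "__main__":
    print(deinterleave([b"AEICGK", b"BFJDHL"], rows=4, cols=3, size=12))
```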
In an alternate embodiment, file interleaving can be performed when error correction support is enabled for tolerating failures or loss of access to one of the underlying constituent physical file systems (PFS). When storing the original bytes of the file in interleaver array 70, only K rows of each column are filled. The remainder R-K rows of each column are filled with the parity bytes computed using a suitable erasure code, such as for example Reed Solomon.
K is defined as ((N−1)/N)*R, which leaves R−K = R/N parity bytes per column.
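As a worked example of this definition, the split between data rows and parity rows can be computed directly. The helper name is hypothetical, and the sketch assumes R is a multiple of N so that both counts are whole numbers.

```python
from typing import Tuple


def data_and_parity_rows(r: int, n: int) -> Tuple[int, int]:
    """Return (K, R - K) with K = (N - 1) / N * R.

    Assumption: r is a multiple of n, so K and R - K are integers.
    """
    if n < 2 or r % n != 0:
        raise ValueError("expected n >= 2 and r divisible by n")
    parity = r // n
    return r - parity, parity


if __name__ == "__main__":
    # R = 8 rows over N = 4 file systems: 6 data rows and 2 parity rows.
    print(data_and_parity_rows(8, 4))
```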
Erasure encoding is done column-wise. The first K rows of every column contain actual file data and provide input of information bytes to the erasure encoding process, which generates R-K parity bytes for every column.
All R rows of interleaver array 70 act as the source of data put into fragments stored on a plurality of physical file systems (PFS). It will be appreciated that other erasure processes known in the art can be used in accordance with the teachings of the present invention.
File de-interleaving can be performed when error correction support is enabled. File fragments received from various physical file systems (PFS) 14a-14n are unpacked and bytes written to de-interleaver array 70 as described above. In addition, all rows corresponding to a file fragment that is not received from a physical file system (PFS) 14a-14n are marked for erasure. Once all R rows are filled, erasure decoding is done column-wise using the converse of the erasure encoding process described above. A row marked for erasure results in a single erasure byte in every column. Use of sufficient parity bytes (R-K) per column ensures missing bytes, such as from a fragment not received from a physical file system, are recreated. Only the first K rows of de-interleaver array 70 are used for recreating the original file, i.e., the parity bytes are discarded once the erasure decoding process is run on each column.
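The column-wise encode and decode cycle can be illustrated with the simplest possible erasure code. The sketch below deliberately substitutes a single XOR parity row for Reed Solomon, which corresponds to the special case R = N and K = N−1 (one row per fragment, one parity byte per column) and tolerates exactly one missing row. All function names are hypothetical.

```python
from functools import reduce
from typing import List, Optional, Sequence


def xor_rows(rows: Sequence[bytes]) -> bytes:
    """XOR equal-length rows together, one column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*rows))


def add_parity_row(data_rows: Sequence[bytes]) -> List[bytes]:
    """Append one parity row computed column-wise over the K data rows."""
    return list(data_rows) + [xor_rows(data_rows)]


def recover_erased_row(rows: Sequence[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one row marked as erased (None) from the other rows."""
    missing = [i for i, row in enumerate(rows) if row is None]
    if len(missing) > 1:
        raise ValueError("a single parity row can repair only one erased row")
    repaired = list(rows)
    if missing:
        present = [row for row in rows if row is not None]
        repaired[missing[0]] = xor_rows(present)
    return repaired


if __name__ == "__main__":
    coded = add_parity_row([b"ABC", b"DEF", b"GHI"])
    damaged = [coded[0], None, coded[2], coded[3]]  # fragment 1 not received
    print(recover_erased_row(damaged)[:3])          # parity row discarded
```

A Reed Solomon code would follow the same column-wise pattern while allowing more than one parity row, and therefore more than one missing fragment.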
In another embodiment shown in
In accordance with an embodiment of the present invention, a computer-readable medium (e.g., CD-ROM, DVD, computer disk or any other suitable memory device) can be encoded with computer-executable instructions for performing the functions of blocks of 50-57 of FIGS. 2 and 60-67 of
It is to be understood that the above-described embodiments are illustrative of only a few of the many possible specific embodiments, which can represent applications of the principles of the invention. Numerous and varied other arrangements can be readily devised in accordance with these principles by those skilled in the art without departing from the spirit and scope of the invention.
Number | Date | Country
---|---|---
61866746 | Aug 2013 | US