The invention relates to data encryption and cryptography. More specifically, the invention relates to encrypting a file-based data volume and to partitioning the data into two sections of different sizes, so that the smaller section is required in order to utilize the larger one, to confirm data integrity, and to recognize whether the data is encrypted or unencrypted.
Firstly, Processing of Block Mode Data is Discussed Below.
One of the handbooks of the art is the Handbook of Applied Cryptography (Discrete Mathematics and Its Applications), Alfred Menezes, Paul van Oorschot, and Scott Vanstone (CRC-Press, 1996, ISBN 978-0849385230).
In WO 03/088052, Andrew Tune teaches a way to partition data, such as credit card data, into two sections that are kept separately, one locally and one on a server. Tune adds a tag to the local section, based on which the section on the server can be retrieved and the sections combined with each other. The method taught by Tune does not, however, check the integrity of the restored data, nor does Tune cater for the processing of unencrypted data amongst encrypted data. In addition, Tune does not cover situations where one of the sections is modified afterwards, for example by truncating it. Nor does Tune teach how to minimize the size of the other data section.
A block mode data volume consists of several blocks of the same size into which data is saved. Each block has its own tag, usually a sequence number. This tag is generally called a block number.
Typical examples of block mode data volumes include computer mass storage devices, for example hard drives (HDD=Hard Disk Drive) or semiconductor-based non-volatile memories (SSD=Solid State Drive). Often, a file system, with which data can be saved as files, is created on a block mode data volume. When data is written to or read from mass storage, the writing or reading position is determined by a logical block address (LBA). Among other things, the file system keeps track of the LBA position to which data is written on each occasion and from which it is read. The writing itself is usually performed in full blocks, a typical block size being a power of two, most often at least 512 bytes.
Secondly, Encryption of Block Mode is Discussed Below.
A block cipher algorithm, such as AES-256 (FIPS 197, Advanced Encryption Standard (AES), 2001, National Institute of Standards and Technology, USA), is typically used for encrypting block mode data. With it, a plaintext of a certain length is transformed into ciphertext using an encryption key. In many encryption algorithms, however, the size of a cipher block is smaller than the block size of the data volume; in said AES-256, for example, it is 16 bytes. For this reason, several cipher blocks have to be combined in order to encrypt a single data volume block.
Several working modes have been described for combining cipher blocks, the most often used perhaps being CBC (Cipher Block Chaining). In the CBC mode, the ciphertext of the preceding block is combined with the plaintext of the following block using an exclusive OR (XOR) operation. If the size of the plaintext is not divisible by the size of the cipher block when using the CBC mode, the last block must be processed before encryption using, for example, the Ciphertext Stealing method. When the file size is changed afterwards, Ciphertext Stealing requires re-encryption using the original plaintext.
Thirdly, A Stream Cipher Technique is Discussed Below.
The encryption of plaintext block by block was described above. Another common way is the stream cipher method, wherein plaintext is generally combined with a pseudorandom key stream using an XOR operation (the exact name of the method is “Additive Stream Cipher”). If the key stream is not known, the plaintext cannot be restored.
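The additive stream cipher described above can be illustrated with a minimal sketch. It is not part of the original disclosure: the function names are hypothetical, and SHAKE-256 keyed with the secret is merely assumed here as a stand-in for a real key stream generator.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Assumption: SHAKE-256 over the key stands in for a real keystream generator.
    return hashlib.shake_256(key).digest(length)


def xor_stream(key: bytes, data: bytes) -> bytes:
    # XOR each data byte with the matching keystream byte.
    # The same function encrypts and decrypts because XOR is its own inverse.
    ks = keystream(key, len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


plaintext = b"block mode data volume"
ciphertext = xor_stream(b"secret key", plaintext)
recovered = xor_stream(b"secret key", ciphertext)   # correct key stream
wrong = xor_stream(b"other key", ciphertext)        # unknown key stream
```

With the correct key stream the plaintext is restored; with any other key stream the output is unrelated to the plaintext.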
Fourthly, Message Authentication Codes (MAC) Are Discussed Below.
Let us start by specifying the term “hash”: A hash identifies data content using a value whose size is smaller than the original data content. A characteristic of a good hash is that two data blocks, however similar, are extremely unlikely to produce the same hash. Another characteristic of a good hash is that the hash values are distributed evenly over the whole number space in use.
Using non-linear transformations, such secure hashes can be produced in which the transformation only works in one direction. Additionally, it is difficult to specify a data content that produces the exact wanted secure hash. A hash can therefore be considered a control number that cannot be used to restore the actual data. Methods generally in use include, for example, SHA-256 and RIPEMD-160. These are generally considered good hashes.
Hashes can also be calculated in encrypted format, in which case they are typically message authentication codes (MAC=Message Authentication Code). Below follows discussion of
$\mathrm{CMAC}_K(M,i) = \mathrm{CMAC}_K(M_1 \parallel \dots \parallel M_i)$ (i)
where the operator ∥ indicates the combination of two character strings.
Let it be noted that an authentication code can also be produced using a hash function, for example, using the HMAC method as follows:
HMAC(K,M)=H((K⊕opad)∥H((K⊕ipad)∥M)), (ii)
where opad and ipad are certain standard character strings, H is a hash function, K is a key, and M is the message (for example, in plaintext format) from which the HMAC is calculated.
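Formula (ii) can be written out step by step as follows. This is a sketch only; it assumes SHA-256 as the hash function H and the standard pad bytes 0x36 (ipad) and 0x5C (opad), and it compares the result with the `hmac` module of the Python standard library.

```python
import hashlib
import hmac


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    block_size = 64  # input block size of SHA-256 in bytes
    # Keys longer than one block are hashed first, then padded with zeros.
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")
    ipad = bytes(b ^ 0x36 for b in key)  # K xor ipad
    opad = bytes(b ^ 0x5C for b in key)  # K xor opad
    inner = hashlib.sha256(ipad + message).digest()  # H((K xor ipad) || M)
    return hashlib.sha256(opad + inner).digest()     # H((K xor opad) || inner)


key = b"example key"
message = b"message in plaintext format"
manual = hmac_sha256(key, message)
library = hmac.new(key, message, hashlib.sha256).digest()
```

In practice the library function would be used directly; the manual version only shows how the terms of formula (ii) fit together.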
Fifthly, Errors in Ciphertext are Discussed Below.
Ciphertext may be missing data either on purpose or accidentally. In general, it is desirable to minimize the effect of missing data. For example, a characteristic of the aforementioned CBC mode is that when the ciphertext of a single cipher block is incorrect, the error is reflected, when restoring plaintext, only in the same and the next plaintext block.
When decrypting a stream-ciphered ciphertext, an error in the ciphertext produces an error in the corresponding position in the plaintext. If the ciphertext is missing data or contains extra data, the mutual synchronization between the key stream and the ciphertext is lost, with the result that all the restored plaintext after the error is defective.
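The two error cases for a stream cipher can be demonstrated with a short sketch. The key stream generator (SHAKE-256 over the key) and all names are assumptions made for illustration only.

```python
import hashlib


def xor_stream(key: bytes, data: bytes) -> bytes:
    # Assumption: SHAKE-256 over the key stands in for a real keystream generator.
    ks = hashlib.shake_256(key).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, ks))


key = b"demo key"
plaintext = b"0123456789abcdefghijklmnopqrstuv"
ciphertext = xor_stream(key, plaintext)

# Case 1: one ciphertext byte is corrupted. Only that position is affected.
corrupted = bytearray(ciphertext)
corrupted[5] ^= 0xFF
after_flip = xor_stream(key, bytes(corrupted))
flip_errors = [i for i, (a, b) in enumerate(zip(after_flip, plaintext)) if a != b]

# Case 2: one ciphertext byte is missing. Synchronization with the
# keystream is lost from that position onwards.
shortened = ciphertext[:5] + ciphertext[6:]
after_loss = xor_stream(key, shortened)
intact_prefix = after_loss[:5]
damaged_tail = after_loss[5:]
```

In the first case `flip_errors` contains only the corrupted position; in the second case everything after the missing byte decrypts incorrectly.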
To avoid synchronization errors in decrypting stream-ciphered ciphertext, a general procedure is used where the created ciphertext is used to create a “self-synchronizing keystream”. Plaintext, in contrast, is not as well suited for synchronization and is not generally used.
In certain situations it is, however, desirable that an error produced in ciphertext on purpose is propagated to as large a portion of the plaintext as possible.
Sixthly, CTR Encryption Mode is Discussed Below.
Below,
One of the significant benefits of the CTR encryption mode is that it can be used to encrypt such plaintexts the size of which is not divisible by the size of the cipher block. Truncating a file afterwards is possible, too.
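The benefit can be seen in a small counter mode sketch. To keep it self-contained with the Python standard library only, HMAC-SHA256 truncated to 16 bytes is assumed as a stand-in for the block cipher (counter mode only needs the forward direction of the cipher); a real implementation would use AES. All names are hypothetical.

```python
import hashlib
import hmac

BLOCK = 16  # cipher block size in bytes, as in AES


def prf_block(key: bytes, counter: int) -> bytes:
    # Stand-in for E_K(counter); not AES, used for illustration only.
    return hmac.new(key, counter.to_bytes(BLOCK, "big"), hashlib.sha256).digest()[:BLOCK]


def ctr_xor(key: bytes, first_counter: int, data: bytes) -> bytes:
    # Encrypts or decrypts: each block is XORed with the encrypted counter value.
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = prf_block(key, first_counter + offset // BLOCK)
        chunk = data[offset:offset + BLOCK]  # the last chunk may be shorter
        out.extend(c ^ k for c, k in zip(chunk, ks))
    return bytes(out)


key = b"K" * 32
plaintext = b"a plaintext whose size is not 16n bytes"  # 39 bytes
ciphertext = ctr_xor(key, 1, plaintext)

# Truncating the ciphertext afterwards still leaves a decryptable prefix.
truncated = ciphertext[:21]
restored_prefix = ctr_xor(key, 1, truncated)
```

No padding or ciphertext stealing is needed, because the last partial block simply uses fewer key stream bytes.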
Seventhly, Data Integrity is Discussed Below.
In practice, all block mode data volumes contain some extra information on the basis of which it can be deduced whether the content read from the data volume has remained unchanged.
Traditionally, control numbers have been calculated for data blocks to ensure their data validity. For example, when saving each block on a hard drive, a control number is calculated on hardware level and saved with the block on the hard drive. When reading the block from the drive, the block control number is also read. If it does not match with the rest of the data in the block, either the data reading or writing can be found to have occurred incorrectly. Generally, for this purpose a CRC check sum has been used.
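The control number procedure can be sketched as follows, assuming CRC-32 from the Python standard library as the checksum. The function names are hypothetical, and real drives perform this check at hardware level.

```python
import zlib


def store_block(data: bytes):
    # Save the block together with its control number.
    return data, zlib.crc32(data)


def read_block(data: bytes, stored_crc: int) -> bytes:
    # Recalculate the control number and compare it with the stored one.
    if zlib.crc32(data) != stored_crc:
        raise IOError("block checksum mismatch")
    return data


block, crc = store_block(b"\x00" * 508 + b"data")
ok = read_block(block, crc)

# Simulate a single flipped bit on the medium.
damaged = bytearray(block)
damaged[100] ^= 0x01
try:
    read_block(bytes(damaged), crc)
    detected = False
except IOError:
    detected = True
```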
When the content of a block mode data volume is being encrypted, a block occupies the same size whether it is encrypted or unencrypted. Accordingly, there is no space in blocks for such extra information that could be used to confirm the success of encryption or decryption.
Eighthly, Network Servers are Discussed Below.
An Internet connection is currently available almost everywhere, although it is not necessarily a broadband connection. For IP (Internet Protocol) data transfer between computers, secure protocols have been developed for which open source code libraries are available. For example, the open-source OpenSSL library provides SSL/TLS protocol support.
Ninthly, Here Follows Discussion of File Processing.
In the latest Windows operating systems, there are several interfaces for processing writable and readable data, the simplest of which is probably Minifilter. A person skilled in the art may get a clear idea of Minifilter implementations through the sample programs in the available Windows Driver Kit, especially the Minispy application, in which communication between user mode and kernel mode has been implemented.
Tenthly, Below Follows Discussion of Saving Data in a Data Volume.
In commonly used file systems, such as FAT, FAT32, exFAT, and NTFS, file size is determined in two alternative ways. If a write ends at a position that is greater than the preceding file size, the file size is updated to reflect the end of the whole writing task. Alternatively, the file size can be set explicitly to either a greater or a smaller size than the preceding file size.
There are two types of writing operations in Windows operating system driver stacks: cached and non-cached. A file system driver stack assigns the data to be saved to a data volume driver stack in a non-cached format and block by block as IRP (I/O Request Packet) messages. File size is typically determined either based on cached writing operations or explicit file size determinations.
Especially in Windows operating system driver stacks, there is a certain problem related to multilevel caching: If data to be written is modified in a driver stack, the modified data may, due to some anomalous situations, appear unmodified in the writing phase. This occurs, for example, in Windows XP/Vista operating system Minifilter implementations in NTFS file systems with small-sized files.
A fundamental problem occurs in situations where data to be written has been encrypted using a block cipher and where the file size is not divisible by the cipher block size. A special problem occurs in situations where, after writing operations have already been executed, the file is truncated to a size that is not divisible by the cipher block size. In this case, data is lost from the last cipher block and the cipher block in question cannot be restored.
Finally, In the Following the Concept of Entropy is Reviewed.
Information entropy indicates the smallest possible number of bits with which certain data can be represented. The entropy of a random number sequence equals the count of numbers contained in it multiplied by the number of bits in a single number.
The entropy of a completely pseudorandom number sequence corresponds to the entropy of a random number sequence, unless the production method of pseudorandom numbers is revealed. If it is revealed in its entirety, entropy is zero because in this case all values can be calculated unambiguously.
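As a small illustration of the first statement, the following sketch estimates the Shannon entropy of a byte sequence from its symbol frequencies. It is an empirical estimate only, the function name is hypothetical, and it says nothing about whether a sequence is predictable from a revealed generator.

```python
from collections import Counter
from math import log2


def entropy_bits_per_symbol(data: bytes) -> float:
    # Shannon entropy H = sum(p * log2(1 / p)) over the observed symbol frequencies.
    counts = Counter(data)
    total = len(data)
    return sum((n / total) * log2(total / n) for n in counts.values())


uniform = bytes(range(256))   # every 8-bit value occurs exactly once
constant = b"a" * 256         # a single repeated value

per_symbol = entropy_bits_per_symbol(uniform)
total_bits = per_symbol * len(uniform)  # count of numbers times bits per number
```

For the uniform sequence the estimate is 8 bits per symbol, that is, 256 × 8 = 2048 bits in total, whereas the constant sequence yields zero.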
A primary object of the invention is to enable changing the size of an encrypted file afterwards.
Further, another primary object of the invention is to protect data, preferably in such a way that the entropy in it is reduced by saving the data only incompletely in a protectable data volume, a small section of it being saved in another data volume.
A secondary objective of the invention may be to improve data reliability using a procedure where the integrity of encrypted data can be reliably authenticated.
A data volume to be encrypted consists of a group of equal-size blocks. Each block is divided into equal-length plaintext character strings, and each plaintext character string is then encrypted with a suitable state-of-the-art encryption block, which generates a key stream that is XORed with the plaintext character string to be encrypted, resulting in a ciphertext character string. The invention is based on the principle that neither the current plaintext character string nor any later plaintext character string has any influence on the encryption of the current plaintext character string, more precisely on the above-mentioned key stream; only the previous plaintext character string or earlier plaintext character strings affect it. This is implemented by feeding to the input of the encryption block a hash value formed from one or more of the earlier plaintext character strings. The encryption block thereby generates, according to its encryption algorithm, the key stream based on a key and the hash value.
The hash value is a message authentication code MAC calculated from at least one of the plaintext character strings preceding the plaintext character string to be encrypted.
Alternatively, the hash value is a cipher-based message authentication code CMAC calculated from at least one of the plaintext character strings preceding the plaintext character string to be encrypted.
The algorithm for calculating the MAC or CMAC of a plaintext character string uses a key. According to a further aspect of the invention, the MAC or CMAC of the preceding plaintext character string is used as the key. Thus, because the MAC or CMAC of the plaintext character string prior to said preceding plaintext character string has been used as the key for the MAC or CMAC of said preceding plaintext character string, and so on, it can be stated that the key used for calculating the MAC or CMAC of any plaintext character string is influenced by the MACs or CMACs of all the preceding plaintext character strings.
Preferably, the block cipher operates in Counter mode (CTR mode). A hash of at least one of the plaintext character strings preceding the plaintext character string to be encrypted is applied to the counter input of the encryption block. Preferably the encryption algorithm is AES, for example AES-256, wherein the encryption block is the known AES Counter mode block cipher.
An aspect of the method may be the partition of the said ciphertext block into at least two sections of different sizes.
An aspect of the method may further include writing the file derived from plaintext blocks onto at least two memory devices, the first of which may be, for example, SSD based and in which at least the largest of the ciphertext block sections is saved as a file. The first memory device may be connected to a first computer, for example, a Windows workstation.
The method may further include the steps of connecting a second computer to the first computer via, for example, an information network, such as an IP protocol using network, and authorizing this connection based on either the said first computer, its user, or the said first memory device.
The method may also include the steps of saving at least the smallest of the ciphertext block sections in the said second computer.
Another aspect of the invention is a system executing the method, characterized in that it contains at least two memory devices onto which the said ciphertext block sections are saved.
The third aspect of the invention is a computer program executing the method, characterized in that it can create a ciphertext block from a plaintext block consisting of more than one consecutive character string in such a way that, when creating the ciphertext block, at least one of the character strings in question is modified based on a hash derived from more than one preceding character string included in the plaintext block.
When encrypting plaintext character strings it is extremely important to ensure that the same value is never applied to the counter input twice. The probability of two identical values is almost zero if the hash is formed as a secure hash. Reference is made to
In this manner all plaintext character strings are encrypted. However, encryption of the first plaintext character string requires the use of an initialization vector as the secure hash. In other words, when encrypting any plaintext character string, the secure hash is formed from the preceding plaintext character string using as the key the secure hash of the previous plaintext character string. Therefore it can be stated that the secure hash used in the encryption of a plaintext character string has been influenced by the hash values of all preceding plaintext character strings, but the plaintext character string to be encrypted has no influence on the generation of the key stream used in the encryption of said plaintext character string.
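The chaining described above may be sketched as follows. This is an illustrative sketch under stated assumptions and not the claimed implementation: HMAC-SHA256 is assumed both as the keyed hash and, truncated to 16 bytes, as a stand-in for the AES encryption block, and all function names are hypothetical.

```python
import hashlib
import hmac

STRING_LEN = 16  # assumed length of one plaintext character string in bytes


def prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def key_stream(key: bytes, hash_value: bytes) -> bytes:
    # Stand-in for the encryption block: key stream from the key and the hash value.
    return prf(key, b"ks" + hash_value)[:STRING_LEN]


def next_hash(prev_hash: bytes, plaintext_string: bytes) -> bytes:
    # Keyed hash of the current string, keyed with the previous hash value.
    return prf(prev_hash, plaintext_string)


def encrypt_block(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    hash_value, out = iv, bytearray()  # the IV acts as the first hash value
    for off in range(0, len(plaintext), STRING_LEN):
        m = plaintext[off:off + STRING_LEN]
        ks = key_stream(key, hash_value)  # independent of the current string m
        out.extend(a ^ b for a, b in zip(m, ks))
        hash_value = next_hash(hash_value, m)
    return bytes(out)


def decrypt_block(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    hash_value, out = iv, bytearray()
    for off in range(0, len(ciphertext), STRING_LEN):
        c = ciphertext[off:off + STRING_LEN]
        ks = key_stream(key, hash_value)
        m = bytes(a ^ b for a, b in zip(c, ks))
        out.extend(m)
        hash_value = next_hash(hash_value, m)  # needs the restored plaintext
    return bytes(out)


key, iv = b"K" * 32, b"\x00" * 15 + b"\x01"
plaintext = bytes(range(50))
ciphertext = encrypt_block(key, iv, plaintext)
truncated_plain = decrypt_block(key, iv, ciphertext[:21])

# Changing the first ciphertext byte damages every later character string.
tampered = bytes([ciphertext[0] ^ 1]) + ciphertext[1:]
damaged = decrypt_block(key, iv, tampered)
```

Because the key stream of a string never depends on that string itself, the ciphertext can be truncated in the middle of a string and the remaining prefix still decrypts, while an error in an early string propagates to all later strings of the block.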
In a preferred embodiment of the invention, at least the first character string C1 or part of it can be saved from a ciphertext block in a second memory device. Further, in a preferred embodiment the encryption is performed on a file-by-file basis in such a way that each ciphertext block is saved in the same place in a file as the corresponding plaintext block would otherwise have been saved in. A person skilled in the art is, for example, able to implement a Minifilter driver that executes the encryption in the Windows operating system; the driver encrypts the file contents as described in the invention and maintains the original file name.
In the invention, encryption keys are preferably file-specific; they are also preferably saved on a second memory device.
Let us first look at
$C_i = f_M(M_i, f_G(M,i))$ (iii)
The modification function (407) may preferably be an XOR operation; it is desirable that the plaintext character string (103) contains as many bytes as the value of the mask function (401). Other modification functions may also be used; it is essential that no data following a ciphertext (202) or a plaintext character string (103), which might afterwards be truncated from the character string (202), may affect any values of the character string (202) within the modification function.
The next state of the mask function is provided by the function fNS (402), which is fed back via the delay buffer maintaining the inner state (403). Let us describe the value of the function fNS using the designation fNS(M,i) when processing the ith character string:
$f_{NS}(M,i) = f_{NS}(M_i, f_{NS}(i-1))$ (iv)
Therefore, it is essential for the invention that the value of the mask function fG (401), when processing the ith character string, does not depend on the ith plaintext character string but on the initial value $z^{-1}_0$ of the inner state (403) and on at least one possibly preceding character string, preferably on all the preceding character strings of the same plaintext block. Let us designate:
$f_G(M,i) = f_G(z^{-1}_0 \parallel M_1 \parallel M_2 \parallel \dots \parallel M_{i-1})$ (v)
A preferred embodiment of the invention described in
Further, let it be emphasized that although in
$M'_i = f_M^{-1}(C_i, f_G(M,i))$ (vi)
Because the value of the mask function fG (401), when processing the ith character string, is independent of the current plaintext character string and depends only on $z^{-1}_0$ and the preceding character strings, the ciphertext block may be truncated in the middle of the ith character string, the value of the mask function fG (401) still being computable (cf. formula v).
Further, the inverse function $f_M^{-1}$ (501) of the modification function fM is discussed in the following: Because in the modification function fM no piece of information following a ciphertext (202) or a plaintext character string (502), which might afterwards be truncated from the character string (202), may affect any values of the character string (202), an inverse function may also be calculated for truncated character strings. A preferred embodiment of the invention uses an XOR operation as the modification function fM, the inverse function $f_M^{-1}$ of which is also XOR.
Below follows discussion of
In the preferred embodiment of the invention represented in
Especially noteworthy in a preferred embodiment of the invention shown in
A review of the CMAC-CTR combination follows: To make the algorithm identical with the original CMAC, plaintext delay (601) and cipher block delay (602) could simply be initialized in such a way that, when processing the first plaintext character string, an XOR operation between their states results in a null value processed with the decryption function corresponding to the encryption (105). In this case, when the plaintext delay (601) gives a first plaintext character string M1, the output of the cipher block delay (602) would be null and the block processing would be in accordance with CMAC.
However, this procedure would include a vulnerability: Even if the first character string C1 from the ciphertext block was only saved in a second memory device, the value of the output function (406) would be the same for each first character string. Further, if the same keys are used to process several ciphertext blocks, it would be entirely possible to have blocks where the first character string M1 of the plaintext block is the same, which would result in an identical ciphertext character string C1. Hence, known ciphertext character strings C1 could be substituted for unknown ciphertext character strings C1′. Thus, with a good guess or abundant tests, at least the protection of the character string M2 could be weakened.
Using the teaching of the CTR mode and as a solution to this vulnerability, preferably for the invention, plaintext delay (601) and cipher block delay (602) can be initialized in such a way that an XOR operation between them produces a unique number. In practice, for example, plaintext delay (601) can be initialized as null and cipher block delay (602) initialized using a counter that does not produce the same value twice for plaintext blocks within a reasonable timeframe. A person skilled in the art may, when implementing the counter, use a CTR mode counter as a basis.
In terms of the security of the invention, it is preferred that the same encryption key/counter value combination is practically never repeated. If the invention is implemented as a Minifilter implementation, each file may be given its own encryption keys and the counter may be derived from the location where the data block in question is written to.
Referring to the example of
$z^{-1}_0 = \mathrm{Counter}$ (vii)
In this case, CMAC in fact produces an authentication code from the character string that is logically the value of the counter when processed with a decryption function and appended with the preceding character strings of the same plaintext block.
$\mathrm{CMAC}_K(i-1) = \mathrm{CMAC}_K(D_K(\mathrm{Counter}) \parallel P_1 \parallel P_2 \parallel \dots \parallel P_{i-1})$ (viii)
In other words, it is still a CMAC method described in NIST Special Publication 800-38B; even if a counter was appended to it, only the value derived from the counter would be inserted in front of the data. Further, let it be noted that in
Thus, the character string C1 (704) of the first ciphertext is:
$C_1 = P_1 \oplus \mathrm{CMAC}(D_K(\mathrm{Counter}))$ (ix)
The discussion of the embodiment represented in
When processing the first character string M1, use of K2 in the output function (404) instead of K1 may be preferred for the invention because, as noted below, at least the first character string or a part of it can be saved only in another memory device. In this case, neither of the memory devices has to contain character strings processed both with K1 and K2. Since K2 is from internal state (403 in
In the embodiment shown in
For the security of the CTR mode, it is essential that the same counter value is not repeated. According to the birthday paradox, well known to a person skilled in the art, when using a 16-byte cipher block, for example, the counter takes two identical values with a probability of 50% only when approximately 300 exabytes ($300 \times 10^{18}$ bytes) have been written. This is believed to be enough for any imaginable application.
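The order of magnitude of this figure can be checked with the usual birthday approximation $n \approx \sqrt{2N \ln(1/(1-p))}$, where $N$ is the number of possible counter values. The sketch below assumes that counter values behave like uniformly random 128-bit values; the function name is hypothetical.

```python
import math


def bytes_until_collision(block_bytes: int, probability: float = 0.5) -> float:
    # Number of possible values for a block of the given size.
    n_possible = 2 ** (8 * block_bytes)
    # Birthday approximation for the number of values drawn before a repeat
    # occurs with the given probability.
    n_values = math.sqrt(2 * n_possible * math.log(1 / (1 - probability)))
    # Each counter value protects one cipher block of data.
    return n_values * block_bytes


estimate = bytes_until_collision(16)
exabytes = estimate / 1e18
```

Under these assumptions the estimate comes out at a few hundred exabytes, which agrees with the approximate figure given above.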
Let us discuss
$f_G(i) = \mathrm{CMAC}_K(i-1)$ (x)
In other words, the output of CMAC has been delayed with a single character string, as is also the case in the embodiment shown in
When striving for a simple implementation, the inner state initial value $z^{-1}_0$ is preferably initialized with the value of the counter (701) described for
$z^{-1}_0 = \mathrm{Counter}$ (xi)
whereby the review in
Below follows discussion of
In
It should be noted that a person skilled in the art is easily able to implement a method where data partitioned into several sections is combined to form the original data, as long as the way the partitioning was executed is well specified. Similarly, a person skilled in the art is easily able to implement data partitioning into more than two sections if there is a need for it.
Below follows discussion of
For all the embodiments of the invention, it is preferred that restoring of each cipher text character string is affected by the data removed from the first memory device and saved in the second memory device.
At the same time, space is freed from the data (004) saved in the first memory device in those locations where data was removed and transferred onto the second memory device (902). In the invention, it is preferred to replace the ciphertext character string (1005) removed from the data saved in the first memory device with an authentication tag (1002) using which, in the reading phase, at least the encryption status of the block can be indicated. Proceeding in this manner, it is possible to avoid especially those situations occurring in Windows operating systems where caching restores data presumed to be encrypted, yet in an unencrypted format.
In a preferred embodiment of the invention, the authentication tag (1002) is appended with the data (1003) required for checking integrity. This integrity check data (1003) is preferably calculated using a secure hash describing the contents of a plaintext block (101), the hash using a key not used in block ciphering. The key of the said hash is preferably derived from the key used in encryption; additionally, it has to be remembered that the above-mentioned key K2 is available for use. A person skilled in the art is able to plan the integrity check data (1003) in such a way that the data required for checking integrity reveals neither the key nor the fact whether there are two blocks with the same content on the memory device. A preferred way of ensuring that no blocks with the same content are revealed is to append to the character string, for example at its beginning, before the integrity calculation, data that is unique for each encryption key and known in the reading phase before decryption. This data can, for example, be derived from the sequence number of a plaintext block (101) within a file and possibly from a file-specific tag.
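The replacement of the removed section with a tag and integrity check data may be sketched as follows. The tag value, the field sizes, the key derivation, and all names are assumptions chosen for illustration; HMAC-SHA256 is assumed as the keyed secure hash.

```python
import hashlib
import hmac

TAG = b"ENC1"   # hypothetical, clearly identifiable encryption tag
HEAD_LEN = 16   # assumed size of the section moved to the second memory device


def integrity_data(enc_key: bytes, file_tag: bytes, block_no: int, plaintext_block: bytes) -> bytes:
    # Derive a separate hash key from the encryption key (assumed derivation).
    mac_key = hmac.new(enc_key, b"integrity", hashlib.sha256).digest()
    # Prefix unique per block, known before decryption: file tag and sequence number.
    prefix = file_tag + block_no.to_bytes(8, "big")
    digest = hmac.new(mac_key, prefix + plaintext_block, hashlib.sha256).digest()
    return digest[:HEAD_LEN - len(TAG)]  # tag plus check data fill the freed space


def split_block(ciphertext_block: bytes, check: bytes):
    removed = ciphertext_block[:HEAD_LEN]               # to the second memory device
    stored = TAG + check + ciphertext_block[HEAD_LEN:]  # stays on the first device
    return stored, removed


def restore_block(stored: bytes, removed: bytes):
    if not stored.startswith(TAG):
        raise ValueError("block is not marked as encrypted")
    return removed + stored[HEAD_LEN:], stored[len(TAG):HEAD_LEN]


key = b"k" * 32
ciphertext_block = bytes(range(64))
plaintext_block = b"same content" * 4
check0 = integrity_data(key, b"file-A", 0, plaintext_block)
check1 = integrity_data(key, b"file-A", 1, plaintext_block)
stored, removed = split_block(ciphertext_block, check0)
restored, check_read = restore_block(stored, removed)
```

The stored block keeps its original size, and two blocks with identical content produce different check data because the block sequence number enters the calculation.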
It has to be noted that the file may also be truncated within the encryption tag (1002), whereby it is preferred to make a conclusion in the writing phase based only on the first section. If the beginning of an encryption tag that has been broken in two matches and the preceding block had been encrypted, the beginning of a cipher mode block is retrieved from another memory device (902). It is preferred that the encryption tag (1002) starts with a clearly identifiable character string. If the original file (1001), and thus also the encrypted file (1006), is smaller than the encryption tag, a conclusion may be made based on, for example, whether or not the other files of the same type included in the same first memory device (901) are encrypted by default. In case the file is small, a person skilled in the art may have to make case-specific conclusions, although it has to be remembered that in practical planning it is preferable to strictly define which files are to be protected by the invention and which are not; an exception to this indicates, as such, an error condition.
Because in this preferred embodiment of the invention data is removed from the beginning of the cipher text blocks saved in a memory device, it is further preferred, in terms of the invention, that the removed data affects the restoring of all the other character strings from the same plaintext block.
Let us further look at the preferred embodiment presented in
Finally, below follows discussion of an embodiment of the invention presented in
In the description of the invention thus far, only file truncation has been mentioned. Let it be noted that extending a file is also possible. In general, data is written in the truncated part of the file afterwards. This, as well as other data volume writing, is performed on a block-by-block basis, whereby the protection described as the invention functions normally. If the file is only extended and not written to, the file content is generally unspecified in terms of the extension and its content is not to be trusted. Accordingly, the application of the invention does not essentially weaken the functioning of the memory device, not even in terms of file extension.
Modifications of the invention are easily made based on the description and guided by the presented representative embodiments. Data can, for example, be partitioned into more than two sections, and it may be removed from a primary data volume in different quantities. Additionally, only one Windows™ based Minifilter implementation was presented as an embodiment of driver software; however, the invention may also be used in other architectures utilizing the inventive concept presented here.
| Number | Date | Country | Kind |
|---|---|---|---|
| 20090254 | Jun 2009 | FI | national |
| PCT/FI2010/050560 | Jun 2010 | FI | national |

| Filing Document | Filing Date | Country | Kind | 371c Date |
|---|---|---|---|---|
| PCT/FI10/50560 | 6/29/2010 | WO | 00 | 12/9/2011 |