HIDDEN DATA IDENTIFICATION IN SOLID STATE DRIVE FORENSICS

Information

  • Patent Application
  • Publication Number: 20150331743
  • Date Filed: May 19, 2015
  • Date Published: November 19, 2015
Abstract
A method of isolating hidden data in a solid state memory system is disclosed. The method includes obtaining a logical block address (LBA) image from the memory system, obtaining a physical block address (PBA) image, determining whether an error exists in the PBA image and correcting the error, calculating an error tolerant cyclic redundancy check (ETCRC) value on each sector of the LBA image, and building a search tree indexed on the ETCRC value. For each sector in the PBA image, the method also includes computing an ETCRC value and searching for that value in the LBA search tree. If the ETCRC value is found, the method further includes comparing the cyclic redundancy check (CRC) values of the LBA and PBA sectors. The PBA sector is output to an output file as hidden data if either the ETCRC value is not found in the LBA search tree or the CRC comparison fails.
Description
FIELD OF DISCLOSURE

The present disclosure generally relates to forensic data recovery and more particularly to accessing hidden data files.


BACKGROUND

The statements in this section merely provide background information related to the present disclosure and may not constitute prior art.


Existing methods to recover hidden data files from computational storage devices are tedious and time-consuming. Solid state drives (SSDs; "SSD" when referring to a single drive) are complex computational storage devices that use NAND flash memory chips. Such memory devices have a high data storage capacity; however, they are difficult to manage because existing data in a chip cannot simply be overwritten, but must first be erased and then rewritten. Furthermore, data must be erased in large blocks, on the order of a million bytes, although it may be written in smaller units, on the order of thousands of bytes. These constraints pose a problem for forensic analysts and others seeking to recover hidden data files because they make the process time-consuming and can prevent data recovery from ever succeeding.


In an effort to make these memory chips easier to use and their data more accessible, a flash translation layer (FTL) of software is included in the SSD to handle the details of erasing old data and writing new data, relieving the host operating system of this task. A memory array of the SSD has two overlaid spaces: a logical block address (LBA) space and a physical block address (PBA) space. The LBA space is the data structure that the host computer sees and comprises the sectors in which data is stored. The PBA space is the memory provided by the flash chips and is generally up to 20% larger than the LBA space, depending on the particular configuration. The FTL software maps the LBA space into the PBA space.


A legacy hard disk drive (HDD), a similar device, has a simpler configuration: its LBA and PBA spaces have an essentially one-to-one size ratio, with the PBA space only a fraction of a percent larger than the LBA space.


The extra PBA space in the SSD is referred to as over provisioning space and serves several purposes: storing the SSD firmware (which runs the SSD's internal microcontroller and is typically 100 KB to 200 KB, though this range is not intended to be limiting), NAND flash wear leveling, bad block management, housing FTL management tables, and garbage collection.


Wear leveling is a type of software algorithm that distributes read and write activity evenly among the flash chips on the SSD. It is needed because NAND flash cells wear out rapidly under repeated program and erase cycles; a side effect of wear leveling is extreme fragmentation of the data written to the SSD. FTL management tables comprise memory storage for the LBA/PBA mapping table, which can be gigabytes in size, along with other general task, or housekeeping, information. Garbage collection is a software algorithm that collects and erases currently unused but previously written areas of flash memory in order to prepare clean sections for future writes and avoid erase delays. All of the above functions are well known.


A problem occurs as a side effect of SSD operation: forensically valuable data gets moved out of the LBA space to where it cannot be accessed via the host computer interface. The management complexity and the non-one-to-one LBA and PBA memory spaces of the SSD (in contrast to the legacy HDD) further impede data recovery and force those who want to recover the data, such as forensic analysts, to attempt to reverse engineer the SSD's algorithms to obtain the hidden data, which can be very time-consuming.


Referring now to FIG. 1, a block diagram shows an exemplary relationship between the PBA space and the LBA space in the memory array of an SSD. Importantly, the PBA space 100 is usually larger than the LBA space 101 within the memory array 10 of the SSD (typically by seven percent (7%), but in some instances by up to twenty percent (20%)). For example, an SSD with 128 GB of storage capacity would have about 119 GB of LBA space available for file storage; the remaining 9 GB would be used as over provisioning space and may be a resource for wear leveling, garbage collection, and other SSD firmware functions. The LBA space 101 is a logical representation and does not reveal where in the memory devices data is stored. The PBA space 100 consists of memory in the form of physical flash memory chips. As shown, the LBA space 101 is a subset of the PBA space 100.
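As a quick check of the 128 GB example, the 9 GB of over provisioning can be expressed relative to either space, and the percentage depends on which capacity is taken as the base:

```latex
\frac{128 - 119}{119} = \frac{9}{119} \approx 7.6\%\ \text{(relative to the LBA space)}
\qquad
\frac{128 - 119}{128} = \frac{9}{128} \approx 7.0\%\ \text{(relative to the total capacity)}
```

Either way, the figure is close to the typical seven percent cited above.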


The data stored on SSDs and HDDs is aggregated into units known as sectors. A sector is typically 512 bytes in length, but may be larger. NAND flash chips also store data in this form, so in most cases an integral number of LBA sectors is stored in each physical flash memory page.


There are currently two main methods to read the over provisioning space as a first step toward recovering hidden data. The first method uses custom read commands over the host interface port of the SSD. However, these commands are proprietary rather than standardized; they do not exist for most SSD models, and where they do exist they may be password protected or encrypted. These characteristics make it hard for individuals to access the hidden data.


The second method of reading the over provisioning space consists of reading the flash chips directly. This can be done by desoldering the flash chips from the memory array and inserting them into a reading device that reads and stores their contents. Alternatively, the flash chips may be read electronically while they remain installed on the SSD circuit.


Several further steps are currently required to recover hidden data from an SSD. After the flash memory chips are read and the data is saved as a PBA image, the LBA space is read over the host computer interface and the data is saved as an LBA image. Next, the flash memory errors in the PBA image are corrected, if possible, and the error correction information is deleted from the image, leaving only data. The PBA and LBA images are then compared, noting which sectors match in each image. Finally, the unmatched PBA sectors are separated and stored as hidden data.


The existing process described above has several issues. First, the format of the data within a flash memory chip varies greatly depending on the make and model of the SSD, as well as the make and model of the memory chip. This format must be determined before any hidden data can be recovered, which takes time. Additionally, the error correction code (ECC) used to protect against flash memory bit errors is typically unknown and is not published by the SSD manufacturer, so it may not be possible to correct errors in the raw data read from the flash chips. Further complicating matters, the amount of data may be huge, reaching into the terabyte range. The algorithms used must therefore be of low complexity and have a low run time on large data sets; otherwise processing could take days, weeks, or longer. The standard approach to this problem is to represent each sector with a short hash value. For example, using an eight-byte hash value for each sector reduces the data storage requirements by about 98.4% (1 - 8/512) compared to handling the raw 512-byte sectors. Provisions must then be made to handle hash collisions.
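The per-sector hashing step can be sketched as follows. This is a minimal illustration only: the choice of BLAKE2b truncated to eight bytes and the helper name `sector_hashes` are assumptions, since the text does not name a hash function. Each hash is kept alongside its sector index so that a later byte-for-byte comparison can resolve collisions.

```python
import hashlib

SECTOR_SIZE = 512  # bytes per sector, as in the text
HASH_SIZE = 8      # bytes per sector digest, as in the eight-byte example

def sector_hashes(image: bytes) -> list[tuple[int, int]]:
    """Return one (hash_value, sector_index) pair per full sector of an image.

    Keeping the sector index lets a later byte-for-byte comparison
    resolve hash collisions.
    """
    pairs = []
    for index in range(len(image) // SECTOR_SIZE):
        sector = image[index * SECTOR_SIZE:(index + 1) * SECTOR_SIZE]
        # BLAKE2b with an 8-byte digest is an assumed choice of short hash.
        digest = hashlib.blake2b(sector, digest_size=HASH_SIZE).digest()
        pairs.append((int.from_bytes(digest, "big"), index))
    return pairs

reduction = 1 - HASH_SIZE / SECTOR_SIZE
print(f"storage reduction: {reduction:.2%}")  # storage reduction: 98.44%
```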


However, even with the above hash optimization, the LBA and PBA images still bear no simple relation to one another because the SSD's wear leveling algorithm significantly fragments stored files. An LBA image file stored in contiguous sectors will be distributed over a large area of the PBA image with no simple mapping relationship; that mapping differs for every make and model of SSD and changes as a given SSD is used. As a result, a naive matching process that compares every LBA sector against every PBA sector could be an order n² (O(n²)) process, which would be quite slow.


To speed up the process, the LBA and PBA hash tables must be sorted. The tables are huge, containing millions of entries; however, order-n (Θ(n)) sorting algorithms, such as radix sort, can sort them quickly.
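A least-significant-digit radix sort on 64-bit hash values might look like the sketch below. The 8-bit digit width and the function name `radix_sort_hashes` are illustrative assumptions. Each pass is a stable counting sort, so the total work is proportional to (64 / 8) × (n + 256), which is linear in n for fixed-width keys.

```python
def radix_sort_hashes(entries, key_bits=64, digit_bits=8):
    """LSD radix sort of (hash_value, sector_index) pairs by hash_value."""
    buckets = 1 << digit_bits
    mask = buckets - 1
    for shift in range(0, key_bits, digit_bits):
        # Count how many entries fall into each bucket for this digit.
        counts = [0] * buckets
        for value, _ in entries:
            counts[(value >> shift) & mask] += 1
        # Turn counts into starting positions (exclusive prefix sum).
        total = 0
        for digit in range(buckets):
            counts[digit], total = total, total + counts[digit]
        # Stable scatter: equal digits keep their relative order.
        output = [None] * len(entries)
        for entry in entries:
            digit = (entry[0] >> shift) & mask
            output[counts[digit]] = entry
            counts[digit] += 1
        entries = output
    return entries

print(radix_sort_hashes([(0xFF00, 0), (0x00FF, 1), (0xFF00, 2)]))
# [(255, 1), (65280, 0), (65280, 2)]
```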


After the LBA and PBA hash tables are sorted, a searching process must examine each LBA hash, find it in the PBA hash table, compare the LBA and PBA source data byte for byte (to guard against hash collisions), and mark the sectors as matched if they are. This search is difficult to perform because the PBA hash table contains many duplicate hashes, which prevents the use of a fast binary search. Therefore, an intermediate PBA index table is created that contains each hash value only once, along with a count of its duplicates in the PBA hash table and an index into that table.
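The intermediate index table can be sketched as below, assuming the sorted PBA hash values are held in a plain list (sector indices omitted for brevity). Each row is (hash, count, first_index), and the hypothetical `lookup` helper binary-searches the deduplicated keys with the standard `bisect` module. A hit yields the `count` consecutive entries, starting at `first_index`, that would then be compared byte for byte.

```python
from bisect import bisect_left

def build_index_table(sorted_hashes):
    """Collapse a sorted list of hash values into (hash, count, first_index) rows."""
    table = []
    for position, value in enumerate(sorted_hashes):
        if table and table[-1][0] == value:
            hash_value, count, first = table[-1]
            table[-1] = (hash_value, count + 1, first)  # another duplicate
        else:
            table.append((value, 1, position))          # first occurrence
    return table

def lookup(table, keys, value):
    """Binary-search the index table; return (count, first_index) or None."""
    i = bisect_left(keys, value)
    if i < len(keys) and keys[i] == value:
        _, count, first = table[i]
        return count, first
    return None

pba_sorted = [3, 3, 7, 9, 9, 9]
table = build_index_table(pba_sorted)
keys = [row[0] for row in table]  # unique hashes, still sorted
print(lookup(table, keys, 9))     # (3, 3)
```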


The above process works well when the errors in the PBA image have been corrected. If they have not been corrected, any bit errors in the PBA sector source data will skew the hash values and prevent them from matching the corresponding LBA sectors.


Given the foregoing, what is needed are methods that facilitate identifying and recovering data that is normally hidden in NAND flash memory arrays in SSDs and normally inaccessible through host computer interfaces, without having to reverse engineer the algorithms in the SSD, by using a hash value that is tolerant of a small percentage of bit errors in the source data.


SUMMARY

This Summary is provided to introduce a selection of concepts. These concepts are further described below in the Detailed Description section. This Summary is not intended to identify key features or essential features of this disclosure's subject matter, nor is this Summary intended as an aid in determining the scope of the disclosed subject matter.


A method of isolating hidden data in a solid state memory system is disclosed. The method comprises obtaining a logical block address (LBA) image from the memory system, obtaining a physical block address (PBA) image, and determining whether an error exists in the PBA image and correcting the error. The method also comprises calculating an error tolerant cyclic redundancy check (ETCRC) value on each sector of the LBA image and building a search tree indexed on the ETCRC value. For each sector in the PBA image, the method also comprises computing an ETCRC value and searching for it in the LBA search tree; if the ETCRC value is found, the method compares the cyclic redundancy check (CRC) values of the LBA and PBA sectors. The method also provides for outputting the PBA sector to an output file as hidden data if either the ETCRC value is not found in the LBA search tree or the CRC comparison fails.


Another method is also disclosed, for computing a total hash function of an array of values such that a change in the array of values affects only one subfield of the hash value. The method comprises dividing the array of values into a number of sections, computing a hash function over each section, and concatenating the computed hash values to create a total hash of the array of values, wherein a change in one of the values is reflected as a change in only one subfield of the total hash.


A hidden data determination system for locating hidden data in a memory array of a solid state device is also disclosed. The system comprises an interface to access memory space on a memory device and a graphics processing unit to create a plurality of hash values for a logical block address (LBA) memory space of the memory device and a plurality of hash values for a physical block address (PBA) memory space of the memory device, in order to identify data within the PBA memory space that is hidden from view of the LBA memory space. The graphics processing unit creates a PBA index table associated with the plurality of PBA hash values, compares the plurality of LBA hash values with the PBA index table, identifies any matches between the LBA hash values and the PBA hash values resulting from the comparison, and identifies data as hidden within the PBA memory space when data identified in the PBA index table has no identified match with any of the plurality of LBA hash values. The system also includes a display device to show the hidden data.





BRIEF DESCRIPTION OF THE DRAWINGS

A more particular description of the subject matter briefly stated above will be rendered by reference to specific embodiments thereof that are illustrated in the appended drawings. Understanding that these drawings depict only typical embodiments and are therefore not to be considered limiting of its scope, the embodiments will be described and explained with additional specificity and detail through the use of the accompanying drawings, in which:



FIG. 1 is an image illustrating an exemplary relationship between the PBA space and the LBA space in the memory array of an SSD;



FIG. 2 shows a block diagram of LBA data used to construct an Adelson-Velsky and Landis' (AVL) tree;



FIG. 3 shows an embodiment of a completed AVL tree, and each of several unique nodes is shown with a unique linked list;



FIG. 4 shows a block diagram of PBA data used in conjunction with the LBA AVL tree;



FIG. 5 shows a block diagram of the relationship between ETCRC bytes and the bytes of a sector;



FIG. 6 shows a representation of a PBA and an LBA ETCRC;



FIG. 7 shows a flowchart of an embodiment of a method;



FIG. 8 shows a continuation of the flowchart of the embodiment of the method started in FIG. 7;



FIG. 9 shows an embodiment of the subroutine BUILD LBA AVL TREE shown in FIG. 7;



FIG. 10 shows an embodiment of the subroutine COPY LBA AVL TREE AND PUNCTURE shown in FIG. 7;



FIG. 11 shows an embodiment of the subroutine PROCESS INPUTFILE SECTORS shown in FIG. 8;



FIG. 12 shows an embodiment of a table of puncturing patterns;



FIG. 13 shows an embodiment of a block diagram of an AVL tree labeled with sample ETCRC values;



FIG. 14 shows an embodiment of a block diagram of information in the AVL tree after puncturing and rebalancing of the AVL tree; and



FIG. 15 shows a block diagram of an embodiment of a device.





DETAILED DESCRIPTION

Embodiments are described herein with reference to the attached figures wherein like reference numerals are used throughout the figures to designate similar or equivalent elements. The figures are not drawn to scale and they are provided merely to illustrate aspects disclosed herein. Several disclosed aspects are described below with reference to non-limiting example applications for illustration. It should be understood that numerous specific details, relationships, and methods are set forth to provide a full understanding of the embodiments disclosed herein. One having ordinary skill in the relevant art, however, will readily recognize that the disclosed embodiments can be practiced without one or more of the specific details or with other methods. In other instances, well-known structures or operations are not shown in detail to avoid obscuring aspects disclosed herein. The embodiments are not limited by the illustrated ordering of acts or events, as some acts may occur in different orders and/or concurrently with other acts or events. Furthermore, not all illustrated acts or events are required to implement a methodology in accordance with the embodiments.


Notwithstanding that the numerical ranges and parameters setting forth the broad scope are approximations, the numerical values set forth in specific non-limiting examples are reported as precisely as possible. Any numerical value, however, inherently contains certain errors necessarily resulting from the standard deviation found in their respective testing measurements. Moreover, all ranges disclosed herein are to be understood to encompass any and all sub-ranges subsumed therein. For example, a range of “less than 10” can include any and all sub-ranges between (and including) the minimum value of zero and the maximum value of 10, that is, any and all sub-ranges having a minimum value equal to or greater than zero and a maximum value equal to or less than 10, e.g., 1 to 4.


Embodiments disclosed herein are directed to a method that facilitates forensic data recovery. The embodiments include a method in which the physical block address (PBA) and logical block address (LBA) memory spaces are examined to identify and collect hidden data without having to reverse engineer the algorithms in solid state drives (SSDs; "SSD" is used herein for a single solid state drive), by providing a hash value that is tolerant of a small percentage of bit errors in the source data.


The term “hidden data” and/or the plural form of this term are used throughout herein to refer to data stored on an SSD that is not accessible through the SSD's host computer interface, and the like.



FIG. 2 shows a block diagram of LBA data used to construct an Adelson-Velsky and Landis' (AVL) tree. The AVL tree 113 is a self-balancing binary search tree: the heights of the two child subtrees of any node 114 differ by at most one, and if at any time they differ by more than one, rebalancing is done to restore this property. The LBA image 110 and a PBA image 120, discussed further below, each consist of a linear array of sectors; the LBA image 110 is composed of 512-byte sectors 111. Each sector 111 is examined and a special hash value 112, called an Error Tolerant Cyclic Redundancy Check (ETCRC) and described in detail further below, is computed. Based on the ETCRC value, a node 114 is either inserted into the AVL tree 113 (if no node with that value is present) or updated. This process creates a tree 113 that is quickly searchable by ETCRC value and much smaller than the LBA image 110.


It is possible for two sectors to contain the same data, and therefore have the same ETCRC value and map to the same tree node. To track all sectors with the same ETCRC, each tree node contains a linked list of associated LBA sector information, comprising the byte offset of each sector within the LBA image 110 and a Cyclic Redundancy Check (CRC) of each sector, typically a standard 32-bit CRC such as CRC-32 (commonly used in Ethernet).



FIG. 3 shows a completed AVL tree 113, in which each of several unique nodes 114 is shown with its own linked list 152. The completed AVL tree 113 is searchable by ETCRC value. Once the corresponding node is found, the associated linked list discloses the entire set of LBA sectors having that ETCRC value, along with the 32-bit CRC of each.



FIG. 4 shows a block diagram of PBA data used in conjunction with the LBA AVL tree. To identify hidden data in the PBA image 120, the ETCRC 122 of each PBA sector 121 is searched for in the tree 113, and sectors not found are considered hidden. As illustrated in this embodiment, the PBA image 120 is composed of 512-byte sectors 121. The ETCRC values 122 are computed for each sector in the image 120. A search of the LBA AVL tree 113 is performed using the ETCRC value 122. If the search fails, the PBA sector does not appear in the LBA image 110, and the PBA sector is output to a file as hidden data. If the search succeeds, the linked list 152, as shown in FIG. 3, associated with the tree node is searched and the CRCs of the PBA sector and LBA sector are compared. If a match is not found, the PBA sector is output to a file as hidden data.


The inventors have found that this process works well, under the stated assumption that the errors present in the PBA image 120 have been corrected. If they have not been corrected, any bit errors in the PBA sector source data will corrupt the hash and CRC values and prevent matches to corresponding LBA sectors.


More specifically, if bit errors in the PBA image 120 are not corrected, hash values computed on PBA sectors will not match those computed on the corresponding LBA sectors. What is needed is a hash value that is tolerant of some small percentage of bit errors in the source data. This property is antithetical to the standard definition of a hash function, where the hash value should change greatly with only a one-bit change in the source data.



FIG. 5 shows a block diagram of the relationship between ETCRC bytes and the bytes of a sector. The ETCRC serves substantially the function of a hash, but is tolerant of some bit errors in the source data: a one-bit change in the source data changes at most eight bits of the ETCRC. The ETCRC 161 is an 8-byte quantity that covers a 512-byte sector of data 160. Each byte 162 in the ETCRC is an 8-bit CRC (generator polynomial x^8 + x^7 + x^4 + x^2 + x + 1) covering a 64-byte subset 163 of the 512-byte sector. Thus, if there are bit errors in only two 64-byte subsets, only two component bytes of the ETCRC change rather than the entire value. Two ETCRCs that match partially imply that the source data matches except for some bit differences in a certain area or areas. The final quality of the match can be determined by comparing the two corresponding source data sets byte for byte.
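A minimal sketch of the ETCRC computation follows. The generator polynomial comes from the text; with the x^8 term implicit it is written as 0x97. The remaining CRC conventions (MSB-first processing, initial value 0, no bit reflection, no final XOR) are assumptions, because the text does not specify them, and the function names are hypothetical. The example flips one bit in byte 200, which lies in the fourth 64-byte subset, and only the fourth ETCRC byte changes.

```python
POLY = 0x97  # x^8 + x^7 + x^4 + x^2 + x + 1, with the x^8 term implicit

def crc8(data: bytes, poly: int = POLY) -> int:
    """Bitwise, MSB-first CRC-8 with initial value 0 and no final XOR (assumed conventions)."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

def etcrc(sector: bytes, subset_size: int = 64) -> bytes:
    """Concatenate one CRC-8 byte per subset: 8 bytes for a 512-byte sector."""
    if len(sector) % subset_size:
        raise ValueError("sector length must be a multiple of subset_size")
    return bytes(crc8(sector[i:i + subset_size])
                 for i in range(0, len(sector), subset_size))

sector = bytes(range(256)) * 2
noisy = bytearray(sector)
noisy[200] ^= 0x01  # single bit error inside the fourth 64-byte subset
a, b = etcrc(sector), etcrc(bytes(noisy))
diff = [i for i in range(8) if a[i] != b[i]]
print(diff)  # [3]
```

Because a CRC whose generator has a nonzero constant term detects every single-bit error, a one-bit change always alters exactly the one component byte covering that bit.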


The ETCRC 161 can have whatever format and size suit the task: not only eight bytes, but more or fewer, to accommodate various sizes of data to be hashed. Furthermore, the components of the ETCRC 161 can be other than 8 bits in length, as project requirements demand. Although the generator polynomial x^8 + x^7 + x^4 + x^2 + x + 1 was chosen, other generator polynomials may be used; the inventors found that this polynomial produces an acceptable collision rate of about 1E-3.


By using the ETCRC 161 as explained above, if there are uncorrected bit errors in a PBA sector, then the LBA and PBA ETCRC values 112, 122 will not match, but usually only in one or two bytes of the ETCRC value. This is handled by a process called puncturing, which involves selectively ignoring combinations of bytes within the ETCRC during the hidden data discovery process.



FIG. 6 shows a representation of a PBA and an LBA ETCRC. As shown, the PBA and LBA ETCRCs match except for one byte, the fourth from the left. The puncturing process sequentially sets corresponding bytes in the LBA and PBA ETCRC values to zero. When byte pair 180 is set to zero, the ETCRCs still do not match. However, later in the sequence, when byte pair 181 is set to zero, the ETCRC values match. This event indicates that the LBA and PBA sector data mostly match and may differ by only one or a few bits, prompting a deeper look at these sectors to determine the magnitude of the mismatch.
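The sequential puncturing of FIG. 6 can be sketched as below. The helper names are hypothetical, and the hexadecimal values are illustrative rather than taken from the figure. Corresponding bytes of both ETCRCs are zeroed one position at a time until the two values agree.

```python
def puncture(value: bytes, positions) -> bytes:
    """Return a copy of an ETCRC with the bytes at `positions` (0 = leftmost) set to zero."""
    out = bytearray(value)
    for p in positions:
        out[p] = 0
    return bytes(out)

def first_matching_single_puncture(lba: bytes, pba: bytes):
    """Puncture byte pairs one position at a time, as in FIG. 6.

    Returns the first position whose puncturing makes the two ETCRCs equal,
    or None if no single-byte puncture does. Assumes the unpunctured values differ.
    """
    for p in range(len(lba)):
        if puncture(lba, [p]) == puncture(pba, [p]):
            return p
    return None

lba = bytes.fromhex("1a2b3c4d5e6f7081")  # illustrative values, not taken from FIG. 6
pba = bytes.fromhex("1a2b3cff5e6f7081")  # differs only in the fourth byte from the left
print(first_matching_single_puncture(lba, pba))  # 3
```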


The discovery process is first run with the full ETCRCs as computed. Once the first pass is complete, the hidden data output file contains all the PBA sectors that do not appear in the LBA image, but also many falsely mismatched sectors that differ only because of a few bit errors. The hidden data set is then run through the process again, with each hidden sector's ETCRC value punctured, as shown in FIG. 6, and with a modified LBA AVL tree 113 built on punctured ETCRC values.


Puncturing may be performed using combinations of one, two, or more bytes of the ETCRC. To accommodate sectors with just a few bit errors in one 64-byte subset of the 512-byte sector, a single byte of the eight in an ETCRC is set to zero; there are eight such ETCRC puncturing patterns. To accommodate sectors with bit errors in two 64-byte subsets, two bytes of the eight in an ETCRC are set to zero; there are C(8, 2) = 28 such ETCRC puncturing patterns. This level of puncturing is typically sufficient to recover the vast majority of hidden data in the presence of PBA errors, though higher levels could certainly be used.


For convenience, the patterns used to puncture ETCRCs may be tabulated for use in the hidden data discovery process. As a non-limiting example, FIG. 12 shows such a table of puncturing patterns. There are 37 puncturing patterns (400 and 401) shown: the empty pattern, the eight single-byte patterns, and the 28 two-byte patterns. Each pattern has eight digits, and a zero in the pattern indicates a punctured byte in an eight-byte ETCRC value. A sample pattern 402 (01010000) indicates puncturing of the leftmost and third-from-left bytes in an ETCRC value.
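A table of this shape can be generated with `itertools.combinations`, as in the sketch below. The function name is hypothetical, and the rendering of each pattern as an eight-digit string ('0' for a punctured byte, '1' for a kept byte) is an assumed encoding rather than a reproduction of FIG. 12. The empty pattern comes first, followed by the 8 single-byte and 28 two-byte patterns, for 37 entries in total.

```python
from itertools import combinations

def puncture_patterns(width: int = 8, max_punctured: int = 2):
    """Empty pattern first, then all single-byte, then all two-byte punctures.

    Each pattern is a digit string in which '0' marks a punctured byte and
    '1' a kept byte (an assumed encoding).
    """
    patterns = []
    for k in range(max_punctured + 1):
        for punctured in combinations(range(width), k):
            patterns.append("".join("0" if i in punctured else "1"
                                    for i in range(width)))
    return patterns

table = puncture_patterns()
print(len(table))  # 37
print(table[0])    # 11111111
```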


The entire hidden data discovery process is shown in flowchart form in FIGS. 7-11, which collectively depict an embodiment of a method for identifying and recovering data that is normally hidden in NAND flash memory arrays in SSDs and normally inaccessible through host computer interfaces, without having to reverse engineer the algorithms in the SSD, by using a hash value that is tolerant of a small percentage of bit errors in the source data. In general, the inputs to this process are the LBA and PBA images, a DATA CORRECTED flag indicating whether the PBA image is error corrected, a tolerable error rate LIMIT used if it is not, and a table containing ETCRC puncturing patterns. A non-limiting example of this table is shown in FIG. 12, discussed above, where the first entry indicates no puncturing; this entry is called an “empty pattern” in the flowchart 700.



FIG. 7 shows a flowchart of a method. FIG. 7 starts the method 700 with a subroutine call, at 300, to build the AVL tree. This subroutine is shown in more detail in FIG. 9 and is described further below. Generally, the LBA image 110 is scanned and an AVL tree 113 is built, suitable for searching using the ETCRC values computed from the data in the PBA image 120. Next, the PBA image is opened, at 301, as the file variable INPUTFILE. An operation, at 302, sets PUNCTURE INDEX to zero, and this value indexes the table of puncture patterns in the process. The top of the main loop is marked with connector “B.”


Within the main loop, the first main task 303-305 is to put the AVL tree 113 into a compatible format, depending on the puncturing pattern selected. If the pattern is “empty”, meaning no puncturing is occurring (line 1 in the table of FIG. 12), then the LBA AVL tree 113 is copied, at 305, to a working tree and used as-is. If puncturing is occurring, then a subroutine “COPY LBA AVL TREE AND PUNCTURE”, at 304, is called, which copies the LBA AVL tree 113 to the working tree while puncturing the ETCRC values and ensuring the tree meets the well-known AVL tree criteria. This step, at 304, is disclosed in further detail below with respect to FIG. 10. The hidden data output file is then opened, at 306, as the last operation in FIG. 7.


The flowchart 700 continues in FIG. 8 with a call to a subroutine, at 307, “PROCESS INPUTFILE SECTORS.” This subroutine reads sectors from INPUTFILE and searches for them in the working AVL tree 113, outputting hidden sectors to OUTPUTFILE. This subroutine is described in detail further below with respect to FIG. 11.


An operation, at 308, closes the INPUTFILE and OUTPUTFILE files after all sectors are processed. If the PBA data is correct as supplied, a decision, at 309, terminates the process, as no puncturing is required to complete hidden data discovery. If the PBA data is not correct as supplied, the PUNCTURE INDEX is incremented, at 310, and tested, at 311, against its maximum value, and the process terminates if that value has been reached. Otherwise, the INPUTFILE is closed and the OUTPUTFILE is reopened as the new INPUTFILE, at 312, feeding the latest hidden data back into the process for further examination with a different puncturing pattern. This completes the description of the main loop in FIGS. 7 and 8.
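The control flow of the main loop in FIGS. 7 and 8 can be sketched as follows. This sketch abstracts away the tree handling: `run_pass` is a hypothetical callable standing in for steps 303 through 308, and `toy_pass` is a stand-in that models each sector only by the set of ETCRC byte positions in which it differs from its LBA counterpart. It shows how each pass's hidden output becomes the next pass's input, and how corrected data stops after a single pass.

```python
def discover_hidden(pba_sectors, patterns, data_corrected, run_pass):
    """Sketch of the FIG. 7-8 main loop.

    run_pass(sectors, pattern) is a hypothetical callable that prepares the
    working tree for `pattern` (steps 303-305), processes the sectors
    (step 307) and returns the ones still judged hidden.
    """
    sectors = list(pba_sectors)
    for pattern in patterns:        # PUNCTURE INDEX walks the table (steps 302, 310-311)
        sectors = run_pass(sectors, pattern)
        if data_corrected:          # step 309: corrected data needs no puncturing
            break
        # step 312: this pass's hidden output becomes the next pass's input
    return sectors

def toy_pass(sectors, pattern):
    # Hypothetical stand-in: each sector is (name, set of differing ETCRC byte
    # positions), or (name, None) when it has no LBA counterpart at all.
    # A sector stops being "hidden" once every differing position is punctured.
    return [s for s in sectors if s[1] is None or not s[1] <= set(pattern)]

patterns = [(), (0,), (1,), (0, 1)]
pba = [("novel", None), ("one-error", {1}), ("two-errors", {0, 1}), ("clean", set())]
print(discover_hidden(pba, patterns, False, toy_pass))  # [('novel', None)]
```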



FIG. 9 shows the subroutine that builds the LBA AVL tree, represented in general at step 300. Generally, the LBA image 110 is scanned and the AVL tree 113 is built, suitable for searching using the ETCRC values computed from the data in the PBA image 120.


An empty working tree is created first, at 330, and the LBA image 110 is opened, at 331, as INPUTFILE. At the top of the loop, this subroutine reads, at 332, a sector of data from INPUTFILE. The ETCRC and CRC values are computed, at 333. The ETCRC is searched for in the tree, at 334, and if it is not found, a node corresponding to the ETCRC is inserted, at 336, into the tree 113.


The node in the tree corresponding to the ETCRC then has the INPUTFILE byte offset and CRC of the sector stored, at 337, into the linked list. A decision block, at 338, at the end of the loop terminates the loop after the last LBA sector has been examined. The INPUTFILE is then closed, at 339.
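The FIG. 9 subroutine might be sketched as follows. In this sketch a Python dict stands in for the AVL tree 113, since both support fast lookup by ETCRC, and CRC-32 comes from the standard `zlib.crc32`. The ETCRC function is passed in as a parameter, and the `placeholder_etcrc` used in the example (a byte sum per 64-byte subset) is a hypothetical stand-in, not the CRC-8 construction described above.

```python
import io
import zlib
from collections import defaultdict

SECTOR_SIZE = 512

def build_lba_index(lba_file, etcrc_fn):
    """FIG. 9 sketch: map each ETCRC value to a list of (byte_offset, crc32) entries."""
    tree = defaultdict(list)                   # step 330: empty tree (dict stands in for AVL)
    offset = 0
    while True:
        sector = lba_file.read(SECTOR_SIZE)    # step 332: read one sector
        if len(sector) < SECTOR_SIZE:          # step 338: stop after the last full sector
            break
        key = etcrc_fn(sector)                 # step 333: ETCRC, then CRC below
        tree[key].append((offset, zlib.crc32(sector)))  # steps 334-337: find or insert node, append
        offset += SECTOR_SIZE
    return dict(tree)

def placeholder_etcrc(sector):
    # Hypothetical stand-in for the CRC-8 ETCRC: a byte sum per 64-byte subset.
    return bytes(sum(sector[i:i + 64]) & 0xFF for i in range(0, len(sector), 64))

image = bytes(512) + bytes([7]) * 512 + bytes(512)
index = build_lba_index(io.BytesIO(image), placeholder_etcrc)
print([off for off, _ in index[bytes(8)]])  # [0, 1024]
```

The two all-zero sectors share one key, so their offsets end up in the same list, mirroring the duplicate-data case described for the linked lists.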


The LBA AVL tree is now ready for use. FIG. 10 shows the subroutine “COPY LBA AVL TREE AND PUNCTURE”, originally identified at step 304. This subroutine copies the LBA AVL tree 113 to the working tree while puncturing the ETCRC values and ensuring the tree meets the well-known AVL tree criteria. An empty working tree is created first, at 320. At the top of the loop, this subroutine reads an LBA tree node ETCRC value, at 321, then punctures that value, at 322, according to the puncture pattern indexed in the table by the current PUNCTURE INDEX. The punctured ETCRC is searched for, at 323, in the working tree. If it is not found, the ETCRC is inserted, at 325, into the tree.


The node in the working tree corresponding to the punctured ETCRC then is updated, at 326, with a reference to the linked list from the original LBA AVL tree 113. The puncturing process can have the effect of combining two or more LBA AVL tree nodes 114. Rather than copying all the linked lists associated with those nodes into the working tree, references to the original LBA tree tables are stored, saving memory. A decision block, at 327, at the end of the loop terminates the loop after the last LBA node has been loaded into the working tree.
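The copy-and-puncture step can be sketched with the same dict-based stand-in for the tree. Each working entry keeps references to the original sector lists instead of copying them. The example uses four one-byte components to mimic the four-digit values of FIGS. 13 and 14 (4713, 4513 and 4313 all collapse to 4013); this byte layout, the offsets and CRC values, and the function name are illustrative assumptions.

```python
def copy_and_puncture(lba_tree, positions):
    """FIG. 10 sketch: re-key the LBA index on punctured ETCRC values.

    Each working entry holds references to the original sector lists (not
    copies), so nodes that collapse together share memory with the LBA tree.
    """
    working = {}                                   # step 320: empty working tree
    for key, sector_list in lba_tree.items():      # step 321: next LBA node
        punctured = bytearray(key)
        for p in positions:                        # step 322: apply the puncture pattern
            punctured[p] = 0
        # Steps 323-326: find or insert the punctured node, then store a reference.
        working.setdefault(bytes(punctured), []).append(sector_list)
    return working

# Four one-byte components mimic the four-digit values of FIGS. 13 and 14.
lba_tree = {
    bytes([4, 7, 1, 3]): [(0, 0xAAAA)],
    bytes([4, 5, 1, 3]): [(512, 0xBBBB)],
    bytes([4, 3, 1, 3]): [(1024, 0xCCCC)],
    bytes([9, 9, 9, 9]): [(1536, 0xDDDD)],
}
working = copy_and_puncture(lba_tree, [1])  # puncture the third component from the right
print(len(working[bytes([4, 0, 1, 3])]))    # 3
```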



FIG. 13 shows a block diagram of an AVL tree labeled with sample ETCRC values. The embodiment in FIG. 13 is a non-limiting example. As shown, four digits are provided in each node for the sake of discussion and brevity. A node 130 is shown containing the decimal value 4713.



FIG. 14 shows the information after puncturing and rebalancing of the AVL tree. The third digit from the right in each number has been punctured by replacing it with a zero. For the FIG. 13 values 4713, 4513, and 4313, puncturing makes them all identical, at a value of 4013, which node 140 takes on. This node also carries references to the linked lists from the three original nodes in FIG. 13 (references not shown). The net effect is that the punctured and rebalanced tree in FIG. 14 produces more search hits on similarly punctured ETCRC values, increasing the likelihood of finding a match between LBA and PBA sectors that differ in only a few bits.



FIG. 11 shows a subroutine “PROCESS INPUTFILE SECTORS”, which is invoked at step 307 in FIG. 8. This subroutine reads sectors from INPUTFILE and searches for them in the working AVL tree, outputting hidden sectors to OUTPUTFILE. At the top of the loop, a sector S is read from INPUTFILE, at 350, where INPUTFILE is either the PBA image 120 or a hidden data file produced by a previous pass. The ETCRC and CRC of sector S are computed, at 351. The ETCRC value is punctured, at 352, according to the tabulated puncture pattern selected by PUNCTURE INDEX. The punctured ETCRC is then searched for in the working tree, at 353. If it is not found, sector S is written to the hidden data output file, at 360. If it is found, the node's linked list (or lists, in the case of a punctured-ETCRC working tree) is searched for the CRC of sector S, at 355. If the CRC is found, at 356, the sector is not a hidden data sector and execution falls to the bottom of the loop, at 361. If the CRC is not found, the disposition of sector S depends on the corrected status of the PBA image 120, at 357. If the PBA image is corrected, a CRC mismatch indicates a data mismatch, and sector S is written to the hidden data output file, at 360. If the PBA image is uncorrected, the LBA sector data is compared byte for byte with sector S, at 358, and the fraction of mismatch (bits not matching divided by total bits in the sector) is compared against the specified mismatch LIMIT, at 359. A mismatch fraction exceeding LIMIT causes sector S to be written to the hidden data output file, at 360. The bottom of the loop checks, at 361, whether all sectors in INPUTFILE have been processed, looping if not and returning to the caller if so.
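The per-sector disposition logic of FIG. 11 can be sketched as a single predicate. This is an illustrative reading, not the disclosed implementation: it recomputes CRCs from candidate LBA sector bytes rather than reading stored CRCs, uses `zlib.crc32` as the CRC, and, where a punctured node references several LBA sectors, assumes sector S is hidden only if it exceeds LIMIT against every candidate. `is_hidden_sector` and `mismatch_fraction` are hypothetical names.

```python
import zlib
from typing import Optional

def mismatch_fraction(a: bytes, b: bytes) -> float:
    """Bits not matching divided by total bits in the sector (step 359).
    Assumes both sectors have the same length."""
    diff = sum(bin(x ^ y).count("1") for x, y in zip(a, b))
    return diff / (8 * len(a))

def is_hidden_sector(sector: bytes,
                     candidates: Optional[list[bytes]],
                     pba_corrected: bool,
                     limit: float) -> bool:
    """Decide whether one INPUTFILE sector is hidden (steps 353-360).

    candidates is None when the punctured ETCRC was not found in the
    working tree; otherwise it holds the LBA sectors referenced by the
    matching node's linked list(s).
    """
    if candidates is None:                      # ETCRC not found (353)
        return True
    crc = zlib.crc32(sector)
    if any(zlib.crc32(c) == crc for c in candidates):  # CRC found (355-356)
        return False
    if pba_corrected:                           # corrected PBA: data mismatch (357)
        return True
    # Uncorrected PBA: tolerate a few bit errors (358-359).
    return all(mismatch_fraction(c, sector) > limit for c in candidates)
```

With an uncorrected image, a sector that differs from its LBA counterpart in a single bit (a fraction of 1/4096 for a 512-byte sector) stays below a LIMIT of 0.01 and is therefore not reported as hidden.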


After isolation of the hidden data, commercial tools can be applied to identify interesting information, such as word processing documents, spreadsheets, videos, and images.



FIG. 15 shows a block diagram of an embodiment of a device. The device 1500 comprises an interface 1510 to access memory space on a memory device, such as, but not limited to, a memory array on the SSD. A graphics processing unit 1520 is also provided to create a plurality of hash values for a logical block address (LBA) memory space of the memory device and a plurality of hash values for a physical block address (PBA) memory space of the memory device, in order to identify data hidden within the PBA memory space from view of the LBA memory space. A display 1530 is also provided to show the located hidden data. The display may be a visual display or may produce a printout with information pertaining to the hidden data. Thus, the term “display” is not intended to be limiting.


As disclosed above, the graphics processing unit 1520 may create a PBA index table associated with the plurality of PBA hash values. The graphics processing unit 1520 may also compare the plurality of LBA hash values with the PBA index table using a fast binary search, and identify matches between any of the LBA hash values and any of the PBA hash values resulting from the comparison. The graphics processing unit 1520 may further identify data hidden within the PBA memory space when data identified in the PBA index table has no match with any of the LBA hash values. As also disclosed above, creating the hash values for both the LBA and PBA memory spaces comprises creating an error tolerant cyclic redundancy check (ETCRC) table for both the LBA memory space and the PBA memory space.
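A sketch of the PBA index table and the binary-search comparison, following the scan-and-count construction recited in claim 8, appears below. It assumes the ETCRC values are integers that have already been computed and sorted, uses Python's `bisect` module in place of a GPU search, and treats an index entry as hidden when no LBA value matches it, as the paragraph above describes. `build_pba_index` and `unmatched_pba_entries` are hypothetical names.

```python
from bisect import bisect_left

def build_pba_index(sorted_etcrcs: list[int]) -> list[tuple[int, int, int]]:
    """Scan a sorted PBA ETCRC table, collapsing runs of duplicates into
    (etcrc_value, duplicate_count, first_index) entries."""
    index: list[tuple[int, int, int]] = []
    for i, value in enumerate(sorted_etcrcs):
        if index and index[-1][0] == value:
            v, count, first = index[-1]
            index[-1] = (v, count + 1, first)   # extend the current run
        else:
            index.append((value, 1, i))         # start a new run
    return index

def unmatched_pba_entries(lba_etcrcs: list[int],
                          index: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]:
    """Binary-search each LBA value in the index; return entries no LBA value hit."""
    keys = [entry[0] for entry in index]
    hit = [False] * len(index)
    for value in lba_etcrcs:
        k = bisect_left(keys, value)
        if k < len(keys) and keys[k] == value:
            hit[k] = True
    return [entry for entry, was_hit in zip(index, hit) if not was_hit]

idx = build_pba_index(sorted([7, 3, 7, 9, 3, 3]))
print(idx)                                # [(3, 3, 0), (7, 2, 3), (9, 1, 5)]
print(unmatched_pba_entries([3, 9], idx))  # [(7, 2, 3)]
```

The `first_index` and `count` fields let each unmatched entry be traced back to every PBA sector that shares that ETCRC value.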


The disclosed embodiments are amenable to parallel processing on a graphics processing unit (GPU). Radix sort and merge sort algorithms suited to GPUs are readily available. The ETCRC is computed on each LBA and PBA sector independently and therefore may be parallelized. Furthermore, the matching process for each LBA sector may be parallelized, with appropriate record locking mechanisms for access to individual records in the PBA index table. The embodiments may thus be designed for extensive parallelism and commensurate acceleration.
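Because each sector's check value depends only on that sector, the per-sector work maps onto a data-parallel loop. The sketch below uses a CPU thread pool from `concurrent.futures` purely to show that independence; it is not a GPU implementation, `zlib.crc32` stands in for the ETCRC, and the 512-byte sector size and the names `split_sectors` and `sector_hashes_parallel` are assumptions.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SECTOR_SIZE = 512  # hypothetical sector size in bytes

def split_sectors(image: bytes, size: int = SECTOR_SIZE) -> list[bytes]:
    """Cut a raw image into consecutive fixed-size sectors."""
    return [image[i:i + size] for i in range(0, len(image), size)]

def sector_hashes_parallel(image: bytes, workers: int = 4) -> list[int]:
    """Hash every sector independently; executor.map preserves input order,
    so result i belongs to sector i."""
    sectors = split_sectors(image)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.crc32, sectors))
```

No sector's result depends on any other, so the same loop body could be assigned one GPU thread per sector.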


As another non-limiting example, the embodiments disclosed herein may be used with the on-board chip reader adapter disclosed in U.S. Patent Application No. ______, which claims priority to U.S. Provisional Application 62/000,475 filed May 19, 2014, both of which are incorporated herein by reference in their entirety.


Although the disclosed embodiments do not match PBA and LBA sectors the same way that the SSD FTL software would, this does not matter, because the valuable output is the unique, hidden PBA data, without regard for the PBA/LBA mapping relationship maintained by the FTL. Specifically, as a non-limiting example, for four PBA sectors containing data values A, B, B, and C, with the LBA image showing sectors with values A, B, and C, an embodiment disclosed herein outputs one sector with data value B as hidden data. What the PBA/LBA mapping was for that sector does not matter; it only matters that a sector with that value was recovered from the hidden PBA space. This is advantageous because no determination is required of the PBA/LBA mapping relationship, which differs among most types of SSDs and FTL algorithms.
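The A, B, B, C example amounts to a multiset difference on sector contents: each LBA copy of a value cancels one PBA copy, and whatever remains is reported, with no reference to where either copy was mapped. The sketch below compares raw sector bytes for clarity, whereas the disclosed method compares ETCRC and CRC values; `hidden_by_value` is a hypothetical name.

```python
from collections import Counter

def hidden_by_value(pba_sectors: list[bytes], lba_sectors: list[bytes]) -> list[bytes]:
    """Return PBA sector contents left over after cancelling one PBA copy
    per LBA copy of the same contents; no LBA-to-PBA mapping is consulted."""
    surplus = Counter(pba_sectors) - Counter(lba_sectors)  # keeps positive counts only
    return list(surplus.elements())

print(hidden_by_value([b"A", b"B", b"B", b"C"], [b"A", b"B", b"C"]))  # [b'B']
```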


Several general advantages of this invention include, but are not limited to, the following: hidden data, that is, data not accessible over the usual computer interface for the storage device, is discovered; only the most basic knowledge of the storage format is required, and no information about the FTL mapping between LBA and PBA is needed; knowledge of the error correction methods is optional, and the error tolerant cyclic redundancy check makes hidden data discovery possible in a reasonable time frame; and identification of the hidden data is accomplished in a reasonable amount of time, using a reasonable amount of storage, approximately 10% of the size of the LBA image.


While various aspects of the present disclosure have been described above, it should be understood that they have been presented by way of example and not limitation. It will be apparent to persons skilled in the relevant art(s) that various changes in form and detail can be made therein without departing from the spirit and scope of the present disclosure. Thus, the present disclosure should not be limited by any of the above described exemplary aspects.


In addition, it should be understood that the figures in the attachments, which highlight the structure, methodology, functionality and advantages of the present disclosure, are presented for example purposes only. The present disclosure is sufficiently flexible and configurable, such that it may be implemented in ways other than that shown in the accompanying figures (e.g., implementation within computing devices and environments other than those mentioned herein). As will be appreciated by those skilled in the relevant art(s) after reading the description herein, certain features from different aspects of the method of the present disclosure may be combined to form yet new aspects of the present disclosure.


Further, the purpose of the foregoing Abstract is to enable the U.S. Patent and Trademark Office and the public generally and especially the scientists, engineers and practitioners in the relevant art(s) who are not familiar with patent or legal terms or phraseology, to determine quickly from a cursory inspection the nature and essence of this technical disclosure. The Abstract is not intended to be limiting as to the scope of the present disclosure in any way.

Claims
  • 1. A method of isolating hidden data in a solid state memory system, the method comprising: obtaining a logical block address (LBA) image from the memory system; obtaining a physical block address (PBA) image; determining whether an error exists in the PBA image and correcting the error; calculating an error tolerant cyclic redundancy check (ETCRC) on each sector of the LBA image and building a search tree indexed on the ETCRC value; for each sector in the PBA image, computing the ETCRC value and searching for the ETCRC value in the LBA search tree; if the ETCRC value is found, comparing the cyclic redundancy check (CRC) of the LBA and PBA sectors; and outputting to an output file the PBA sector as hidden data if either the ETCRC value is not found in the LBA search tree or the CRC comparison fails.
  • 2. The method according to claim 1, further comprising: selecting one of several puncturing patterns for ETCRC values; building a punctured ETCRC working search tree from the LBA search tree; for each sector in the hidden data file, computing the punctured ETCRC value and searching for the punctured ETCRC value in the working search tree; if the ETCRC value is found, comparing the data of the LBA and hidden data sectors; and outputting to a new hidden data output file the previous hidden data file sector if either the ETCRC value is not found in the LBA search tree or the data comparison fails as specified by a predefined limit.
  • 3. The method according to claim 1, wherein the search tree indexed on the ETCRC value is an Adelson-Velsky and Landis' (AVL) tree.
  • 4. A method of computing a total hash function of an array of values which limits the impact of a change in the array of values to one subfield in the hash value, the method comprising: dividing the array of values into a number of sections; computing a hash function over each section; and concatenating the computed hash function values to create a total hash function of the array of values, wherein a change in one of the array of values is reflected as a change in the total hash function in only one subfield.
  • 5. A hidden data determination system for locating hidden data in a memory array of a solid state device, the system comprising: an interface to access memory space on a memory device; a graphics processing unit to create a plurality of hash values for a logical block address (LBA) memory space of the memory device and to create a plurality of hash values for a physical block address (PBA) memory space of the memory device to identify data hidden within the PBA memory space from view of the LBA memory space; wherein the graphics processing unit: creates a PBA index table associated with the plurality of PBA hash values; compares the plurality of LBA hash values with the PBA index table; identifies matches of any of the plurality of LBA hash values and any of the plurality of PBA hash values resulting from the step of comparing; and identifies data hidden within the PBA memory space when data identified in the PBA index table has no match with any of the plurality of LBA hash values; and a display device to show the data hidden.
  • 6. The system according to claim 5, wherein the graphics processing unit creates the hash values for both the LBA and PBA memory spaces as an error tolerant cyclic redundancy check (ETCRC) table for both the LBA memory space and PBA memory space.
  • 7. The system according to claim 6, further comprising creating at least one of the LBA ETCRC table and the PBA ETCRC table with a search tree.
  • 8. The system according to claim 5, wherein the graphics processing unit creating the PBA index comprises: scanning the PBA ETCRC table; counting successive duplicate ETCRC values; and writing the ETCRC value, count of duplicates, and index of a first ETCRC of that value in the PBA ETCRC table.
  • 9. The system according to claim 5, wherein the graphics processing unit selectively ignores combinations of fields within the PBA ETCRC table and LBA ETCRC table to resolve any uncorrected bit errors in the PBA memory space.
  • 10. The system according to claim 5, wherein, after initially identifying matches, the graphics processing unit removes the matches, creates a second LBA ETCRC table, and repeats the identification of matches to increase a rate of execution.
  • 11. The system according to claim 5, wherein the graphics processing unit selectively ignores combinations of fields within the PBA ETCRC table and the second LBA ETCRC table to resolve any uncorrected bit errors in the PBA memory space.
  • 12. The system according to claim 5, wherein the graphics processing unit creates the LBA ETCRC table and the PBA ETCRC table on each memory sector of the LBA memory space and PBA memory space independently.
  • 13. The system according to claim 5, wherein the hidden data is located without regard to a mapping relationship between the PBA memory space and the LBA memory space, and a flash translation layer (FTL) of software is included in the SSD.
  • 14. The system according to claim 5, wherein the search tree comprises an Adelson-Velsky and Landis' (AVL) tree.
CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Application No. 62/000,478 filed May 19, 2014, and incorporated herein by reference in its entirety.
