The present application is related to the following commonly owned and assigned applications: U.S. application Ser. No. (unassigned), Attorney Docket No. WEBR-040/00US, “Method and System for Rapid Data-Fragmentation Analysis of a New Technology File System (NTFS),” filed herewith; and U.S. application Ser. No. 11/145,593, Attorney Docket No. WEBR-009/00US, “System and Method for Neutralizing Locked Pestware Files,” which is incorporated herein by reference in its entirety.
The present invention relates to computer storage technology. In particular, but without limitation, the present invention relates to methods and systems for evaluating the extent of data fragmentation on a computer storage medium.
A number of variables influence computer performance. Factors such as processor speed, the size and speed of random-access memory (RAM), the speed of the system's internal bus, and the speed of disk access all play a role. The speed of disk access is particularly important since disk drives are slower than RAM, and many computer applications involve extensive disk access.
A formatted computer storage medium (e.g., a hard disk) typically contains data storage units called “clusters,” each of which is usually a power-of-two multiple of a smaller 512-byte-long unit called a “sector”; directory or index information about the files and folders stored on the storage medium; and a system for keeping track of which clusters are in use and to which file or folder each cluster belongs. Two well-known file-system architectures are the file-allocation-table (FAT) file system and the New Technology File System (NTFS). These two architectures take very different approaches to organizing and keeping track of data on a storage medium.
The longer a storage medium is used, the more fragmented the data on the storage medium become. That is, the clusters associated with an increasing number of files on the storage medium are scattered rather than contiguous. On a disk drive, reading a fragmented file requires more time than reading a non-fragmented file because the drive head has to jump around on the storage medium to access the scattered clusters making up the file. This extra “seek time” degrades system performance. Since flash-memory-based storage media such as secure digital (SD) cards and multi-media cards (MMCs) are typically formatted like disk volumes, fragmentation can also slow down the reading of data from those storage media, although the problem of seek time that occurs with disk drives is absent.
Utilities for defragmenting a storage medium have become commonplace. Such utilities rewrite the data on the storage medium, rendering contiguous the clusters making up each file. Before a computer user incurs the time and possible risk to data involved in using a defragmentation utility, however, the user may wish to test the storage medium first to measure the extent of data fragmentation. Conventional methods for evaluating the extent of fragmentation on a storage medium employing a FAT file system involve accessing the storage medium's directory tables repeatedly to identify one file at a time, locating the FAT entry for the first cluster of each file, and tracing the subsequent FAT entries associated with that file to determine whether the file contains any fragmentations. These conventional methods can require a significant amount of time to execute (e.g., several minutes), especially for large storage media.
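By way of illustration only, the conventional per-file trace can be sketched in Python as follows. The sketch assumes that the FAT has already been decoded into a list of FAT16 entry values and that the first cluster of the file has already been obtained from a directory-table entry. The example table and the function name are hypothetical. The table was chosen to be consistent with the example walk-through later in this description.

```python
EOF_MIN = 0xFFF8  # FAT16 end-of-chain markers occupy 0xFFF8-0xFFFF

# Hypothetical FAT16 table; entries 0 and 1 are reserved. Files (by cluster):
# {2,3,4}, {5,6,9,10,11}, {12,7,8}, {13,14,15}; Clusters 16 and 17 are free.
EXAMPLE_FAT = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 6, 9, 8, 0xFFFF,
               10, 11, 0xFFFF, 7, 14, 15, 0xFFFF, 0x0000, 0x0000]


def count_fragmentations(fat: list[int], first_cluster: int) -> int:
    """Follow one file's cluster chain and count breaks in contiguity."""
    fragmentations = 0
    cluster = first_cluster
    visited = set()  # guards against a corrupt, cyclic chain
    while cluster not in visited:
        visited.add(cluster)
        nxt = fat[cluster]
        if nxt >= EOF_MIN:
            break  # end of this file's chain
        if not 2 <= nxt < len(fat):
            raise ValueError(f"Cluster {cluster} holds invalid link {nxt:#06x}")
        if nxt != cluster + 1:
            fragmentations += 1  # the next cluster is not the adjacent one
        cluster = nxt
    return fragmentations


# First cluster 5 would come from the file's directory-table entry.
print(count_fragmentations(EXAMPLE_FAT, 5))  # → 1
```

Each walk starts from a directory entry. The conventional method must therefore repeat it for every file listed in the directory tables.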
It is thus apparent that there is a need in the art for an improved method and system for rapid data-fragmentation analysis of a FAT file system.
Illustrative embodiments of the present invention that are shown in the drawings are summarized below. These and other embodiments are more fully described in the Detailed Description section. It is to be understood, however, that there is no intention to limit the invention to the forms described in this Summary of the Invention or in the Detailed Description. One skilled in the art can recognize that there are numerous modifications, equivalents and alternative constructions that fall within the spirit and scope of the invention as expressed in the claims.
The present invention can provide a method and system for rapid data-fragmentation analysis of a file-allocation-table (FAT) file system. One illustrative embodiment of the invention is a method comprising reading into a memory of a computer a FAT associated with a storage medium of the computer, the storage medium having a plurality of clusters, each cluster having an associated entry in the FAT; and analyzing, substantially without accessing a directory table associated with the storage medium, the FAT to estimate the extent of data fragmentation on the storage medium.
Another illustrative embodiment is a system comprising a data acquisition module configured to read into a memory of a computer a FAT associated with a storage medium of the computer, the storage medium having a plurality of clusters, each cluster having an associated entry in the FAT; and an analysis module configured to analyze, substantially without accessing a directory table associated with the storage medium, the FAT to estimate the extent of data fragmentation on the storage medium.
Another illustrative embodiment is a computer-readable storage medium comprising a first instruction segment configured to read into a memory of a computer a file allocation table (FAT) associated with a storage medium of the computer, the storage medium having a plurality of clusters, each cluster having an associated entry in the FAT; and a second instruction segment configured to analyze, substantially without accessing a directory table associated with the storage medium, the FAT to estimate the extent of data fragmentation on the storage medium. These and other embodiments are described in more detail herein.
Various objects and advantages and a more complete understanding of the present invention are apparent and more readily appreciated by reference to the following Detailed Description and to the appended claims when taken in conjunction with the accompanying Drawings.
Determining the extent of data fragmentation (“fragmentation analysis”) on a storage medium formatted with a file-allocation-table (FAT) file system can be sped up significantly by reading the FAT into the computer's random-access memory (RAM) and, substantially without accessing any directory tables of the storage medium, analyzing the FAT. Where all that is needed is a reasonably accurate estimate of the extent of data fragmentation to aid a user in deciding whether to defragment the storage medium, it is unnecessary to identify files and folders as such using the directory tables and to trace their associated FAT entries. Instead, FAT entries can be examined quickly and efficiently in sequential order, and a variety of statistics can be compiled to evaluate the extent of data fragmentation on the storage medium. The results may be reported to a user, and a recommendation may be made that the storage medium be defragmented, if the extent of data fragmentation exceeds a predetermined threshold.
Counts of clusters associated with fragmented files (“fragmented clusters”), clusters associated with non-fragmented files (“non-fragmented clusters”), used clusters, unused clusters, and files (including folders or directories) can be gathered by examining FAT entries in sequential order in RAM without consulting directory tables of the storage medium. To improve the accuracy of the fragmentation estimate, the structure of individual files on the storage medium can also be inferred from the FAT entries. For example, where it is inferred that a particular group of clusters constitutes a file having at least one “fragmentation,” all clusters in that group can be treated as “fragmented” and included in the count of “fragmented clusters.” Herein, a “fragmentation” is an interruption in the contiguity on a storage medium of the clusters making up a file or folder. In one illustrative embodiment, the ratio of fragmented clusters to total used clusters on the storage medium is computed and used to determine whether defragmentation of the storage medium is advisable.
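The ratio test of this illustrative embodiment reduces to a few lines. In the following Python sketch, the function names and the 10 percent threshold are hypothetical. The description calls only for a predetermined threshold.

```python
# Hypothetical threshold; the description calls only for a predetermined value.
DEFRAG_THRESHOLD = 0.10


def fragmentation_ratio(fragmented_clusters: int, used_clusters: int) -> float:
    """Fraction of used clusters that belong to fragmented files."""
    if used_clusters == 0:
        return 0.0  # an empty volume has nothing to defragment
    return fragmented_clusters / used_clusters


def recommend_defragmentation(fragmented_clusters: int, used_clusters: int,
                              threshold: float = DEFRAG_THRESHOLD) -> bool:
    """True when the fragmented share of used clusters exceeds the threshold."""
    return fragmentation_ratio(fragmented_clusters, used_clusters) > threshold


print(recommend_defragmentation(8, 14))  # 8 of 14 used clusters fragmented → True
```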
In some embodiments, the FAT is read into RAM in its entirety before the fragmentation analysis begins. In other embodiments, only a portion of the FAT is read into RAM initially, and other portions are fetched from the storage medium as needed. In one illustrative embodiment, directory tables associated with the storage medium are not accessed at all in connection with the fragmentation analysis. In other embodiments, a limited amount of directory-table access may be part of the fragmentation analysis.
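The embodiments that fetch portions of the FAT on demand might read it piecewise, as in the following Python sketch. It assumes a FAT16 volume, which uses 2-byte little-endian entries. It also assumes a binary stream already positioned at the start of the FAT. The function name and chunk size are hypothetical. Only about one chunk of the FAT is held in memory at a time. If a short read splits an entry, the leftover byte is carried over to the next chunk.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_fat16_entries(stream: BinaryIO, fat_size_bytes: int,
                       chunk_size: int = 4096) -> Iterator[tuple[int, int]]:
    """Yield (cluster_number, entry_value) pairs, reading the FAT in chunks."""
    cluster = 0
    remaining = fat_size_bytes
    pending = b""  # carries a split 2-byte entry across reads
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            break  # image is shorter than the FAT size claimed
        remaining -= len(chunk)
        pending += chunk
        usable = len(pending) - len(pending) % 2  # whole entries only
        for (value,) in struct.iter_unpack("<H", pending[:usable]):
            yield cluster, value
            cluster += 1
        pending = pending[usable:]


# Usage with an in-memory stand-in for the on-disk FAT region.
raw = struct.pack("<6H", 0xFFF8, 0xFFFF, 3, 0xFFFF, 0x0000, 0x0000)
for cluster, value in iter_fat16_entries(io.BytesIO(raw), len(raw), chunk_size=4):
    print(cluster, hex(value))
```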
Referring now to the drawings, where like or similar elements are designated with identical reference numerals throughout the several views,
Input devices 115 may be, for example, a keyboard and a mouse or other pointing device. In an illustrative embodiment, storage medium 135 is a disk volume such as a hard disk drive (HDD). In other embodiments, however, storage medium 135 may be any rewritable storage medium having a FAT file system, including, without limitation, magnetic disks, rewritable optical discs, and flash-memory-based storage media such as secure digital (SD) cards and multi-media cards (MMCs). Memory 130 may include random-access memory (RAM), read-only memory (ROM), or a combination thereof.
Fragmentation analysis system 138 estimates the extent of data fragmentation on storage medium 135. In some embodiments, the estimate is highly accurate. In other embodiments, the estimate may be designed to be less accurate, depending on the application. In the illustrative embodiment of
For convenience in this Detailed Description, the functionality of fragmentation analysis system 138 has been subdivided into three modules: data acquisition module 140, analysis module 145, and report generation module 150. In various embodiments of the invention, the functionality of these three modules may be combined or subdivided in a variety of ways different from that shown in
As mentioned above, fragmentation analysis system 138 performs its functions substantially without accessing any directory tables associated with storage medium 135. In FAT file systems, a directory table is a special type of file that contains metadata about the files and other directories (or “folders”) stored within a particular directory/folder. Each entry in the directory table contains the name, extension, and other information associated with the corresponding file or folder, including the address of the first cluster of the file or folder on storage medium 135. In one illustrative embodiment, fragmentation analysis system 138 does not access any directory tables of storage medium 135 at any time prior to or during the fragmentation analysis.
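For reference, a conventional analysis reads several fields from each directory-table entry. The Python sketch below shows how those fields can be decoded. It assumes the standard 32-byte short-name entry layout:

- name at offset 0
- extension at offset 8
- attributes at offset 11
- high half of the first-cluster number at offset 20
- low half of the first-cluster number at offset 26
- file size at offset 28

Long-file-name entries and deleted entries are not handled. The function name and the sample entry are hypothetical. This per-entry work is what fragmentation analysis system 138 avoids.

```python
import struct


def parse_dir_entry(entry: bytes) -> dict:
    """Decode one 32-byte FAT short-name directory entry."""
    if len(entry) != 32:
        raise ValueError("FAT directory entries are 32 bytes long")
    name = entry[0:8].decode("ascii", errors="replace").rstrip()
    ext = entry[8:11].decode("ascii", errors="replace").rstrip()
    (high,) = struct.unpack_from("<H", entry, 20)  # upper 16 bits; used by FAT32
    (low,) = struct.unpack_from("<H", entry, 26)   # lower 16 bits
    (size,) = struct.unpack_from("<I", entry, 28)  # file size in bytes
    return {
        "name": name,
        "ext": ext,
        "attributes": entry[11],
        "first_cluster": (high << 16) | low,
        "size": size,
    }


# A hand-built entry for README.TXT starting at Cluster 5 (hypothetical values).
raw = (b"README  TXT" + bytes([0x20]) + bytes(8) + struct.pack("<H", 0)
       + bytes(4) + struct.pack("<H", 5) + struct.pack("<I", 1234))
print(parse_dir_entry(raw)["first_cluster"])  # → 5
```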
Before embodiments of the invention are described in further detail, some introductory context and terminology are first provided.
In
An entry 205 for a particular Cluster K may satisfy FAT(K)=K+1, yet Cluster K may be part of a fragmented file 230 (i.e., one or more fragmentations may occur elsewhere in the file). In such cases, analysis module 145 may infer, as it examines FAT entries 205, that Cluster K and a set of other clusters 210 constitute a file 230, and analysis module 145 may treat all clusters 210 that make up that file 230 as “fragmented clusters” in compiling its statistics.
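The following Python sketch shows the local test FAT(K)=K+1 and the classification of entry values for FAT16. It uses the value ranges of that format:

- 0x0000: free
- 0x0002-0xFFEF: index of the next cluster
- 0xFFF7: bad cluster
- 0xFFF8-0xFFFF: end of chain

The function names and sample entries are hypothetical. As noted above, passing the local test says nothing about the rest of the file. Cluster 5 below passes, yet its file is fragmented at Cluster 6.

```python
FREE = 0x0000
BAD = 0xFFF7
EOF_MIN = 0xFFF8


def classify(value: int) -> str:
    """Map a FAT16 entry value to its meaning."""
    if value == FREE:
        return "free"
    if 0x0002 <= value <= 0xFFEF:
        return "index"  # number of the file's next cluster
    if value == BAD:
        return "bad"
    if value >= EOF_MIN:
        return "eof"
    return "reserved"   # 0x0001 and 0xFFF0-0xFFF6


def is_contiguous_link(cluster: int, value: int) -> bool:
    """True when FAT(K) == K + 1, i.e., the chain continues in the adjacent cluster."""
    return classify(value) == "index" and value == cluster + 1


# Hypothetical entries: Cluster 5 -> 6 (contiguous), Cluster 6 -> 9 (fragmentation).
example = {5: 6, 6: 9, 11: 0xFFFF}
for k, v in example.items():
    print(k, classify(v), is_contiguous_link(k, v))
```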
Illustrative embodiments of the invention will now be described in more detail. Before fragmentation analysis system 138 analyzes storage medium 135, it first determines that storage medium 135 has a FAT file system (as opposed to, e.g., a New Technology File System (NTFS)). To do so, fragmentation analysis system 138 may examine the first sector of storage medium 135 (e.g., the boot sector). That first sector contains the number of logical partitions on the drive (each logical partition on a physical drive has its own FAT), the number of sectors each FAT occupies, and the type of each logical partition (e.g., FAT16 or FAT32). The number (“16” or “32”) appended to “FAT” indicates the width, in bits, of each FAT entry 205 in a particular file system (FAT32 uses the low 28 bits of each 32-bit entry). The first sector of the volume also contains the number of hidden sectors, which are the sectors that precede the volume on the physical disk. Fragmentation analysis system 138 can find any needed information to read FAT 200 into memory 130 by examining the first sector of the volume as just described, obviating the need to consult directory tables associated with storage medium 135. In an illustrative embodiment, data acquisition module 140 performs this preliminary information gathering.
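The following Python sketch illustrates this preliminary step. It assumes the caller has already read the volume's boot sector into a bytes object; this is the first sector of the logical volume, not the master boot record. The sketch uses the BIOS Parameter Block offsets published in Microsoft's FAT specification. It does not rely on the informational file-system-type string. Instead, it derives the FAT type from the cluster count, which is how that specification says the type is to be determined. The function name and the returned dictionary keys are hypothetical.

```python
import struct


def parse_bpb(sector: bytes) -> dict:
    """Extract the BIOS Parameter Block fields needed to locate and size the FAT."""
    if len(sector) < 512 or sector[510:512] != b"\x55\xaa":
        raise ValueError("not a valid boot sector (missing 0x55AA signature)")
    (bytes_per_sector,) = struct.unpack_from("<H", sector, 11)
    sectors_per_cluster = sector[13]
    (reserved_sectors,) = struct.unpack_from("<H", sector, 14)
    num_fats = sector[16]
    (root_entries,) = struct.unpack_from("<H", sector, 17)
    (total_16,) = struct.unpack_from("<H", sector, 19)
    (fat_size_16,) = struct.unpack_from("<H", sector, 22)
    (hidden_sectors, total_32) = struct.unpack_from("<II", sector, 28)
    if bytes_per_sector == 0 or sectors_per_cluster == 0:
        raise ValueError("corrupt BPB: zero sector or cluster size")
    # FAT12/16 store the FAT size at offset 22; FAT32 leaves it 0 and uses offset 36.
    fat_size = fat_size_16 or struct.unpack_from("<I", sector, 36)[0]
    total_sectors = total_16 or total_32
    root_dir_sectors = (root_entries * 32 + bytes_per_sector - 1) // bytes_per_sector
    data_sectors = total_sectors - (reserved_sectors + num_fats * fat_size
                                    + root_dir_sectors)
    cluster_count = data_sectors // sectors_per_cluster
    # The type follows from the cluster count, not from the type string.
    if cluster_count < 4085:
        fat_type = "FAT12"
    elif cluster_count < 65525:
        fat_type = "FAT16"
    else:
        fat_type = "FAT32"
    return {
        "fat_type": fat_type,
        "fat_offset_bytes": reserved_sectors * bytes_per_sector,  # from volume start
        "fat_size_bytes": fat_size * bytes_per_sector,
        "num_fats": num_fats,
        "hidden_sectors": hidden_sectors,  # sectors preceding the volume on disk
        "cluster_count": cluster_count,
    }
```

A caller would then seek to `fat_offset_bytes` from the start of the volume and read `fat_size_bytes` bytes to obtain the first copy of the FAT. FAT12 volumes pack entries into 12 bits and would need separate decoding.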
Once a FAT on storage medium 135 has been identified and located, data acquisition module 140 reads FAT 200 into memory 130. In some embodiments, data acquisition module 140 reads FAT 200 into memory 130 in its entirety before analysis module 145 analyzes FAT 200. In other embodiments, data acquisition module 140 reads only a portion of FAT 200 into memory 130 initially, reading subsequent portions of FAT 200 into memory 130 as needed. One advantage of the invention over conventional fragmentation analysis techniques is that, once FAT 200 has been loaded into memory 130, the fragmentation analysis may be performed on FAT 200 in RAM without any further need to access storage medium 135 (e.g., to consult directory tables of storage medium 135).
Analysis module 145 may, initially at least, count Cluster 5 as non-fragmented. When analysis module 145 encounters the index 220 in the entry 205 for Cluster 6, it notes a fragmentation and may increment a count of fragmentations and/or fragmented clusters. By keeping track of how many contiguous used clusters have been encountered in FAT 200 since the last EOF marker 225 was read, analysis module 145 may also infer that Cluster 5 is part of the same fragmented file 230 as Cluster 6 and, therefore, that Cluster 5 should be included, along with Cluster 6, in a count of fragmented clusters 210. Of course, analysis module 145 may also adjust the tentative count of non-fragmented clusters 210 accordingly.
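The retroactive reclassification of Cluster 5 can be expressed as a single sequential pass, sketched below in Python. The sketch makes these assumptions:

- entries are FAT16 values;
- the example table is hypothetical, chosen to be consistent with the walk-through, with Clusters 5-6 and 9-11 forming one file and Clusters 12, 7, and 8 forming another;
- the function name is hypothetical.

It implements only the run-tracking rule of this paragraph. Clusters 9-11 and 7-8 therefore remain counted as non-fragmented here. The refinements described in the following paragraphs address them.

```python
EOF_MIN = 0xFFF8

# Hypothetical FAT16 table consistent with the walk-through; 16 and 17 are free.
EXAMPLE_FAT = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 6, 9, 8, 0xFFFF,
               10, 11, 0xFFFF, 7, 14, 15, 0xFFFF, 0x0000, 0x0000]


def scan_with_run_tracking(fat: list[int]) -> tuple[int, int]:
    """One sequential pass; returns (fragmented, non_fragmented) cluster counts.

    Every used cluster is first counted as non-fragmented. When a
    fragmentation is found, the contiguous run of used clusters seen since the
    last EOF marker (or the last jump) is moved to the fragmented count.
    """
    fragmented = non_fragmented = 0
    run_length = 0
    for k in range(2, len(fat)):        # entries 0 and 1 are reserved
        value = fat[k]
        is_eof = value >= EOF_MIN
        is_index = 0x0002 <= value <= 0xFFEF
        if not (is_eof or is_index):
            continue                    # free, bad, or reserved: not file data
        run_length += 1
        non_fragmented += 1             # tentative classification
        if is_index and value != k + 1:
            fragmented += run_length    # reclassify the whole run
            non_fragmented -= run_length
            run_length = 0
        elif is_eof:
            run_length = 0              # the inferred file ends here
    return fragmented, non_fragmented


print(scan_with_run_tracking(EXAMPLE_FAT))  # → (3, 11)
```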
If only fragmentations are counted (see the discussion of
Analysis module 145 may, initially at least, count Cluster 7 as non-fragmented. On encountering the entry 205 for Cluster 8, analysis module 145 may again increment the tally of files and folders. At the entry 205 for Cluster 9, analysis module 145 may determine that Cluster 9 is the cluster 210 to which the index 220 in the entry 205 for Cluster 6 pointed. For example, analysis module 145 may keep a running list of the “jumped-to” clusters 210 (such as Cluster 9 in this example) associated with fragmentations (such as the one occurring at Cluster 6 in this example) as it encounters the fragmentations. When the entry 205 of the jumped-to cluster 210 is later reached, analysis module 145 can take appropriate action.
When analysis module 145 reaches entry 205 for jumped-to Cluster 9, its inference of the structure of the file 230 made up of Clusters 5-6 and 9-11 may resume. Analysis module 145 can infer that the series of contiguous clusters 210 commencing with Cluster 9 and ending with Cluster 11, whose entry 205 contains an EOF marker 225, is part of the same file 230 as Clusters 5 and 6. Therefore, analysis module 145 can include Clusters 9-11 in a count of fragmented clusters 210 and adjust its tentative count of non-fragmented clusters 210 accordingly.
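The running list of jumped-to clusters can be folded into a single sequential pass, as in the Python sketch below. It assumes FAT16 entry values. The example table is hypothetical but consistent with the walk-through, and the function name is also hypothetical. A set stands in for the "running list" so that membership checks stay cheap. Once Cluster 9 is reached, Clusters 9-11 are counted directly as fragmented. The backward jump at Cluster 12 is not traced in this sketch, so Clusters 7 and 8 remain counted as non-fragmented.

```python
EOF_MIN = 0xFFF8

# Hypothetical FAT16 table consistent with the walk-through; 16 and 17 are free.
EXAMPLE_FAT = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 6, 9, 8, 0xFFFF,
               10, 11, 0xFFFF, 7, 14, 15, 0xFFFF, 0x0000, 0x0000]


def scan_with_jump_targets(fat: list[int]) -> tuple[int, int]:
    """Sequential pass that also resumes fragmented files at jumped-to clusters."""
    fragmented = non_fragmented = 0
    run_length = 0             # tentatively non-fragmented clusters in this run
    run_fragmented = False     # True once the run continues a fragmented file
    jumped_to = set()          # forward targets of fragmentations not yet reached
    for k in range(2, len(fat)):
        value = fat[k]
        is_eof = value >= EOF_MIN
        is_index = 0x0002 <= value <= 0xFFEF
        if not (is_eof or is_index):
            continue                       # free, bad, or reserved
        if k in jumped_to:                 # e.g. Cluster 9, after the jump at 6
            jumped_to.discard(k)
            run_fragmented = True
        if run_fragmented:
            fragmented += 1                # known to belong to a fragmented file
        else:
            run_length += 1
            non_fragmented += 1            # tentative classification
        if is_index and value != k + 1:    # fragmentation at Cluster k
            if value > k:
                jumped_to.add(value)       # remember where the file resumes
            fragmented += run_length       # reclassify the tentative run
            non_fragmented -= run_length
            run_length = 0
            run_fragmented = False
        elif is_eof:
            run_length = 0
            run_fragmented = False
    return fragmented, non_fragmented


print(scan_with_jump_targets(EXAMPLE_FAT))  # → (6, 8)
```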
Analysis module 145 identifies another fragmentation at the entry 205 for Cluster 12, whose index 220 points backward to Cluster 7. Optionally, analysis module 145 may trace this index 220 back to Cluster 7 to infer that Clusters 7 and 8 belong to the same file 230 as Cluster 12 and update its statistics accordingly. Doing so improves the accuracy of the fragmentation estimate but is not essential in every embodiment of the invention.
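The optional backward trace can be sketched as a small helper that follows the chain from the jumped-to cluster using the in-memory FAT. This Python sketch assumes the FAT16 value ranges. The example table is hypothetical, and so is the function name. The helper returns only clusters that lie before the current position, because those are the ones the sequential pass has already visited.

```python
EOF_MIN = 0xFFF8

# Hypothetical FAT16 table consistent with the walk-through; 16 and 17 are free.
EXAMPLE_FAT = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 6, 9, 8, 0xFFFF,
               10, 11, 0xFFFF, 7, 14, 15, 0xFFFF, 0x0000, 0x0000]


def trace_backward_jump(fat: list[int], target: int, current: int) -> list[int]:
    """Follow the chain from the target of a backward jump found at `current`.

    Returns the clusters on that chain that lie before `current`, i.e. the
    ones a sequential pass has already visited.
    """
    already_seen = []
    cluster = target
    visited = set()  # guards against a corrupt, cyclic chain
    while cluster < current and cluster not in visited:
        visited.add(cluster)
        already_seen.append(cluster)
        value = fat[cluster]
        if not 0x0002 <= value <= 0xFFEF:
            break  # an EOF marker (or an unexpected value) ends the chain
        cluster = value
    return already_seen


# Cluster 12 points back to Cluster 7; Clusters 7 and 8 belong to its file.
print(trace_backward_jump(EXAMPLE_FAT, target=7, current=12))  # → [7, 8]
```

A caller would then move each returned cluster from its non-fragmented count to its fragmented count. This applies only to clusters that had been counted as non-fragmented.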
In examining the entries 205 for Clusters 13-15, analysis module 145 encounters no further fragmentations and updates its statistics (e.g., a count of non-fragmented clusters 210 and a count of files and directories) accordingly.
As analysis module 145 examines each FAT entry, it may also update a count of “used” clusters 210 on storage medium 135.
At 530 in
If more entries 205 remain to be read at 550, the process returns to step 505 in
If, at 525, analysis module 145 determines that the value contained in the entry 205 read at 505 is not an index 220, analysis module 145 determines, at 555, whether the value is an EOF marker 225. If not, the process proceeds to 550. Otherwise, analysis module 145 updates a count of files and folders on storage medium 135, and the process then proceeds to 550.
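The per-entry decisions at 525 and 555, together with the used-cluster count mentioned above, can be sketched as one Python loop. Comments mark where steps 505, 525, 550, and 555 appear to fall. The following are assumptions of this sketch rather than details taken from the flowchart:

- that step mapping;
- the FAT16 value ranges;
- the separate tally of bad or reserved values;
- the example table;
- the function and key names.

```python
from collections import Counter

EOF_MIN = 0xFFF8

# Hypothetical FAT16 table consistent with the walk-through; 16 and 17 are free.
EXAMPLE_FAT = [0xFFF8, 0xFFFF, 3, 4, 0xFFFF, 6, 9, 8, 0xFFFF,
               10, 11, 0xFFFF, 7, 14, 15, 0xFFFF, 0x0000, 0x0000]


def collect_statistics(fat: list[int]) -> Counter:
    """Single sequential pass gathering counts without any directory access."""
    stats = Counter()
    for k in range(2, len(fat)):              # 505: read the next entry
        value = fat[k]
        if 0x0002 <= value <= 0xFFEF:         # 525: is the value an index?
            stats["used"] += 1
            if value != k + 1:
                stats["fragmentations"] += 1  # chain jumps away from Cluster k
        elif value >= EOF_MIN:                # 555: is the value an EOF marker?
            stats["used"] += 1
            stats["files_and_folders"] += 1
        elif value == 0x0000:
            stats["free"] += 1
        else:
            stats["other"] += 1               # bad-cluster or reserved values
    return stats                              # 550: no entries remain


stats = collect_statistics(EXAMPLE_FAT)
print(stats["used"], stats["fragmentations"], stats["files_and_folders"])  # → 14 2 4
```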
In conclusion, the present invention provides, among other things, a method and system for rapid data-fragmentation analysis of a FAT file system. Those skilled in the art can readily recognize that numerous variations and substitutions may be made in the invention, its use and its configuration to achieve substantially the same results as achieved by the embodiments described herein. Accordingly, there is no intention to limit the invention to the disclosed exemplary forms. Many variations, modifications and alternative constructions fall within the scope and spirit of the disclosed invention as expressed in the claims. For example, it is not required that all of the various statistics mentioned above be collected in every embodiment of the invention. In some embodiments, some subset of those statistics or even other statistics not mentioned herein that, nevertheless, can be derived from examining a FAT 200 may be collected in analyzing a storage medium 135 for fragmentation.