User demand for mass storage capacity continues to grow, especially for storing large audio, video, image, and multimedia files. This capacity demand has affected the design and development of hard disks and removable media such as CDs (compact discs) and DVDs (digital versatile discs). Storage technologies are further evolving to meet user demands for increasingly greater capacity and more flexible capabilities. Examples of such technologies include compact and portable mass storage devices. Mass storage devices are a class of devices used for storing data in a volume which can be shared with other devices and resources using a data transfer protocol running, for example, on a high speed external bus such as Universal Serial Bus (“USB”) or IEEE-1394 (Institute of Electrical and Electronics Engineers).
While some mass storage devices use solid state memory as a storage medium, larger capacity portable mass storage devices typically use a small-sized hard disk drive that may often be powered through the USB or IEEE-1394 data cable itself rather than use a separate power cord. These disk-based mass storage devices can thus enable plug-and-play convenience for users with a compact form factor while providing very large amounts of storage for multimedia including, for example, pictures and music libraries.
Mass storage devices typically store data in the form of files which are organized using a file system. The FAT (file allocation table) file system is one commonly used file system for disk-based mass storage devices. The FAT file system has its origins in the late 1970s and early 1980s and was the file system supported by the Microsoft MS-DOS operating system. It was originally developed as a simple file system suitable for floppy disk drives less than 500K (kilobytes) in size. Over time it has been enhanced to support larger and larger media. Currently, there are three FAT file system types: FAT12, FAT16, and FAT32. The basic difference in these FAT sub types, and the reason for the names, is the size, in bits, of the entries in the actual FAT structure on the disk. There are 12 bits in a FAT12 FAT entry, 16 bits in a FAT16 FAT entry, and 32 bits in a FAT32 FAT entry.
The FAT file system is characterized by the file allocation table (the “FAT”), a table that resides in a reserved portion of the volume. To protect the volume, two copies of the FAT are kept in case one becomes damaged. The FAT tables and the root directory are also stored in a fixed location so that the system's boot files can be correctly located.
While the FAT file system performs well in many applications, it has some inherent limitations. In particular, there is no organization to the FAT directory structure, and files and directories are written to the first open location on a disk. As a result, the clusters used for the files and directories can be randomly distributed on the disk in locations that are not logically close to one another. Accessing the data to enumerate a file index for the volume's contents can be undesirably time consuming because the hard disk drive read/write head must constantly move back and forth, to and from the different tracks on the disk, as it reads the relevant clusters.
This Background is provided to introduce a brief context for the Summary and Detailed Description that follow. This Background is not intended to be an aid in determining the scope of the claimed subject matter nor be viewed as limiting the claimed subject matter to implementations that solve any or all of the disadvantages or problems presented above.
An arrangement for enumerating data, such as media content including music, that is stored on external hard disk drive-based mass storage devices is provided by a media content processing system that implements a direct mass storage device file indexing process. This file indexing process is configured for finding all files and directories on the mass storage device, and reading through those parts of the files which contain metadata (such as album name, artist name, genre, track title, track number, etc.) about the file.
Use of the media content processing system reduces file enumeration time by minimizing the amount of physical movement of the read/write head in the mass storage device's hard disk drive as it reads data from the disk. This motion minimization is accomplished by reading the clusters of directory and file data in a sequential manner from the hard disk, rather than by randomly performing such read operations. The media content processing system keeps track of the location of clusters it must process in a work list (i.e., a request queue). Items in the request queue are processed by selecting the next closest cluster to the current physical location of the hard drive read/write head. If additional clusters are required to process an item, those clusters are added to the request queue and processed later, for example, in a subsequent iteration of the direct mass storage indexing process.
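The effect of this ordering can be illustrated with a minimal Python sketch. It is an assumption-laden toy and not the claimed implementation: cluster numbers stand in for the physical position of the read/write head, the request list is fixed up front rather than growing as new clusters are discovered, and the names `total_travel` and `nearest_first` are hypothetical.

```python
def total_travel(order, start=0):
    """Sum the head movement (in cluster units) needed to visit `order`."""
    position, travel = start, 0
    for cluster in order:
        travel += abs(cluster - position)
        position = cluster
    return travel


def nearest_first(clusters, start=0):
    """Reorder `clusters` by repeatedly picking the one closest to the head."""
    pending = list(clusters)
    position, order = start, []
    while pending:
        nearest = min(pending, key=lambda c: abs(c - position))
        pending.remove(nearest)
        order.append(nearest)
        position = nearest
    return order


# Hypothetical request list: clusters scattered across the disk.
requests = [900, 20, 850, 40, 870, 60]
print(total_travel(requests))                 # arrival order
print(total_travel(nearest_first(requests)))  # nearest-cluster order
```

For this made-up request list, visiting the clusters in arrival order costs 5060 cluster units of head travel, while the nearest-cluster order costs 900.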
This Summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This Summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used as an aid in determining the scope of the claimed subject matter.
Like reference numerals indicate like elements in the drawings.
Clusters 115 comprise a set of sectors ranging in number from 2 to 128. The cluster size increases with the size of the hard disk 100 because FAT is limited in the number of clusters that it can track. Thus, larger volumes are supported in FAT by increasing the number of sectors per cluster. A cluster is the minimum space used by any read or write operation to the hard disk 100. Although clusters 115 are shown as being contiguous in
Various portions of the hard disk 100 are allocated for the FAT file system boot sector, one or more FAT tables, the root directory for the volume, and a data region for files and directories. When a file is created, an entry is created in the FAT table and the first cluster number containing data is established. This entry in the FAT table either indicates that this is the last cluster of the file, or points to the next cluster. If the size of a file or directory is larger than the cluster size, then multiple clusters are allocated.
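As a rough illustration of the chaining just described, the following Python sketch follows a FAT32 cluster chain from the first cluster of a file. The `fat` mapping, the function name `cluster_chain`, and the toy table are hypothetical stand-ins for FAT table data read from the disk. The 28-bit mask and the end-of-chain threshold follow the published FAT32 layout; handling of free and bad-cluster entries is omitted for brevity.

```python
FAT32_ENTRY_MASK = 0x0FFFFFFF   # the upper 4 bits of a FAT32 entry are reserved
FAT32_END_OF_CHAIN = 0x0FFFFFF8  # values at or above this mark the last cluster


def cluster_chain(fat, first_cluster):
    """Return the list of clusters making up one file or directory."""
    chain, seen = [], set()
    cluster = first_cluster
    while True:
        if cluster in seen:
            # A damaged table could loop forever; stop instead.
            raise ValueError("loop detected in FAT chain")
        seen.add(cluster)
        chain.append(cluster)
        entry = fat[cluster] & FAT32_ENTRY_MASK
        if entry >= FAT32_END_OF_CHAIN:
            return chain         # this entry marks the last cluster
        cluster = entry          # otherwise the entry points to the next cluster


# Toy table: a file that starts at cluster 2 and is scattered over 9 and 5.
toy_fat = {2: 9, 9: 5, 5: 0x0FFFFFFF}
print(cluster_chain(toy_fat, 2))
```

The toy table deliberately places the clusters out of order, which is the situation that forces the head to move back and forth when a file is read one link at a time.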
Because files and directories are written on the hard disk 100 to the first available clusters, the clusters storing such files and directories are accessed in a random manner as shown in
To locate the next piece of File1.mp3, the read/write head moves to consult the FAT table on the hard disk 100, and then moves to the identified cluster to access 210-4 as shown. The process of consulting the directory entries and/or the FAT table and then moving to the identified cluster repeats, in order to access the remaining directories, subdirectories, and files, until all the contents on the hard drive are enumerated. Because the read/write head of the hard disk drive must continually move across the platters of the drive to get to the location of the FAT table, and to the clusters which store the files and directories, considerable latency may occur during enumeration of the volume's contents when using current FAT file system methodologies.
MSD 310, in this example, is a conventional hard disk-based device that is configured to be compact and portable and is further arranged as a volume under the FAT32 file system. MSD 310 is coupled to the sound/entertainment system 316 in the vehicle 321 using a USB cable 325 that carries signals in compliance with USB 2.0, although in alternative implementations other data transfer busses and protocols may also be utilized, including those, for example which use wireless or optical infrastructure.
A media content processing system 332 is also operative in the environment 300. In this example, media content processing system 332 is a discrete system in the vehicle 321 and is typically located behind the dashboard or console area, although other locations may also be utilized as dictated by the circumstances of a particular implementation. The media content processing system 332 is configured to be operatively connectable to the sound/entertainment system 316 over an interface (not shown), or it may be optionally integrated with the functionality provided by the sound/entertainment system 316 in a common package or form factor in some applications. Media content processing system 332 is shown in detail in
As shown in
The media core 411 is arranged to parse file and/or directory data received from a process operating in the file index processing layer 415 to thereby perform the file enumeration through call back and return messages, as respectively indicated by reference numerals 418 and 422. Media core 411 may be optionally arranged to provide additional features and functionalities including, for example, media content decoding, rendering, and playback control in some implementations.
The file index processing layer 415 includes a direct MSD file indexing process 430 which interacts with the media core 411, as shown, and which also interacts with a FAT table cache 432 and a request queue 435. The direct MSD file indexing process 430 is further configured to read data from the MSD 310 that is sent using the USB protocol, in this illustrative example, as indicated by reference numeral 437.
The FAT table cache 432 is used to cache FAT table data whenever it is read from the hard disk 100 (
The FAT table cache 432 and request queue 435 are implemented in system memory 439 (e.g., volatile random access memory or “RAM”). The interaction between the FAT table cache 432 and direct MSD file indexing process 430 includes caching FAT table data, as indicated by reference numeral 440, and reading FAT table data from the cache, as indicated by reference numeral 442. The interaction between the request queue 435 and direct MSD file indexing process 430 includes saving request items in the queue, as indicated by reference numeral 445, and reading request items from the queue, as indicated by reference numeral 448. The operation of the direct MSD file indexing process 430 is shown in the flowchart in
At block 516, the direct MSD file indexing process 430 notifies the caller (i.e., the media core 411) of the new data ascertained from the method step at block 512. Control passes to decision block 520 where the caller decides whether it is interested in the new data. For example, the file extension may be of a particular type that is utilized in the illustrative environment 300 such as an MP3, WMA (Windows® Media Audio), or WAV (WAVeform audio format) file. In this case then, data associated with non-audio formats or file extensions would not be of interest.
Another case in which the caller may not be interested in the data is where enough parts of the file have already been located to identify the particular metadata of interest that will be used to enumerate the stored content and create a file index. Typically, and in this illustrative example, the metadata of interest relates to music and includes album name, artist name, genre, track (e.g., song) title, track number, etc. Thus, if all the metadata is already located, then the caller will not need to continue with an item even when it is a logical part of a file that was previously identified as being of interest. While such logical parts of the file would be needed to play back the content, they are not needed for enumeration purposes and could thus be skipped.
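The two tests applied by the caller at decision block 520 can be sketched in Python as follows. This is only an illustrative assumption of how such a check might look: the function name `caller_is_interested`, the tag names, and the representation of located metadata as a set of strings are hypothetical and are not taken from the described system.

```python
import os

# Extensions treated as audio in this example environment.
AUDIO_EXTENSIONS = {".mp3", ".wma", ".wav"}
# Hypothetical labels for the metadata fields needed to build the file index.
WANTED_TAGS = {"album", "artist", "genre", "title", "track"}


def caller_is_interested(filename, tags_found):
    """Return True if more data for `filename` is still worth reading."""
    extension = os.path.splitext(filename)[1].lower()
    if extension not in AUDIO_EXTENSIONS:
        return False  # non-audio formats are not of interest
    # Once every wanted tag is located, the rest of the file can be skipped.
    return not WANTED_TAGS.issubset(tags_found)


print(caller_is_interested("File1.mp3", {"album", "artist"}))
print(caller_is_interested("notes.txt", set()))
print(caller_is_interested("File1.mp3", WANTED_TAGS))
```

In this sketch the first call reports interest because some tags are still missing, while the second and third calls do not, the second because the extension is not an audio type and the third because all of the wanted metadata has already been located.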
If the data is of interest to the caller, then control passes to decision block 523 where the direct MSD file indexing process 430 determines if the entire directory or file has been read. If it has not, then an item is either saved or updated in the request queue 435, as indicated at block 526. If the data is not of interest to the caller, then control passes to block 530, and an item is either not added to, or is removed from, the request queue 435.
Control passes from either block 526 or block 530 to decision block 534 where the direct MSD file indexing process 430 determines if there are any items in the request queue 435. If so, then control passes to decision block 538 where the direct MSD file indexing process 430 determines if the number of items in the request queue 435 is less than a low water mark (i.e., a lower limit). If so, then at decision block 542, if there are any directory items in the request queue 435, control returns to block 512 where the next sub-directory or file associated with that directory item in the request queue 435 is read. The low water mark is used to designate a set minimum number of items in the request queue 435 above which it is efficient to process the queued items.
If there are no directory items in the request queue 435, then control passes to block 545 where the next data cluster that is associated with that file item in the request queue 435 is read.
If the number of items is not below the low water mark, then control passes to block 547. If the number of items in the request queue 435 is greater than a high water mark (i.e., an upper limit), then control passes to block 550. If there are no file items in the request queue 435, then control returns to block 512 where the next sub-directory or file associated with that directory item in the request queue 435 is read.
If there are file items in the request queue 435, then control passes to block 545 where the next data cluster that is associated with that file item in the request queue 435 is read. If the number of items in the request queue 435 is less than the high water mark, then control passes to block 552 where the file item in the request queue 435 that owns the next closest cluster is found. At decision block 554, if the item in the request queue 435 is a file, then control passes to block 545 where the next data cluster that is associated with that file item in the request queue 435 is read. If the next item is not a file (i.e., it is a directory), then control returns to block 512 where the next sub-directory or file associated with that directory item in the request queue 435 is read. The high water mark may be configured to different values depending on the requirements of a particular implementation and will typically be sized in light of available resources such as system memory.
The above-described method is successively iterated until the determination at block 534 finds no items remaining in the request queue 435, at which point the method ends at block 560.
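The selection logic of blocks 534 through 554 can be summarized in a short Python sketch. It is a simplified reading of the flowchart under stated assumptions, not the actual implementation: the `Item` class and the `choose_next` function are hypothetical names, cluster numbers stand in for physical head position, and, where the description does not say which directory or file item is picked, the sketch assumes the one closest to the head.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    kind: str     # "dir" or "file"
    cluster: int  # next cluster this item needs to have read


def choose_next(queue, head, low_mark, high_mark):
    """Return the queued item to process next, or None if the queue is empty."""
    if not queue:                      # block 534: nothing left, method ends
        return None

    def nearest(items):
        return min(items, key=lambda item: abs(item.cluster - head))

    dirs = [item for item in queue if item.kind == "dir"]
    files = [item for item in queue if item.kind == "file"]
    if len(queue) < low_mark:          # blocks 538/542: refill via directories
        return nearest(dirs) if dirs else nearest(files)
    if len(queue) > high_mark:         # blocks 547/550: drain via files
        return nearest(files) if files else nearest(dirs)
    return nearest(queue)              # blocks 552/554: next closest cluster


queue = [Item("file", 10), Item("dir", 500)]
print(choose_next(queue, head=12, low_mark=3, high_mark=5))
print(choose_next(queue, head=12, low_mark=1, high_mark=5))
```

With two queued items and a low water mark of 3, the sketch picks the directory item so that the queue can grow; with a low water mark of 1, the queue is within both limits and the file item at the closer cluster is picked instead.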
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.
This application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/966,032 filed Aug. 24, 2007, entitled “Direct Mass Storage Device File Indexing” which is incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
60966032 | Aug 2007 | US