Read caches load blocks of data from slower storage devices, like disk drives, into higher speed storage devices, like random access memory (RAM). Subsequent reads go directly to the higher speed devices, thereby avoiding the latency of the slower storage devices. Read cache algorithms can opportunistically load blocks of data that may be accessed in the future. By loading blocks of data from the slower storage device before the data is needed, the initial latency due to the slower storage device may be avoided. There may be multiple read caches in a system. One typical location for a read cache is on the storage controller for the slower storage devices.
One method read caches use to select blocks of data that may be used in the future is to load the next logical blocks of data stored after the block of data that was just requested. This technique is typically called a read-ahead algorithm. Unfortunately, the data used and stored by most computer programs is organized by files, and the data from a file may not be stored in contiguous storage blocks on the slower storage devices.
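The read-ahead behavior described above can be illustrated with a minimal sketch, written here in Python; the class name, cache capacity, and backing-store interface are assumptions made for illustration and not part of any particular storage controller.

```python
from collections import OrderedDict

class ReadAheadCache:
    """Minimal read cache that prefetches the next logical block(s) (read-ahead)."""

    def __init__(self, backing_store, capacity=1024, read_ahead=1):
        self.backing_store = backing_store   # slower device: maps block number -> data
        self.capacity = capacity             # maximum number of cached blocks
        self.read_ahead = read_ahead         # how many following blocks to prefetch
        self.cache = OrderedDict()           # block number -> data, kept in LRU order

    def _insert(self, block, data):
        # Evict the least recently used block when the cache is full.
        if len(self.cache) >= self.capacity:
            self.cache.popitem(last=False)
        self.cache[block] = data

    def read_block(self, block):
        if block in self.cache:
            # Cache hit: serve from fast memory and refresh the LRU position.
            self.cache.move_to_end(block)
            return self.cache[block]
        # Cache miss: read from the slower storage device.
        data = self.backing_store[block]
        self._insert(block, data)
        # Read-ahead: opportunistically load the next logical block(s).
        for offset in range(1, self.read_ahead + 1):
            next_block = block + offset
            if next_block in self.backing_store and next_block not in self.cache:
                self._insert(next_block, self.backing_store[next_block])
        return data
```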
When a computer program needs the data stored in file A, the program “opens” file A by sending a request to the operating system. The operating system typically has a logical map between the name of the file (file A) and a location or address in memory space where the data is stored, as well as the amount of data or length of the file. The operating system typically does the mapping between the file name and the locations in memory space where the data is stored. The operating system will send a request for data (a data read) to the storage controller using the mapped data location. The read request will typically have a starting location or address in memory space and a length. The operating system may break the request for data from the file into a number of different read requests. In this example, the operating system may break the request into three different read requests. The first read request may correspond to the data stored in the first part of file A.
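How the operating system might break the request for file A into separate block-level read requests is sketched below; the extent map, the ReadRequest format, and the third block number are illustrative assumptions (only the first two block numbers appear in the example that follows).

```python
from typing import List, NamedTuple, Tuple

class ReadRequest(NamedTuple):
    start_block: int   # starting location or address (logical block) in memory space
    length: int        # amount of data to read, in blocks

def build_read_requests(extents: List[Tuple[int, int]]) -> List[ReadRequest]:
    """Turn a file's extent map of (start_block, block_count) pairs into read requests.

    Each non-contiguous extent of the file becomes its own read request, so a
    single file open may produce several reads sent to the storage controller.
    """
    return [ReadRequest(start, count) for start, count in extents]

# Hypothetical extent map for file A: three parts stored in non-contiguous blocks.
file_a_extents = [(1, 1), (4, 1), (7, 1)]
requests = build_read_requests(file_a_extents)
# requests[0] corresponds to the data stored in the first part of file A (block 1).
```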
When the operating system sends the first read request, corresponding to the data stored in the first part of file A, to the storage controller, the storage controller may check its cache to determine if the data has been loaded into the cache. When the data has not been loaded into the cache, the storage controller will access the slower memory device and read block 1. The storage controller will return the data from block 1 to the operating system. The storage controller may also load the data from block 1 into its cache. If the cache in the storage controller is using a read-ahead algorithm, the storage controller may also read the data from block 2 and load it into cache. In this example, reading the data from block 2 will not help reduce the latency due to the slower storage device. When the operating system requests the data from the next part of file A (the second part), the operating system will request the data from block 4. The data from block 2 that was loaded into the cache will not be used. Because the data from file A is not stored in contiguous logical storage blocks, the read-ahead cache algorithm may not be effective in reducing the latency of the slower storage devices. The storage controller typically does not have any file level information that maps which blocks are assigned to which files.
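Using the ReadAheadCache sketch above, this sequence can be reproduced to show why read-ahead does not help here; the backing-store contents and the third block number of file A are assumptions for illustration.

```python
# File A's data is spread across the non-contiguous blocks 1, 4, and 7 (7 is assumed).
backing_store = {block: f"data-{block}" for block in range(0, 16)}
cache = ReadAheadCache(backing_store, capacity=8, read_ahead=1)

cache.read_block(1)   # miss: reads block 1 from the slow device, prefetches block 2
cache.read_block(4)   # miss: block 2 was prefetched but is never requested
cache.read_block(7)   # miss again; read-ahead did not reduce the latency for file A
```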
In operation, the processors 206 on the processor blades 204 may be executing code. The code may be a computer program. A computer program may be one or more operating systems, application programs, firmware routines, basic input/output system (BIOS) routines, or the like. Controller 210 may be running code that monitors the operation of computer system 200, including the operations of the one or more operating systems and/or the operation of one or more of the auxiliary blades 218. In one example embodiment of the invention, one or more of the auxiliary blades 218 may be a storage blade.
Storage blade 218 comprises storage controller 320 and a plurality of storage devices 326. Storage devices 326 may be hard disk drives, solid state drives, or the like. Storage controller 320 is coupled to, and controls, the plurality of storage devices 326. Storage controller 320 stores data onto storage devices 326 in logical storage blocks. Storage blade 218 is coupled to bus 216 in computer system 200. Storage controller 320 comprises one or more processors 322 and cache 324. Storage controller 320 uses cache 324 to reduce the latency due to storage devices 326. In other example embodiments of the invention, the storage system may be external to computer system 200, for example a network attached storage system.
In one example embodiment of the invention, storage controller 320 loads the data from a plurality of logical storage blocks on the storage devices into cache 324. The logical storage blocks loaded into cache 324 correspond to one or more files stored onto the storage devices 326. The logical storage blocks loaded into the cache may not be contiguous storage blocks because the data in the files may not be stored in contiguous storage blocks. Once the data is loaded into the cache, the data is marked as persistent. Data marked as persistent remains “pinned” in the cache until the persistent property of the data is cleared. When a computer program, for example an application program or an operating system, requests the data saved in the file, the storage controller 320 can read the data directly from the read cache 324, instead of from the storage devices, and immediately return the data to the requesting program.
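One way the pinning described above could be realized is sketched below; the class and method names (PinnedReadCache, load_and_pin, clear_persistent, flush) and the eviction policy are illustrative assumptions rather than the actual interface of storage controller 320.

```python
class PinnedReadCache:
    """Read cache whose preloaded blocks can be marked persistent ("pinned")."""

    def __init__(self, backing_store, capacity=1024):
        self.backing_store = backing_store   # slower device: block number -> data
        self.capacity = capacity
        self.cache = {}                      # block number -> data held in fast memory
        self.pinned = set()                  # block numbers marked persistent

    def load_and_pin(self, blocks):
        """Preload the given (possibly non-contiguous) logical blocks and mark them persistent."""
        for block in blocks:
            self.cache[block] = self.backing_store[block]
            self.pinned.add(block)

    def clear_persistent(self, blocks):
        """Clear the persistent property so the normal cache algorithm may flush the data."""
        self.pinned.difference_update(blocks)

    def flush(self, blocks):
        """Immediately remove the given blocks from the cache."""
        for block in blocks:
            self.pinned.discard(block)
            self.cache.pop(block, None)

    def _evict_one(self):
        # Evict any block that is not pinned; pinned blocks are never evicted.
        for block in list(self.cache):
            if block not in self.pinned:
                del self.cache[block]
                return

    def read_block(self, block):
        if block in self.cache:
            return self.cache[block]          # served from cache; no access to the slower device
        data = self.backing_store[block]
        if len(self.cache) >= self.capacity:
            self._evict_one()
        self.cache[block] = data
        return data
```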
The file(s) may be associated with a particular application program or an instantiation of an operating system. In one example embodiment of the invention, the files used by or associated with a computer program are identified. Once the files are identified, the logical storage blocks corresponding to the files are identified. When the program is launched or instantiated, the data contained in the logical storage blocks is loaded into a cache. When the program opens its associated files, the data from the files will already be loaded into the cache, thereby reducing the access time for the data.
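A sketch of this launch-time preloading is shown below; the program name, file paths, block numbers, and the cache interface (a load_and_pin method like the PinnedReadCache sketch above) are all assumptions for illustration.

```python
# Illustrative mapping from a program to its associated files, and from each
# file to the logical storage blocks that hold its data (both are assumed here;
# ways of producing them are described in the following paragraphs).
program_files = {
    "payroll_app": ["/data/payroll.db", "/data/payroll.cfg"],
}
file_blocks = {
    "/data/payroll.db": [1, 4, 7],
    "/data/payroll.cfg": [12],
}

def blocks_for_program(program):
    """Collect every logical storage block belonging to the program's associated files."""
    blocks = []
    for path in program_files.get(program, []):
        blocks.extend(file_blocks.get(path, []))
    return blocks

def on_program_launch(program, cache):
    """Launch trigger: preload and pin the program's blocks before it opens its files."""
    cache.load_and_pin(blocks_for_program(program))
```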
In some example embodiments of the invention, the files used by or associated with a program may depend on who launched the program. For example, when the program is a web-based program used by a plurality of users, each instantiation of the program may be associated with a different set of files. For example, a first user may have a first set of images, songs, or data stored in files accessed by the program, and a second user may have a second set of images, songs, or data accessed by the same program when launched by the second user. In one example embodiment of the invention, the storage blocks loaded into the cache may correspond to the files associated with an instantiation of the program that corresponds to a specific user ID. In other example embodiments of the invention, the files associated with the program may not be dependent on the user of the program.
The files associated with a program or an operating system can be determined in a number of different ways. In one example embodiment of the invention, an administrator may make a list of files used by a program. In another example embodiment of the invention, the operating system may keep track of the files used by different programs. The operating system may keep track of the files used by a program for each different user that launches the program. In yet another example embodiment of the invention, a tracking program, running on controller 210, may track which files are associated with each program or operating system running on one or more of the processor blades 204. The tracking program may be started before the program to be tracked is launched, and stopped after the program to be tracked has ended. Once the list of files associated with a program has been determined, the logical storage blocks associated with the files are determined.
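The tracking program could, for example, record every file a program opens between a start call and a stop call. The sketch below intercepts Python's built-in open() purely as a stand-in for whatever operating-system facility a real tracking program running on controller 210 would use.

```python
import builtins

class FileUsageTracker:
    """Records which files are opened between start() and stop()."""

    def __init__(self):
        self.files = set()
        self._original_open = None

    def start(self):
        """Begin tracking before the program to be tracked is launched."""
        self._original_open = builtins.open

        def tracking_open(path, *args, **kwargs):
            self.files.add(str(path))             # remember every file the program touches
            return self._original_open(path, *args, **kwargs)

        builtins.open = tracking_open

    def stop(self):
        """Stop tracking after the program has ended and return the list of files."""
        builtins.open = self._original_open
        return sorted(self.files)
```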
Determining the logical storage blocks associated with a file can be done in a number of ways. In one example embodiment of the invention, a file walking program can be used. The file walking program takes the list of files and “touches” or accesses each file. The file walking program tracks which logical blocks in the storage devices are accessed as it reads or touches each of the files. In this way the file walking program generates a list of logical storage blocks that corresponds to each of the files associated with a program. In another example embodiment of the invention, the tracking program, running on controller 210, may directly track which logical storage blocks are associated with each program, instead of tracking which files are associated with each program. In another example embodiment of the invention, the storage controller may track the logical storage blocks used after receiving a “start tracking” command. The storage controller would stop tracking which logical blocks were loaded after receiving a “stop tracking” command. The list of logical storage blocks associated with a file or a program can be stored for later use.
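A file walking program of this kind might look like the sketch below; get_file_extents is a hypothetical helper standing in for a filesystem-specific facility (such as an extent-mapping query, or the storage controller's “start tracking”/“stop tracking” commands) and is not a real API.

```python
from typing import Dict, List, Tuple

def get_file_extents(path: str) -> List[Tuple[int, int]]:
    """Hypothetical helper: return (start_block, block_count) extents for a file.

    On a real system this would use a filesystem-specific facility, or the
    storage controller's tracking commands while the file is being touched.
    """
    raise NotImplementedError("platform-specific")

def walk_files(paths: List[str]) -> Dict[str, List[int]]:
    """Touch each file and record the logical storage blocks that back it."""
    mapping = {}
    for path in paths:
        with open(path, "rb") as f:
            f.read()                               # "touch" the file so its blocks are accessed
        blocks = []
        for start, count in get_file_extents(path):
            blocks.extend(range(start, start + count))
        mapping[path] = blocks                     # list of logical blocks for this file
    return mapping
```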
Once the list of logical blocks associated with a file or a program has been determined, the data stored in the list of logical blocks can be read from the logical storage blocks and moved into cache. The data can be moved into cache using a number of different triggers. In one example embodiment of the invention, the trigger to move the data into cache can be a user or administrator command. Once the data has been loaded into cache, the data may be marked as persistent. A second user or administrator command may be used to clear the persistent property of the data so that the data may be flushed from cache using the normal cache algorithm. There may be a third command that immediately flushes the data from cache. In some embodiments the second and third commands may be combined.
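The three commands could be exposed as something like the dispatcher below, assuming a cache object with the load_and_pin, clear_persistent, and flush methods from the PinnedReadCache sketch above; the command names themselves are assumptions.

```python
def handle_command(cache, command, blocks):
    """Illustrative user/administrator commands driving the cache."""
    if command == "load":             # first command: preload the blocks and mark them persistent
        cache.load_and_pin(blocks)
    elif command == "unpin":          # second command: data may now be flushed by the normal cache algorithm
        cache.clear_persistent(blocks)
    elif command == "flush":          # third command: immediately flush the data from cache
        cache.flush(blocks)
    else:
        raise ValueError("unknown command: " + command)
```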
In another example embodiment of the invention, the trigger to load the data into cache may be when the program is launched or instantiated. When the logical storage blocks are associated with a program launched by a specific user, only the data from the logical storage blocks that are associated with that user is loaded into cache when the user launches the program. The trigger to flush the cache may be when the program is closed or ends.
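Launch and close triggers for a user-specific instantiation might be wired up as in the short sketch below; the program name, user IDs, block lists, and cache interface are assumptions carried over from the earlier sketches.

```python
# Illustrative per-user block lists for one program (user IDs and blocks are assumed).
blocks_by_user = {
    ("photo_app", "alice"): [20, 21, 35],
    ("photo_app", "bob"): [40, 41, 52],
}

def on_launch(cache, program, user_id):
    """Launch trigger: load only the blocks associated with this user's instantiation."""
    cache.load_and_pin(blocks_by_user.get((program, user_id), []))

def on_close(cache, program, user_id):
    """Close trigger: flush the user's blocks when the program ends."""
    cache.flush(blocks_by_user.get((program, user_id), []))
```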
| Filing Document | Filing Date | Country | Kind | 371c Date |
| --- | --- | --- | --- | --- |
| PCT/US10/43472 | 7/28/2010 | WO | 00 | 12/7/2012 |