The present invention relates, in general, to managing an information object stored in a storage device or medium.
In an environment comprising a plurality of storage devices connected through a network, information objects may be migrated among storage devices to improve information access efficiency. The information objects, such as files, are migrated to a storage device with a lower operating rate or to a storage device with a higher performance rating. Current algorithms use least-recently-used (LRU) storage hierarchy mechanisms based upon the frequency of access to the information object stored in the storage device, the performance of each storage device, the cost, and similar factors.
According to one embodiment of the present invention, a computer implemented method, system, and program product are provided for storing and operating on an information object. An indicator associated with the information object is read. The indicator indicates that historical information is stored for the information object. Responsive to determining from the historical information that the information object has been historically accessed, (a) a future access time is determined based on the historical information; (b) a trigger for placing the information object in a storage device at a predetermined time before the future access of the information object is scheduled, the trigger being associated with a scheduled time; and (c) responsive to the scheduled time elapsing, the trigger is executed. When the trigger is executed, the information object is placed in said storage device.
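Steps (a) to (c) can be illustrated with a minimal Python sketch. It assumes a hypothetical indicator bit `HISTORY_BIT`, predicts the next access as the last access plus the mean interval between past accesses (a deliberately simple stand-in for the analysis), and keeps scheduled triggers in a heap ordered by scheduled time. The names `InfoObject`, `schedule_trigger`, and `run_due_triggers` are illustrative only, not part of the embodiments.

```python
import heapq
from dataclasses import dataclass
from datetime import datetime, timedelta

HISTORY_BIT = 0x1  # hypothetical indicator bit: "historical information is kept"


@dataclass
class InfoObject:
    name: str
    flags: int
    access_times: list       # historical access timestamps, oldest first
    location: str = "tier2"  # hypothetical name of the current storage device


def predict_next_access(access_times):
    """Step (a): last access plus the mean gap between past accesses (simplistic)."""
    if len(access_times) < 2:
        return None  # not enough history to infer a cycle
    gaps = [b - a for a, b in zip(access_times, access_times[1:])]
    return access_times[-1] + sum(gaps, timedelta()) / len(gaps)


def schedule_trigger(obj, lead, triggers):
    """Step (b): if the indicator is set, schedule a trigger `lead` before the access."""
    if not obj.flags & HISTORY_BIT:
        return None  # no historical information is stored for this object
    future = predict_next_access(obj.access_times)
    if future is None:
        return None
    scheduled = future - lead
    heapq.heappush(triggers, (scheduled, obj.name))
    return scheduled


def run_due_triggers(now, triggers, objects, target):
    """Step (c): execute every trigger whose scheduled time has elapsed."""
    while triggers and triggers[0][0] <= now:
        _, name = heapq.heappop(triggers)
        objects[name].location = target  # place the object in the target device


# Usage: accesses every 10 days; place the object 6 hours before the next one.
obj = InfoObject("report.dat", HISTORY_BIT,
                 [datetime(2024, 1, 1), datetime(2024, 1, 11), datetime(2024, 1, 21)])
triggers = []
print(schedule_trigger(obj, timedelta(hours=6), triggers))  # 2024-01-30 18:00:00
```

A heap is used so that the earliest trigger is always at index 0, which keeps the "has the scheduled time elapsed?" check cheap regardless of how many triggers are pending.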
According to another embodiment of the present invention, a database comprising the historical information is updated with updated historical information regarding the information object responsive to the information object being accessed.
According to another embodiment of the present invention, an optimal placement of the information object into the storage device is determined based on a predetermined policy.
According to another embodiment of the present invention, the predetermined policy comprises a policy selected from the group consisting of a placement policy, a storage management policy, and a storage hierarchy.
According to another embodiment of the present invention, the historical information comprises a historical access time.
According to another embodiment of the present invention, the indicator is a bit.
According to another embodiment of the present invention, the predetermined time is determined based upon the historical information.
According to another embodiment of the present invention, the historical information is updated when the trigger is executed.
In one embodiment of the present invention, information object or file management is extended to include historical access statistics to enable optimal placement of files. A file is a named piece of data that is referenced by workloads and can be relocated in the data management system. For convenience, the terms “file” and “information object” are used interchangeably. Historical access statistics include a sizable class of access data whose access patterns may be predictable in terms of time period (e.g., days, weeks, months, and years). An indicator in file metadata (e.g., an inode), together with historical access statistics, may be used to analyze, on a periodic or event basis, the access patterns and cycles of particular file data in a file data repository. A historical access log containing historical access statistics may be created to enable optimal file data placement over time. Upon completion of the analysis, the analysis output may drive scheduled tasks into the data management facility. These scheduled tasks may allow for the optimal placement of file data, so that the file data is placed prior to a predicted file access event in the future. These scheduled tasks may also allow for the movement of file data from an optimal placement for near-in-time access to an optimal placement for infrequent access.
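One kind of cycle analysis, detecting a monthly pattern, can be sketched as follows. This is a minimal illustration under stated assumptions: the `min_share` threshold (the fraction of past accesses that must fall on the same day of month) is hypothetical, and weekly, quarterly, or yearly analyses would follow the same shape with a different key.

```python
from collections import Counter
from datetime import date


def dominant_day_of_month(accesses, min_share=0.6):
    """Day of month on which at least `min_share` of past accesses fell, else None."""
    if not accesses:
        return None
    day, count = Counter(d.day for d in accesses).most_common(1)[0]
    return day if count / len(accesses) >= min_share else None


def next_predicted_access(accesses, today, min_share=0.6):
    """Next date after `today` that falls on the dominant day of month."""
    day = dominant_day_of_month(accesses, min_share)
    if day is None:
        return None  # no monthly cycle detected
    year, month = today.year, today.month
    for _ in range(13):  # 13 months is more than enough to find a valid date
        try:
            candidate = date(year, month, day)
        except ValueError:  # e.g. the 31st in a 30-day month: try the next month
            candidate = None
        if candidate is not None and candidate > today:
            return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return None
```

The predicted date would then feed the scheduling of a placement task ahead of that access.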
In one embodiment of the invention, the indicator in file metadata (e.g., an inode) may be a flag, for example a single bit. This flag may indicate that historical point-in-time information is kept for a particular file. This bit may be set at any time during the life cycle of the file. In another embodiment of the invention, a bit may be stored at the dnode level (a data structure, for example, in a Unix® or Linux® system) indicating that all elements within a directory are using a metadata system with the indicator. In another embodiment, a bit may be stored in an external database system to indicate that the same historical access recording mechanisms, as described previously, should be enabled. In another embodiment of the invention, a Linux file system may be used. The Linux file system, as discussed in “Anatomy of the Linux file system” by M. Tim Jones, Oct. 30, 2007, is incorporated herein by reference.
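The three places the indicator may live (a per-file inode bit, a directory-level bit, or an external database) can be checked in order, as in this minimal sketch. The bit position `HIST_FLAG`, the toy `Inode` class, and the set standing in for the external database are all hypothetical; real file systems define their own flag layouts, and this sketch only checks the immediate parent directory.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical bit position; real file systems define their own inode flag layouts.
HIST_FLAG = 1 << 0


@dataclass
class Inode:
    ino: int
    flags: int = 0
    parent: Optional["Inode"] = None  # containing directory, if any


def set_history_flag(inode):
    """The bit may be set at any point in the file's life cycle."""
    inode.flags |= HIST_FLAG


def history_enabled(inode, external_registry=frozenset()):
    """Check each place the indicator may be stored, most specific first."""
    if inode.flags & HIST_FLAG:               # per-file bit in the inode
        return True
    if inode.parent is not None and inode.parent.flags & HIST_FLAG:
        return True                           # directory-level bit covers entries
    return inode.ino in external_registry     # bit kept in an external database
```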
One embodiment of the present invention may optionally include the ability, through management services, to set policies that configure the size, location, and fields of each record in the historical access log. The present invention may also include the ability to configure pattern matching, making it easier to specify which files have historical access statistics kept.
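A policy of this kind might look like the following minimal sketch, assuming a hypothetical dictionary format: glob patterns select tracked files, a field list selects what each record keeps, and a record limit bounds the log size (record location is left out for brevity). `fnmatchcase` is used so matching is case-sensitive on every platform.

```python
from collections import deque
from fnmatch import fnmatchcase

# Hypothetical policy set through a management service.
POLICY = {
    "patterns": ["/finance/*.xlsx", "/reports/*/quarterly-*"],
    "fields": ["access_time", "access_type", "duration"],
    "max_records": 1000,
}


def is_tracked(path, policy=POLICY):
    """True if any configured glob matches. Note: '*' also matches '/' here."""
    return any(fnmatchcase(path, p) for p in policy["patterns"])


def new_log(policy=POLICY):
    """A bounded log: the oldest records are dropped once max_records is reached."""
    return deque(maxlen=policy["max_records"])


def make_record(event, policy=POLICY):
    """Keep only the fields the policy selects (missing ones become None)."""
    return {name: event.get(name) for name in policy["fields"]}
```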
A placement and migration engine 208 receives multiple inputs for its analysis of where to optimally place a file. These inputs may include the historical access log entries 205, placement queries 209, storage management policies 210, and a storage hierarchy with performance attributes 211. The historical access log entries 205 are made up of historical access information. The historical access log entries 205 may include individual log entries 206, each of which may include multiple log entry fields 207. These log entry fields 207 may include the access event time, duration of access, access type, security credential used, whether the item was previously placed, previous location, size delta from the operation, and size before the operation. Other examples of entry fields, which represent other metadata that may be updated, include read time, write time, average delay in read time from last read, average read time from last write, average write time from last write, and average write time from read. Entry fields may further include other relevant fields not listed above.
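A log entry 206 and its fields 207 map naturally onto a record type. The sketch below is an assumption about one possible layout, covering a subset of the listed fields, plus one derived statistic (the average interval between consecutive reads) to show how such fields feed the analysis.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class LogEntry:
    """One log entry 206; the attributes mirror a subset of the fields 207."""
    access_time: datetime
    duration: timedelta
    access_type: str                  # e.g. "read" or "write"
    credential: str                   # security credential used
    previously_placed: bool
    previous_location: Optional[str]
    size_before: int                  # bytes before the operation
    size_delta: int                   # bytes added (+) or removed (-)

    @property
    def size_after(self):
        return self.size_before + self.size_delta


def mean_interval_between_reads(entries):
    """Average time between consecutive reads, one possible derived statistic."""
    reads = sorted(e.access_time for e in entries if e.access_type == "read")
    if len(reads) < 2:
        return None
    # The mean of consecutive gaps telescopes to (last - first) / (n - 1).
    return (reads[-1] - reads[0]) / (len(reads) - 1)
```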
The placement queries 209 are predefined policies which determine where files are generally placed. For example, if there is a lull in the entire system, then the placement queries may define whether the system should start file migration during that lull. Also for example, files may be placed or migrated regardless of how busy the system is if there is a policy stating that the need for those files outweighs the performance impact in placing those files. The storage management policies 210 are policies based on predefined business rules. For example, certain business rules may dictate that certain files have to be stored in certain types of secured servers or storage devices. The storage hierarchy with performance attributes 211 is a database of characteristics for the various devices connected to the network. These characteristics may include the speed, size, workload, and other characteristics of the various storage devices.
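How the three policy inputs could combine is shown in this minimal sketch. The lull threshold, the `override` flag, and the ranking by spare throughput (speed times idle fraction) are all assumptions chosen for illustration; the storage management policy is reduced to a single "secured files only on secured devices" rule.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """One row of the storage hierarchy with performance attributes 211."""
    name: str
    speed_mbps: float     # sustained transfer speed
    free_bytes: int
    utilization: float    # current workload, 0.0 (idle) to 1.0 (saturated)
    secured: bool


def may_migrate_now(system_utilization, lull_threshold=0.3, override=False):
    """Placement query: migrate during a lull, or at any time if a policy says
    the need for the files outweighs the performance impact."""
    return override or system_utilization < lull_threshold


def choose_device(size, needs_secure, devices):
    """Apply the storage management policy, then rank by spare throughput."""
    allowed = [d for d in devices
               if d.free_bytes >= size and (d.secured or not needs_secure)]
    return max(allowed, key=lambda d: d.speed_mbps * (1 - d.utilization), default=None)
```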
The placement and migration engine takes the four inputs, 205, 209, 210, and 211, and produces a work order of placement tasks 212 that determines which file should be placed in which storage device. Methods for making this determination are well known in the art, for example as would normally be done in a hierarchical storage manager such as IBM Tivoli Storage Manager®. The placement tasks 212 are ordered in a queue 215. The queue has a placement queue head 214 and a placement queue tail 213. The queue 215 may be reordered by the placement and migration engine based on file access events or policy changes. The queue may also be reordered manually by a user, for example through an administrative interface.
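The queue 215 with its head 214, tail 213, and reordering can be sketched with a double-ended queue. The integer `priority` used as the default reordering key is a hypothetical ranking; an access event or policy change could supply any other key.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class PlacementTask:
    file: str
    target: str
    priority: int  # hypothetical ranking: lower values run sooner


class PlacementQueue:
    """Queue 215: tasks join at the tail 213 and are taken from the head 214."""

    def __init__(self):
        self._tasks = deque()

    def enqueue(self, task):
        self._tasks.append(task)

    def head(self):
        return self._tasks[0] if self._tasks else None

    def dequeue(self):
        return self._tasks.popleft()

    def reorder(self, key=lambda task: task.priority):
        """Re-sort after an access event, policy change, or administrator request.
        sorted() is stable, so equal-priority tasks keep their relative order."""
        self._tasks = deque(sorted(self._tasks, key=key))

    def __len__(self):
        return len(self._tasks)
```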
The placement and relocation service 216 then optimizes the queue through an analysis of the historical access log entry fields 207. The placement and relocation service determines when the files should be migrated to the optimal server so that they are in place before they are needed. This analysis runs at a configurable or static interval (time or event based) and operates on historical file access information. During analysis, it is determined when the file has historically been accessed (particular days of the month, quarter, or year, the interval between accesses, and the length of access). For example, the analysis may determine when to migrate the information to another storage device by taking into account the transfer rate (TR) of the various storage devices and the size (S) of the file data to be moved: (1/TR)*S gives how long the file would take to transfer. This calculation, together with the day, month, quarter, year, etc. derived from historical access, determines the time and date at which the file is to be speculatively moved.
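The (1/TR)*S calculation and the resulting start time can be worked through in a few lines. Two assumptions are labeled in the code: the slower of the source and target devices is taken to bound TR, and a hypothetical safety `margin` is subtracted so the move finishes before the predicted access rather than exactly at it.

```python
from datetime import datetime, timedelta


def effective_rate(source_rate, target_rate):
    """Assumption: the slower of the two devices bounds the transfer rate TR."""
    return min(source_rate, target_rate)


def transfer_time(size_bytes, rate_bytes_per_s):
    """(1/TR) * S: how long moving S bytes takes at transfer rate TR."""
    return timedelta(seconds=size_bytes / rate_bytes_per_s)


def speculative_move_time(predicted_access, size_bytes, rate_bytes_per_s,
                          margin=timedelta(minutes=30)):
    """Latest start that still finishes `margin` before the predicted access."""
    return predicted_access - transfer_time(size_bytes, rate_bytes_per_s) - margin


# Usage: a 36 GB file, 200 MB/s source, 100 MB/s target, needed at 09:00.
tr = effective_rate(200_000_000, 100_000_000)
print(speculative_move_time(datetime(2024, 3, 31, 9, 0), 36_000_000_000, tr))
# 2024-03-31 08:24:00
```

Here 36,000,000,000 bytes at 100,000,000 bytes/s takes 360 s (6 minutes), so with a 30-minute margin the move must start by 08:24.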
Based upon that analysis, database triggers, not shown in the
When the file migration is triggered, for example because the scheduled date has arrived, the placement tasks are acted upon, 217. Data blocks referenced in the inode are moved to the new storage location. Inode entries and historical access log entries are updated to take into account the new storage locations.
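Acting on a placement task can be sketched against a toy storage model in which each device is a dictionary of block address to data. The naive allocator and the dictionary layout are assumptions for illustration; the sketch is not crash-safe, and a real implementation would need to make the inode update atomic with respect to the block copy.

```python
from dataclasses import dataclass


@dataclass
class Inode:
    device: str
    blocks: list  # block addresses on `device`


def execute_placement(inode, target, devices, access_log):
    """Move the blocks an inode references to `target`, repoint the inode,
    free the old blocks, and record the new location in the access log.
    `devices` maps a device name to a {block address: data} dict (a toy model)."""
    if target == inode.device:
        return  # already in place
    src, dst = devices[inode.device], devices[target]
    old_device, old_blocks = inode.device, list(inode.blocks)
    new_blocks = []
    for addr in old_blocks:
        new_addr = max(dst, default=-1) + 1  # naive allocator: next unused address
        dst[new_addr] = src[addr]
        new_blocks.append(new_addr)
    inode.device, inode.blocks = target, new_blocks  # update the inode entries
    for addr in old_blocks:
        del src[addr]                                # release the old copies
    access_log.append({"event": "placed", "from": old_device, "to": target,
                       "blocks": len(new_blocks)})
```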
In one embodiment of the present invention, when I/O operations (i.e., reads and writes) are performed on a file, a thread or some other form of asynchronous worker process is started or spawned to update a data management system. The thread may update the historical access entries with access information, and may then either time out while waiting for additional accesses or exit, to be started again on a subsequent access. The use of threads allows the data management system to run potentially in parallel with the I/O operations. This data management system may be a component external to the normal file system, or part of the file system on which the file resides. The data management system, at configurable intervals or continuously, analyzes the historical I/O operation metadata for files and instructs the data management system to speculatively (e.g., at a point in time shortly before the actual file access) migrate or copy the file data in an optimal fashion for future access. The metadata management system may also choose to migrate associated data as configured by policy, or it may signal an external component such as a subscribed access control component (for example, a security manager for credentials or certificates), so that the subscribed access control component can use the metadata analysis results to optimally move or copy other data associated with this speculative access. For example, a security component may wish to cache credentials, or data used to validate credentials, on a particular server. If data or associated data is copied rather than moved, copy-on-write semantics should be used to preserve data integrity.
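The worker behavior described above (record accesses asynchronously, time out when idle, restart on the next access) can be sketched with the standard `threading` and `queue` modules. The idle timeout value and the in-memory `log` list standing in for the historical access entries are assumptions. The exit check is repeated under a lock so that an event queued just as the worker times out is never left without a worker.

```python
import queue
import threading
import time


class AccessRecorder:
    """Records I/O events off the I/O path. One worker drains the queue; it exits
    after `idle_timeout` seconds without events and is restarted on the next access."""

    def __init__(self, idle_timeout=0.5):
        self.events = queue.Queue()
        self.log = []  # stand-in for the historical access entries
        self.idle_timeout = idle_timeout
        self._lock = threading.Lock()
        self._worker = None

    def on_io(self, path, kind):
        """Called from the I/O path; does not wait for the log update itself."""
        with self._lock:
            self.events.put((path, kind, time.time()))
            if self._worker is None:
                self._worker = threading.Thread(target=self._run, daemon=True)
                self._worker.start()

    def _run(self):
        while True:
            try:
                event = self.events.get(timeout=self.idle_timeout)
            except queue.Empty:
                with self._lock:
                    # Re-check under the lock so an event queued during the
                    # timeout is not stranded without a worker.
                    if self.events.empty():
                        self._worker = None
                        return
                continue
            self.log.append(event)
            self.events.task_done()

    def wait_idle(self):
        """Block until every queued event has been recorded."""
        self.events.join()
```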
In one embodiment, the present invention includes a database in the data management system, for example DB2 with Hierarchical Storage Manager (HSM) or Tivoli Storage Manager (TSM) by IBM. This database may contain fields and records associated with the file for historical access items such as time of historical accesses, type of access, length of access, security access control information, etc. The HSM or TSM may retrieve the files needed, for example at month's end, and move them from one storage device to another before the projected date of need. This movement of data may be spread out over time during the scheduled day, filling in idle time in the HSM/TSM and storage device workload.
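A historical access table and a month-end query can be sketched with SQLite, which stands in for DB2 here only so the example runs anywhere; the schema, column names, and `min_months` threshold are hypothetical and do not reflect an HSM or TSM interface. A timestamp falls on a month end when the following day is the first day of the next month.

```python
import sqlite3


def create_schema(conn):
    conn.execute("""
        CREATE TABLE IF NOT EXISTS access_history (
            file        TEXT NOT NULL,
            accessed_at TEXT NOT NULL,  -- ISO-8601 timestamp
            access_type TEXT,
            duration_s  REAL,
            credential  TEXT)""")


def record(conn, file, accessed_at, access_type, duration_s, credential):
    conn.execute("INSERT INTO access_history VALUES (?, ?, ?, ?, ?)",
                 (file, accessed_at, access_type, duration_s, credential))


def month_end_files(conn, min_months=2):
    """Files accessed on the last day of a month in at least `min_months` months."""
    rows = conn.execute("""
        SELECT file
        FROM access_history
        WHERE date(accessed_at, '+1 day') = date(accessed_at, 'start of month', '+1 month')
        GROUP BY file
        HAVING COUNT(DISTINCT strftime('%Y-%m', accessed_at)) >= ?
        ORDER BY file""", (min_months,))
    return [row[0] for row in rows]
```

Files returned by such a query would be candidates for a move ahead of the next month end.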
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer usable program code embodied in the medium.
One example of a computer program product incorporating one or more aspects of an embodiment of the present invention is described with reference to
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.