1. Field of the Invention
This invention relates in general to file systems for computer systems, and more particularly to a method, apparatus and program storage device for providing a centralized policy based preallocation in a distributed file system.
2. Description of Related Art
A computer operating system may represent a collection of computer programs or routines which control the execution of application programs and that may provide services such as resource allocation, scheduling, input/output control, and data management. Most operating systems store logical units of data in files, and files are typically grouped in logical units of folders. Computer systems often process large quantities of information, including application data and executable code configured to process such data. In numerous embodiments, computer systems provide various types of mass storage devices configured to store data, such as magnetic and optical disk drives, tape drives, etc.
To provide a regular and systematic interface through which to access their stored data, such storage devices are frequently organized into hierarchies of files by software such as an operating system. Often a file defines a minimum level of data granularity that a user can manipulate within a storage device, although various applications and operating system processes may operate on data within a file at a lower level of granularity than the entire file.
Most file systems have not only files, but also data about the files in the file system. This data typically includes time of creation, time of last access, time of last write, time of last change, file characteristics (e.g., read-only, system file, hidden file, archive file, control file), and allocation size.
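By way of illustration only, the per-file data described above can be read on many systems through a stat-style call. The following Python sketch assumes a POSIX-like platform; the helper name `describe` and the dictionary keys are hypothetical and are not part of the described file system.

```python
import os
import stat

def describe(path):
    """Return a few of the per-file attributes a file system typically keeps."""
    st = os.stat(path)
    # st_blocks (counted in 512-byte units) is POSIX-only, so it may be absent.
    blocks = getattr(st, "st_blocks", None)
    return {
        "last_access": st.st_atime,
        "last_write": st.st_mtime,
        # On POSIX, st_ctime is the last metadata change; on Windows it is creation time.
        "last_change": st.st_ctime,
        "read_only": not (st.st_mode & stat.S_IWUSR),
        "logical_size": st.st_size,
        "allocated_bytes": blocks * 512 if blocks is not None else None,
    }
```

The allocation size reported here can differ from the logical size, which is the distinction that preallocation relies on.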
Storage area networks (SANs) enable the sharing of storage resources across one or more enterprises. But for many companies, information resources are spread over a variety of storage and server environments, often using products from different vendors. The result can be a multitude of file systems that need to be managed individually, which can increase complexity and costs, limit growth and increase operational risk. Many companies require a variety of skilled resources and find it difficult to implement consistent policies for file and database management. File and data administration tasks often impact application availability, leading to poor utilization of storage resources, high costs and reduced business efficiency.
Many applications have a preferred file size or allocation pattern for specific classes of files. In the relational database world, data files often have a uniform file size, for example the initial size may be 2 GB, and then the file size grows in some well-defined increment beyond that. During the file's initialization, the application will reserve the required amount of file system disk space by extending the file at some constant size. Many digital media applications write files of many megabytes in size from beginning to the end in one continuous sequence of writes. These examples require many calls into the file system.
Several file systems provide the capability for an application to call a special API to indicate that a specific amount of disk space should be reserved for a recently created file. Generally a second API is available that allows the application to extend a file by a specified number of blocks. However, this requires changes to an application that would like to take advantage of such a feature. For example, support to make API calls and/or additional commands for extending a file size must be integrated into an application to be able to utilize such features. Furthermore, once the allocation size is compiled into an application, it is either static and cannot be changed or requires tuning for each instance of that application.
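As a rough illustration of the application-side approach just described, the following Python sketch reserves space for a new file and later extends it by a fixed increment. It assumes a POSIX-like system where `os.posix_fallocate` is available and falls back to `os.ftruncate`, which only sets the file length and may leave the file sparse. The function names and the two size constants are hypothetical.

```python
import os

INITIAL_SIZE = 1 << 20      # hypothetical: 1 MiB initial reservation
EXTEND_SIZE = 256 << 10     # hypothetical: 256 KiB growth increment

def reserve(fd, size):
    """Reserve `size` bytes for the file, starting at offset 0."""
    try:
        os.posix_fallocate(fd, 0, size)       # asks the file system for real blocks
    except (AttributeError, OSError):
        os.ftruncate(fd, size)                # fallback: length only, possibly sparse

def extend(fd, increment):
    """Grow the file by `increment` bytes and return the new size."""
    current = os.fstat(fd).st_size
    try:
        os.posix_fallocate(fd, current, increment)
    except (AttributeError, OSError):
        os.ftruncate(fd, current + increment)
    return os.fstat(fd).st_size
```

Note that both sizes are fixed in the application source here, which is the limitation identified above: changing them means changing or re-tuning the application.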
Most file systems allow configuration of allocation behavior at the level of whole file systems. In other words, each file written within a single traditional file system instance will have the same allocation behavior. This limitation can force users to put files into different file systems as dictated by the desired allocation behavior, which greatly complicates administration.
Some file systems allow configuration of allocation behavior according to the physical or virtual storage device used to store the file. In other words, each file written to a given storage device will have the same allocation behavior. This limitation can force users to put files onto different storage devices as dictated by the desired allocation behavior, which greatly complicates administration.
It can be seen that there is a need for a method, apparatus and program storage device for providing a centralized policy based preallocation in a distributed file system.
To overcome the limitations in the prior art described above, and to overcome other limitations that will become apparent upon reading and understanding the present specification, the present invention discloses a method, apparatus and program storage device for providing a centralized policy based preallocation in a distributed file system.
The present invention solves the above-described problems by providing policy rules that provide for the specification of a preallocation and an extend size for files in a computer system. An administrator specifies various pre-allocation and extend sizes for sets of files as defined in the set of policy rules. Policy rules used in this manner can take into account the unique situation in which a file is being created or extended.
A policy database for providing policy-based preallocation in a file system in accordance with the principles of the present invention includes at least one rule set comprising at least one rule for specifying a preallocation size for a file being created.
In another embodiment of the present invention, a method for controlling files in a file system is provided. The method includes detecting a file event, determining whether the file meets a predetermined criterion and setting a file parameter according to a policy rule when the file meets the predetermined criterion.
In another embodiment of the present invention, a program storage device that includes program instructions executable by a processing device to perform operations for controlling files in a file system is provided. The operations include detecting a file event, determining whether the file meets a predetermined criterion and setting a file parameter according to a policy rule when the file meets the predetermined criterion.
In another embodiment of the present invention, a computer is provided. The computer includes memory for storing data and program instructions and a processor, coupled to the memory, the processor being configured to detect a file event, determine whether the file meets a predetermined criterion and set a file parameter according to a policy rule when the file meets the predetermined criterion.
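The detect, test and set sequence common to the three embodiments above can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class names, the event kinds, the use of a file-name glob as the predetermined criterion, and the parameter name `prealloc_bytes` are all hypothetical.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase

@dataclass
class FileEvent:
    kind: str                                   # hypothetical kinds: "create", "extend"
    name: str                                   # name of the file the event concerns
    params: dict = field(default_factory=dict)  # file parameters set by policy

@dataclass
class PolicyRule:
    event_kind: str      # which file event the rule applies to
    name_pattern: str    # the predetermined criterion, assumed here to be a name glob
    parameter: str       # the file parameter to set
    value: int

    def matches(self, event):
        return (event.kind == self.event_kind
                and fnmatchcase(event.name, self.name_pattern))

def handle_event(event, rules):
    """Set a file parameter from the first rule whose criterion the file meets."""
    for rule in rules:
        if rule.matches(event):
            event.params[rule.parameter] = rule.value
            return rule
    return None    # no rule applies; the file keeps default behavior
```

For instance, a rule `PolicyRule("create", "*.dbf", "prealloc_bytes", 2 * 1024**3)` would set a 2 GB preallocation on a create event for `orders.dbf` and leave a create event for `notes.txt` untouched.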
These and various other advantages and features of novelty which characterize the invention are pointed out with particularity in the claims annexed hereto and form a part hereof. However, for a better understanding of the invention, its advantages, and the objects obtained by its use, reference should be made to the drawings which form a further part hereof, and to accompanying descriptive matter, in which there are illustrated and described specific examples of an apparatus in accordance with the invention.
Referring now to the drawings in which like reference numbers represent corresponding parts throughout:
In the following description of the embodiments, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration the specific embodiments in which the invention may be practiced. It is to be understood that other embodiments may be utilized because structural changes may be made without departing from the scope of the present invention.
The present invention provides a method, apparatus and program storage device for providing a centralized policy based preallocation in a distributed file system. Policy rules are used to specify a pre-allocation and an extend size for files in a computer system. An administrator specifies various pre-allocation and extend sizes for sets of files as defined in the set of policy rules. Policy rules used in this manner can take into account the unique situation in which a file is being created or extended.
The client computers 120, 122 are also connected to a third server computer 150. The third server computer 150 may use a third operating system having an associated file system 156 for use with third storage array 155. Each of the first, second, and third storage arrays 135, 145, 155 may include a plurality of storage devices or may be a single storage device. Storage device as used herein may encompass hard disk drives, tape drives, solid-state memory devices, or other types of storage devices. The third operating system may be different from the first and second operating systems or may be the same operating system.
The client computer 120 may be a personal computer, a workstation, a handheld computer, etc. The server computers 130, 140, 150 may be personal computers, workstations, minicomputers, or mainframes. The client computer 120 and the server computers 130, 140, 150 may be bi-directionally coupled to the communications networks 110 over communications lines, via wireless systems, or any combination thereof. For example, client computers 120, 122 and the server computers 130, 140, 150 may be coupled to one another by various private networks, public networks or any combination thereof, including local-area networks (LANs), wide-area networks (WANs), or the Internet. Those skilled in the art will recognize many modifications may be made to this configuration without departing from the scope of the present invention.
A virtual storage volume can be created in accordance with the present invention, for example, by a user at client computers 120, 122 creating a file in the user's home directory, or loading data from a tape or other storage device to a file in the user's home directory. Conventionally, files located in the user's home directory are subject to a number of constraints, including, but not limited to, the maximum file size, the maximum file-system size, and the size of the user's local hard disk drive.
By managing file details (via the meta-data controller) on the storage network instead of in individual servers, the file system for the storage area network 220 of the present invention moves the file system intelligence into the storage network where it can be available to all application servers. Doing so provides a single namespace and a single point of management. This eliminates the need to manage files on a server-by-server basis. The file system for the storage area network 220 automates routine and error-prone tasks, such as file placement, and handles out of space conditions. The file system for the storage area network 220 also allows true heterogeneous file sharing, wherein the reader and writer of the exact same data can run different operating systems.
The SAN file system 302 consists of a small module of enablement code that runs on application servers 310-316 and meta-data controller 320. The features of the SAN file system 302 work together to provide a variety of benefits to customers. One of the major benefits is a single image or global namespace. This function shields the end user from storage network complexity, and dramatically reduces administrative workload via an administration client 350. Since the SAN file system 302 is designed to be implemented on a variety of operating systems, e.g., from Windows to various flavors of Linux and UNIX, the SAN file system 302 will allow all of these operating systems to share files. For example, a file created in Windows will be as accessible from a Windows client 316 as it is from Solaris 312, AIX 310, or any other supported platform, and vice versa. However, an application is still required to be able to read that file, no matter how accessible it is.
Since the SAN file system 302 will have a complete understanding of all files on the SAN 300, including the essential meta-data related to each file to make important decisions, the SAN file system 302 is a logical point to manage the storage on the network through policy-based controls. For example, the SAN file system 302 can decide where to place each file based on user-defined criteria, such as file type, using policy-based automation. Setting these policies relieves the storage administrator of the burden of repetitive tasks, and forms the basis of automation.
The SAN file system 302 provides the ability to group storage devices according to their characteristics, such as latency and throughput. These groupings, called storage pools 330, allow administrators to manage data according to predetermined characteristics. Because the meta-data for the SAN file system 302 is separate from the application data, files can be manipulated while remaining active. When files are removed from service, the SAN file system 302 will automatically reallocate the space without disruption. If a LUN is removed from the control of the SAN file system 302, the data on that LUN is automatically moved. Accordingly, the SAN file system 302 is designed to provide policy-based storage automation capabilities for provisioning and data placement, non-disruptive data migration, and a single point of management for files on a storage network.
While the meta-data controller 320 is shown in
An administrator may direct which files are stored on which class of storage based on a file's file name, file set, owner, or timestamp when the file was created. The policy rules 400 are specified as a set of statements. An applicability cache may be maintained to reduce the performance impact on file creation and extension in file sets that have no policy rules. A query evaluator may be utilized to evaluate matches locally on each SAN file system subordinate node.
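One way an applicability cache of the kind mentioned above might work is sketched below. The description does not specify its structure, so this is an assumption throughout: the class name, the representation of rules as (file set, rule) pairs, and the invalidation on policy activation are all hypothetical.

```python
class ApplicabilityCache:
    """Remembers, per file set, whether any policy rule can apply to it."""

    def __init__(self, rules):
        self._rules = list(rules)    # iterable of (file_set, rule) pairs
        self._has_rules = {}         # file_set -> bool, filled lazily

    def applicable(self, file_set):
        if file_set not in self._has_rules:
            self._has_rules[file_set] = any(
                fs == file_set for fs, _ in self._rules)
        return self._has_rules[file_set]

    def rules_for(self, file_set):
        if not self.applicable(file_set):
            return []                # fast path: skip rule evaluation entirely
        return [rule for fs, rule in self._rules if fs == file_set]

    def activate(self, rules):
        """Install a new rule set and discard stale cache entries."""
        self._rules = list(rules)
        self._has_rules.clear()
```

With this arrangement, creating or extending a file in a file set that has no rules costs one dictionary lookup after the first access.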
Accordingly, the example set of policy rules 410, 412, 414 is expanded to include the ability to specify a pre-allocation 440 and extend size 442 for specific files. If a file matches a rule with an EXTEND qualifier, the value is stored with the object's metadata to reduce the performance impact at block allocation time. An example of a rule for a “datafile” and a rule for a “logfile” is given by:
Constants for the pre-allocation and extend can be updated on the fly by editing the policy set and re-activating the policy. The set of terms that can be considered in a rule's precondition is very broad. Indeed, those skilled in the art will recognize that embodiments of the present invention could use any information available to the file system within the context when a file is created or allocation is extended.
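The behavior described in the preceding paragraphs can be illustrated with the following Python sketch. It does not reproduce the rule syntax of the policy rules 400; the class names, the glob patterns, the sizes, and the 4096-byte default block are hypothetical. The sketch also assumes that re-activating a policy affects only files created afterwards, which the description does not state.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

MB = 1024 * 1024

@dataclass(frozen=True)
class Rule:
    pattern: str                      # precondition, assumed here to be a name glob
    prealloc: Optional[int] = None    # bytes reserved when the file is created
    extend: Optional[int] = None      # increment used when allocation grows

@dataclass
class FileObject:
    name: str
    allocated: int = 0
    extend_size: Optional[int] = None  # copied from the rule into the file's metadata

class PolicySet:
    def __init__(self, rules):
        self.rules = list(rules)

    def activate(self, rules):
        """Replace the active rules, e.g. after an administrator edits the sizes."""
        self.rules = list(rules)

    def create(self, name):
        f = FileObject(name)
        for rule in self.rules:
            if fnmatchcase(name, rule.pattern):
                f.allocated = rule.prealloc or 0
                f.extend_size = rule.extend   # stored once, so no rule lookup later
                break
        return f

def allocate_more(f, needed, block=4096):
    """Grow the allocation in whole increments and return the new total."""
    step = f.extend_size or block
    f.allocated += -(-needed // step) * step   # ceiling division to whole steps
    return f.allocated
```

For example, a policy set containing `Rule("*.dbf", prealloc=2048 * MB, extend=128 * MB)` and `Rule("*.log", prealloc=16 * MB, extend=4 * MB)` would give a new data file 2 GB up front and grow it 128 MB at a time, without any change to the application that writes it.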
For the purposes of this description, a computer-usable or computer readable medium 768 can be any apparatus that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. The medium 768 may be an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. Examples of a computer-readable medium include a semiconductor or solid-state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk. Current examples of optical disks include compact disk—read only memory (CD-ROM), compact disk—read/write (CD-R/W) and DVD.
A system suitable for storing and/or executing program code will include at least one processor 796 coupled directly or indirectly to memory elements 792 through a system bus 720. The memory elements 792 can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code in order to reduce the number of times code must be retrieved from bulk storage during execution.
Input/output or I/O devices 740 (including but not limited to keyboards, displays, pointing devices, etc.) can be coupled to the system either directly or through intervening I/O controllers.
Network adapters 750 may also be coupled to the system to enable the system to become coupled to other data processing systems 752, remote printers 754 or storage devices 756 through intervening private or public networks 760. Modems, cable modems and Ethernet cards are just a few of the currently available types of network adapters.
Accordingly, the computer program 790 comprises instructions which, when read and executed by the system 700 of
The foregoing description of the embodiment of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not with this detailed description, but rather by the claims appended hereto.