Method and Data Processing System For Managing A Mass Storage System

Information

  • Patent Application
  • 20070180001
  • Publication Number
    20070180001
  • Date Filed
    November 22, 2006
  • Date Published
    August 02, 2007
Abstract
A method for managing a mass storage system is provided, wherein the mass storage system comprises a first storage space and a second storage space. A file index is generated which lists in a uniformly distributed way each file along with a first characteristic quantity, a second characteristic quantity, and status information, wherein the status information specifies whether the file is held on the first storage space or on the second storage space. A sample of files is selected from the file index. The sample of files contains a given number of files, wherein the status information of each file of the given number of files specifies the file to be held on the first storage space. A first critical value is determined by use of the first characteristic quantity of each file of the sample of files, and a second critical value is determined by use of the second characteristic quantity of each file of the sample of files. Then, a first subset of files is determined, comprising each file for which the first characteristic quantity is larger than the first critical value, for which the second characteristic quantity is larger than the second critical value, and which is specified by the status information to be held on the first storage space.
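The sampling and thresholding steps of the abstract can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes, following claims 6 and 7, that the first characteristic quantity is file age and the second is file size, and that each critical value is the average over the sample. The names `FileEntry`, `critical_values`, and `first_subset` are hypothetical, and taking the first entries of the index as the sample relies on the index already being uniformly distributed.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class FileEntry:
    """One row of the file index (hypothetical layout)."""
    name: str
    age_days: float       # first characteristic quantity (assumed: age)
    size_bytes: int       # second characteristic quantity (assumed: size)
    on_first_space: bool  # status information: True if held on the first storage space


def critical_values(index, sample_size):
    """Return (age critical value, size critical value) from a sample of the index.

    Assumption: the index is uniformly distributed, so its first
    `sample_size` entries held on the first storage space form a fair sample.
    """
    sample = [f for f in index if f.on_first_space][:sample_size]
    if not sample:
        raise ValueError("no files on the first storage space to sample")
    # Averages as in claim 7; other statistics would be possible.
    return mean(f.age_days for f in sample), mean(f.size_bytes for f in sample)


def first_subset(index, age_crit, size_crit):
    """Files on the first storage space that exceed both critical values."""
    return [
        f for f in index
        if f.on_first_space and f.age_days > age_crit and f.size_bytes > size_crit
    ]
```

With a four-entry index and a sample size of two, only the files that are older and larger than the sample averages, and still held on the first storage space, end up in the first subset.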
Description

BRIEF DESCRIPTION OF THE DRAWINGS

In the following, preferred embodiments of the invention will be described in greater detail by way of example only making reference to the drawings in which:

FIG. 1 shows a block diagram of a computer system comprising a mass storage system,

FIG. 2 depicts a flow diagram illustrating the basic steps performed by the method in accordance with an embodiment of the invention,

FIG. 3 depicts a typical distribution of files with the same age within a large file system,

FIG. 4 shows a typical distribution of the number of files over file sizes,

FIG. 5 illustrates graphically the criteria for determining the first, second, third, and fourth subset of files.

Claims
  • 1. A method for managing a mass storage system, said mass storage system comprising a first storage space and a second storage space, said method comprising: generating a file index, said file index listing in a uniformly distributed way each file along with a first characteristic quantity, a second characteristic quantity, and a status information, said status information specifying if said file is held on said first storage space or on said second storage space; selecting a sample of files from said file index, said sample of files containing a given number of files, wherein said status information of each file of said given number of files specifies said file to be held on said first storage space; determining a first critical value by use of said first characteristic quantity of each file of said sample of files; determining a second critical value by use of the second characteristic quantity of each file of said sample of files; and determining a first subset of files comprising each file for which said first characteristic quantity is larger than said first critical value and for which said second characteristic quantity is larger than said second critical value and which is specified by said status information to be held by said first storage space.
  • 2. The method according to claim 1, wherein said method further comprises: determining a second subset of files comprising each file for which said first characteristic quantity is smaller than said first critical value but larger than a first threshold value, and for which said second characteristic quantity is larger than said second critical value and which is specified by said status information to be held by said first storage space; determining a third subset of files comprising each file for which said first characteristic quantity is larger than said first critical value and for which said second characteristic quantity is smaller than said second critical value but larger than a second threshold value and which is specified by said status information to be held by said first storage space; and determining a fourth subset of files comprising each file for which said first characteristic quantity is smaller than said first critical value but larger than said first threshold value and for which said second characteristic quantity is smaller than said second critical value but larger than said second threshold value.
  • 3. The method according to claim 1, said method further comprising: moving a given number of files of said first subset of files from said first storage space to said second storage space if more than said given number of files are contained in said first subset of files; moving all files of said first subset of files from said first storage space to said second storage space if less than said given number of files are contained in said first subset of files and moving the remaining number of files of said second subset of files or of said third subset of files or of said fourth subset of files so that in total said given number of files is moved from said first storage space to said second storage space; and updating said first subset of files, said second subset of files, said third subset of files, and said fourth subset of files.
  • 4. The method according to claim 3, wherein said first subset of files, said second subset of files, said third subset of files, and said fourth subset of files are determined dynamically before said given number of files is moved from said first storage space to said second storage space, and wherein files from said fourth subset of files are only moved if insufficient files are contained in the first subset of files, in the second subset of files, and in the third subset of files.
  • 5. The method according to claim 1, wherein said file index is regenerated depending on the number of new files added to the mass storage system or depending on the number of files contained in the first, second, third or fourth subset of files.
  • 6. The method according to claim 1, wherein said first characteristic quantity specifies the age of a file and wherein said second characteristic quantity specifies the size of a file.
  • 7. The method according to claim 1, wherein said first critical value is determined by calculating the average age of the files contained in said sample of files, and wherein said second critical value is determined by calculating the average space occupied by the files contained in said sample of files.
  • 8. The method according to claim 1, wherein said mass storage system is a hierarchical storage management system, wherein said first storage space is provided by a tier one storage device, and wherein said second storage space is provided by a tier two storage device.
  • 9. The method according to claim 1, wherein said first storage space and said second storage space are provided by one storage device or wherein said first storage space and said second storage space are provided by two separate storage devices.
  • 10. The method according to claim 1, wherein said file index is generated by use of a hash algorithm, wherein said hash algorithm is used for storing and retrieving the attributes and the status information of each file held by said first or said second storage space in said file index, and wherein said first and said second characteristic quantities are comprised in the attributes of each file.
  • 11. A computer program product comprising computer executable instructions for causing a computer to perform a method for managing a mass storage system, wherein said mass storage system comprises a first storage space and a second storage space, the method comprising the steps of: generating a file index, said file index listing in a uniformly distributed way each file along with a first characteristic quantity, a second characteristic quantity, and a status information, said status information specifying if said file is held on said first storage space or on said second storage space; selecting a sample of files from said file index, said sample of files containing a given number of files, wherein said status information of each file of said given number of files specifies said file to be held on said first storage space; determining a first critical value by use of said first characteristic quantity of each file of said sample of files; determining a second critical value by use of the second characteristic quantity of each file of said sample of files; and determining a first subset of files comprising each file for which said first characteristic quantity is larger than said first critical value and for which said second characteristic quantity is larger than said second critical value and which is specified by said status information to be held by said first storage space.
  • 12. A data processing system for managing a mass storage system, said mass storage system comprising a first storage space and a second storage space, said data processing system comprising: means for generating a file index, said file index listing in a uniformly distributed way each file along with a first characteristic quantity, a second characteristic quantity, and a status information, said status information specifying if said file is held on said first storage space or on said second storage space; means for selecting a sample of files from said file index, said sample of files containing a given number of files, wherein said status information of each file of said given number of files specifies said file to be held on said first storage space; means for determining a first critical value by use of said first characteristic quantity of each file of said sample of files; means for determining a second critical value by use of the second characteristic quantity of each file of said sample of files; and means for determining a first subset of files comprising each file for which said first characteristic quantity is larger than said first critical value and for which said second characteristic quantity is larger than said second critical value and which is specified by said status information to be held by said first storage space.
  • 13. The data processing system according to claim 12, wherein said data processing system further comprises: means for determining a second subset of files comprising each file for which said first characteristic quantity is smaller than said first critical value but larger than a first threshold value, and for which said second characteristic quantity is larger than said second critical value and which is specified by said status information to be held by said first storage space; means for determining a third subset of files comprising each file for which said first characteristic quantity is larger than said first critical value and for which said second characteristic quantity is smaller than said second critical value but larger than a second threshold value and which is specified by said status information to be held by said first storage space; and means for determining a fourth subset of files comprising each file for which said first characteristic quantity is smaller than said first critical value but larger than said first threshold value and for which said second characteristic quantity is smaller than said second critical value but larger than said second threshold value.
  • 14. The data processing system according to claim 12, said data processing system further comprising: means for moving a given number of files of said first subset of files from said first storage space to said second storage space if more than said given number of files are contained in said first subset of files; means for moving all files of said first subset of files from said first storage space to said second storage space if less than said given number of files are contained in said first subset of files and moving the remaining number of files of said second subset of files or of said third subset of files or of said fourth subset of files so that in total said given number of files is moved from said first storage space to said second storage space; and means for updating said first subset of files, said second subset of files, said third subset of files, and said fourth subset of files.
  • 15. The data processing system according to claim 13, wherein said first subset of files, said second subset of files, and said third subset of files are determined dynamically before said given number of files is moved from said first storage space to said second storage space.
  • 16. The data processing system according to claim 12, wherein said first characteristic quantity specifies the age of a file and wherein said second characteristic quantity specifies the size of a file.
  • 17. The data processing system according to claim 12, wherein said first critical value is determined by calculating the average age of the files contained in said sample of files, and wherein said second critical value is determined by calculating the average space occupied by the files contained in said sample of files.
  • 18. The data processing system according to claim 12, wherein said mass storage system is a hierarchical storage management system, wherein said first storage space is provided by a tier one storage device, and wherein said second storage space is provided by a tier two storage device.
  • 19. The data processing system according to claim 12, wherein said first storage space and said second storage space are provided by one storage device or wherein said first storage space and said second storage space are provided by two separate storage devices.
  • 20. The data processing system according to claim 12, wherein said file index is generated by use of a hash algorithm, wherein said hash algorithm is used for storing and retrieving the attributes and the status information of each file held by said first or said second storage space in said file index, and wherein said first and said second characteristic quantities are comprised in the attributes of each file.
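
The subset definitions of claims 2 and 13 and the move order of claims 3, 4, and 14 can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the claimed implementation. The first characteristic quantity is taken to be age and the second to be size, as in claims 6 and 16. The fourth subset is taken to mean a size between the second threshold value and the second critical value, which is an assumed reading by symmetry with the second and third subsets. The claims do not fix an order between the second and third subsets, so draining the second before the third is also an assumption. The names `FileEntry`, `classify`, and `select_for_migration` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FileEntry:
    """One row of the file index (hypothetical layout)."""
    name: str
    age_days: float       # first characteristic quantity (assumed: age)
    size_bytes: int       # second characteristic quantity (assumed: size)
    on_first_space: bool  # status information: True if held on the first storage space


def classify(f, age_crit, size_crit, age_thr, size_thr) -> Optional[int]:
    """Return the subset number 1..4 for a file, or None if it is in no subset.

    Comparisons are strict, as in the claim wording, so a file exactly on a
    critical or threshold value falls into no subset.
    """
    if not f.on_first_space:
        return None
    old = f.age_days > age_crit
    mid_age = age_thr < f.age_days < age_crit
    big = f.size_bytes > size_crit
    mid_size = size_thr < f.size_bytes < size_crit
    if old and big:
        return 1
    if mid_age and big:
        return 2
    if old and mid_size:
        return 3
    if mid_age and mid_size:
        return 4  # assumed reading of the fourth subset
    return None


def select_for_migration(index, n, age_crit, size_crit, age_thr, size_thr):
    """Pick up to n files to move to the second storage space.

    The first subset is drained first; the fourth subset is used only when
    the first three together do not supply n files.
    """
    buckets = {1: [], 2: [], 3: [], 4: []}
    for f in index:
        k = classify(f, age_crit, size_crit, age_thr, size_thr)
        if k is not None:
            buckets[k].append(f)
    chosen = []
    for k in (1, 2, 3, 4):  # order of 2 before 3 is an assumption
        need = n - len(chosen)
        if need <= 0:
            break
        chosen.extend(buckets[k][:need])
    return chosen
```

After the selected files have been moved and their status information updated, running the classification again yields the updated subsets, since files held on the second storage space are no longer assigned to any subset.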
Priority Claims (1)
Number Date Country Kind
06100012.1 Feb 2006 EP regional