This application claims priority from Japanese Patent Application No. 2022-027166, filed on Feb. 24, 2022, the entire disclosure of which is incorporated herein by reference.
The present disclosure relates to an information processing apparatus, an information processing method, and an information processing program.
JP2021-117772A discloses a technique of generating parity data from a plurality of pieces of data and recording the plurality of pieces of data and the parity data on magnetic tapes different from each other.
JP2015-143952A discloses a technique of making a plurality of pieces of data the same size in a case of generating parity data from the plurality of pieces of data by adding dummy data to data other than data having a maximum size among the plurality of pieces of data having sizes different from each other.
It is preferable to reduce a difference in a time required for recording data on each of a plurality of magnetic tapes in a case where a plurality of pieces of data and parity data generated from the plurality of pieces of data are recorded on the plurality of magnetic tapes. The techniques described in JP2021-117772A and JP2015-143952A do not consider a time required for recording data on each of a plurality of magnetic tapes in a case where a plurality of pieces of data and parity data generated from the plurality of pieces of data are recorded on the plurality of magnetic tapes.
The present disclosure has been made in view of the above circumstances, and an object of the present disclosure is to provide an information processing apparatus, an information processing method, and an information processing program capable of reducing a difference in a time required for recording data on each of a plurality of magnetic tapes in a case where a plurality of pieces of data and parity data generated from the plurality of pieces of data are recorded on the plurality of magnetic tapes.
According to an aspect of the present disclosure, there is provided an information processing apparatus that performs a control of recording a plurality of pieces of data on a plurality of magnetic tapes, the apparatus including: at least one processor, in which the processor is configured to execute processing of making sizes of the plurality of pieces of data the same by adding dummy data to data other than data having a maximum size among the plurality of pieces of data, generate parity data based on the plurality of pieces of data to which the dummy data is added, and select the magnetic tapes as recording destinations of the plurality of pieces of data and the parity data from the plurality of magnetic tapes such that a difference in a size of at least one of the dummy data or the parity data to be recorded on each of the plurality of magnetic tapes is minimized.
In the information processing apparatus according to the aspect of the present disclosure, the processor may be configured to select the magnetic tapes as the recording destinations of the plurality of pieces of data and the parity data from the plurality of magnetic tapes such that a difference in a size of the parity data to be recorded on each of the plurality of magnetic tapes is minimized.
Further, in the information processing apparatus according to the aspect of the present disclosure, a plurality of combinations of the plurality of pieces of data and the parity data may exist, and the processor may be configured to select the magnetic tapes different from each other as recording destinations of each of the plurality of pieces of data and the parity data in the same set, and select the magnetic tapes as recording destinations of each of pieces of the parity data such that a difference in a total size of the pieces of the parity data to be recorded on each of the plurality of magnetic tapes is minimized.
Further, in the information processing apparatus according to the aspect of the present disclosure, the processor may be configured to perform a control of recording only the data among the data and the dummy data added to the data in a case of performing the control of recording the data on the selected magnetic tapes.
Further, in the information processing apparatus according to the aspect of the present disclosure, the processor may be configured to select the magnetic tapes as the recording destinations of the plurality of pieces of data and the parity data from the plurality of magnetic tapes such that a difference in a size of the dummy data to be recorded on each of the plurality of magnetic tapes is minimized.
Further, in the information processing apparatus according to the aspect of the present disclosure, a plurality of combinations of the plurality of pieces of data and the parity data may exist, and the processor may be configured to select the magnetic tapes different from each other as recording destinations of each of the plurality of pieces of data and the parity data in the same set, and select the magnetic tapes as recording destinations of each of the plurality of pieces of data and the parity data from the plurality of magnetic tapes such that a difference in a total size of pieces of the dummy data to be recorded on each of the plurality of magnetic tapes is minimized.
Further, in the information processing apparatus according to the aspect of the present disclosure, the data to be recorded on the magnetic tapes may be a packed object obtained by grouping a plurality of objects including data and metadata related to the data.
Further, according to another aspect of the present disclosure, there is provided an information processing method executed by a processor of an information processing apparatus that includes at least one processor and performs a control of recording a plurality of pieces of data on a plurality of magnetic tapes, the method including: executing processing of making sizes of the plurality of pieces of data the same by adding dummy data to data other than data having a maximum size among the plurality of pieces of data; generating parity data based on the plurality of pieces of data to which the dummy data is added; and selecting the magnetic tapes as recording destinations of the plurality of pieces of data and the parity data from the plurality of magnetic tapes such that a difference in a size of at least one of the dummy data or the parity data to be recorded on each of the plurality of magnetic tapes is minimized.
Further, according to still another aspect of the present disclosure, there is provided an information processing program for an information processing apparatus including at least one processor and performing a control of recording a plurality of pieces of data on a plurality of magnetic tapes, the information processing program for causing the processor to execute a process including: executing processing of making sizes of the plurality of pieces of data the same by adding dummy data to data other than data having a maximum size among the plurality of pieces of data; generating parity data based on the plurality of pieces of data to which the dummy data is added; and selecting the magnetic tapes as recording destinations of the plurality of pieces of data and the parity data from the plurality of magnetic tapes such that a difference in a size of at least one of the dummy data or the parity data to be recorded on each of the plurality of magnetic tapes is minimized.
According to the present disclosure, it is possible to reduce a difference in a time required for recording data on each of a plurality of magnetic tapes in a case where a plurality of pieces of data and parity data generated from the plurality of pieces of data are recorded on the plurality of magnetic tapes.
Hereinafter, an example of an embodiment for performing a technique according to the present disclosure will be described in detail with reference to the drawings.
First, a configuration of an information processing system 10 according to the present embodiment will be described with reference to
The tape library 14 includes a plurality of slots (not illustrated) and a plurality of tape drives 18, and each slot includes a magnetic tape T as an example of a recording medium. Each tape drive 18 is connected to the information processing apparatus 12. The tape drive 18 writes or reads data to or from the magnetic tape T under a control of the information processing apparatus 12. Examples of the magnetic tape T include a linear tape-open (LTO) tape.
In a case where the information processing apparatus 12 writes or reads data to or from the magnetic tape T, the magnetic tape T as a write target or a read target is loaded from the slot into a predetermined tape drive 18. In a case where the writing or reading of the data to or from the magnetic tape T loaded into the tape drive 18 is completed, the magnetic tape T is unloaded from the tape drive 18 into the slot in which the magnetic tape T was originally stored.
In the present embodiment, as illustrated in
In addition, in the present embodiment, as illustrated in
Examples of the packing rule include a rule for grouping a plurality of objects including pieces of data having the same extension into the same packed object and a rule for grouping a plurality of objects that are likely to be read at the same time into the same packed object. In addition, examples of the packing rule include a rule for grouping a plurality of objects into one packed object such that a size of one packed object is equal to or larger than a predetermined lower limit value and is smaller than a predetermined upper limit value. In addition, examples of the packing rule include a rule for grouping a plurality of objects into one packed object such that the number of objects included in one packed object is equal to or larger than a predetermined lower limit value and is smaller than a predetermined upper limit value. In addition, a plurality of packing rules may be combined.
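As an illustrative sketch only, the size-based packing rule may be expressed as follows. The names `StoredObject` and `pack_by_size` are hypothetical and do not appear in the present disclosure. The sketch additionally assumes that each object is no larger than the difference between the upper limit value and the lower limit value, so that closing a group as soon as it reaches the lower limit value never reaches the upper limit value.

```python
from dataclasses import dataclass


@dataclass
class StoredObject:
    name: str
    size: int  # size of the data plus its metadata, in bytes


def pack_by_size(objects, lower, upper):
    """Group objects so that lower <= (packed object size) < upper.

    Returns (packed, pending): `packed` is a list of completed groups and
    `pending` holds objects that have not yet reached the lower limit.
    """
    packed, pending, pending_size = [], [], 0
    for obj in objects:
        # Assumption of this sketch: no single object can push a group
        # from below the lower limit to the upper limit or beyond.
        if obj.size > upper - lower:
            raise ValueError(f"{obj.name} is too large for this simple rule")
        pending.append(obj)
        pending_size += obj.size
        if pending_size >= lower:
            # The group satisfies both limits, so it is closed.
            packed.append(pending)
            pending, pending_size = [], 0
    return packed, pending
```

In this sketch, objects remaining in `pending` would simply wait until further objects arrive.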
Next, a hardware configuration of the information processing apparatus 12 according to the present embodiment will be described with reference to
The storage unit 22 is realized by a hard disk drive (HDD), a solid state drive (SSD), a flash memory, or the like. An information processing program 30 is stored in the storage unit 22 as a storage medium. The CPU 20 reads the information processing program 30 from the storage unit 22, develops the read information processing program 30 in the memory 21, and executes the developed information processing program 30.
On the other hand, the information processing apparatus 12 according to the present embodiment receives the data and the metadata transmitted from an external apparatus such as a user terminal. For transmission of the data from the external apparatus, for example, a hypertext transfer protocol (HTTP) application programming interface (API) provided from the information processing system 10 is used.
In this case, for example, the data to be transmitted by a user is included in a body portion of HTTP, and the metadata related to the data is included in a header portion of HTTP. The information processing apparatus 12 stores the object associated with the received data and the received metadata in the storage unit 22. The information processing apparatus 12 has a function of performing a control of generating a packed object based on the plurality of objects stored in the storage unit 22 and recording a plurality of generated packed objects on the plurality of magnetic tapes T.
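As an illustrative sketch only, a request carrying the data in the body portion and the metadata in the header portion may be constructed as follows using the Python standard library. The URL, the `PUT` method, and the `X-Object-Meta-` header prefix are assumptions made for illustration; the present disclosure does not specify them. The request is only constructed and is not transmitted.

```python
import urllib.request


def build_upload_request(url, data, metadata):
    """Build (but do not send) an HTTP request for one object.

    data:     bytes placed in the body portion
    metadata: dict of str -> str placed in the header portion
    """
    # Hypothetical header naming scheme; one header per metadata item.
    headers = {f"X-Object-Meta-{key}": value for key, value in metadata.items()}
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


request = build_upload_request(
    "http://storage.example/objects/report.txt",  # hypothetical endpoint
    b"hello",
    {"owner": "alice"},
)
```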
Next, a functional configuration of the information processing apparatus 12 according to the present embodiment will be described with reference to
The first generation unit 40 generates a packed object obtained by grouping the plurality of objects stored in the storage unit 22 according to a packing rule, and stores the generated packed object in the storage unit 22. The first generation unit 40 sequentially generates the packed objects, and the plurality of packed objects are stored in the storage unit 22.
The execution unit 42 performs processing of making the sizes of the plurality of packed objects the same by adding dummy data to the packed objects other than the packed object having a maximum size among the plurality of packed objects stored in the storage unit 22 (hereinafter, referred to as “dummy data addition processing”). Examples of the dummy data include data padded with 0, data padded with 1, and the like. In addition, the dummy data may be data represented by a bit string in which 0s and 1s are arranged according to a specific rule, such as data in which 0s and 1s are alternately repeated.
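As an illustrative sketch only, the dummy data addition processing may be expressed as follows, assuming that each packed object is held as a byte string and that the dummy data is data padded with 0. The function name `add_dummy_data` is hypothetical.

```python
def add_dummy_data(packed_objects, fill=b"\x00"):
    """Pad every packed object to the size of the largest one.

    Returns (padded, dummy_sizes), where dummy_sizes[i] is the number of
    dummy bytes appended to packed_objects[i] (0 for the largest object).
    """
    max_size = max(len(obj) for obj in packed_objects)
    dummy_sizes = [max_size - len(obj) for obj in packed_objects]
    padded = [obj + fill * extra for obj, extra in zip(packed_objects, dummy_sizes)]
    return padded, dummy_sizes
```

The returned dummy sizes may be retained so that a later selection of recording destinations can take the size of the dummy data into account.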
In the present embodiment, as illustrated in
The second generation unit 44 generates parity data based on the plurality of packed objects to which the dummy data is added by the execution unit 42. In the present embodiment, as illustrated in
As illustrated in
As illustrated in
For example, the selection unit 46 can select the magnetic tapes T as recording destinations of each of the packed objects and each of the pieces of the parity data by solving a combinatorial optimization problem based on constraints such as (1) and (2) and an objective function such as (3).
In this way, the magnetic tapes T different from each other are set as the recording destinations of each of the three packed objects and the parity data which are included in the same set. Thereby, even in a case where one magnetic tape T among four magnetic tapes T is lost, it is possible to recover the packed object.
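As an illustrative sketch only, the selection may be expressed as an exhaustive search over small inputs. The sketch assumes the requirement that items in the same set are recorded on magnetic tapes different from each other, and takes as its objective the difference between the largest and smallest total size of parity data per magnetic tape. The function name `select_tapes` and the exhaustive search are assumptions made for illustration; an actual implementation would likely use a combinatorial optimization solver, because the search grows exponentially with the number of sets.

```python
from itertools import product


def select_tapes(parity_sizes, n_tapes, n_objects=3):
    """Choose, for each set, the tape for its parity data and its packed objects.

    parity_sizes: size of the parity data of each set
    Returns (plan, spread): plan[i] gives the tape indices for set i, and
    spread is the max-min difference of total parity size per tape.
    """
    if n_objects + 1 > n_tapes:
        raise ValueError("not enough tapes to keep one set on distinct tapes")
    best_spread, best_choice = None, None
    # Try every way of assigning one parity tape to each set.
    for choice in product(range(n_tapes), repeat=len(parity_sizes)):
        totals = [0] * n_tapes
        for tape, size in zip(choice, parity_sizes):
            totals[tape] += size
        spread = max(totals) - min(totals)
        if best_spread is None or spread < best_spread:
            best_spread, best_choice = spread, choice
    plan = []
    for parity_tape in best_choice:
        # The packed objects of the set go to tapes other than the parity tape.
        others = [t for t in range(n_tapes) if t != parity_tape][:n_objects]
        plan.append({"parity": parity_tape, "objects": others})
    return plan, best_spread
```

For example, with four sets whose parity data have equal sizes and four magnetic tapes, the sketch places one piece of parity data on each magnetic tape.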
The controller 48 performs a control of recording each of the three packed objects and the parity data, which are included in each of the plurality of sets, on the magnetic tapes T selected by the selection unit 46. In a case where a total value of the number of the packed objects and the number of the parity data in one set is set to be equal to or smaller than the number of the tape drives 18 that can be used at one time, the control can be performed in parallel.
In the present embodiment, in a case of performing the control, the controller 48 performs a control of recording only the packed objects among the packed objects and pieces of the dummy data added to the packed objects. Thereby, it is possible to prevent a decrease in available capacity of the magnetic tape T due to recording of the dummy data on the magnetic tape T.
In this case, in a case where one of the three packed objects is lost, dummy data is added to the remaining two packed objects such that the sizes of the remaining two packed objects match the size of the parity data. Thereby, it is possible to recover the lost packed object from the parity data and the two packed objects to which the dummy data is added.
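As an illustrative sketch only, the recovery may be expressed as follows, assuming that the parity data is a bytewise exclusive OR of the packed objects to which the dummy data is added. The function names are hypothetical, and the original size of the lost packed object (`lost_size`) is assumed to be known from separately held management information.

```python
def xor_blocks(blocks):
    """XOR equally sized byte strings together."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same size")
    out = bytearray(size)
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


def pad(block, size, fill=b"\x00"):
    """Append dummy data so that the block reaches the given size."""
    return block + fill * (size - len(block))


def make_parity(packed_objects):
    """Generate parity data from packed objects padded to the maximum size."""
    size = max(len(obj) for obj in packed_objects)
    return xor_blocks([pad(obj, size) for obj in packed_objects])


def recover_lost(remaining, parity, lost_size):
    """Recover the lost packed object from the remaining ones and the parity data."""
    # Dummy data is added again so that every block matches the parity size.
    padded = [pad(obj, len(parity)) for obj in remaining]
    # The XOR yields the lost object followed by its dummy data, which is cut off.
    return xor_blocks(padded + [parity])[:lost_size]
```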
In the present embodiment, in the case of performing the control, the controller 48 may also perform a control of recording the dummy data on the magnetic tape T. In this case, the dummy data addition processing in a case of recovering the packed object is not required.
Next, an operation of the information processing apparatus 12 according to the present embodiment will be described with reference to
In step S10 of
In step S14, as described above, the second generation unit 44 generates the parity data based on the plurality of packed objects to which the dummy data is added in step S12. In step S16, as described above, the selection unit 46 selects magnetic tapes T different from each other as recording destinations of each of the three packed objects and the parity data which are included in the same set. Further, as described above, the selection unit 46 selects magnetic tapes T as recording destinations of each of the three packed objects and the parity data from the plurality of magnetic tapes T such that a difference in sizes of the pieces of the parity data to be recorded on each of the plurality of magnetic tapes T is minimized.
In step S18, as described above, the controller 48 performs a control of recording each of the three packed objects and the parity data, which are included in each of the plurality of sets, on the magnetic tapes T selected in step S16. In a case where the processing of step S18 is completed, the data recording processing is completed.
The data included in the packed object may include data having a relatively high compression ratio, such as text data. Here, the compression ratio is represented by the following Equation (1). A larger numerical value of the compression ratio indicates that the data is compressed to a greater degree.
Compression Ratio=((Data Size before Compression−Data Size after Compression)÷Data Size before Compression)×100 (1)
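As an illustrative sketch only, Equation (1) may be evaluated as follows. The use of `zlib` is an assumption made for illustration and merely stands in for whatever compression is applied at the time of recording; the function name `compression_ratio` is hypothetical.

```python
import zlib


def compression_ratio(size_before, size_after):
    """Equation (1): percentage by which the data size is reduced."""
    return (size_before - size_after) / size_before * 100


# Data padded with 0 compresses to a small fraction of its original size.
dummy = b"\x00" * 10_000
dummy_ratio = compression_ratio(len(dummy), len(zlib.compress(dummy)))
```

For example, data of 1000 bytes that is compressed to 250 bytes has a compression ratio of 75.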
On the other hand, the compression ratio of the parity data is relatively low. That is, a ratio of the size of the parity data to the size of all the pieces of the data on the magnetic tape T is relatively high. In the present embodiment, a difference in sizes of the pieces of the parity data to be recorded on each of the plurality of magnetic tapes T is set to be minimized. Therefore, it is possible to reduce a difference in a time required for recording data on each of the plurality of magnetic tapes T in a case where a plurality of pieces of data and parity data generated from the plurality of pieces of data are recorded on the plurality of magnetic tapes T.
In the present embodiment, a case where the selection unit 46 selects magnetic tapes T as recording destinations of each of the plurality of packed objects and the parity data from the plurality of magnetic tapes T such that a difference in sizes of the pieces of the parity data to be recorded on each of the plurality of magnetic tapes T is minimized has been described. On the other hand, the present disclosure is not limited thereto. The selection unit 46 may select magnetic tapes T as recording destinations of each of the plurality of packed objects and the parity data from the plurality of magnetic tapes T such that a difference in sizes of the pieces of the dummy data to be recorded on each of the plurality of magnetic tapes T is minimized.
Specifically, the selection unit 46 selects magnetic tapes T different from each other as recording destinations of each of the three packed objects and the parity data which are included in the same set. Further, as illustrated in
In the example, in a case of performing a control of recording each of the three packed objects and the parity data, which are included in each of the plurality of sets, on the magnetic tapes T selected by the selection unit 46, the controller 48 performs a control of recording the dummy data added to the packed object on the magnetic tape T. In the example, the dummy data is also recorded on the magnetic tape T, and thus the sizes of the parity data and each packed object including the dummy data in each set are the same size. Therefore, in a case where each packed object and the parity data in each set are recorded on the magnetic tapes T different from each other, a total size of all the data to be recorded on each magnetic tape T is the same. Here, the dummy data is data having a relatively high compression ratio, such as data padded with 0. That is, even in a case where the total size of all the data to be recorded on each magnetic tape T is the same, as a ratio of the dummy data to the total size of all the data is higher, a size of the data on the magnetic tape T (that is, a length of a portion in which the data is recorded on the magnetic tape T) is decreased. Thus, a time required for recording the data on the magnetic tape T is shortened. In the example, a difference in sizes of the pieces of the dummy data to be recorded on each of the plurality of magnetic tapes T is set to be minimized. Therefore, it is possible to reduce a difference in a time required for recording data on each of the plurality of magnetic tapes T in a case where a plurality of pieces of data and parity data generated from the plurality of pieces of data are recorded on the plurality of magnetic tapes T.
In addition, the example may be combined with the embodiment. In this case, the selection unit 46 selects magnetic tapes T as recording destinations of each of the three packed objects and the parity data from the plurality of magnetic tapes T such that a difference in sizes of the pieces of the parity data to be recorded on each of the plurality of magnetic tapes T is minimized and a difference in the size of the dummy data is minimized.
Further, in the embodiment, a case where the technique according to the present disclosure is applied to an object storage system has been described. On the other hand, the present disclosure is not limited thereto. The technique according to the present disclosure may be applied to a file storage system that handles data in file units.
Further, in the embodiment, for example, as a hardware structure of a processing unit that executes various processing, such as the first generation unit 40, the execution unit 42, the second generation unit 44, the selection unit 46, and the controller 48, the following various processors may be used. The various processors include, as described above, a CPU, which is a general-purpose processor that functions as various processing units by executing software (program), a programmable logic device (PLD), which is a processor of which the circuit configuration may be changed after manufacturing, such as a field programmable gate array (FPGA), and a dedicated electric circuit, which is a processor having a circuit configuration specifically designed to execute specific processing, such as an application specific integrated circuit (ASIC).
One processing unit may be configured by one of these various processors, or may be configured by a combination of two or more processors of the same type or different types (for example, a combination of a plurality of FPGAs or a combination of a CPU and an FPGA). Further, the plurality of processing units may be configured by one processor.
As an example in which the plurality of processing units are configured by one processor, firstly, as represented by a computer such as a client and a server, a form in which one processor is configured by a combination of one or more CPUs and software and the processor functions as the plurality of processing units may be adopted. Secondly, as represented by a system on chip (SoC) or the like, a form in which a processor that realizes the function of the entire system including the plurality of processing units by one integrated circuit (IC) chip is used may be adopted. As described above, the various processing units are configured by using one or more various processors as a hardware structure.
Further, as the hardware structure of the various processors, more specifically, an electric circuit (circuitry) in which circuit elements such as semiconductor elements are combined may be used.
Further, in the embodiment, an example in which the information processing program 30 is stored (installed) in the storage unit 22 in advance has been described. On the other hand, the present disclosure is not limited thereto. The information processing program 30 may be provided by being recorded in a recording medium such as a compact disc read only memory (CD-ROM), a digital versatile disc read only memory (DVD-ROM), or a Universal Serial Bus (USB) memory. Further, the information processing program 30 may be downloaded from an external apparatus via a network.
Number | Date | Country | Kind |
---|---|---|---|
2022-027166 | Feb 2022 | JP | national |