The present invention is related to a data management device, a data management method, a data management program, and an information processing device.
Conventionally, there is known a document list display method performed by a document processing system including a low-speed storage device and a high-speed storage device. This method provides a management table for managing which storage device each document data item is stored in, and refers to the management table to check whether the document data specified in a document list information request from the user is stored in the low-speed storage device or in the high-speed storage device. Furthermore, this method displays a list of the requested document data in which the data items are distinguished by storage device, and then, in parallel with the process of displaying details of particular document data, copies the details of the listed document data that is stored in the low-speed storage device into the high-speed storage device.
Patent Document 1: Japanese Laid-Open Patent Publication No. H7-319902
However, in the above conventional method, the document data that is specified in document list information is merely copied into the high-speed storage device each time, and therefore there are cases where the speed of outputting data is not sufficiently increased.
According to an aspect of the embodiments, a data management device includes a first storage unit configured to store data; a second storage unit configured to store data, to which access is possible at a high speed compared to the first storage unit; and a processor configured to execute a process including reading, from the first storage unit or the second storage unit, data according to an input data request, and outputting the read data, analyzing relevance between data items stored in the first storage unit or the second storage unit, based on history of data requests that have been input, and dividing, into groups, the data items stored in the first storage unit or the second storage unit based on a result of the analysis, and storing, in the second storage unit, the data items in units of the groups into which the data items have been divided.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention as claimed.
In the following, with respect to a mode for carrying out the present invention, a description is given by citing embodiments with reference to accompanying drawings.
In the following, with reference to drawings, a description is given of a data management device, a data management method, a data management program, and an information processing device according to a first embodiment of the present invention.
(Hardware Configuration)
The CPU 20 is a processor acting as an arithmetic processing unit including, for example, a program counter, an instruction decoder, various computing units, an LSU (Load Store Unit), and a general-purpose register, etc.
The memory device 30 is a storage device that may be accessed at high speed compared to the storage device 40. As a combination of the memory device 30 and the storage device 40, the following combinations of a RAM (Random Access Memory), a flash memory, an HDD (Hard Disk Drive), etc., may be used.
memory device 30: RAM—storage device 40: HDD
memory device 30: RAM—storage device 40: flash memory
memory device 30: RAM—storage device 40: tape
memory device 30: RAM—storage device 40: DVD
memory device 30: RAM—storage device 40: CD
memory device 30: RAM—storage device 40: Blu-ray (registered trademark) Disc
memory device 30: flash memory—storage device 40: HDD
memory device 30: flash memory—storage device 40: tape
memory device 30: flash memory—storage device 40: DVD
memory device 30: flash memory—storage device 40: CD
memory device 30: flash memory—storage device 40: Blu-ray Disc
The storage device 40 stores data provided by the data management device 1 in units of segments. A segment is a group of data acknowledged as having relevance based on the history of data requests, and as described below, the contents of the segment are updated by processes by the CPU 20.
In the memory device 30, for example, a segment that is highly frequently accessed, among the segments stored in the storage device 40, is copied from the storage device 40. Accordingly, the data management device 1 is able to output data at high speed in response to an input data request.
The data management device 1 includes a ROM storing BIOS, a program memory, etc., in addition to the above configuration. The program executed by the CPU 20 may be acquired via the network 60 or may be acquired by inserting a portable memory in the data management device 1.
(Functional Configuration)
The input output management unit 21 searches the memory device 30 and the storage device 40 in response to a data request input from a request source such as the client computer 70, and sends the requested data to the request source. Note that the data request is not only sent by the client computer 70; there may be cases where a process, etc., being executed in the data management device 1 is the issue source of the data request. Furthermore, if an input output device is connected to the data management device 1, there may be cases where the user inputs a data request in the input output device.
When the data request is input, the input output management unit 21 first searches the memory device 30, and when there is data specified by the data request in the memory device 30, the input output management unit 21 reads the data from the memory device 30 and returns the data to the request source. Furthermore, when there is no data specified by the data request in the memory device 30, the input output management unit 21 searches the storage device 40, and when there is data specified by the data request in the storage device 40, the input output management unit 21 reads the data from the storage device 40 and returns the data to the request source. At this time, the input output management unit 21 copies the segment to which the read data belongs from the storage device 40 to a segment management cabinet 30A of the memory device 30.
Note that the input output management unit 21 may unconditionally copy the segment from the storage device 40 to the segment management cabinet 30A with respect to the data for which a data request has been given, or may acquire access frequencies within a certain time period and prioritize the copying of a segment having a high access frequency.
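The lookup order described above can be sketched as follows. This is a minimal illustration rather than the implementation of the embodiment: the class name `DataStore`, the dictionary layout, and the `segment_of` index are hypothetical, and the sketch takes the option of unconditionally copying the segment.

```python
class DataStore:
    """Two tiers: a fast tier (memory device 30) and a slow tier (storage device 40)."""

    def __init__(self, storage_segments):
        # storage_segments: {segment_id: {key: value}} held in the slow tier.
        self.storage = storage_segments
        # Stand-in for the segment management cabinet 30A.
        self.memory = {}
        # Hypothetical index from a data key to the segment that holds it.
        self.segment_of = {
            key: seg_id
            for seg_id, items in storage_segments.items()
            for key in items
        }

    def read(self, key):
        """Return (value, tier); on a slow-tier hit, copy the whole segment."""
        seg_id = self.segment_of.get(key)
        if seg_id is None:
            raise KeyError(key)
        if seg_id in self.memory:
            return self.memory[seg_id][key], "memory"
        segment = self.storage[seg_id]
        self.memory[seg_id] = dict(segment)  # copy the segment, not just the item
        return segment[key], "storage"


store = DataStore({"s1": {"A": "a-data", "B": "b-data"}, "s2": {"C": "c-data"}})
first = store.read("A")   # slow-tier hit; segment s1 is copied to the fast tier
second = store.read("B")  # fast-tier hit, because B shares segment s1 with A
```

In this sketch the second read never touches the slow tier, which is the effect the segment-unit copy is meant to produce.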
The analysis necessity determination unit 22 determines whether to cause the relevance analysis unit 23 to analyze the relevance based on a relationship between the data request and the segment. The relevance analysis unit 23 analyzes the relevance with respect to data, for which a data request input to the input output management unit 21 is added to a relevance saving cabinet 30B stored in the memory device 30 of a self-node and another node, and determines the segment based on the analysis result. The segment arrangement unit 24 updates the allocation of the segment according to the determination of the relevance analysis unit 23. Details of process contents of the analysis necessity determination unit 22, the relevance analysis unit 23, and the segment arrangement unit 24 are described below.
(Specific Example of Data Changes)
In the following, a description is given of how the data stored in the segment management cabinet 30A and the relevance saving cabinet 30B changes according to a data request input to the input output management unit 21.
Here, it is assumed that in the data request Rq input to the data management device 1, there is described information identifying the previous data requested by a previous data request by the same request source. For example, the information of the previous data may be recognized by the client computer 70 itself, or may be recognized by the data management device 1 side for each request source. In this case, the data management device 1 saves the history of data requests for each request source in one of the memory device 30, the register, etc.
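As a sketch of the two options just described, a request can either carry the previous data itself or have it filled in on the data management device side from a per-source history. The names `DataRequest`, `PreviousTracker`, and the field names are hypothetical and are used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class DataRequest:
    source: str                     # identifier of the request source (assumed field)
    current: str                    # data requested now
    previous: Optional[str] = None  # data requested last time by the same source


class PreviousTracker:
    """Fills in `previous` on the device side when the client does not supply it."""

    def __init__(self):
        # Last requested data item for each request source.
        self._last: Dict[str, str] = {}

    def annotate(self, request: DataRequest) -> DataRequest:
        if request.previous is None:
            request.previous = self._last.get(request.source)
        self._last[request.source] = request.current
        return request


tracker = PreviousTracker()
r1 = tracker.annotate(DataRequest(source="client-70", current="A"))
r2 = tracker.annotate(DataRequest(source="client-70", current="B"))
```

Here the first request has no previous data, and the second request is annotated with "A" as its previous data.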
The input output management unit 21 reads the data A from the segment management cabinet 30A and outputs the data A to the request source.
Furthermore, the input output management unit 21 updates the relevance saving cabinet 30B by referring to the information of the previous data. In the state of
The input output management unit 21 reads the data B from the segment management cabinet 30A, and sends the data B to the request source.
Furthermore, the input output management unit 21 refers to the previous data and updates the relevance saving cabinet 30B. In the state of
When the relevance saving cabinet 30B is updated, the analysis necessity determination unit 22 determines whether relevance analysis by the relevance analysis unit 23 is needed. In the state of
The input output management unit 21 reads the data C from the relevance saving cabinet 30B, and sends the data C to the request source.
Furthermore, the input output management unit 21 refers to the previous data and updates the relevance saving cabinet 30B. In the state of
When the relevance saving cabinet 30B is updated, the analysis necessity determination unit 22 determines whether relevance analysis by the relevance analysis unit 23 is needed. In the state of
For example, the relevance analysis unit 23 analyzes the relevance between data by using a graph division method.
First, the relevance analysis unit 23 reads all of the data included in both the segment to which the current data belongs and the segment to which the previous data belongs (step S100).
Next, the relevance analysis unit 23 extracts the two data items i, j from the read data, and performs the processes of steps S102 through S106 for all of the combinations of i, j (i≠j). In
The relevance analysis unit 23 counts the number Cij* of data items j in the history of the data i field stored in the relevance saving cabinet 30B of the node (10#1, 10#2) that is the current target (step S102).
Next, the relevance analysis unit 23 counts the number Cji* of data items i in the history of the data j field stored in the relevance saving cabinet 30B of the node that is the current target (step S104).
Next, the relevance analysis unit 23 adds Cij* with Cji*, and calculates an index value Cij indicating the relevance of data i and j (step S106).
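Steps S102 through S106 can be sketched as below. The sketch assumes that the relevance saving cabinet can be represented as a mapping from each data item to the list of items recorded in its history field; that representation, the function name `relevance_index`, and the sample history are assumptions. Because Cij is symmetric in i and j, each unordered pair is computed once.

```python
from itertools import combinations


def relevance_index(history, items):
    """Return {(i, j): Cij} for every unordered pair of distinct items.

    history maps each data item to the list of items recorded in its
    history field of the relevance saving cabinet.
    """
    index = {}
    for i, j in combinations(sorted(items), 2):
        c_ij = history.get(i, []).count(j)  # step S102: occurrences of j in i's history
        c_ji = history.get(j, []).count(i)  # step S104: occurrences of i in j's history
        index[(i, j)] = c_ij + c_ji         # step S106: index value Cij
    return index


# Hypothetical history contents, chosen only for illustration.
history = {"A": ["B"], "B": ["A", "A"], "C": ["B"], "D": []}
c = relevance_index(history, ["A", "B", "C", "D"])
```

With this sample history, the pair (A, B) receives the highest index value because each item appears in the other's history.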
When the relevance analysis unit 23 has performed the processes of steps S102 through S106 for all of i, j, the relevance analysis unit 23 sets all of the segment patterns for dividing the data number m, within a range satisfying the maximum data number (for example, three) within a segment (step S108). In the example of
Next, in a case where all segment patterns are applied, the relevance analysis unit 23 extracts all of the index values Cij of data items that have been determined to belong to different segments, and obtains the total of the extracted values (step S110).
Then, the relevance analysis unit 23 selects the segment pattern having the lowest total of the index values Cij for the pairs of data items that have been determined to belong to different segments, and determines a new segment (step S112).
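Steps S108 through S112 amount to an exhaustive search over segment patterns. The following sketch enumerates every pattern whose segments hold at most `max_size` items and keeps the one with the lowest total of cut index values. The function names and the sample index values are hypothetical, and exhaustive enumeration is practical only for a small number of data items; it is shown here as one simple way to realize the selection, not as the graph division method itself.

```python
from itertools import combinations


def partitions(items, max_size):
    """Yield every way to split items into groups of at most max_size elements."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    # Choose the companions of the first item, then partition what remains.
    for k in range(0, min(max_size - 1, len(rest)) + 1):
        for companions in combinations(rest, k):
            remaining = [x for x in rest if x not in companions]
            for tail in partitions(remaining, max_size):
                yield [(first,) + companions] + tail


def cut_total(pattern, index):
    """Sum Cij over pairs that the pattern places in different segments (S110)."""
    group_of = {item: g for g, group in enumerate(pattern) for item in group}
    return sum(v for (i, j), v in index.items() if group_of[i] != group_of[j])


def best_pattern(items, index, max_size=3):
    """Select the pattern with the lowest cut total (S112)."""
    return min(partitions(sorted(items), max_size),
               key=lambda p: cut_total(p, index))


# Hypothetical index values Cij for four data items.
index = {("A", "B"): 3, ("A", "C"): 0, ("A", "D"): 0,
         ("B", "C"): 1, ("B", "D"): 0, ("C", "D"): 2}
pattern = best_pattern(["A", "B", "C", "D"], index)
```

For these sample values the search keeps A with B and C with D, cutting only the weakest link between B and C.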
When the relevance analysis unit 23 determines a new segment, the segment arrangement unit 24 changes the association of the data and the segments.
In the state of
(Overall Process Flow)
First, the input output management unit 21 reads the data specified by the data request from the memory device 30 or the storage device 40, and sends the data to the request source (S200).
Next, the input output management unit 21 refers to the previous data included in the data request, and updates the relevance saving cabinet 30B (S202).
When the relevance saving cabinet 30B is updated, the analysis necessity determination unit 22 determines whether relevance analysis by the relevance analysis unit 23 is needed (step S204). When it is determined that relevance analysis is not needed, the data management device 1 ends the process of this flowchart.
When it is determined that relevance analysis is needed, the relevance analysis unit 23 analyzes the relevance of the data (S206).
Next, the segment arrangement unit 24 determines whether the association of the data and the segments needs to be changed, based on the analysis result of the relevance analysis unit 23 (step S208). When it is determined that the association of the data and the segments does not need to be changed, the data management device 1 ends the process of this flowchart.
When it is determined that the association of the data and the segments needs to be changed, the segment arrangement unit 24 changes the association of the data and the segments (step S210).
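The flow of S200 through S210 can be summarized in a short sketch. The criterion used at S204 (analyze only when the current data and the previous data sit in different segments) is an assumption made for this illustration, as are the function name `handle_request`, the data layouts, and the toy `merge_all` analysis.

```python
def handle_request(current, previous, segment_of, history, analyze):
    """Run S200 to S210 for one request and return the list of steps taken.

    segment_of: {data item: segment id}
    history:    stand-in for the relevance saving cabinet, {item: [previous items]}
    analyze:    callable returning a proposed {data item: segment id} mapping
    """
    steps = ["S200"]  # the data itself is assumed to have been read and sent
    if previous is not None:
        history.setdefault(current, []).append(previous)  # S202
    steps.append("S202")

    # S204 (assumed criterion): analysis is needed only across segments.
    if previous is None or segment_of[current] == segment_of[previous]:
        return steps + ["S204:no"]
    steps.append("S204:yes")

    proposed = analyze(history, segment_of)  # S206
    steps.append("S206")
    if proposed == segment_of:               # S208: nothing to change
        return steps + ["S208:no"]
    segment_of.clear()
    segment_of.update(proposed)              # S210
    return steps + ["S208:yes", "S210"]


segment_of = {"A": "s1", "B": "s1", "C": "s2"}
history = {}
merge_all = lambda hist, seg: {k: "s1" for k in seg}  # toy analysis for the demo
log1 = handle_request("B", "A", segment_of, history, merge_all)
log2 = handle_request("C", "B", segment_of, history, merge_all)
```

The first call ends at S204 because A and B already share a segment; the second call runs through to S210 and moves C.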
(Comparison with Reference Method)
In the following, a description is given of a comparison between the process performed by the data management device 1 according to the present embodiment and a process performed by a reference method in which the data is not managed in units of segments. Generally, the memory device 30 such as a RAM and a flash memory may be accessed at high speed compared to the storage device 40 such as an HDD, but the memory device 30 often has a small storage capacity. Therefore, it may be possible to perform a process of increasing the response speed, by storing the original data in the storage device 40, and storing data having a high access frequency in the memory device 30 according to need.
Conversely, in the data management device 1 according to the present embodiment, data A, B having relevance are managed as the same segment, and therefore when reading data A from the storage device 40, data A, B are stored in the memory device 30 by the whole segment.
In the present embodiment, a seek time Ts and a read time Tr are needed to output data A; however, as the segment including data A, B is copied to the memory device 30 when outputting the data A, there is no need for a time corresponding to the read time Tr when outputting data B. Therefore, the time needed for outputting both data A and data B is approximately 2Ts+Tr, and it is possible to output the requested data at a high speed compared to the reference method.
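Written out, the total for the present embodiment follows from the two outputs described above. The per-request cost of $T_s + T_r$ used for the reference method is an assumption made here for comparison; the text above states only the result for the embodiment.

```latex
\begin{align*}
T_{\text{embodiment}} &= \underbrace{(T_s + T_r)}_{\text{data A}}
                       + \underbrace{T_s}_{\text{data B}}
                       = 2T_s + T_r \\
T_{\text{reference}}  &= 2\,(T_s + T_r) = 2T_s + 2T_r
                       \quad \text{(assumed)} \\
T_{\text{reference}} - T_{\text{embodiment}} &= T_r
\end{align*}
```

Under that assumption, managing data A and B as one segment saves one read time Tr for each such pair of requests.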
(Overview)
According to the data management device, the data management method, and the data management program of the present embodiment described above, it is possible to output the requested data at high speed.
Furthermore, according to the information processing device of the present embodiment described above, information identifying the previous data is attached to the request for the current data and sent to the data management device, and therefore it is possible to cause the data management device to output the requested data at high speed.
In the following, with reference to drawings, a description is given of a data management device, a data management method, a data management program, and an information processing device according to a second embodiment of the present invention.
(Hardware Configuration)
The node 10#1 includes, for example, a CPU 20#1, a memory device 30#1, and a storage device 40#1. Similarly, the node 10#2 includes, for example, a CPU 20#2, a memory device 30#2, and a storage device 40#2, and the node 10#n includes, for example, a CPU 20#n, a memory device 30#n, and a storage device 40#n.
In the following, when a description is given without distinguishing the nodes, the part of the reference numeral from # onward, which is the identifier of the node, is omitted.
The CPU 20 is a processor acting as an arithmetic processing unit including, for example, a program counter, an instruction decoder, various computing units, an LSU, a general-purpose register, etc.
The memory device 30 is a storage device that may be accessed at high speed compared to the storage device 40. As a combination of the memory device 30 and the storage device 40, similar to the first embodiment, combinations of a RAM, a flash memory, an HDD, etc., may be used.
The storage device 40 stores data provided by the data management device 2 in units of segments. A segment is a group of data acknowledged as having relevance based on the history of data requests, and as described below, the contents of the segment are updated by processes by the CPU 20.
In the memory device 30, for example, a segment that is highly frequently accessed, among the segments stored in the storage device 40, is copied from the storage device 40. Accordingly, the data management device 2 is able to output data at high speed in response to an input data request.
The node 10 includes a NIC (Network Interface Card) for communicating with the physical switch 50, a ROM storing BIOS (Basic Input/Output System), a program memory, etc., in addition to the above configuration. The program executed by the CPU 20 may be acquired via the network 60 or may be acquired by inserting a portable memory in the data management device 2.
(Functional Configuration)
The input output management unit 21 searches the memory device 30 and the storage device 40 in response to a data request input from the physical switch 50, and outputs the requested data to the physical switch 50. The data request is, for example, sent from the client computer 70 to the data management device 2. Note that the data request is not only sent by the client computer 70; there may be cases where a process, etc., being executed in the node 10 is the issue source of the data request. Furthermore, if an input output device is connected to the data management device 2, there may be cases where the user inputs a data request in the input output device.
When the client computer 70 sends a data request to the data management device 2, for example, the physical switch 50 transfers the data request to each of the nodes 10 by broadcast transmission. The input output management unit 21 of each node 10 first searches the memory device 30, and when there is data corresponding to the data request in the memory device 30, the input output management unit 21 reads the data from the memory device 30 and returns the data to the physical switch 50. Furthermore, when there is no data corresponding to the data request in the memory device 30, the input output management unit 21 searches the storage device 40, and when there is data corresponding to the data request in the storage device 40, the input output management unit 21 reads the data from the storage device 40 and returns the data to the physical switch 50. The physical switch 50 transfers the data, which is received from each of the nodes, to the client computer 70. At this time, the input output management unit 21 copies the segment to which the read data belongs from the storage device 40 to the segment management cabinet 30A of the memory device 30.
Note that the input output management unit 21 may unconditionally copy the segment from the storage device 40 to the segment management cabinet 30A with respect to the data for which a data request has been given, or may acquire access frequencies within a certain time period and prioritize the copying of a segment having a high access frequency.
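The broadcast handling described above can be sketched as follows. The `Node` class, the `broadcast` function standing in for the physical switch 50, and the flat key-value layout are hypothetical; the copying of the segment into the segment management cabinet 30A is omitted to keep the sketch short.

```python
class Node:
    """One node 10 with its own fast tier and slow tier (assumed layout)."""

    def __init__(self, name, memory, storage):
        self.name = name
        self.memory = memory    # {key: value} in the memory device 30
        self.storage = storage  # {key: value} in the storage device 40

    def lookup(self, key):
        """Search the memory device first, then the storage device; None if absent."""
        if key in self.memory:
            return self.memory[key]
        return self.storage.get(key)


def broadcast(nodes, key):
    """Stand-in for the physical switch 50: ask every node, forward the replies."""
    replies = []
    for node in nodes:
        value = node.lookup(key)
        if value is not None:
            replies.append((node.name, value))
    return replies


nodes = [
    Node("10#1", memory={"A": "a-data"}, storage={"B": "b-data"}),
    Node("10#2", memory={}, storage={"C": "c-data"}),
]
reply_c = broadcast(nodes, "C")
reply_x = broadcast(nodes, "X")
```

Only the node that holds the requested data answers; nodes without the data return nothing, so a request for unknown data yields an empty reply list.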
The analysis necessity determination unit 22 determines whether to cause the relevance analysis unit 23 to analyze the relevance based on a relationship between the data request and the segment. The relevance analysis unit 23 analyzes the relevance with respect to data, for which a data request input to the input output management unit 21 is added to a relevance saving cabinet 30B stored in the memory device 30 of a self-node and another node, and determines the segment based on the analysis result. The segment arrangement unit 24 updates the allocation of the segment according to the determination of the relevance analysis unit 23. Details of process contents of the analysis necessity determination unit 22, the relevance analysis unit 23, and the segment arrangement unit 24 are described below.
(Specific Example of Data Changes)
In the following, a description is given of how the data stored in the segment management cabinet 30A and the relevance saving cabinet 30B changes according to a data request input to the input output management unit 21.
In the state of
Here, it is assumed that in the data request Rq input to the data management device 2, there is described information identifying the previous data (in the above case, “None”, i.e., does not exist) requested by a previous data request by the same request source. For example, the information of the previous data may be recognized by the client computer 70 itself, or the physical switch 50 may identify the request source from a port number, an IP address, etc., and the information stored in an internal storage device may be given.
Furthermore, the information of the previous data may be recognized by the data management device 2 side for each request source. In this case, the data management device 2 saves the history of data requests for each request source in one of the memory device 30, the register, etc.
The input output management unit 21#1 reads the data A from the segment management cabinet 30A (#1) and outputs the data A to the physical switch 50.
Furthermore, the input output management unit 21#1 updates the relevance saving cabinet 30B (#1) by referring to the previous data. In the state of
The input output management unit 21#1 reads the data B from the segment management cabinet 30A (#1), and outputs the data B to the physical switch 50.
Furthermore, the input output management unit 21#1 refers to the previous data and updates the relevance saving cabinet 30B (#1). In the state of
When the relevance saving cabinet 30B (#1) is updated, the analysis necessity determination unit 22#1 determines whether relevance analysis by the relevance analysis unit 23#1 is needed. In the state of
The input output management unit 21#2 reads the data C from the relevance saving cabinet 30B (#2), and outputs the data C to the physical switch 50.
Furthermore, the input output management unit 21#2 refers to the previous data and updates the relevance saving cabinet 30B (#2). In the state of
When the relevance saving cabinet 30B (#2) is updated, the analysis necessity determination unit 22#2 determines whether relevance analysis by the relevance analysis unit 23#2 is needed. In the state of
For example, the relevance analysis unit 23#2 analyzes the relevance between data by using a graph division method. Because the relevance analysis by the relevance analysis unit 23#2 is the same as that of the first embodiment, reference is made to
When the relevance analysis unit 23#2 determines a new segment, the segment arrangement unit 24#2 changes the arrangement of the segments, such that the data belonging to the same segment is stored in the same memory device 30. From the state illustrated in
The segment arrangement unit 24 may arbitrarily determine whether to store a segment in the memory device 30 of the self-node or to store a segment in the memory device 30 of another node, while following the above rule “data items belonging to the same segment are stored in the same memory device 30”. Therefore, the segment arrangement unit 24 appropriately determines the arrangement of segments, for example, in consideration of the free space, etc., in the memory device 30 of each node 10.
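One way to take free space into account while keeping each segment on a single node is sketched below. The policy (largest segment first, node with the most free space) and the names `choose_node` and `place_segments` are illustrative assumptions; the text above leaves the choice to the segment arrangement unit 24.

```python
def choose_node(segment_size, free_space):
    """Pick the node with the most free memory that can hold the whole segment.

    free_space: {node id: free capacity in the memory device 30}.
    Returns None when no single node can hold the segment, since a segment
    is never split across memory devices.
    """
    candidates = {n: f for n, f in free_space.items() if f >= segment_size}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def place_segments(segments, free_space):
    """Assign each segment (id -> size) to one node, largest segments first."""
    free = dict(free_space)  # work on a copy so the caller's view is untouched
    placement = {}
    for seg_id, size in sorted(segments.items(), key=lambda kv: -kv[1]):
        node = choose_node(size, free)
        placement[seg_id] = node
        if node is not None:
            free[node] -= size
    return placement


placement = place_segments({"s1": 3, "s2": 2}, {"10#1": 4, "10#2": 3})
```

In this example segment s1 goes to node 10#1, after which only node 10#2 has room for segment s2.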
Furthermore, when the segment arrangement unit 24 changes the arrangement of segments in the memory device 30, the segment arrangement unit 24 also changes the arrangement of segments in the storage device 40, accordingly. In the state of
In the state of
(Overall Process Flow)
First, the input output management unit 21 determines whether the data (in the above description, the current data) specified by the data request is stored in the memory device 30 or the storage device 40 of the self-node (S300). When the data specified by the data request is not stored in the memory device 30 or the storage device 40 of the self-node, the node 10 ends the process of this flowchart.
When the data specified by the data request is stored in the memory device 30 or the storage device 40 of the self-node, the input output management unit 21 reads the data specified by the data request from the memory device 30 or the storage device 40, and outputs the data to the physical switch 50 (S302).
Next, the input output management unit 21 refers to the previous data included in the data request, and updates the relevance saving cabinet 30B (S304).
When the relevance saving cabinet 30B is updated, the analysis necessity determination unit 22 determines whether relevance analysis by the relevance analysis unit 23 is needed (S306). When it is determined that relevance analysis is not needed, the node 10 ends the process of this flowchart.
When it is determined that relevance analysis is needed, the relevance analysis unit 23 analyzes the relevance of the data (S308).
Next, the segment arrangement unit 24 determines whether the segment arrangement needs to be changed, based on the analysis results by the relevance analysis unit 23 (S310). When it is determined that the segment arrangement does not need to be changed, the node 10 ends the process of this flowchart.
When it is determined that the segment arrangement needs to be changed, the segment arrangement unit 24 changes the arrangement of the segments, such that the data belonging to the same segment is stored in the same memory device 30 (S312). Furthermore, the segment arrangement unit 24 moves the data stored in the relevance saving cabinet 30B according to the change of the segment arrangement (S314).
(Comparison with Reference Method)
In the following, a description is given of a comparison between the process performed by the data management device 2 according to the present embodiment and a process performed by a reference method in which the data is not managed in units of segments. Generally, the memory device 30 such as a RAM and a flash memory may be accessed at high speed compared to the storage device 40 such as an HDD, but the memory device 30 often has a small storage capacity. Therefore, it may be possible to perform a process of increasing the response speed, by storing the original data in the storage device 40, and storing data having a high access frequency in the memory device 30 according to need.
Conversely, in the data management device 2 according to the present embodiment, data A, B having relevance are managed as the same segment, and therefore when reading data A from the storage device 40, data A, B are stored in the memory device 30 by the whole segment.
In the present embodiment, a seek time Ts and a read time Tr are needed to output data A; however, as the segment including data A, B is copied to the memory device 30 when outputting the data A, there is no need for a time corresponding to the read time Tr when outputting data B. Therefore, the time needed for outputting both data A, B is approximately 2Ts+Tr, and it is possible to output the requested data at a high speed compared to the reference method.
(Overview)
According to the data management device, the data management method, and the data management program of the present embodiment described above, it is possible to output the requested data at high speed.
Furthermore, according to the information processing device of the present embodiment described above, information by which the previous data may be identified is attached to the request for the current data and sent to the data management device, and therefore it is possible to cause the data management device to output the requested data at high speed.
A description is given above of the best mode for carrying out the present invention by using embodiments; however, the present invention is not limited to the specific embodiments described herein, and various modifications and substitutions may be made without departing from the scope of the substance of the present invention.
For example, in the above embodiments, the relevance analysis unit 23 analyzes the relevance between data items by focusing only on the continuity of data requests; however, the relevance analysis is not so limited. For example, the relevance analysis unit 23 may recognize the time intervals between data requests, count each history item with a weight obtained by multiplying by the reciprocal of the time interval, etc., and thereby calculate a high index value Cij between data items that are requested in succession within a short period of time. By this method, the relevance analysis unit 23 is capable of analyzing the relevance between data items more accurately. Furthermore, for the same reason, exceptional processes may be performed, such as not adding a data request to the relevance saving cabinet 30B when a certain amount of time has passed since the previous data request, and not adding to the relevance saving cabinet 30B data requests that are given around the time when the client computer 70 is shut down.
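The interval-based variation can be sketched as below. Each pair of consecutive requests contributes the reciprocal of the time between them, and pairs separated by more than a cutoff are not recorded. The function name `weighted_index`, the `max_gap` parameter, and the sample events are assumptions for illustration; the exact weighting is left open in the description above.

```python
def weighted_index(events, max_gap=None):
    """Accumulate Cij from (timestamp, item) events of one request source.

    Each consecutive pair contributes 1 / interval, so items requested in
    quick succession score higher. Pairs further apart than max_gap are
    skipped. Both the weighting and max_gap are illustrative choices.
    """
    index = {}
    for (t_prev, prev), (t_cur, cur) in zip(events, events[1:]):
        gap = t_cur - t_prev
        if prev == cur or gap <= 0:
            continue
        if max_gap is not None and gap > max_gap:
            continue  # too long since the previous request: do not record
        pair = tuple(sorted((prev, cur)))
        index[pair] = index.get(pair, 0.0) + 1.0 / gap
    return index


# Hypothetical request log: (time in seconds, requested data item).
events = [(0.0, "A"), (0.5, "B"), (10.5, "C"), (11.5, "B")]
w = weighted_index(events, max_gap=5.0)
```

In this log, A and B are requested half a second apart and receive the highest weight, while the ten-second gap between B and C is discarded by the cutoff.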
Furthermore, when handling data of a graph structure, the relevance analysis unit 23 may handle data items, which correspond to the link source and the link destination, as having relevance.
Furthermore, although the data stored in the storage device 40 is managed in units of segments, there may also be data that is stored as a single piece of data.
According to an embodiment, a data management device is provided, which is capable of outputting the requested data at high speed.
The present invention may be used in the data providing service industry, the computer manufacturing industry, the computer software industry, etc.
All examples and conditional language recited herein are intended for pedagogical purposes to aid the reader in understanding the invention and the concepts contributed by the inventors to furthering the art, and are to be construed as being without limitation to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
This application is a U.S. continuation application filed under 35 USC 111(a) claiming benefit under 35 USC 120 and 365(c) of PCT Application PCT/JP2012/052011 filed on Jan. 30, 2012, the entire contents of which are incorporated herein by reference.
Relation | Number | Date | Country |
---|---|---|---|
Parent | PCT/JP2012/052011 | Jan 2012 | US |
Child | 14337282 | | US |