This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-054207, filed on Mar. 18, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a data classifying program and the like.
Collation processing and similarity calculation using unstructured data such as images, voice, and sensor data generally take a long time. Therefore, there has been a conventional technology for efficiently performing the collation processing by allocating record data to a plurality of calculation resources and distributing the processing.
In the example illustrated in
However, the above-mentioned related art has a problem in that record data with a long processing time cannot be distributed, and therefore a database that reduces the time needed to process the query data cannot be constructed.
In some cases, the processing time does not depend on the record data alone but fluctuates depending on the pair of the query data and the record data. For example, when the query data is similar to the record data, the processing time for that record data gets longer. Therefore, when a plurality of pieces of record data similar to the query data are arranged together in a certain calculation resource, the processing time of that calculation resource gets longer.
Therefore, it is difficult to reduce the processing time of a calculation resource merely by avoiding the arrangement of mutually similar record data in the same calculation resource. It is also conceivable to observe the processing time by actually using query data and to sort the record data based on the observation result. However, it is difficult to determine how many pieces of query data should have their processing time measured.
According to an aspect of an embodiment, a non-transitory computer-readable recording medium stores therein a data classifying program that causes a computer to execute a process including: performing processing request to first data groups stored in a database; obtaining a parameter obtained from results of the performed processing for each data included in the first data group; extracting a second data group from at least the plurality of first data groups based on a first similarity between the parameters; generating a third data group by classifying data included in the second data group so that second similarities between the parameters of the data included in the second data group are low; and classifying the third data group to the first data group so that a third similarity between the parameters of the data included in a pair of the third data group and the first data group is low.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
Preferred embodiments of the present invention will be explained with reference to accompanying drawings. The embodiments do not limit the invention.
Exemplary processing of a data classifying device according to the present embodiment will be described.
By performing the processing illustrated in
Here, the meaning of rearranging a pair of record data into a new pair will be described.
It is difficult to manipulate the distribution of the query data. Therefore, when the volume of the common part 1XZ is reduced, the probability that the two pieces of record data rx and rz have long processing time at the same time can be reduced. For example, in
Even when a piece of record data is originally far from the other record data, a better pair can be formed, as with the other record data groups, by rearranging it into a pair of record data that are at a longer distance from each other. When forming a pair, the data classifying device may fix the calculation resource to which one piece of the record data belongs and, when selecting each candidate record data, return the pair to the fixed calculation resource. However, the data classifying device according to the present embodiment does not fix the calculation resource.
Subsequently, exemplary processing in a case where the data classifying device returns the pair of record data to the calculation resource will be described.
For example, the data classifying device returns the pair to the calculation resource based on the “stable matching problem”. As illustrated in
Whereas, as illustrated in
Next, an exemplary stable matching problem (also called the stable marriage problem) used by the data classifying device according to the present embodiment will be described. The stable matching problem is the problem of forming stable pairs of men and women when there are N men and N women, each man has a preference list of the women, and each woman has a preference list of the men. Here, given a matching, when a man and a woman who are not paired with each other both prefer each other to their current partners, they run off together. Such a pair is referred to as a blocking pair. A matching having a blocking pair is referred to as an unstable matching, and a matching having no blocking pair is referred to as a stable matching.
In a group 20a, pairs are formed as (1, a), (2, c), (3, b), and (4, d). Since no blocking pair exists in the group 20a, it can be said that each pair in the group 20a is the stable matching.
On the other hand, in a group 20b, pairs are formed as (1, a), (2, c), (3, d), and (4, b). A blocking pair (4, d) exists in the group 20b. This is because the man 4 prefers the woman d rather than the woman b and the woman d prefers the man 4 rather than the man 3. Therefore, it can be said that each pair in the group 20b is the unstable matching.
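The blocking-pair condition described above may be checked mechanically. The following is a minimal sketch, assuming complete preference lists; the function name blocking_pairs and the two-by-two preference lists are hypothetical and are not the lists of the groups 20a and 20b.

```python
def blocking_pairs(matching, men_prefs, women_prefs):
    """Return every (man, woman) pair that blocks the given matching."""
    wife = dict(matching)
    husband = {w: m for m, w in matching}
    found = []
    for m, prefs in men_prefs.items():
        # Women whom m ranks strictly above his current partner.
        for w in prefs[:prefs.index(wife[m])]:
            ranks = women_prefs[w]
            # w also ranks m above her current partner, so the pair blocks.
            if ranks.index(m) < ranks.index(husband[w]):
                found.append((m, w))
    return found

# Hypothetical preference lists, for illustration only.
men_prefs = {1: ["a", "b"], 2: ["a", "b"]}
women_prefs = {"a": [1, 2], "b": [1, 2]}
print(blocking_pairs([(1, "a"), (2, "b")], men_prefs, women_prefs))  # stable
print(blocking_pairs([(1, "b"), (2, "a")], men_prefs, women_prefs))  # unstable
```

A matching is stable exactly when the returned list is empty.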
Next, the Gale-Shapley algorithm to obtain the stable matching indicated in the group 20a in
As illustrated in
On the other hand, when the unmarried man h exists (step S11, Yes), the GS makes the man h propose to the woman d in the highest rank in the preference list from among the women who have not received the proposal of the man h yet (step S13). The GS determines whether the woman d who has been proposed is unmarried (step S14).
When the woman d is unmarried (step S14, Yes), the GS makes the woman d become engaged to the man h (step S15), and the procedure proceeds to step S11. On the other hand, when the woman d is not unmarried (step S14, No), the GS allows the procedure to proceed to step S16.
In step S16, let h′ be the man to whom the woman d is currently engaged. When the rank of the man h′ is higher than the rank of the man h in the preference list of the woman d, the woman d refuses the proposal from the man h. When the rank of the man h is higher than the rank of the man h′, the woman d breaks off the engagement to the man h′ and becomes engaged to the man h. After the processing in step S16 has been terminated, the GS shifts the processing to step S11.
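The procedure of steps S11 to S16 may be sketched as follows. This is a minimal illustration, assuming equal numbers of men and women and complete preference lists; the function name gale_shapley and the variable names are hypothetical.

```python
def gale_shapley(men_prefs, women_prefs):
    """Men propose; returns a dict mapping each woman to her fiance."""
    # rank[w][m] is the position of man m in woman w's list (0 is best).
    rank = {w: {m: i for i, m in enumerate(p)} for w, p in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to ask
    fiance = {}                              # woman -> man currently engaged
    free_men = list(men_prefs)
    while free_men:                          # step S11: an unmarried man exists
        h = free_men.pop()
        d = men_prefs[h][next_choice[h]]     # step S13: best woman not yet asked
        next_choice[h] += 1
        if d not in fiance:                  # steps S14 and S15: d is unmarried
            fiance[d] = h
        elif rank[d][h] < rank[d][fiance[d]]:
            free_men.append(fiance[d])       # step S16: d prefers h, drops h'
            fiance[d] = h
        else:
            free_men.append(h)               # step S16: d refuses h
    return fiance
```

For example, with the hypothetical lists men_prefs = {1: ["a", "b"], 2: ["a", "b"]} and women_prefs = {"a": [2, 1], "b": [1, 2]}, the woman a ends up engaged to the man 2 and the woman b to the man 1.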
Next, the extended Gale-Shapley in which the Gale-Shapley algorithm is extended will be described. In the following description, the extended Gale-Shapley is written as “extended GS”. The extended GS deletes, in the middle of the algorithm, pair candidates that cannot be part of a stable matching from the preference lists. Specifically, the extended GS differs from the GS in that, when the man h becomes engaged to the woman d, every man with a lower priority than the man h is deleted from the preference list of the woman d. By adding this processing, the extended GS can obtain the stable matching more efficiently than the GS.
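The deletion performed by the extended GS may be sketched as follows. This is a minimal illustration under the assumption that each deletion is applied to both lists (the woman d is also removed from the list of every deleted man); the function name extended_gs is hypothetical.

```python
def extended_gs(men_prefs, women_prefs):
    """Gale-Shapley with list reduction; returns woman -> fiance."""
    men = {m: list(p) for m, p in men_prefs.items()}
    women = {w: list(p) for w, p in women_prefs.items()}
    fiance = {}
    free_men = list(men)
    while free_men:
        h = free_men.pop()
        if not men[h]:
            continue                     # h has no candidate left
        d = men[h][0]                    # top of h's reduced list
        if d in fiance:
            # Any man still on d's list ranks at or above her fiance,
            # so d always prefers the new proposer h.
            free_men.append(fiance[d])
        fiance[d] = h
        cut = women[d].index(h) + 1
        for m in women[d][cut:]:         # men ranked below h by d
            men[m].remove(d)             # d is no longer a candidate for m
        del women[d][cut:]               # delete those men from d's list
    return fiance
```

Because hopeless candidates are removed as soon as an engagement is made, a man never proposes to a woman who would refuse him.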
Next, the hospitals/residents problem will be described. The hospitals/residents problem is the problem of determining the hospital to which each doctor-in-training is assigned. It differs from the stable matching problem in that each hospital has a maximum number of people it can accept and does not accept more doctors-in-training than that number. The number of people who can be accepted by a hospital is written as a “quota”. When the quotas of all the hospitals are one, the hospitals/residents problem is the same as the stable matching problem.
To solve the hospitals/residents problem, the problem is converted into a stable matching problem with incomplete lists as follows. When the quota of a hospital A is “qA”, the hospital A is divided into qA hospitals A1, A2, A3, . . . , and AqA, each of which has a quota of one. Also, the hospital A included in the preference list of each doctor-in-training is replaced with the qA hospitals A1 to AqA, and the ranking among them is determined arbitrarily.
For example, it is assumed that the hospitals A and B exist, the quota of the hospital A is two, and the quota of the hospital B is one. It is assumed that the first preference is the hospital B and the second preference is the hospital A in the preference list of one doctor-in-training. In this case, first, the hospital A is divided into hospitals A1 and A2, and the ranking between the hospitals A1 and A2 is determined arbitrarily. For example, the second preference and the third preference are randomly allocated to the hospitals A1 and A2. In this way, for example, regarding the preference list of the doctor-in-training, it is assumed that the first preference is the hospital B, the second preference is the hospital A1, and the third preference is the hospital A2. As a result, since the problem becomes a stable matching problem with incomplete lists, the problem is solved by using the extended GS.
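The division of a hospital into quota-one hospitals may be sketched as follows. This is a minimal illustration; the function name split_hospitals is hypothetical, and the order among the divided hospitals, which may be arbitrary, is fixed here to the index order for reproducibility.

```python
def split_hospitals(resident_prefs, quotas):
    """Expand each hospital H with quota q into slots (H, 1), ..., (H, q)."""
    expanded = {}
    for resident, prefs in resident_prefs.items():
        slots = []
        for hospital in prefs:
            # Slots of one hospital stay adjacent; their internal order is
            # arbitrary (here: ascending slot index).
            slots.extend((hospital, k) for k in range(1, quotas[hospital] + 1))
        expanded[resident] = slots
    return expanded

# The example above: quota of A is two, quota of B is one.
print(split_hospitals({"r1": ["B", "A"]}, {"A": 2, "B": 1}))
# {'r1': [('B', 1), ('A', 1), ('A', 2)]}
```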
Next, the stable roommate problem will be described. The stable roommate problem is the problem of dividing 2n people into n pairs. Each person has a preference order over the other 2n−1 people as roommates. When the present embodiment is applied, each record data has a preference list. In the preference list, other record data with a smaller similarity has a higher rank in the preference order to be a pair. The output is the stable pairing.
The data classifying device performs first phase processing (step S21). The data classifying device determines whether a stable roommate solution exists (step S22). When the stable roommate solution does not exist (step S22, No), the data classifying device terminates the processing.
On the other hand, when the stable roommate solution exists (step S22, Yes), the data classifying device performs second phase processing (step S23). The data classifying device outputs n stable pairs (step S24).
Next, an exemplary processing procedure of the first phase processing indicated in step S21 of
When the condition in step S30 is satisfied (step S30, Yes), the data classifying device selects a person “X” whose proposal is not held (step S31). The data classifying device makes the person “X” propose to a person “Y” in the highest rank who has not received the proposal of “X” yet in the preference list of the “X” (step S32).
The data classifying device determines whether the “Y” has already held the proposal and the partner is in the higher rank in the preference list of the “Y” than that of the “X” (step S33). When the “Y” has already held the proposal and the partner is at the higher rank in the preference list of the “Y” than that of the “X” (step S33, Yes), the data classifying device makes the “Y” refuse the proposal from the “X” (step S34) and shifts the procedure to step S30.
On the other hand, when the “Y” does not hold a proposal, or when the partner of the held proposal is in a lower rank in the preference list of the “Y” than the “X” (step S33, No), the data classifying device shifts the procedure to step S35. The data classifying device makes the “Y” refuse the currently held proposal and hold the proposal from the “X” (step S35), and the procedure proceeds to step S30. In step S35, when the “Y” does not hold a proposal, the “Y” is simply made to hold the proposal from the “X”.
The description returns to step S30. When the condition in step S30 is not satisfied (step S30, No), the data classifying device determines whether the person whose proposal is refused by everyone exists (step S36). When the person whose proposal is refused by everyone exists (step S36, Yes), the data classifying device determines that there is no stable matching solution (step S37), and the first phase processing is terminated.
On the other hand, when the person whose proposal is refused by everyone does not exist (step S36, No), the data classifying device determines that there is a stable matching solution (step S38). The data classifying device deletes a proposal candidate under a predetermined condition in the preference list of the “Y” (step S39), and the first phase processing is terminated.
Step S39 will be specifically described. As a precondition, it is assumed that the “Y” hold the proposal from the “X”. The data classifying device deletes a proposal candidate having a lower rank than the “X” in the preference list of the “Y” from the preference list of the “Y”. Also, the data classifying device deletes the partner, who has refused one's proposal, from the one's preference list. Also, the data classifying device deletes the proposal candidate from the preference list of the proposer corresponding to the deleted proposal candidate.
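The first phase processing may be sketched as follows. This is a minimal illustration, not the exact flow of steps S30 to S39: as an assumed simplification, the deletions of step S39 are applied as soon as a proposal is held, so that an explicit refusal step is unnecessary. The function name irving_phase1 is hypothetical.

```python
def irving_phase1(prefs):
    """Return the reduced preference lists, or None if no stable solution."""
    lists = {p: list(order) for p, order in prefs.items()}
    holds = {}                  # y -> x: y currently holds x's proposal
    free = list(lists)          # persons whose proposal is not held
    while free:
        x = free.pop()
        if not lists[x]:
            return None         # x was refused by everyone (step S37)
        y = lists[x][0]         # best remaining candidate of x (step S32)
        if y in holds:
            # Everyone left on y's list ranks at or above the held proposer,
            # so y prefers x and releases the held proposal (step S35).
            free.append(holds[y])
        holds[y] = x
        cut = lists[y].index(x) + 1
        for z in lists[y][cut:]:    # candidates ranked below x by y
            lists[z].remove(y)      # delete y from their lists as well
        del lists[y][cut:]
    return lists
```

For the hypothetical lists A: B C D, B: C A D, C: A B D, D: A B C, the list of D becomes empty and None is returned, which corresponds to the case where no stable roommate solution exists.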
Next, an exemplary processing procedure of the second phase processing indicated in step S23 of
When the condition in step S40 is not satisfied (step S40, No), the data classifying device terminates the second phase processing. On the other hand, when the condition in step S40 is satisfied (step S40, Yes), the data classifying device shifts the procedure to step S41. The data classifying device searches for an all-or-nothing cycle a (1), . . . , a (r), b (1), . . . , b (r) in the preference list (step S41). Here, since b (i) has the highest rank of a (i), b (i) holds a proposal from a (i). Also, to simplify the expression, this is expressed as a (r+1)=a (1), b (r+1)=b (1).
The data classifying device controls all the “i” s so that b (i) refuses the proposal from a (i) (step S42). The data classifying device controls all the “i” s so that a (i) proposes to b (i+1) and b (i+1) holds the proposal from a (i) (step S43). The data classifying device deletes the highest rank in the preference list of a (i) and the lowest rank in the preference list of b (i) relative to all the “i” s (step S44).
The data classifying device deletes b (i+1) from the preference list of the “i” relative to all the “i” which is equal to or lower than a (i) in the preference list of b (i+1). Also, the data classifying device deletes all the “X” which is equal to or lower than a (i) from the preference list of b (i+1) (step S45), and the procedure proceeds to step S40.
Here, an exemplary processing procedure for searching for the all-or-nothing cycle indicated in step S41 will be described.
On the other hand, when s which satisfies p (s+r)=p (s) and the positive integer r1 do not exist (step S52, No), the data classifying device shifts the procedure to step S56. The data classifying device assumes that q (i) is the second person in the preference list of p (i) (step S56).
The data classifying device assumes p (i+1) as the person who has the lowest rank in the preference list of q (i) (step S57), the procedure proceeds to step S52.
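The second phase processing, including the search for the all-or-nothing cycle, may be sketched as follows. This is a minimal illustration that assumes its input is a reduced table produced by the first phase, in which deletions are symmetric; the function name irving_phase2 and the six-person table are hypothetical and are not the preference list table of the embodiment.

```python
def irving_phase2(table):
    """Eliminate all-or-nothing cycles; return person -> partner or None."""
    lists = {p: list(l) for p, l in table.items()}
    while True:
        if any(not l for l in lists.values()):
            return None                      # no stable roommate solution
        start = next((p for p, l in lists.items() if len(l) > 1), None)
        if start is None:
            return {p: l[0] for p, l in lists.items()}
        # q(i) is the second person in the list of p(i);
        # p(i+1) is the last person in the list of q(i).
        ps, qs = [start], []
        while True:
            q = lists[ps[-1]][1]
            qs.append(q)
            nxt = lists[q][-1]
            if nxt in ps:                    # the cycle closes at p(s)
                s = ps.index(nxt)
                ps, qs = ps[s:], qs[s:]
                break
            ps.append(nxt)
        # q(i) now holds the proposal of p(i): everyone ranked below p(i)
        # in q(i)'s list is deleted, on both sides.
        doomed = [(q, z) for p, q in zip(ps, qs)
                  for z in lists[q][lists[q].index(p) + 1:]]
        for q, z in doomed:
            if z in lists[q]:
                lists[q].remove(z)
                lists[z].remove(q)

table = {1: [4, 2, 6], 2: [6, 5, 4, 1, 3], 3: [2, 4, 5],
         4: [5, 2, 3, 6, 1], 5: [3, 2, 4], 6: [1, 4, 2]}
print(irving_phase2(table))  # {1: 6, 2: 4, 3: 5, 4: 2, 5: 3, 6: 1}
```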
Next, exemplary processing for obtaining the stable roommate solution from each preference list by the data classifying device will be described.
Whereas, a relation between the proposer to be denied and the proposal destination is as follows. Person 3→person 4, person 1→person 4, person 2→person 6, and person 6→person 5. For example, the person 3 proposes to the person 4. However, since the person 4 receives the proposal from the person 2 having the higher priority than the person 3, the person 4 denies the proposal from the person 3.
In
The second row corresponding to the preference list of the person 2 will be described. The person 3 holds the proposal from the person 2, and also, the person 2 holds the proposal from the person 4. Also, as indicated in the first row, the person 2 is a hopeless proposer for the person 1. Therefore, the person 1 in the second row is a hopeless proposer. Also, the proposal of the person 2 to the person 6 is denied. Therefore, the data classifying device deletes the people 6 and 1 from the preference list of the person 2.
The third row corresponding to the preference list of the person 3 will be described. The proposal of the person 3 is held by the person 5, and also, the person 3 holds the proposal from the person 2. Also, as indicated in the first row, the person 3 is a hopeless proposer for the person 1. Therefore, the person 1 in the third row is the hopeless proposer. Also, as indicated in the sixth row, the person 3 is a hopeless proposer for the person 6. The person 6 in the third row is a hopeless proposer. Therefore, the data classifying device deletes the people 4, 1, and 6 from the preference list of the person 3.
The fourth row corresponding to the preference list of the person 4 will be described. The proposal of the person 4 is held by the person 2, and also, the person 4 holds the proposal from the person 5. In the preference list of the person 4, the people 1 and 3 are hopeless proposers for the person 4. Also, as illustrated in the sixth row, the person 4 is the hopeless proposer for the person 6. Therefore, the person 6 in the fourth row is a hopeless proposer. Therefore, the data classifying device deletes the people 6, 1, and 3 from the preference list of the person 4.
The fifth row corresponding to the preference list of the person 5 will be described. The proposal of the person 5 is held by the person 4, and also, the person 5 holds the proposal from the person 3. In the preference list of the person 5, the people 6 and 1 are hopeless proposers for the person 5. Therefore, the data classifying device deletes the people 6 and 1 from the preference list of the person 5.
The sixth row corresponding to the preference list of the person 6 will be described. The proposal of the person 6 is held by the person 1, and also, the person 6 holds the proposal from the person 1. In the preference list of the person 6, the people 4, 2, and 3 are hopeless proposers for the person 6. Also, the proposal of the person 6 to the person 5 is denied. Therefore, the data classifying device deletes the people 4, 2, and 3 from the preference list of the person 6.
By performing the processing by the data classifying device, the preference list table 110c in
In the preference list table 110c in
The data classifying device searches for the all-or-nothing cycle in the preference list table 110c in
For example, q (1)=5 and p (2)=3 are satisfied based on the processing procedure in
The data classifying device controls all the “i” s so that b (i) refuses the proposal from a (i). Also, the data classifying device controls all the “i” s so that a (i) proposes to b (i+1) and controls so that b (i+1) holds the proposal from a (i). Also, the data classifying device deletes the highest rank in the list of a (i) and the lowest rank in the list of b (i) relative to all the “i” s. Then, the relation between the proposer and the proposal destination is person 1→person 6, person 2→person 3, “person 3→person 2”, “person 4→person 5”, person 5→person 4, and person 6→person 1.
The data classifying device deletes b (i+1) from the preference list of the “X” relative to all the “X” which is equal to or lower than a (i) in the preference list of b (i+1). Also, the data classifying device deletes all the X which is equal to or lower than a (i) from the preference list of b (i+1). By performing the processing by the data classifying device, the preference list table 110c illustrated in
Next, an exemplary structure of the data classifying device according to the present embodiment will be described.
The calculation resource S1 is a device which collates a plurality of pieces of record data arranged in the calculation resource S1 with query data obtained from the collation processing requesting unit 130 and performs processing for collating and searching for the record data corresponding to the query data. The calculation resource S1 outputs search results to an external device. Also, the calculation resource S1 measures a processing time needed for the collation and the search by the query data for each record data and registers the measured result to the intermediate table 110b. The calculation resource S1 has a generating unit which is not illustrated, and the generating unit may generate the intermediate table 110b. The description on the processing regarding the calculation resources S2 to SN is similar to that of the calculation resource S1. Generally, when the query data is similar to the record data, the processing time gets longer. Therefore, the processing time indicated in the intermediate table 110b is an index indicating the similarity of the query data to the record data.
The storage unit 110 includes a record data table 110a, an intermediate table 110b, a preference list table 110c, and arrangement destination information 110d. The storage unit 110 corresponds to a storage device, for example, a semiconductor memory element such as a random access memory (RAM), a read only memory (ROM), and a flash memory.
The record data table 110a has record data arranged to each of the calculation resources S1 to SN.
The intermediate table 110b associates the query data with the processing time of the record data processed by the query data. For example, a data structure of the intermediate table 110b corresponds to that of the intermediate table 110b illustrated in
The preference list table 110c holds information on the preference list of each record data. The preference list of each record data includes a plurality of pieces of preferred record data as an object to be a pair. In the preference list, the record data with smaller similarity gets a higher rank based on the similarity between the record data and the other record data.
For example, regarding the record data 001 to 004, the similarities between the record data 001 and the other record data 002 to 004 are respectively 10, 20, 30, and 40. In this case, the preference list of the record data 001 includes the record data 004, 003, and 002. The similarity corresponds to a distance between the record data, and the closer the distance between the pair of record data is, the higher the similarity is.
A data structure of the preference list table 110c corresponds to the preference list table illustrated in
The arrangement destination information 110d is information indicating an arrangement destination of the data.
The description returns to
When obtaining the query data from the input unit 120, the collation processing requesting unit 130 outputs the query data relative to each of the calculation resources S1 to SN and performs the collation processing request.
The data pair generating unit 140 obtains the stable roommate solution from the record data of which the processing time is equal to or more than the threshold TO and generates a stable pair of the record data. For example, the processing of the data pair generating unit 140 corresponds to the processing illustrated in
When obtaining the threshold TO and the data arrangement request from the input unit 120, the data pair generating unit 140 refers to the intermediate table 110b and obtains the record data of which the processing time is equal to or more than the threshold TO from the record data table 110a. The data pair generating unit 140 may obtain the record data of which the processing time is equal to or more than the threshold TO from the calculation resources S1 to SN. In the following description on the data pair generating unit 140, the number of the calculation resources S1 to SN is assumed to be N. It is assumed that a data set obtained from the calculation resource i be X_i. In each calculation resource i, the number of X_i is assumed to be 2*R_i. It is assumed that K=R_1+R_2+ . . . +R_N is satisfied. All the data sets obtained by the data pair generating unit 140 are assumed to be data sets X.
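The extraction of the data sets X_i may be sketched as follows. This is a minimal illustration; the function name select_slow_records and the processing times are hypothetical, and dropping the fastest selected record when the count is odd is an assumption made here only to keep the number of X_i equal to 2*R_i.

```python
def select_slow_records(times_by_resource, threshold):
    """Per resource, pick records with time >= threshold, an even number."""
    selected = {}
    for res, times in times_by_resource.items():
        slow = sorted((r for r, t in times.items() if t >= threshold),
                      key=times.get, reverse=True)
        if len(slow) % 2:
            slow.pop()          # assumption: drop the fastest to stay even
        selected[res] = slow    # X_i, containing 2*R_i records
    k = sum(len(s) // 2 for s in selected.values())   # K = R_1 + ... + R_N
    return selected, k

times = {"S1": {"a": 9, "b": 7, "c": 1}, "S2": {"d": 8, "e": 6, "f": 5}}
print(select_slow_records(times, 5))
# ({'S1': ['a', 'b'], 'S2': ['d', 'e']}, 2)
```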
The data pair generating unit 140 performs the following processing relative to each element a of the data set X. The data pair generating unit 140 calculates the similarity relative to elements other than the element a of the data set X. The data pair generating unit 140 generates a preference list of the element a by arranging the elements other than the element a of the data set X in an order of the similarity from the smallest. By repeatedly performing the above processing for each element, the data pair generating unit 140 generates the preference list of each element and registers them to the preference list table 110c.
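The generation of the preference lists may be sketched as follows. This is a minimal illustration; the function name build_preference_lists, the one-dimensional record values, and the use of the negated distance as the similarity are hypothetical.

```python
def build_preference_lists(records, similarity):
    """For each element a, list the other elements by ascending similarity."""
    prefs = {}
    for a in records:
        others = [b for b in records if b != a]
        # The least similar element comes first, i.e. has the highest rank.
        prefs[a] = sorted(others, key=lambda b: similarity(a, b))
    return prefs

# Hypothetical records: a shorter distance means a larger similarity.
value = {"r1": 0.0, "r2": 1.0, "r3": 5.0}
sim = lambda a, b: -abs(value[a] - value[b])
print(build_preference_lists(list(value), sim))
# {'r1': ['r3', 'r2'], 'r2': ['r3', 'r1'], 'r3': ['r1', 'r2']}
```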
The data pair generating unit 140 obtains the stable roommate solution by using the Irving algorithm described in
When the stable roommate solution does not exist, the data pair generating unit 140 generates the pair of record data based on the greedy algorithm. Here, the greedy algorithm will be described. The data pair generating unit 140 selects the i-th record data. When the selected record data does not form a pair, the data pair generating unit 140 performs the following processing. The data pair generating unit 140 pairs the i-th record data with the highest-ranked record data in the preference list of the i-th record data that does not yet form a pair. The data pair generating unit 140 performs the above processing relative to the i-th to K-th record data. It is assumed that the quota qi be R_i.
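The greedy algorithm may be sketched as follows. This is a minimal illustration, assuming that the record data are visited in the order in which they appear in the preference list table; the function name greedy_pairs and the preference lists are hypothetical.

```python
def greedy_pairs(prefs):
    """Pair each unpaired record with its best still-unpaired candidate."""
    partner = {}
    for rec in prefs:                    # visit the records in index order
        if rec in partner:
            continue                     # rec already forms a pair
        for cand in prefs[rec]:          # highest rank first
            if cand not in partner:
                partner[rec] = cand
                partner[cand] = rec
                break
    return partner

prefs = {"r1": ["r3", "r2", "r4"], "r2": ["r3", "r4", "r1"],
         "r3": ["r1", "r2", "r4"], "r4": ["r1", "r2", "r3"]}
print(greedy_pairs(prefs))
# {'r1': 'r3', 'r3': 'r1', 'r2': 'r4', 'r4': 'r2'}
```

Unlike the stable roommate solution, the result depends on the visiting order and may contain blocking pairs.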
The matching processing unit 150 performs the matching between the pair of record data and the calculation resources S1 to SN and returns the pair of record data to the calculation resources S1 to SN based on the matching result. For example, the processing of the matching processing unit 150 corresponds to the above-mentioned processing of
The matching processing unit 150 calculates the similarity of each record data arranged in the calculation resource Sj relative to (pi1, pi2) of each pair pi of the record data. The maximum value of the similarity of pi1 to each record data is m (1, i, j). The maximum value of the similarity of pi2 to each record data is m (2, i, j). In this case, the matching processing unit 150 assumes the similarity m (i, j) of the pair pi to the calculation resource Sj as the larger one of m (1, i, j) and m (2, i, j). A matrix D having K rows and N columns is defined, and each (i, j) component of the matrix D is assumed to be m (i, j).
The matching processing unit 150 sorts the i-th row of the matrix D in ascending order and determines an order of the calculation resources Sj relative to the pair pi. Then, the matching processing unit 150 assumes the determined order as a preference list Lpi of the pair pi. At this time, dij=dij′ may hold for j≠j′, where dij denotes the (i, j) component of the matrix D; in that case, either one may come first in the sorted order. The matching processing unit 150 sorts the j-th column of the matrix D in ascending order and determines an order of the pairs pi relative to the calculation resource Sj. Then, the matching processing unit 150 assumes the determined order as a preference list LSj of the calculation resource Sj.
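The construction of the matrix D and of the preference lists Lpi and LSj may be sketched as follows. This is a minimal illustration, assuming that every calculation resource holds at least one record data; the function name pair_resource_preferences, the numeric records, and the negated distance used as the similarity are hypothetical.

```python
def pair_resource_preferences(pairs, resources, similarity):
    """Return (D, Lp, LS) using 0-based indices for pairs and resources."""
    # D[i][j] = m(i, j): the largest similarity between either member of
    # pair i and any record already arranged in resource j.
    D = [[max(similarity(p, r) for p in pair for r in stored)
          for stored in resources] for pair in pairs]
    # Lp_i: resources in ascending order of row i of D.
    pair_prefs = [sorted(range(len(resources)), key=lambda j: D[i][j])
                  for i in range(len(pairs))]
    # LS_j: pairs in ascending order of column j of D.
    resource_prefs = [sorted(range(len(pairs)), key=lambda i: D[i][j])
                      for j in range(len(resources))]
    return D, pair_prefs, resource_prefs

sim = lambda a, b: -abs(a - b)
D, Lp, LS = pair_resource_preferences([(0.0, 10.0), (4.0, 6.0)],
                                      [[1.0], [5.0]], sim)
print(D)   # [[-1.0, -5.0], [-3.0, -1.0]]
print(Lp)  # [[1, 0], [0, 1]]
print(LS)  # [[1, 0], [0, 1]]
```

Ties are resolved by the stable sort, so equal components keep their index order.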
The matching processing unit 150 solves the hospitals/residents problem with the extended GS algorithm by using Lp1, . . . , and LpK and LS1, . . . , and LSN. At this time, either one of the pair of record data and the calculation resource may propose. Also, it is assumed that the quota of the calculation resource be q_1, q_2, . . . , and q_N. The matching processing unit 150 specifies the arrangement destination of the record data based on the matching result and generates the arrangement destination information 110d. For example, when the calculation resource matched with one pair is the calculation resource Sj, the arrangement destination of the record data of the pair is the calculation resource Sj.
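The matching between the pairs and the calculation resources may be sketched as follows. This is a minimal illustration in which the pairs propose, and it uses plain deferred acceptance with quotas rather than the extended GS with list deletion; the function name assign_pairs is hypothetical, and pairs and resources are identified by 0-based indices.

```python
def assign_pairs(pair_prefs, resource_prefs, quotas):
    """Return, for each resource j, the list of pair indices assigned to it."""
    # rank[j][i] is the position of pair i in the list LS_j (0 is best).
    rank = [{i: r for r, i in enumerate(p)} for p in resource_prefs]
    nxt = [0] * len(pair_prefs)          # next list position of each pair
    accepted = [[] for _ in resource_prefs]
    free = list(range(len(pair_prefs)))
    while free:
        i = free.pop()
        if nxt[i] == len(pair_prefs[i]):
            continue                     # pair i exhausted its list
        j = pair_prefs[i][nxt[i]]
        nxt[i] += 1
        accepted[j].append(i)
        if len(accepted[j]) > quotas[j]:
            # Over quota: resource j releases its least preferred pair.
            worst = max(accepted[j], key=lambda k: rank[j][k])
            accepted[j].remove(worst)
            free.append(worst)
    return accepted

print(assign_pairs([[0, 1], [0, 1]], [[1, 0], [0, 1]], [1, 1]))
# [[1], [0]]
```

Each pair in accepted[j] is then arranged in the calculation resource Sj.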
Next, a processing procedure of the data classifying device 100 according to the present embodiment will be described.
The input unit 120 of the data classifying device 100 receives an arrangement processing request and a threshold TO from the user (step S102). The calculation resources S1 to SN measure the processing times of all the record data relative to a single piece of the query data and generate the intermediate table 110b (step S103).
The data pair generating unit 140 of the data classifying device 100 selects an even number of pieces of the record data, of which the processing time exceeds the threshold TO, based on the intermediate table 110b (step S104). The data pair generating unit 140 generates a pair of the record data which are not similar to each other (step S105).
The matching processing unit 150 of the data classifying device 100 obtains the stable matching solution, based on the similarity between each data pair and the record data arranged in the calculation resources S1 to SN, so that each data pair is arranged in a calculation resource whose record data are not similar to the pair (step S106). The data classifying device 100 generates the arrangement destination information 110d based on the stable matching solution (step S107). The data arrangement processing unit 160 arranges the pair of record data to the calculation resources S1 to SN based on the arrangement destination information 110d (step S108).
Next, an effect of the data classifying device 100 according to the present embodiment will be described. The data classifying device 100 extracts an even number of pieces of record data having a long processing time relative to the query data from the calculation resources S1 to SN and forms pairs of the extracted record data which are not similar to each other. The data classifying device 100 arranges each pair of record data in a calculation resource where record data which are not similar to the pair are stored. By performing this processing, a database that reduces the time needed to process the query data can be constructed.
The data classifying device 100 generates a pair of the record data based on the intermediate table 110b indicating the processing time of the record data relative to the query data generated by the calculation resources S1 to SN. Therefore, the pair of record data which have a long processing time and are not similar to each other can be efficiently generated.
Next, other processing (1) of the data pair generating unit 140 illustrated in
When obtaining the threshold TO and the data arrangement request from the input unit 120, the data pair generating unit 140 refers to the intermediate table 110b and obtains the record data of which the processing time is equal to or more than the threshold TO from the record data table 110a. The data pair generating unit 140 may obtain the record data of which the processing time is equal to or more than the threshold TO from the calculation resources S1 to SN.
It is assumed that a data set obtained from each calculation resource i be X_i. In each calculation resource i, the number of X_i is assumed to be 2*R_i. It is assumed that K=R_1+R_2+ . . . +R_N is satisfied. All the data sets obtained by the data pair generating unit 140 are assumed to be data sets X.
The data pair generating unit 140 randomly selects K record data from the data set X. The set of the selected record data is assumed to be a data set Y, and other set is assumed to be a data set Z. The data pair generating unit 140 calculates the similarity of each element of the data set Z relative to each element a of the data set Y, and a list in which the elements of the data set Z are arranged in ascending order of the similarity is assumed to be a preference list of the element a. The data pair generating unit 140 calculates the similarity of each element of the data set Y relative to each element b of the data set Z, and a list in which the elements of the data set Y are arranged in ascending order of the similarity is assumed to be a preference list of the element b. For example, the data pair generating unit 140 calculates a distance between the elements of the data set Y and the elements of the data set Z as the similarity. The shorter the distance is, the larger the similarity is.
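The division of the data set X into the data sets Y and Z and the generation of their preference lists may be sketched as follows. This is a minimal illustration, assuming that the data set X contains 2K elements; the function name bipartite_preferences and the seed parameter are hypothetical.

```python
import random

def bipartite_preferences(records, similarity, seed=None):
    """Randomly split X into Y and Z and rank each side by the other."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2            # K elements go to the data set Y
    Y, Z = shuffled[:half], shuffled[half:]
    # Ascending similarity: the least similar element has the highest rank.
    prefs_y = {a: sorted(Z, key=lambda b: similarity(a, b)) for a in Y}
    prefs_z = {b: sorted(Y, key=lambda a: similarity(b, a)) for b in Z}
    return prefs_y, prefs_z
```

The two sets of preference lists can then be given to a Gale-Shapley procedure, with the data set Y and the data set Z playing the roles of the two sides.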
The data pair generating unit 140 obtains the stable matching solution with the Gale-Shapley algorithm based on the preference list of each element of the data set Y and the preference list of each element of the data set Z, and forms the pairs of record data according to the obtained stable matching solution. The quota q_i is assumed to be R_i. The Gale-Shapley algorithm corresponds to the processing procedure in
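The pairing of the data set Y with the data set Z can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the procedure of the embodiment itself: record data are represented as numeric tuples, the similarity is taken to be the negative Euclidean distance, the data sets Y and Z have the same size, and the quota is ignored. The function names `preference_lists` and `gale_shapley` are hypothetical.

```python
import math


def preference_lists(Y, Z):
    """Rank the other side's indices in ascending order of similarity.

    Assumption: similarity = -distance, so ascending similarity means
    the farthest (least similar) element comes first.
    """
    def rank(a, others):
        return sorted(range(len(others)), key=lambda j: -math.dist(a, others[j]))

    return [rank(a, Z) for a in Y], [rank(b, Y) for b in Z]


def gale_shapley(pref_y, pref_z):
    """Return match, where match[y] = z, with the Y side proposing."""
    n = len(pref_y)
    # rank_z[z][y] is the position of y in z's list (lower = preferred).
    rank_z = [{y: r for r, y in enumerate(p)} for p in pref_z]
    next_choice = [0] * n        # next index in pref_y[y] to propose to
    partner_of_z = [None] * n    # current tentative partner of each z
    free = list(range(n))        # Y elements without a tentative partner
    while free:
        y = free.pop()
        z = pref_y[y][next_choice[y]]
        next_choice[y] += 1
        current = partner_of_z[z]
        if current is None:
            partner_of_z[z] = y
        elif rank_z[z][y] < rank_z[z][current]:
            partner_of_z[z] = y   # z prefers the new proposer
            free.append(current)
        else:
            free.append(y)        # z rejects y; y proposes again later
    match = [None] * n
    for z, y in enumerate(partner_of_z):
        match[y] = z
    return match
```

Calling `preference_lists(Y, Z)` and passing both results to `gale_shapley` yields one partner in Z for each element of Y; each resulting pair (Y[y], Z[match[y]]) tends to combine record data that are far apart from each other.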
When the data pair generating unit 140 obtains the stable roommate solution or the stable matching solution, the quota may be set as follows.
Let X_i be the data set obtained from a certain calculation resource i, and let the number of elements of X_i be 2*R_i. It is assumed that K=R_1+R_2+ . . . +R_N is satisfied. All the record data obtained by the data pair generating unit 140 together form the data set X. Also, the numbers of pieces of record data allocated to the calculation resources S1 to SN are assumed to be n_1, n_2, . . . , and n_N, respectively.
The data pair generating unit 140 obtains a quota q_i based on the formula (1). The term M in the formula (1) is defined by the formula (2). Also, floor(x) in the formula (1) denotes the largest integer that does not exceed x.
When the condition indicated in the formula (3) is satisfied, the data pair generating unit 140 randomly selects i* from among 1 to N and redefines q_i* as indicated in the formula (4).
The purpose of setting the quota q_i is to make the numbers of pieces of record data arranged in the calculation resources S1 to SN almost equal to each other when the data pairs are allocated to the calculation resources S1 to SN again. Since M is the number of all the record data, it is preferable to set q_i so that n_i−R_i+q_i is equal to M/N in order to allocate all the record data equally. However, it is preferable that the quota not be a negative integer. In addition, the selected record data are finally arranged in the calculation resources S1 to SN. Therefore, when the sum of the quotas does not satisfy a condition of the formula (3), the data pair generating unit 140 adjusts the quota according to the formula (4).
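The quota setting described above can be sketched in Python as follows. The sketch relies only on the prose description: q_i is taken to be floor(M/N − n_i + R_i) clamped at zero, with M being the sum of n_1 to n_N. The adjustment step is an assumption made for illustration: when the quotas sum to less than K, the shortfall is added to the quota of one randomly selected calculation resource i*. The function name `compute_quotas` and its parameters are hypothetical.

```python
import math
import random


def compute_quotas(n, R, rng=random):
    """Return a list of quotas q_i, one per calculation resource.

    n[i]: number of record data currently allocated to resource i.
    R[i]: half the number of record data taken from resource i.
    """
    N = len(n)
    M = sum(n)   # number of all the record data
    K = sum(R)   # number of pairs to be placed
    # Aim for n_i - R_i + q_i == M / N, but never a negative quota.
    q = [max(0, math.floor(M / N - n_i + R_i)) for n_i, R_i in zip(n, R)]
    # Assumed adjustment: if the quotas cannot hold all K pairs,
    # give the shortfall to one randomly chosen resource i*.
    shortfall = K - sum(q)
    if shortfall > 0:
        i_star = rng.randrange(N)
        q[i_star] += shortfall
    return q
```

With an evenly loaded system such as `n = [10, 10, 10]` and `R = [2, 2, 2]`, each quota equals R_i and no adjustment occurs. The adjustment matters when rounding down leaves fewer slots than pairs, as with `n = [10, 10, 11]` and `R = [1, 1, 1]`.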
Other processing (2) of the data pair generating unit 140 illustrated in
The data pair generating unit 140 randomly selects, from the data set X_i obtained from each calculation resource i, a number of elements that is a multiple of U. The number of the selected elements is U*R_i. It is assumed that K=R_1+R_2+ . . . +R_N, and all the selected elements together form the data set X. The data pair generating unit 140 returns the data that have not been selected to the calculation resource.
The data pair generating unit 140 performs the following processing on each element a of the data set X. The data pair generating unit 140 calculates the similarity between the element a and each of the other elements of the data set X. The data pair generating unit 140 uses, as the preference list of the element a, a list in which the elements of the data set X other than the element a are arranged in ascending order of the similarity. The data pair generating unit 140 considers all the combinations relative to the preference lists of all the elements of the data set X and obtains the stable roommate solution. When the stable roommate solution is obtained, the data pair generating unit 140 generates K pairs of record data according to the solution and outputs them to the matching processing unit 150.
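The construction of the preference lists within the data set X can be sketched in Python as follows. This is an illustration only, assuming that record data are numeric tuples and that the similarity is the negative Euclidean distance; the function name `roommate_preferences` is hypothetical, and the search for the stable roommate solution itself is not shown.

```python
import math


def roommate_preferences(X):
    """For each element of X, list the other indices in ascending
    order of similarity (assumed: similarity = -distance, so the
    farthest element comes first)."""
    prefs = []
    for i, a in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        others.sort(key=lambda j: -math.dist(a, X[j]))
        prefs.append(others)
    return prefs
```

Each list excludes the element itself, so a data set of U*K elements yields U*K lists of length U*K−1.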
When the stable roommate solution does not exist, the data pair generating unit 140 generates the pairs of record data based on the greedy algorithm. Here, the greedy algorithm will be described. The data pair generating unit 140 selects the i-th record data. When the selected record data does not yet form a pair, the data pair generating unit 140 performs the following processing. The data pair generating unit 140 forms a pair consisting of the i-th record data and the top (U−1) pieces of record data that do not yet form a pair in the preference list of the i-th record data. The data pair generating unit 140 performs the above processing on the first to the U*K-th record data. The quota q_i is assumed to be R_i.
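The greedy algorithm described above can be sketched in Python as follows. This is a minimal sketch, assuming that the preference list of each record data is already available as a list of indices with the most preferred record data first; the function name `greedy_groups` is hypothetical.

```python
def greedy_groups(pref, U):
    """Form groups of U record data greedily.

    pref[i]: preference list of record i (indices of other records,
    most preferred first).
    """
    assigned = set()
    groups = []
    for i in range(len(pref)):
        if i in assigned:
            continue  # record i already belongs to a group
        # Take the top (U - 1) records in i's list that are still free.
        partners = [j for j in pref[i] if j not in assigned and j != i][:U - 1]
        group = [i] + partners
        assigned.update(group)
        groups.append(group)
    return groups
```

Unlike a stable roommate solution, this result depends on the order in which the record data are visited, but it always terminates and places every record data in exactly one group.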
Next, other processing of the matching processing unit 150 illustrated in
The matching processing unit 150 sorts the i-th row of the matrix D in ascending order and determines an order of the calculation resources Sj relative to the pair pi. Then, the matching processing unit 150 uses the determined order as a preference list Lpi of the pair pi. At this time, j≠j′ and d_ij=d_ij′ may both be satisfied, in which case either one may come first in the sorting. The matching processing unit 150 sorts the j-th column of the matrix D in ascending order and determines an order of the pairs pi relative to the calculation resource Sj. Then, the matching processing unit 150 uses the determined order as a preference list LSj of the calculation resource Sj.
The matching processing unit 150 solves the hospitals/residents problem with the extended Gale-Shapley (GS) algorithm by using Lp1, . . . , and LpK and LS1, . . . , and LSN. At this time, either the pairs of record data or the calculation resources may act as the proposing side. Also, the quotas of the calculation resources are assumed to be q_1, q_2, . . . , and q_N. The matching processing unit 150 specifies the arrangement destination of the record data based on the matching result and generates the arrangement destination information 110d. For example, when the calculation resource matched with one pair is the calculation resource Sj, the arrangement destination of the record data of the pair is the calculation resource Sj.
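The matching of the pairs to the calculation resources under quotas can be sketched in Python as follows. This is an illustrative sketch of a pair-proposing variant of the hospitals/residents procedure, not a definitive implementation of the embodiment. It assumes that the matrix D is given as a list of rows with D[i][j] corresponding to d_ij, and that a smaller value is preferred by both sides, in line with the ascending sort described above. The function name `hospitals_residents` is hypothetical.

```python
def hospitals_residents(D, quotas):
    """Assign each pair i to a resource j, respecting quotas[j].

    Returns assignment, where assignment[i] is the matched resource
    index, or None if pair i could not be placed.
    """
    K, N = len(D), len(quotas)
    # Preference list of each pair: resources in ascending order of d_ij.
    pref_p = [sorted(range(N), key=lambda j: D[i][j]) for i in range(K)]
    next_choice = [0] * K
    held = [[] for _ in range(N)]   # pairs tentatively held by each resource
    free = list(range(K))
    while free:
        i = free.pop()
        if next_choice[i] >= N:
            continue  # pair i has been rejected by every resource
        j = pref_p[i][next_choice[i]]
        next_choice[i] += 1
        held[j].append(i)
        if len(held[j]) > quotas[j]:
            # Over quota: resource j rejects its least preferred pair.
            worst = max(held[j], key=lambda p: D[p][j])
            held[j].remove(worst)
            free.append(worst)
    assignment = [None] * K
    for j, pairs in enumerate(held):
        for i in pairs:
            assignment[i] = j
    return assignment
```

Every pair receives a destination only when the quotas sum to at least K; otherwise some entries of the result remain `None`.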
Next, an exemplary computer which executes a data classifying program for realizing a function similar to that of the data classifying device 100 indicated in the above embodiment will be described.
As illustrated in
The hard disk drive 207 includes a data pair generating program 207a, a matching processing program 207b, and a data arranging processing program 207c. The CPU 201 reads the data pair generating program 207a, the matching processing program 207b, and the data arranging processing program 207c and loads the programs into the RAM 206.
The data pair generating program 207a functions as a data pair generating process 206a. The matching processing program 207b functions as a matching processing process 206b. The data arranging processing program 207c functions as a data arranging processing process 206c. The processing of the data pair generating process 206a corresponds to that of the data pair generating unit 140. The processing of the matching processing process 206b corresponds to that of the matching processing unit 150. The processing of the data arranging processing process 206c corresponds to the processing of the data arrangement processing unit 160.
For example, the data pair generating program 207a, the matching processing program 207b, and the data arranging processing program 207c are stored in “portable physical media” such as a flexible disk (FD), a CD-ROM, a DVD disk, a magneto-optical disk, and an IC card that are inserted into the computer 200. The computer 200 may read and execute each of the programs 207a to 207c.
A database that can reduce the time taken to process query data can be constructed.
All examples and conditional language recited herein are intended for pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although the embodiment of the present invention has been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-054207 | Mar 2015 | JP | national |