This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2015-115286, filed on Jun. 5, 2015, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein relate to a computer readable medium, a method, and an information processing apparatus.
Software techniques for a computer include, for example, a technique related to a sort process for rearranging a plurality of pieces of data in order according to a certain sequence. The sort process may include rearranging numerical values, for example, in order from the largest or in order from the smallest, rearranging character strings in alphabetic order or in Japanese alphabetic order, or rearranging dates in order from the oldest or from the newest.
A technique for increasing the speed of the sort process involves dividing one sort process and carrying out the processing in a plurality of processors or processor cores concurrently. For example, when one sort process is carried out concurrently in a plurality of processors, the plurality of pieces of data to be rearranged is divided into a plurality of data groups. The speed of the sorting can be increased by allocating the data groups to each of the plurality of processors and carrying out the rearranging, and then combining the obtained plurality of rearranging results.
With respect to the above matter, a technique is known in which a sorting method is automatically selected according to the properties and amounts of the data to be sorted to increase the speed of the sorting. Another technique is known that provides a sorting method that reduces the sort processing time due to the sort process in which data to be sorted is divided into N number of groups, and the amount of the data is reduced by the dividing. A technique is known that reduces unnecessary exchange processing as much as possible and that classifies a plurality of given record values as equally as possible when classifying the given record values into two groups. A technique is known that improves performance by greatly reducing the number of branch prediction errors (see Japanese Laid-open Patent Publication No. 2014-102613 for example). A technique is known in which a display or a printed matter can be easily understood visually while enabling the recognition of sequences or groupings of data spread between groupings.
Related techniques are discussed in, for example, Japanese Laid-open Patent Publication No. 2002-116907, Japanese Laid-open Patent Publication No. 2012-185791, Japanese Laid-open Patent Publication No. 5-143286, Japanese Laid-open Patent Publication No. 2014-102613, and Japanese Laid-open Patent Publication No. 2006-268688.
According to an aspect of the invention, a method includes: reading, when each of a plurality of pieces of data is set as target data to be grouped and the target data is grouped based on a boundary value of a root node in a binary tree created from a plurality of boundary values, the target data from the plurality of pieces of data, specifying, by a processor, a temporary maximum value that indicates a maximum value among the target data and data already grouped, and a temporary minimum value that indicates a minimum value among the target data to be grouped and the data already grouped; specifying, by the processor, a maximum value and a minimum value of the plurality of pieces of data by updating the temporary maximum value and the temporary minimum value; dividing, by the processor, the plurality of pieces of data based on a boundary value between the maximum value and the minimum value of the plurality of pieces of data among the plurality of boundary values; and allocating, by the processor, the divided plurality of pieces of data to a plurality of processing devices that carry out processing on each of the divided plurality of data.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention, as claimed.
In order to carry out concurrent processing, there is a method for using a binary tree to group a plurality of pieces of data into a plurality of division segments as one example of dividing a plurality of pieces of data to be sorted. A plurality of boundary values (pivots) which act as boundaries for dividing the data are determined in this method. By using the data to search a binary tree created from the plurality of boundary values, the plurality of pieces of data is grouped into a plurality of division segments demarcated (or partitioned) by the boundary values. For example, when the boundary values are −5, 0, and 5, the plurality of pieces of data may be divided into four division segments: less than −5; −5 or greater and less than 0; 0 or greater and less than 5; and 5 or greater. In this case, magnitude relationships are determined among the plurality of division segments demarcated by the boundary values. As a result, the sorting can be completed at a higher speed by allocating the data included in the division segments to a plurality of processors to carry out rearranging and then combining the obtained plurality of rearranging results according to the magnitude relationships of the division segments. However, the amount of data to be handled has been increasing in recent years and there is a desire to further reduce the computing load in the sort process. An object according to one aspect of the embodiment is to provide a technique that is capable of reducing the number of comparisons when using a binary tree to divide a plurality of pieces of data.
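The grouping by the boundary values −5, 0, and 5 described above may be sketched, for example, as follows. This is a minimal illustration only: it uses the `bisect` module of the Python standard library in place of an explicit binary tree, and the function name `segment_index` and the sample data are assumptions introduced here, not part of the embodiment.

```python
from bisect import bisect_right

def segment_index(boundaries, value):
    """Return the 0-based index of the division segment that holds value.

    boundaries must be sorted in ascending order. Segment 0 holds values
    below boundaries[0]; segment i (i >= 1) holds values v with
    boundaries[i - 1] <= v, and v < boundaries[i] when that boundary exists.
    """
    # bisect_right counts how many boundaries are <= value, which is the index.
    return bisect_right(boundaries, value)

boundaries = [-5, 0, 5]            # boundary values (pivots) from the text
data = [7, -5, -9, 0, 3, 5, -1]    # hypothetical sample data

# One bucket per division segment: 3 boundaries give 4 segments.
segments = [[] for _ in range(len(boundaries) + 1)]
for value in data:
    segments[segment_index(boundaries, value)].append(value)

print(segments)
```

Because the division segments are ordered by magnitude, each bucket could then be sorted independently and the sorted buckets concatenated in index order.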
According to the one aspect of the disclosed embodiment, a technique is provided that reduces the number of comparisons when using a binary tree to divide a plurality of data.
Hereinafter, embodiments will be described with reference to the drawings. The same reference numerals are attached to the corresponding elements in the drawings.
Dividing a plurality of pieces of data into a plurality of division segments using the binary tree will be discussed next. As mentioned above, the plurality of pieces of data is grouped into any of the plurality of division segments (for example, division segment 1 to division segment 8 in
By searching the binary tree as described above, for example, when the data divided into the data group 1 to the data group X in
In order to reduce the amount of computing for dividing using the binary tree, it has been considered to use a bias of the data inside the data group to be divided, for example. A bias of the data may be a bias toward a range having a larger amount of data in segments in a certain row when the data in a certain data group, for example, is lined up.
For example, it is assumed that only the product numbers of the summer period products are included in a certain data group, and moreover, numbers from 251 to 600 are allocated to the product numbers of the summer period products. In this case, the product numbers included in the data group are biased toward values between 251 and 600. If the data included in the data group is grouped by using the boundary values as boundaries depicted in
The range of the bias of the data included in the data group can be specified, for example, from the maximum value and the minimum value of the data included in the data group. Accordingly, it may be considered that the maximum and minimum values of the data included in the data group can be specified before executing the dividing using a binary tree on the data included in the data group.
In step S501 (hereinbelow, “step” is written as “S” and expressed as “S501” for example), the information processing apparatus specifies the maximum value of the data included in the data group to be divided. The specification of the maximum value may be executed as described below, for example. First, a predetermined initial value is set in advance as the maximum value. The information processing apparatus then compares the maximum value with data read from the data group, and if the read data is greater than the set maximum value, the maximum value is updated to the read data. This processing is executed on all of the data included in the data group, whereby the information processing apparatus is able to specify the maximum value of the data in the data group. In this case, because one comparison is carried out for each piece of data, the number of comparisons used to specify the maximum value may be estimated as “m”, which is the number of pieces of data included in the data group. In S502, the information processing apparatus specifies the minimum value of the data included in the data group to be divided. Because one comparison is likewise carried out for each piece of data, the number of comparisons used to specify the minimum value may also be estimated as “m”.
In S503, the information processing apparatus specifies the division segment to which the maximum value belongs from among the plurality of division segments demarcated by the predetermined plurality of boundary values. The information processing apparatus may use, for example, the value specified as the maximum value to search the binary tree created from the set plurality of boundary values to specify the division segment to which the maximum value belongs. In this case, the number of comparisons used to specify the division segment to which the maximum value belongs may be estimated as “log2 n”, where n is a value equal to or greater than the number of the plurality of division segments demarcated by the plurality of preset boundary values and may be a power of 2. In S504, the information processing apparatus specifies the division segment to which the minimum value belongs from among the plurality of division segments demarcated by the plurality of preset boundary values. The number of comparisons used to specify the division segment to which the minimum value belongs may be estimated as “log2 n”. In S505, the information processing apparatus uses the boundary values between the maximum value and the minimum value among the set plurality of boundary values to create a binary tree. The information processing apparatus then uses the created binary tree to divide the data included in the data group into division segments, and then the action flow is finished. The number of comparisons for the division processing in S505 may be estimated to be “m×log2 n′”, where n′ is a value equal to or greater than the number of division segments from the division segment to which the minimum value belongs to the division segment to which the maximum value belongs and may be a power of 2.
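The estimates given for S501 to S505 may be added up, for example, as in the following sketch. The function names and the sample figures (1000 pieces of data, 64 division segments, 4 spanned segments) are hypothetical and chosen only for illustration; the cost of searching the full binary tree for every piece of data is taken here as m×log2 n, which is an assumption made by analogy with the estimate for S505.

```python
def next_power_of_two(k):
    """Smallest power of 2 that is greater than or equal to k (k >= 1)."""
    return 1 << (k - 1).bit_length()

def log2_exact(p):
    """Base-2 logarithm of p, where p is a power of 2."""
    return p.bit_length() - 1

def baseline_comparisons(m, total_segments, spanned_segments):
    """Estimated comparisons for the flow S501 to S505."""
    n = next_power_of_two(total_segments)
    n_prime = next_power_of_two(spanned_segments)
    find_extremes = 2 * m                   # S501 and S502: m comparisons each
    locate_extremes = 2 * log2_exact(n)     # S503 and S504: log2 n each
    divide = m * log2_exact(n_prime)        # S505: m x log2 n'
    return find_extremes + locate_extremes + divide

def full_tree_comparisons(m, total_segments):
    """Estimated comparisons when every piece of data searches the full tree."""
    return m * log2_exact(next_power_of_two(total_segments))

# Hypothetical figures: the data spans only 4 of 64 division segments.
print(baseline_comparisons(1000, 64, 4))    # 2000 + 12 + 2000
print(full_tree_comparisons(1000, 64))      # 1000 x 6
```

Under these assumed figures, the 2m comparisons spent on specifying the maximum and minimum values form a large part of the total, which is the cost that the embodiment discussed below seeks to reduce.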
For example, by carrying out the action flow depicted above in
In the embodiment discussed below, processing for specifying the maximum value and the minimum value of data included in a data group is combined with processing to group the data included in the data group based on the boundary value of the root node 21 in the binary tree. As a result, the number of comparisons used to specify the maximum value and the minimum value in the data included in the data group can be reduced when grouping the data included in the data group using the boundary value of the root node 21 as the boundary. The binary tree may be created as a tree structure so that, for example, the magnitude relationship of the left side child node<parent node≦right side child node is satisfied in the embodiment discussed below. However, the structure of the binary tree used in the embodiment is not limited to this structure and another binary tree may be used. For example, in another embodiment, a binary tree may be created so that the magnitude relationships of left side child node≦parent node<right side child node, left side child node>parent node≧right side child node, or left side child node≧parent node>right side child node, and the like are satisfied. Furthermore, a binary tree for satisfying these types of magnitude relationships may be referred to as a binary search tree hereinbelow.
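A binary search tree satisfying the relationship of left side child node<parent node≦right side child node may be created from sorted boundary values, for example, as in the following sketch. The class name `Node`, the functions `build_tree` and `find_segment`, and the sample boundary values are assumptions introduced for illustration; a missing child stands in for a leaf node (a division segment).

```python
class Node:
    """Internal node holding one boundary value and its rank in sorted order."""
    def __init__(self, boundary, rank, left=None, right=None):
        self.boundary = boundary
        self.rank = rank
        self.left = left
        self.right = right

def build_tree(boundaries, lo=0, hi=None):
    """Build a balanced tree from boundaries sorted in ascending order."""
    if hi is None:
        hi = len(boundaries)
    if lo >= hi:
        return None                      # no boundary left: this is a leaf
    mid = (lo + hi) // 2                 # the middle boundary becomes the parent
    return Node(boundaries[mid], mid,
                build_tree(boundaries, lo, mid),
                build_tree(boundaries, mid + 1, hi))

def find_segment(root, value):
    """Return (division segment index, number of comparisons used)."""
    segment = 0
    comparisons = 0
    node = root
    while node is not None:
        comparisons += 1
        if value < node.boundary:
            node = node.left             # value lies below this boundary
        else:
            segment = node.rank + 1      # value lies at or above this boundary
            node = node.right
    return segment, comparisons

# Seven hypothetical boundary values demarcate eight division segments.
root = build_tree([10, 20, 30, 40, 50, 60, 70])
print(root.boundary)            # the root holds the middle boundary value
print(find_segment(root, 35))   # segment for 30 <= value < 40
```

With seven boundary values the tree is perfectly balanced, so every search uses three comparisons, which agrees with the estimate of log2 n for n=8.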
As discussed above, two comparisons with regard to each data included in the data group are carried out when specifying the maximum value and the minimum value of the data included in a certain data group in the example in
For example, it is assumed that a temporary maximum value and a temporary minimum value are specified when grouping the data included in a certain data group as data to be grouped using the boundary value of the root node 21 as the boundary. The temporary maximum value in this case represents, for example, a maximum value among the data to be grouped and the data that has previously been grouped. The temporary minimum value in this case represents, for example, a minimum value among the data to be grouped and the data that has previously been grouped. If the magnitude relationships among the three values of the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 are known in this case, a binary tree (for example, a binary search tree) created according to the magnitude relationships among the three values can be used and the magnitude relationships between the data to be grouped and each of the three values can be specified in two comparisons. An example is discussed below.
First, the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 can be depicted, for example, according to the following four ways.
(State 1) Initial state in which no data is processed. The temporary maximum value and the temporary minimum value are undefined. The boundary value of the root node 21 is defined.
(State 2) Temporary minimum value≦temporary maximum value<boundary value of the root node 21
(State 3) Boundary value of the root node 21≦temporary minimum value≦temporary maximum value
(State 4) Temporary minimum value<boundary value of the root node 21≦temporary maximum value
Moreover, there is a possibility that the above four states may transition when processing the data to be grouped which is read from the data group, and such transitioning occurs according to the methods depicted in
First, the state of the initial state in which none of the data included in the data group to be divided is processed, is the state 1 (
If the state is the state 2, the binary tree created according to the magnitude relationships between the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 is the binary tree depicted in
Moreover, in the state 2, if the data is equal to or greater than the temporary maximum value for example, the control unit 600 updates the temporary maximum value to the value of the data. Moreover, if the data is greater than the temporary maximum value, the data does not become the temporary minimum value and the control unit 600 can omit comparing the data with the temporary minimum value. Next, the control unit 600 compares the data with the boundary value of the root node 21 that is the node of the right side branch in
If the state is the state 3, the binary tree created according to the magnitude relationships between the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 is the binary tree depicted in
If the data is less than the temporary minimum value for example in the state 3, the control unit 600 updates the temporary minimum value to the value of the data. Moreover, if the data is less than the temporary minimum value, the data does not become the temporary maximum value and the control unit 600 can omit comparing the data with the temporary maximum value. Next, the control unit 600 compares the data with the boundary value of the root node 21 that is the node of the right side branch in
If the state is the state 4, the binary tree created according to the magnitude relationships between the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 is the binary tree depicted in
Moreover, if the data is less than the boundary value of the root node 21 for example, the control unit 600 groups the data into the group less than the boundary value of the root node 21. Moreover, if the data is less than the boundary value of the root node 21 in the state 4, the data does not become the temporary maximum value and the control unit 600 can omit comparing the data with the temporary maximum value. Next, the control unit 600 compares the data with the temporary minimum value that is the node that branches to the left side in
As discussed above, the control unit 600 compares two values among the three values of the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 with the data to be divided based on the binary tree determined according to the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21. Consequently, the control unit 600 is able to specify the magnitude relationships between the data and each of the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 with two comparisons. When all of the data in the data group is grouped with the processing according to the above states and the temporary maximum value and the temporary minimum value have been updated, the temporary maximum value is the value that represents the maximum value of the data in the data group and the temporary minimum value is the value that represents the minimum value of the data in the data group. Therefore, the control unit 600 is able to group the data inside the data group with the boundary value of the root node 21 as the boundary, and specify the maximum value and the minimum value of the data inside the data group. Thus, the number of comparisons used to group the data included in the data group using the boundary value of the root node 21 as the boundary and to specify the maximum value and the minimum value in the data included in the data group can be reduced.
Moreover, the transitioning of the states can be specified from the magnitude relationship specified between the boundary value of the root node 21 and the data in each of the states. As a result, the control unit 600 is able to specify whether a state transition occurs without carrying out, for example, another comparison of the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21. The processing to be executed in each state can be changed without holding information for indicating the states in the storage unit 610 or the like by changing, for example, the action flow to be executed with a jump command or the like when a state transition occurs.
An explanation of an action flow for grouping processing based on the boundary value of the root node 21 as in the embodiment will be provided for each state from the abovementioned state 1 to the state 4 with reference to
In S901, the control unit 600 initializes the values of a variable n_S and a variable n_L to “0”. The variable n_S and the variable n_L are discussed in detail below. In S902, the control unit 600 reads one piece of data from a data group to be divided and sets the data to a temporary maximum value, a temporary minimum value, and a variable val. The variable val is used as a variable for storing the data to be grouped. In S903, the control unit 600 compares the variable val with the boundary value of the root node 21 in the binary tree created using the plurality of preset boundary values according to the division processing. In order to reduce the number of comparisons in correspondence to the lengths of the branches from the root node 21 to the leaf node 22 in the searching of the binary tree, the boundary value of the root node 21, for example, is preferably a boundary value close to the center of a row created when the boundary values used for the dividing are aligned in order of size.
In S903, if the variable val is less than the boundary value of the root node 21 (S903: Yes), the flow advances to S904. In S904, the control unit 600 adds 1 to the value of the variable n_S and sets the value of the variable val to an array variable S[n_S] (for example, the array variable S[1] in this case because the variable n_S=1), and the flow advances to the action flow of the state 2. However if, in S903, the variable val is equal to or greater than the boundary value of the root node 21 (S903: No), the flow advances to S905. In S905, the control unit 600 adds 1 to the value of the variable n_L and sets the value of the variable val to an array variable L[n_L] (for example, the array variable L[1] in this case because the variable n_L=1), and the flow advances to the action flow of the state 3.
In the action flows in
In S1001, the control unit 600 executes finish confirmation processing. Details of the finish confirmation processing are discussed below with reference to
If, in S1003, the variable val is a value greater than the value set to the temporary maximum value (S1003: No), the flow advances to S1007. In S1007, the control unit 600 updates the temporary maximum value to the value set to the variable val and the flow advances to S1008. In S1008, the control unit 600 determines whether the variable val is less than the boundary value of the root node 21. If the variable val is less than the boundary value of the root node 21 (S1008: Yes), the flow advances to S1009. In S1009, the control unit 600 adds 1 to the variable n_S and sets the value of the variable val to the array variable S[n_S], and the flow returns to S1001. However if, in S1008, the variable val is equal to or greater than the boundary value of the root node 21 (S1008: No), the flow advances to S1010. In S1010, the control unit 600 adds 1 to the variable n_L and sets the value of the variable val to the array variable L[n_L], and the flow advances to the action flow of the state 4.
In S1101, the control unit 600 executes finish confirmation processing. Details of the finish confirmation processing are discussed below with reference to
If, in S1103, the variable val is a value less than the value set to the temporary minimum value (S1103: No), the flow advances to S1107. In S1107, the control unit 600 updates the temporary minimum value to the value set to the variable val and the flow advances to S1108. In S1108, the control unit 600 determines whether the variable val is equal to or greater than the boundary value of the root node 21. If the variable val is equal to or greater than the boundary value of the root node 21 (S1108: Yes), the flow advances to S1109. In S1109, the control unit 600 adds 1 to the variable n_L and sets the value of the variable val to the array variable L[n_L], and the flow returns to S1101. However if, in S1108, the variable val is less than the boundary value of the root node 21 (S1108: No), the flow advances to S1110. In S1110, the control unit 600 adds 1 to the variable n_S and sets the value of the variable val to the array variable S[n_S], and the flow advances to the action flow of the state 4.
In S1201, the control unit 600 executes the finish confirmation processing. Details of the finish confirmation processing are discussed below with reference to
However if, in S1203, the variable val is equal to or greater than the boundary value of the root node 21 (S1203: No), the flow advances to S1207. In S1207, the control unit 600 adds 1 to the variable n_L and sets the value of the variable val to the array variable L[n_L]. In S1208, the control unit 600 determines whether the variable val is greater than the value set to the temporary maximum value. If the variable val is a value equal to or less than the value set to the temporary maximum value (S1208: No), the flow returns to S1201. However, if the variable val is a value greater than the value set to the temporary maximum value (S1208: Yes), the flow advances to S1209. In S1209, the control unit 600 updates the temporary maximum value to the value set to the variable val and the flow returns to S1201.
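The action flows of the state 1 to the state 4 may be sketched together, for example, as follows. This is an illustrative sketch under stated assumptions rather than the implementation of the embodiment: the function name `group_by_root` is hypothetical, Python lists stand in for the array variables S and L with the counters n_S and n_L, a `state` variable is held for readability where the text describes switching the action flow with a jump command, and the finish confirmation processing is reduced to reaching the end of the data.

```python
def group_by_root(data, root_boundary):
    """Group data at root_boundary while tracking the running extremes.

    Returns (S, L, minimum, maximum, comparisons), where comparisons counts
    only comparisons between data and the three tracked values.
    """
    if not data:
        raise ValueError("data must not be empty")
    S, L = [], []
    tmp_min = tmp_max = data[0]           # state 1: first value seeds both
    comparisons = 1
    if data[0] < root_boundary:
        S.append(data[0]); state = 2      # min <= max < root boundary
    else:
        L.append(data[0]); state = 3      # root boundary <= min <= max
    for val in data[1:]:
        comparisons += 2                  # every branch below compares twice
        if state == 2:
            if val <= tmp_max:            # then val is also below the root
                S.append(val)
                if val < tmp_min:
                    tmp_min = val
            else:
                tmp_max = val             # a new maximum cannot be the minimum
                if val < root_boundary:
                    S.append(val)
                else:
                    L.append(val); state = 4
        elif state == 3:
            if val >= tmp_min:            # then val is at or above the root
                L.append(val)
                if val > tmp_max:
                    tmp_max = val
            else:
                tmp_min = val             # a new minimum cannot be the maximum
                if val >= root_boundary:
                    L.append(val)
                else:
                    S.append(val); state = 4
        else:                             # state 4: min < root boundary <= max
            if val < root_boundary:
                S.append(val)
                if val < tmp_min:
                    tmp_min = val
            else:
                L.append(val)
                if val > tmp_max:
                    tmp_max = val
    return S, L, tmp_min, tmp_max, comparisons

# Hypothetical product numbers grouped at an assumed root boundary of 400.
result = group_by_root([300, 280, 310, 520, 251, 600, 400], 400)
print(result)
```

For the seven hypothetical values above, the sketch uses 1+2×6=13 comparisons, whereas grouping by the root boundary and specifying the maximum and minimum values separately would use three comparisons for each piece of data.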
As discussed above, when, for example, the data read from the data group to be divided is equal to or greater than the boundary value of the root node 21, the control unit 600 stores the data in the array variable L by executing any of the action flows from
In S1301, the control unit 600 determines whether the reading of all the data from the data group to be divided is finished. If, in S1301, the reading of all the data from the data group to be divided is finished (S1301: Yes), this action flow is finished (finish 1). When the reading of all the data in the data group is finished, the temporary maximum value is the maximum value of the data in the data group because the temporary maximum value has been specified after comparing the temporary maximum value with all of the data included in the data group. Similarly, the temporary minimum value is also the minimum value of the data in the data group because the temporary minimum value has been specified after comparing the temporary minimum value with all of the data included in the data group. That is, the control unit 600 is able to specify the maximum value and the minimum value of the data included in the data group by executing any of the action flows from
However, if in S1301, the reading of all the data from the data group to be divided is not finished (S1301: No), the flow advances to S1302. In S1302, the control unit 600 determines whether the number of the data read from the data group is divisible by a predetermined number. If the number of read data is not divisible by the predetermined number (S1302: No), the flow returns to the action flow that is the invoking source. As described above, the finish confirmation processing is invoked in S1001, S1101, or S1201. As a result, if the action flow that is the invoking source is S1001 for example, the flow may advance to S1002. Similarly, if the action flow of the invoking source is S1101, the flow may advance to S1102, or if the action flow of the invoking source is S1201, the flow may advance to S1202. Conversely, if in S1302, the number of the read data is divisible by the predetermined number (S1302: Yes), the flow advances to S1303. S1303 and S1304 are executed only when the number of the read data is divisible by the predetermined number according to the processing in S1302, that is, only when processing a portion of the data among all of the data. The determination may be carried out with a random number instead of the processing in S1302.
In S1303, the control unit 600 specifies the division segment to which the maximum value belongs from among the plurality of division segments demarcated by the plurality of preset boundary values. For example, the control unit 600 may specify to which division segment the temporary maximum value belongs by using the value set to the temporary maximum value and searching the binary tree created using the plurality of preset boundary values. In S1304, the control unit 600 specifies the division segment to which the minimum value belongs from among the plurality of division segments demarcated by the plurality of preset boundary values. For example, the control unit 600 may specify to which division segment the temporary minimum value belongs by using the value set to the temporary minimum value and searching the binary tree created using the plurality of preset boundary values.
In S1305, the control unit 600 specifies the number of division segments included from the division segment to which the temporary maximum value belongs to the division segment to which the temporary minimum value belongs. The control unit 600 then determines whether the derived number of division segments is equal to or greater than a predetermined ratio with regard to the number of all the division segments demarcated by the plurality of preset boundary values from the division processing. The predetermined ratio may be 1/4 for example. In another embodiment, the determination in S1305 may be executed, for example, by determining whether the number of boundary values from the temporary maximum value to the temporary minimum value is equal to or greater than a predetermined ratio with regard to the total number of boundary values. In S1305, if the number of division segments included from the division segment to which the temporary maximum value belongs to the division segment to which the temporary minimum value belongs is less than the predetermined ratio with regard to the total number of division segments (S1305: No), the flow returns to the invoking source. However, if in S1305 the number of division segments included from the division segment to which the temporary maximum value belongs to the division segment to which the temporary minimum value belongs is equal to or greater than the predetermined ratio with regard to the total number of division segments (S1305: Yes), this action flow is finished (finish 2). In this case, the control unit 600 may execute the action flow depicted in
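The checks in S1302 to S1305 may be sketched, for example, as follows. The function name `reaches_finish_2`, the check interval of 1000, and the sample boundary values are assumptions introduced for illustration; only the ratio of 1/4 is taken from the text, and `bisect_right` from the Python standard library stands in for the binary tree search.

```python
from bisect import bisect_right

def reaches_finish_2(boundaries, tmp_min, tmp_max, num_read,
                     check_interval=1000, ratio=0.25):
    """Return True when the flow would end at finish 2.

    boundaries must be sorted in ascending order. check_interval is a
    hypothetical value for the predetermined number used in S1302.
    """
    if num_read % check_interval != 0:              # S1302: check only sometimes
        return False
    seg_of_max = bisect_right(boundaries, tmp_max)  # S1303
    seg_of_min = bisect_right(boundaries, tmp_min)  # S1304
    spanned = seg_of_max - seg_of_min + 1           # segments from min to max
    total = len(boundaries) + 1
    return spanned >= ratio * total                 # S1305

boundaries = [100, 200, 300, 400, 500, 600, 700]    # hypothetical, 8 segments
print(reaches_finish_2(boundaries, 251, 290, 1000))
print(reaches_finish_2(boundaries, 251, 600, 1000))
print(reaches_finish_2(boundaries, 251, 600, 999))
```

In this sketch the two searches of S1303 and S1304 are paid only once per `check_interval` pieces of data, so their cost stays small relative to the two comparisons carried out for each piece of data.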
In S1401, the control unit 600 specifies the division segment to which the maximum value of the data included in the data group to be divided belongs from among the plurality of division segments demarcated by the plurality of preset boundary values with regard to the division processing. As described above, if Yes is determined in S1301, the temporary maximum value is specified by comparing the temporary maximum value with all of the data included in the data group and thus the value of the temporary maximum value is the maximum value of the data included in the data group. As a result, the control unit 600 in S1401 may use the value set to the temporary maximum value as the maximum value of the data included in the data group. The control unit 600 then may specify to which division segment the maximum value belongs by using the maximum value of the data included in the data group, for example, and searching the binary tree created using the plurality of preset boundary values. In S1402, the control unit 600 specifies the division segment to which the minimum value of the data included in the data group to be divided belongs from among the plurality of division segments demarcated by the plurality of preset boundary values with regard to the division processing. If Yes is determined in S1301, the temporary minimum value is specified by comparing the temporary minimum value with all of the data included in the data group and thus the value of the temporary minimum value is the minimum value of the data included in the data group. As a result, the control unit 600 in S1402 may use the value set to the temporary minimum value as the minimum value of the data included in the data group. The control unit 600 then may specify to which division segment the minimum value belongs by using the minimum value of the data included in the data group, for example, and searching the binary tree created using the plurality of preset boundary values.
Next in S1403, the control unit 600 creates a binary tree based on the boundary values between the division segment to which the maximum value of the data included in the data group belongs and the division segment to which the minimum value of the data included in the data group belongs. In S1404, the control unit 600 completes the dividing by using the created binary tree and grouping the data included in the data group to be divided into the division segments, and then this action flow is finished.
In the processing in S1403 and 1404, the boundary value of the root node 21 of the binary tree created from the plurality of present boundary values may be included, for example, between the minimum value and the maximum value of the data included in the data group. In this case, the results of the groupings using the boundary value of the root node 21 as a boundary are stored in the array variable S and the array variable L, and thus the number of comparisons can be reduced by using these results.
For example, the control unit 600 uses a boundary value equal to or greater than the minimum value of the data included in the data group and smaller than the boundary value of the root node 21 to create a binary tree. The control unit 600 then may specify a division segment that is a grouping destination of the data included in the array variable S by using the data included in the array variable S to search the obtained binary tree. Moreover, the control unit 600 uses, for example, a boundary value equal to or less than the maximum value of the data included in the data group and greater than the boundary value of the root node 21 to create a binary tree. The control unit 600 then may specify a division segment that is a grouping destination of the data included in the array variable L by using the data included in the array variable L to search the obtained binary tree.
In S1501, the control unit 600 groups the remaining data that has not yet been grouped in the array variable S or the array variable L among the data included in the data group to be divided, by the boundary of the boundary value of the root node 21. For example, if the data to be grouped is less than the boundary value of the root node 21, the control unit 600 adds 1 to the value of the variable n_S and sets the data to be grouped to the array variable S[n_S]. Conversely, if the data to be grouped is equal to or greater than the boundary value of the root node 21, the control unit 600, for example, adds 1 to the value of the variable n_L and sets the data to be grouped to the array variable L[n_L]. In S1502, the control unit 600 may group the data stored in the array variable S into a division segment by searching from the child node on the left side of the root node 21 in the binary tree created by using the plurality of preset boundary values with regard to the division processing. Moreover in S1503, the control unit 600 may group the data stored in the array variable L into a division segment by searching from the child node on the right side of the root node 21 in the binary tree created by using the plurality of preset boundary values with regard to the division processing. This action flow is finished when the processing in S1503 is completed.
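The flow of S1501 to S1503 may be sketched as follows. The sketch assumes a sorted list of boundary values whose middle element plays the role of the boundary value of the root node 21, and it uses Python lists for S and L instead of arrays indexed by n_S and n_L. The function name divide_via_root and the sample values are hypothetical.

```python
from bisect import bisect_right

def divide_via_root(boundaries, data_group):
    """Group data by the root boundary first, then search each half."""
    root_index = len(boundaries) // 2
    root = boundaries[root_index]

    # S1501: group by the boundary value of the root node.
    S, L = [], []
    for value in data_group:
        if value < root:
            S.append(value)
        else:
            L.append(value)

    left = boundaries[:root_index]       # boundaries under the left child
    right = boundaries[root_index + 1:]  # boundaries under the right child

    segments = {}
    # S1502: data in S is searched only against the left-side boundaries.
    for value in S:
        segments.setdefault(bisect_right(left, value), []).append(value)
    # S1503: data in L is searched only against the right-side boundaries.
    for value in L:
        seg = root_index + 1 + bisect_right(right, value)
        segments.setdefault(seg, []).append(value)
    return segments

boundaries = [10, 20, 30, 40, 50, 60, 70]  # hypothetical preset boundary values
segments = divide_via_root(boundaries, [5, 45, 40, 39, 70, 12])
```

Because the comparison with the root boundary has already been made, neither half repeats it.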
As described above, the control unit 600 in the above embodiment combines the processing for specifying the temporary maximum value and the temporary minimum value with the grouping of the data to be grouped based on the boundary value of the root node 21 of the binary tree. Consequently, the control unit 600, for example, is able to reduce the number of comparisons used for grouping the data included in the data group based on the boundary value of the root node 21 and for specifying the maximum value and the minimum value of the data included in the data group.
For example, when each piece of data is grouped using the boundary value of the root node 21 as a boundary and is separately checked as to whether it is the maximum value or the minimum value in the data group, it is estimated that three comparisons are carried out for each piece of data. However, in the above embodiment, the control unit 600 compares the data to be grouped with only two of the three values, the two values being determined by the binary tree created according to the magnitude relationship among the boundary value of the root node 21, the temporary maximum value, and the temporary minimum value. As a result, the control unit 600 specifies the group using the boundary value of the root node 21 as the boundary, the temporary maximum value, and the temporary minimum value. Therefore, the number of comparisons for each piece of data can be reduced to two according to the embodiment.
Furthermore, for example, the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 are aligned in the order from the smallest to the largest as the temporary minimum value, the temporary maximum value, and the boundary value of the root node 21. In this case, the control unit 600 compares the data to be grouped with the temporary maximum value, and if the data to be grouped is less than the temporary maximum value, the data can be grouped in the group less than the boundary value of the root node 21 without comparing the data to be grouped with the boundary value of the root node 21. Conversely, if the data to be grouped is larger than the temporary maximum value, the control unit 600 is able to omit comparing the data to be grouped with the temporary minimum value because the data to be grouped is not the temporary minimum value.
Furthermore, for example, the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 are aligned in the order from the smallest to the largest as the boundary value of the root node 21, the temporary minimum value, and the temporary maximum value. In this case, the control unit 600 compares the data to be grouped with the temporary minimum value, and if the data to be grouped is greater than the temporary minimum value, the data can be grouped in the group greater than the boundary value of the root node 21 without comparing the data to be grouped with the boundary value of the root node 21. Conversely, if the data to be grouped is less than the temporary minimum value, the control unit 600 is able to omit comparing the data to be grouped with the temporary maximum value because the data to be grouped is not the temporary maximum value.
Furthermore, for example, the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 are aligned in the order from the smallest to the largest as the temporary minimum value, the boundary value of the root node 21, and the temporary maximum value. In this case, if the data to be grouped is compared with the boundary value of the root node 21 and the data to be grouped is less than the boundary value of the root node 21, the control unit 600 is able to omit comparing the data to be grouped with the temporary maximum value because the data to be grouped is not the temporary maximum value. Conversely, if the data to be grouped is larger than the boundary value of the root node 21, the control unit 600 is able to omit comparing the data to be grouped with the temporary minimum value because the data to be grouped is not the temporary minimum value.
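The three magnitude relationships described above can be combined into one loop, as in the following minimal sketch. It assumes a non-empty data group and that data equal to the boundary value of the root node is grouped to the greater side. The function name group_and_track_extremes, the state labels, and the sample values are hypothetical and are not the embodiment's own notation.

```python
def group_and_track_extremes(data_group, root):
    """Split data around `root` while tracking the temporary min and max.

    Returns (S, L, tmp_min, tmp_max, comparisons). S holds data below
    root and L holds data equal to or greater than root. `comparisons`
    counts comparisons that involve a piece of data.
    """
    it = iter(data_group)
    first = next(it)
    tmp_min = tmp_max = first
    S, L = [], []
    comparisons = 1                      # first datum: compared with root only
    if first < root:
        S.append(first)
        state = "both_below_root"        # tmp_min <= tmp_max < root
    else:
        L.append(first)
        state = "both_at_or_above_root"  # root <= tmp_min <= tmp_max

    for x in it:
        comparisons += 2                 # every branch compares x with two values
        if state == "both_below_root":
            if x <= tmp_max:             # x < root follows without comparing
                S.append(x)
                if x < tmp_min:
                    tmp_min = x
            else:                        # new maximum, so not the minimum
                tmp_max = x
                if x < root:
                    S.append(x)
                else:
                    L.append(x)
                    state = "root_between"
        elif state == "both_at_or_above_root":
            if x >= tmp_min:             # x >= root follows without comparing
                L.append(x)
                if x > tmp_max:
                    tmp_max = x
            else:                        # new minimum, so not the maximum
                tmp_min = x
                if x < root:
                    S.append(x)
                    state = "root_between"
                else:
                    L.append(x)
        else:                            # tmp_min < root <= tmp_max
            if x < root:                 # cannot be the maximum
                S.append(x)
                if x < tmp_min:
                    tmp_min = x
            else:                        # cannot be the minimum
                L.append(x)
                if x > tmp_max:
                    tmp_max = x
    return S, L, tmp_min, tmp_max, comparisons

S, L, tmp_min, tmp_max, comparisons = group_and_track_extremes(
    [23, 35, 27, 51, 8, 40], 40)
```

For the six sample values the sketch performs 11 comparisons against the data (one for the first value and two for each of the rest), whereas comparing every value with the root boundary, the maximum, and the minimum separately would take up to three per value.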
Moreover, in the above embodiment, if the number of division segments included from the division segment to which the temporary maximum value belongs to the division segment to which the temporary minimum value belongs in S1305 is equal to or greater than the predetermined ratio with regard to the total number of division segments, the flow advances to the action flow in
Therefore according to the embodiment, the control unit 600 is able to, for example, detect a bias of the data included in the data group and reduce the number of comparisons when dividing the data in the data group using the binary tree.
The explanations of the above examples use numerical values as the examples of the data. However, the data that can be used with the embodiment is not limited as such and the embodiment may be applied to data other than numerical values. For example, the data may be character strings or dates, and data may be used in which the magnitude relationships are defined according to alphabetic order, the order of the Japanese alphabet, or by order of the oldest or newest dates. For example, data that satisfies the following conditions may be used as the data to be divided.
(a) A magnitude relationship is defined between an element A and an element B when the element A and the element B which are any two elements in a collection are extracted. The magnitude relationship indicates whether the element A is smaller than the element B (which is the same as the element B being greater than the element A), whether the element A is the same as the element B, or whether the element A is greater than the element B (which is the same as the element B being less than the element A).
(b) When arbitrary elements A, B, and C are extracted from a collection, if the element A is smaller than the element B and the element B is smaller than the element C, the element A is defined as being smaller than the element C.
Moreover, when the element A in the collection satisfies the following condition, the element A is understood to be the maximum value of the collection.
(c) The element A is greater than the element B or the element A is the same as the element B with regard to an arbitrary element B in the collection.
Similarly, when the element A in the collection satisfies the following condition, the element A is understood to be the minimum value of the collection.
(d) The element A is less than the element B or the element A is the same as the element B with regard to an arbitrary element B in the collection.
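As a brief illustration of conditions (a) to (d), the following sketch applies the same segment lookup to character strings and to dates, both of which have a defined magnitude relationship in Python. The function name segment_index and all boundary values and sample data are hypothetical.

```python
from bisect import bisect_right
from datetime import date

def segment_index(boundaries, value):
    """Return the division segment of `value` for sorted `boundaries`."""
    return bisect_right(boundaries, value)

# Character strings compared in alphabetic order (lowercase ASCII assumed).
word_boundaries = ["g", "n", "t"]
words = ["melon", "apple", "pear"]
word_segments = [segment_index(word_boundaries, w) for w in words]

# Dates compared from the oldest to the newest.
date_boundaries = [date(2015, 1, 1), date(2015, 7, 1)]
date_segment = segment_index(date_boundaries, date(2015, 6, 5))

# Conditions (c) and (d): the maximum and minimum of each collection.
largest_word, smallest_word = max(words), min(words)
```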
Further, data that can be used in the embodiment may be, for example, a numeral or a character string, or may be a value of a portion of an element in a table or a database and the like registered and associated with a plurality of elements such as the product numbers in the sales database 300 in
The data of a data group to be divided is compared with two values determined by a binary tree created according to the magnitude relationships among a temporary maximum value, a temporary minimum value, and a boundary value of the root node 21, whereby the number of comparisons is reduced in the above embodiment. However, the embodiment is not limited as such. For example, it can be assumed that the order of comparing the data with the boundary value of the root node 21, the temporary maximum value, and the temporary minimum value is determined and the comparison is carried out in the sequence of the boundary value of the root node 21, the temporary maximum value, and the temporary minimum value. In this case for example, if there are two states, it can be known when comparing the data with the boundary value of the root node 21 that the data is greater than the temporary maximum value and the temporary minimum value even without comparing the data with the temporary maximum value and the temporary minimum value if the data is greater than the boundary value of the root node 21. In this way, the processing can be executed while reducing the number of comparisons by using, for example, the information of the magnitude relationships among the temporary maximum value, the temporary minimum value, and the boundary value of the root node 21 in each state.
While an embodiment has been exemplified above, the embodiment is not limited as such. For example, the above action flows are examples and the embodiment is not limited to the above action flows. For example, when possible, the action flows may be executed while changing the order of the processing, other processing may be included, or a portion of the processing may be omitted. For example, in another embodiment, the sequence of the processing in S901 and the processing in S902 may be replaced and executed. Similarly, the respective sequences of the processing in S1303 and the processing in S1304, the processing in S1401 and the processing in S1402, or the processing in S1502 and the processing in S1503 may be replaced and executed.
Moreover, while the above examples group data that is equal to a boundary value to the greater side of that boundary value, the embodiment is not limited in this way. For example, when the data to be grouped is equal to the boundary value, the control unit 600 may group the data to the side smaller than the boundary value in another embodiment.
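The two conventions for data equal to a boundary value can be contrasted with a short sketch that assumes a sorted list of boundary values. The boundary values are hypothetical; bisect_right and bisect_left are functions of the Python standard library.

```python
from bisect import bisect_left, bisect_right

boundaries = [10, 20, 30]  # hypothetical preset boundary values

# Data equal to a boundary value grouped to the greater side.
greater_side = bisect_right(boundaries, 20)

# Data equal to a boundary value grouped to the smaller side.
smaller_side = bisect_left(boundaries, 20)

# Data that is not equal to any boundary value is grouped identically.
same_either_way = bisect_right(boundaries, 25) == bisect_left(boundaries, 25)
```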
The control unit 600 in the above action flows in
The processor 1601 may execute the above action flow processing, for example, by using the RAM 1602 and executing a program in which is written the procedures of the above action flows. In the first embodiment, the control unit 600 may be the processor 1601 for example. The storage unit 610 may be the RAM 1602 for example.
The peripheral device controller 1606 may be connected to the ROM 1603 for example. Moreover, the peripheral device controller 1606 may be connected to a storage device controller 1611 for example. The storage device controller 1611 may be connected to an external storage device such as a hard disk for example, and may read and write data to the external storage device according to an instruction from the processor 1601. Moreover, the peripheral device controller 1606 may be connected to a reading device 1612. The reading device 1612 accesses a portable recording medium 1613 according to an instruction from the processor 1601 for example. The portable recording medium 1613 may be realized, for example, by a semiconductor device (for example, a USB memory), a medium in which information is input and output through magnetic actions (for example, a magnetic disk), or a medium in which information is input and output through optical actions (for example, a CD-ROM or DVD). USB is an abbreviation for universal serial bus. CD is an abbreviation for compact disc. DVD is an abbreviation for digital versatile disc.
The peripheral device controller 1606 may be connected to a communication interface 1614 for example. The communication interface 1614 may send and receive data over a network according to an instruction from the processor 1601 for example. The peripheral device controller 1606 may be connected to an input/output interface 1615 for example, and the input/output interface 1615 may be an interface with an input device and an output device for example. The input device may be a device such as an input key or a touch panel and the like that receives inputs from a user for example. The output device may be a display device such as a display or a touch panel for example, or may be a printing device such as a printer and the like.
The programs according to the embodiment for causing the processor 1601 to execute the above mentioned action flows and the plurality of pieces of data to be divided for example, may be supplied to the information processing apparatus 60 in the following forms.
(1) Stored in an external storage device connected to the storage device controller 1611.
(2) Supplied by a server over a network.
(3) Supplied by the portable recording medium 1613.
The system 1600 in
Moreover, several embodiments including the above embodiment are to be understood by a person skilled in the art as containing various modifications and substitutions of the above embodiment. For example, the embodiments may be embodied by changes in the constituent elements. Moreover, various embodiments may be carried out by combining as appropriate the plurality of constituent elements disclosed in the abovementioned embodiments. Furthermore, various embodiments may be carried out by removing or substituting various constituent elements from among all of the constituent elements presented in the embodiment, or by adding several constituent elements to the constituent elements presented in the embodiment.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2015-115286 | Jun 2015 | JP | national |