This application is based upon and claims the benefit of priority of the prior Japanese Patent Application No. 2022-66461, filed on Apr. 13, 2022, the entire contents of which are incorporated herein by reference.
The embodiments discussed herein are related to a correlation coefficient computation program and the like.
In recent years, research has been conducted to efficiently narrow down the number of conditions to be causally searched, by extracting correlated conditions.
Thus, a technique has been disclosed that efficiently narrows down the number of conditions to be causally searched by relaxing the condition search target from causal relationships to correlations.
Thereafter, for each of the found conditions, a causal search technique is used to determine whether the important factor candidate under that condition is actually an important factor. For example, a case where there is “x1∧x3∧x4→y” (y=1 when x1=x3=x4=1 is true) is assumed. In such a case, one variable chosen from the left side is assigned as an “important factor candidate” and the rest are assigned as a “condition”. Here, it is assumed that x4 indicates the “important factor candidate” and the remaining “x1∧x3” indicates the “condition”. In this technique, if there is a high correlation between the “important factor candidate” and y on the right side in the past sample set that satisfies the “condition”, that “condition” is adopted. The conditions and important factors found in this manner are held in a database (DB). Then, at application time, for samples whose causal relationships are to be determined, the conditions that these samples satisfy are selected from the DB, and the corresponding important factors are presented.
Here, a technique has been disclosed that converts each of two types of signals to be correlated into a 1-bit signal according to whether or not the signal is less than an intermediate value of the dynamic range, thereby reducing the amount of computation in correlation operations.
Japanese Laid-open Patent Publication No. 2008-158855 and Yusuke Koyanagi, four others, “Developing a Framework for Individual Causal Discovery and its Application to Real Marketing Data”, The Japanese Society for Artificial Intelligence 18th Special Interest Group on Business Informatics, March 2021, <URL:http://sig-bi.jp/doc/18thSIG-BI2021/18thSIG-BI2021 paper13.pdf> are disclosed as related art.
According to an aspect of the embodiments, a non-transitory computer-readable recording medium storing a correlation coefficient computation program for causing a computer to execute a process, the process includes obtaining a first average value of remaining elements after masking, with a first condition, first column data obtained by taking out values of a first attribute from tabular data in which values of a plurality of attributes that each sample includes are accumulated for each sample, obtaining a second average value of the remaining elements after masking the first column data with a second condition that negates the first condition, loading the first column data into a first register, loading values obtained by masking the first average value with the first condition and values obtained by masking the second average value with the second condition, into a second register, obtaining a first value sequence by performing first subtraction between value sequences loaded into the first register and value sequences loaded into the second register on the first column data, obtaining a third average value of remaining elements after masking, with the first condition, second column data obtained by taking out values of a second attribute from the tabular data, obtaining a fourth average value of the remaining elements after masking the second column data with the second condition that negates the first condition, loading the second column data into the first register, loading values obtained by masking the third average value with the first condition and values obtained by masking the fourth average value with the second condition, into the second register, obtaining a second value sequence by performing second subtraction between value sequences loaded into the first register and value sequences loaded into the second register, and obtaining correlation coefficients between the first column data and the second column data for the first condition and the second condition, based on the first value sequence and the second value sequence, by using arithmetic logical units with the first register and the second register as inputs.
The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.
While a correlation coefficient is calculated between the “important factor candidate” and y on the right side when the condition for extracting the past sample set is adopted, it is desirable to calculate the correlation coefficient efficiently.
Hereinafter, embodiments of techniques capable of efficiently calculating a correlation coefficient to be used when adopting a condition under which correlation appears will be described in detail with reference to the drawings. Note that the present invention is not limited to the embodiments.
First, enumerating conditions for extracting a sample set “having correlated observed value pairs”, from a numerical data group for a plurality of events observed with respect to individual instances will be considered.
The left diagram of
As illustrated in
When conditions are exhaustively searched, dxCk patterns of conditional expressions (the number of combinations of k conditions chosen from dx conditions) have to be examined, where dx is the total number of conditions and k is the number of conditions to be taken out. For example, it is determined individually whether or not each conditional expression is a condition for extracting the sample set “having correlated observed value pairs”.
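The growth in the number of conditional expressions can be illustrated with a minimal sketch. The condition names and the value of k below are hypothetical and chosen only for illustration; the actual conditions come from the condition list.

```python
from itertools import combinations
from math import comb

# Hypothetical condition names (assumption for illustration only).
conditions = ["x0", "x1", "x2", "x3"]
k = 2  # number of conditions combined into one conditional expression

# Every way of choosing k conditions out of dx = len(conditions).
expressions = [" ∧ ".join(chosen) for chosen in combinations(conditions, k)]

# The number of generated expressions equals the binomial coefficient dxCk.
number_of_expressions = len(expressions)
expected_count = comb(len(conditions), k)
```

Each of the generated expressions would have to be checked individually, which is why the cost of a single correlation coefficient operation matters.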
For example, the correlation coefficient operation process masks the matrices Y1 and Y2 with the condition X and computes the correlation coefficients only with the remaining elements (the elements in the rows having one in the X matrix).
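As a point of reference, the masked computation described above can be sketched in plain Python. This is a baseline that explicitly filters the rows, assuming the ordinary Pearson correlation coefficient; the data values and the function name are hypothetical.

```python
from math import sqrt

def masked_correlation(y1, y2, x):
    """Pearson correlation of y1 and y2 over the rows where x[i] == 1."""
    rows = [i for i, bit in enumerate(x) if bit == 1]
    n = len(rows)
    mean1 = sum(y1[i] for i in rows) / n
    mean2 = sum(y2[i] for i in rows) / n
    s1 = sum((y1[i] - mean1) ** 2 for i in rows)
    s2 = sum((y2[i] - mean2) ** 2 for i in rows)
    s12 = sum((y1[i] - mean1) * (y2[i] - mean2) for i in rows)
    return s12 / sqrt(s1 * s2)

# Hypothetical data: only rows 0, 2 and 3 satisfy the condition X.
y1 = [1.0, 9.0, 2.0, 3.0]
y2 = [2.0, 0.0, 4.0, 6.0]
x = [1, 0, 1, 1]
r = masked_correlation(y1, y2, x)  # rows 0, 2, 3 are perfectly correlated
```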
Incidentally, the scalable vector extension (SVE) of ARM Ltd. is capable of controlling whether or not the operation is made effective, according to the bit string input to the predicate register.
The correlation coefficient operation process loads the matrix Y1 into a SIMD register a, loads the average value Y1ave into a SIMD register b, and gives the X matrix to the predicate register. Then, since arithmetic logical units (ALUs) are masked by the X matrix given to the predicate register, the correlation coefficient operation process is allowed to conduct SIMD computation without generating filtered Y illustrated in the lower right of
However, since the correlation coefficient operation process is not allowed to effectively utilize the ALUs in the portion masked by the predicate register, there is a problem that the central processing unit (CPU) utilization rate decreases. Thus, in the embodiment, a correlation coefficient operation process capable of effectively utilizing the ALUs will be described.
[Configuration of Information Processing Device]
The information processing device 1 includes a control unit 10 and a storage unit 20. The control unit 10 includes an average value computation processing unit 11, a deviation computation processing unit 12, a correlation coefficient computation unit 13, and a determination unit 14. The storage unit 20 includes an observed value list 21 and a condition list 22.
The observed value list 21 is a list that stores a numerical data group of a plurality of observed values observed for an instance id. For example, the observed value list 21 is tabular data in which the values of a plurality of observed values (attributes) that each instance id has are accumulated. The instance id mentioned here refers to an identifier that uniquely identifies an individual person or the like. Each column of the observed value list 21 corresponds to one of the observed values (attributes). Here, an example of the observed value list 21 will be described with reference to
Returning to
Then, for each condition, “1” is set when the condition is satisfied, and “0” is set when the condition is not satisfied. For example, when the instance id is “1”, the observed value a is “1.3”, so “1” is set for the condition “a<5” because the observed value a is less than “5”. For the condition “not(a<5)”, “0” is set because the observed value a, “1.3”, is less than “5” and the negated condition therefore does not hold. Then, the column-wise array for the condition “a<5” forms a bit string of “111 . . . ”. The column-wise array for the condition “not(a<5)” forms a bit string of “000 . . . ”.
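The construction of these bit columns can be sketched as follows. Only the first observed value (1.3) appears in the description above; the remaining values and the helper name `binarize` are assumptions made for illustration.

```python
# Observed values of attribute "a". Only 1.3 is taken from the text;
# the other values are hypothetical.
observed_a = [1.3, 2.0, 4.1, 7.5]

def binarize(values, predicate):
    """Return 1 for each value that satisfies the predicate, else 0."""
    return [1 if predicate(v) else 0 for v in values]

cond = binarize(observed_a, lambda a: a < 5)            # condition "a<5"
not_cond = binarize(observed_a, lambda a: not (a < 5))  # condition "not(a<5)"

# A condition and its negation never hold for the same sample.
complementary = all(c + n == 1 for c, n in zip(cond, not_cond))
```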
Returning to
Here, the principle of the arithmetic method for the correlation coefficient will be described with reference to
For example, the bit string corresponding to the conditional expression X is computed by bitand (logical product) between the bit string corresponding to the condition x0 and the bit string corresponding to the condition x1. Meanwhile, the bit string corresponding to the conditional expression X′ is computed by bitand (logical product) between the bit string corresponding to the condition x0 and the bit string corresponding to the condition not(x1). The logical sum of the bit string corresponding to the conditional expression X and the bit string corresponding to the conditional expression X′ forms the bit string corresponding to the condition x0. Therefore, the conditional expressions X and X′ can be collectively determined because the conditional expressions X and X′ are conditional expressions that share the condition x0 but do not overlap.
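The relationship between the three bit strings can be checked with a short sketch. The 8-sample bit patterns below are hypothetical; integers stand in for the bit strings, one bit per sample.

```python
# Hypothetical 8-sample bit strings (assumption for illustration only).
WIDTH = 8
FULL = (1 << WIDTH) - 1

x0 = 0b11110110
x1 = 0b10100101

X = x0 & x1               # conditional expression X  = x0 AND x1
X_neg = x0 & (~x1 & FULL) # conditional expression X' = x0 AND not(x1)

covers_x0 = (X | X_neg) == x0  # the logical sum reproduces x0
disjoint = (X & X_neg) == 0    # X and X' never select the same sample
```

Because the two expressions are disjoint and together cover exactly the samples selected by x0, one pass predicated on x0 can serve both of them.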
Returning to
Here, an average value computation method performed by the average value computation processing unit 11 will be described with reference to
As illustrated in
In addition, the average value computation processing unit 11 loads the elements of the matrix Y1 obtained by taking out the values of the first observed values from the observed value list 21, into the SIMD register A. The average value computation processing unit 11 loads a second bit string obtained by binarizing the elements of the matrix Y1 with the conditional expression X′, into the predicate register. Then, the average value computation processing unit 11 masks the arithmetic logical units (ALUs) with the predicate register to calculate a second average value Y1X′ave of the elements of the SIMD register A (reference sign a2). For example, the average value computation processing unit 11 calculates the second average value Y1X′ave of the remaining elements after masking the elements of the SIMD register A with the predicate register.
In addition, the average value computation processing unit 11 loads the first bit string binarized by the conditional expression X into the predicate register. The average value computation processing unit 11 loads the first average value Y1Xave into the SIMD register A. Then, the average value computation processing unit 11 copies the first average value Y1Xave to a SIMD register B by masking with the predicate register (reference sign a3). In addition, the average value computation processing unit 11 loads the second bit string binarized by the conditional expression X′ into the predicate register. The average value computation processing unit 11 loads the second average value Y1X′ave into the SIMD register A. Then, the average value computation processing unit 11 copies the second average value Y1X′ave to the SIMD register B by masking with the predicate register (reference sign a4). Since the first bit string binarized by the conditional expression X and the second bit string binarized by the conditional expression X′ form exclusive bit strings with the common condition x0 as the axis, the first average value Y1Xave and the second average value Y1X′ave copied to the SIMD register B are not copied to the same bit. Therefore, the average value computation processing unit 11 is allowed to load the average values Y1Xave and Y1X′ave of the first observed values masked by the conditional expressions X and X′, respectively, into the SIMD register B, using the predicate register.
In addition, with respect to the matrix Y2 obtained by taking out second observed values from the observed value list 21, as in the case of the matrix Y1, the average value computation processing unit 11 only has to calculate an average value Y2Xave of the remaining elements after masking with the conditional expression X (=x0∧x1). Furthermore, with respect to the same matrix Y2, the average value computation processing unit 11 only has to calculate an average value Y2X′ave of the remaining elements after masking with the conditional expression X′ (=x0∧not(x1)), as in the case of the matrix Y1. Then, the average value computation processing unit 11 only has to load the average values Y2Xave and Y2X′ave of the second observed values masked by the conditional expressions X and X′, respectively, into the SIMD register B, using the predicate register.
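The averaging and merging steps can be emulated in plain Python as a minimal sketch. Lists stand in for the SIMD registers and 0/1 lists for the predicate register; the helper names and data values are hypothetical, and the predicated copy is assumed to leave inactive lanes unchanged.

```python
def masked_mean(values, pred):
    """Average of the lanes whose predicate bit is 1."""
    kept = [v for v, p in zip(values, pred) if p]
    return sum(kept) / len(kept)

def predicated_copy(dest, scalar, pred):
    """Write scalar into lanes where pred is 1; other lanes keep their value."""
    return [scalar if p else d for d, p in zip(dest, pred)]

y1 = [4.0, 1.0, 6.0, 3.0]  # hypothetical first observed values
pred_x = [0, 1, 0, 1]      # bit string for X
pred_xn = [1, 0, 1, 0]     # bit string for X'

mean_x = masked_mean(y1, pred_x)    # average under X
mean_xn = masked_mean(y1, pred_xn)  # average under X'

# Merge both averages into one register; the predicates are disjoint,
# so neither copy overwrites the other.
reg_b = [0.0] * len(y1)
reg_b = predicated_copy(reg_b, mean_x, pred_x)
reg_b = predicated_copy(reg_b, mean_xn, pred_xn)
```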
Returning to
Here, a deviation computation method performed by the deviation computation processing unit 12 will be described with reference to
As illustrated in
In addition, also for the matrix Y2 obtained by taking out the second observed values from the observed value list 21, the deviation computation processing unit 12 only has to compute “Y2−Y2ave” for the conditional expressions X and X′, as in the case of the matrix Y1. In this manner, the deviation computation processing unit 12 is allowed to perform the deviation computation collectively for the conditional expressions X and X′, by loading the condition x0 common to the conditional expressions X and X′ into the predicate register and performing the computation.
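The collective deviation computation can be sketched as a single predicated subtraction. Lists stand in for the SIMD registers; the values are hypothetical, and inactive lanes are assumed to be set to zero, which is only one possible treatment of masked lanes.

```python
def predicated_sub(a, b, pred):
    """Lane-wise a - b where pred is 1; inactive lanes yield 0.0 (assumption)."""
    return [x - y if p else 0.0 for x, y, p in zip(a, b, pred)]

# Hypothetical registers. The last sample does not satisfy x0.
reg_a = [4.0, 1.0, 6.0, 3.0, 9.0]  # observed values Y1
reg_b = [5.0, 2.0, 5.0, 2.0, 0.0]  # per-lane averages for X and X'
pred_x0 = [1, 1, 1, 1, 0]          # condition x0 shared by X and X'

# One subtraction yields the deviations for X and X' at the same time.
deviation = predicated_sub(reg_a, reg_b, pred_x0)
```

Since each lane of `reg_b` already holds the average that belongs to its own conditional expression, no separate pass per expression is needed.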
The correlation coefficient computation unit 13 uses the predicate register and the SIMD registers to collectively compute the correlation coefficient for the conditional expression X between the matrix Y1 of the first observed values and the matrix Y2 of the second observed values, and the correlation coefficient for the conditional expression X′ between the matrices Y1 and Y2.
Here, a correlation coefficient computation method performed by the correlation coefficient computation unit 13 will be described with reference to
As illustrated in
In addition, for the matrix Y2 of the second observed values, the correlation coefficient computation unit 13 loads the column values of “Y2−Y2ave” for the conditional expressions X and X′ processed by the deviation computation processing unit 12, into the SIMD registers A and B (reference sign c4). Then, the correlation coefficient computation unit 13 masks the arithmetic logical units (ALUs) with the predicate register to execute an operation of multiplying the SIMD registers A and B. Then, the correlation coefficient computation unit 13 loads the multiplication result into the SIMD register C (reference sign c6). As a result, for the matrix Y2 of the second observed values, the correlation coefficient computation unit 13 computes expression (3) collectively for the conditional expressions X and X′ to find Sy2.
In addition, for the matrix Y1 of the first observed values, the correlation coefficient computation unit 13 loads the column values of “Y1−Y1ave” for the conditional expressions X and X′ into the SIMD register A (reference sign c7). For the matrix Y2 of the second observed values, the correlation coefficient computation unit 13 loads the column values of “Y2−Y2ave” for the conditional expressions X and X′ into the SIMD register B (reference sign c8). Then, the correlation coefficient computation unit 13 masks the arithmetic logical units (ALUs) with the predicate register to execute an operation of multiplying the SIMD registers A and B. Then, the correlation coefficient computation unit 13 loads the multiplication result into the SIMD register C (reference sign c10). As a result, for the matrix Y1 of the first observed values and the matrix Y2 of the second observed values, the correlation coefficient computation unit 13 computes expression (1) collectively for the conditional expressions X and X′ to find Sy12.
As illustrated in
For example, the correlation coefficient computation unit 13 loads the first bit string obtained by binarizing the elements of the matrix Y1 with the conditional expression X, into the predicate register (reference sign d1). The correlation coefficient computation unit 13 loads Sy1 for the conditional expressions X and X′ computed by expression (2), into the SIMD register A (reference sign d2). Then, the correlation coefficient computation unit 13 masks the SIMD register A with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSy1X” (reference sign d3).
Then, the correlation coefficient computation unit 13 loads the second bit string obtained by binarizing the elements of the matrix Y1 with the conditional expression X′, into the predicate register (reference sign d4). Then, the correlation coefficient computation unit 13 masks the SIMD register A with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSy1X′” (reference sign d5).
In addition, similarly, the correlation coefficient computation unit 13 finds “sumSy2X” and “sumSy2X′” from Sy2 for the conditional expressions X and X′ computed by expression (3). Similarly, the correlation coefficient computation unit 13 finds “sumSy12X” and “sumSy12X′” from Sy12 for the conditional expressions X and X′ computed by expression (1).
Then, the correlation coefficient computation unit 13 computes a correlation coefficient Ry12X for the conditional expression X between the matrix Y1 of the first observed values and the matrix Y2 of the second observed values, based on expression (4) in the upper part of the right diagram of
In a similar manner, the correlation coefficient computation unit 13 computes the correlation coefficient for the conditional expressions X and X′ between another observed value pair. This allows the correlation coefficient computation unit 13 to process the negated conditional expression X′ together with the conditional expression X, which in turn makes it possible to halve the number of cycles involved in computing Sy1 indicated by expression (2), Sy2 indicated by expression (3), and Sy12 indicated by expression (1) among the correlation coefficient operations.
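The overall flow can be sketched in plain Python. The expression for the correlation coefficient is not reproduced in this text, so the sketch assumes the ordinary Pearson form (the summed products divided by the square root of the product of the summed squares); function names and deviation values are hypothetical.

```python
from math import sqrt

def masked_sum(values, pred):
    """Sum of the lanes whose predicate bit is 1."""
    return sum(v for v, p in zip(values, pred) if p)

def correlations_for_pair(dev1, dev2, pred_x, pred_xn):
    """Return (R under X, R under X') from per-lane deviations."""
    # Computed once over all lanes, shared by X and X'.
    s1 = [d * d for d in dev1]
    s2 = [d * d for d in dev2]
    s12 = [a * b for a, b in zip(dev1, dev2)]
    results = []
    for pred in (pred_x, pred_xn):
        numerator = masked_sum(s12, pred)
        denominator = sqrt(masked_sum(s1, pred) * masked_sum(s2, pred))
        results.append(numerator / denominator)
    return tuple(results)

# Hypothetical deviations (Y1 - Y1ave and Y2 - Y2ave per lane).
dev1 = [-1.0, -1.0, 1.0, 1.0]
dev2 = [2.0, -3.0, -2.0, 3.0]
pred_x = [0, 1, 0, 1]
pred_xn = [1, 0, 1, 0]

r_x, r_xn = correlations_for_pair(dev1, dev2, pred_x, pred_xn)
```

Only the three final masked sums are evaluated per conditional expression; the element-wise squares and products are shared.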
Returning to
[Flow of Correlation Coefficient Computation Process]
The conditional expression is then generated from the condition list 22. In
As illustrated in
Similarly, the average value computation processing unit 11 loads the bit string binarized by the conditional expression X′ into the predicate register. Then, the average value computation processing unit 11 masks the arithmetic logical units (ALUs) with the predicate register to compute an average value YaaveX′ of the observed values a. The average value computation processing unit 11 masks the arithmetic logical units (ALUs) with the predicate register to compute an average value YbaveX′ of the observed values b. Here, (YaaveX′, YbaveX′) is computed as (1.85 (=(1.3+2.4)×0.5), 4.0 (=(4.8+3.2)×0.5)).
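The two averages quoted above can be reproduced with a short sketch. The values 1.3, 2.4, 4.8 and 3.2 are taken from the text; the lane positions and the values in the remaining lanes are hypothetical placeholders for samples that satisfy X instead.

```python
from math import isclose

def masked_mean(values, pred):
    """Average of the lanes whose predicate bit is 1."""
    kept = [v for v, p in zip(values, pred) if p]
    return sum(kept) / len(kept)

# Lanes 0 and 2 hold the values quoted in the text; lanes 1 and 3 are
# hypothetical placeholders (assumption for illustration only).
ya = [1.3, 5.0, 2.4, 7.0]
yb = [4.8, 1.0, 3.2, 2.0]
pred_xn = [1, 0, 1, 0]  # assumed lane layout for X'

ya_ave_xn = masked_mean(ya, pred_xn)
yb_ave_xn = masked_mean(yb, pred_xn)

# Floating-point comparison with a tolerance rather than exact equality.
matches_text = isclose(ya_ave_xn, 1.85) and isclose(yb_ave_xn, 4.0)
```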
As illustrated in
In addition, the average value computation processing unit 11 loads the bit string “1010” binarized by the conditional expression X′ into the predicate register. Then, the average value computation processing unit 11 copies the average value YaaveX′ to the SIMD register z0 by masking with the predicate register. Here, the average value YaaveX′ “1.85” is copied to the first and third elements of the SIMD register z0.
As illustrated in
Also for the observed value b, as in the case of the observed value a, the deviation computation processing unit 12 masks the arithmetic logical units (ALUs) with the predicate register to execute an operation of subtracting the SIMD register z0 from the matrix Yb obtained by taking out the values of the observed values b. Then, the deviation computation processing unit 12 loads the subtraction result into a SIMD register Sbtmp. For example, the deviation computation processing unit 12 is allowed to compute “Yb−Ybave” collectively for the conditional expressions X and X′.
As illustrated in
Also for the observed value b, as in the case of the observed value a, the correlation coefficient computation unit 13 uses the predicate register to compute the square of the value sequence of the SIMD register Sbtmp and loads the computed square into the SIMD register Sb. In this case, the bit string “1111” binarized by the condition x0 common to the conditional expressions X and X′ is loaded into the predicate register. As a result, for the matrix Yb of the observed values b, the correlation coefficient computation unit 13 computes expression (3) collectively for the conditional expressions X and X′ to find Sb.
In addition, the correlation coefficient computation unit 13 uses the predicate register to multiply the value sequence of the SIMD register Satmp and the value sequence of the SIMD register Sbtmp and loads the multiplied value sequence into a SIMD register Sab. In this case, the bit string “1111” binarized by the condition x0 common to the conditional expressions X and X′ is loaded into the predicate register. As a result, for the matrix Ya of the observed values a and the matrix Yb of the observed values b, the correlation coefficient computation unit 13 computes expression (1) collectively for the conditional expressions X and X′ to find Sab.
Then, the correlation coefficient computation unit 13 loads the bit string “0101” binarized by the conditional expression X into the predicate register. The correlation coefficient computation unit 13 then masks Sa for the conditional expressions X and X′ with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSaX”. The correlation coefficient computation unit 13 masks Sb for the conditional expressions X and X′ with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSbX”. The correlation coefficient computation unit 13 masks Sab for the conditional expressions X and X′ with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSabX”.
Then, the correlation coefficient computation unit 13 loads the bit string “1010” binarized by the conditional expression X′ into the predicate register. The correlation coefficient computation unit 13 then masks Sa for the conditional expressions X and X′ with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSaX′”. The correlation coefficient computation unit 13 masks Sb for the conditional expressions X and X′ with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSbX′”. The correlation coefficient computation unit 13 masks Sab for the conditional expressions X and X′ with the predicate register and computes the sum (sum) of the unmasked elements to find “sumSabX′”.
Then, the correlation coefficient computation unit 13 computes a correlation coefficient RyabX between the observed values a and b for the conditional expression X, using (sumSaX, sumSbX, sumSabX). The correlation coefficient computation unit 13 computes a correlation coefficient RyabX′ between the observed values a and b for the conditional expression X′, using (sumSaX′, sumSbX′, sumSabX′).
[Flowchart of Correlation Coefficient Computation Process]
The information processing device 1 uses the conditions x0 and x1 to execute the average value computation process for the observed value matrices Y1 and Y2 (operation S11). Note that the flowchart of the average value computation process will be described later.
The information processing device 1 uses the conditions x0 and x1 to execute the deviation computation process for the observed value matrices Y1 and Y2 (operation S12). Note that the flowchart of the deviation computation process will be described later.
The information processing device 1 uses the conditions x0 and x1 to execute a correlation coefficient computation and determination process for the observed value matrices Y1 and Y2 (operation S13). Note that the flowchart of the correlation coefficient computation and determination process will be described later. Then, the information processing device 1 ends the correlation coefficient computation process.
[Flowchart of Average Value Computation Process]
First, the average value computation processing unit 11 loops through operations S21 to S28 until all elements of the observed value matrices Y1 and Y2 are processed in order. The average value computation processing unit 11 loads the observed value matrix Y1 into the SIMD register z0 at the maximum in number (operation S21). The average value computation processing unit 11 loads the observed value matrix Y2 into a SIMD register z1 at the maximum in number (operation S22).
Then, the average value computation processing unit 11 loads the bit string corresponding to the conditional expression X “x0∧x1” into the p register at the maximum in number (operation S23). The average value computation processing unit 11 then adds the elements at the processing target positions of the SIMD register z0 to Y1Xave after masking with the p register (operation S24). The average value computation processing unit 11 adds the elements at the same processing target positions of the SIMD register z1 to Y2Xave after masking with the p register (operation S25).
Then, the average value computation processing unit 11 loads the bit string corresponding to the conditional expression X′ “x0∧not(x1)” into the p register at the maximum in number (operation S26). The average value computation processing unit 11 then adds the elements at the processing target positions of the SIMD register z0 to Y1X′ave after masking with the p register (operation S27). The average value computation processing unit 11 adds the elements at the same processing target positions of the SIMD register z1 to Y2X′ave after masking with the p register (operation S28).
Subsequently, after processing all the elements of the observed value matrices Y1 and Y2, the average value computation processing unit 11 divides Y1Xave and Y1X′ave by the number of elements of the observed value matrix Y1 that remain after masking with the conditional expressions X and X′, respectively (operation S29). For example, the average value computation processing unit 11 calculates the average value Y1Xave of the remaining elements after masking the elements of the observed value matrix Y1 with the bit string corresponding to the conditional expression X. The average value computation processing unit 11 calculates the average value Y1X′ave of the remaining elements after masking the elements of the observed value matrix Y1 with the bit string corresponding to the conditional expression X′.
Additionally, the average value computation processing unit 11 divides Y2Xave and Y2X′ave by the number of elements of the observed value matrix Y2 that remain after masking with the conditional expressions X and X′, respectively (operation S30). For example, the average value computation processing unit 11 calculates the average value Y2Xave of the remaining elements after masking the elements of the observed value matrix Y2 with the bit string corresponding to the conditional expression X. The average value computation processing unit 11 calculates the average value Y2X′ave of the remaining elements after masking the elements of the observed value matrix Y2 with the bit string corresponding to the conditional expression X′.
Then, the average value computation processing unit 11 ends the average value computation process.
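Operations S21 to S30 can be sketched as a chunked loop in plain Python. The vector length `VL`, the function name, and the data are hypothetical; the sketch also assumes that each sum is divided by the number of elements left unmasked by the corresponding conditional expression, and that each expression selects at least one element.

```python
VL = 4  # hypothetical number of lanes per SIMD register

def conditional_means(y1, y2, x0, x1, vl=VL):
    """Return (Y1Xave, Y2Xave, Y1X'ave, Y2X'ave) for X = x0 AND x1, X' = x0 AND not(x1)."""
    sum_y1_x = sum_y2_x = sum_y1_xn = sum_y2_xn = 0.0
    n_x = n_xn = 0
    for start in range(0, len(y1), vl):
        z0 = y1[start:start + vl]                         # S21
        z1 = y2[start:start + vl]                         # S22
        c0 = x0[start:start + vl]
        c1 = x1[start:start + vl]

        p = [a & b for a, b in zip(c0, c1)]               # S23: predicate for X
        sum_y1_x += sum(v for v, m in zip(z0, p) if m)    # S24
        sum_y2_x += sum(v for v, m in zip(z1, p) if m)    # S25
        n_x += sum(p)

        p = [a & (1 - b) for a, b in zip(c0, c1)]         # S26: predicate for X'
        sum_y1_xn += sum(v for v, m in zip(z0, p) if m)   # S27
        sum_y2_xn += sum(v for v, m in zip(z1, p) if m)   # S28
        n_xn += sum(p)

    # S29 and S30: divide by the number of unmasked elements (assumption).
    return (sum_y1_x / n_x, sum_y2_x / n_x, sum_y1_xn / n_xn, sum_y2_xn / n_xn)

# Hypothetical data spanning two chunks.
y1 = [1.0, 2.0, 3.0, 6.0, 5.0, 9.0]
y2 = [10.0, 20.0, 30.0, 60.0, 50.0, 0.0]
x0 = [1, 1, 1, 1, 1, 0]
x1 = [1, 0, 1, 0, 1, 1]
means = conditional_means(y1, y2, x0, x1)
```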
[Flowchart of Deviation Computation Process]
First, the deviation computation processing unit 12 loops through operations S41 to S47 until all elements of the observed value matrix Y1 are processed in order. The deviation computation processing unit 12 loads the bit string corresponding to the conditional expression X “x0∧x1” into the p register at the maximum in number (operation S41). The deviation computation processing unit 12 copies the average value Y1Xave to the SIMD register z0 at the maximum in number after masking with the p register (operation S42).
Then, the deviation computation processing unit 12 loads the bit string corresponding to the conditional expression X′ “x0∧not(x1)” into the p register at the maximum in number (operation S43). The deviation computation processing unit 12 copies the average value Y1X′ave to the SIMD register z0 at the maximum in number after masking with the p register (operation S44).
Then, the deviation computation processing unit 12 loads the bit string corresponding to the condition x0 common to the conditional expressions X and X′ into the p register at the maximum in number (operation S45). The deviation computation processing unit 12 loads the observed value matrix Y1 into the SIMD register z1 at the maximum in number (operation S46). Then, after masking with the p register, the deviation computation processing unit 12 subtracts the SIMD register z0 from the SIMD register z1 and stores the subtracted SIMD register z1 in a primary array S1tmp (operation S47). For example, the deviation computation processing unit 12 computes the “observed value matrix Y1−Y1ave” for the conditional expressions X and X′ and stores the computed “observed value matrix Y1−Y1ave” in the primary array S1tmp.
Subsequently, after processing all the elements of the observed value matrix Y1, the deviation computation processing unit 12 loops through operations S48 to S54 until all elements of the observed value matrix Y2 are processed in order. The deviation computation processing unit 12 loads the bit string corresponding to the conditional expression X “x0∧x1” into the p register at the maximum in number (operation S48). The deviation computation processing unit 12 copies the average value Y2Xave to the SIMD register z0 at the maximum in number after masking with the p register (operation S49).
Then, the deviation computation processing unit 12 loads the bit string corresponding to the conditional expression X′ “x0∧not(x1)” into the p register at the maximum in number (operation S50). The deviation computation processing unit 12 copies the average value Y2X′ave to the SIMD register z0 at the maximum in number after masking with the p register (operation S51).
Then, the deviation computation processing unit 12 loads the bit string corresponding to the condition x0 common to the conditional expressions X and X′ into the p register at the maximum in number (operation S52). The deviation computation processing unit 12 loads the observed value matrix Y2 into the SIMD register z1 at the maximum in number (operation S53). Then, after masking with the p register, the deviation computation processing unit 12 subtracts the SIMD register z0 from the SIMD register z1 and stores the subtracted SIMD register z1 in a primary array S2tmp (operation S54). For example, the deviation computation processing unit 12 computes the “observed value matrix Y2−Y2ave” for the conditional expressions X and X′ and stores the computed “observed value matrix Y2−Y2ave” in the primary array S2tmp.
[Flowchart of Correlation Coefficient Computation and Determination Process]
As illustrated in
The correlation coefficient computation unit 13 loads the bit string corresponding to the conditional expression X “x0∧x1” into the p register at the maximum in number (operation S63). After masking with the p register, the correlation coefficient computation unit 13 squares the SIMD register z0 and adds the squared SIMD register z0 to S1X (operation S64). For example, the correlation coefficient computation unit 13 computes “sum{(observed value matrix Y1−Y1ave)²}” corresponding to the conditional expression X.
After masking with the p register, the correlation coefficient computation unit 13 squares the SIMD register z1 and adds the squared SIMD register z1 to S2X (operation S65). For example, the correlation coefficient computation unit 13 computes “sum{(observed value matrix Y2−Y2ave)²}” corresponding to the conditional expression X. After masking with the p register, the correlation coefficient computation unit 13 multiplies the SIMD registers z0 and z1 and adds the multiplied SIMD registers z0 and z1 to S12X (operation S66). For example, the correlation coefficient computation unit 13 computes “sum{(observed value matrix Y1−Y1ave)×(observed value matrix Y2−Y2ave)}” corresponding to the conditional expression X.
The correlation coefficient computation unit 13 loads the bit string corresponding to the conditional expression X′ “x0∧not(x1)” into the p register at the maximum in number (operation S67). After masking with the p register, the correlation coefficient computation unit 13 squares the SIMD register z0 and adds the squared SIMD register z0 to S1X′ (operation S68). For example, the correlation coefficient computation unit 13 computes “sum{(observed value matrix Y1−Y1ave)²}” corresponding to the conditional expression X′.
After masking with the p register, the correlation coefficient computation unit 13 squares the SIMD register z1 and adds the squared SIMD register z1 to S2X′ (operation S69). For example, the correlation coefficient computation unit 13 computes “sum{(observed value matrix Y2−Y2ave)²}” corresponding to the conditional expression X′. After masking with the p register, the correlation coefficient computation unit 13 multiplies the SIMD registers z0 and z1 and adds the multiplied SIMD registers z0 and z1 to S12X′ (operation S70). For example, the correlation coefficient computation unit 13 computes “sum{(observed value matrix Y1−Y1ave)×(observed value matrix Y2−Y2ave)}” corresponding to the conditional expression X′.
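The accumulation in operations S63 to S70 can be sketched as a single pass over the deviation arrays that routes each element's squares and product to the accumulators of whichever condition selects it. This is a scalar illustration under stated assumptions: the function name `accumulate_sums`, the dictionary layout, and the sample deviations are hypothetical and not taken from the embodiment.

```python
def accumulate_sums(s1tmp, s2tmp, mask_x, mask_x_neg):
    """Return [S1, S2, S12] for X and X' from the deviation arrays."""
    sums = {"X": [0.0, 0.0, 0.0], "X'": [0.0, 0.0, 0.0]}
    for d1, d2, in_x, in_x_neg in zip(s1tmp, s2tmp, mask_x, mask_x_neg):
        for key, selected in (("X", in_x), ("X'", in_x_neg)):
            if selected:  # plays the role of masking with the p register
                acc = sums[key]
                acc[0] += d1 * d1  # S1:  sum of squared Y1 deviations
                acc[1] += d2 * d2  # S2:  sum of squared Y2 deviations
                acc[2] += d1 * d2  # S12: sum of deviation products
    return sums


# Hypothetical deviation arrays (S1tmp, S2tmp) and condition masks.
s1tmp = [-1.0, -2.0, 1.0, 2.0, 0.0]
s2tmp = [-2.0, 1.0, 2.0, -1.0, 0.0]
mask_x = [1, 0, 1, 0, 0]
mask_x_neg = [0, 1, 0, 1, 0]
sums = accumulate_sums(s1tmp, s2tmp, mask_x, mask_x_neg)
```

For this sample data, `sums["X"]` is `[2.0, 8.0, 4.0]` and `sums["X'"]` is `[8.0, 2.0, -4.0]`.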
Subsequently, after processing all the elements of the primary arrays S1tmp and S2tmp, the correlation coefficient computation unit 13 computes a correlation coefficient R12X from S1X, S2X, and S12X corresponding to the conditional expression X, and the determination unit 14 performs the determination process as to the threshold value (operation S71). For example, the correlation coefficient computation unit 13 computes the correlation coefficient R12X based on expression (1). Then, the determination unit 14 determines whether or not the correlation coefficient R12X exceeds the threshold value and holds the determination result. In addition, the correlation coefficient computation unit 13 computes a correlation coefficient R12X′ from S1X′, S2X′, and S12X′ corresponding to the conditional expression X′, and the determination unit 14 performs the determination process as to the threshold value (operation S72). For example, the correlation coefficient computation unit 13 computes the correlation coefficient R12X′ based on expression (1). Then, the determination unit 14 determines whether or not the correlation coefficient R12X′ exceeds the threshold value and holds the determination result.
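Operations S71 and S72 can be sketched as below. Expression (1) is not reproduced in this passage, so the sketch assumes it is the standard Pearson form R = S12 / sqrt(S1 × S2). The function names, the zero-denominator handling, and the threshold value 0.8 are likewise assumptions made for illustration.

```python
import math


def correlation(s1, s2, s12):
    """Pearson-style coefficient from the three accumulated sums (assumed form)."""
    denom = math.sqrt(s1 * s2)
    # Returning 0.0 when either variance is zero is a choice of this sketch.
    return s12 / denom if denom > 0 else 0.0


def exceeds_threshold(r, threshold):
    """Determination step: True when the coefficient exceeds the threshold."""
    return r > threshold


# Hypothetical sums for condition X and condition X'.
r_x = correlation(2.0, 8.0, 4.0)       # S1X, S2X, S12X
r_x_neg = correlation(8.0, 2.0, -4.0)  # S1X', S2X', S12X'
result_x = exceeds_threshold(r_x, 0.8)
result_x_neg = exceeds_threshold(r_x_neg, 0.8)
```

With these sums, `r_x` is 1.0 and `r_x_neg` is -1.0, so only condition X passes a threshold of 0.8.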
Thereafter, the correlation coefficient computation unit 13 changes the pair of the observed value matrices Y1 and Y2 and computes the correlation coefficients corresponding to the conditional expressions X and X′. Then, the determination unit 14 determines whether or not there are n, which is designated in advance, or more observed value pairs whose correlation coefficients exceed the threshold value and, when the determination condition is met, the conditional expression used for extraction is saved.
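The final determination over all observed value pairs can be sketched as a count against the designated number n. The function name `should_save_condition` and the sample coefficients are hypothetical. The sketch assumes the per-pair coefficients for one conditional expression have already been collected into a list.

```python
def should_save_condition(coefficients, threshold, n):
    """True when at least n observed value pairs exceed the threshold."""
    hits = sum(1 for r in coefficients if r > threshold)
    return hits >= n


# Hypothetical coefficients for three (Y1, Y2) pairs under one condition.
pair_coefficients = [0.9, 0.2, 0.95]
save_with_n2 = should_save_condition(pair_coefficients, 0.8, 2)
save_with_n3 = should_save_condition(pair_coefficients, 0.8, 3)
```

Here two of the three pairs exceed 0.8, so the condition would be saved for n = 2 but not for n = 3.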
In this manner, the correlation coefficient computation unit 13 is allowed to perform computation of S12X, S1X, and S2X corresponding to the condition X collectively with computation of S12X′, S1X′, and S2X′ corresponding to the condition X′ and accordingly, may compute the correlation coefficients for the conditions X and X′ efficiently. As a result, the correlation coefficient computation unit 13 may perform processes up to the determination of the conditional expression more quickly than in the prior art.
According to the above embodiment, the information processing device 1 calculates a first average value of remaining elements after masking, with a first condition, first column data obtained by taking out values of a first attribute from tabular data in which the values of a plurality of attributes that each sample has are accumulated for each sample. The information processing device 1 calculates a second average value of the remaining elements after masking the first column data with a second condition that negates the first condition. The information processing device 1 loads the first column data into a first register. The information processing device 1 loads the values obtained by masking the first average value with the first condition and the values obtained by masking the second average value with the second condition, into a second register. The information processing device 1 uses arithmetic logical units with the first register and the second register as inputs, to perform first subtraction between a value sequence loaded into the first register and the value sequence loaded into the second register. The information processing device 1 performs second subtraction on second column data obtained by taking out the values of a second attribute different from the first attribute, by a method same as the method in the process of performing the first subtraction. Then, the information processing device 1 calculates correlation coefficients between the first column data and the second column data for the first condition and the second condition, using a first value sequence obtained by the first subtraction and a second value sequence obtained by the second subtraction. 
According to such a configuration, when locating a condition for extracting a sample group having a pair of correlated attributes, the information processing device 1 may efficiently compute the correlation coefficients between the pair of attributes for the first condition and the second condition that negates the first condition, by utilizing the arithmetic logical units. For example, the arithmetic logical units that would remain inactive if each condition were processed individually may be used for the subtraction performed in computing the correlation coefficients of the pair of attributes for the first condition and the second condition, which enables efficient computation. As a result, the information processing device 1 may improve the CPU utilization rate.
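The first and second average values described above can be sketched as masked means over one column. This is an illustration only: the function name `masked_average` and the sample column are assumptions. The second condition is formed as in the embodiment's example, X′ = x0∧not(x1), so that the two conditions share the common part x0.

```python
def masked_average(column, mask):
    """Average of the elements that remain after masking; None if none remain."""
    selected = [v for v, m in zip(column, mask) if m]
    return sum(selected) / len(selected) if selected else None


# Hypothetical column data for one attribute and two condition bit strings.
x0 = [1, 1, 1, 1, 0]
x1 = [1, 0, 1, 0, 1]
first_condition = [a & b for a, b in zip(x0, x1)]         # x0 AND x1
second_condition = [a & (1 - b) for a, b in zip(x0, x1)]  # x0 AND NOT x1
column = [2.0, 10.0, 4.0, 14.0, 100.0]
first_average = masked_average(column, first_condition)
second_average = masked_average(column, second_condition)
```

For this column the first average is 3.0 and the second is 12.0. The last sample is excluded from both because it does not satisfy x0.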
In addition, according to the above embodiment, the information processing device 1 loads the first value sequence into the first register and the second register, and uses the arithmetic logical units with the first register and the second register as inputs to calculate first computation of squaring the first value sequence. The information processing device 1 loads the second value sequence into the first register and the second register and uses the arithmetic logical units with the first register and the second register as inputs to calculate second computation of squaring the second value sequence. The information processing device 1 loads the first value sequence into the first register, loads the second value sequence into the second register, and uses the arithmetic logical units with the first register and the second register as inputs to calculate third computation of multiplying the first value sequence and the second value sequence. Then, the information processing device 1 calculates the correlation coefficients between the first column data and the second column data for the first condition and the second condition, using the first computation result, the second computation result, and the third computation result. According to such a configuration, the information processing device 1 is allowed to utilize the arithmetic logical units that have been inactive in individual condition operations, for the computation of multiplying the subtraction results for the correlation coefficients of the pair of attributes for the first condition and the second condition, and may enable efficient computation by utilizing the arithmetic logical units.
In addition, according to the above embodiment, the information processing device 1 uses the predicate register to load the values obtained by copying the first average value while masking with the first condition, and the values obtained by copying the second average value while masking with the second condition, into the second register. According to such a configuration, the information processing device 1 may load the respective average values into the same register, by separately masking with the first condition and the second condition, by using the predicate register.
In addition, according to the above embodiment, the information processing device 1 further uses the predicate register to calculate value sequences by masking each of the first computation result, the second computation result, and the third computation result with the first condition, and uses each of the calculated value sequences to calculate the correlation coefficients between the first column data and the second column data for the first condition. Then, the information processing device 1 uses the predicate register to calculate value sequences by masking each of the first computation result, the second computation result, and the third computation result with the second condition, and uses each of the calculated value sequences to calculate the correlation coefficients between the first column data and the second column data for the second condition. According to such a configuration, the information processing device 1 may break up the computation results by separately masking with the first condition and the second condition, by using the predicate register. As a result, the information processing device 1 may efficiently compute the correlation coefficients of the pair of attributes for the first condition and the second condition.
Note that each illustrated component of the information processing device 1 does not necessarily have to be physically configured as illustrated in the drawings. For example, specific forms of distribution and integration of the information processing device 1 are not limited to the illustrated ones, and the whole or a part of the information processing device 1 may be configured by being functionally or physically distributed and integrated in any units according to various loads, use states, or the like. In addition, the storage unit 20 may be connected through a network as an external device of the information processing device 1.
Furthermore, various types of processing described in the above embodiment may be implemented by a computer such as a personal computer or a workstation executing programs prepared in advance. Thus, in the following, an example of the computer that executes an information processing program that implements functions similar to the functions of the information processing device 1 illustrated in
As illustrated in
The drive device 213 is a device for a removable disk 210, for example. The HDD 205 stores an information processing program 205a and information processing-related information 205b.
The CPU 203 reads the information processing program 205a to load the read information processing program 205a into the memory 201 and executes the loaded information processing program 205a as a process. Such a process corresponds to the respective functional units of the information processing device 1. The information processing-related information 205b corresponds to the observed value list 21 and the condition list 22. Then, for example, the removable disk 210 stores each piece of information such as the information processing program 205a.
Note that the information processing program 205a does not necessarily have to be stored in the HDD 205 from the beginning. For example, the program may be stored in a “portable physical medium” to be inserted into the computer 200, such as a flexible disk (FD), a compact disk read only memory (CD-ROM), a digital versatile disk (DVD), a magneto-optical disk, or an integrated circuit (IC) card. Then, the computer 200 may read the information processing program 205a from such a medium and execute it.
All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that the various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2022-066461 | Apr 2022 | JP | national |