ASSOCIATION RULE GENERATION PROGRAM, DEVICE, AND METHOD

Information

  • Patent Application
  • Publication Number
    20220398464
  • Date Filed
    May 03, 2022
  • Date Published
    December 15, 2022
Abstract
An association rule generation device includes a processor that executes a procedure. The procedure includes acquiring plural combinatorial data each including one or more data values; for each of the combinatorial data, augmenting the combinatorial data with a high level concept data value for each of the one or more data values contained in the combinatorial data; and generating an association rule indicating an association between data values by employing the plural combinatorial data augmented with the high level concept data values.
Description
CROSS-REFERENCE TO RELATED APPLICATION

This application is based on and claims the benefit of priority of the prior Japanese Patent Application No. 2021-097159, filed on Jun. 10, 2021, the entire contents of which are incorporated herein by reference.


FIELD

The embodiments discussed herein are related to an association rule generation program, an association rule generation device, and an association rule generation method.


BACKGROUND

The generation of association rules has hitherto been performed from acquired data. An association rule expresses a relationship of a specific event Y occurring given a specific event X, and is often denoted by use of an arrow as [X→Y]. The X part on the left of the arrow is called an antecedent (also called a precondition), and the Y part is called a consequent (also called a conclusion). For example, consider a convenience store in which person A has bought (bread, milk, jam) and person B has bought (rice balls, green tea, pickles). In cases in which there are many occasions of people purchasing products in the same combinations as these two people, then this enables the association rules of “someone who buys bread also buys milk and jam”, “someone who buys bread and milk also buys jam”, and “someone who buys rice balls and green tea also buys pickles” to be extracted therefrom.


There is technology that, with the objective of raising the interpretability of generated association rules and the like, generates association rules using items that are high level conceptualizations of instances contained in acquired data. The instances here are data values originally contained in the acquired data, such as the products in the above example of a convenience store. There is, for example, a proposal for a system that references an ontology indicating relationships between instances and high level concept items of these instances in order to apply filtering to the instances employed in association rules. In such a system, the instances contained in the acquired data are transformed into high level concept items, and then all the items belonging to a specified hierarchical layer in the ontology are employed to extract an itemset for use in generating an association rule using an apriori algorithm.


RELATED NON-PATENT DOCUMENTS



  • Andrea Bellandi, Barbara Furletti, Valerio Grossi, and Andrea Romei, “Ontology-Driven Association Rule Extraction: A Case Study”, Conference: Proceedings of the International Workshop on Contexts and Ontologies: Representation and Reasoning (C&O:RR) Collocated with the 6th International and Interdisciplinary Conference on Modelling and Using Context (CONTEXT-2007), Roskilde, Denmark, Aug. 21, 2007.



SUMMARY

According to an aspect of the embodiments, an association rule generation program causes a computer to execute processing including: acquiring plural combinatorial data each including one or more data values; for each of the combinatorial data, augmenting the combinatorial data with a high level concept data value for each of the one or more data values contained in the combinatorial data; and generating an association rule indicating an association between data values by employing the plural combinatorial data augmented with the high level concept data values.


The object and advantages of the invention will be realized and attained by means of the elements and combinations particularly pointed out in the claims.


It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are not restrictive of the invention.





BRIEF DESCRIPTION OF DRAWINGS


FIG. 1 is a functional block diagram of an association rule generation device.



FIG. 2 is a diagram illustrating an example of a transaction data set.



FIG. 3 is a diagram illustrating an example of an ontology.



FIG. 4 is a diagram illustrating an example of an augmented transaction data set.



FIG. 5 is a diagram illustrating an example of an exclusion list.



FIG. 6 is a block diagram illustrating a schematic configuration of a computer functioning as an association rule generation device.



FIG. 7 is a flowchart illustrating an example of association rule generation processing.



FIG. 8 is a flowchart illustrating an example of k-item extraction processing.



FIG. 9 is a diagram to explain an advantageous effect of an association rule generation device according to the present exemplary embodiment.



FIG. 10 is a diagram illustrating an example of ontology relating to medical data.





DESCRIPTION OF EMBODIMENTS

Explanation follows regarding an example of an exemplary embodiment according to technology disclosed herein, with reference to the drawings.


As illustrated in FIG. 1, an association rule generation device 10 is input with a transaction data set and an ontology. The association rule generation device 10 uses the input transaction data set and ontology to generate and output an association rule.


The transaction data set is a set of transaction data each including one or more items. FIG. 2 illustrates an example of a transaction data set. The example of FIG. 2 is a representation of plural transaction data contained in the transaction data set, expressed in a table format in which each row (each record) corresponds to a single transaction data. Each of the transaction data is data in which an ID number of the transaction data “TID” is associated with each “item” contained in the transaction data.


An item is an entity contained in the transaction data. For example, in cases in which the transaction data is a receipt relating to a single transaction bill, such as in a convenience store or the like, then each of the purchased products listed on the receipt is an item. Moreover, for example, in cases in which the transaction data is a result of a genetic test on a single patient, then each of the types of mutation contained in the test result is an item. Note that transaction data is an example of combinatorial data of technology disclosed herein, and an item is an example of a data value of technology disclosed herein.


An ontology is data expressing high level concept-low level concept relationships in relation to items. For example, the ontology may be a knowledge graph in which items are represented by nodes, and edges connecting between the nodes correspond to inter-item hierarchical relationships such as an IS-A relationship, a part-of relationship, etc. FIG. 3 illustrates an example of an ontology. In the example of FIG. 3, each of the nodes is represented by a circle, and the symbol inside the circle represents an item corresponding to that node. In the following a node corresponding to item “X” will be denoted by “node X”. Moreover, each of the lowest layer nodes, namely leaf nodes, corresponds to each of the items contained in the transaction data obtained. Each node in a high level hierarchical layer corresponds to an item that is a high level concept of an item corresponding to a low level node to which it is connected.


For example, in a case directed toward transaction data for purchased products as described above, a node a is an item “apple”, a node b is an item “banana”, a node G is an item “fruit”, and a node J is an item “food”, etc. Moreover, for example, in a case directed toward genome data (medical data), nodes a to f are genetic mutations, nodes G, H, and I are each a gene name, and nodes J and K are each a gene family, etc. Note that although FIG. 3 illustrates an example of an ontology with three hierarchical layers, the number of hierarchical layers in the ontology may be two, or may be four or more.


The association rule generation device 10 functionally includes an acquisition section 12, an augmentation section 14, and a generation section 16, as illustrated in FIG. 1.


The acquisition section 12 acquires the transaction data set and the ontology that were input to the association rule generation device 10, and passes these across to the augmentation section 14.


For each of the transaction data contained in the transaction data set passed from the acquisition section 12, the augmentation section 14 augments the transaction data with items that are high level concepts for each of the items contained in the transaction data. Specifically, the augmentation section 14 references the ontology passed from the acquisition section 12, identifies items that are high level concepts for each of the items, and adds the identified items to the corresponding transaction data. In cases in which the ontology is configured with three hierarchical layers or more, the augmentation section 14 sequentially tracks from the lowest layer nodes, which correspond to the items contained in the original transaction data, to the high level layer nodes connected by edges thereto, and identifies high level concept items from the nodes in each of the layers up to the highest layer.


For example, in cases in which TID=2 as illustrated in FIG. 2, b and c are contained as items in the transaction data. With reference to the ontology illustrated in FIG. 3, because node G and node J are nodes in high level layers connected to node b, G and J are high level concept items of item b. Similarly, H and J are high level concept items of item c. The augmentation section 14 accordingly adds items G, H, and J to the transaction data of TID=2. The top of FIG. 4 illustrates an example in which high level concept items have been added to the transaction data illustrated in FIG. 2 with reference to the ontology illustrated in FIG. 3. In FIG. 4 the portion illustrated with a bold frame illustrates the augmented high level concept items.
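The augmentation described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the augmentation section 14. The child-to-parent map `PARENT` is an assumption reconstructed from the relationships mentioned in the text (FIG. 3 itself is not reproduced here), it assumes each item has at most one direct high level concept, and the function names `ancestors` and `augment` are hypothetical.

```python
# Assumed child -> parent map approximating the three-layer ontology of FIG. 3.
# Each item is assumed to have at most one direct high level concept.
PARENT = {"a": "G", "b": "G", "c": "H", "d": "H", "e": "I", "f": "I",
          "G": "J", "H": "J", "I": "K"}


def ancestors(item, parent=PARENT):
    """Return every high level concept reachable from item, nearest first."""
    found = []
    while item in parent:
        item = parent[item]
        found.append(item)
    return found


def augment(transaction, parent=PARENT):
    """Return the items of a transaction plus all of their high level concepts."""
    augmented = set(transaction)
    for item in transaction:
        augmented.update(ancestors(item, parent))
    return augmented


# TID=2 in the text contains b and c, which gain G, H, and J.
print(sorted(augment({"b", "c"})))
```

A set is used for the augmented transaction so that a shared high level concept such as J is added only once.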


In order to facilitate subsequent processing, the augmentation section 14 may convert the transaction data set augmented with the high level concept items into a table in which each of the items is expressed in one-hot format, as illustrated at the bottom of FIG. 4. The augmentation section 14 passes the transaction data set augmented with the high level concept items (hereafter referred to as the “augmented transaction data set”) and the ontology to the generation section 16.
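A one-hot conversion of this kind might look like the following sketch. The function name `one_hot` and the sample transactions are hypothetical and are not the data of FIG. 4; the column order (sorted item names) is likewise an arbitrary choice for illustration.

```python
def one_hot(transactions):
    """Return (columns, rows): sorted item names and one 0/1 row per transaction."""
    columns = sorted(set().union(*transactions))
    rows = [[1 if col in t else 0 for col in columns] for t in transactions]
    return columns, rows


# Hypothetical augmented transactions, not the data of FIG. 4.
augmented = [{"a", "G", "J"}, {"b", "c", "G", "H", "J"}]
columns, rows = one_hot(augmented)
print(columns)
for row in rows:
    print(row)
```

With this layout, the support number of an itemset is the count of rows in which all of its columns hold 1.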


The generation section 16 uses the augmented transaction data set passed from the augmentation section 14 to generate association rules indicating associations between items.


Specifically, the generation section 16 extracts an itemset satisfying a prescribed condition from out of itemsets combining two or more items from the items contained in the augmented transaction data set. The generation section 16 may extract the itemset satisfying the prescribed condition using an apriori algorithm. A detailed explanation is given later regarding the extraction of an itemset using an apriori algorithm. The prescribed condition may be that an index related to appearance frequency of an itemset in the augmented transaction data set is a threshold value or greater. For example, the generation section 16 may apply a support number indicating the number of transaction data in which a given itemset appears in the augmented transaction data set as the index, and extract itemsets for which the support number is a prescribed threshold value or greater.


The generation section 16 generates an association rule to express an antecedent represented by a combination of some items contained in the extracted itemset, and a consequent represented by a combination of remaining items. For example, in cases in which {b, H} is extracted as an itemset, the generation section 16 generates from this itemset the association rules of [b→H] and [H→b]. Moreover, for example, in cases in which {b, G, H} is extracted as an itemset, the generation section 16 generates from this itemset the association rules of [b→(G, H)], [G→(b, H)], [H→(b, G)], [(G, H)→b], [(b, H)→G], and [(b, G)→H].
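The enumeration of antecedent and consequent splits can be illustrated with a short Python sketch. The function name `rules_from_itemset` is hypothetical, and the sketch only enumerates candidate rules; it does not apply any support or confidence threshold.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """Return every (antecedent, consequent) split of the itemset into two non-empty parts."""
    items = sorted(itemset)
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            rules.append((antecedent, consequent))
    return rules


for antecedent, consequent in rules_from_itemset({"b", "G", "H"}):
    print(antecedent, "->", consequent)
```

An itemset of n items yields 2^n − 2 rules, which gives the two rules for {b, H} and the six rules for {b, G, H} listed above.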


High level concept items of the original item are added to the augmented transaction data set. This enables association rules that employ items expressed as high level concepts to be generated by generating association rules from itemsets combining items contained in the augmented transaction data set. This raises the interpretability and generality of the association rules generated thereby. Namely, association rules are generated having good predictability and a higher general validity.


However, in cases in which association rules are simply generated from itemsets combining items contained in the augmented transaction data set, association rules expressing known hierarchical relationships between items are also generated, such as [b→G] and [b→J]. Moreover, association rules in an inclusion relationship, such as [(J, H)→f] and [H→f], are both generated, although generating only one thereof would be sufficient. It is preferable that the association rules reveal relationships between items that have hitherto gone unnoticed, and, in cases in which the association rules are to be used in subsequent investigations, it is preferable that redundant association rules such as those described above are not generated.


In order to address this, the generation section 16 adds a further condition to the above prescribed condition, namely that the extracted itemset does not include a combination of items having a high level concept-low level concept relationship. Specifically, based on the ontology, the generation section 16 produces an exclusion list of combinations of items that have a high level concept-low level concept relationship. From out of the itemsets combining items included in the augmented transaction data set, the generation section 16 then excludes from extraction any itemset containing a combination present in the exclusion list.


For example, for each item the generation section 16 sequentially tracks from nodes corresponding to each item in the ontology through high level layer nodes connected by edges thereto, and produces an exclusion list listing respective pairs of high level concept items identified from the nodes in each of the layers up to the highest layer. FIG. 5 illustrates an example of an exclusion list. In the example illustrated in FIG. 5, for example “list(a)=(G,J)” means that an itemset containing a combination of items a and G is excluded, and an itemset containing a combination of items a and J is excluded.


Thus for the itemsets {b, G} and {b, J}, association rules expressing known hierarchical relationships between items, such as [b→G] and [b→J] mentioned above, are not generated due to being excluded based on “list(b)=(G,J)” in the exclusion list. Moreover, for the itemset {f, J, H}, an association rule such as [(J, H)→f] is not generated due to being excluded based on “list(H)=(J)” in the exclusion list. However, for the itemset {f, H}, an association rule such as [H→f] is generated due to not performing exclusion based on the exclusion list. Thus the association rule [H→f] is generated alone from out of the inclusion relationships [(J, H)→f] and [H→f].
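The production and application of an exclusion list can be sketched as follows. This is an illustrative sketch only: the map `PARENT` is an assumption reconstructed from the relationships mentioned in the text rather than a copy of FIG. 3, and the names `build_exclusion_list` and `is_excluded` are hypothetical.

```python
# Assumed child -> parent map approximating the ontology of FIG. 3.
PARENT = {"a": "G", "b": "G", "c": "H", "d": "H", "e": "I", "f": "I",
          "G": "J", "H": "J", "I": "K"}


def build_exclusion_list(parent):
    """Map each item to the list of its high level concepts, nearest first."""
    exclusion = {}
    for item in parent:
        chain = []
        node = item
        while node in parent:
            node = parent[node]
            chain.append(node)
        exclusion[item] = chain
    return exclusion


def is_excluded(itemset, exclusion):
    """True when the itemset pairs an item with one of its own high level concepts."""
    return any(high in itemset
               for low in itemset
               for high in exclusion.get(low, ()))


exclusion = build_exclusion_list(PARENT)
print(exclusion["b"])                           # list(b) = (G, J)
print(is_excluded({"f", "J", "H"}, exclusion))  # H and J are paired
print(is_excluded({"f", "H"}, exclusion))       # no hierarchical pair
```

Items in the highest layer (J and K here) have no entry of their own, which is why the lookup falls back to an empty tuple.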


Moreover, the generation section 16 may output, as the final generated association rules, any association rule, from among the generated association rules, for which the prescribed index is a threshold value or greater. The index may, for example, be a support level of supp (X→Y), or a confidence level of conf (X→Y), or a lift of lift (X→Y), or a combination thereof. Note that X is an antecedent of the association rule in the itemset, and Y is a consequent of the association rule in the itemset.








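$$\mathrm{supp}(X \rightarrow Y) = \sigma(X \cup Y) / M$$

$$\mathrm{conf}(X \rightarrow Y) = \sigma(X \cup Y) / \sigma(X) = \mathrm{supp}(X \rightarrow Y) / \mathrm{supp}(X)$$

$$\mathrm{lift}(X \rightarrow Y) = \mathrm{conf}(X \rightarrow Y) / \mathrm{supp}(Y)$$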







In the above equations, M is the number of transaction data contained in the augmented transaction data set, σ(X∪Y) is the number of transaction data that contain both itemset X and itemset Y, and σ(X) is the number of transaction data that contain itemset X.
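As a worked illustration of these three indices, the following Python sketch computes them with exact fractions. The function names and the four sample transactions are hypothetical (they are not the data of FIG. 4), and the sketch assumes that X and Y each appear in at least one transaction so that no division by zero occurs.

```python
from fractions import Fraction


def sigma(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if set(itemset) <= set(t))


def supp(x, y, transactions):
    """supp(X -> Y) = sigma(X u Y) / M."""
    return Fraction(sigma(set(x) | set(y), transactions), len(transactions))


def conf(x, y, transactions):
    """conf(X -> Y) = sigma(X u Y) / sigma(X)."""
    return Fraction(sigma(set(x) | set(y), transactions), sigma(x, transactions))


def lift(x, y, transactions):
    """lift(X -> Y) = conf(X -> Y) / supp(Y), with supp(Y) = sigma(Y) / M."""
    supp_y = Fraction(sigma(y, transactions), len(transactions))
    return conf(x, y, transactions) / supp_y


# Hypothetical augmented transactions, not the data of FIG. 4.
data = [{"b", "H"}, {"b", "H"}, {"b"}, {"H"}]
print(supp({"b"}, {"H"}, data), conf({"b"}, {"H"}, data), lift({"b"}, {"H"}, data))
# prints 1/2 2/3 8/9
```

Here σ(X∪Y)=2, σ(X)=3, σ(Y)=3, and M=4, so the lift of 8/9 is below 1, meaning that in this sample b and H co-occur slightly less often than would be expected if they were independent.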


For example, for the prescribed condition employed when extracting itemsets as described above, itemsets having a support number of a threshold value or greater are extracted, and from among association rules generated from the extracted itemsets, association rules having a confidence level of a threshold value or greater are output. This results in an association rule satisfying a minimum support level (a threshold value support level) and a minimum confidence level (a threshold value confidence level) being output in this case.


The association rule generation device 10 may, for example, be implemented by a computer 40 as illustrated in FIG. 6. The computer 40 includes a central processing unit (CPU) 41, memory 42 serving as a temporary storage area, and a non-volatile storage section 43. The computer 40 also includes an input/output device 44 such as an input section, display section, or the like, and a read/write (R/W) section 45 for controlling reading and writing of data from/to a storage medium 49. The computer 40 also includes a communication interface (I/F) 46 connected to a network such as the internet. The CPU 41, the memory 42, the storage section 43, the input/output device 44, the R/W section 45, and the communication I/F 46 are connected to each other through a bus 47.


The storage section 43 may be implemented by a hard disk drive (HDD), a solid state drive (SSD), flash memory, or the like. An association rule generation program 50 to cause the computer 40 to function as the association rule generation device 10 is stored in the storage section 43 serving as a storage medium. The association rule generation program 50 includes an acquisition process 52, an augmentation process 54, and a generation process 56.


The CPU 41 reads the association rule generation program 50 from the storage section 43, expands the association rule generation program 50 into the memory 42, and sequentially executes the processes of the association rule generation program 50. The CPU 41 operates as the acquisition section 12 illustrated in FIG. 1 by executing the acquisition process 52. The CPU 41 operates as the augmentation section 14 illustrated in FIG. 1 by executing the augmentation process 54. The CPU 41 operates as the generation section 16 illustrated in FIG. 1 by executing the generation process 56. The computer 40 executing the association rule generation program 50 accordingly functions as the association rule generation device 10. Note that the CPU 41 executing the program is hardware.


Note that the functions implemented by the association rule generation program 50 may also be implemented, for example, by a semiconductor integrated circuit, or more specifically by an application specific integrated circuit (ASIC).


Next, description follows regarding operation of the association rule generation device 10 according to the present exemplary embodiment. A transaction data set and an ontology are input to the association rule generation device 10, and then, when instructed to generate an association rule, the association rule generation device 10 executes the association rule generation processing illustrated in FIG. 7. Note that association rule generation processing is an example of an association rule generation method of technology disclosed herein.


At step S10, the acquisition section 12 acquires the transaction data set and the ontology input to the association rule generation device 10, and passes them to the augmentation section 14. Consider here a case in which the transaction data set illustrated in FIG. 2 and the ontology illustrated in FIG. 3 have been acquired.


Next, at step S20, the augmentation section 14 references the ontology passed from the acquisition section 12, identifies high level concept items for each of the items contained in the transaction data, and adds these identified items to the corresponding transaction data. An augmented transaction data set such as illustrated in FIG. 4 is accordingly obtained. The augmentation section 14 passes the augmented transaction data set and the ontology to the generation section 16.


Next, at step S30, k-item extraction processing is executed. A k-item is an itemset combining k individual items from out of the items contained in the augmented transaction data set. The k-item extraction processing will be described in detail later, with reference to FIG. 8. Note that the k-item extraction processing is processing in which an apriori algorithm as mentioned above is applied.


At step S32 of the k-item extraction processing (FIG. 8), the generation section 16 extracts items for which the support number is the threshold value or greater from the items contained in the augmented transaction data set passed from the augmentation section 14, and adds these to a set L1. If the support number threshold value is 3, then L1={b, G, H, J}.


Next, at step S34, the generation section 16 produces an exclusion list such as illustrated in FIG. 5 based on the ontology passed from the augmentation section 14. Next, at step S36, the generation section 16 sets k to 2.


Next, at step S38, the generation section 16 extracts candidates for a k-item from the (k−1)-items contained in the set L(k−1). Note that any (k−1)-item not contained in L(k−1) has a support number less than the threshold value, and so the support number of a k-item including such a (k−1)-item would also necessarily be less than the threshold value. This means that the processing of the present step may simply be performed on the (k−1)-items contained in the set L(k−1). (b, G), (b, H), (b, J), (G, H), (G, J), and (H, J) are accordingly extracted here as 2-item candidates.


Next, at step S40, the generation section 16 applies the exclusion list to the k-item candidates extracted at step S38, and the remaining non-excluded k-item candidates are added to the set Ck. This results in C2={(b, H), (G, H)}, due to (b, G), (b, J), (G, J), and (H, J) being excluded based on the exclusion list.


Next, at step S42, the generation section 16 extracts, as k-items, candidates for which the support number is the threshold value or greater from out of the k-item candidates contained in the set Ck, and adds these to the set Lk. In this case the 2-items (b, H) and (G, H) both have a support number of 3, and so L2={(b, H), (G, H)}.


Next, at step S44, the generation section 16 determines whether or not any k-item was extracted at step S42. In cases in which a k-item was extracted, processing transitions to step S46, where the generation section 16 increments k by 1, and processing then returns to step S38. In cases in which no k-item was extracted, the k-item extraction processing is ended, and processing returns to the association rule generation processing (FIG. 7). In the present example, 2-items were extracted, and so processing returns to step S38 with k=3.


At step S38 for k=3, the generation section 16 extracts the 3-item candidate (b, G, H) from a combination of the items contained in L2={(b, H), (G, H)}. Next, at step S40, the generation section 16 applies the exclusion list to (b, G, H), and (b, G, H) is excluded due to containing the pair b and G, as expressed by list(b)=(G, J) in the exclusion list. Thus no 3-item is extracted, negative determination is made at step S44, the k-item extraction processing is ended, and processing returns to the association rule generation processing (FIG. 7).


Note that at step S42, the generation section 16 determines whether or not each k-item candidate includes, from out of the (k−1)-items formed by combinations of the items in that candidate, a (k−1)-item whose support number was found to be less than the threshold value at step S42 the previous time. In a case in which the candidate includes no such (k−1)-item, the generation section 16 extracts the candidate as a k-item and adds it to Lk. However, in a case in which the candidate includes a (k−1)-item having a support number less than the threshold value, the generation section 16 does not extract the candidate as a k-item. This enables simplification of the determination as to whether or not the support number of the k-item is the threshold value or greater.
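The flow of steps S32 to S46 can be sketched in Python as below. This is a simplified sketch under stated assumptions, not the processing of FIG. 8 itself: the map `PARENT` is reconstructed from the relationships mentioned in the text, the raw transactions are hypothetical (not the data of FIG. 2) and were chosen only so that L1 and L2 resemble those of the walk-through, the function names are hypothetical, and the sketch both checks the (k−1)-subsets and counts the support number of every surviving candidate.

```python
from itertools import combinations

# Assumed child -> parent map approximating the ontology of FIG. 3.
PARENT = {"a": "G", "b": "G", "c": "H", "d": "H", "e": "I", "f": "I",
          "G": "J", "H": "J", "I": "K"}


def ancestors(item):
    """Return the set of all high level concepts of item."""
    found = set()
    while item in PARENT:
        item = PARENT[item]
        found.add(item)
    return found


def extract_k_items(transactions, min_support):
    """Return {k: set of frozensets} for each k >= 2 that yields at least one k-item."""
    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def excluded(itemset):
        # True when an item appears together with one of its own high level concepts.
        return any(ancestors(low) & itemset for low in itemset)

    all_items = set().union(*transactions)
    # S32: L1 holds the single items that meet the support threshold.
    previous = {frozenset([i]) for i in all_items
                if support(frozenset([i])) >= min_support}
    extracted = {}
    k = 2  # S36
    while True:
        # S38: join the (k-1)-items of L(k-1) into k-item candidates.
        candidates = {x | y for x in previous for y in previous if len(x | y) == k}
        # S40: apply the exclusion list to obtain Ck.
        c_k = {c for c in candidates if not excluded(c)}
        # S42: keep candidates whose (k-1)-subsets are all in L(k-1)
        # and whose own support number meets the threshold.
        l_k = {c for c in c_k
               if all(frozenset(s) in previous for s in combinations(c, k - 1))
               and support(c) >= min_support}
        if not l_k:  # S44: nothing extracted, so stop
            return extracted
        extracted[k] = l_k
        previous, k = l_k, k + 1  # S46


# Hypothetical raw transactions (not FIG. 2), augmented with high level concepts.
raw = [{"a"}, {"b", "c"}, {"b", "d"}, {"b", "c"}, {"e", "f"}]
augmented = [set(t).union(*(ancestors(i) for i in t)) for t in raw]
result = extract_k_items(augmented, min_support=3)
print({k: sorted(sorted(s) for s in v) for k, v in result.items()})
```

Because an itemset that survives the exclusion list contains no hierarchical pair, none of its subsets is ever removed by the exclusion list either, so a (k−1)-subset missing from L(k−1) can only be missing for lack of support. This is why the subset check remains valid when combined with the exclusion list.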


Next, at step S50 of the association rule generation processing (FIG. 7), the generation section 16 generates association rules for each of the k-items extracted by the k-item extraction processing, namely for each of the k-items included in the sets Lk (k≥2). In the case described above, the k-items of L2={(b, H), (G, H)} are extracted, and so the association rules [b→H], [H→b], [G→H], and [H→G] are generated. The generation section 16 computes the confidence level for each of the generated association rules, and outputs any association rules having a confidence level of the threshold value or greater. In a case in which the confidence level threshold value is 80%, the finally output association rules are [b→H] and [H→G]. The association rule generation processing is ended when these association rules have been output.


As described above, the association rule generation device according to the present exemplary embodiment acquires plural transaction data each containing one or more items. Moreover, for each of the transaction data, the association rule generation device augments the transaction data with high level concept items for each of the one or more items contained in the transaction data. The association rule generation device then employs itemsets combining two or more items contained in the plural transaction data augmented with the high level concept items to generate association rules indicating associations between items. This enables association rules to be generated that appropriately incorporate high level concepts.


For example, consider a case in which itemsets having a minimum support number of three, namely having a support number of three or more, are extracted from the transaction data set prior to augmenting with high level concept items, as illustrated on the left in FIG. 9, and association rules generated therefor. In this case the association rules [b→c] and [b→d] are not generated. However, as illustrated on the right in FIG. 9, in a case in which the transaction data has been augmented by the high level concept items, association rules of [b→H] and [G→H] are generated as high level concepts of [b→c] and [b→d]. The present exemplary embodiment is accordingly able to generate association rules for high level concepts even in cases in which association rules are not generated for low level concepts.


Moreover, in technology such as that of Non-Patent Document 2, in which association rules are generated from high level concept items but only from items in the same specified hierarchical layer, association rules such as [b→H], which can be generated in the present exemplary embodiment, cannot be generated. The present exemplary embodiment enables association rules to be generated from itemsets whose items belong to different hierarchical layers.


Moreover, when extracting itemsets, the association rule generation device according to the present exemplary embodiment applies an exclusion list produced based on the hierarchical relationships of items, and excludes unwanted itemsets. This suppresses the generation of association rules expressing a known hierarchical relationship between items, and the generation of redundant association rules such as association rules in an inclusion relationship. Moreover, processing can be sped up by applying the exclusion list when extracting itemsets, compared to cases in which filtering is performed after association rules have been generated. For example, when s is the average number of items paired with each item in the exclusion list, then when extracting 2-items processing can be performed at s^2 times the speed compared to not employing the exclusion list, and when extracting 3-items processing can be performed at s^3 times the speed compared thereto. Note that s tends to become larger as the hierarchical layers of the data (ontology) indicating the hierarchical relationships between items become deeper.


Explanation follows regarding advantageous effects of the present exemplary embodiment using specific examples. For example, for transaction data in which expressed mutations have been associated with the presence or absence of medical efficacy of a drug, an ontology such as illustrated in FIG. 10 is acquired. The example of FIG. 10 illustrates respective hierarchical relationships for mutation of DNA position a and mutation of DNA position b that are types of mutation of a gene G, for mutation of DNA position c and mutation of DNA position d that are types of mutation of a gene H, and for mutation of DNA position e and mutation of DNA position f that are types of mutation of a gene I. Association rules are generated having combinations of mutations as antecedents and the presence or absence of medical efficacy as consequents. In such a case the combinations of mutations, i.e. the antecedents, may be extracted using the method of item extraction of the exemplary embodiment described above.


Consider the following two phenomena.


(1) mutation of DNA position a does not affect gene function, and the high level gene ceases to function in cases in which other mutations are expressed.


(2) there is a medical efficacy of a given drug in cases in which the gene G and the gene I cease functioning at the same time.


In such cases, association rules of [(mutation of DNA position b, mutation of DNA position e)→medical efficacy present], and [(mutation of DNA position b, mutation of DNA position f)→medical efficacy present] are respectively generated by a method in which the transaction data is not augmented with high level concept items. However, the association rule of [(mutation of DNA position b, mutation of gene I)→medical efficacy present] is able to express the above phenomena using fewer rules and is more easily understood. Namely, an association rule that includes a high level concept is a better logical reflection of the actual mechanism than an association rule composed of low level concepts alone, and is an association rule with good predictability. Moreover, association rules using high level concepts apply to a greater number of transaction data, resulting in association rules that more accurately represent a phenomenon. The method of the present exemplary embodiment is considered to be particularly useful in medical fields because mechanisms of the type described above occur not infrequently in real life.


Moreover, the association rules generated in the present exemplary embodiment are particularly useful in cases in which the validity of data is being verified, and in cases in which an unknown relationship is to be discovered. For example, in relation to medical fields, an association rule generated by the method of the present exemplary embodiment is useful in cases in which the validity of collected data is being checked by comparison against the knowledge of a doctor or the like. Moreover, the method of the present exemplary embodiment is also useful in cases in which a surprising association rule is discovered, and the content indicated by this association rule is confirmed by tests and the like, leading to pure medical discoveries. Moreover, the method of the present exemplary embodiment is also useful in cases of developing an understanding of the mechanism of medical efficacy of a drug based on the association rules generated.


Note that although in the present exemplary embodiment explanation has been given of a case in which the transaction data in table format and the ontology are respectively input and acquired, there is no limitation thereto. For example, items contained in the transaction data may be acquired as data expressed by a knowledge graph combining these items and items having a hierarchical relationship thereto. In such cases the acquired knowledge graph may be transformed into an augmented transaction data set in table format, as illustrated in FIG. 4.
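One way such a transformation might look is sketched below in Python. The edge-list representation, the relation labels `contains` and `is_a`, and all node names are assumptions made for illustration only; the specification does not prescribe a particular knowledge graph encoding.

```python
from collections import defaultdict

# Hypothetical knowledge graph as (source, relation, target) edges.
# "contains" links a transaction node to an item node;
# "is_a" links an item node to its high level concept node.
EDGES = [
    ("T1", "contains", "mut_pos_b"),
    ("T1", "contains", "mut_pos_e"),
    ("T2", "contains", "mut_pos_f"),
    ("mut_pos_e", "is_a", "mut_gene_I"),
    ("mut_pos_f", "is_a", "mut_gene_I"),
]

def graph_to_table(edges):
    """Flatten the graph into {transaction id: sorted list of items},
    where each row also holds the high level concepts of its items."""
    contains = defaultdict(set)
    parents = defaultdict(set)
    for src, rel, dst in edges:
        if rel == "contains":
            contains[src].add(dst)
        elif rel == "is_a":
            parents[src].add(dst)

    table = {}
    for tid, items in contains.items():
        row = set(items)
        stack = list(items)
        while stack:  # follow is_a edges upward through the hierarchy
            for parent in parents.get(stack.pop(), ()):
                if parent not in row:
                    row.add(parent)
                    stack.append(parent)
        table[tid] = sorted(row)
    return table

table = graph_to_table(EDGES)
for tid, row in table.items():
    print(tid, row)
```

Each row of the resulting table is an augmented transaction, so it can be passed to the same rule generation step as table-format input augmented with reference to an ontology.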


Moreover, although in the present exemplary embodiment the association rule generation program is described as being in a format pre-stored (installed) in a storage section, there is no limitation thereto. The program according to the technology disclosed herein may be provided in a format stored on a storage medium such as a CD-ROM, DVD-ROM, USB memory, or the like.


There is an issue with related technology in that association rules are not able to be generated using itemsets spanning different hierarchical layers of an ontology. It is conceivable to generate an association rule using the items contained in the acquired data, and then to obtain a high level concept of the generated association rule using an ontology as in the related technology. However, in such cases a high level concept association rule cannot be generated for any itemset that was not itself extracted, at the low level concept stage, as an itemset for generating an association rule.
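The following Python sketch illustrates this issue under stated assumptions. It uses a brute-force enumeration of itemsets rather than the Apriori algorithm, a hypothetical `PARENT` mapping in place of an ontology, hypothetical sample transactions, and an arbitrarily chosen support threshold of 0.5. Itemsets that contain an item together with its own high level concept are skipped.

```python
from itertools import combinations

# Hypothetical ontology: low level item -> high level concept.
PARENT = {"mut_pos_e": "mut_gene_I", "mut_pos_f": "mut_gene_I"}

def ancestors(item):
    """All high level concepts above item in the hypothetical hierarchy."""
    out = set()
    while item in PARENT:
        item = PARENT[item]
        out.add(item)
    return out

def frequent_itemsets(transactions, min_support, size):
    """Brute-force search for itemsets of the given size whose support
    exceeds min_support (a simple stand-in, not the Apriori algorithm)."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for combo in combinations(items, size):
        # Skip sets pairing an item with its own high level concept.
        if any(a in ancestors(b) for a in combo for b in combo):
            continue
        count = sum(set(combo) <= t for t in transactions)
        if count / n > min_support:
            result[combo] = count / n
    return result

# Hypothetical sample data (not taken from the specification).
raw = [
    {"mut_pos_b", "mut_pos_e", "efficacy"},
    {"mut_pos_b", "mut_pos_f", "efficacy"},
    {"mut_pos_b", "mut_pos_e", "efficacy"},
    {"mut_pos_b", "mut_pos_f", "efficacy"},
    {"mut_pos_a"},
    {"mut_pos_a"},
]
augmented = [t | set().union(*(ancestors(i) for i in t)) for t in raw]

before = frequent_itemsets(raw, min_support=0.5, size=3)
after = frequent_itemsets(augmented, min_support=0.5, size=3)
print(before)
print(after)
```

With the low level items alone, no three-item set passes the threshold, so there is no extracted rule to generalize afterwards. When the transactions are augmented before extraction, the set combining position b, gene I, and efficacy passes the threshold and becomes available for rule generation.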


The technology disclosed herein enables association rules to be generated that appropriately incorporate high level concepts.


All publications, patent applications and technical standards mentioned in the present specification are incorporated by reference in the present specification to the same extent as if each individual publication, patent application, or technical standard was specifically and individually indicated to be incorporated by reference.


All examples and conditional language provided herein are intended for the pedagogical purposes of aiding the reader in understanding the invention and the concepts contributed by the inventor to further the art, and are not to be construed as limitations to such specifically recited examples and conditions, nor does the organization of such examples in the specification relate to a showing of the superiority and inferiority of the invention. Although one or more embodiments of the present invention have been described in detail, it should be understood that various changes, substitutions, and alterations could be made hereto without departing from the spirit and scope of the invention.

Claims
  • 1. A non-transitory recording medium storing an association rule generation program executable by a computer to perform processing, the processing comprising: acquiring a plurality of items of combinatorial data including one or more data values; for each of the items of combinatorial data, augmenting the combinatorial data with a high level concept data value for each of the one or more data values contained in the combinatorial data; and generating an association rule indicating an association between data values by employing the plurality of items of combinatorial data augmented with the high level concept data values.
  • 2. The non-transitory recording medium of claim 1, wherein, in the processing, the association rule is generated from a data value set satisfying a prescribed condition among data value sets containing two or more of the data values contained in the plurality of items of combinatorial data augmented by the high level concept data values.
  • 3. The non-transitory recording medium of claim 2, wherein, in the processing, the prescribed condition is that data value combinations corresponding to a high level concept/low level concept relationship are not included in the data value set.
  • 4. The non-transitory recording medium of claim 2, wherein, in the processing, a data value set satisfying the prescribed condition is extracted using an a priori algorithm, and an association rule is generated that is expressed with an antecedent expressed by a combination of one or more data values contained in the extracted data value set and a consequent expressed by a combination of remaining data values.
  • 5. The non-transitory recording medium of claim 2, wherein, in the processing, the prescribed condition is that an index related to appearance frequency of the data value set in the plurality of items of combinatorial data exceeds a threshold value.
  • 6. The non-transitory recording medium of claim 1, wherein, in the processing: data expressed in a table format of data values contained in each of the items of combinatorial data is acquired as the plurality of items of combinatorial data; and data values that are high level concepts of the data values are added to the items of combinatorial data with reference to pre-prepared high level concept/low level concept relationships related to the data values.
  • 7. The non-transitory recording medium of claim 1, wherein, in the processing: a knowledge graph acquired as the plurality of items of combinatorial data includes data values contained in each item of combinatorial data and respective data values indicating a high level concept/low level concept relationship to the data values expressed as nodes, and includes relationships between the data values expressed by edges connecting between the nodes; and the plurality of items of combinatorial data augmented with high level concept data values for the data values is acquired by transforming the knowledge graph into data expressed in a table format of data values contained in each of the items of combinatorial data and the high level concept data values for the data values.
  • 8. An association rule generation device, comprising: a memory; and a processor coupled to the memory, the processor being configured to: acquire a plurality of items of combinatorial data including one or more data values; for each of the items of combinatorial data, augment the combinatorial data with a high level concept data value for each of the one or more data values contained in the combinatorial data; and generate an association rule indicating an association between data values by employing the plurality of items of combinatorial data augmented with the high level concept data values.
  • 9. The association rule generation device of claim 8, wherein the processor is further configured to generate the association rule from a data value set satisfying a prescribed condition among data value sets containing two or more of the data values contained in the plurality of items of combinatorial data augmented by the high level concept data values.
  • 10. The association rule generation device of claim 9, wherein the prescribed condition is that data value combinations corresponding to a high level concept/low level concept relationship are not included in the data value set.
  • 11. The association rule generation device of claim 9, wherein the processor is further configured to extract a data value set satisfying the prescribed condition using an a priori algorithm, and generate an association rule expressed with an antecedent expressed by a combination of one or more data values contained in the extracted data value set and a consequent expressed by a combination of remaining data values.
  • 12. The association rule generation device of claim 9, wherein the prescribed condition is that an index related to appearance frequency of the data value set in the plurality of items of combinatorial data exceeds a threshold value.
  • 13. The association rule generation device of claim 8, wherein the processor is further configured to: acquire data expressed in a table format of data values contained in each of the items of combinatorial data as the plurality of items of combinatorial data; and add data values that are high level concepts of the data values to the items of combinatorial data with reference to pre-prepared high level concept/low level concept relationships related to the data values.
  • 14. The association rule generation device of claim 8, wherein the processor is further configured to: acquire, as the plurality of items of combinatorial data, a knowledge graph including data values contained in each item of combinatorial data and respective data values indicating a high level concept/low level concept relationship to the data values expressed as nodes, and including relationships between the data values expressed by edges connecting between the nodes; and acquire the plurality of items of combinatorial data augmented with high level concept data values for the data values by transforming the knowledge graph into data expressed in a table format of data values contained in each of the items of combinatorial data and the high level concept data values for the data values.
  • 15. An association rule generation method, comprising: acquiring a plurality of items of combinatorial data including one or more data values; by a processor, for each of the items of combinatorial data, augmenting the combinatorial data with a high level concept data value for each of the one or more data values contained in the combinatorial data; and generating an association rule indicating an association between data values by employing the plurality of items of combinatorial data augmented with the high level concept data values.
  • 16. The association rule generation method of claim 15, wherein the association rule is generated from a data value set satisfying a prescribed condition among data value sets containing two or more of the data values contained in the plurality of items of combinatorial data augmented by the high level concept data values.
  • 17. The association rule generation method of claim 16, wherein the prescribed condition is that data value combinations corresponding to a high level concept/low level concept relationship are not included in the data value set.
  • 18. The association rule generation method of claim 16, wherein a data value set satisfying the prescribed condition is extracted using an a priori algorithm, and an association rule is generated expressed with an antecedent expressed by a combination of one or more data values contained in the extracted data value set and a consequent expressed by a combination of remaining data values.
  • 19. The association rule generation method of claim 16, wherein the prescribed condition is that an index related to appearance frequency of the data value set in the plurality of items of combinatorial data exceeds a threshold value.
Priority Claims (1)
Number Date Country Kind
2021-097159 Jun 2021 JP national