An embodiment of the present invention will be described below with reference to the drawings.
As shown in
The present embodiment can accurately and quickly detect a sequential pattern following a variation in the event belonging to the same attribute, in sequential data in which elements composed of plural events are sequentially arranged.
Before description, several terms used in the specification are described below. The elements composed of plural events and sequentially arranged are assumed to be a sequential pattern. The number of elements contained in the sequential pattern is assumed to be a sequence size of the sequential pattern. The sequential pattern with a sequence size of “i” is called an ith-length sequential pattern. For example,
An example of the process performed by the sequential pattern detecting apparatus in accordance with the present embodiment will now be described. The sequential data storage unit 1 stores sequential data for subjects P1 to P3 recorded in 2000 to 2002 as shown in
As shown in
The event detecting process in step Sa0 will be described below in detail with reference to
First, the event detecting unit 100 refers to the sequential data storage unit 1 to determine whether any sequential data can still be retrieved (step Sa1). If the sequential data storage unit 1 stores any unretrieved sequential data (the result of step Sa1 is “YES”), the sequential data decomposing unit 2 retrieves one unretrieved item of sequential data from the sequential data storage unit 1. The process then proceeds to step Sa2. If all the sequential data have been retrieved (the result of step Sa1 is “NO”), the event detecting process in step Sa0 ends and the process proceeds to the event set detecting process in step Sb0. Specifically, when sequential data are retrieved for the first time, the sequential data decomposing unit 2 retrieves the sequential data for the subject P1 from the sequential data storage unit 1. The process then proceeds to step Sa2. If all the sequential data for the subjects P1 to P3 have already been retrieved, the event detecting process in step Sa0 ends. The process then proceeds to the event set detecting process in step Sb0.
In step Sa2, the event detecting unit 100 refers to the sequential data retrieved in step Sa1 to determine whether any element can still be retrieved. If the sequential data contains any unretrieved element (the result of step Sa2 is “YES”), the sequential data decomposing unit 2 retrieves one unretrieved element from among the elements forming the sequential data retrieved in step Sa1. The process then proceeds to step Sa3. Otherwise (the result of step Sa2 is “NO”), the process returns to step Sa1. Specifically, when an element is extracted for the first time from the sequential data for the subject P1 retrieved in step Sa1, the element “blood pressure=G, exercise=G, sugar content=G” recorded in 2000 for the subject P1 is retrieved. The process then proceeds to step Sa3. If the elements recorded in 2000 to 2002 for the subject P1 have all been retrieved, the process returns to step Sa1.
In step Sa3, the event detecting unit 100 refers to the element retrieved in step Sa2 to determine whether any event can still be retrieved. If the element includes any unretrieved event (the result of step Sa3 is “YES”), the sequential data decomposing unit 2 retrieves one unretrieved event from the element. The process then proceeds to step Sa4. Otherwise (the result of step Sa3 is “NO”), the process returns to step Sa2. Specifically, when an event is extracted for the first time from the element retrieved in step Sa2, that is, the element “blood pressure=G, exercise=G, sugar content=G” recorded in 2000 for the subject P1, the event “blood pressure=G” is retrieved. The process then proceeds to step Sa4. If all the events “blood pressure=G”, “exercise=G”, and “sugar content=G” in the element recorded in 2000 for the subject P1 have already been retrieved, the process returns to step Sa2.
In step Sa4, the event detecting unit 100 determines whether the event evaluation value calculation, described later, has already been performed on the event retrieved in step Sa3. If it has (the result of step Sa4 is “YES”), the process returns to step Sa3. Otherwise (the result of step Sa4 is “NO”), the process proceeds to step Sa5. Specifically, it is assumed that in step Sa3, the event “sugar content=G” is retrieved from the element recorded in 2002 for the subject P1. The event detecting unit 100 determines whether the event evaluation value calculation has been performed on the event “sugar content=G”. If the calculation has not been performed, the process proceeds to step Sa5. On the other hand, it is assumed that the element recorded in 2000 for the subject P1 has already been processed and that, in step Sa3, the event “sugar content=G” is retrieved from the element recorded in 2001 for the subject P1. In step Sa4, the event detecting unit 100 then determines that the event evaluation value calculation has already been performed on the event “sugar content=G”. The process returns to step Sa3.
In step Sa5, the event detecting unit 100 calculates event evaluation values. That is, the candidate sequential pattern determining unit 3 calculates the support for each event, that is, an event evaluation value. First, the candidate sequential pattern determining unit 3 refers to sequential data stored in the sequential data storage unit 1 to calculate the number (frequency) of sequential data containing a particular event. Then, the candidate sequential pattern determining unit 3 applies the calculated frequency to Formula (1) to calculate the support for the event. Specifically, if the event detecting unit 100 determines that an event evaluation value has not been calculated for the event “blood pressure=G” in step Sa4, the candidate sequential pattern determining unit 3 calculates its support. As shown in
In step Sa7, the event detecting unit 100 stores the characteristic event. That is, the characteristic sequential pattern storage unit 4 stores the event determined to be characteristic in step Sa6 as a characteristic event set comprising one event. The process then returns to step Sa4. Specifically, for the event “blood pressure=G”, the characteristic sequential pattern storage unit 4 stores the event as a characteristic event set comprising one event. The process then returns to step Sa4.
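The nested loops of steps Sa1 to Sa7 can be sketched as follows. This is a minimal illustration, not the apparatus itself. It assumes that Formula (1) takes the common form of support = (number of sequential data containing the event) / (total number of sequential data), and that an event is treated as characteristic when its support is at least a minimum support; neither detail is reproduced in this text. The names `event_supports`, `characteristic_events`, and `TOY_DATA` are hypothetical, and the toy data are invented for illustration rather than taken from the drawings.

```python
from typing import Dict, FrozenSet, List

Event = str                  # e.g. "blood pressure=G"
Element = FrozenSet[Event]   # events recorded together at one point in time
Sequence = List[Element]     # one subject's elements in chronological order


def event_supports(data: List[Sequence]) -> Dict[Event, float]:
    """Compute a support value once for every distinct event in the data."""
    supports: Dict[Event, float] = {}
    for sequence in data:                 # Sa1: next sequential data
        for element in sequence:          # Sa2: next element
            for event in element:         # Sa3: next event
                if event in supports:     # Sa4: already evaluated, skip
                    continue
                # Sa5: count the sequential data containing the event.
                frequency = sum(
                    any(event in el for el in seq) for seq in data
                )
                # Assumed form of Formula (1): frequency / number of sequences.
                supports[event] = frequency / len(data)
    return supports


def characteristic_events(data: List[Sequence], min_support: float) -> List[Event]:
    """Assumed threshold test: keep events whose support >= min_support."""
    return sorted(e for e, s in event_supports(data).items() if s >= min_support)


# Hypothetical toy data (three subjects), not the data of the embodiment.
TOY_DATA: List[Sequence] = [
    [frozenset({"blood pressure=G", "exercise=G"}),
     frozenset({"blood pressure=Y", "exercise=G"})],
    [frozenset({"blood pressure=G", "exercise=Y"}),
     frozenset({"blood pressure=Y", "exercise=Y"})],
    [frozenset({"blood pressure=R", "exercise=G"})],
]

if __name__ == "__main__":
    print(characteristic_events(TOY_DATA, 0.5))
```

Caching the support in a dictionary plays the role of the check in step Sa4: an event that appears in several elements or several subjects is evaluated only once.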
Steps Sa1 to Sa7 allow the detection of all event sets each comprising one event. Specifically, for the sequential data shown in
Once the event detecting process in step Sa0, shown in
First, the event set detecting unit 200 determines whether or not to be able to retrieve an event set group (step Sb1). Specifically, if an event set group containing plural event sets corresponding to the current event count can be retrieved from the characteristic sequential pattern storage unit 4 (the result of step Sb1 is “YES”), the candidate sequential pattern generating unit 7 retrieves the event set group corresponding to the current event count from the characteristic sequential pattern storage unit 4. The process proceeds to step Sb2. Otherwise (the result of step Sb1 is “NO”) the process proceeds to step Sb8. If step Sb1 is performed for the first time on, for example, the sequential data shown in
In step Sb2, the event set detecting unit 200 determines whether or not to be able to retrieve an event set pair. Specifically, the candidate sequential pattern generating unit 7 refers to the event set group extracted in step Sb1. If there is any unextracted combination of event sets (the result of step Sb2 is “YES”), the candidate sequential pattern generating unit 7 retrieves one unextracted combination of event sets as one event set pair. The process then proceeds to step Sb3. Otherwise (the result of step Sb2 is “NO”), the candidate sequential pattern generating unit 7 increments the current event count by “1”. The process then returns to step Sb1. For example, it is assumed that step Sb2 is performed for the first time on the sequential data shown in
In step Sb3, the event set detecting unit 200 determines whether a candidate event set can be generated. That is, if the event subsets of the two event sets in the event set pair retrieved in step Sb2 match (the result of step Sb3 is “YES”), the event set detecting unit 200 combines the two event sets and generates a candidate event set with an event count larger than the current one by “1”. The process then proceeds to step Sb4. Otherwise (the result of step Sb3 is “NO”), the process returns to step Sb2. Here, the event subset of an event set is that event set with its last event excluded. For example, the event subset of “blood pressure=G, exercise=G, sugar content=G” is “blood pressure=G, exercise=G”. For example, it is assumed that in step Sb2, the two event sets “blood pressure=G” and “blood pressure=Y” are retrieved as an event set pair. In this case, the event subsets of the two event sets are both empty and are thus determined to match. The event set detecting unit 200 then generates the candidate event set “blood pressure=G, blood pressure=Y”, which comprises two events. The process then proceeds to step Sb4.
In step Sb4, the event set detecting unit 200 determines whether or not the candidate event set generated in step Sb3 is valid. That is, the attribute information determining unit 6 refers to the attribute information stored in the attribute information storage unit 5 to check the events constituting the candidate event set for attribute duplication. If no duplication is found (the result of step Sb4 is “YES”), the process proceeds to step Sb5. Otherwise (the result of step Sb4 is “NO”), the process returns to step Sb2. Specifically, for a candidate event set such as “blood pressure=G, blood pressure=Y”, the two events belong to the same attribute “blood pressure”. Owing to this attribute duplication, the process returns to step Sb2. For a candidate event set such as “blood pressure=G, sugar content=G”, the events belong to different attributes. Owing to the lack of attribute duplication, the process proceeds to step Sb5.
In step Sb5, the event set detecting unit 200 calculates an evaluation value for each candidate event set. Specifically, the candidate sequential pattern determining unit 3 refers to the sequential data stored in the sequential data storage unit 1 to calculate the frequency of the sequential data containing the candidate event set. The candidate sequential pattern determining unit 3 further applies Formula (1), described above, to the calculated frequency to calculate a support for the candidate event set.
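The joining rule of step Sb3 can be illustrated with a short sketch. It assumes that each event set is held as a tuple of event strings in a fixed order, so that "the last event" is well defined; the function name `join_event_sets` is hypothetical.

```python
from typing import Optional, Tuple

EventSet = Tuple[str, ...]  # events of one event set, kept in a fixed order


def join_event_sets(first: EventSet, second: EventSet) -> Optional[EventSet]:
    """Join two event sets of equal size whose event subsets match.

    The event subset is the event set without its last event. When the
    subsets match, the candidate is the first event set followed by the
    last event of the second; otherwise no candidate is generated.
    """
    if not first or len(first) != len(second):
        return None
    if first[:-1] != second[:-1]:      # event subsets differ: result "NO"
        return None
    return first + (second[-1],)       # event count grows by one


if __name__ == "__main__":
    # Both event subsets are empty, so they match.
    print(join_event_sets(("blood pressure=G",), ("blood pressure=Y",)))
```

The candidate produced here may still be rejected by the validity check of step Sb4, as happens for “blood pressure=G, blood pressure=Y”.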
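The attribute duplication check of step Sb4 can be sketched as below. In the embodiment the attribute of each event is looked up in the attribute information storage unit 5; this sketch instead assumes that every event is written as "attribute=value" and derives the attribute from the text before the first "=". The function names are hypothetical.

```python
from typing import Iterable


def attribute_of(event: str) -> str:
    """Assumed convention: the attribute is the text before the first '='."""
    return event.split("=", 1)[0].strip()


def has_no_attribute_duplication(event_set: Iterable[str]) -> bool:
    """Return True when every event belongs to a different attribute."""
    attributes = [attribute_of(event) for event in event_set]
    return len(attributes) == len(set(attributes))


if __name__ == "__main__":
    print(has_no_attribute_duplication(["blood pressure=G", "blood pressure=Y"]))
    print(has_no_attribute_duplication(["blood pressure=G", "sugar content=G"]))
```

Only candidate event sets for which this check returns True need a support calculation in step Sb5.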
In step Sb7, the event set detecting unit 200 stores the characteristic event set. That is, the characteristic sequential pattern storage unit 4 stores the candidate event set determined to be characteristic in step Sb6. The process then returns to step Sb2. For example, the characteristic sequential pattern storage unit 4 stores the event “blood pressure=G, sugar content=G” as a characteristic event set with an event count of “2”.
The event set detecting process in step Sb0 is thus repeatedly performed on the characteristic event sets with an event count of “1” shown in
Further, as shown in
Further, it is assumed that a candidate event set “blood pressure=G, exercise=G, sugar content=G” is generated in step Sb3. Then, since these three events belong to the different attributes and have no attribute duplication, the process proceeds to step Sb5. On the other hand, it is assumed that a candidate event set such as “blood pressure=G, exercise=G, exercise=Y” is generated in step Sb3. Then, since the events “exercise=G” and “exercise=Y” belong to the same attribute “exercise” and have an attribute duplication, the process returns to step Sb2.
The event set detecting process in step Sb0 is thus repeatedly performed on the characteristic event sets with an event count of “2” shown in
In step Sb8, the event set detecting unit 200 generates primary sequential patterns. Specifically, the candidate sequential pattern generating unit 7 regards characteristic event sets with a sequence size of “1” stored in the characteristic sequential pattern storage unit 4 as the primary sequential patterns. The characteristic sequential pattern storage unit 4 then stores the primary sequential pattern to finish the event set detecting step Sb0. Specifically, for the sequential data in
Once the event set detecting process in step Sb0, shown in
In step Sc1, the sequential pattern detecting unit 300 determines whether or not to be able to retrieve sequential pattern sets. Specifically, if sequential pattern sets corresponding to the current sequence size can be retrieved from the characteristic sequential pattern storage unit 4 (the result of step Sc1 is “YES”), the candidate sequential pattern generating unit 7 retrieves sequential pattern sets corresponding to the current sequence size from the characteristic sequential pattern storage unit 4. The process then proceeds to step Sc2. Otherwise (the result of step Sc1 is “NO”) the sequential pattern detecting unit 300 ends the sequential pattern detecting process step Sc0. If step Sc1 is performed for the first time, the sequence size is “1”. Accordingly, to perform step Sc1 for the first time on the sequential data in
In step Sc2, the sequential pattern detecting unit 300 determines whether or not to be able to retrieve sequential pattern pair. Specifically, the candidate sequential pattern generating unit 7 refers to the sequential pattern sets extracted in step Sc1, and if any combination of two sequential patterns has not been extracted yet (the result of step Sc2 is “YES”), the candidate sequential pattern generating unit 7 retrieves one unextracted combination of two sequential patterns as a sequential pattern pair. The process then proceeds to step Sc3. Otherwise (the result of step Sc2 is “NO”) the candidate sequential pattern generating unit 7 increments the current sequence size by “1”. The process then returns to step Sc1. In step Sc2, a combination of two identical sequential patterns can also be retrieved. Further, a combination of two sequential patterns is considered to be different from another combination of the same two sequential patterns if the arrangement order of these sequential patterns is different between the two combinations. Specifically, to perform step Sc2 for the first time on the sequential data shown in
In step Sc3, the sequential pattern detecting unit 300 determines whether a candidate sequential pattern can be generated. Specifically, for the sequential pattern pair retrieved in step Sc2, when the partial sequential patterns of the two sequential patterns match (the result of step Sc3 is “YES”), the candidate sequential pattern generating unit 7 combines the paired sequential patterns into a candidate sequential pattern with a sequence size larger than the current one by “1”. The process then proceeds to step Sc4. Otherwise (the result of step Sc3 is “NO”), the process returns to step Sc2. Here, the partial sequential pattern of a sequential pattern is that sequential pattern with its last element excluded. For example, the partial sequential pattern of “blood pressure=G→blood pressure=Y→blood pressure=R” is “blood pressure=G→blood pressure=Y”. For example, it is assumed that the two sequential patterns “blood pressure=G” and “blood pressure=Y”, each with a sequence size of “1”, are retrieved in step Sc2 as a sequential pattern pair. In this example, the partial sequential patterns of these sequential patterns are both empty and thus match. The candidate sequential pattern generating unit 7 thus generates a candidate secondary sequential pattern “blood pressure=G→blood pressure=Y”. The process then proceeds to step Sc4.
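Steps Sc2 and Sc3 together can be sketched as an enumeration of ordered pairs followed by a prefix comparison. The sketch assumes that a sequential pattern is a tuple of elements and that each element is a frozenset of event strings; the name `candidate_patterns` is hypothetical. `itertools.product` with `repeat=2` yields every ordered pair, including a pattern paired with itself, which matches the pairing described for step Sc2.

```python
from itertools import product
from typing import FrozenSet, List, Tuple

Pattern = Tuple[FrozenSet[str], ...]  # elements in their arranged order


def candidate_patterns(patterns: List[Pattern]) -> List[Pattern]:
    """Generate candidates one element longer from same-size patterns."""
    candidates: List[Pattern] = []
    for first, second in product(patterns, repeat=2):   # Sc2: ordered pairs
        # Sc3: partial sequential patterns (all but the last element) must match.
        if first[:-1] == second[:-1]:
            candidates.append(first + (second[-1],))
    return candidates


if __name__ == "__main__":
    g = frozenset({"blood pressure=G"})
    y = frozenset({"blood pressure=Y"})
    for candidate in candidate_patterns([(g,), (y,)]):
        print(" -> ".join(",".join(sorted(element)) for element in candidate))
```

For patterns of sequence size “1” every pair matches, because both partial sequential patterns are empty, so two primary patterns give four candidate secondary patterns.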
In step Sc4, the sequential pattern detecting unit 300 determines whether or not the candidate sequential pattern generated in step Sc3 is valid. First, the attribute information determining unit 6 checks the candidate sequential pattern for its sequence size. If the sequence size is at least “3”, the process unconditionally proceeds to step Sc5. If the sequence size is “2”, the attribute information determining unit 6 refers to the attribute information stored in the attribute information storage unit 5 to compare the attributes of the events of the elements constructing the candidate secondary sequential pattern. If the attributes match (the result of step Sc4 is “YES”), the process proceeds to step Sc5. Otherwise (the result of step Sc4 is “NO”) the process returns to step Sc2. Specifically, if the candidate secondary sequential pattern is “blood pressure=G→blood pressure=Y”, the process proceeds to step Sc5 because the attributes of the events of the elements constructing the candidate secondary sequential pattern are both “blood pressure” and thus match. If the candidate secondary sequential pattern is “blood pressure=G→exercise=G”, the process returns to step Sc2 because the attributes of the events of the elements constructing the candidate secondary sequential pattern are “blood pressure” and “exercise” and do not match. If the candidate secondary sequential pattern is “blood pressure=G, exercise=G→blood pressure=Y, exercise=Y”, the process proceeds to step Sc5 because, for the elements “blood pressure=G, exercise=G” and “blood pressure=Y, exercise=Y”, the attributes of the events are both “blood pressure” and “exercise” and thus match. 
If the candidate secondary sequential pattern is “blood pressure=G, exercise=G→blood pressure=G, sugar content=G”, the process returns to step Sc2 because, in spite of the matching attribute “blood pressure”, the elements “blood pressure=G, exercise=G” and “blood pressure=G, sugar content=G” have different attributes, that is, “exercise” and “sugar content”.
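The validity check of step Sc4 can be sketched as follows. The sketch assumes that each event is written as "attribute=value", that each element is a frozenset of such strings, and that "the attributes match" means the two elements of a secondary pattern carry exactly the same set of attributes. The function names are hypothetical.

```python
from typing import FrozenSet, Tuple

Pattern = Tuple[FrozenSet[str], ...]


def attributes_of(element: FrozenSet[str]) -> FrozenSet[str]:
    """Set of attributes appearing in one element (assumed 'attribute=value')."""
    return frozenset(event.split("=", 1)[0].strip() for event in element)


def is_valid_candidate(pattern: Pattern) -> bool:
    """Step Sc4: size >= 3 passes; size 2 needs matching attribute sets."""
    if len(pattern) < 2:
        raise ValueError("candidates have a sequence size of at least 2")
    if len(pattern) >= 3:
        return True   # proceeds unconditionally to step Sc5
    return attributes_of(pattern[0]) == attributes_of(pattern[1])


if __name__ == "__main__":
    first = frozenset({"blood pressure=G", "exercise=G"})
    second = frozenset({"blood pressure=G", "sugar content=G"})
    print(is_valid_candidate((first, second)))
```

Patterns of size “3” or more need no separate check because they are built from secondary patterns that already passed this test.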
In step Sc5, the sequential pattern detecting unit 300 calculates a sequential pattern evaluation value. Specifically, the candidate sequential pattern determining unit 3 refers to the sequential data stored in the sequential data storage unit 1 to calculate the frequency of the candidate sequential pattern. The candidate sequential pattern determining unit 3 further applies Formula (1), described above, to the calculated frequency to calculate the support for the candidate sequential pattern.
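The frequency count of step Sc5 can be sketched as below. Two assumptions are made that this text does not confirm: a sequential data item is taken to contain a pattern when the pattern's elements can be matched, in order but not necessarily consecutively, to elements of the data that include them as subsets; and Formula (1) is taken to be frequency divided by the number of sequential data. The names `contains`, `pattern_support`, and `TOY_DATA` are hypothetical, and the toy data are invented for illustration.

```python
from typing import FrozenSet, List, Tuple

Element = FrozenSet[str]
Sequence = List[Element]
Pattern = Tuple[Element, ...]


def contains(sequence: Sequence, pattern: Pattern) -> bool:
    """Greedy in-order match: each pattern element must be a subset of a
    later data element than the one matched by the previous pattern element."""
    position = 0
    for element in sequence:
        if position < len(pattern) and pattern[position] <= element:
            position += 1
    return position == len(pattern)


def pattern_support(data: List[Sequence], pattern: Pattern) -> float:
    """Assumed form of Formula (1): frequency / number of sequential data."""
    frequency = sum(contains(sequence, pattern) for sequence in data)
    return frequency / len(data)


# Hypothetical toy data (three subjects), not the data of the embodiment.
TOY_DATA: List[Sequence] = [
    [frozenset({"blood pressure=G", "exercise=G"}),
     frozenset({"blood pressure=Y", "exercise=G"})],
    [frozenset({"blood pressure=G", "exercise=Y"}),
     frozenset({"blood pressure=Y", "exercise=Y"})],
    [frozenset({"blood pressure=R", "exercise=G"})],
]

if __name__ == "__main__":
    g_then_y = (frozenset({"blood pressure=G"}), frozenset({"blood pressure=Y"}))
    print(pattern_support(TOY_DATA, g_then_y))
```

The greedy scan is sufficient here because matching a pattern element at the earliest possible data element never rules out a later match.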
In step Sc7, the sequential pattern detecting unit 300 stores the characteristic sequential pattern. That is, the characteristic sequential pattern storage unit 4 stores the sequential pattern determined to be characteristic in step Sc6. The process then returns to step Sc2. For example, the secondary sequential pattern “blood pressure=G→blood pressure=Y” is stored in the characteristic sequential pattern storage unit 4 as a characteristic secondary sequential pattern.
The sequential pattern detecting process in step Sc0 is thus repeatedly performed on the primary sequential patterns shown in
Then, with the sequence size set to “2”, the sequential pattern detecting process in step Sc0 is thus repeatedly performed on characteristic secondary sequential patterns such as those shown in
In step Sc3, for example, the two sequential patterns “blood pressure=G→blood pressure=Y” and “blood pressure=G→blood pressure=R” have the same partial sequential pattern “blood pressure=G”. Accordingly, a candidate tertiary sequential pattern “blood pressure=G→blood pressure=Y→blood pressure=R” is generated, and the process proceeds to step Sc4. On the other hand, for example, the two sequential patterns “blood pressure=G→blood pressure=Y” and “exercise=G→exercise=Y” have the different partial sequential patterns “blood pressure=G” and “exercise=G”. The process thus returns to step Sc2.
In step Sc4, for example, for a candidate tertiary sequential pattern such as “blood pressure=G→blood pressure=Y→blood pressure=R”, the process immediately proceeds to step Sc5 because the sequential pattern has a sequence size of “3”.
A similar process is then performed to enable candidate tertiary sequential patterns shown in
Then, with the sequence size set to “3”, the sequential pattern detecting process in step Sc0 is thus repeatedly performed on the characteristic tertiary sequential patterns shown in
In step Sc3, for example, the two sequential patterns “blood pressure=G→blood pressure=Y→blood pressure=R” and “blood pressure=G→blood pressure=Y→blood pressure=R” have the same partial sequential pattern “blood pressure=G→blood pressure=Y”. Accordingly, a candidate quartic sequential pattern “blood pressure=G→blood pressure=Y→blood pressure=R→blood pressure=R” is generated, and the process proceeds to step Sc4. On the other hand, for example, the two sequential patterns “blood pressure=G→blood pressure=Y→blood pressure=R” and “exercise=G→exercise=Y→exercise=R” have the different partial sequential patterns “blood pressure=G→blood pressure=Y” and “exercise=G→exercise=Y”. The process thus returns to step Sc2.
In step Sc4, for example, for a candidate quartic sequential pattern such as “blood pressure=G→blood pressure=Y→blood pressure=R→blood pressure=R”, the process immediately proceeds to step Sc5 because the sequential pattern has a sequence size of “4”.
A similar process is then performed to enable the acquisition of candidate quartic sequential patterns shown in
For the sequential data shown in
As described above, the present embodiment detects characteristic sequential patterns with a sequence size of “2” from combinations of two characteristic sequential patterns with a sequence size of “1”, and then increments the sequence size by “1” at a time, generating each (i+1)th-length characteristic sequential pattern with a sequence size of (i+1) from a combination of two characteristic sequential patterns with a sequence size of “i”. Once all the characteristic sequential patterns are detected, the sequential pattern detecting process in step Sc0 is finished, which completes the entire process performed by the sequential pattern detecting apparatus in accordance with the embodiment. That is, for the sequential data shown in
The present embodiment can also check the invalidity of a candidate event set containing a combination of events belonging to the same attribute and having no possibility of coincidental occurrence, to exclude the candidate event set from the determination as to whether or not the candidate event set is characteristic. This enables a sharp reduction in the number of candidate event sets for which it is necessary to determine whether or not they are characteristic. For example, for the sequential data in
The present embodiment can also determine that sequential patterns in which the events contained in the elements belong to different attributes are invalid, to exclude these sequential patterns from the determination as to whether or not the sequential patterns are characteristic. This enables a sharp reduction in the number of candidate sequential patterns for which it is necessary to determine whether or not they are characteristic. For example, for the sequential data in
The sequential patterns shown in
In the above embodiment, the attributes stored in the attribute information storage unit 5 are configured without specifying a hierarchical structure for events belonging to the same attribute column. However, the attributes may be configured with a hierarchical structure specified. For example, it is assumed that such events as those shown in
The attributes configured as shown in
Further, in step Sc4, regardless of the number of events contained in the attribute “alcohol consumption”, the attribute information determining unit 6 can determine whether or not to proceed to step Sc5 on the basis of the presence or absence of an event belonging to this attribute. This determination prevents a sequential pattern such as “alcohol consumption=doesn't drink→blood pressure=G” from proceeding to step Sc5, while allowing a sequential pattern such as “alcohol consumption=doesn't drink→alcohol consumption=drinks: wine→alcohol consumption=drinks: beer, alcohol consumption=drinks: wine” to proceed to step Sc5.
Further, for example, in step Sc4, the determination can be made with restrictions on the variation in an event. Specifically, the process may proceed to step Sc5 if the event belonging to the attribute “blood pressure” changes, as in “blood pressure=G→blood pressure=Y”, but not if the event belonging to the attribute “blood pressure” does not change, as in “blood pressure=G→blood pressure=G”.
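The optional restriction on variation can be sketched as an extra test applied to candidate secondary sequential patterns in step Sc4. The sketch assumes that each element is a frozenset of event strings and that "the event changes" means the two elements are not identical; how elements containing several events should be treated is not specified in this text, so that reading is an assumption. The name `has_variation` is hypothetical.

```python
from typing import FrozenSet, Tuple


def has_variation(pattern: Tuple[FrozenSet[str], FrozenSet[str]]) -> bool:
    """Return True when the second element differs from the first."""
    first, second = pattern
    return first != second


if __name__ == "__main__":
    g = frozenset({"blood pressure=G"})
    y = frozenset({"blood pressure=Y"})
    print(has_variation((g, y)))   # changes, so it may proceed to step Sc5
    print(has_variation((g, g)))   # unchanged, so it is filtered out
```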
The above embodiment provides the event detecting unit 100, shown in
The above embodiment utilizes the support of each sequential pattern as a reference value for determining whether or not the sequential pattern is characteristic. However, a sequence interest level may be utilized in place of the support. The sequence interest level is described in Shigeaki Sakurai, Youichi Kitahara, and Ryohei Orihara: “Sequential Mining Method based on a New Criterion”, Proceedings of the 10th IASTED International Conference on Artificial Intelligence and Soft Computing, 544-045 (2006). For example, if a particular sequential pattern includes a partial sequential pattern whose relative frequency is not very high, then, when that partial sequential pattern is given, the sequential pattern can accurately predict the remaining events it contains. Accordingly, this sequential pattern can be considered to be a kind of characteristic sequential pattern. The condition that the relative frequency is not very high is therefore evaluated using the minimum value of the reciprocal of the frequency of the partial sequential patterns included in the sequential pattern. This value is defined as an index for detecting such sequential patterns.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.
Number | Date | Country | Kind |
---|---|---|---|
2006-210202 | Aug 2006 | JP | national |