1. Field of the Invention
The present invention relates to a method, device and computer program for visualizing calculated risk assessment values in which risk assessment values for the occurrence of a predetermined event are calculated for each event sequence partially occurring in a time series.
2. Description of the Related Art
Often, before a critical event occurs, a number of events considered to be harbingers occur in a time series. Therefore, it is desirable to estimate the possibility of a critical event occurring from a group of events occurring in a time series (referred to below as an event sequence) in order to provide advance warning.
However, in many situations it is unclear from a given event sequence which events are linked to a critical event. Also, it is difficult to assume the links among events beforehand in a given situation because the number of possible event sequences is often huge. Therefore, various systems have been developed to predict the occurrence of events by estimating risk assessment values using, for example, neuron models and case-based inference engines.
For example, Laid-open Patent Publication No. JP 2002-207755 describes an information management device with a case-based inference engine. In JP 2002-207755, in order to consider the time series in cases, time series data is inputted and stored. The importance of these cases is calculated, and cases with a high degree of importance are extracted as similar cases.
However, even when time series data is used as the input, the prior art (Laid-open Patent Publication No. JP 2002-207755) only calculates a degree of importance that takes into account the season, the time period, etc. For example, even when the same types of events have occurred in the same time period, the events that can occur are different if the time series are different. Thus, it is difficult to correctly extract similar events.
Also, it is impossible to realistically assume all possible cases in a medical event. Even if they can be assumed, very few cases are completely the same. Therefore, it is not realistic to store all cases beforehand as similar cases for extraction. In other words, a suitable means does not exist for comparing event sequences with different lengths and elements, and it is difficult to visually verify and to give feedback on risk assessment values based on event sequences.
In view of this situation, the purpose of the present invention is to provide a method, device and computer program for visualizing risk assessment values for event sequences in which totally ordered sets can be estimated on the basis of partially ordered sets indicating an event sequence, and the risk assessment values calculated for each event sequence can be visualized.
One aspect of the present invention provides a method for calculating and displaying a plurality of risk assessment values for an event sequence, wherein the event sequence comprises a plurality of events for a finite number M of types (where M is a natural number) and a portion of the event group being a partially ordered set in a time series. The method includes: generating an M-dimensional sparsely ordered matrix based on the event sequence, and interpolating between a plurality of elements of the M-dimensional sparsely ordered matrix to calculate a densely ordered matrix; calculating a mapping matrix for mapping a plurality of similarity relations between a plurality of event sequences in two-dimensional space or three-dimensional space based on the densely ordered matrix; calculating the plurality of corresponding points of each event sequence in two-dimensional space or three-dimensional space using the mapping matrix; and outputting and displaying the plurality of corresponding points in two-dimensional or three-dimensional space.
Another aspect of the present invention provides a device for calculating and displaying a plurality of risk assessment values for an event sequence, wherein the event sequence includes a plurality of events for a finite number M of types (where M is a natural number) and a portion of the event group being a partially ordered set in a time series, the device comprising: an order matrix calculating means for generating an M-dimensional sparsely ordered matrix on the basis of the event sequence, and interpolating between a plurality of elements of the M-dimensional sparsely ordered matrix to calculate a densely ordered matrix; a mapping matrix calculating means for calculating a mapping matrix for mapping a plurality of similarity relations between a plurality of event sequences in two-dimensional space or three-dimensional space based on the densely ordered matrix; and a display output means for calculating a plurality of corresponding points of each event sequence in two-dimensional space or three-dimensional space using the mapping matrix; and outputting and displaying the plurality of corresponding points in two-dimensional or three-dimensional space.
Another aspect of the present invention provides a computer readable non-transitory article of manufacture tangibly embodying computer readable instructions which, when executed, cause a computer to calculate and display a plurality of risk assessment values for an event sequence, wherein the event sequence includes a plurality of events for a finite number M of types (where M is a natural number) and a portion of the event group being a partially ordered set in a time series, and wherein the computer readable instructions execute the method explained above.
The following is a detailed description with reference to the drawings of a risk assessment value display device in an embodiment of the present invention. This device calculates risk assessment values related to the occurrence of a predetermined event in each event sequence in which a portion of the event group indicates a time series, and then visualizes the calculated risk assessment values. Needless to say, this embodiment does not limit in any way the present invention as described in the scope of the claims, and all combinations of features explained in the embodiment are not necessarily essential to the technical solution of the present invention.
Also, the present invention can be embodied in many different ways, and should not be interpreted as being limited to the description of the embodiment. Throughout the embodiment, the same elements are denoted by the same reference signs.
In the following embodiment, a device is explained in which a computer program has been introduced to a computer system. However, as should be clear to any person skilled in the art, a portion of the present invention can be embodied as a computer program executable by a computer. Thus, the present invention can be embodied as hardware such as a risk assessment value display device which calculates risk assessment values for the occurrence of a predetermined event for each event sequence partially occurring in a time series and visualizes the calculated risk assessment values, as software, or as a combination of software and hardware. The computer program can be recorded on any computer-readable recording medium such as a hard disk, a DVD, a CD, an optical storage device, or a magnetic storage device.
In the embodiment of the present invention, risk assessment values can be calculated for each event sequence by converting partially ordered sets (matrices) indicating event sequences with different lengths and elements into totally ordered sets (matrices), and past cases can be easily compared by displaying and outputting the calculated risk assessment values in two-dimensional space or three-dimensional space. Also, the possibility (risk) of a critical event occurring can be visually evaluated in each event sequence by plotting and displaying or by performing a density conversion and then displaying the calculated risk assessment values in two-dimensional or three-dimensional space.
The CPU 11 is connected via the internal bus 18 to each unit of hardware in the risk assessment value display device 1 described above, controls the operations performed by each unit of hardware described above, and executes various software functions according to the computer program 100 stored in the storage device 13. The memory 12 is volatile memory such as SRAM or SDRAM, which expands load modules during execution of the computer program 100, and temporarily stores data generated during the execution of the computer program 100.
The storage device 13 can be a built-in fixed storage device (hard disk) and ROM. The computer program 100 stored in the storage device 13 is downloaded using a portable disk drive 16 from a portable recording medium 90 such as a DVD or CD-ROM on which the program and information such as data have been recorded. During execution, the program is expanded from the storage device 13 to the memory 12 and executed. Of course, the computer program can also be downloaded from an outside computer connected via the communication interface 17.
The communication interface 17 is connected to the internal bus 18 and connected, in turn, to an outside network such as the Internet, a LAN or a WAN in order to be able to exchange data with an outside computer.
The I/O interface 14 is connected to input devices such as a keyboard 21 and a mouse 22 to receive data inputs. The video interface 15 is connected to a display device 23 such as a CRT display or a liquid crystal display to display on the display device 23 risk assessment values calculated for sampled event sequences and risk assessment values calculated for event sequences sampled in the past.
The event sequences can be acquired from an outside computer connected via the communication interface 17, or can be acquired from a portable recording medium 90 such as a DVD or CD-ROM using a portable disk drive 16. They can also be acquired by receiving direct input via input devices such as a keyboard 21 and mouse 22.
Returning to
As shown in
For example, since events occur as events A, B, C, E, F in event sequence 1 as shown in
In other words, element X(i) (e1, e2) in partially ordered matrix X(i) of event sequence i can be determined by (Equation 1). In (Equation 1), function I (e1, e2) returns “1” when event e1 is prior to event e2. Otherwise, it returns “0”. Also, s indicates the number of hops between event e1 and event e2 (a value proportional to the interval between the two). For example, the number of hops s from event A to event B is “1”, and the number of hops s from event A to event C is “2”. Therefore, a partially ordered matrix can be generated in which the elements have smaller values as the distance between events increases.
$$X^{(i)}_{e_1,e_2} = I(e_1,e_2)\,\beta^{s} \qquad \text{(Equation 1)}$$
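As an illustration of (Equation 1), the following minimal sketch builds a partially ordered matrix for one event sequence. The function name `partial_order_matrix`, the value `beta=0.5`, and the choice of counting hops as the difference in position within the sequence are assumptions made for this example; the sketch also assumes each event type occurs at most once in a sequence.

```python
import numpy as np

def partial_order_matrix(sequence, event_types, beta=0.5):
    """Build X with X[e1, e2] = I(e1, e2) * beta**s for one event sequence.

    Assumptions (not from the source): 0 < beta < 1, and the hop count s
    is the difference in position between e1 and e2 in the sequence.
    """
    index = {e: k for k, e in enumerate(event_types)}
    X = np.zeros((len(event_types), len(event_types)))
    for a, e1 in enumerate(sequence):
        for b in range(a + 1, len(sequence)):
            e2 = sequence[b]
            s = b - a                      # number of hops from e1 to e2
            X[index[e1], index[e2]] = beta ** s
    return X

# Event sequence 1 from the text: A, B, C, E, F (event D does not occur).
event_types = ["A", "B", "C", "D", "E", "F"]
X1 = partial_order_matrix(["A", "B", "C", "E", "F"], event_types)
print(X1[0, 1], X1[0, 2])  # A->B is 1 hop, A->C is 2 hops
```

Because D never occurs in this sequence, its row and column stay at zero, which shows why such matrices are sparse.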
A partially ordered matrix X is generated for each event sequence on the basis of (Equation 1), but the generated partially ordered matrices X are sparsely ordered matrices in which most of the elements are “0”. Therefore, the generated partially ordered matrices are interpolated using the so-called label propagation method. In other words, a densely ordered matrix U is calculated by properly interpolating areas of the partially ordered matrix X in which the elements are “0” in accordance with (Equation 2) so that the difference between elements is smaller than in the original partially ordered matrix X, and so that each element is weighted in accordance with the degree of similarity in the event sequence.
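Since (Equation 2) is not reproduced in this text, the following sketch does not implement it. It shows a generic Laplacian-regularized smoothing of the kind commonly used in label propagation, applied to both indices of a sparse order matrix. The similarity matrix `W` between event types, the weight `lam`, and the function name `interpolate_dense` are hypothetical.

```python
import numpy as np

def interpolate_dense(X, W, lam=1.0):
    """Smooth a sparse order matrix X over a similarity graph W.

    Generic Laplacian smoothing (an assumption, not the source's Equation 2):
    S = (I + lam * L)^-1 with L = diag(W 1) - W, and U = S X S.
    """
    L = np.diag(W.sum(axis=1)) - W          # graph Laplacian of W
    S = np.linalg.inv(np.eye(X.shape[0]) + lam * L)
    return S @ X @ S                        # smooth rows and columns

# Sparse 3x3 order matrix: only the relation A -> B is observed.
X = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])
# Hypothetical symmetric similarity between the three event types.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
U = interpolate_dense(X, W)
print(np.count_nonzero(X), np.count_nonzero(U))
```

In this toy case the single observed value is spread over every element of U, while the total mass of the matrix is preserved because the Laplacian has zero row sums.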
Returning to
In this embodiment, each calculated densely ordered matrix U(i) (i = 1 to N) is converted to a column vector u, giving N column vectors in total, as shown in (Equation 3). For example, function vec for converting a 3×3 matrix into a column vector is defined as shown in (Equation 3).
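A short sketch of such a vec function follows. Because (Equation 3) is not reproduced in this text, the column-stacking order used here is an assumption based on the conventional definition of vec.

```python
import numpy as np

def vec(U):
    """Stack the columns of U into one column vector (column-major order).

    The stacking order is assumed; the source's Equation 3 may differ.
    """
    return U.reshape(-1, 1, order="F")

U = np.arange(1, 10).reshape(3, 3)   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
u = vec(U)
print(u.ravel())                     # [1 4 7 2 5 8 3 6 9]
```

For a 10×10 densely ordered matrix the same function yields a column vector of 100 elements.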
The mapping matrix A, which maps the N column vectors u into the space in which they are outputted and displayed (for example, two-dimensional space or three-dimensional space), is calculated on the basis of (Equation 4). In (Equation 4), z is, for example, a two-dimensional column vector (p, q) when mapping into a two-dimensional space with orthogonal axes p and q. Mapping matrix A is a (2×100) matrix when vector u is a column vector consisting of 100 elements.
$$z = A u \qquad \text{(Equation 4)}$$
Mapping matrix A is calculated as a matrix in which the objective function shown in (Equation 5) is minimized.
In (Equation 5), Kn,n′ is a function indicating the degree of similarity between event sequences n and n′. This can be expressed using (Equation 6). Dn,n′ is shown in (Equation 8) and described below.
In (Equation 5), the first term is the term adjusted to keep the degree of similarity between event sequences equal after they are mapped in a predetermined space such as a two-dimensional space or three-dimensional space, and the second term is the term for keeping the mapping range converged in a predetermined range.
In other words, the objective function shown in (Equation 5) is essentially equal to an objective function used in the method called Locality Preserving Projections (LPP). However, a conventional LPP objective function is not used to convert an event sequence into a vector, and does not function as an LPP objective function with a sparse matrix in which most of the elements are 0 (zero).
Therefore, in this embodiment, the mapping matrix A is calculated using the objective function after a densely ordered matrix U has been calculated. In other words, the mapping matrix A can be calculated as a solution to the generalized eigenvalue problem shown in (Equation 7).
$$\Phi(A) = \mathrm{Tr}\left(A U G U^{T} A^{T} - \mu\, A U D U^{T} A^{T}\right) \qquad \text{(Equation 7)}$$
where $G_{n,n'} \equiv \delta_{n,n'} D_{n,n'} - K_{n,n'}$.
In (Equation 7), Tr is the trace function, which returns a scalar value that is the sum of the diagonal elements of the matrix. Also, Dn,n′ can be expressed in (Equation 8) using Kronecker delta δn,n′.
(Equation 8) is differentiated with respect to mapping matrix A to obtain (Equation 9). The matrix A for which the right-hand side of (Equation 9) equals 0 can be calculated as the mapping matrix.
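The generalized eigenvalue formulation of (Equation 7) can be sketched as follows, in the style of Locality Preserving Projections. This is an illustration under stated assumptions rather than the embodiment's implementation: the Gaussian similarity used for K stands in for (Equation 6), which is not reproduced here; D is taken to be the diagonal matrix of row sums of K; and the function name `lpp_mapping` and the regularization `eps` are hypothetical.

```python
import numpy as np

def lpp_mapping(U, K, dim=2, eps=1e-8):
    """Return a (dim x d) mapping matrix A from vectors U (d x N) and similarities K.

    Solves (U G U^T) a = mu (U D U^T) a and keeps the eigenvectors with the
    smallest eigenvalues. D = diag(row sums of K) is an assumption.
    """
    D = np.diag(K.sum(axis=1))
    G = D - K                                    # G = delta * D - K
    P = U @ G @ U.T
    Q = U @ D @ U.T + eps * np.eye(U.shape[0])   # regularized to be positive definite
    C = np.linalg.cholesky(Q)                    # Q = C C^T
    C_inv = np.linalg.inv(C)
    S = C_inv @ P @ C_inv.T                      # equivalent standard eigenproblem
    S = (S + S.T) / 2                            # remove rounding asymmetry
    _, vecs = np.linalg.eigh(S)                  # eigenvalues in ascending order
    return (C_inv.T @ vecs[:, :dim]).T

rng = np.random.default_rng(0)
U = rng.random((9, 20))                          # 20 vec'd 3x3 matrices as columns
sq = ((U[:, :, None] - U[:, None, :]) ** 2).sum(axis=0)
K = np.exp(-sq)                                  # assumed Gaussian similarity
A = lpp_mapping(U, K)
Z = A @ U                                        # z = A u for every event sequence
print(A.shape, Z.shape)
```

Each column of `Z` is the two-dimensional coordinate point of one event sequence. The Cholesky step is only a way to solve the generalized problem with NumPy alone; a dedicated generalized eigensolver would serve equally well.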
Returning to
$$z = w A \left[ w_{n} I_{M} + \lambda L \right]^{-1} x \qquad \text{(Equation 10)}$$
The coordinate point z0(p0, q0) outputted and displayed on plane pq using the mapping matrix A calculated from (Equation 9) is a risk assessment value. For example, in
It is often difficult to arrive at a decision from coarse-grained coordinate points, and little can be determined visually simply by plotting the risk assessment values of past event sequences. Therefore, the kernel density p(z) of coordinate value z is estimated on the basis of past event sequences.
Returning to
In (Equation 11), c is a normalization constant for kernel density p(z). For example, the value is set so that the integral of kernel density p(z) is “1” over a predetermined domain of definition. Also, β represents the bandwidth, and is a constant calculated by running likelihood cross-validation.
When likelihood cross-validation is run, the event sequences acquired as sampling data are first split into several groups. For example, N event sequences are split into five groups, and each group is denoted D″(i) (i = a natural number from 1 to 5). For each candidate bandwidth β, the kernel density p(z) is calculated from (Equation 11) using the four event sequence groups other than D″(i), and the logarithmic likelihood Π(β) of the held-out group D″(i) is calculated in accordance with (Equation 12).
From (Equation 12), the β with the largest logarithmic likelihood Π(β) is determined as the bandwidth β. In this embodiment, the event sequences were split into five. However, the present invention is not limited to this example. If there is a large enough amount of data, the event sequences can be split into a greater number than five.
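The bandwidth selection described above can be sketched as follows. Because (Equation 11) and (Equation 12) are not reproduced in this text, the sketch assumes a Gaussian kernel and sums the held-out log-likelihood over the five folds; the function names, the candidate bandwidths, and the synthetic coordinate points are all hypothetical.

```python
import numpy as np

def log_kde(z_eval, z_train, beta):
    """Log of a Gaussian kernel density (assumed kernel) with bandwidth beta."""
    d = z_train.shape[1]
    sq = ((z_eval[:, None, :] - z_train[None, :, :]) ** 2).sum(axis=2)
    # log of the normalization constant c for an isotropic Gaussian kernel
    log_c = -np.log(len(z_train)) - 0.5 * d * np.log(2 * np.pi * beta ** 2)
    a = -sq / (2 * beta ** 2)
    m = a.max(axis=1, keepdims=True)            # stabilize the log-sum-exp
    return log_c + m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

def select_bandwidth(z, candidates, folds=5, seed=0):
    """Return the candidate with the largest held-out log-likelihood."""
    idx = np.random.default_rng(seed).permutation(len(z))
    parts = np.array_split(idx, folds)
    scores = []
    for beta in candidates:
        total = 0.0
        for k in range(folds):
            train = np.concatenate([parts[j] for j in range(folds) if j != k])
            total += log_kde(z[parts[k]], z[train], beta).sum()
        scores.append(total)
    return candidates[int(np.argmax(scores))], scores

rng = np.random.default_rng(1)
z = rng.normal(size=(100, 2))                   # synthetic mapped points (p, q)
candidates = [0.05, 0.1, 0.2, 0.4, 0.8, 1.6]
best, scores = select_bandwidth(z, candidates)
print(best)
```

Increasing `folds` corresponds to splitting the event sequences into more than five groups, which the text permits when enough data is available.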
The area output display unit 206 calculates the coordinate value z for two-dimensional space or three-dimensional space in all event sequences acquired as sampling data in which a critical event occurred, and determines whether or not risk has occurred on the basis of whether or not a label value indicating the occurrence of risk has been assigned to each calculated coordinate value z. Similarly, there is a high possibility of a critical event occurring in the vicinity of coordinate value z in a data set in which risk has occurred. Therefore, circumscribed areas for coordinate z are superimposed in two-dimensional space or three-dimensional space, outputted and displayed.
The coordinate points z1(p1,q1) and z2(p2, q2) outputted and displayed on plane pq using the mapping matrix A calculated from (Equation 9) are risk assessment values. For example, in
Therefore, coordinate value z1 calculated in a given vector sequence can be visually determined to have a high probability of a critical event occurring because it is in circumscribed area 71. Similarly, coordinate value z2 calculated in a given vector sequence can be visually determined to have a low probability of a critical event occurring because it is not included in circumscribed area 72.
The CPU 11 generates partially ordered matrices (partially ordered sets) representing the order of events based on the acquired event sequences (Step S802), and converts the generated partially ordered matrices into an approximation of totally ordered matrices (totally ordered sets) (Step S803). In other words, because the partially ordered matrices generated on the basis of acquired event sequences are sparsely ordered matrices (so-called sparse matrices) in which most of the elements are “0”, they are converted to totally ordered matrices by interpolating the elements of sparse matrices whose values are “0”.
The CPU 11 uses an embedding method to calculate, on the basis of the totally ordered matrices, a mapping matrix for mapping the similarity relations between event sequences in two-dimensional or three-dimensional space (Step S804). More specifically, a mapping matrix is calculated as a matrix which minimizes an objective function able to maintain a similarity relation equally between event sequences even when the similarity relation between event sequences has been mapped in two-dimensional or three-dimensional space.
The CPU 11 calculates the corresponding points of each event sequence in two-dimensional space or three-dimensional space using the calculated mapping matrix, and outputs and displays the calculated corresponding points in two-dimensional or three-dimensional space (Step S805). More specifically, coordinate points z(p, q) are determined in map space for given event sequence x using mapping matrix A calculated from (Equation 9), and the coordinate point is outputted and displayed.
In the embodiment described above, risk assessment values can be calculated for each event sequence by converting partially ordered sets (matrices) indicating event sequences with different lengths and elements into totally ordered sets (matrices), and past cases can be easily compared by displaying and outputting the calculated risk assessment values in two-dimensional space or three-dimensional space. Also, the possibility (risk) of a critical event occurring can be visually evaluated in each event sequence by plotting and displaying or by performing a density conversion and then displaying the calculated risk assessment values in two-dimensional or three-dimensional space.
Another embodiment of the present invention is the method in the first aspect of the invention, in which the mapping matrix is calculated as a matrix minimizing an objective function able to maintain a similarity relation equally between event sequences even when the similarity relation between event sequences has been mapped in two-dimensional or three-dimensional space.
Another embodiment of the present invention includes a step for running likelihood cross-validation on the event sequences and for estimating the kernel density of the event sequences on which likelihood cross-validation has been run.
Another embodiment of the present invention is one in which the method also includes a step for calculating the corresponding points in two-dimensional space or three-dimensional space for all event sequences, for determining whether or not the kernel density is greater than a predetermined value at each calculated corresponding point, and for superimposing and outputting for display a circumscribed area of corresponding points exceeding the predetermined value.
In another embodiment of the present invention, the mapping matrix calculating means calculates the mapping matrix as a matrix minimizing an objective function able to maintain a similarity relation equally between event sequences even when the similarity relation between event sequences has been mapped in two-dimensional or three-dimensional space.
Another embodiment of the present invention includes a kernel density estimating means for running likelihood cross-validation on the event sequences, and for estimating the kernel density of the event sequences on which likelihood cross-validation has been run.
Another embodiment of the present invention includes an area display output means for calculating the corresponding points in two-dimensional space or three-dimensional space for all event sequences, and for superimposing and outputting for display in two-dimensional space or three-dimensional space circumscribed areas of corresponding points labeled as to whether or not a risk has occurred at each calculated corresponding point.
The embodiment described above can be applied effectively to medical event sequences. For example, there is a wide range of symptoms such as having a headache, having a stomachache and feeling sick, and it is difficult to determine whether or not a series of symptoms is a sign of a serious illness. Therefore, it is conceivable that the risk of suffering from serious illnesses can be reduced by acquiring event sequences such as interview data with many patients and data on everyday life as sampling data, and applying the sampling data to a model to predict the risk of suffering from a serious illness such as diabetes or cancer.
In the present invention, risk assessment values can be calculated for each event sequence by converting partially ordered sets (matrices) indicating event sequences with different lengths and elements into totally ordered sets (matrices), and past cases can be easily compared by displaying and outputting the calculated risk assessment values in two-dimensional space or three-dimensional space. Also, the possibility (risk) of a critical event occurring can be visually evaluated in each event sequence by plotting and displaying or by performing a density conversion and then displaying the calculated risk assessment values in two-dimensional or three-dimensional space.
The present invention is not limited to the embodiment described above, and various modifications and improvements are possible within the scope of the present invention. In other words, the present invention is not limited to the medical event sequences described in the embodiment. Needless to say, it can be applied to any event in which there is a cause and effect.
Number | Date | Country | Kind |
---|---|---|---|
2011-266666 | Dec 2011 | JP | national |
This application claims priority under 35 U.S.C. 371 from PCT Application, PCT/JP2012/080880, filed on Nov. 29, 2012, which claims priority from the Japanese Patent Application No. 2011-266666, filed on Dec. 6, 2011. The entire contents of both applications are incorporated herein by reference.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2012/080880 | 11/29/2012 | WO | 00 | 6/4/2014 |