The present invention relates to a multidimensional data analysis method, a multidimensional data analysis apparatus, and a program for performing multidimensional analysis of time series data in which dimensions and events are in a many-to-many relationship.
In recent years, remarkable progress of computer environments and surrounding network technologies and development of basic technologies such as middleware typified by databases contribute to improvements in techniques of storing and managing enormous amounts of information. In addition, the Ministry of Health, Labor and Welfare has formulated a “Grand Design for Informatization of Medical, Healthcare, Nursing Care and Welfare Domains” (see Non-patent Reference 15), stimulating introduction of electronic medical record systems gradually. As a result, systems for storing medical and administrative data are becoming increasingly common to improve medical care service efficiency.
Meanwhile, there are growing expectations toward information management techniques that enhance intellectual productivity and analysis techniques that allow for new knowledge discovery by utilizing the enormous amounts of information stored on a daily basis. In medical care in particular, financial stringency in the medical insurance system due to increasing national medical expenditure and an aging population with fewer children, combined with increasingly IT-oriented public services as represented by the e-Japan Strategy, raises a need for hospital management reforms using information systems (see Non-patent Reference 9).
Currently, medical information systems are introduced, though gradually, along the Grand Design for Informatization of Medical Domains, and there are some signs of improved efficiency in medical care and hospital management services. Enhancement of medical transparency has brought success in reassuring patients.
However, even when enormous amounts of medical information are stored, techniques of utilizing such medical information in order to increase management efficiency and establish evidence-based medicine (EBM) still have room for improvement.
In detail, medical information data includes time series data of medical care, testing, medication, surgery, and the like of patients, and each item has an extremely complex hierarchical structure and is managed as master data. Each patient receives different medical care, surgery, medication, and/or testing a plurality of times in different medical departments. Analyzing these data contributes to more detailed analysis of medical processes, evaluation of critical paths (clinical paths), and so on (see Non-patent Reference 9). However, it is not easy to apply a data mining technique that exhaustively searches the space of possible hypotheses in order to find a problem in a whole database which is large and complex. In terms of computer processing capability as well, it is more realistic to perform analysis in which the user narrows down items of interest interactively or by trial and error.
Interactive analysis is also effective as a process of finding a problem from data having a complex structure. In the field of databases, a multidimensional database is used as a technique of interactively analyzing time series data (see Non-patent References 1, 2, 4, 6, and 11).
The multidimensional database treats data as a set of events having measures and dimensions. For example, in retail sales data, each purchase history is a fact, an amount and a price are measures, and a product type, a purchase time, a purchase location, and the like are dimensions. A process of performing search, extraction, and processing on enormous amounts of original data, storing in a multidimensional database, and outputting a result is called Online Analytical Processing (OLAP). Each dimension of the multidimensional database has a hierarchical structure, so that data can be selected/aggregated at a data granularity corresponding to a processing request.
For instance, purchase history analysis is a typical example of analysis in a conventional multidimensional database. For each store, information on which products are sold, and when, where, and in what quantity they are sold, is stored in a database, and a sales total and the like are aggregated in a three-dimensional database as shown in
Non-patent Reference 1: S. Agarwal, R. Agrawal, P. Deshpande, A. Gupta, J. F. Naughton, R. Ramakrishnan, and S. Sarawagi, On the Computation of Multidimensional Aggregates, Proc. of International Conference on Very Large Data Bases, pp. 506-521, 1996
Non-patent Reference 2: P. Baumann, A. Dehmel, P. Furtado, R. Ritsch, and N. Widmann, Spatio-Temporal Retrieval with RasDaMan, Proc. of International Conference on Very Large Data Bases, pp. 746-749, 1999
Non-patent Reference 3: P. F. Dietz, Maintaining order in a linked list, Proc. of Annual ACM Symposium on Theory of Computing, pp. 122-127, 1982
Non-patent Reference 4: S. Goil and A. N. Choudhary, High Performance Multi-dimensional Analysis of Large Datasets, Proc. of International Workshop on Data Warehousing and OLAP, pp. 34-39, 1998
Non-patent Reference 5: H. Gupta, V. Harinarayan, A. Rajaraman, and J. D. Ullman, Index Selection for OLAP, Proc. of International Conference on Data Engineering, pp. 208-219, 1997
Non-patent Reference 6: M. Gyssens and L. Lakshmanan, A Foundation for Multi-dimensional Databases, Proc. of International Conference on Very Large Data Bases, pp. 106-115, 1997
Non-patent Reference 7: A. Inokuchi, K. Takeda, N. Inaoka, and F. Wakao, MedTAKMI-CDI: Interactive knowledge discovery for clinical decision intelligence, IBM Systems Journal, Volume 46, Number 1, pp. 115-134, 2007
Non-patent Reference 8: A. Inokuchi and K. Takeda, A Method for Online Analytical Processing of Text Data, Proceedings of ACM Conference on Information and Knowledge Management (CIKM 2007), 2007 (to appear)
Non-patent Reference 9: Y. Kinosada, T. Umemoto, A. Inokuchi, K. Takeda, and N. Inaoka, Challenge to Analysis for Clinical Processes by Using Mining Technology, Japan Journal of Medical Informatics, Vol. 26, No. 3, pp. 191-199, 2006
Non-patent Reference 10: T. Pedersen and C. Jensen, Multidimensional Data Modeling for Complex Data, Proceedings of the 15th International Conference on Data Engineering, pp. 336-345, 1999
Non-patent Reference 11: T. B. Pedersen and C. S. Jensen, Multidimensional Database Technology, IEEE Computer, Vol. 34, No. 12, pp. 40-46, 2001
Non-patent Reference 12: F. Wakao, B. K. Ishikawa, N. Inaoka, A. Inokuchi, and S. Suzuki, A Study on Clinical Process Analysis System for Cancer, the 25th Joint Conference on Medical Informatics, 2-F-6-6, 2005
Non-patent Reference 13: L. Wang, A. Zhang, and M. Ramanathan, BioStar Models of Clinical and Genomic Data for Biomedical Data Warehouse Design, International Journal of Bioinformatics Research and Applications, Vol. 1, No. 1, pp. 63-80, 2005
Non-patent Reference 14: T. Igarashi, T. Ashihara, S. Nagata, M. Takada, and K. Nakazawa, A Pen-based Interface for Electronic Medical Recording Systems, Japan Journal of Medical Informatics, Vol. 20, No. 2, pp. 482-483, 2000
Non-patent Reference 15: the Ministry of Health, Labor and Welfare, a Grand Design for Informatization of Medical, Healthcare, Care and Welfare Domains, http://www.mhlw.go.jp/houdou/2007/03/h0327-3.html.
Non-patent Reference 16: M. Nishibori and S. Shiina, Developing the Ideal User Interface for the Medical Information System, Japan Journal of Medical Informatics, Vol. 10, No. 1, pp. 3-14, 1990
Non-patent Reference 17: Y. Yamanobe, S. Aizawa, and M. Honda, GUI Problems in Electronic Medical Record Systems, IT Health Care, Vol. 2, No. 1, pp. 28-31, 2007. 8
However, in the case of analyzing, for example, medical data in electronic medical records using the above-mentioned existing multidimensional database, due to characteristics of medical information data, it is difficult to store data by a schema used in the conventional multidimensional database, and also a temporal order of data needs to be taken into consideration at the time of analysis. Hence, a new method for modeling and analyzing more complex data than purchase history data and the like which have been much studied thus far is necessary.
That is, when analyzing medical information data using conventional OLAP, the conventional OLAP has the following four problems with regard to the medical information data.
Firstly, in a multidimensional database by a star schema, facts and dimensions are in a 1-to-n relationship. However, medical histories do not necessarily have a 1-to-n relationship but often have an n-to-m relationship. In detail, in retail sales data analysis, one purchase history which is a fact is associated with only one dimension value in each dimension such as a product type, a purchase time, and a purchase location. On the other hand, in the case of medical histories where a history of one patient is set as a fact and medical care, surgery, medication, and test data are set as dimensions, a plurality of dimension values in each dimension exist for one fact, and a plurality of facts correspond to an item which can be a dimension. This cannot be supported by the conventional star schema. Although data can be stored in the star schema if one hospital stay is treated as a fact and a “main” disease name, a “main” surgical operation, and the like are treated as dimensions, this makes it difficult to perform analysis involving both outpatients and inpatients and analysis across a plurality of hospital stays.
Secondly, in medical information data, a temporal order of events has an important meaning, and an analytical query needs to be made in consideration of an order of events. In detail, for a patient with larynx cancer, the case of reducing tumor size by chemotherapy or radiation therapy before performing surgery and the case of applying chemotherapy or radiation therapy to prevent cancer recurrence after performing surgery need to be perceived as different medical processes.
Thirdly, since complex conditions are combined in a query in consideration of the problems mentioned above, efficient processing for interactive analysis is necessary. However, it is difficult to apply a form such as MOLAP that requires pre-aggregation, to medical data having many types of items which can be dimensions.
Fourthly, to execute such complex processing, a complex query needs to be provided using a query language such as SQL. Assuming that the user is a healthcare professional unfamiliar with SQL, an intuitively operable user interface is necessary in order to perform interactive analysis.
Thus, while individual purchases can be treated as separate records, each test history, surgery history, admission-discharge history, disease history, and the like of electronic medical records constitute a series of data for one patient, and because of these differences in data characteristics, sufficient analysis cannot be performed. In the case of purchase histories, one purchase record is associated with one purchase location, one purchase time, and one product type that belong to different dimensions. In the case of medical data, on the other hand, each item is associated with a plurality of test histories, surgery histories, admission-discharge histories, and disease histories, for a patient. Although there is an example of associating with one set of main data such as a main disease name, a main surgical operation, whether or not tested, and the like to perform analysis using a commercial system, sufficient analysis is impossible in this case.
A technique by Pedersen described later has difficulty performing analysis in consideration of an order of medical processes. Besides, a technique called BioStar (see Non-patent Reference 13) mainly proposes a data storage method, while leaving, to the user, the procedure (operation) for obtaining a desired analysis result. Furthermore, a technique called MedTAKMI-CDI (see Non-patent Reference 7) holds data on the basis of events, but has poor efficiency. This technique also lacks extensibility and flexibility because individual features are implemented separately.
The present invention has been made in view of the problems described above, and has an object of providing a multidimensional data analysis method having a data model and a table schema that ease handling of a temporal order by treating data, such as medical information data which is difficult to analyze flexibly by the conventional OLAP, as interval data having information of start times and end times of events.
Moreover, the present invention has an object of providing a multidimensional data analysis method whereby various queries of the user can be handled uniformly.
Furthermore, the present invention has an object of providing a multidimensional data analysis method having a user interface that allows the user's purpose of analysis to be intuitively expressed to thereby execute the analysis easily.
To solve the problems described above, a multidimensional data analysis method according to the present invention is a multidimensional data analysis method for performing multidimensional analysis of time series data in which dimensions and events are in a many-to-many relationship, the multidimensional data analysis method including: holding an interval table I and a hierarchy table T separately in a database, the interval table I indicating intervals having information of start times and end times of the events, and the hierarchy table T indicating a hierarchical structure of each dimension of multidimensional data; selecting an interval I′ having a property c requested by a user from the interval table I, by using an interval selection operation g which is an operation of returning a table indicating an interval; joining a set of the intervals I′ selected in the selecting, by using a join operation β which is an operation of joining the intervals I′ with a predetermined join condition; and generating a multidimensional cube from a result of the joining, by using an aggregation operation α which is an operation of generating a multidimensional cube of n dimensions from a data table.
According to this structure, data is treated as interval data having information of start times and end times of events, by using the interval table I. Thus, it is possible to provide a multidimensional data analysis method having a data model and a table schema that ease handling of a temporal order, whereby various queries of the user can be handled uniformly through the use of the interval selection operation g, the join operation β, and the aggregation operation α.
Moreover, the multidimensional data analysis method according to the present invention further includes: receiving an input command from the user; and displaying the multidimensional cube generated in the generating and a user interface used in a user operation in the receiving, on a screen, wherein, in the user interface displayed in the displaying, a left side and a right side of a rectangle object are set as a start time and an end time of an interval, connecting two intervals of different rectangle objects with a line designates a temporal order of the intervals, and connecting the rectangle objects to an aggregation operation rectangle object with a line inputs the aggregation operation.
According to this structure, the user performs interactive analysis using the user interface in the input step. Since the user interface can be operated even by the user such as a healthcare professional unfamiliar with operators and programming, it is possible to provide a multidimensional data analysis method that allows the user's purpose of analysis to be intuitively expressed to thereby execute the analysis easily.
Note that, to achieve the stated objects, the present invention may also be realized as a multidimensional data analysis apparatus including units corresponding to the characteristic steps of the multidimensional data analysis method, or as a program causing a computer to execute each of the steps. Such a program may be distributed via a recording medium such as a CD-ROM or a transmission medium such as the Internet.
In the multidimensional data analysis method according to the present invention, a data model and a table schema that ease handling of a temporal order can be realized by treating data as interval data having information of start times and end times of events. Moreover, data operations that enable various queries of the user to be handled uniformly can be provided. Furthermore, a user interface that allows the user's purpose of analysis to be intuitively expressed to thereby execute the analysis easily can be provided.
The following describes an embodiment of a multidimensional data analysis method according to the present invention, with reference to drawings.
In the multidimensional data analysis method according to the present invention, for example, medical data such as electronic medical records is stored in a database in a state of being separated between a table I indicating intervals having information of start times and end times of events and a hierarchy table T indicating a hierarchical structure of each dimension of multidimensional data. For example, the table I holds admission-discharge periods, disease periods, surgery periods, and the like of patients, and the table T holds a surgical procedure hierarchy, a disease hierarchy (ICD: International Classification of Diseases), and the like. Through the use of an interval selection operation g, a join operation β, and an aggregation operation α described later, a search result requested by the user can be displayed as a multidimensional cube.
Moreover, as shown in
A multidimensional data analysis apparatus 200 includes: a database 201 in which the interval table I and the hierarchy table T using electronic medical record information are held separately; an operation unit 202 including an aggregation operation unit 202a, a join operation unit 202b, and an interval selection operation unit 202c; a display unit 203 that displays a multidimensional cube as an operation result of the operation unit 202 and a user interface operated through an input unit 204; and the input unit 204 which is an operation input unit such as a keyboard.
First, the interval selection operation unit 202c selects an interval $I' = g(T, I, c)$ having a property c requested by the user from I, by the interval selection operation g (Step S301). Following this, the join operation unit 202b joins a set of intervals with $\beta(\{I'_1, \ldots, I'_n\}, O, W) = \pi_O(\sigma_W(I'_1 \times \cdots \times I'_n))$, by the join operation β (Step S302). Here, W is a set of selection conditions and O is a set of output columns. Lastly, the aggregation operation unit 202a generates a multidimensional cube by the aggregation operation α (Step S303). The generated multidimensional cube is displayed by the display unit 203.
The following describes the multidimensional data analysis method according to the present invention in more detail.
First, when defining the technique proposed in the present invention in accordance with the references (see Non-patent References 8 and 10), analysis target data D is defined as $D = \{(f_i, \{p_{i1}, p_{i2}, \ldots, p_{im}\})\}$ $(i = 1, 2, \ldots, n)$.
Here, $\{f_i \mid i = 1, 2, \ldots, n\}$ is a set of patient IDs, and $p_{ij}$ is interval information. Moreover, $(f_i, \{p_{i1}, p_{i2}, \ldots, p_{im}\})$ means that each patient $f_i$ has a set of interval-related information $p_{ij}$.
An interval is defined as pij=(ts, te, {c: v}), where ts and te respectively denote a start time and an end time of the interval. In particular, when ts=te, the interval pij is called an event. v is a value describing the interval, and c is a category to which the value v belongs. c is also a node in data having a hierarchy.
In more detail, when {pij} is an interval (time period) relating to admission-discharge, c: v includes a disease name, an attending doctor, and the like during the hospital stay. An International Classification of Diseases (ICD) 400 having a hierarchical structure as shown in
Given a hierarchy set D={Tk}, a schema is defined as S=(F, D), where F is a fact type and Tk is a hierarchy type Tk=(Cl, <_Tk). A hierarchy instance Tk of the type Tk is Tk=(Ck, <_Tk). Here, Ck denotes a set of categories cj, and <_Tk denotes a partial order relation on Ck.
A hierarchy used in the present invention does not need to be a balanced tree as adopted in many conventional OLAP systems, and a Directed Acyclic Graph (DAG) is assumed (see Non-patent Reference 8). Each category c ∈ C has a domain dom(c), and each element of dom(c) is expressed as {c: v} as mentioned earlier.
To increase a computation speed of the aggregation operation, the hierarchy is indexed as follows. An artificial root node croot is given as a parent node of each cj having no higher concept in C. Starting at croot, depth first search is performed while assigning a preorder, a postorder, and a depth to each node. Note that the search does not backtrack at internal nodes, and backtracks only at leaf nodes. Determination on whether or not an input category c and a category of data are in a descendant relationship can be easily made by the following condition. When a node A is an ancestor of a node B, the following expression (1) holds (see Non-patent Reference 3).
$\mathrm{preorder1}(A) = \mathrm{preorder}(A) < \mathrm{preorder}(B) \le \mathrm{preorder2}(A) = \mathrm{postorder}(A) + \mathrm{depth}(A)$ [Expression 1]
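As a non-limiting illustration, the indexing described above can be sketched in Python for the simple case of a tree. The function names `index_hierarchy` and `is_ancestor`, the toy hierarchy, the zero-based counters, and the convention that the root has depth 0 are assumptions of this sketch and do not appear in the original description; a DAG, as assumed by the present invention, would need additional handling for nodes reachable by more than one path.

```python
from typing import Dict, List, Tuple


def index_hierarchy(children: Dict[str, List[str]], root: str) -> Dict[str, Tuple[int, int]]:
    """Assign (PREORDER1, PREORDER2) to every node of a tree by depth-first search.

    PREORDER1 is the preorder number; PREORDER2 is postorder + depth.
    Counters start at 0 and the root has depth 0 (assumptions of this sketch).
    """
    index: Dict[str, Tuple[int, int]] = {}
    counters = {"pre": 0, "post": 0}

    def visit(node: str, depth: int) -> None:
        pre = counters["pre"]          # number the node on the way down
        counters["pre"] += 1
        for child in children.get(node, []):
            visit(child, depth + 1)
        post = counters["post"]        # number the node on the way back up
        counters["post"] += 1
        index[node] = (pre, post + depth)

    visit(root, 0)
    return index


def is_ancestor(index: Dict[str, Tuple[int, int]], a: str, b: str) -> bool:
    """Expression (1): preorder1(A) < preorder(B) <= preorder2(A)."""
    a_pre1, a_pre2 = index[a]
    b_pre1, _ = index[b]
    return a_pre1 < b_pre1 <= a_pre2


# Hypothetical toy hierarchy under the artificial root node.
children = {
    "c_root": ["surgery", "disease"],
    "surgery": ["cardiac", "gastric"],
    "disease": ["neoplasm"],
    "neoplasm": ["larynx_cancer"],
}
index = index_hierarchy(children, "c_root")
```

With these conventions, PREORDER2 equals the largest preorder number found in the subtree of the node, which is why a single range comparison suffices for the descendant test.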
To store hierarchical relationships and interval information, the tables CATEGORY T and INTERVAL I are defined as follows.
Each record of T corresponds to a different one of the nodes in a hierarchy, and CATENAME, PATH, PREORDER1, PREORDER2, and PARENT are respectively a category name of the node, a path from a root node to the node, a preorder of the node, a sum of a postorder and a depth of the node, and a preorder of a parent node.
Each record of I corresponds to information obtained by dividing (ts, te, {c: v}) into |{c: v}| records, and ID, START, END, PREORDER, VALUE, and INTERVALID are respectively a patient ID, an interval start time, an interval end time, a preorder of a category c, a value v in dom(c), and an interval identifier. The interval identifier INTERVALID is used because (ts, te, {c: v}) is divided into |{c: v}| records, which need to remain identifiable as belonging to the same interval.
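As a non-limiting illustration, the two tables can be sketched with Python's standard `sqlite3` module. The column names follow the description above, whereas the column types, the use of ISO date strings, the helper `store_interval`, and the sample values are assumptions of this sketch. Identifiers are double-quoted because names such as END may clash with SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "CATEGORY" (
        "CATENAME"  TEXT    NOT NULL,  -- category name of the node
        "PATH"      TEXT    NOT NULL,  -- path from the root node to the node
        "PREORDER1" INTEGER NOT NULL,  -- preorder of the node
        "PREORDER2" INTEGER NOT NULL,  -- postorder + depth of the node
        "PARENT"    INTEGER            -- preorder of the parent node
    );
    CREATE TABLE "INTERVAL" (
        "ID"         INTEGER NOT NULL, -- patient ID
        "START"      TEXT    NOT NULL, -- interval start time
        "END"        TEXT    NOT NULL, -- interval end time
        "PREORDER"   INTEGER NOT NULL, -- preorder of the category c
        "VALUE"      TEXT,             -- value v in dom(c)
        "INTERVALID" INTEGER NOT NULL  -- shared by rows of one interval
    );
    """
)


def store_interval(conn, patient_id, start, end, pairs, interval_id):
    """Split (ts, te, {c: v}) into one row per c: v pair; rows share INTERVALID.

    `pairs` is a list of (preorder of category c, value v).
    """
    conn.executemany(
        'INSERT INTO "INTERVAL" VALUES (?, ?, ?, ?, ?, ?)',
        [(patient_id, start, end, preorder, value, interval_id)
         for preorder, value in pairs],
    )


# Hypothetical hospital stay described by a disease name and an attending doctor.
store_interval(conn, 1, "2007-01-01", "2007-01-10",
               [(5, "larynx cancer"), (8, "Dr. A")], interval_id=1)
rows = conn.execute(
    'SELECT "PREORDER", "VALUE" FROM "INTERVAL" '
    'WHERE "INTERVALID" = 1 ORDER BY "PREORDER"'
).fetchall()
```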
The aggregation operation is defined as follows, using the two tables described above. In the following definition, Tc denotes “σp(T) FETCH FIRST 1 ROWS ONLY” which is an SQL statement of returning one tuple of the table T for an input category.
(1) Aggregation operation α: the aggregation operation α is defined, for a table A(v1, v2, . . . , vn, id), as $\alpha(A) = {}_{v_1, v_2, \ldots, v_n}\mathcal{G}_{v_1, v_2, \ldots, v_n, \mathrm{count}(\mathrm{distinct}\ \mathrm{id})}(A)$, where $\mathcal{G}$ denotes the grouping operator of relational algebra. It can be understood that the operation α is a function of generating a multidimensional cube of n dimensions from the table A.
(2) Join operation β: the join operation β is defined as $\beta(\{I'_1, I'_2, \ldots, I'_n\}, O, W) = \pi_O(I'_1 \bowtie_W I'_2 \bowtie_W \cdots \bowtie_W I'_n)$. Here, each table I′i is an interval table I′(id, start, end, value, interval_id). W is a set of join condition expressions, and the tables I′i, . . . , I′j are joined according to the condition expressions W and I′i.id=I′j.id. O is a set of output columns.
(3) Interval selection operation g: the interval selection operation g(T, I, c) is defined as an operation of returning a table I′(id, start, end, value, interval_id) indicating an interval. The function g is a user-defined function 500 (see Non-patent Reference 8) defined according to a purpose of analysis.
g(3) is an operation of selecting the same interval as g(1) where v is replaced with CATEGORYNAME in the table T. g(4) is an operation of selecting the same interval as g(1) where v is replaced with CATEGORYNAME of the child category of the designated category c. g(5) is an operation of selecting the same interval as g(1) where v is replaced with an interval start time.
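As a non-limiting illustration, the three operations can be sketched in Python over in-memory rows. The function names `g1`, `beta`, and `alpha`, the dictionary-based row layout, and the sample data are assumptions of this sketch. In particular, the behavior given to `g1` (selecting every interval whose category is c or a descendant of c and returning its value v unchanged) is an assumed reading of g(1).

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]


def g1(T: Dict[str, Tuple[int, int]], I: List[Row], c: str) -> List[Row]:
    """Assumed g(1): intervals whose category is c or a descendant of c."""
    pre1, pre2 = T[c]
    return [
        {k: r[k] for k in ("id", "start", "end", "value", "interval_id")}
        for r in I
        if pre1 <= r["preorder"] <= pre2
    ]


def beta(tables: List[List[Row]], O: Callable[..., tuple],
         W: Callable[..., bool]) -> List[tuple]:
    """Join the tables on equal patient id and on W, then project with O."""
    out = []
    for combo in product(*tables):
        same_patient = len({r["id"] for r in combo}) == 1
        if same_patient and W(*combo):
            out.append(O(*combo))
    return out


def alpha(A: List[tuple]) -> Dict[tuple, int]:
    """Group rows (v1, ..., vn, id) by (v1, ..., vn) and count distinct ids."""
    groups: Dict[tuple, set] = {}
    for *dims, ident in A:
        groups.setdefault(tuple(dims), set()).add(ident)
    return {dims: len(ids) for dims, ids in groups.items()}


# Hypothetical data: T maps a category to (PREORDER1, PREORDER2).
T = {"surgery": (1, 3), "admission": (4, 4)}
I = [
    {"id": 1, "start": "2007-01-01", "end": "2007-01-10", "preorder": 4, "value": "ward A", "interval_id": 1},
    {"id": 1, "start": "2007-01-03", "end": "2007-01-03", "preorder": 3, "value": "gastrectomy", "interval_id": 2},
    {"id": 2, "start": "2007-02-01", "end": "2007-02-05", "preorder": 4, "value": "ward B", "interval_id": 3},
    {"id": 2, "start": "2007-02-02", "end": "2007-02-02", "preorder": 2, "value": "bypass", "interval_id": 4},
    {"id": 3, "start": "2007-03-01", "end": "2007-03-04", "preorder": 4, "value": "ward A", "interval_id": 5},
    {"id": 3, "start": "2007-03-10", "end": "2007-03-10", "preorder": 3, "value": "gastrectomy", "interval_id": 6},
]

# Number of patients undergoing surgery during a hospital stay, per procedure.
cube = alpha(beta(
    [g1(T, I, "surgery"), g1(T, I, "admission")],
    O=lambda s, a: (s["value"], s["id"]),
    W=lambda s, a: a["start"] <= s["start"] and s["end"] <= a["end"],
))
```

In this sketch the third patient is excluded because the surgery falls outside the hospital stay. The nested-loop join is chosen for readability only and is not meant to reflect an efficient implementation.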
Specific examples are given below to show what kind of aggregation can be performed by the operations described above.
(1) A query example 1 is expressed using an expression (2).
α(β({g(1)(T,I,c1),g(1)(T,I,c2)},O1,W1)) [Expression 2]
Let c1 and c2 be a surgery category and an admission-discharge category respectively, and an expression (3) is given.
$O_1 = \{I'_1.\mathrm{value},\ \mathrm{id}\}$, $W_1 = \{I'_2.\mathrm{start} \le I'_1.\mathrm{start},\ I'_1.\mathrm{end} \le I'_2.\mathrm{end}\}$ [Expression 3]
The above query returns a result of aggregating the number of patients undergoing surgery during a hospital stay, for each surgical procedure. An output image is shown in
(2) A query example 2 is expressed using an expression (4).
α(β({g(4)(T,I,c1),g(1)(T,I,c2),g(1)(T,I,c3)},O2,W2)) [Expression 4]
Let c1, c2, and c3 be a surgery category, an admission-discharge category, and a radiological examination (X-ray, CT, MRI) category respectively, and an expression (5) is given.
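$O_2 = \{\mathrm{date}(I'_1.\mathrm{start}),\ I'_1.\mathrm{value},\ \mathrm{id}\}$, $W_2 = \{I'_2.\mathrm{start} \le I'_3.\mathrm{start},\ I'_3.\mathrm{end} \le I'_2.\mathrm{end}\}$ [Expression 5]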
The above query returns a result of aggregating the number of patients undergoing a radiological examination and surgery “in this order” during a hospital stay, for each department of surgery and for each surgery date. An output image 700 is shown in
(3) A query example 3 is expressed using an expression (6).
α(β({g(4)(T,I,c1),g(1)(T,I,c2),g(1)(T,I,c4)},O3,W3)) [Expression 6]
Let c1, c2, and c4 be a surgery category, an admission-discharge category, and a gender category respectively, and an expression (7) is given.
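$O_3 = \{I'_1.\mathrm{value},\ \mathrm{date}(I'_1.\mathrm{start}) - \mathrm{date}(I'_2.\mathrm{start}),\ I'_3.\mathrm{value},\ \mathrm{interval\_id}\}$, $W_3 = \{\mathrm{year}(I'_2.\mathrm{start}) = 2007\}$ [Expression 7]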
The above query returns a result of aggregating the number of surgical operations of patients hospitalized in 2007 for each gender and for each department, in relation to the number of days elapsed from an admission date to a surgery date. An output image 800 is shown in
(4) A query example 4 is expressed using an expression (8).
α(β({g(1)(T,I,c3),g(1)(T,I,c3),g(1)(T,I,c3),g(1)(T,I,c2)},O4,W4)) [Expression 8]
Let c3 be a radiological examination category, and an expression (9) is given.
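$O_4 = \{I'_1.\mathrm{value},\ I'_2.\mathrm{value},\ I'_3.\mathrm{value},\ \mathrm{id}\}$, $W_4 = \{I'_4.\mathrm{start} \le I'_1.\mathrm{start} < I'_2.\mathrm{start} < I'_3.\mathrm{start} \le I'_4.\mathrm{end}\}$ [Expression 9]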
The above query is a query of aggregating the number of instances of the order of each radiological examination type, for patients undergoing a radiological examination three or more times during a hospital stay. An output image 900 is shown in
As shown in
(5) A query example 5 is expressed using an expression (10).
α(β({g(7)(T,I,c5)},O5,φ)) [Expression 10]
Let c5 be a white blood cell count category, let O5={I′1.value, id}, and let g(7) be a function of discretizing the white blood cell count. This being the case, the above query returns a result as shown in
The following describes the user interface used in the multidimensional data analysis apparatus according to the present invention.
In an environment where electronic medical record information is stored in a relational database, a person having experience of using SQL can obtain a desired analysis result by directly querying an operational system (or its replica), without using the tables described above.
However, the present invention is intended to be used by a healthcare professional having no experience of using SQL. As an example, an electronic medical record system introduced in a G university hospital contains more than 100 pieces of master information and several tens of implementation tables, so that it is not easy for a user unfamiliar with SQL to express a query for obtaining a desired analysis result.
Besides, there is a difficulty in expressing the combination of the functions α, β, and g described above. In view of this, the present invention proposes a user interface that allows a query representing the user's purpose of analysis to be expressed easily.
(a) in
Through the use of such a user interface, the above-mentioned query examples 1 and 2 are expressed as (a) and (b) in
The present invention described above is implemented in Java (registered trademark), thereby realizing HealthCube which is a system of aggregating data in a relational database through Java Database Connectivity (JDBC).
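As a non-limiting illustration of aggregating data held in a relational database, the query example 1 might be issued as a single SQL statement. The sketch below uses Python's standard `sqlite3` module in place of Java and JDBC, and the reduced set of columns, the sample rows, and the form of the SQL statement are assumptions of this sketch rather than the statement actually generated by HealthCube.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE "CATEGORY" ("CATENAME" TEXT, "PREORDER1" INTEGER, "PREORDER2" INTEGER);
    CREATE TABLE "INTERVAL" ("ID" INTEGER, "START" TEXT, "END" TEXT,
                             "PREORDER" INTEGER, "VALUE" TEXT, "INTERVALID" INTEGER);
    INSERT INTO "CATEGORY" VALUES ('surgery', 1, 3), ('admission-discharge', 4, 4);
    INSERT INTO "INTERVAL" VALUES
        (1, '2007-01-01', '2007-01-10', 4, 'ward A',      1),
        (1, '2007-01-03', '2007-01-03', 3, 'gastrectomy', 2),
        (2, '2007-02-01', '2007-02-05', 4, 'ward B',      3),
        (2, '2007-02-02', '2007-02-02', 2, 'bypass',      4),
        (3, '2007-03-01', '2007-03-04', 4, 'ward A',      5),
        (3, '2007-03-10', '2007-03-10', 3, 'gastrectomy', 6);
    """
)

# The two interval selections appear as range tests on PREORDER, the join
# condition W1 as the WHERE clause, and the aggregation as
# GROUP BY with COUNT(DISTINCT id).
QUERY_1 = """
SELECT s."VALUE" AS surgical_procedure, COUNT(DISTINCT s."ID") AS patients
FROM "INTERVAL" AS s
JOIN "CATEGORY" AS cs
  ON cs."CATENAME" = ? AND s."PREORDER" BETWEEN cs."PREORDER1" AND cs."PREORDER2"
JOIN "INTERVAL" AS a
  ON a."ID" = s."ID"
JOIN "CATEGORY" AS ca
  ON ca."CATENAME" = ? AND a."PREORDER" BETWEEN ca."PREORDER1" AND ca."PREORDER2"
WHERE a."START" <= s."START" AND s."END" <= a."END"
GROUP BY s."VALUE"
ORDER BY s."VALUE"
"""

result = conn.execute(QUERY_1, ("surgery", "admission-discharge")).fetchall()
```

Dates are stored as ISO strings in this sketch so that string comparison agrees with chronological order.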
Moreover, patient medical history information 1300 is pseudo-generated using the master information of the G university hospital.
In detail, each figure in the table indicates the number of patients who have undergone testing in the vertical axis and then undergone surgery in the horizontal axis. The number of patients in the pseudo data is 50,400, and the total number of intervals is 4,187,845. Most queries can be returned in several seconds, though the speed depends on the number of intervals and the number of dimensions of an aggregation result as conditions included in a query.
The following gives observations and describes related research.
Though medical information systems have been continuously discussed even before the Ministry of Health and Welfare launched the electronic medical chart development project in 1995, there is still ongoing debate about their operability and interfaces (see Non-patent References 14, 16, and 17).
Problems often cited include a lack of understanding of a use environment, a shortage of time the user can spare to use the system, a complex operational procedure, and an impossibility of reflecting flexible thinking. Similar problems are also raised with regard to medical information analysis tools. To enhance convenience and efficiency in an interactive analysis technique such as OLAP, it is important to not only improve tool operability but also enable the user to intuitively express what he/she wants to analyze so that the user's purpose is reflected on an output result. In consideration of these points, research relevant to the present invention is examined below.
As described above, according to the present invention, various types of query statements can be created in the same form by combining the operation functions α, β, and g and the tables T and I. Though the above-mentioned examples are relatively simple due to space limitations, it is possible to create a more complex query. An order relation between intervals or events created by a query does not need to be a total ordering, and may be a partial ordering. For example, even when intervals A and B are after C, it is possible to create a query that does not designate the order of the intervals A and B.
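As a non-limiting illustration of a partial ordering, the sketch below counts patients for whom intervals A and B both come after C without designating the order of A and B, and contrasts this with a total ordering C, A, B. The helper names, the integer day numbers, and the reading of "after" as "starts no earlier than the other interval ends" are assumptions of this sketch.

```python
from itertools import product


def after(x, y):
    """Assumed meaning of "x is after y": x starts no earlier than y ends."""
    return y["end"] <= x["start"]


def count_patients(tables, W):
    """Count distinct patients having one interval per table that satisfies W."""
    ids = set()
    for combo in product(*tables):
        same_patient = len({r["id"] for r in combo}) == 1
        if same_patient and W(*combo):
            ids.add(combo[0]["id"])
    return len(ids)


def partial_order(a, b, c):
    # A and B are both after C; the order of A and B is not designated.
    return after(a, c) and after(b, c)


def total_order(a, b, c):
    # C first, then A, then B.
    return after(a, c) and after(b, a)


def iv(pid, day):
    """Hypothetical one-day interval for patient pid."""
    return {"id": pid, "start": day, "end": day}


chemo = [iv(1, 3), iv(2, 5), iv(3, 1)]      # A
radiation = [iv(1, 5), iv(2, 3), iv(3, 5)]  # B
surgery = [iv(1, 1), iv(2, 1), iv(3, 3)]    # C

n_partial = count_patients([chemo, radiation, surgery], partial_order)
n_total = count_patients([chemo, radiation, surgery], total_order)
```

Here the partial ordering matches patients 1 and 2, whereas the total ordering matches only patient 1; patient 3 received chemotherapy before surgery and matches neither.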
Research relevant to the present invention is described in Non-patent Reference 10. Non-patent Reference 10 presents nine requirements when analyzing medical data by OLAP, and proposes a data model addressing the nine requirements and operations associated with the data model. However, in the operations defined for generating a multidimensional cube, the same dimension cannot be selected, and so the result shown in
As mentioned above, an operation for obtaining surgical procedures performed during admission-discharge periods is expressed by an expression (11).
$\beta(\{g^{(1)}(T, I, c_1),\ g^{(1)}(T, I, c_2)\}, O_1, W_1) = \pi_{O_1}(g^{(1)}(c_1) \bowtie_{W_1} g^{(1)}(c_2))$ [Expression 11]
Here, c1 and c2 are respectively a surgery category and an admission-discharge category, and an expression (12) is given.
$O_1 = \{I'_1.\mathrm{value},\ \mathrm{id}\}$, $W_1 = \{I'_2.\mathrm{start} \le I'_1.\mathrm{start},\ I'_1.\mathrm{end} \le I'_2.\mathrm{end}\}$ [Expression 12]
Moreover, part of T and I is omitted for the sake of convenience. On the other hand, MedTAKMI-CDI (see Non-patent Reference 7) is also a technique proposed to solve part of the problems listed above. In MedTAKMI-CDI, data is held not in units of intervals but in units of events. Therefore, in the case of an admission-discharge interval, data is held as an admission event and a discharge event having event times. According to MedTAKMI-CDI, an operation of obtaining surgical procedures performed during admission-discharge periods is expressed by an expression (13).
$\pi_{O_3}(\pi_{O_2}({}_{G}\mathcal{G}_{O_1}(g^{(1)}(c_2) \bowtie_{P_1} g^{(1)}(c_3))) \bowtie_{P_2} g^{(1)}(c_1))$ [Expression 13]
Here, c1, c2, and c3 are respectively surgery event, admission event, and discharge event categories, and an expression (14) is given.
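$P_1 = \{2.\mathrm{id} = 3.\mathrm{id} \text{ and } 2.\mathrm{start} < 3.\mathrm{start}\}$, $P_2 = \{2.\mathrm{id} = 1.\mathrm{id} \text{ and } 2.\mathrm{start} \le 1.\mathrm{start} \le \mathrm{end}\}$, $O_1 = \{2.\mathrm{id},\ 2.\mathrm{start},\ \min(3.\mathrm{start} - 2.\mathrm{start}) \text{ as } \mathit{min}\}$, $O_2 = \{2.\mathrm{id},\ 2.\mathrm{start},\ 2.\mathrm{start} + \mathit{min} \text{ as } \mathrm{end}\}$, $O_3 = \{1.\mathrm{value},\ G = 2.\mathrm{start}\}$ [Expression 14]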
Here, i.start is a column name returned from g(1)(ci). When comparing the expressions (11) and (13), the expression (11) requires one join, whereas the expression (13) requires two joins and one aggregation. Since each join between g(1)(ci) and g(1)(cj) joins tables having as many tuples as are held in a fact table of a star schema, it is clear that the latter requires more computation time.
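As a non-limiting illustration of why the event-based form needs the additional join and aggregation, the sketch below first rebuilds each hospital stay by pairing an admission event with the earliest later discharge event of the same patient, and only then joins the rebuilt stays with surgery events. The function names, the tuple layouts, and the integer day numbers are assumptions of this sketch and are not taken from MedTAKMI-CDI.

```python
def stays_from_events(admissions, discharges):
    """First join and aggregation: (id, admission time) -> (id, start, end).

    Taking the earliest later discharge is equivalent to adding
    min(discharge - admission) to the admission time.
    """
    stays = []
    for pid, t_in in admissions:
        later = [t_out for qid, t_out in discharges if qid == pid and t_in < t_out]
        if later:
            stays.append((pid, t_in, min(later)))
    return stays


def surgeries_during_stays(stays, surgeries):
    """Second join: surgery events falling inside a rebuilt stay."""
    return [
        (value, pid)
        for pid, t_in, t_out in stays
        for sid, t, value in surgeries
        if sid == pid and t_in <= t <= t_out
    ]


# Hypothetical events for one patient with two hospital stays.
admissions = [(1, 1), (1, 20)]
discharges = [(1, 10), (1, 25)]
surgeries = [(1, 3, "gastrectomy"), (1, 15, "bypass")]

stays = stays_from_events(admissions, discharges)
found = surgeries_during_stays(stays, surgeries)
```

When intervals are stored directly, as in the interval table I, the first step is unnecessary and only the second join remains.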
Furthermore, while the present invention enables various analysis requests to be generated by the operations α, β, and g and the aggregations such as
As described above, in the multidimensional data analysis method according to the present invention, a multidimensional cube can be generated in consideration of a temporal order of intervals or events that cannot be sufficiently analyzed in conventional techniques, and also various queries can be expressed by the combination of the operations α, β, and g. Moreover, an intuitive interface capable of generating queries supporting interactive analysis can be provided.
Accordingly, for example, for medical data and administrative data stored in a hospital information system, various types of query statements can be generated by combining tables and operation functions incorporating the concept of interval data, with it being possible to perform flexible analysis in an interactive manner.
In addition, by analyzing past medical history data using the multidimensional data analysis method according to the present invention, medical care quality can be improved and evaluated. Furthermore, in the case where hospital management needs to be reviewed due to modification of medical service fees and the like, the effect of improved management can be expected through comparison between departments and investigation into causes of prolonged hospitalizations.
Note that, though the present invention has been described on the basis of medical information data, the present invention is versatile and applicable to different types of data.
The multidimensional data analysis method according to the present invention can be used for medical process analysis and clinical path quantitative evaluation, when applied to medical data of electronic medical records. However, the multidimensional data analysis method according to the present invention is highly versatile and applicable not only to medical data but also to, for example, quality management and market analysis.
Number | Date | Country | Kind |
---|---|---|---|
2007-301025 | Nov 2007 | JP | national |
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/JP2008/003366 | 11/18/2008 | WO | 00 | 5/19/2010 |