The present application claims priority from Japanese application JP2007-009371 filed on Jan. 18, 2007, the content of which is hereby incorporated by reference into this application.
The present invention relates to a technique for registering and retrieving structured data.
In recent years, the need to retrieve required information from electronic documents quickly and reliably has increased. A full text retrieval system is one system that meets this need. In a full text retrieval system, a computer system can retrieve documents containing specified characters from a database of documents. Full text retrieval systems have also become more sophisticated: in addition to retrieval in conventional flat documents, retrieval with a structure specified in structured documents (structured data) such as XML (Extensible Markup Language) data is possible (see JP-A-10-240752). For example, information containing an author name “A” is retrieved from information in the range of “<bibliography>” to “</bibliography>” in documents described with XML. In this way, retrieval with a document structure specified has become possible.
As a technique for raising the speed of the full text retrieval, there is a technique using an n-gram index. With respect to n connected characters (n-gram), the n-gram index indicates a position in a document in which the n characters appear, as an index. In structured documents such as XML data as well, it is possible to manage in which structure of the XML data the connected characters appear, by using the n-gram index.
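The n-gram index described above can be illustrated with a minimal sketch. The function names `build_ngram_index` and `search`, the choice of n = 2, and the in-memory dictionary layout are assumptions made for illustration only; they are not taken from the cited techniques.

```python
from collections import defaultdict


def build_ngram_index(documents, n=2):
    """Map each n-gram to the (document id, character position) pairs where it appears."""
    index = defaultdict(list)
    for doc_id, text in documents.items():
        for pos in range(len(text) - n + 1):
            index[text[pos:pos + n]].append((doc_id, pos))
    return index


def search(index, query, n=2):
    """Return ids of documents containing `query`, using only the positional n-gram index."""
    if len(query) < n:
        raise ValueError("query must contain at least n characters")
    # Candidate start positions come from the first n-gram of the query.
    candidates = set(index.get(query[:n], []))
    # Every following n-gram must appear at the matching offset in the same document.
    for offset in range(1, len(query) - n + 1):
        gram_hits = set(index.get(query[offset:offset + n], []))
        candidates = {(d, p) for (d, p) in candidates if (d, p + offset) in gram_hits}
    return sorted({d for d, _ in candidates})


docs = {1: "full text retrieval", 2: "structured data"}
idx = build_ngram_index(docs, n=2)
```

With such an index, `search(idx, "text")` consults only the entries for “te”, “ex” and “xt” instead of scanning every document, which is why retrieval is fast while each newly registered document adds index update work.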
The computer system can retrieve information at high speed by using the n-gram index. However, there is a problem that it takes time to update the index (full text retrieval index), for example to additionally register index entries.
Therefore, the following technique is proposed in order to make newly registered documents retrievable without waiting for the update processing of the full text retrieval index. When newly registering a document, the computer first stores the document as it is in an update text buffer. When the computer retrieves documents, the computer retrieves both the documents stored in the update text buffer and the indexes in the full text retrieval index. In other words, the computer conducts text scan on documents stored in the update text buffer and retrieves an index containing a specified character string on the full text retrieval index.
Separately from the retrieval processing (for example, while the computer is not conducting the retrieval processing), the computer updates the full text retrieval index on the basis of documents in the update text buffer. The update of the full text retrieval index is conducted in response to a command input from a system manager, or when the number of documents stored in the update text buffer exceeds a predetermined number (see JP-A-10-240754).
However, the technique described in JP-A-10-240754 has a problem that an increase of the number of documents registered in the update text buffer causes an increase of retrieval processing time for documents stored in the update text buffer. In other words, there is a problem that it takes a considerably long time if the computer executes retrieval processing in a state in which a large number of documents for each of which an index has not yet been generated are stored in the update text buffer. This problem is also posed in the same way when the technique for retrieving structured data described in JP-A-10-240752 is used in the technique described in JP-A-10-240754.
An object of the present invention is to solve the problem and raise the speed of data retrieval without increasing the structured data registration time, in a document retrieval system for structured data such as XML data.
In order to solve the problem, a computer for retrieving structured data by using an index according to the present invention accepts input of structured data and conducts structure analysis on the input structured data. In other words, the computer analyzes names of structure elements included in the structured data, relations among the structure elements, and appearance locations, in the structured data, of the structure elements. Subsequently, the computer calculates a processing cost for reflecting the structured data to the index on the basis of the generated structure analysis information. For example, the computer calculates a registration processing time required to reflect the structured data to the index. When the calculated processing cost exceeds a predetermined threshold, the computer stores structure analysis information concerning the structured data in a storage. In other words, the computer only stores the structure analysis information in the storage, and does not reflect the input structured data to the index. When the computer accepts an input of a retrieval request containing a structure condition, and the structured data that is an object of the retrieval request is structured data that is not reflected to the index, the computer conducts the retrieval processing described hereafter. First, the computer reads out an appearance location, in the structured data, of a structure element satisfying the structure condition from the structure analysis information stored in the storage. Then the computer retrieves data satisfying the retrieval request from data in the appearance location read out. For example, the computer conducts text scan.
In this way, the computer stores structured data that takes a long time to conduct index reflection (index update) in the storage at a stage in which structure analysis information is generated. In other words, index update based on the structure analysis information is not conducted. On the other hand, as for structured data that does not take a long time to update the index, the computer generates structure analysis information and then conducts index update on the basis of the structure analysis information.
When conducting retrieval in structured data that are not yet reflected to the index, the computer judges which range of structured data unreflected to the index should be a retrieval object on the basis of information indicated in the structure analysis information (information such as names of structure elements included in the structured data, relations among the structure elements, and appearance locations, in the structured data, of the structure elements), and narrows down the retrieval range. And the computer retrieves data satisfying a retrieval request over the range narrowed down. For example, the computer retrieves data containing a character string specified in the retrieval request over the predetermined range of structured data. Therefore, the computer can conduct retrieval faster as compared with the case where the computer conducts character string retrieval in all structured data unreflected to the index. Furthermore, the computer can conduct retrieval fast by using the index for structured data already reflected to the index as well. In other words, the speed of data retrieval can be raised without increasing the registration time of structured data.
According to the present invention, the speed of data retrieval can be raised without increasing the structured data registration time, in a document retrieval system for structured data such as XML data.
Other objects, features and advantages of the invention will become apparent from the following description of the embodiments of the invention taken in conjunction with the accompanying drawings.
Hereafter, embodiments of the present invention will be described with reference to the drawings. In the ensuing description, the object of retrieval and registration in the present system is supposed to be XML data. However, the object may be other data as long as the data is structured data.
The terminal devices 204 and 205 have application programs 221 and 222, respectively. The terminal devices 204 and 205 request the computer 201 to conduct various operation processing such as XML data registration or retrieval by using the application programs 221 and 222, respectively. The terminal devices 204 and 205 are connected to the computer 201 via the network 206 so as to be capable of conducting communication. Each of the terminal devices 204 and 205 is implemented by using, for example, a PC (personal computer). An input device (such as a keyboard and a mouse) and an output device (such as a liquid crystal display), which are not illustrated, are connected to each of the terminal devices 204 and 205. The network 206 is implemented by using, for example, the Internet or a LAN (local area network).
In the ensuing description, the terminal device 204 is supposed to be a terminal device that mainly registers XML data and the terminal device 205 is supposed to be a terminal device that mainly retrieves XML data. However, the terminal devices are not constrained to them. The number of terminal devices connected to the computer 201 is not restricted to the number exemplified in
The computer 201 conducts various kinds of operation processing such as XML data registration and retrieval. The computer 201 includes a network interface, an input interface and an output interface (which are not illustrated). The computer 201 conducts communication with the terminal devices 204 and 205 via the network 206 by using the network interface. Furthermore, the computer 201 reads data from the disk device 207 and writes data into the disk device 207 via the input interface and the output interface.
The disk device 207 is a storage connected to the computer 201. The disk device 207 includes a database 60 of XML data. The disk device 207 is implemented by using, for example, a HDD (hard disk drive) or a flash memory. In
The computer 201 includes a CPU (central processing unit) 202 and a main storage 203. Although not illustrated, the computer 201 includes a network interface, an input interface and an output interface.
The CPU 202 reads out a program (not illustrated) stored in the disk device 207 onto the main storage (main memory) 203 and executes the program. Thus the CPU 202 conducts various kinds of operation processing such as XML data registration and retrieval.
The main storage 203 is a storage used when the CPU 202 conducts various kinds of operation processing. The main storage 203 stores unreflected data management information 39, and secures a structure analysis information storage area 40 and an area for a database buffer 44 in a predetermined area. The main storage 203 and the disk device 207 are collectively referred to as storage.
The unreflected data management information 39 is information indicating identifiers of XML data that is included in XML data input to a database management system 10 and that is not yet reflected to the index 66. For example, as exemplified in
The database management system 10 can know a data identifier of XML data that is not reflected to any index, by referring to the unreflected data management information 39. Furthermore, the database management system 10 can know a storage area of structure analysis information of the XML data that is not reflected to any index. Furthermore, the database management system 10 can know access information 302 to structure analysis information 306 to 308 generated from these XML data.
The structure analysis information storage area 40 (see
The structure analysis information will now be described with reference to
For example, in the XML data exemplified in
For example, it is indicated in the structure analysis information shown in
Referring back to
The database buffer 44 is a storage area used when the database management system 10 reads out XML data from the database 60. In the present embodiment, mainly XML data that are not yet reflected to the index are read out onto the database buffer 44.
A configuration of the database management system 10 will now be described. The database management system 10 includes an input processing part 220, an output processing part 230, and a database access control part 210.
The input processing part 220 receives information input via the network interface, the input interface or the output interface, and delivers the information to the database access control part 210. The output processing part 230 outputs a result of processing conducted in the database access control part 210 via the network interface, the input interface or the output interface.
The database access control part 210 includes a data management part 216, a structure analysis information management part 217, and an index management part 211.
The database access control part 210 calls the data management part 216, the structure analysis information management part 217, and the index management part 211 according to a kind or condition of an XML data registration request from the terminal device 204 or an XML data retrieval request from the terminal device 205. And the database access control part 210 transmits results of operation processing conducted by the data management part 216, the structure analysis information management part 217, and the index management part 211 to the terminal devices 204 and 205.
The data management part 216 conducts takeout, update and deletion of data in the database 60 stored in the disk device 207.
The structure analysis information management part 217 manages the unreflected data management information 39 and structure analysis information stored in the structure analysis information storage area 40. In other words, the structure analysis information management part 217 adds/deletes structure analysis information to/from the structure analysis information storage area 40. Furthermore, the structure analysis information management part 217 adds/deletes an entry of XML data that is not yet reflected to an index to/from the unreflected data management information 39.
The index management part 211 includes an index registration processing part 212 and an index retrieval processing part 214. The index management part 211 starts these processing parts according to contents of requests from the terminal devices 204 and 205. For example, upon accepting an XML data registration request from the terminal device 204, the index management part 211 starts the index registration processing part 212. Upon accepting an XML data retrieval request from the terminal device 205, the index management part 211 starts the index retrieval processing part 214.
The index registration processing part 212 updates the index 66 in the database 60 on the basis of structure analysis information of XML data.
The index retrieval processing part 214 retrieves the index 66, the structure analysis information and XML data on the database buffer 44 by using an input retrieval condition (a structure condition and a character string condition) as a key.
Details of the database access control part 210 will be described later.
The disk device 207 includes the database 60. The database 60 includes a table 62 for storing XML data, the index 66 of the XML data, and definition information 61.
The table 62 stores XML data. For every data identifier (data ID) of XML data, the XML data associated with the identifier is stored in the table 62. TABLE 1 shows an example of the table 62. In the table “T1,” XML data associated with data identifiers “1” and “2” are stored.
By the way, XML data that are not yet reflected to the index are also stored in the table 62. The table 62 may contain meta data (for example, registration date of XML data) concerning XML data, besides the XML data.
The index 66 is an index of XML data stored in the table 62. The index 66 is generated for every table 62. The index 66 is retrieved by the index retrieval processing part 214.
The index 66 includes a structured index for retrieving, for example, XML data by following structure elements included in the XML data, and a character string index for retrieving a character string of XML data. The structured index is an index which indicates XML data with a tree structure by using a tag of XML data as a node. The character string index is an index which indicates, for every character string, a document number of XML data containing the character string or a character location of the character string in the XML data. The index retrieval processing part 214 can obtain XML data containing a character string indicated in a retrieval condition or a character location of the character string in the XML data, by retrieving the index 66.
The definition information 61 is information that indicates, for every table 62 in the database 60, identification information of the index 66 of XML data stored in the table 62. The definition information 61 exemplified in TABLE 2 indicates that an index of a table “T1” is “Idx1.” The database access control part 210 can know which index 66 is generated in each table 62 by referring to the definition information 61.
Outline of the system according to the present embodiment will now be described with reference to
First, the input processing part 220 included in the database management system 10 shown in
The data management part 216 decides to update the index 66 by referring to the definition information 61 in the database 60 (S11). For example, when the table 62 which is the registration destination of the XML data is “T1,” the data management part 216 decides to update the index 66 in the table 62 of “T1” by referring to the definition information 61.
Subsequently, the data management part 216 stores the XML data 52 into the database 60, and determines a data identifier 30 of the XML data 52 (S12). For example, the data management part 216 stores the XML data 52 into the table “T1” in the database 60, and determines a data identifier 30 of the XML data 52.
Subsequently, the index registration processing part 212 conducts structure analysis of the input XML data 52, and generates (creates) structure analysis information. And the index registration processing part 212 stores generated structure analysis information 31 in the structure analysis information storage area 40 (S13).
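The structure analysis at S13 can be pictured with the following minimal sketch. It records, for each structure element, its name, its parent (the relation among structure elements) and its appearance location. The names `StructureEntry` and `analyze_structure`, the use of character offsets of the element content as the appearance location, and the simplified tag pattern (no comments, CDATA sections, or “>” inside attribute values) are assumptions for illustration; the embodiment does not prescribe this representation.

```python
import re
from dataclasses import dataclass


@dataclass
class StructureEntry:
    name: str    # structure element (tag) name
    parent: int  # position of the parent entry in the list, -1 for the root
    start: int   # character offset where the element content begins
    end: int     # character offset where the element content ends (exclusive)


# Simplified tag pattern: start tags, end tags and self-closing tags only (assumption).
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*?(/?)>")


def analyze_structure(xml_text):
    """Return one StructureEntry per element, in order of appearance."""
    entries, stack = [], []
    for m in TAG.finditer(xml_text):
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if closing:
            if not stack or entries[stack[-1]].name != name:
                raise ValueError(f"unexpected end tag </{name}>")
            entries[stack.pop()].end = m.start()
        else:
            parent = stack[-1] if stack else -1
            entries.append(StructureEntry(name, parent, m.end(), m.end()))
            if not self_closing:
                stack.append(len(entries) - 1)
    if stack:
        raise ValueError("unclosed element: " + entries[stack[-1]].name)
    return entries


xml = "<bibliography><author>Smith</author><title>XML</title></bibliography>"
info = analyze_structure(xml)
number_of_structures = len(info)  # the quantity compared with the threshold at S14
```

In this sketch the number of structures is simply the number of entries, so the decision at S14 needs no further pass over the XML data.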
The index registration processing part 212 decides whether to update the index 66 on the basis of the number of structures in the structure analysis information 31 (S14).
For example, the index registration processing part 212 calculates the number of structures on the basis of the number of tags in the structure analysis information 31 and makes a decision whether the calculated number of structures exceeds a predetermined threshold. In other words, the index registration processing part 212 makes a decision whether the XML data is XML data in which it takes a comparatively long time to update the index.
If the number of structures in the structure analysis information 31 exceeds a predetermined threshold, the structure analysis information management part 217 registers an entry in the unreflected data management information 39. In other words, the structure analysis information management part 217 registers access information to the structure analysis information 31 generated at S13, and the data identifier of the XML data 52 on which the structure analysis information 31 is based, in the unreflected data management information 39. For example, the structure analysis information management part 217 registers the data identifier “2” of the XML data 52 and the access information to the structure analysis information 31. At this time, the index registration processing part 212 does not update the index 66.
On the other hand, if the calculated number of structures is equal to or less than the predetermined threshold, the index registration processing part 212 updates the index 66 by utilizing the structure analysis information. In other words, the index registration processing part 212 updates the index 66 of the table 62 which is the registration destination of the XML data 52 by utilizing the structure analysis information 31 generated at S13.
Thus, with respect to XML data for which the update time of the index 66 is comparatively short, the database management system 10 updates the index 66 on the basis of the structure analysis information of the XML data. On the other hand, with respect to XML data for which the update time of the index 66 is comparatively long, the database management system 10 only generates structure analysis information, but does not update the index 66. The generated structure analysis information is stored in the structure analysis information storage area 40 in the main storage 203 (see
Retrieval processing of XML data registered according to the above-described procedure will now be described. The case where the database management system 10 first retrieves the index 66 and then retrieves the unreflected data management information 39 will now be described as an example. However, this is not restrictive. In other words, the database management system 10 may first retrieve the unreflected data management information 39 and then retrieve the index 66.
The input processing part 220 in the database management system 10 accepts input of a retrieval request 51 of XML data. The retrieval request 51 includes a retrieval condition, that is, a structure condition and a character string condition, of XML data which is the retrieval object.
For example, an input of the retrieval request 51 that specifies “bibliography/author” as the structure condition and “∘×” as the character string condition is accepted. In other words, the accepted retrieval request 51 asks for cases where the character string “∘×” appears in a structure “author” located directly under a structure “bibliography” in XML data.
Subsequently, the index retrieval processing part 214 in the index management part 211 refers to the definition information 61 in the database 60 and decides to utilize the index 66 (S16). In other words, the index retrieval processing part 214 refers to the definition information 61 and reads out the index 66 in the database 60.
And the index retrieval processing part 214 retrieves the index 66 (S17), and acquires a document number or a character location of XML data that meets the input retrieval request 51. And the output processing part 230 transmits a result of the retrieval to the application program 222 in the terminal device 205.
Subsequently, the data management part 216 reads out XML data that is not yet reflected to the index onto the database buffer 44 (S18). In other words, the data management part 216 reads out XML data associated with the data identifier that is registered on the unreflected data management information 39 from the table 62 onto the database buffer 44.
The index retrieval processing part 214 executes the following processing with respect to each of entries registered in the unreflected data management information 39 (S19).
XML data including a structure specified in the retrieval request 51 is acquired from the database buffer 44.
Data satisfying the character string condition specified in the retrieval request 51 is retrieved from the acquired XML data.
In other words, the index retrieval processing part 214 first acquires structure analysis information (see
For example, when “bibliography/author” is specified as the structure condition in the retrieval request, the index retrieval processing part 214 reads out a start location “14” and an end location “22” of “author” denoted by a numeral 432 located right under “bibliography” denoted by a numeral 431 in structure analysis information exemplified in
Subsequently, the index retrieval processing part 214 acquires XML data associated with the structure analysis information from the database buffer 44. And the index retrieval processing part 214 retrieves a character string specified in the retrieval request 51 from data ranging from the start location to the end location in the acquired XML data. And the output processing part 230 transmits a result of the retrieval to the application program 222 in the terminal device 205.
In this way, the index retrieval processing part 214 narrows down the range of the XML data that becomes an object of the retrieval on the basis of the structure analysis information, and then conducts text scan for the character string (character string retrieval). Therefore, the index retrieval processing part 214 can quickly retrieve XML data that is not yet reflected to the index.
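The narrowing step can be sketched as follows. The dictionary layout of the structure analysis information (name, parent position, start and end offsets) and the names `find_ranges` and `narrowed_scan` are hypothetical; the sample text and offsets are invented for this illustration and do not reproduce the figures.

```python
def find_ranges(entries, structure_condition):
    """Return (start, end) ranges of elements whose ancestor chain ends with the condition path."""
    wanted = structure_condition.split("/")
    ranges = []
    for entry in entries:
        node, matched = entry, True
        # Compare names from the last path step backwards while walking up the parents.
        for name in reversed(wanted):
            if node is None or node["name"] != name:
                matched = False
                break
            node = entries[node["parent"]] if node["parent"] >= 0 else None
        if matched:
            ranges.append((entry["start"], entry["end"]))
    return ranges


def narrowed_scan(xml_text, entries, structure_condition, keyword):
    """Text-scan only the ranges selected by the structure condition."""
    return any(keyword in xml_text[s:e]
               for s, e in find_ranges(entries, structure_condition))


xml = "<bibliography><author>Smith</author><title>Smith on XML</title></bibliography>"
entries = [
    {"name": "bibliography", "parent": -1, "start": 14, "end": 63},
    {"name": "author", "parent": 0, "start": 22, "end": 27},
    {"name": "title", "parent": 0, "start": 43, "end": 55},
]
```

Here a request for “bibliography/author” scans only the five characters between offsets 22 and 27, so the occurrence of “XML” in the title is never examined for that request.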
Details of the XML data registration processing will now be described with reference to
First, the input processing part 220 in the database management system 10 shown in
Subsequently, the index management part 211 calls the index registration processing part 212. And the index registration processing part 212 stores the XML data in the table 62 in the database 60 specified at S501, and determines a data identifier of the XML data (S510).
Subsequently, the index registration processing part 212 analyzes a structure of XML data that is the object of the registration request, and generates structure analysis information (see
The index management part 211 calls the structure analysis information management part 217. The structure analysis information management part 217 stores the structure analysis information generated at S511 in the structure analysis information storage area 40 (S512).
Subsequently, the index registration processing part 212 calculates the number of structures contained in the structure analysis information generated at S511 (S513), and makes a decision whether the number of structures thus calculated is greater than a threshold (S514).
When the number of structures contained in the structure analysis information is greater than the threshold (yes at S514), the structure analysis information management part 217 registers the data identifier of the XML data on which the structure analysis information is based and access information to the structure analysis information in the unreflected data management information 39 (S515). Here, the index registration processing part 212 does not update the index 66.
On the other hand, when the number of structures contained in the structure analysis information is equal to or less than the threshold (no at S514), the index registration processing part 212 updates the index 66 by utilizing the structure analysis information (S516). In other words, the index registration processing part 212 reflects the structure analysis information to the index 66. Thereafter, the structure analysis information management part 217 deletes the entry of the structure analysis information that has already been reflected to the index, from the unreflected data management information 39. Furthermore, it is desirable that the structure analysis information management part 217 deletes the structure analysis information that has already been reflected to the index, from the structure analysis information storage area 40. By doing so, the storage area of the main storage 203 can be utilized effectively.
In this way, the index registration processing part 212 registers the XML data in the database 60. With respect to XML data for which the number of structures is small and it is presumed that a long time is not taken to update the index, the index registration processing part 212 conducts index update based upon XML data. On the other hand, with respect to XML data for which the number of structures is large and it is presumed that a long time is taken to update the index, the index registration processing part 212 retains the structure analysis information intact in the main storage 203 (processing heretofore described is referred to as fast registration processing).
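The branch at S514 can be summarized in a short sketch. The `Store` class, the `register` function and the simplified index (element name to set of data identifiers) are stand-ins assumed for illustration; the structure analysis itself is passed in as a list of element names rather than computed.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    table: dict = field(default_factory=dict)          # data id -> XML text (stand-in for table 62)
    index: dict = field(default_factory=dict)          # element name -> data ids (stand-in for index 66)
    analysis_area: dict = field(default_factory=dict)  # data id -> structure analysis information (area 40)
    unreflected: set = field(default_factory=set)      # data ids not yet reflected (information 39)


def register(store, xml_text, structure_names, threshold):
    """Fast registration: defer the index update when the number of structures is large."""
    data_id = len(store.table) + 1                     # S510: store data, determine identifier
    store.table[data_id] = xml_text
    store.analysis_area[data_id] = structure_names     # S511-S512: keep analysis information
    if len(structure_names) > threshold:               # S513-S514: compare with threshold
        store.unreflected.add(data_id)                 # S515: record as unreflected, no index update
        return data_id, False
    for name in structure_names:                       # S516: reflect to the index
        store.index.setdefault(name, set()).add(data_id)
    del store.analysis_area[data_id]                   # free the area once reflected
    return data_id, True


store = Store()
small = register(store, "<a><b/></a>", ["a", "b"], threshold=3)
large = register(store, "<r><a/><b/><c/><d/></r>", ["r", "a", "b", "c", "d"], threshold=3)
```

In this sketch the second document is stored and immediately retrievable through its retained structure analysis information, while the cost of updating the index for it is postponed.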
Upon accepting an XML data retrieval request, the database management system 10 retrieves the index 66 with respect to XML data that is already reflected to the index. On the other hand, with respect to XML data that is not yet reflected to the index, retrieval is conducted by using structure analysis information in the structure analysis information storage area 40 and the XML data read out onto the database buffer 44. By doing so, the database management system 10 can retrieve the XML data fast without increasing the registration time of structured data. Details of the retrieval processing at this time will be described later with reference to
The index registration processing part 212 decides whether to conduct index update on the basis of the number of structures in the structure analysis information. However, this is not restrictive. For example, the index registration processing part 212 may decide whether to conduct index update on the basis of the number of structures and the data size of the XML data on which the structure analysis information is based. The index registration processing part 212 may estimate the time (registration processing time) taken to reflect the XML data to the index 66 on the basis of the data size and the number of structures of the XML data, and decide whether to conduct the index update on the basis of the registration processing time. In this case, the threshold used at S514 in
Retrieval processing of XML data will now be described with reference to
First, the database management system 10 shown in
First, processing (index retrieval processing) ranging from S600 to S602 will now be described.
The database access control part 210 calls the index management part 211, and the index management part 211 calls the index retrieval processing part 214. The index retrieval processing part 214 generates a list of results of XML data that meet the retrieval condition indicated in the retrieval request by utilizing the index 66 (S600). For example, the index retrieval processing part 214 retrieves the index 66 and generates a list of XML data satisfying the structure condition and character string condition indicated in the retrieval condition or information such as the document number and character location of the XML data.
Subsequently, the index retrieval processing part 214 transmits data of the result list of the XML data to the application program 222 in the terminal device 205 which is the transmission source of the retrieval request, via the output processing part 230 (S601).
Upon transmitting all data of the result list generated at S600 to the application program 222 in the terminal device 205 (yes at S602), the index retrieval processing part 214 terminates the processing. On the other hand, if transmission of all data of the result list to the application program 222 in the terminal device 205 has not been completed, then the index retrieval processing part 214 returns to S601.
The processing ranging from S610 to S616 (index-unreflected data retrieval processing) will now be described.
In the same way as the above-described index retrieval processing, the database access control part 210 calls the index management part 211, and the index management part 211 calls the index retrieval processing part 214. And the data management part 216 reads out XML data associated with the data identifier registered in the unreflected data management information 39 from the database 60 onto the database buffer 44 (S610).
Subsequently, the index retrieval processing part 214 acquires one entry of the unreflected data management information 39 (S611). And the index retrieval processing part 214 refers to access information to structure analysis information (see numeral 302 in
The index retrieval processing part 214 makes a decision whether there is a structure specified by an inquiry (a structure specified in the retrieval request) in structure analysis information associated with this entry (structure analysis information that is the processing object) (S612). For example, when “bibliography/author” is specified, as the structure condition in the retrieval request, the index retrieval processing part 214 makes a decision whether there is this structure in the structure analysis information.
If the structure specified in the retrieval request exists in structure analysis information to be processed (yes at S612), the index retrieval processing part 214 refers to this structure analysis information and acquires data of the structure specified in the retrieval request from the XML data stored in the database buffer 44 (S613). On the other hand, if the structure specified in the retrieval request does not exist in the structure analysis information (no at S612), the index retrieval processing part 214 proceeds to S616.
This will be described with reference to the example shown in
And the index retrieval processing part 214 makes a decision whether data acquired at S613 satisfies the character string condition specified in the retrieval request (S614). For example, the index retrieval processing part 214 retrieves a character string specified in the retrieval request from data acquired at S613 and makes a decision whether the character string exists in the data acquired at S613.
If the data acquired at S613 satisfies the character string condition specified in the retrieval request (yes at S614), then the index retrieval processing part 214 transmits a result of the retrieval to the application program 222 in the terminal device 205 via the output processing part 230 (S615). On the other hand, if the data acquired at S613 does not satisfy the character string condition specified in the retrieval request (no at S614), then the index retrieval processing part 214 proceeds to S616.
The index retrieval processing part 214 makes a decision whether the processing ranging from S611 to S615 has been executed on all entries registered in the unreflected data management information 39 (S616). If there is an entry for which the processing ranging from S611 to S615 has not yet been executed (no at S616), then the index retrieval processing part 214 returns to S611. If the processing ranging from S611 to S615 has been executed on all entries registered in the unreflected data management information 39 (yes at S616), the index-unreflected data retrieval processing is terminated.
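The loop ranging from S611 to S616 can be illustrated by the following minimal sketch. It is not the implementation of the embodiment. The names `UnreflectedEntry` and `scan_unreflected`, and the representation of structure analysis information as a mapping from a structure path to start and end offsets, are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class UnreflectedEntry:
    """One entry of the unreflected data management information (hypothetical layout)."""
    data_id: str
    # Structure analysis information: structure path -> list of (start, end) offsets.
    structures: Dict[str, List[Tuple[int, int]]]


def scan_unreflected(entries: List[UnreflectedEntry],
                     buffer: Dict[str, str],
                     structure: str,
                     needle: str) -> List[str]:
    """Return data IDs whose specified structure contains the character string."""
    hits = []
    for entry in entries:                          # S611: take one entry
        spans = entry.structures.get(structure)    # S612: is the structure present?
        if not spans:
            continue                               # no at S612: go on to S616
        text = buffer[entry.data_id]               # XML data read out at S610
        for start, end in spans:                   # S613: data of the structure
            if needle in text[start:end]:          # S614: character string condition
                hits.append(entry.data_id)         # S615: report the result
                break
    return hits                                    # yes at S616: all entries processed
```

Because the structure analysis information already records where each structure lies, the text scan is limited to the specified structure and the XML data need not be parsed again at retrieval time.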
If both the processing ranging from S600 to S602 (the index retrieval processing) and the processing ranging from S610 to S616 (the index-unreflected data retrieval processing) have been terminated, then the index management part 211 terminates the processing conducted by the index retrieval processing part 214.
In this way, the database management system 10 retrieves data satisfying the structure condition and the character string condition indicated in the retrieval request from XML data stored in the database 60.
In the foregoing description, the database management system 10 conducts the index retrieval processing and the index-unreflected data retrieval processing in parallel. However, this is not restrictive. For example, the database management system 10 may first conduct the index-unreflected data retrieval processing and then conduct the index retrieval processing, or vice versa.
A second embodiment of the present invention will now be described.
A database management system 10A according to the second embodiment has a feature that it decides whether to conduct index update of the XML data on the basis of a registration upper limit time transmitted from the application program 221. The registration upper limit time is an upper limit on the time required to reflect the XML data to the index 66, i.e., an upper limit on the registration processing time.
As shown in
The registration upper limit time storage area 48 is an area for storing the registration upper limit time transmitted from the application program 221.
The registration upper limit time acceptance part 218 accepts input of the registration upper limit time transmitted from the application program 221. The registration upper limit time acceptance part 218 stores the registration upper limit time thus accepted in the registration upper limit time storage area 48.
The registration processing time prediction part 219 predicts time (registration processing time) required to reflect the XML data transmitted from the application program 221 to the index 66, on the basis of the XML data. The registration processing time in the present embodiment refers to the time from when the database management system 10A accepts input of the XML data until the index update based on the XML data is completed.
Furthermore, the index registration processing part 212A compares the predicted registration processing time with the registration upper limit time stored in the registration upper limit time storage area 48. If the predicted registration processing time does not exceed the registration upper limit time, the index registration processing part 212A reflects the XML data to the index 66. In other words, the index registration processing part 212A reflects XML data that can be reflected to the index 66 in a comparatively short time, to the index 66 immediately.
On the other hand, if the predicted registration processing time exceeds the registration upper limit time, the index registration processing part 212A does not reflect the index of the XML data to the index 66. And the structure analysis information management part 217 stores the structure analysis information of the XML data in the structure analysis information storage area 40, and registers information concerning the structure analysis information in the unreflected data management information 39.
XML data registration processing according to the second embodiment will now be described with reference to
First, the input processing part 220A in the database management system 10A shown in
Furthermore, the input processing part 220A accepts input of the registration upper limit time from the application program 221 by using the registration upper limit time acceptance part 218, and stores the registration upper limit time in the registration upper limit time storage area 48 (S801). By the way, the XML data registration request at S500 and the registration upper limit time at S801 may be input simultaneously, or it is also possible to conduct S801 in advance and then conduct S500.
In the same way as S501 in
Since S511 and S512 in
The registration processing time prediction part 219 predicts the registration processing time of the index of the XML data (S810). Prediction of the registration processing time at this time is conducted on the basis of the number of structures of XML data (for example, the number of tags) and the data size.
Thereafter, the index registration processing part 212A makes a decision whether the registration processing time predicted at S810 exceeds the registration upper limit time (S812). If the registration processing time predicted at S810 exceeds the registration upper limit time (yes at S812), the index registration processing part 212A proceeds to S515. On the other hand, if the predicted registration processing time is equal to or less than the registration upper limit time (no at S812), the index registration processing part 212A proceeds to S516. Since S515 and S516 in
According to the database management system 10A, the threshold used in the decision whether to update the index of the XML data can be set to an arbitrary value. Therefore, the database management system 10A can change the threshold according to various system requirements, resulting in great convenience.
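The prediction at S810 and the decision at S812 can be sketched as follows. The embodiment states only that the prediction is based on the number of structures and the data size. The linear cost model, its coefficients, and the function names below are assumptions for illustration and are not taken from the embodiment.

```python
def predict_registration_time(num_structures: int,
                              data_size_bytes: int,
                              per_structure_s: float = 0.0005,
                              per_byte_s: float = 0.000001) -> float:
    """S810: hypothetical linear cost model; the coefficients are placeholders."""
    return num_structures * per_structure_s + data_size_bytes * per_byte_s


def should_defer_index_update(num_structures: int,
                              data_size_bytes: int,
                              upper_limit_s: float) -> bool:
    """S812: True means the index update is skipped and the XML data is
    recorded as index-unreflected (S515); False means the index is updated (S516)."""
    predicted = predict_registration_time(num_structures, data_size_bytes)
    return predicted > upper_limit_s
```

A predicted time equal to the registration upper limit time does not defer the update, which matches the "equal to or less than" branch at S812.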
The database management system 10A accepts input of the registration upper limit time from the application program 221. Alternatively, the database management system 10A may accept input of upper limit values of the number of structures and the data size of XML data. In other words, at S812 in
A third embodiment of the present invention will now be described with reference to
A database management system 10B according to the third embodiment has a feature that even XML data whose registration processing time exceeds the registration upper limit time is partially reflected to the index 66. In other words, for XML data whose data size or number of structures is comparatively large and whose registration processing time exceeds the registration upper limit time, the database management system 10B conducts index update as far as possible within the registration upper limit time.
Structure analysis information processed by the database management system 10B will now be described with reference to
As shown in
In other words, it is indicated in
In this way, the database management system 10B reflects structure analysis information to the index 66 even partially.
Referring back to
The registration processing time measurement part 223 measures time (registration processing time) elapsed since the database management system 10B accepts the input of the XML data to be registered. The index registration processing part 212B updates the index 66 on the basis of structure analysis information generated by using the XML data, in a range in which the registration processing time measured by the registration processing time measurement part 223 is within the registration upper limit time. In other words, the index registration processing part 212B starts reflection of the structure analysis information to the index 66, and stops the reflection of the structure analysis information to the index 66 when the registration upper limit time has elapsed.
XML data registration processing in the third embodiment will now be described with reference to
Processing conducted since an input of an XML data registration request is accepted from the application program 221 in the terminal device 204 until the database access control part 210 calls the index management part 211 is the same as the processing procedure shown in
If the database access control part 210 is called, the index registration processing part 212B starts the registration processing time measurement part 223 and starts measurement of the registration processing time (S1010). Since subsequent S511 and S512 are the same as S511 and S512 in
After S512, the index registration processing part 212B reads out structure analysis information of the XML data to be registered, from a structure analysis information storage area 40B. If one unprocessed structure is taken out from structures (structure elements) of the structure analysis information (yes at S1011), the index registration processing part 212B updates the index 66 on the basis of a structure name and location information which are set in the structure thus taken out (S1012). In other words, the index registration processing part 212B reflects information which is set in this structure to the index 66.
And the structure analysis information management part 217B sets “1” in the index update completion flag of a structure included in structure analysis information and subjected to update of the index 66 at S1012 (S1013).
For example, the index registration processing part 212B reflects information of the structure name “book,” a start location “4” and an end location “1840” included in structure analysis information exemplified in
The index registration processing part 212B makes a decision whether the registration processing time measured by the registration processing time measurement part 223 exceeds the registration upper limit time (S1014). If the measured registration processing time does not yet exceed the registration upper limit time (no at S1014), the index registration processing part 212B returns to S1011. In other words, the index registration processing part 212B checks whether the registration upper limit time is exceeded each time one structure element in the structure analysis information is reflected to the index 66.
On the other hand, if the registration processing time exceeds the registration upper limit time (yes at S1014), the structure analysis information management part 217B registers the data identifier of the XML data on which the structure analysis information is based and access information to the structure analysis information in the unreflected data management information 39 in the same way as S515 in
If an unprocessed structure cannot be taken out from the structure analysis information (no at S1011), i.e., processing on all structures of the structure analysis information has been finished within the registration upper limit time, then the index registration processing part 212B terminates the processing as it is.
By doing so, the database management system 10B can conduct the index update processing within the registration upper limit time even if prediction of the registration processing time of the XML data is difficult. Furthermore, the database management system 10B conducts index update partially even with respect to XML data that is comparatively large in data size or the number of structures. In other words, XML data that is comparatively large in data size or the number of structures is prevented from being left entirely unregistered in the index. Therefore, more information is registered in the index 66. As a result, the database management system 10B can retrieve XML data fast.
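The time-bounded update ranging from S1010 to S1014 can be sketched as follows. This is an illustration under stated assumptions and not the implementation of the embodiment: the `Structure` record, the dictionary used as the index, and the injectable `clock` parameter (added so the behavior can be checked deterministically) are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Structure:
    """One structure element of the structure analysis information (hypothetical)."""
    name: str
    start: int
    end: int
    index_updated: int = 0   # index update completion flag: 1 once reflected


def register_within_limit(structures: List[Structure],
                          index: Dict[str, List[Tuple[int, int]]],
                          upper_limit_s: float,
                          clock: Callable[[], float] = time.monotonic) -> bool:
    """Reflect structures to the index until the upper limit time is exceeded.

    Returns True if all structures were processed within the limit. Returns
    False if the limit was exceeded, in which case the caller would register
    the data in the unreflected data management information.
    """
    started = clock()                                   # S1010: start measurement
    for s in structures:                                # S1011: next unprocessed structure
        if s.index_updated:
            continue
        index.setdefault(s.name, []).append((s.start, s.end))   # S1012: update index
        s.index_updated = 1                             # S1013: set completion flag
        if clock() - started > upper_limit_s:           # S1014: limit exceeded?
            return False
    return True                                         # no at S1011: nothing left
```

Because the completion flag is kept per structure, a later reflection pass can skip the structures already in the index and continue with the rest.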
In the third embodiment, measurement of the registration processing time is started at the input timing of XML data. However, this is not restrictive. For example, the measurement may be started when reflection of the structures of the structure analysis information to the index 66 begins, after the structure analysis information of the XML data has been generated.
In the systems according to the first to third embodiments, XML data that exceeds a predetermined threshold in the number of structures or registration processing time is not reflected to the index 66, but remains in the database 60. The database management system 10 may reflect such XML data to the index 66 at timing different from when accepting the registration request of the XML data (for example, when accepting a command input separately). A processing procedure of the database management system in this case will now be described as fourth to sixth embodiments.
A fourth embodiment of the present invention will now be described.
A database management system 10C according to the fourth embodiment has the following feature. Upon accepting a command input from a management program 270 in the terminal device 204 or a management program 271 in the terminal device 205, the database management system 10C reflects index-unreflected XML data stored in the database 60 to the index 66 by taking the command input acceptance as a trigger.
As shown in
An input processing part 220C in the database management system 10C includes a command acceptance part 240 which accepts the command input transmitted from the management program 270 or 271.
An index registration processing part 212C includes an index reflection processing part 250 which reflects index-unreflected structure analysis information to the index 66 on the basis of the command input passed on by the command acceptance part 240. A reflection document selection part 260 surrounded by a dotted line will be described later with reference to the fifth embodiment.
Details of the XML data registration processing in the fourth embodiment will now be described with reference to
The command acceptance part 240 in the database management system 10C shown in
The database access control part 210 reflects XML data registered in the unreflected data management information 39 (index-unreflected XML data) to the index 66 by using the index registration processing part 212C in the index management part 211 (S1202). In other words, the database access control part 210 reflects XML data associated with data identifiers that are registered in the unreflected data management information 39 to the index 66.
Processing of reflection to the index 66 conducted at this time will now be described in detail with reference to
First, the index reflection processing part 250 shown in
Subsequently, the index reflection processing part 250 takes out one entry of list information. And the index reflection processing part 250 requests the data management part 216 to read out XML data associated with a data identifier indicated in this information. The data management part 216 reads out the XML data from the table 62 (S1211).
The index registration processing part 212C reflects the XML data thus read out to the index 66 (S1212).
Thereafter, the structure analysis information management part 217 deletes the entry of structure analysis information concerning XML data already reflected to the index, from the unreflected data management information 39 (S1213). Furthermore, the structure analysis information management part 217 deletes structure analysis information concerning XML data already reflected to the index, from the structure analysis information storage area 40 as well.
The index reflection processing part 250 makes a decision whether unprocessed information still remains in the list (S1214). If unprocessed information still remains (yes at S1214), the index reflection processing part 250 returns to S1211. On the other hand, if unprocessed information does not remain (no at S1214), the processing is terminated.
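The reflection loop ranging from S1210 to S1214 can be sketched as follows. The sketch is illustrative only. The function names, the list and dictionaries standing in for the unreflected data management information, the structure analysis information storage area and the table 62, and the 2-gram builder `bigrams` are all assumptions and do not describe how the index 66 is actually built.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def bigrams(text: str) -> Iterable[Tuple[str, int]]:
    """Yield (2-gram, position) pairs; a stand-in for the real index builder."""
    for i in range(len(text) - 1):
        yield text[i:i + 2], i


def reflect_unreflected(unreflected: List[str],
                        analysis_store: Dict[str, dict],
                        table: Dict[str, str],
                        index: Dict[str, List[Tuple[str, int]]],
                        build_postings: Callable[[str], Iterable[Tuple[str, int]]] = bigrams) -> int:
    """Reflect every index-unreflected document to the index; return the count."""
    pending = list(unreflected)                   # S1210: list the unreflected data IDs
    for data_id in pending:                       # S1214: repeat while entries remain
        xml = table[data_id]                      # S1211: read out the XML data
        for key, pos in build_postings(xml):      # S1212: reflect to the index
            index.setdefault(key, []).append((data_id, pos))
        unreflected.remove(data_id)               # S1213: delete the management entry
        analysis_store.pop(data_id, None)         # and the structure analysis information
    return len(pending)
```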
By doing so, the database management system 10C can reflect index-unreflected XML data to the index 66.
In the above-described fourth embodiment, the database management system 10C reflects all index-unreflected XML data to the index 66. However, this is not restrictive. For example, the database management system 10C may select predetermined XML data from among index-unreflected XML data and reflect the predetermined XML data to the index 66. The embodiment at this time will be described as a fifth embodiment.
In succession, a fifth embodiment of the present invention will be described with reference to
A database management system 10D according to the fifth embodiment has a feature that it accepts a selection input of XML data to be subject to index reflection from the management program 270 or 271.
As shown in
The reflection document selection part 260 accepts a selection input of XML data to be subject to index reflection from the management program 270 or 271. The index reflection processing part 250 recognizes XML data which is contained in a list of index-unreflected XML data and for which selection input is accepted by the reflection document selection part 260 as the object of index reflection. In other words, the index reflection processing part 250 lists all index-unreflected XML data. However, the index reflection processing part 250 deletes XML data that have not been selected by the management programs 270 and 271 respectively in the terminal devices 204 and 205 from the list as non-objects of the index reflection.
Registration processing of XML data in the fifth embodiment will now be described with reference to
The procedure followed since the command acceptance part 240 shown in
First, the reflection document selection part 260 transmits a list generated by the index reflection processing part 250 at S1210 to the management program 270 in the terminal device 204, and waits for a reply from the management program 270 (S1510).
Upon receiving the list transmitted by the reflection document selection part 260, the management program 270 causes an output device (not illustrated) in the terminal device 204 to display a selection input screen of XML data to be subject to index reflection. A screen example at this time will be described later with reference to
Upon receiving a reply from the management program 270 in the terminal device 204, the reflection document selection part 260 outputs the reply to the index reflection processing part 250. The index reflection processing part 250 updates the list generated at S1210 on the basis of the reply thus output (S1520). In other words, upon receiving selection information of XML data to be subject to index reflection from the reflection document selection part 260, the index reflection processing part 250 leaves XML data indicated by the selection information in the list, and deletes other XML data from the list.
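The list update at S1520 amounts to filtering the list of index-unreflected XML data by the reply. A minimal sketch, in which the function name and the use of plain data identifiers are assumptions for illustration, is as follows.

```python
from typing import Iterable, List


def apply_selection(pending_ids: List[str], selected_ids: Iterable[str]) -> List[str]:
    """S1520: keep only the selected data IDs, preserving the order of the list."""
    chosen = set(selected_ids)
    return [data_id for data_id in pending_ids if data_id in chosen]
```

Identifiers in the reply that are not in the list generated at S1210 are ignored, so a stale reply cannot add XML data that has already been reflected.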
Since subsequent processing ranging from S1211 to S1214 is the same as the processing ranging from S1211 to S1214 shown in
By doing so, the database management system 10D can designate XML data selected by the terminal device 204 as the object of index reflection. For example, in the case where there are a large number of index-unreflected XML data in the database 60, a system manager or the like can select XML data to be preferentially reflected to the index 66, resulting in great convenience.
A selection input screen of XML data that are objects of index reflection displayed by the management program 270 on the basis of the list transmitted by the reflection document selection part 260 will now be described with reference to
The selection input screen of XML data that are objects of index reflection has, for example, a configuration including a selection input column for specifying whether to set index reflection on XML data and a structure analysis information display column for each data ID (data identifier) of XML data as shown in
The system manager performs selection input of XML data that should become objects of index reflection via an input device in the terminal device 204 while watching the screen, and performs selection input of an execution button. The management program 270 transmits information selected on the screen to the database management system 10D via the information network 206.
Data IDs and structure analysis information of XML data that are index reflection objects are displayed on the screen. However, this is not restrictive. For example, a part or the whole of the XML data or the data size of the XML data may be displayed. By conducting such display, it becomes easier for the system manager or the like to select XML data as the objects of index reflection.
A sixth embodiment of the present invention will now be described.
A database management system 10E according to the sixth embodiment records retrieval history of XML data that are not yet reflected to the index. When displaying the selection input screen of XML data which should become objects of index reflection, the management program 270 in the terminal device 204 displays a screen obtained by sorting the XML data on the basis of the retrieval history, or displays the retrieval history itself of the XML data on the screen. The database management system 10E according to the sixth embodiment has such a feature.
The database management system 10E includes a reflection document selection part 260E instead of the reflection document selection part 260 (see
An index retrieval processing part 214E includes a retrieval history recording part 215. The retrieval history recording part 215 records retrieval history of unreflected XML data in an unreflected data management information 39E.
The unreflected data management information 39E contains retrieval history of the structure analysis information, besides a data identifier of XML data that is not yet reflected to the index and access information to structure analysis information generated from the XML data.
Among them, the total number of times of retrieval indicates the number of times of retrieval of XML data that is a processing object. The value of the total number of times of retrieval is incremented regardless of whether the XML data satisfies a condition specified in the retrieval request. The number of times of structure meeting indicates the number of times a structure specified in the retrieval request exists in the XML data. The number of times of condition meeting indicates the number of times a structure specified in the retrieval request exists in the XML data and a condition specified in the retrieval request (for example, a character string condition) is met.
In the unreflected data management information 39E shown in
The retrieval history (the total number of times of retrieval, the number of times of structure meeting, and the number of times of condition meeting) in the unreflected data management information 39E is written by the retrieval history recording part 215 each time the index retrieval processing part 214E executes retrieval. By the way, the retrieval history is referred to when the reflection document selection part 260E displays a selection input screen of XML data that are index reflection objects.
A retrieval history recording procedure of XML data in the sixth embodiment will now be described with reference to
Processing conducted at S620, S600 to S602 and S610 to S612 in
If the index retrieval processing part 214E shown in
After S1801, the index retrieval processing part 214E acquires data having a structure specified in the retrieval request from XML data stored in the database buffer 44 in the same way as S613 in
After S1802, the index retrieval processing part 214E transmits a result of the retrieval to the application program 222 in the terminal device 205 in the same way as S615 in
Since processing conducted at subsequent S616 is the same as the processing conducted at S616 in
In this way, the retrieval history recording part 215 records the retrieval history of XML data in the unreflected data management information 39E.
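The three counters of the retrieval history can be sketched as follows. The field and function names are assumptions for illustration. Only the counting rules (the total is incremented on every retrieval, the structure count only when the specified structure exists, and the condition count only when the structure exists and the condition is met) follow the description above.

```python
from dataclasses import dataclass


@dataclass
class RetrievalHistory:
    """Retrieval history of one index-unreflected document (hypothetical layout)."""
    total_retrievals: int = 0   # incremented on every retrieval of the document
    structure_hits: int = 0     # the specified structure exists in the document
    condition_hits: int = 0     # the structure exists and the condition is met


def record_retrieval(history: RetrievalHistory,
                     structure_found: bool,
                     condition_met: bool) -> RetrievalHistory:
    """Update the counters after one retrieval of the document."""
    history.total_retrievals += 1
    if structure_found:
        history.structure_hits += 1
        if condition_met:
            history.condition_hits += 1
    return history
```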
Registration processing of XML data using such retrieval history will now be described.
In the same way as S1210 in
The reflection document selection part 260E transmits a list obtained by data sorting at S1910 to the management program 270 in the terminal device 204, and waits for a reply from the management program 270 (S1510). Since processing conducted at S1520 to S1214 after S1510 is the same as the processing conducted at S1520 to S1214 in
Upon receiving the list transmitted by the reflection document selection part 260E at S1510, the management program 270 causes an output device (not illustrated) in the terminal device 204 to display the selection input screen of XML data to be subject to index reflection. The screen at this time is exemplified in
As exemplified in
The database management system 10E causes the management program 270 to display a screen including the retrieval history of XML data or a screen obtained by sorting XML data on the basis of the retrieval history. As a result, it becomes easier for the system manager to find the XML data that should be reflected to the index with higher priority.
When sorting the list data at S1910, the index reflection processing part 250 may conduct the sorting on the basis of data size, the number of structures and the registration date of the XML data. After the database management system 10E has conducted character string retrieval on XML data, the index reflection processing part 250 may conduct the sorting on the basis of whether there is data that needs postprocessing or the number of times of appearance of the character string in XML data.
By doing so, it becomes easy for the system manager or the like to select XML data that are objects of index reflection.
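Sorting of the list on the basis of the retrieval history can be sketched as follows. The sort key used here (the number of times of condition meeting first, then the number of times of structure meeting, then the total number of times of retrieval, all in descending order) is an assumption chosen for illustration, as are the dictionary keys. Other keys such as data size or registration date could be substituted.

```python
from typing import Dict, List


def sort_for_selection(entries: List[Dict]) -> List[Dict]:
    """Return the entries ordered so that frequently matched documents come first."""
    return sorted(
        entries,
        key=lambda e: (e["condition_hits"], e["structure_hits"], e["total"]),
        reverse=True,
    )
```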
The reflection of XML data to the index is supposed to be conducted when there is command input from the terminal device 204 or the like. However, the reflection of XML data to the index may be conducted automatically. In other words, when predetermined time is reached or a predetermined number of XML data are stored, the database management system 10 or 10A-10E may reflect the XML data to the index 66 automatically.
When predetermined setting input is conducted, the database management system 10 or 10A-10E may conduct index update for all XML data regardless of the processing cost or the like of the XML data. In other words, it is also possible to change over according to setting input whether the database management system 10 or 10A-10E should conduct fast registration processing as described above or should conduct index update on all input XML data.
As for such changeover setting input, a setting processing part (not illustrated) in the database management system 10 or 10A-10E accepts it and records it in the database 60 as setting information. And the database management system 10 or 10A-10E decides which method should be used to conduct index reflection, on the basis of the setting information.
By the way, the setting information may contain various kinds of information concerning the index update. For example, the setting information may contain information such as the size of the database buffer 44, the registration upper limit time in the fast registration processing, or a rule to be used when reflecting XML data to the index 66.
Information input from the setting screen is transmitted to the database management system 10 or 10A-10E by the management program 270 or the like. The setting processing part in the database management system 10 or 10A-10E reflects the transmitted information to the setting information.
The setting screen may also accept selection input of an algorithm (priority determination algorithm) to be used for each rule.
For example, in the setting screen example shown in
In the setting screen exemplified in
Index update that meets the system requirement of the present system can be conducted by setting whether to conduct fast registration and setting various conditions in conducting the fast registration on the setting screen.
The present invention is not restricted to the embodiments, but modification is possible.
For example, in the third embodiment, the database management system 10B makes a decision whether the registration processing time exceeds the registration upper limit time each time the database management system 10B reflects one structure contained in structure analysis information to the index 66. However, this is not restrictive.
For example, in the case where structures contained in structure analysis information are divided into some groups and index reflection is conducted for each of groups, the database management system may make a decision whether the registration processing time exceeds the registration upper limit time each time reflection of one group to the index 66 is completed.
In addition, in structure analysis information, structures (nodes) are connected to each other by a branch (link) which indicates that those nodes are in an adjacent relation as exemplified in
If the write speed of the disk device 207 is slow, the database management system 10B may update the index 66 as described hereafter. For example, when updating data in the index 66 stored in the disk device 207, the database management system 10B reads out data in the index 66 onto the main storage 203 and updates the index 66 on the main storage 203. The database management system 10B then shifts the updated index 66 to the disk device 207. Each time I/O (Input/Output) processing is conducted to shift the updated index 66 to the disk device 207, the database management system 10B may make a decision whether the registration upper limit time is exceeded. In other words, the database management system 10B updates the index 66 on the main storage 203, and then shifts the updated index 66 on the main storage 203 to the disk device 207 until the registration upper limit time is exceeded.
By the way, if all of the updated index 66 on the main storage 203 cannot be shifted to the disk device 207, updated index 66 remains on the main storage 203. If in this state it becomes necessary to update the index 66, the index 66 on the main storage 203 is updated. The index 66 can be updated by using such a method as well.
The embodiments have been described by taking the case where the retrieval request of XML data contains a character string condition of XML data that are the retrieval objects as an example. However, this is not restrictive. For example, a condition other than the character string condition such as registration date of XML data that are the retrieval objects may be contained.
In the embodiments, the registration processing and the retrieval processing of XML data are conducted by the same computer 201. However, this is not restrictive. For example, the registration processing of XML data and the update of the index 66, and the retrieval of XML data may be executed by different computers.
The database management system 10 or 10A-10E according to one of the embodiments can be implemented by using a program that causes the above-described processing to be executed. The program can be provided by storing it on a computer-readable storage medium (such as a CD-ROM). It is also possible to provide the program via a network such as the Internet.
It should be further understood by those skilled in the art that although the foregoing description has been made on embodiments of the invention, the invention is not limited thereto and various changes and modifications may be made without departing from the spirit of the invention and the scope of the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
2007-009371 | Jan 2007 | JP | national |