The present invention generally relates to a system and method for technical document analysis, and to a patent analysis system.
Contemporary technical documents are scattered among different databases around the world, with each database storing different technical documents, such as patents and research papers. A research and development team usually needs to search and read a large number of technical documents in the initial phase of technology development. How to effectively integrate the current resources in these databases and provide effective analysis of the contents and trends in the technical documents remains an important issue.
The disclosed exemplary embodiments of the present invention may provide a system and method for technical document analysis, which utilize internal database technology (typically, a relationship database) to integrate the technical documents from a plurality of original databases into a technical document network so as to achieve the indexing and analysis of technical documents.
In an exemplary embodiment, the disclosure is directed to a system for technical document analysis. The system comprises an internal database (preferably, a relationship database) and a technical document analysis module. According to at least one index condition from the user's input and through at least one original database link, the technical document analysis module may fetch the original data from the original database and analyze the original data. The part of the original data having regularity and a preliminary index acts as primary identifiers. According to the relationship between the other part of the original data and the primary identifiers, the original data are converted into a plurality of sub-data. After being compared with the contents of the internal database, the plurality of sub-data may be stored in the internal database, or only their updated portion may be stored in the internal database.
In another exemplary embodiment, the disclosure is directed to a method for technical document analysis. The method comprises: according to at least one input index condition, fetching a plurality of original data of technical documents from at least one original database; analyzing the plurality of original data to construct an internal database (preferably, a relationship database), including using the part of the original data having regularity and a preliminary index as primary identifiers; according to the relationship between the other part of the original data and the primary identifiers, converting the original data into a plurality of sub-data; and after comparing the plurality of sub-data with the contents of the internal database, storing the plurality of sub-data, or only their updated portion, in the internal database.
In the exemplary technical document analysis system of the present invention, other modules, such as diagram analysis modules, reading report generation modules, report commenting modules, and authorization management modules, may be added or integrated in addition to the internal database and technical analysis modules. The client side may further integrate or add other modules, such as search module, figure analysis input/output module, reading report input/output module and report commenting platform, and so on to collaborate, display or extend the capabilities generated by the modules within the technical document analysis module so as to enhance the convenience of use at the client end.
The technical document analysis system of the present invention may also provide a management analysis report in a table format. By displaying the various combinations of the X-dimension and Y-dimension of the statistic diagrams in a simple manner, the present invention allows the user to select the data analysis field corresponding to the requested combination of X-dimension and Y-dimension and execute it directly, so as to obtain a two-dimensional or three-dimensional analysis diagram. Therefore, when applied to an analysis system for technical documents, such as patents or research papers, the present invention may solve the problem that the user may be unfamiliar with long and tedious terminology while operating the technical document analysis system, so that patent or technical document analysis may be popularized.
When further combined with a relationship database, the aforementioned exemplary embodiments may change the state of the original data and reorganize it into data groups with corresponding relationships, greatly reducing the system resources consumed when generating analysis diagrams.
The foregoing and other features, aspects and advantages of the present invention will become better understood from a careful reading of a detailed description provided herein below with appropriate reference to the accompanying drawings.
The exemplary disclosed embodiments of the present invention may integrate a plurality of original databases, such as patent databases and thesis/dissertation databases, and so on. The original technical documents are systematically analyzed and re-arranged into a data group of corresponding relevance. A series of analyzable statistic string tables may be generated to construct an internal database for the user to search and index rapidly.
Refer to the exemplary embodiment of the system in
Original data 350a of original database 350 may include patents, research papers or combinations thereof. Internal database 301 may be a module with logical computation capability. The original data of original database 350, after being analyzed by technical document analysis module 302, is re-arranged so that the original many-to-many, one-to-many and one-to-one mappings become data groups with relationships, including one-to-one, one-to-many and many-to-many relationships, or data groups formed by text databases.
Clients 310, 320, 330 may be connected to technical document analysis system 300 in many manners, such as through the Internet, a LAN, a direct link to the host, or internal connections of the host, such as IDE, SATA, PATA, OLE, ODBC, and so on. Technical document analysis module 302 and original database 350 may be connected in similarly many manners. Technical document analysis system 300 may be a server. Original database 350 may be a physical database external to technical document analysis system 300, or a storage device, such as a hard disk, tape or CD, that stores data and is located on the same host or server.
The sub-data compared by technical document analysis module 302 are relationship data consisting of one or more new fields formed by analyzing, processing and then combining a plurality of fields of original data 350a.
Field 501 of attribute A, for example, may be the patent publication number, patent number, patent application number, and so on. Because a patent corresponds to a patent number, a patent publication number or a patent application number, the data in field 501 of attribute A will form a one-to-one relationship with the original data of the patent document if field 501 of attribute A is specified as one of the above numbers. Field 502 of attribute B may be an inventor field. Because a patent may correspond to one or more inventors, field 502 of attribute B forms a one-to-many relationship with the original data of the patent document. Field 503 of attribute C is a text data field, such as independent claims or abstract. Hence, the patent corresponds to the patent independent claims or patent abstract field. Field 504 of attribute D may be patent publication date or patent application date, and so on. Because a plurality of patents may correspond to a patent publication date or a patent application date, the data in field 504 of attribute D forms a many-to-many relationship with the original data of the patent documents. Therefore, if the original data is a patent document, these attribute fields are the fields in the patent document.
If the original data is a research paper, because a research paper corresponds to a research title, first attribute field 501 may be a research title, which has a one-to-one relationship with the research paper. Because a research paper may correspond to one or more authors, second attribute field 502 may be the author field, forming a one-to-many relationship with the research paper. Third attribute field 503 may be a text data field, such as paper abstract or paper content. Because a plurality of research papers may correspond to a publication date, fourth attribute field 504 may be a field of paper publication date and forms a many-to-many relationship with the research paper.
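By way of a non-limiting illustration, the field relationships described above may be sketched as follows. The record layout, the function name `split_into_sub_data` and the choice of the application number as primary identifier are assumptions made for this example only and are not taken from the disclosure.

```python
from collections import defaultdict

# Hypothetical original records; field names are illustrative only.
ORIGINAL_DATA = [
    {"application_number": "A001", "inventors": ["Lin", "Chen"],
     "abstract": "A document analysis system.", "publication_date": "2008-10-01"},
    {"application_number": "A002", "inventors": ["Chen"],
     "abstract": "A report commenting platform.", "publication_date": "2008-10-01"},
]


def split_into_sub_data(records):
    """Re-arrange raw records into sub-data tables keyed by a primary identifier."""
    identifiers = []             # attribute A: one identifier per document
    inventors = []               # attribute B: (identifier, inventor) pairs
    texts = {}                   # attribute C: text field per document
    by_date = defaultdict(list)  # attribute D: documents sharing a date
    for rec in records:
        key = rec["application_number"]  # assumed primary identifier
        identifiers.append(key)
        inventors.extend((key, name) for name in rec["inventors"])
        texts[key] = rec["abstract"]
        by_date[rec["publication_date"]].append(key)
    return {"identifiers": identifiers, "inventors": inventors,
            "texts": texts, "by_date": dict(by_date)}


sub_data = split_into_sub_data(ORIGINAL_DATA)
```

In this sketch each attribute becomes its own small table linked back to the primary identifier, which is one possible reading of the re-arrangement into data groups with relationships.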
Take patent documents as an example to explain analyzed tables 500 in
In original database 350, the fields of different attributes of the initial table are the data addresses stored in the database. The bit lengths of fields in the initial table depend on the operating system (OS) or data source. The minimum data block may be 1 bit, 4 bits, 8 bits, 32 bits, 64 bits, and so on.
In step 603, after the comparison, only the different portion or modified portion may be sent to the internal database.
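The comparison of step 603 may be sketched, for illustration only, as a dictionary difference. The function names `changed_portion` and `store_update`, and the use of in-memory dictionaries in place of internal database 301, are assumptions of this example.

```python
def changed_portion(internal, incoming):
    """Return only the rows of `incoming` that are new or differ from `internal`."""
    return {key: row for key, row in incoming.items()
            if internal.get(key) != row}


def store_update(internal, incoming):
    """Write only the changed rows into the internal store and return them."""
    delta = changed_portion(internal, incoming)
    internal.update(delta)
    return delta


# Hypothetical contents of the internal database and freshly fetched sub-data.
internal_db = {"A001": {"assignee": "Acme"}, "A002": {"assignee": "Beta"}}
fetched = {
    "A001": {"assignee": "Acme"},       # unchanged, not sent
    "A002": {"assignee": "Beta Corp"},  # modified
    "A003": {"assignee": "Gamma"},      # new
}
delta = store_update(internal_db, fetched)
```

Only the rows for `A002` and `A003` are written in this sketch, so the unchanged row for `A001` causes no write.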
When a client issues a search condition request to technical document analysis system 300,
If it is determined that original database 350 is to be fetched, then a connection is made to original database 350 to fetch original data 350a according to the search condition, as shown in step 614.
As shown in
For example, search module 711a may allow the client to input a search condition. Take a patent document search as an example. The search condition may be any of the fields of the patent document, such as patent application number, patent publication number, assignee, nationality of assignee, inventor, nationality of inventor, title, abstract, patent independent claims, international patent classification number, US patent classification number, technical classification number, product classification number, and so on.
Diagram analysis module 706, report commenting module 712 and technical document analysis module 302 may all fetch or analyze the original data in original database 350 so as to enrich the contents of internal database 301.
The following describes the function and operation of diagram analysis module 706, reading report generation module 709, report commenting module 712 and authorization management module 713.
Figure analysis input/output module 711b sends the client's instruction to technical document analysis system 300 so that diagram analysis module 706 and internal database 301 may perform relationship computation and generate analysis report. Figure analysis input/output module 711b allows the client end to input search condition or the fields for analysis, as well as displays the analysis report generated by diagram analysis module 706. Alternatively, based on the user's requirements, figure analysis input/output module 711b may output the analysis result to different formats, such as Excel format, so that the statistics may be directly cited.
In step 902, a connection is made to internal database 301 and an analysis table is selected. The analysis table may be selected according to the search condition.
In step 903, a primary index key table is established according to the analysis table. For example, the application number field of the analysis table may be used as the primary index key to establish the relationship table between the application number field and other related fields. The generated primary index key table may be stored in technical document analysis system 300 or returned to the client end.
In step 904, at least a field is read and parameter translation is performed from the primary index key table and the at least a field. The at least a field may be selected by the client, for example, the inventor versus the assignee relationship. The parameter translation is described as follows. For example, the inventor field of the patent document versus the application number field of the primary index key may be a parameter, and the assignee field versus the application number field of the primary index key may also be a parameter. Take the patent document as an example. The definition of a parameter is not restricted to the field versus the primary index key. The parameter may also be the higher order computed data, such as, data ratio, the number of persons, standard deviation, average, and so on.
In step 905, the relationship computation of each parameter is performed and the figure analysis tables are generated. The generated figure analysis data may also be returned to the client.
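Steps 903 to 905 may be sketched as follows, purely as an illustration. The table contents and the function names `build_primary_index`, `translate_parameter` and `relate` are hypothetical; the sketch assumes that a parameter is represented as a list of (primary key, value) pairs.

```python
from collections import Counter

# Hypothetical analysis table rows as selected in step 902.
ANALYSIS_TABLE = [
    {"application_number": "A001", "assignee": "Acme", "inventors": ["Lin", "Chen"]},
    {"application_number": "A002", "assignee": "Acme", "inventors": ["Chen"]},
    {"application_number": "A003", "assignee": "Beta", "inventors": ["Wu"]},
]


def build_primary_index(table):
    """Step 903: key every row by its application number."""
    return {row["application_number"]: row for row in table}


def translate_parameter(index, field):
    """Step 904: express a field as (primary key, value) pairs."""
    pairs = []
    for key, row in index.items():
        values = row[field] if isinstance(row[field], list) else [row[field]]
        pairs.extend((key, value) for value in values)
    return pairs


def relate(param_x, param_y):
    """Step 905: join two parameters on the primary key and count pairs."""
    y_by_key = {}
    for key, value in param_y:
        y_by_key.setdefault(key, []).append(value)
    return Counter((x, y) for key, x in param_x for y in y_by_key.get(key, []))


index = build_primary_index(ANALYSIS_TABLE)
assignee_vs_inventor = relate(translate_parameter(index, "assignee"),
                              translate_parameter(index, "inventors"))
```

Because both parameters share the primary index key, the join in this sketch touches each (key, value) pair once rather than rescanning the original documents.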
By using the parameter relationship to compute the figure analysis data, the computation time may be reduced and the higher order statistic data may be computed more rapidly. Take the patent document as an example. Not only may statistics on a single field be computed, such as the number of patents by each inventor or the number of assignees, but higher order statistics may also be computed, such as the relationship between each assignee and each inventor, or the relationship between each international patent classification number and the number of patents in each year.
The variable information in the fields, such as assignee, inventor, claims, may be connected to the original database for updating. The number of the patents may utilize the concept of the patent family, such as, treating the patent family of the same parent application as the same patent.
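The patent family counting mentioned above may be illustrated by the following sketch. It assumes, for the sake of the example, that each record carries a hypothetical `parent_application` field pointing directly at the parent application of its family (or `None` for the parent itself); multi-level families would need additional resolution.

```python
def count_by_family(patents):
    """Count patents, treating all members of one family as a single patent."""
    families = {p.get("parent_application") or p["application_number"]
                for p in patents}
    return len(families)


# Hypothetical records; A002 and A003 share parent application A001.
PATENTS = [
    {"application_number": "A001", "parent_application": None},
    {"application_number": "A002", "parent_application": "A001"},
    {"application_number": "A003", "parent_application": "A001"},
    {"application_number": "A004", "parent_application": None},
]
family_count = count_by_family(PATENTS)
```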
For example, by selecting the combination of international patent classification F (IPC F) versus year 4 of analysis field F-4 (marked as 1002), the distribution of the number of patents versus each international patent classification number may be found, and the 3-dimensional XYZ combination analysis figure may be obtained. If the combination of year E versus IPC 5 of analysis field E-5 (marked as 1004) is selected, the distribution of the IPC in each year may be obtained, and the 3-dimensional XYZ combination analysis figure may be obtained. The difference between these two analysis figures is the interexchange of the XY dimension. The client may also simply use the combination of XZ or YZ to analyze, such as, selecting year E and number of patents 0 field of analysis field E-0 (marked as 1006), to obtain the distribution of the number of patents in each year and the XZ combination analysis figure. Therefore, the client may adjust, add or delete the field of each dimension according to the requirements.
Another exemplary embodiment of
Management analysis report 1000 generated by diagram analysis input/output module 711b includes an X-dimension field, a Y-dimension field and a plurality of data combination analysis fields, where the X-dimension field and Y-dimension field are selected from a group, including at least two of the following: number of patents, assignee, assignee country, inventor, inventor country, application time, priority time, publication time, patent grant time, IPC number, US Patent classification number, agency, examiners, product classification or technical classification. The data analysis combination field is used to drive diagram analysis module 706 so that diagram analysis input/output module 711b may output the combination analysis diagram according to the one-dimensional, two-dimensional or three-dimensional combination analysis diagram generated in accordance with the data of the database. In general, the combination and the permutation order of the X-dimension field and the Y-dimension field are the same. The two-dimensional or three-dimensional combination analysis diagram generated by using data combination analysis field to drive the diagram analysis module has the X-axis and Y-axis with the same content as the corresponding X-dimension field and Y-dimension field.
In other words, when X-dimension field selects assignee B and Y-dimension field selects Number of Patents 0, the two-dimensional combination analysis diagram generated by diagram analysis module 706 driven by data combination analysis field B-0 is a diagram with assignee as the X-axis and number of patents as Y-axis. Before generating the diagram, the system may be configured automatically or manually to filter out the data without displaying under a certain condition, e.g., when the number of data is less than 3, so that the diagram is clear and easy to read. In another exemplary embodiment, if the X-dimension field selects IPC F and Y-dimension field selects Number of Patents 0, the F-0 field will drive the diagram analysis module to generate either two-dimensional IPC (X-axis) vs. number of patent (Y), or three-dimensional IPC vs. number of patents in each year. The above embodiment shows that the user can intuitively analyze the statistics of the patents. To make the management analysis report even more convenient to use and to reduce the possibility that the user may read the wrong column or row, the management analysis report can be colored in different colors for distinguishing.
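As a non-limiting sketch, selecting a data combination analysis field may be thought of as counting patents over the chosen dimension or dimensions and then filtering sparse entries before the diagram is drawn. The rows, the function names `combination_counts` and `drop_sparse`, and the default threshold of 3 are assumptions of this example.

```python
from collections import Counter

# Hypothetical rows drawn from the internal database.
ROWS = [
    {"assignee": "Acme", "ipc": "G06F", "year": 2006},
    {"assignee": "Acme", "ipc": "G06F", "year": 2007},
    {"assignee": "Acme", "ipc": "H04L", "year": 2007},
    {"assignee": "Beta", "ipc": "G06F", "year": 2007},
]


def combination_counts(rows, x_field, y_field=None):
    """Count patents per X value, or per (X, Y) pair when a Y field is given."""
    if y_field is None:
        return Counter(row[x_field] for row in rows)
    return Counter((row[x_field], row[y_field]) for row in rows)


def drop_sparse(counts, minimum=3):
    """Filter out entries with fewer than `minimum` patents before plotting."""
    return {key: n for key, n in counts.items() if n >= minimum}


# Assignee versus number of patents, with sparse assignees hidden.
by_assignee = drop_sparse(combination_counts(ROWS, "assignee"))
# IPC versus year, giving the number of patents as the third dimension.
ipc_by_year = combination_counts(ROWS, "ipc", "year")
```

Swapping `x_field` and `y_field` in this sketch corresponds to interchanging the X and Y dimensions of the resulting diagram.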
In addition, the meaningless part of the data combination analysis fields of the management analysis report will be left blank, or marked with X or other symbols or drawings; alternatively, if a meaningless data combination analysis field, e.g., A-0, is indicated as the usual style, a warning message will pop-up when selected to inform the user that the selection will not be executed, and is unable to drive diagram analysis module 706 to generate two-dimensional or three-dimensional combination analysis diagram. The preferred shape of the management analysis report is shown in
In summary, the exemplary embodiment of the present invention may be designed as an easy-to-read analysis system for patents or research papers, solving the long-standing problem of patent or research paper statistic analysis and providing convenience of use to the users. The patent or research paper analysis system includes an original database 350 for providing original data 350a, and a diagram analysis input/output module 711b for transmitting a client's instruction to a technical document analysis system 300 and displaying the report from technical document analysis system 300 on the client's end, where technical document analysis system 300 includes a diagram analysis module, a technical document analysis module and a relationship database. The diagram analysis module performs relationship computation and analysis with the relationship database according to the client's instruction, i.e., a patent or research paper analysis condition, such as a specific application period or keywords in a specific abstract. Technical document analysis module 302 is connected to original database 350 for fetching original data 350a from original database 350 and updating or storing the data into the relationship database. The technical document analysis module may use the part of the original data having regularity and a preliminary index as the primary identifier, and based on the relationship, original data 350a may be converted into a plurality of sub-data. After comparison by the technical document analysis module, the sub-data are updated or stored in the relationship database. The original data, after being converted into data groups with corresponding relationships, including one-to-one, one-to-many, many-to-many, or text, is updated or stored in the relationship database.
Furthermore, after the diagram analysis input/output module and relationship database performing relationship computation and analysis, the above patent or research paper analysis system provides a management analysis report for the user to input. The report includes an X-dimension field, a Y-dimension field and a plurality of data combination analysis fields, where the X-dimension field and the Y-dimension field are selected from a group, including at least two of the following: number of patents, assignee, assignee country, inventor, inventor country, application time, priority time, publication time, patent grant time, IPC number, US patent classification number, agency, examiners, product classification or technical classification. The users may select the fields for analysis for the subsequent analysis according to their requirements.
Reading report generation module 709 may be a data combination module for organizing, based on the instruction of the client end, the related contents of relationship database to generate the reading report.
Reading report input/output module 711c is connected to reading report generation module 709. It transmits the client's instruction to technical document analysis system 300, which passes the instruction through reading report generation module 709 to internal database 301, such as a relationship database, for computation, and finally outputs the reading report.
Reading report generation module 709 may generate the preliminary table of contents of the reading report excerpt according to search conditions, and transmit the preliminary table to the client end. The transmission of the preliminary table does not include a large amount of the entire text. Instead, after the client selects the required reading report, the complete text data of the reading report is then transmitted. Therefore, the large data flow between technical document analysis system 300 and the client end may be avoided. In addition, the integration with interactive report commenting platform may also be done to improve the efficiency of reading technical data and sharing of the comments.
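The two-stage transmission described above may be sketched as follows, for illustration only. The in-memory `REPORTS` store and the function names `table_of_contents` and `full_report` are hypothetical stand-ins for the internal database and the reading report generation module.

```python
# Hypothetical stored reading reports keyed by application number.
REPORTS = {
    "A001": {"title": "Document analysis system", "assignee": "Acme",
             "text": "Full text of the first reading report."},
    "A002": {"title": "Report commenting platform", "assignee": "Acme",
             "text": "Full text of the second reading report."},
    "A003": {"title": "Authorization manager", "assignee": "Beta",
             "text": "Full text of the third reading report."},
}


def table_of_contents(reports, assignee):
    """Stage 1: return lightweight entries only, without the full text."""
    return [{"application_number": key, "title": rep["title"]}
            for key, rep in reports.items() if rep["assignee"] == assignee]


def full_report(reports, application_number):
    """Stage 2: return the complete text for the one selected document."""
    return reports[application_number]["text"]


toc = table_of_contents(REPORTS, "Acme")
selected_text = full_report(REPORTS, toc[0]["application_number"])
```

In this sketch the full text crosses the connection only for the document the client actually selects, which is the source of the reduced data flow.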
Reading report input/output module 711c of the client end allows the client to input a search condition. Take a patent document as an example. The search condition may be patent application number, patent publication number, assignee, assignee nationality, inventor, inventor nationality, title, abstract, patent independent claims, international patent classification number, US patent classification number, technical classification number, product classification number, and so on.
Reading report input/output module 711c may further receive the patent technical document selected for reading by the table of contents returned by technical document analysis system. The table of contents of patents may include patent publication number, patent application number, assignee, patent title, application date, patent family, self-reference number, other-reference number, number of total references, product classification number, technical classification number, and so on, and may be adjusted in accordance with the requirements. After the patent documents are selected, the text of the reading report may be displayed.
Reading report text may include two parts. The first part is the built-in basic information of patent technical documents, such as patent application number, patent application date, patent number, patent issue date, patent publication number, patent publication date, the earliest priority date, patent type, title, assignee, inventor, patent family, abstract, independent claims, and so on. The second part includes the invention summary and description written by the client after reading the technical documents, such as, prior art technical background, the description of the present case, industrial impact, product impact, patent avoidance possibility, and so on. The client may also judge the importance of the technical document, execute technical or product classification, or establish own technical or product classification groups. The client may also file the read technical documents into the established technical or product classification groups.
Reading report input/output module 711c may access the client's personal report, as well as return to the server for future reading by the client. For example, reading report input/output module 711c may input the personal report of the client after reading the technical document and return the personal report to the server for future access.
The input of the second part of the reading report may utilize the authorization management to control the quality of the reading report. Reading report input/output module 711c also provides the capability of directly viewing the original patent document and the capability of outputting the entire reading report, for example in a specific format (e.g., Word), to facilitate reading and modification by the client.
Authorization management module 713 may include a system administrator to manage the authorization of the system users, such as, using the project or database category to control the view, add, modify and delete access rights of the system users.
After connecting to internal database 301, such as relationship database, the preliminary computation, called database pre-stored program computation, may be performed according to the search condition. The pre-stored program computation may simplify the subsequent computation after connecting to internal database 301.
After connecting to internal database 301, such as relationship database,
Referring to
After reading report generation module 709 reads the reading report selection from the client end, reading report generation module 709 may first perform database pre-stored program computation according to the search condition to simplify the subsequent computation after connecting to internal database 301, such as relationship database.
Report commenting module 712 and report commenting platform 711d are connected to each other for receiving the report commenting data from the client end. The report commenting data may be stored in internal database 301, or returned and displayed on report commenting platform 711d.
Report commenting platform 711d may provide the client end, when reading, with the capability to record comments and commentary and the capability to view the comments and commentary of other readers; thus, report commenting platform 711d allows sharing and enhances the reading efficiency of technical documents. Report commenting platform 711d may also send the client's comments and commentary to technical document analysis system 300, or return the data analysis or pre-stored commentary in technical document analysis system 300 to the client end. When the client end expands the reading report, the report commentary may be recorded and readers may read other readers' commentary, which is similar to the interactive interface of a blog.
The technical document analysis system may further include authorization management module 713. Authorization management module 713 is the system administrator managing the access rights of the system users. Authorization management module 713 may utilize the project or database catalog to control the view, add, modify or delete access rights of the system users, and to manage the writing of reading reports and the publishing and viewing of report commentary by the users. Another exemplary embodiment utilizes authorization management module 713 so that all the users of the system may use reading report input/output module 711c to share the data generated by reading report generation module 709, but may only use report commenting platform 711d to view the data generated by report commenting module 712 for a specific project, or so that only specific users are allowed to comment and analyze through report commenting platform 711d in a specific project.
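A per-project access check of the kind described above may be sketched as follows. The class name `AccessPolicy`, its methods, and the user and project names are illustrative assumptions and do not describe how authorization management module 713 is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class AccessPolicy:
    """Maps (user, project) to a set of allowed actions; names are illustrative."""
    grants: dict = field(default_factory=dict)

    def grant(self, user, project, *actions):
        """Give `user` the listed actions within `project`."""
        self.grants.setdefault((user, project), set()).update(actions)

    def allowed(self, user, project, action):
        """Return True only if the action was granted for this project."""
        return action in self.grants.get((user, project), set())


policy = AccessPolicy()
policy.grant("alice", "project-x", "view", "add", "modify", "delete")
policy.grant("bob", "project-x", "view")  # bob may read but not comment
```

Keying the grants by project in this sketch mirrors the idea that a user may view commentary in one project while having no rights in another.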
As shown in
The system may further provide an automatic or manual grading and commenting mechanism. In the analysis process, with the results or comments from the reading report module or report commenting module, the present invention allows the user to consider the patent application situation and strategy, to further understand the user's own competitiveness with respect to competitors, or the strengths and weaknesses of the current patent strategy in the current market. This may be used as a future reference for improvement, or as the basis on which insurance companies evaluate risks or insurance rates.
Therefore, the exemplary disclosed embodiments of the present invention may integrate a plurality of original databases, such as patent databases and research paper databases, to form complete data groups. After system analysis forms a series of analyzable statistic string tables, the relationship database may be established.
The exemplary disclosed embodiments of the present invention may avoid the unnecessary large amount of data transmission with the clients; therefore, the present invention may accelerate the data analysis processing speed and enlarge analysis scope as well as reduce the interface difference among different original databases.
Although the present invention has been described with reference to the exemplary embodiments, it will be understood that the invention is not limited to the details thereof. Various substitutions and modifications have been suggested in the foregoing description, and others will occur to those of ordinary skill in the art. Therefore, all such substitutions and modifications are intended to be embraced within the scope of the invention as defined in the appended claims.
Number | Date | Country | Kind |
---|---|---|---|
096143564 | Nov 2007 | TW | national |
097140183 | Oct 2008 | TW | national |