The present invention relates to data mining using natural language processing and interactive user annotations, and more particularly to methods for viewing and searching a database of patents or other documents using tags based on semantic segmentation.
Despite advances in computing and search technology, legal discovery in intellectual property transactions continues to cost billions of dollars worldwide. For instance, take the example of the patent process—each phase in the patent process requires search and discovery by different parties, repeatedly. Each stakeholder such as the patent applicant, prosecuting attorney and examiner before grant, litigating attorney, defending attorney and licensing attorney after grant, performs their own due diligence and analysis—independently. The number of patent search and analysis tools available is almost as complex and assorted as the parties involved in post-grant transactions such as search experts, technology experts, lawyers and judges.
Patents are highly structured documents, and unlike broad internet search, they ought to be relatively easy to index and search. There are less than 100 million total patents worldwide—a small number by internet standards. Patents have well defined fields such as Title, Abstract, Claims and Specification (Description, Drawings, and References). The crux of the invention claimed by a patent is described in the Claims that are usually written in a prescribed format and style. The independent claims capture the core inventive steps, and the dependent claims describe extensions of the idea (which are additional constraints or ‘limitations’ on the independent claim in a legal sense). However, what makes the patent search hard is that despite the prescribed structure there are many ways to say the same thing. In order of increasing scope: a single word may have many synonyms, similar phrases, or technical equivalents; a set of claims may split ideas across independent and dependent claims in many ways; a patent may split content across claims, description, drawings and references in many ways; similar patents may have subtle differences in legal language for broader scope or patentability; patent classes may have high overlap or non-uniform coverage of technical areas; and finally the inventor's perspective impacts the focus of the invention as “one man's trash is another man's treasure”.
Patent search today is largely conducted via non-semantic keyword based search engines. This requires extensive experimentation with keywords and synonyms, Boolean and proximity operators, and multiple patent fields such as classes, title, abstract, claims, forward and backward citations, inventors, assignees, etc. It is a laborious process that requires a large amount of manual intervention and non-deterministic, iterative heuristics to achieve the right context. Patent search is a daunting prospect to the average inventor, to the extent that there is a multi-billion dollar industry engaged in services and tools for search and analysis of patents and broader Intellectual Property. There is a plethora of patent search engines in the market ranging from Government Patent Office Tools to commercial software packages and cloud services, to Google Patents. Each database has its own user interface, format, capabilities, performance, and portability of results.
As is well known in the search community, simple keywords do not capture the semantic context of search. While keyword search casts a wide net for potentially relevant patents (high ‘recall’), it has fairly poor ‘precision’—returning orders of magnitude more results than are relevant, depending on the length of search query and query words. In legal domains such as patent search, it is indeed important to have highest possible recall and not miss a potential patent match that could swing the pendulum in a billion-dollar freedom to operate, infringement, or invalidity trial. However, the poor precision of today's search engines vastly overloads the search and discovery process, slowing it down by orders of magnitude.
The present invention provides a semantic-segmentation based model of patent representation that enables more precise search, and also leads to a visually engaging user interface that accelerates user comprehension, among other things.
In a first aspect, a method for semantic tagging of a patent claim is provided, the method comprising: semantically analyzing and segmenting the patent claims to create tags for preambles, elements, sub-elements, and their respective attributes; identifying the type of claim, and segmenting the claim into a plurality of tags using Natural Language Processing based algorithms; editing default natural language based segments and tags into more precise or other invention specific segments by means of human curation; creating a flexible dictionary for each tagged segment that pulls in content from patent specification and images and external sources such as technical taxonomies.
In a second aspect, a method for searching for patents similar to the patent of interest by means of queries automatically generated with the semantic segments is provided. The method comprises: analyzing the user's query patent and creating a plurality of semantic tags by segmenting the claims of the user's query patent using natural language processing based algorithm; representing the patent documents on the basis of semantic-segmentation model; parsing the semantic tags to add synonyms, technical taxonomies, adding sub-field tags to identify relationship between the semantic tagged elements; indexing the user's query by mapping the semantic tags with the patent database to derive a result set; and ranking the relevancy score of result set based on semantic tag matching algorithm.
In a third aspect, a web-based user interface for systematically representing a patent claim or a concept that the user is interested in analyzing is provided. The user interface displays the patent claim or the concept as a plurality of semantic tags, wherein the plurality of semantic tags is created by segmenting the patent claim or concept using a natural language processing based algorithm. The user interface allows the user to edit, annotate, or correct the semantic tags, or to add comments. The user interface further provides a dictionary feature that allows the user to see synonyms or taxonomies of selected text. The user interface allows the user to select a semantic tag to view the text from the specification and the figures where the selected semantic text is present. The segmentation and annotation provided in the above steps could be used for multiple purposes including, but not limited to: (a) better understanding of a given patent and annotating it for future use or for sharing among different users for patent prosecution, litigation, licensing, assertion, or other uses, (b) tagging the patent with new searchable semantic tags for improving the performance of the patent search engine, and (c) creating better search queries to search for similar patents.
Further objects, features and advantages of the present invention will become apparent from the following detailed description taken in conjunction with the accompanying figures showing illustrative embodiments, results and/or features of the exemplary embodiments of the present invention, in which:
In the following detailed description of embodiments of the invention, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. However, it will be obvious to a person skilled in the art that the embodiments of the invention may be practiced with or without these specific details. In other instances, well-known methods, procedures and components have not been described in detail so as not to unnecessarily obscure aspects of the embodiments of the invention.
Furthermore, it will be clear that the invention is not limited to these embodiments only. Numerous modifications, changes, variations, substitutions and equivalents will be apparent to those skilled in the art, without departing from the spirit and scope of the invention.
The present invention provides a system and a method for classifying a patent document based on the essential components of the invention. The method provides a generic way to inter-relate the essential components and to associate a relative importance with each of them. The method accomplishes this objective by semantically tagging the patent claim or concept using a natural language processing based algorithm.
Embodiments of the method of the present invention utilize the fact that the inventions described in the patent documents are conceived around finite concepts. A typical inventor comes up with a new idea based on some existing ideas and concepts, and applies the idea to a system with finite components to extract some benefit. The invention consists of multiple conceptual components or ‘elements’, which may be objects, actions, processes, concepts, equations, reactions, code fragments, applications, etc. The novelty of the invention lies in the constitution of one or more of the elements, or the relationships among elements, or both—as captured in the claims. Embodiments of the present invention provide a method to call out the various assumptions and concepts in a typical invention described in a patent document in a much more explicit manner, such that they can be tagged and individually searched and analyzed. Most importantly, the present invention provides a method where the core invention can be pinpointed and tagged by using key components and their relationships. Embodiments of the invention also provide a method that allows association of estimated economic values and applications to the patent at an element level. The process of tagging all the patents with all possible applications of the invention and their respective economic values can be executed in number of ways such as by crowdsourcing or sole sourcing to one or more of: universities, subject matter experts, patent search firms, education testing services. Several monetization schemes can be designed to use these analytics in different patent centric scenarios—valuation, due diligence, litigation, IP transaction clearinghouse, patent, technology and business strategy, etc—and offered as a range of services from freemium for individual inventors to premium for corporate legal counsels.
The claims are the important constituents of the invention. Apart from defining the scope of protection for the invention, the claims provide an overview of the novel and inventive aspects of the invention. The claims are formulated to define the essential components of the invention and how the essential components are related to each other. The claims are generally of two types: independent claims and dependent claims. Independent claims stand alone and do not refer to other claims, whereas dependent claims refer to the independent claims and add limitations to them. A typical claim consists of a preamble defining the field of the invention, a transitional phrase that characterizes the elements that follow, and a set of limitations that define the attributes of the invention.
A patent can therefore be systematically represented by extracting semantic segments from independent and dependent claims—preamble, elements, sub-elements and respective attributes—and supplementing them with semantic segments from the Title, the Abstract and the Specification.
Segmenting and tagging a document generally requires creation of a data structure composed of (1) segment boundaries in the original document characterized by character or word locations or other positional markers of content, (2) segment content in the original document including text, images, or other content, (3) tag labels used to mark the segment as being of a certain tag type, and (4) tag content further characterizing the tag including text, images, links, references, and metadata entered by the user or recorded by the document management system. The tag content may be pulled from elsewhere in the document or from sources external to the document.
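The four-part data structure described above can be sketched as follows. This is a minimal illustration only: the class name `Tag`, its field names, and the helper `make_tag` are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Tag:
    """One tagged segment of a document (all names here are hypothetical)."""
    start: int    # (1) segment boundary: offset of the first character
    end: int      # (1) segment boundary: one past the last character
    content: str  # (2) segment content copied from the original document
    label: str    # (3) tag label, e.g. "preamble" or "element"
    dictionary: list = field(default_factory=list)  # (4) tag content: related terms
    links: list = field(default_factory=list)       # (4) tag content: links, references


def make_tag(document: str, phrase: str, label: str) -> Tag:
    """Locate `phrase` in `document` and wrap it in a Tag."""
    start = document.index(phrase)  # raises ValueError if the phrase is absent
    return Tag(start, start + len(phrase), phrase, label)


claim = ("A method for semantic tagging of a patent claim, "
         "the method comprising: segmenting the claim")
tag = make_tag(claim, "A method for semantic tagging of a patent claim", "preamble")
tag.dictionary.extend(["procedure", "process"])  # terms a user or a thesaurus might add
```

Storing character offsets rather than only the text keeps the tag anchored to the original document, so the same segment can later be highlighted or re-labeled without searching for it again.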
For semantic patent tagging proposed in this invention, the tag content may be a dictionary or lookup table, with each tag's dictionary containing terms similar in meaning or connotation to the segment content. The terms may be pulled from taxonomies, ontologies, bibliographies, indices, tables of content, summaries and descriptions of a multitude of sources: databases comprising language and grammar dictionaries and thesauruses, synonyms, homonyms, hypernyms, hyponyms, patent classes, library records, academic publications, scientific and technical publications, professional and business publications, and web glossaries.
Furthermore, the tag's dictionary may contain terms pulled from fields in the patent being tagged, or from fields in other patents. The field may be one or more of: title, abstract, claims, background, field of invention, summary of invention, description of figures, description of embodiments, specification, images, figures, drawings, tables, and references.
The tag contents may also contain a lookup table containing links and references related to the segment content. The links and references may be pulled from fields in the patent being tagged, or from fields in other patents, the fields comprising title, abstract, claims, background, field of invention, summary of invention, description of figures, description of embodiments, specification, images, figures, drawings, tables, and references. The links and references may also be pulled from external sources described above.
Tagging can be implemented by means of annotation software built with languages and tools such as HTML, CSS, JavaScript, jQuery, EmberJS, AngularJS, CoffeeScript, NodeJS, XML, HTML5, Java, C, C++, C#, Python, Django, the Natural Language Toolkit (NLTK) in Python, OpenNLP in Solr, Solr/Lucene, Tesseract Optical Character Recognition, and many other languages and software packages.
Embodiments of the present invention provide a method and a search engine that create automatic tags for the preamble, elements/sub-elements and their attributes in the patent claims by segmenting the claims using a natural language processing based algorithm. Since the core invention can be described using the independent and dependent claims, the claims can be used to identify the details of the invention. The method uses an NLP (Natural Language Processing) based algorithm to identify the type of claim, for example whether the claim is a method claim, a system claim or an apparatus claim. Similarly, the NLP based algorithm categorizes claims as independent or dependent, for example by searching for the word “claim” or numbers in the first few words. The method further uses the NLP based algorithm to segment independent claims into tags such as noun phrases and preposition phrases. The dependent claims are also segmented into tags for attributes of elements and sub-elements. The method ensures that the preamble, the elements and sub-elements, and the attributes of each element/sub-element are automatically tagged, while the generic language components are not tagged but may be incorporated into the element/sub-element tags or their attributes.
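The claim-type and independent/dependent checks described above might be approximated with simple pattern matching, as in the sketch below. The function names, the 12-word window, and the keyword list are assumptions made for illustration. The dependency test here requires the word "claim" followed by a number, which is one possible reading of the heuristic in the text.

```python
import re

# Hypothetical keyword list; a fuller system would rely on a richer NLP model.
CLAIM_TYPES = ("method", "system", "apparatus")


def is_dependent(claim_text: str, window: int = 12) -> bool:
    """Treat a claim as dependent if 'claim <number>' occurs in its first few words."""
    head = " ".join(claim_text.split()[:window])
    return re.search(r"\bclaims?\s+\d+", head, flags=re.IGNORECASE) is not None


def claim_type(claim_text: str, window: int = 12) -> str:
    """Return the first claim-type keyword among the opening words, else 'other'."""
    head = [w.strip(",.:;").lower() for w in claim_text.split()[:window]]
    for word in head:
        if word in CLAIM_TYPES:
            return word
    return "other"


independent = "A method for semantic tagging of a patent claim, the method comprising: ..."
dependent = "The method of claim 1, wherein the tags are edited by a user."
```

Note that the independent claim above contains the word "claim" in its preamble, which is why the sketch also looks for a following number before deciding that a claim is dependent.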
The Natural Language Processing engine contains a pipeline of blocks that (1) parse the patent into words separated by whitespaces (tokenizer), (2) tag the words with their grammatical part of speech (POS tagger), (3) chunk the tags into phrases of interest such as noun phrases, preposition phrases, verb phrases, adjective phrases, etc (chunker), (4) semantically tag the chunks into tags of interest such as claim preamble, elements, sub-elements, or their respective attributes.
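The four pipeline blocks can be illustrated with a toy, dependency-free sketch. The tiny lexicon below stands in for a trained part-of-speech tagger, and the rule in step (4) that labels the first noun phrase as the preamble is a deliberately crude assumption; neither is taken from the specification.

```python
# Toy lexicon standing in for a trained POS tagger (hypothetical, tiny on purpose).
LEXICON = {
    "a": "DT", "the": "DT", "semantic": "JJ", "natural": "JJ",
    "method": "NN", "tag": "NN", "claim": "NN", "patent": "NN",
    "for": "IN", "of": "IN", "segmenting": "VBG",
}


def tokenize(text):
    """(1) Split on whitespace and strip surrounding punctuation."""
    return [w.strip(",.;:") for w in text.split() if w.strip(",.;:")]


def pos_tag(tokens):
    """(2) Look each token up in the lexicon; unknown words default to noun."""
    return [(t, LEXICON.get(t.lower(), "NN")) for t in tokens]


def chunk_noun_phrases(tagged):
    """(3) Group runs of determiner/adjective/noun tags into noun phrases."""
    phrases, current = [], []
    for word, tag in tagged:
        if tag in ("DT", "JJ", "NN"):
            current.append(word)
        else:
            if current:
                phrases.append(" ".join(current))
            current = []
    if current:
        phrases.append(" ".join(current))
    return phrases


def semantic_tag(phrases):
    """(4) Label the first noun phrase as preamble and the rest as elements."""
    return [("preamble" if i == 0 else "element", p) for i, p in enumerate(phrases)]


result = semantic_tag(chunk_noun_phrases(pos_tag(tokenize(
    "A method for segmenting a patent claim"))))
# result == [("preamble", "A method"), ("element", "a patent claim")]
```

In practice each block would be replaced by a library component, for example a tokenizer, tagger and chunker from a toolkit such as NLTK, while keeping the same four-stage shape.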
After the punctuation marks are identified in step 402, if the punctuation contains a semicolon or colon in addition to commas, as shown in step 414, the process proceeds to verify the structure of the claim in terms of preamble and elements, and to extract noun phrases after each colon or semicolon, as depicted in step 416.
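The colon and semicolon branch can be sketched as a simple split of the claim text. The function name and return shape are hypothetical, and the sketch keeps the whole text of each element rather than extracting only its noun phrases, so it covers the structural check and not the full step 416.

```python
def segment_by_punctuation(claim_text: str) -> dict:
    """Split a claim into a preamble (before ':') and elements (separated by ';')."""
    if ":" not in claim_text:
        # Claims without a colon need a different path; not covered by this sketch.
        return {"preamble": claim_text.strip(), "elements": []}
    preamble, _, body = claim_text.partition(":")
    elements = []
    for part in body.split(";"):
        part = part.strip().rstrip(".").strip()
        if part.startswith("and "):
            part = part[4:]  # drop the conjunction before the final element
        if part:
            elements.append(part)
    return {"preamble": preamble.strip(), "elements": elements}


claim = ("A method comprising: segmenting the claim into tags; "
         "editing the tags; and creating a dictionary for each tag.")
segments = segment_by_punctuation(claim)
```

Each returned element could then be passed to a noun-phrase chunker to obtain the tags themselves.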
In an alternate embodiment of the present invention, natural language processing algorithms may be modified to identify semantic tags of patents written in languages other than English, by identifying the appropriate grammar structures and parts of speech in those languages. Alternatively, natural language processing algorithms may be applied to English translations of patents originally written in non-English languages.
In an alternate embodiment of the present invention, an economic or monetary value can be attached in addition to the semantic analysis. The patents can be tagged at an element level with possible applications of the invention and the economic value of those applications. Then, while preparing a query, these economic values can be used as a second field, in addition to the semantic analysis, to further refine the search results.
The method automatically creates a dictionary for each tag using external databases including synonyms, language/grammar dictionaries, technical taxonomies, academic publications, and library bibliographies. The dictionary additionally contains related terms from internal databases such as patent classes, other patents, or other fields in the patent being tagged. For example, the NLP algorithm extracts terms and definitions from the patent specification that are relevant to tags such as preamble, elements, and sub-elements.
In an embodiment of the present invention, the method can be used to create a patent database that contains patents with claims segmented in semantic tags and having a global dictionary that contains all the keywords that are present in all the patents with possible synonyms and technical terms.
In another embodiment of the present invention the method for semantic segmentation can be used in a patent search engine, thereby using the patents tagged with semantic segments in a database to do better searches by using queries that call out the specific tags.
In another embodiment of the present invention, a method for searching similar patents by generating keywords or search queries based on semantic segmentation of the claims is provided. When a search query is entered, the claims of the patent being searched are segmented into various fields namely preamble, key elements, and sub-elements. This segmentation is then used to create better, more accurate, search queries.
All of these segmentations and coding are done in an automated fashion thereby providing the user a very quick, visual, and easy way to assess the key semantic interpretation of the Claim. The method also enables the user to correct any faulty segmentation provided by the automated engine and to add user's own comments, thereby providing a powerful way to the user to correct interpretation of the Claim. This corrected or curated information could then also be used in any subsequent steps including annotation of patents for future use or sharing, creating better keywords or search strings.
Once the claims are semantically segmented and a better search query is generated using the segmentation, a query parser adds synonyms, technical taxonomy or technical terms using the global dictionary. The search query is then indexed to add sub-field tags within claims to capture the WHAT, WHY and HOW elements. The method maps the semantic tags against the existing patents in the database and identifies the relevant patents showing similarity with the semantic tags. The scorer uses these semantic tags to rank the results by relevance, and the result set containing the relevant patents is displayed to the user. The ranking algorithm ranks patents that have more semantic tags matching the query keywords higher than those with fewer matching tags. The method displays the closest patent classes based on the query keywords. It may also display some description of the top patents found to the user. It then asks for a selection, and if the user selects none of the results, the method displays more patents that are close to the search query. The method searches deep in selected classes (using maximal class-specific synonyms, ranking by tags) and, if the user wants more, searches in other classes by selecting alternative synonyms. The ranking algorithm provides the option of ranking the closest relevant patents by field (title, abstract, claim tags, claims, description, references) and by proximity.
In one embodiment of the invention one or more searches performed can be saved in a search history and made available to the user to selectively edit and recompose from, to converge faster to the correct results.
In an embodiment of the present invention, a search engine is provided that utilizes the method for searching similar patents by generating keywords based on semantic segmentation of the claim, as described above. The search engine is based on performing search for closest patents using the semantic segmentation of claims, tagging the claims for generating keywords and mapping the generated keywords for identifying the closest patent. The keywords are mapped to the patents stored in the patent database. The mapping of the keywords based on semantic segmentation of claims is performed by semantically segmenting the claims of patents stored in the patent database.
The typical search query consists of keywords or phrases. According to this invention the search query may consist of one or more of: keywords, phrases, pseudo-claims, segments, tags, tag dictionaries, tags and segments viewed by means of a user interface, and tags and segments edited by means of a user interface. The user interface 202 is described in more detail in a later section.
A simple representation model for the search engine is described below. It first captures the typical capabilities provided by existing search engines and is then built up to the semantic model. Remarks on notation used in the following equations: lowercase unbolded variables are scalars; lowercase bolded variables are row vectors (special cases: $\mathbf{1}$ = vector of all ones, $\mathbf{0}$ = vector of all zeros, $\mathbf{1}[i]$ = vector of ones and zeros with ones at the locations (or indices) marked in the set $[i]$); uppercase unbolded variables are constants; uppercase bolded variables are matrices; $\mathbf{a}[i]$ is the value in the ith location of $\mathbf{a}$; $\mathbf{A}[i,j]$ is the value in the ith row and jth column of $\mathbf{A}$; for a 1×A vector $\mathbf{a}$ the 1-norm is defined as $|\mathbf{a}| = \sum_{i=1}^{A} |\mathbf{a}[i]|$; the transpose of row vector $\mathbf{a}$ is the column vector $\mathbf{a}'$; and the inner product of two 1×K vectors is defined as $\mathbf{a}\mathbf{b}' = \sum_{k=1}^{K} \mathbf{a}[k]\,\mathbf{b}[k]$.
A global dictionary with a list of global keywords is assumed to exist, which includes all possible keywords that occur in the database of patents. Some of these keywords may not occur in any patents but may be used in search queries, e.g. as synonyms. The global dictionary is described as a row vector g in Equation 1. These keywords may be single words or phrases of co-occurring words such as n-grams, where n is typically 2 or 3. They may be listed in ascending or descending alphabetical order, or some other order suitable to speedy implementation in hardware.
Global keyword dictionary (1×K): $\mathbf{g} = [g_1 \,\dots\, g_k \,\dots\, g_K]$
Equation 1: Dictionary of all Possible Keywords as a 1×K Vector, where K is Very Large
A patent contains some of these keywords (not in the same order as in g), and can be represented as an indicator vector or incidence vector relative to g. As shown in Equation 2, the indicator vector has zeros everywhere except at the indices where the patent contains words in common with g, where it is equal to ‘1’. While a simplest representation of a patent as an indicator vector with ‘1’s to indicate presence of the corresponding keyword in g is used, more advanced representations may be used, such as those taking into account the number of occurrences of the keyword.
The uth patent as an indicator vector (1×K): $\mathbf{p}_u = \mathbf{1}[u] = [0 \,\dots\, 1[u] \,\dots\, 0]$, $|\mathbf{p}_u|$ = total keywords in patent
Equation 2: Representation of a Patent as an Indicator Vector Relative to Dictionary g, with ‘1’s at Indices where Patent Keywords Occur in g
All patent indicator vectors can be stacked up to represent the entire database of patents as a matrix, shown for a database with U patents in Equation 3.
Note that any database can be represented in this fashion, in particular the patent classes and their descriptions can be represented in the manner described here and searched for in the manner described in the following.
The user's Search Query consists of a bunch of keywords, which can also be represented as an indicator vector relative to g as shown in Equation 4. As mentioned earlier, the dictionary is assumed to contain all possible user query keywords, which makes this representation possible. For simplicity, it is assumed that the query keywords are distinct, i.e. none of them are repetitions.
Search Query keywords as an indicator vector (1×K): $\mathbf{q} = \mathbf{1}[q] = [0 \,\dots\, 1[q] \,\dots\, 0]$, $|\mathbf{q}|$ = total keywords in query
When the user performs a search, the query keywords are matched against all patents. This is shown mathematically in Equation 5, where a nominal ‘rank’ of patent pu against query q is defined. The more query words found in the patent, the higher its rank. Note that this vector product is properly defined because both the patent and the query are represented consistently relative to the same global dictionary.
Nominal search rank of the uth patent: $r_u = \mathbf{p}_u \mathbf{q}' = \sum_{k=1}^{K} \mathbf{p}_u[k]\,\mathbf{q}[k]$
Equation 5: Rank of a Patent Defined as the Inner Product of a Patent with Query
Search rank of all patents in the database is a vector as shown in Equation 6. This nominal rank measures the query keyword count in each patent.
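The indicator-vector representation and the nominal rank can be illustrated with a small, dependency-free sketch. The five-word dictionary and the two patents are invented toy data, and the helper names `indicator` and `rank` are hypothetical.

```python
def indicator(keywords, dictionary):
    """Indicator vector relative to the global dictionary g: 1 where a keyword occurs."""
    present = set(keywords)
    return [1 if term in present else 0 for term in dictionary]


def rank(patent_vec, query_vec):
    """Nominal rank: inner product of the patent vector with the query vector."""
    return sum(p * q for p, q in zip(patent_vec, query_vec))


# Hypothetical toy data.
g = ["claim", "dictionary", "query", "segment", "tag"]   # global dictionary
P = [
    indicator(["claim", "segment", "tag"], g),   # patent 1
    indicator(["dictionary", "query"], g),       # patent 2
]                                                # database: one row per patent
q = indicator(["tag", "segment"], g)             # search query
r = [rank(p_u, q) for p_u in P]                  # rank of every patent in the database
# r == [2, 0]: patent 1 contains both query keywords, patent 2 contains neither
```

Because patents and query share the same dictionary order, the inner product simply counts how many query keywords each patent contains.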
Search Query operators can be mathematically implemented by selecting patents with certain rank values against the query as shown in Equation 7.
Note that the per-operator conditions described on submatrices Pop in Equation 7 are element-wise conditions on each element of the column vector rop=Popq′. To implement combinations of operators, successive operators can be applied on successive submatrices, as shown in Equation 8 for the example query=(OR (all keywords in q1)) AND (OR (all keywords in q2)).
OR on q1=>take submatrix P1 of P such that P1q1′≧1,
OR on q2=>take submatrix P2 of P such that P2q2′≧1,
if P1 is the smaller than P2, result=submatrix
if P2 is the smaller than P1, result=submatrix
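The example query (OR q1) AND (OR q2) can be sketched as two successive row selections. The sketch below uses the OR condition stated in the text (rank of at least 1) and always applies the second selection to the result of the first; the toy matrix and the helper names are hypothetical.

```python
def ranks(P, q):
    """Rank of every row of P against query vector q (inner products)."""
    return [sum(p * x for p, x in zip(row, q)) for row in P]


def select_or(P, q):
    """Keep the rows (patents) containing at least one keyword of q, i.e. rank >= 1."""
    return [row for row, r in zip(P, ranks(P, q)) if r >= 1]


# Columns follow a toy dictionary: claim, dictionary, query, segment, tag.
P = [
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0],
]
q1 = [1, 0, 0, 0, 0]   # OR over {"claim"}
q2 = [0, 1, 1, 0, 0]   # OR over {"dictionary", "query"}

P1 = select_or(P, q1)        # patents matching q1
result = select_or(P1, q2)   # of those, patents also matching q2: the AND
```

Applying the selections in the opposite order gives the same rows; starting from the smaller submatrix only reduces the amount of work.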
More sophisticated methods using advanced algebra may be applied for applying complex operators to complex queries. For example, operators can be implemented as a non-linear function φ as shown in Equation 9.
Rank list after operators (Ū×1)
Synonyms may be added to the query by asking for user input or by automatically accessing a language dictionary (e.g. WordNet) or technical taxonomies (IEEE Xplore, Library of Congress, PubMed, etc.). For each query keyword $\mathbf{q}_i$ in the query vector $\mathbf{q}$ (total keywords = sum of nonzero positions = $|\mathbf{q}|$), synonyms are represented as indicator vectors relative to $\mathbf{g}$ and then added to the keyword as shown in Equation 10 (assuming they are all distinct, and different from the keyword). This is done for one query keyword at a time: $\mathbf{q}_i = \mathbf{1}[i]$ has only one nonzero entry, at the location contained in $[i]$. The corresponding synonym vector $\mathbf{q}_{i,syn}$ has nonzero entries at the locations contained in $[q_{i,syn}]$, representing all included synonyms of $\mathbf{q}_i$.
Break up q into single-keyword indicator vectors: $\mathbf{q} = \sum_{i=1}^{|\mathbf{q}|} \mathbf{q}_i = \sum_{i=1}^{|\mathbf{q}|} \mathbf{1}[i]$
Synonyms as an indicator vector: $\mathbf{q}_{i,syn} = \mathbf{1}[q_{i,syn}]$
New query vector for $\mathbf{q}_i$: $\hat{\mathbf{q}}_i = \mathbf{q}_i + \mathbf{q}_{i,syn}$
New rank for $\hat{\mathbf{q}}_i$: $\hat{r} = \mathbf{p}\hat{\mathbf{q}}_i' = \mathbf{p}(\mathbf{q}_i + \mathbf{q}_{i,syn})' = r + \mathbf{p}\mathbf{q}_{i,syn}' \geq r$
To perform OR of {keyword, synonyms} in $\hat{\mathbf{q}}_i$, take submatrix $\mathbf{P}_s$ of $\mathbf{P}$ such that $\mathbf{P}_s\hat{\mathbf{q}}_i' \geq 1$
The additive operation increases the rank as it finds more potential matches. In other words, for a fixed rank threshold above which patents are returned in results, this increases the number of returned patents, as expected by adding synonyms.
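A small numeric sketch shows why adding synonyms can only keep or raise the rank. The dictionary, the patent and the synonym choice are invented for illustration.

```python
def inner(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Columns follow a toy dictionary: automobile, car, engine, motor, wheel.
p = [1, 0, 0, 1, 0]        # patent mentions "automobile" and "motor"
q_i = [0, 1, 0, 0, 0]      # single query keyword "car"
q_i_syn = [1, 0, 0, 0, 0]  # its synonym "automobile" (hypothetical thesaurus entry)

q_hat = [a + b for a, b in zip(q_i, q_i_syn)]   # keyword plus synonyms

r = inner(p, q_i)          # rank with the bare keyword: 0
r_hat = inner(p, q_hat)    # rank with synonyms added: 1
```

The patent is missed by the bare keyword but found once the synonym is added, and since all entries are non-negative the expanded rank can never fall below the original one.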
This per-keyword operation can be compactly expressed by the more general method of Query Expansion. Most search engines use query expansion to conduct parallel searches. This can be implemented as an expansion of the query vector to a matrix as shown in Equation 11.
This outputs a rank matrix, with columns corresponding to input query rows. For general query expansion, this rank matrix can be further analyzed to derive optimal results, e.g. to tune the search engine by adjusting weights described elsewhere in this document. For our case of synonyms, this format makes it easy to add synonyms independently to each keyword row as shown in Equation 12.
Proximity of Search Query keywords is another feature offered by most modern patent search engines. As shown in Equation 13, it can be added to our model as a diagonal weighting matrix W(q) that is a function of the query. Each proximity weight wu(q) is inversely proportional to the distance spanned by query keywords q occurring in patent pu. It may be defined simply as wu(q)=1/(1+δ(q)) where δ(q)=the minimum number of words separating all keywords in query, i.e. words between the first occurring keyword and the last occurring keyword in the patent (excluding the keywords), over all occurrences of the keywords in the patent. Other definitions may be used, for example to account for cases when only some of the keywords are found (i.e., ru<|q|). In order to differentiate the weighted rank from the pure (keyword count) rank, we call the weighted rank a ‘score’ instead.
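The simple proximity weight 1/(1+δ) can be sketched as follows. Two assumptions are made here: δ is taken to be the number of non-keyword words inside the tightest window that contains every query keyword, and a patent missing any keyword is given weight 0. The text notes that other definitions may be used for the partial-match case.

```python
def min_separation(words, keywords):
    """Fewest non-keyword words inside any window that contains every keyword."""
    keywords = set(keywords)
    best = None
    for start, first in enumerate(words):
        if first not in keywords:
            continue  # a tightest window always starts on a keyword
        seen, gap = set(), 0
        for word in words[start:]:
            if word in keywords:
                seen.add(word)
                if seen == keywords:
                    best = gap if best is None else min(best, gap)
                    break
            else:
                gap += 1
    return best  # None when some keyword never occurs


def proximity_weight(words, keywords):
    """w = 1 / (1 + delta); 0.0 if not all keywords are present (assumption)."""
    delta = min_separation(words, keywords)
    return 0.0 if delta is None else 1.0 / (1.0 + delta)


text = "a tag is a semantic label for a claim segment".split()
w = proximity_weight(text, ["semantic", "segment"])   # delta = 4, so w = 0.2
```

Adjacent keywords give δ = 0 and the maximum weight of 1, so patents in which the query terms appear as a phrase score highest.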
Note that for any kind of rank weighting, application of search operators becomes trickier, and it is generally easiest to apply search operator selections to the rank list before applying weights. An alternative implementation of query expansion shown in Equation 14 may be useful for weighting scores. The query vector is expanded into a Q-times longer vector containing alternative queries (for example synonym-expanded keywords described earlier), and the patent matrix is replicated into a diagonal matrix. The resulting rank vector is a Q-times longer vector that can be weighted by any meaningful weight matrix V.
Let us use the notation from Equation 14 to re-do, with synonyms, the proximity example of Equation 13. The re-done example is shown in Equation 15, where $\mathbf{q}$ contains the per-keyword synonym vectors $\hat{\mathbf{q}}_i$ defined in Equation 10, and $\mathbf{V}$ contains the keyword proximity weights $v_u(\mathbf{q})$, defined similarly to $w_u(\mathbf{q})$ in Equation 13, for each patent u that survives operation φ (submatrix selection shown in Equation 12).
In more sophisticated engines, information about patent classes may be used to improve search. For example, the most frequent keywords in each class may be identified and tagged in the patent database matrix P. When the query keywords contain these class words, patents in that class may be weighted higher. Class weights can be incorporated similarly to proximity weights, as shown in Equation 16, as a diagonal weighting matrix C(q) that is a function of the query, where each weight cu(q) is a function of the patent's class and the query. Weights can be set to 1s and 0s to select any particular class.
Technology-specific phrases and acronyms are often important in patent classes. As an alternative to n-grams which are computationally intensive to index, a simpler way to implement class-specific phrase search is to apply proximity weights in conjunction with class weights.
Almost all search engines offer search within patent fields such as Title, Abstract, Claims, Specification etc. This can be easily incorporated into our model by representing each field as an indicator vector against the dictionary g, and adding them to the patent vector. The patent vector extends to a patent matrix, with each row representing a field of the patent as shown in Equation 17 for total F fields, including the original full patent as field.
Patent fields can also be weighted to emphasize certain fields over others. Academic literature shows that keyword searches in Title, Abstract and Claims tend to yield more accurate results than searches in Specification. Therefore, a simple way to improve the relevance of results is to weight these fields higher than Specification. Equation 18 illustrates weighting by fields. Weights can be set to 1s and 0s to select any particular field. The weights shown are uniform across patents, but they may be made a function of class, for example to de-emphasize fields that are known to be sparse in certain classes.
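The field matrix and field weighting can be sketched as follows. This is an illustrative reading of the text, not the exact form of Equations 17 and 18: the field list, the three-word dictionary and the weight values are assumptions.

```python
FIELDS = ["title", "abstract", "claims", "specification", "full"]


def field_scores(patent, query):
    """Keyword matches per field: one dot product per row of the patent matrix."""
    return [sum(a * b for a, b in zip(patent[name], query)) for name in FIELDS]


def weighted_rank(patent, query, field_weights):
    """Combine the per-field scores using one weight per field."""
    scores = field_scores(patent, query)
    return sum(field_weights[name] * s for name, s in zip(FIELDS, scores))


# Assumed dictionary: ["engine", "wheel", "brake"]; one indicator row per field.
patent = {
    "title":         [1, 0, 0],
    "abstract":      [1, 1, 0],
    "claims":        [1, 1, 0],
    "specification": [1, 1, 1],
    "full":          [1, 1, 1],
}
query = [0, 1, 1]  # "wheel brake"

# Hypothetical weights that favor Title, Abstract and Claims over Specification.
emphasis = {"title": 3, "abstract": 2, "claims": 2, "specification": 1, "full": 0}
# 1s and 0s select a single field, here Claims only.
claims_only = {"title": 0, "abstract": 0, "claims": 1, "specification": 0, "full": 0}

emphasized_rank = weighted_rank(patent, query, emphasis)
claims_rank = weighted_rank(patent, query, claims_only)
```

A class-dependent variant would simply look up a different weight dictionary for each patent class before calling `weighted_rank`.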
An embodiment of the present invention proposes semantic segmentation of Claims, with enhancement from other fields, to create new searchable fields from Tags. An example of a tag called “Elements” is shown in Equation 19. “Elements” centers on the invention elements described in the Claims, and enhances them by pulling in relevant content from the Title, Abstract and Specification. Details of how “Elements” and other Tags are created were described in the previous section. This invention further proposes designing the weight vector judiciously to improve search results, taking advantage of the fact that Tags such as Elements are semantically curated fields and should generally be weighted higher than other fields. In some cases, optimally designed Tag fields may be used exclusively for high-relevance search, in place of any other fields.
The relative expected lengths of existing and proposed patent fields are schematically shown in Equation 20 by dashed lines.
Another embodiment of the present invention is a user display (User Interface) that utilizes the novel semantic segmentation technique as described in the previous embodiments of the invention. This user interface is used in analyzing any given patent or document and provides a unique method of viewing different segments of that patent (or document) in a way that provides the user very critical information towards understanding that patent (or document). The user can then use and modify this information to perform various steps. These steps may involve, but are not restricted to, providing better information or keywords for searching a specific concept or patent, doing a more thorough due diligence of a particular patent or technical document, and annotating the patent or technical document for future use or sharing.
In another embodiment of this invention, the user is also provided with a way to edit the tags and segments, for example to correct any errors occurring in the automated NLP engine.
In another embodiment of this invention the user is also provided with an automated way to show possible synonyms or technical mapping (taxonomy) of any selected word group.
In another embodiment of this invention, the user is provided with a method to automatically extract and display the relevant figure from the patent along with a description of the figure and a legend of components labeled in the figure.
In another embodiment of this invention the user is also provided a method of automatically seeing the relevant word segments from various parts of the patent specification. The user is given an ability to select any specific word or tag.
The search results and claims worksheet can be edited, saved or printed in user selectable formats by authorized users (for example in a secure system), and shared with select users.
The search engine and method of the present invention provide specific advantages over existing search engines. Users can edit and annotate tags, choose display markers (color, font size, or other markers), and annotate any text or drawing with comments. The user can save, retrieve, and share annotations with select other users. An algorithm for merging multi-user annotations (majority rule, ignore common words if conflict) can be provided. The user can search for similar patents; by default, claim elements are used in the search query. Dictionaries for tags are provided: the user sees the dictionary of a tag by clicking on it, and can browse, edit, add, and share dictionaries of tags, and use or remove them in a search query. Figures for tags are provided: the user sees the corresponding figure by clicking on a tag, and the figure shows the tag keywords highlighted in labels in matching colors (as a legend or overlaid on the figure). Image-processing-based methods are provided, including OCR to identify the figure number and labeled invention components, and NLP to associate the figure number with the labeled invention components. Specification quotes for tags are provided: the user sees quotes from the specification that include the selected tag, and can edit the tag's dictionary by selecting, deselecting, or annotating quotes. Natural language processing to find the best quote (e.g., the sentence or paragraph that contains the most tag keywords) is provided.
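The best-quote selection mentioned above (the sentence containing the most tag keywords) can be sketched as below. The regular-expression sentence splitter is a deliberately naive stand-in for a real NLP sentence segmenter and will mis-split abbreviations such as "FIG. 1"; the function name and sample text are hypothetical.

```python
import re


def best_quote(specification, tag_keywords):
    """Return the sentence that contains the most distinct tag keywords."""
    keywords = {k.lower() for k in tag_keywords}
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", specification)
                 if s.strip()]

    def hits(sentence):
        words = set(re.findall(r"[a-z0-9]+", sentence.lower()))
        return len(words & keywords)

    # On ties, max() keeps the earliest sentence; empty input yields "".
    return max(sentences, key=hits, default="")


spec = ("The vehicle has a frame. "
        "A brake pad presses against the wheel rim. "
        "The wheel is mounted on an axle.")
quote = best_quote(spec, ["brake", "wheel", "rim"])
```

The same scoring function could be applied to paragraphs instead of sentences by changing only the splitting step.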
In another embodiment of the present invention, the search platform stores the metadata associated with a user's search session and history, and provides the user with a view/edit interface to the metadata. The user can store all data related to one search under a selected title. The search history begins with the first search query in the first search session and ends with the final search results and/or documents being delivered to the customer in the final search session. The search engine stores the search strings and metadata associated with each search session. The user may perform a number of operations such as search, view, edit, and save, on a number of documents such as patents, patent applications, image file wrappers, patent tags, uploaded external publications—all of which is recorded along with time stamps. The stored data can subsequently be retrieved by the user in a later session. This feature enables review of organizational workflow statistics for operational efficiencies and functions such as performance evaluation, billing, tool performance, etc. The platform also allows selective sharing of workflow with users in the same or external organizations.
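One possible shape for the stored session metadata is sketched below, assuming a simple in-memory record. The class names, field names and example identifiers are hypothetical, and persistence, authorization and sharing are omitted.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class SessionEvent:
    operation: str       # e.g. "search", "view", "edit", "save"
    target: str          # query string or document identifier
    timestamp: datetime  # when the operation was recorded


@dataclass
class SearchHistory:
    title: str                                  # user-selected title for the search
    events: list = field(default_factory=list)  # operations in recorded order

    def record(self, operation, target):
        """Append one time-stamped operation to the history."""
        event = SessionEvent(operation, target, datetime.now(timezone.utc))
        self.events.append(event)
        return event

    def queries(self):
        """Return the search strings in the order they were issued."""
        return [e.target for e in self.events if e.operation == "search"]


history = SearchHistory(title="Example prior-art search")
history.record("search", "wheel brake sensor")
history.record("view", "patent:EXAMPLE-1")
history.record("search", "wheel brake sensor hydraulic")
```

Keeping every operation as a time-stamped event makes workflow statistics, such as time spent per document or number of query refinements, a matter of aggregating over `events`.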
The foregoing merely illustrates the principles of the present invention. Other variations to the disclosed embodiments can be understood and effected by those skilled in the art in practicing the claimed invention from a study of the drawings, the disclosure, and the appended claims. In the claims, the word “comprising” does not exclude other elements or steps and the indefinite article “a” or “an” does not exclude a plurality. The mere fact that certain measures are recited in mutually different dependent claims does not indicate that a combination of these measures cannot be used advantageously. Any reference signs in the claims should not be construed as limiting the scope of the claims. Various modifications and alterations to the described embodiments will be apparent to those skilled in the art in view of the teachings herein. It will thus be appreciated that those skilled in the art will be able to devise numerous techniques which, although not explicitly described herein, embody the principles of the present invention and are thus within the spirit and scope of the present invention. All references cited herein are incorporated herein by reference in their entireties.
This application claims the benefit of U.S. Provisional Patent Application No. 61/801,594, filed Mar. 15, 2013, the disclosure of which is incorporated by reference in its entirety.
Number | Date | Country
---|---|---
61801594 | Mar 2013 | US