The present application claims priority to Korean Patent Application Nos. 10-2023-0142479 (provisional application, filing date: Oct. 23, 2023), 10-2023-0142478 (provisional application, filing date: Oct. 23, 2023), 10-2023-0195362 (regular application, filing date: Dec. 28, 2023), and 10-2024-0137855 (regular application, filing date: Oct. 10, 2024), the entire contents of which are incorporated herein by reference for all purposes.
The present invention relates to a document search method for providing similar information on similar documents.
Existing patent search platforms provide similar prior documents based on keywords or search conditions input by a user. However, when the number of similar prior documents provided by the patent search platform is very large, the user is burdened with having to select the desired documents from a vast number of documents. There are also frequent cases where the provided documents do not match the user's search intent, which reduces the reliability of the search results. For this reason, the user has to spend a lot of time and effort to reach the information he or she is looking for.
In order to help the user find the necessary information efficiently, more sophisticated search algorithms and user-friendly filtering functions are essential. It is necessary to reinforce the function of evaluating and classifying the relevance of prior documents so that the user can easily find related documents directly. In addition, improvement in an interface that provides search results more intuitively and clearly is also required.
A purpose of the present invention is to provide a document search method that shows similar information on similar documents through a user-friendly interface.
A document search method of the present invention may comprise the steps of: searching for similar documents based on at least one phrase input by a user, listing the searched similar documents, and listing from the searched similar documents cards including at least one phrase or at least one similar phrase having a meaning similar to the at least one phrase.
The document search method of the present invention can provide search results to the user through a user-friendly interface. The present invention can collect and list sentences or paragraphs including phrases input by the user among similar documents. The user can easily find the similar documents desired by the user using the listed sentences or paragraphs.
Since the present invention may make various modifications and have various embodiments, specific embodiments will be illustrated in the drawings and described in detail. However, this is not intended to limit the present invention to the specific embodiments, and it should be understood that this includes all modifications, equivalents, or substitutes included in the spirit and technical scope of the present invention.
The terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms are used only for the purpose of distinguishing one component from another component. For example, without departing from the scope of the present invention, a first component may be referred to as a second component, and likewise, the second component may also be referred to as the first component. The term “and/or” includes a combination of a plurality of related/described items or any one of the plurality of related/described items.
In case a certain component is referred to as being “linked” or “connected” to another component, it should be understood that there may be other components between them, although the certain component may be directly linked to or connected to the other component. On the other hand, in case a certain component is referred to as being “directly linked” or “directly connected” to another component, it should be understood that there is no other component between them.
The terms used in the present application are used merely to describe specific embodiments and are not intended to limit the present invention. A singular expression covers a plural expression unless the context clearly indicates otherwise. In the present application, the terms such as “comprise” or “have” are intended to specify the presence of a feature, number, step, operation, component, part, or combination thereof described in the specification, but should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.
In this regard, the terms such as “about” and “substantially” described throughout the specification are used in the meaning of a numerical value or a value close to the numerical value when inherent manufacture and material tolerances are presented in the meaning mentioned, and are used to prevent unscrupulous infringers from unfairly utilizing the disclosure in which accurate or absolute values are mentioned to help understanding of the present invention. The terms “step of doing˜” or “step of˜” used throughout the specification of the present invention do not mean “step for˜.”
In this specification, the term “part” includes a unit realized by hardware, a unit realized by software, and a unit realized using both of them. In addition, one unit may be realized using two or more pieces of hardware, and two or more units may be realized by one piece of hardware.
According to an embodiment of the present disclosure, a ‘module’ or ‘part’ may be implemented as a processor and a memory. The ‘processor’ should be broadly construed to include a general-purpose processor, a central processing unit (CPU), a microprocessor, a digital signal processor (DSP), a controller, a microcontroller, a state machine, and the like. In some circumstances, the ‘processor’ may also refer to an application specific integrated circuit (ASIC), a programmable logic device (PLD), a field programmable gate array (FPGA), and the like. The ‘processor’ may also refer to a combination of processing devices such as, for example, a combination of the DSP and the microprocessor, a combination of a plurality of microprocessors, a combination of one or more microprocessors in conjunction with a DSP core, or any other combination of such configurations. Further, the ‘memory’ should be broadly construed to include any electronic component capable of storing electronic information. The ‘memory’ may also refer to various types of processor-readable media such as a random-access memory (RAM), a read-only memory (ROM), a non-volatile random-access memory (NVRAM), a programmable read-only memory (PROM), an erasable-programmable read-only memory (EPROM), an electrically erasable PROM (EEPROM), a flash memory, a magnetic or optical data storage device, a register, etc. If the processor can read information from the memory and/or write information to the memory, the memory is said to be in electronic communication with the processor. The memory integrated in the processor is in electronic communication with the processor.
In the present disclosure, the ‘system’ may include at least one of a server device and a cloud device, but is not limited thereto. For example, the system may consist of one or more server devices. As another example, the system may consist of one or more cloud devices. As yet another example, the system may be operated by configuring the server device and the cloud device together.
Some of the operations or functions described as being performed by a terminal, an apparatus, or a device in this specification may instead be performed by a server connected to the terminal, the apparatus, or the device. Likewise, some of the operations or functions described as being performed by the server may also be performed by the terminal, the apparatus, or the device connected to the server.
Unless defined otherwise, all terms used herein, including technical or scientific terms, have the same meaning as generally understood by a person having ordinary knowledge in the technical field to which the present invention belongs. Terms such as those generally defined in dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant technology, and shall not be interpreted in an ideal or overly formal sense unless explicitly so defined in the present application.
Hereinafter, with reference to the attached drawings, preferred embodiments of the present invention will be described in more detail. In order to facilitate overall understanding in describing the present invention, the same reference numerals are used for the same components in the drawings, and redundant descriptions of the same components are omitted.
A document search method of the present invention can provide search results to a user through a user-friendly interface. The present invention can collect and list sentences or paragraphs containing phrases that the user is looking for among similar documents. The user can easily find the similar documents desired by the user using the listed sentences or paragraphs.
Hereinafter, the attached drawings illustrate a user interface screen displayed to the user through the document search method of the present invention. Hereinafter, for the convenience of explanation, the document search method of the present invention is expressed as providing the screen illustrated in the drawings, but the present invention is not limited thereto. The fact that the document search method of the present invention provides the screen may mean generating data for providing the screen illustrated in the drawings and transmitting the generated data to a user's device. In addition, in the following descriptions, the ‘user’ may be a person who searches for similar documents similar to a target invention through the document search method of the present invention.
The document search method may provide a screen for receiving information on a target invention from a user. The document search method may receive the information on the target invention in various ways (for example, searching by a sentence, searching by a patent number, searching by an image, or uploading a document that discloses the invention). In case the user selects the search by a sentence, the document search method may display element input fields 111 and 112 for receiving information on the target invention. The user may input information in the form of a phrase into each of the element input fields 111 and 112. The phrases input into the element input fields 111 and 112 may correspond one-to-one to the elements of the target invention. In the following descriptions, a phrase consists of one or more words. For example, the phrase may be in the form of a sentence in which a plurality of words is listed, or may be in the form in which a plurality of words is combined by an operator (for example, and, or, not, etc.) or in which one or more words are listed. The term “element” of the invention may be expressed as a configuration or component of the invention, but it is not limited thereto. The term “element” of the invention may also refer to the technical means or effects of the invention.
The document search method may display a search condition 113 for searching a document. In
If the user completes the input of information on the target invention and/or the setting of the search condition, the user may press a search button 114 at the bottom of the screen. The document search method may search for similar documents similar to the target invention when the user's input of clicking the search button 114 is obtained. The name of the button is not limited thereto.
The similar documents may be those candidate documents determined to be similar to the target invention based on the document similarity between the target invention and the candidate documents. The document similarity may be corrected by a phrase similarity statistical value of the relevant documents. For example, the phrase similarity statistical value of a first candidate document may be calculated using the phrase similarity statistical value for a first element, the phrase similarity statistical value for a second element, . . . , the phrase similarity statistical value for a kth element (k is an integer of 1 or more), and a weight for each element. The phrase similarity statistical value for the kth element may be any one of a highest value, an average value, a lowest value, or an intermediate value of the similarities of the first to nth similar phrases corresponding to the kth element in one candidate document, or may be the value n (n is an integer of 1 or more), which is the number of times the similar phrases appear, but is not limited thereto. The weight for each element may be set by a user or an administrator. Alternatively, the weight may be inversely proportional to the similarity statistical value for each element across the entire set of documents. For example, since a universal element has a high similarity statistical value across the documents, a low weight may be applied to the similarity for that element.
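As a non-limiting illustration, the weighted phrase similarity statistical value described above may be sketched as follows. All function and variable names are hypothetical, the "intermediate value" is assumed here to be the median, and normalizing the inverse weights so that they sum to 1 is likewise an assumption of this sketch rather than a feature of the disclosure.

```python
from statistics import mean, median

# Hypothetical mapping from a statistic name to the function that computes it.
# "median" stands in for the "intermediate value"; "count" is the number of
# similar phrases found for the element.
STATISTICS = {"max": max, "mean": mean, "min": min, "median": median, "count": len}


def element_statistic(similarities, kind="max"):
    """Statistic over the similar-phrase similarities found for one element."""
    if not similarities:
        return 0.0  # assumption: an element with no similar phrase contributes zero
    return float(STATISTICS[kind](similarities))


def inverse_weights(corpus_statistics):
    """Weights inversely proportional to each element's corpus-wide statistic."""
    raw = [1.0 / s if s > 0 else 0.0 for s in corpus_statistics]
    total = sum(raw)
    # Normalizing the weights to sum to 1 is an assumption of this sketch.
    return [r / total for r in raw] if total else raw


def document_statistic(per_element_similarities, weights, kind="max"):
    """Weighted sum of the per-element statistics for one candidate document."""
    stats = [element_statistic(s, kind) for s in per_element_similarities]
    return sum(w * s for w, s in zip(weights, stats))


# Element 1 is more "universal" (higher corpus-wide statistic) than element 2,
# so it receives the lower weight.
weights = inverse_weights([0.8, 0.4])
score = document_statistic([[0.9, 0.5], [0.6]], weights)
```

In this example the universal element receives one third of the total weight and the rarer element two thirds, so a strong match on the rarer element influences the result more.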
The document search method may provide a screen for receiving information on a target invention from a user. The document search method may use the method described with reference to
Although the name of the button is not limited thereto, if the user clicks the load button, the document search method may provide a screen 130. The screen 130 may provide a patent search area 131, a content area 132 showing contents of the patent document searched by the user, and a claim area 133 showing claims of the patent document searched by the user.
The document search method may recognize claims selected by the user as the target invention. The document search method may divide the claims selected by the user into a plurality of phrases. The plurality of phrases constituting the claims may correspond to elements of the target invention, respectively. That is, if information on the target invention input by the user is information on a patent document, the elements of the target invention may be expressed by one of the claims of the patent document or may be corrected from them by the user. This content is described in detail with reference to
If the user completes the input of information on the patent document, the user may press a search button. The document search method may search for similar documents similar to the target invention when the user's input of clicking the search button is obtained.
As described with reference to
The document search method may provide a search result screen 200 displaying search results for similar documents similar to a target invention. The search result screen 200 may include a document area 210, a matching area 220, and a phrase area 230. However, the present invention may not include some of the areas 210 to 230 shown in the search result screen 200 and may additionally include unshown areas, without being limited thereto.
The document search method may display in the document area 210 a similar document list, the similarity between the target invention and the similar documents, and the similarity between elements of the target invention and phrases included in the similar documents. For each element, the similarity may indicate a highest value, an average value, a lowest value, or an intermediate value among the similarities between the element and each of the phrases, or may mean the number of times the similar phrases appear.
The document search method may search a database for similar documents disclosing inventions similar to the target invention. The document search method may designate as the similar documents those candidate documents determined to be similar to the target invention, based on the document similarity between the target invention and the candidate documents stored in the database. The similar documents may be documents whose document similarity with the target invention is greater than or equal to a threshold document similarity. For example, the threshold document similarity may be a value predesignated by a user or an administrator. As another example, the threshold document similarity may be an average value of the similarity between the candidate documents and the target invention. In this specification, the similarity between the target invention and the candidate documents may mean the similarity between a target document and the candidate documents. The target document may be in the form of a document generated based on phrases input by the user or patent documents input by the user.
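As a non-limiting illustration, the selection of similar documents by a threshold document similarity may be sketched as follows. The function name and the dictionary layout are hypothetical; when no threshold is predesignated, the sketch falls back to the average similarity of the candidate documents, which is one of the options described above.

```python
from statistics import mean


def select_similar_documents(document_similarities, threshold=None):
    """Return (document_id, similarity) pairs at or above the threshold.

    document_similarities: hypothetical dict mapping a candidate document
    identifier to its document similarity with the target invention.
    """
    if not document_similarities:
        return []
    if threshold is None:
        # No predesignated value: use the average over all candidates.
        threshold = mean(document_similarities.values())
    selected = [(d, s) for d, s in document_similarities.items() if s >= threshold]
    # Listing the most similar documents first is an assumption of this sketch.
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


candidates = {"A": 0.9, "B": 0.4, "C": 0.8}
by_average = select_similar_documents(candidates)        # threshold is about 0.7
by_fixed = select_similar_documents(candidates, 0.85)    # predesignated threshold
```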
The document similarity between the target invention and the candidate documents may be calculated based on at least one of an average value of the phrase similarity of at least one phrase included in the candidate documents, a maximum value of the phrase similarity, a maximum value of the frequency of appearance of the at least one phrase, an average value of the frequency of appearance, and a minimum value of the frequency of appearance.
The document search method may generate a similar document list regarding the similar documents. The similar document list may include information on the similar documents. For example, the information on the similar documents may include a document number (application number, publication number, registration number, etc.), a document date (application date, publication date, registration date, etc.), and information on an applicant.
The document search method may generate document similarity information between the similar documents and the target invention. The document search method may display on the search result screen 200 the document similarity information between the similar documents and the target invention adjacent to the similar document list.
Further, the document search method may generate phrase similarity information between phrases of the similar documents and elements of the target invention. The document search method may display on the search result screen 200, adjacent to the similar document list, information on the similarity between the similar phrases included in the similar documents and the elements of the target invention. The phrase similarity information may be corrected based on a document similarity value between the target invention and the candidate documents. A similar phrase may be a phrase, among the phrases contained in the similar documents, whose phrase similarity with an element of the target invention is greater than or equal to a threshold phrase similarity. In the following descriptions, the kth similar phrases may be phrases whose phrase similarity with the kth element is greater than or equal to the threshold phrase similarity among phrases of the similar documents. The nth-k similar phrase may be a phrase whose phrase similarity with the kth element is greater than or equal to the threshold phrase similarity among phrases of the nth similar document. n and k are natural numbers. The threshold phrase similarity may be a value predesignated by a user or an administrator. Alternatively, the threshold phrase similarity may be an average value of the similarity between phrases included in the similar documents and elements of the target invention.
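As a non-limiting illustration, the nth-k similar phrases may be collected as sketched below. The disclosure does not specify how the phrase similarity is computed; the token-overlap (Jaccard) measure used here is only a stand-in assumption, and all names are hypothetical.

```python
def token_overlap(a, b):
    """Stand-in similarity: Jaccard overlap of lowercase word sets (an assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def similar_phrases(elements, documents, threshold, similarity=token_overlap):
    """Map (n, k) to the similar phrases of the nth document for the kth element.

    elements:  list of element phrases of the target invention.
    documents: list of similar documents, each given as a list of phrases.
    Both n and k are 1-indexed to follow the wording of the description.
    """
    result = {}
    for n, phrases in enumerate(documents, start=1):
        for k, element in enumerate(elements, start=1):
            scored = [(p, similarity(element, p)) for p in phrases]
            hits = [(p, s) for p, s in scored if s >= threshold]
            hits.sort(key=lambda pair: pair[1], reverse=True)
            result[(n, k)] = hits
    return result


found = similar_phrases(
    elements=["wireless charging coil"],
    documents=[["a wireless charging coil", "battery housing"], ["charging coil"]],
    threshold=0.7,
)
```

Here the first document yields one 1st-1 similar phrase with a similarity of 0.75, while the only phrase of the second document scores 2/3 and falls below the threshold.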
The document search method may generate kth phrase similarity information and/or nth-k phrase similarity information. The kth phrase similarity information and the nth-k phrase similarity information mean similarity information on the kth similar phrase and the nth-k similar phrase, respectively. For example, the nth-k phrase similarity information may be a highest value among similarities of the nth-k similar phrases, the number of the nth-k phrases, etc. For example, the ‘all’ shown in
The document search method may comprise a step of searching for the similar documents, based on the 1st phrase to the mth phrase in the form of sentences listing a plurality of words input by the user (wherein m is an integer of 1 or more) and the nth phrase to the kth phrase in the form of a word or a plurality of words combined by an operator (wherein n and k are integers of 1 or more). In this case, the searched similar documents may be documents that include the nth phrase to the kth phrase while including the 1st similar phrase to the mth similar phrase having the meaning similar to the 1st phrase to the mth phrase, respectively. For example, among the documents including the 1st similar phrase to the mth similar phrase having the meanings similar to each of the 1st phrase to the mth phrase input by the user, the documents that include the nth phrase to the kth phrase as they are may be provided.
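As a non-limiting illustration, a search that combines sentence-form phrases (matched by meaning) with operator-form phrases (matched as they are) may be sketched as follows. The nested-tuple representation of operator phrases, the token-overlap similarity, and all names are assumptions of this sketch and are not specified by the disclosure.

```python
def token_overlap(a, b):
    """Stand-in for semantic similarity: Jaccard overlap of lowercase word sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def keyword_match(expr, text):
    """Evaluate an operator phrase against text.

    expr is either a literal string or a tuple (operator, *operands), where the
    operator is "and", "or", or "not" (hypothetical representation).
    """
    if isinstance(expr, str):
        return expr.lower() in text.lower()
    op, *operands = expr
    if op == "and":
        return all(keyword_match(o, text) for o in operands)
    if op == "or":
        return any(keyword_match(o, text) for o in operands)
    if op == "not":
        return not keyword_match(operands[0], text)
    raise ValueError(f"unknown operator: {op}")


def hybrid_search(documents, sentence_phrases, keyword_exprs, threshold=0.5):
    """Return ids of documents satisfying both kinds of phrases."""
    results = []
    for doc_id, sentences in documents.items():
        full_text = " ".join(sentences)
        # Operator-form phrases must be satisfied literally.
        if not all(keyword_match(e, full_text) for e in keyword_exprs):
            continue
        # Each sentence-form phrase needs at least one sufficiently similar sentence.
        if all(
            max((token_overlap(p, s) for s in sentences), default=0.0) >= threshold
            for p in sentence_phrases
        ):
            results.append(doc_id)
    return results


docs = {
    "D1": ["The coil charges the battery wirelessly", "A lithium cell is used"],
    "D2": ["The coil charges the battery wirelessly", "A nickel cell is used"],
}
hits = hybrid_search(docs, ["coil charges the battery"], [("and", "cell", ("not", "lithium"))])
```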
The document search method may display in a phrase area 230 information on similar phrases included in similar documents selected by the user. The user may select one of the similar documents displayed in the document area 210. An initial screen of the search result may display information on similar phrases included in the first similar document. The first similar document means a similar document first displayed in the document area 210.
The document search method may display similar selection items adjacent to the similar phrases displayed in the phrase area 230 so that the user can finally select whether the similar phrases are similar or not. The similar selection items may consist of similar, related, and dissimilar. The user may select one of the similar selection items to select final similarity of cards. This is described in detail with reference to
The user may select one of the similar phrases displayed in the phrase area 230. The document search method may display a portion that discloses the selected similar phrases among the similar documents in the matching area 220. The document search method may display in the matching area 220 not only the similar phrases but also the contents before and after the portion including the similar phrases in the similar documents. If the user clicks the selected similar phrase once more, or mouses over it, the document search method may display in the matching area 220 the next portion including the similar phrases in the similar documents. The document search method may search whether there is a portion including the similar phrases in the entire similar documents (bibliographic information, detailed description, claims, drawings).
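As a non-limiting illustration, displaying a similar phrase together with the content before and after it, and moving to the next portion on a repeated click, may be sketched as follows. The class and method names are hypothetical, and wrapping around to the first occurrence after the last one is an assumption of this sketch.

```python
def find_occurrences(text, phrase):
    """Return the start index of every non-overlapping occurrence of phrase."""
    positions, start = [], 0
    if not phrase:
        return positions
    while True:
        index = text.find(phrase, start)
        if index == -1:
            return positions
        positions.append(index)
        start = index + len(phrase)


class MatchCursor:
    """Tracks which occurrence of a similar phrase is shown in the matching area."""

    def __init__(self, text, phrase, context=20):
        self.text, self.phrase, self.context = text, phrase, context
        self.positions = find_occurrences(text, phrase)
        self.index = 0

    def current(self):
        """Return the phrase with `context` characters before and after it."""
        if not self.positions:
            return None
        pos = self.positions[self.index]
        begin = max(0, pos - self.context)
        end = pos + len(self.phrase) + self.context
        return self.text[begin:end]

    def advance(self):
        """Move to the next occurrence (for example, on a repeated click)."""
        if self.positions:
            self.index = (self.index + 1) % len(self.positions)
        return self.current()


document_text = "The coil is flat. A shield covers the coil. The coil is copper."
cursor = MatchCursor(document_text, "coil", context=4)
first = cursor.current()
second = cursor.advance()
```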
The document area 210, the matching area 220, and the phrase area 230 may be listed in a first direction. The sub-groups included in each of the document area 210 and the phrase area 230 may be listed in a second direction. The sub-groups in the document area 210 may be information on each of the similar documents. The sub-groups in the phrase area 230 may be similar phrases divided by each of similar elements. The first direction and the second direction may be perpendicular to each other. For example, if the first direction is a horizontal direction, the second direction may be a vertical direction.
A card mode screen 300 such as
Each of the areas may display one of a similar document list, similar document information, and first to fourth type layout screens. The sub-areas may display the content displayed in the areas more concretely. For example, each of the sub-areas in the area displaying similar document information may display one of bibliographic information, detailed description, claims, and drawings of similar documents. Specifically, the sub-areas displaying bibliographic information may display at least one of legal status of the relevant document (e.g., publication, registration, etc.), a document type, application number, application date, publication number, publication date, international publication number, international publication date, priority information, first priority date, an applicant, an inventor, a current right holder, abstract, and technical classification (e.g., IPC, CPC, etc.). The sub-areas displaying the claims may display only independent claims or both the independent claims and dependent claims according to the user's selection.
The first type layout screen 301 is used to display a card corresponding to one element for the selected similar documents. When the area illustrated in
The card mode screen 300 may consist of various combinations of similar document information and the first type layout screen 301. For example, a first area may display information on the selected similar documents, and a second area may display the first type layout screen for the selected similar documents. In this case, each sub-area included in the first area may display one of bibliographic information, detailed description, claims, and drawings for the selected similar documents, and each sub-area included in the second area may display cards corresponding to different elements for the selected similar documents. As another example, each of a plurality of areas may display the first type layout screen 301 for different similar documents.
When searching for the similar documents, one or more phrases corresponding to each of one or more elements may be input. The first type layout screen 301 may display first cards 164a, 164b and 164c including one or more first phrases corresponding to one element or first similar phrases similar to the one or more first phrases, among the selected similar documents. The first cards 164a, 164b and 164c may be listed in a second direction perpendicular to a first direction. A plurality of cards 164a, 164b and 164c may be listed according to similarity with the first phrases.
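As a non-limiting illustration, listing the first cards of a selected similar document according to their similarity with the first phrases may be sketched as follows. The `Card` fields and the function name are hypothetical, and descending order is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    """A part of a similar document that contains a phrase or a similar phrase."""
    document_id: str
    element_index: int   # which element of the target invention the card relates to
    text: str
    similarity: float    # similarity with the phrase input for that element


def cards_for_element(cards, document_id, element_index):
    """Cards of one document for one element, most similar first."""
    selected = [
        c for c in cards
        if c.document_id == document_id and c.element_index == element_index
    ]
    return sorted(selected, key=lambda c: c.similarity, reverse=True)


all_cards = [
    Card("D1", 1, "a coil arranged on the substrate", 0.62),
    Card("D1", 1, "a wireless charging coil", 0.91),
    Card("D1", 2, "a magnetic shield", 0.80),
    Card("D2", 1, "an induction coil", 0.70),
]
first_cards = cards_for_element(all_cards, "D1", 1)
```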
Herein, the card means a part of similar documents including at least one phrase input by a user or a similar phrase having a meaning similar to the at least one phrase. Referring to
Referring to
The document information area 163 is an area for displaying information on the selected similar documents. The document search method may display brief information on the similar documents (for example, at least one of title of the invention, application number, publication number, and registration number) through the document information area 163.
In this case, according to the user's input, the document search method may set a scope from which the card is extracted. For example, the scope from which the card is extracted may be any one or more of the title, abstract, independent claim, dependent claim, entire claim, and description of the invention, or may be a combination thereof. The scope from which the card is extracted may be set differently for each phrase. The set scope may be displayed together with the selected phrases in the selection phrase area 161.
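As a non-limiting illustration, restricting the scope from which cards are extracted, with a different scope for each phrase, may be sketched as follows. The scope keys, the section names of the document dictionary, and the function name are all hypothetical.

```python
# Hypothetical mapping from a user-selectable scope to document sections.
SCOPE_SECTIONS = {
    "title": ["title"],
    "abstract": ["abstract"],
    "independent_claim": ["independent_claims"],
    "dependent_claim": ["dependent_claims"],
    "entire_claim": ["independent_claims", "dependent_claims"],
    "description": ["description"],
}


def passages_in_scope(document, scopes):
    """Return (section, passage) pairs from the sections covered by the scopes.

    document: hypothetical dict mapping a section name to a list of passages.
    An unknown scope raises KeyError.
    """
    seen, passages = set(), []
    for scope in scopes:
        for section in SCOPE_SECTIONS[scope]:
            if section in seen:
                continue  # avoid duplicates when scopes overlap
            seen.add(section)
            passages.extend((section, text) for text in document.get(section, []))
    return passages


document = {
    "title": ["Wireless charger"],
    "abstract": ["A charger with a shielded coil."],
    "independent_claims": ["1. A charger comprising a coil."],
    "dependent_claims": ["2. The charger of claim 1, further comprising a shield."],
    "description": ["The coil may be copper."],
}
# A different scope may be set for each phrase.
scope_by_phrase = {"charging coil": ["entire_claim"], "shield": ["abstract", "description"]}
candidates = {p: passages_in_scope(document, s) for p, s in scope_by_phrase.items()}
```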
The second type layout screen 302 is used to display cards corresponding to a plurality of elements for selected similar documents. The card mode screen 300 may consist of various combinations of similar document information, the first type layout screen 301, and the second type layout screen 302. For example, a first area may display information on the selected similar documents, a second area may display the first type layout screen 301 for the selected similar documents, and a third area may display the second type layout screen 302 for the selected similar documents. In this case, the second area may display cards for the remaining elements that are not displayed in the first area. As another example, when only the second type layout screen 302 appears on the card mode screen 300 without the first type layout screen 301, the second type layout screen 302 may display cards for all the elements.
The second type layout screen 302 may display element areas 173, 174 and 175 corresponding to each of the plurality of elements. The element areas 173, 174 and 175 may be listed in a second direction. Each of the element areas 173, 174 and 175 may display one or more cards of the groups including one or more phrases corresponding to the relevant elements and similar phrases similar to the one or more phrases.
The element area 173 includes a selection phrase area 173a, a card 173b, and a page cursor 173c. The element area 173 may be an area for indicating the cards included in a first card group corresponding to a first element. The element area 173 may display the card 173b corresponding to the first element. The selection phrase area 173a and the card 173b are substantially the same as the selection phrase area 161 and the card 164a described with reference to
The third type layout screen 303 is used to display a card corresponding to one element for a plurality of similar documents. In case the area shown in
Similar to what was described with reference to
The third type layout screen 303 may display document areas 183, 184 and 185 corresponding to a plurality of documents, respectively. The document areas 183, 184 and 185 may be listed in a second direction. Each of the document areas 183, 184 and 185 may display one card including one or more phrases corresponding to the relevant elements and similar phrases similar to the one or more phrases. The documents indicated by each of the document areas 183, 184 and 185 may be the same. The document areas 183, 184 and 185 may be listed according to similarity between each card included in the document areas 183, 184 and 185 and the first phrase.
The document area 183 may display a document information area 183a and a card 183b. Since the document areas 183, 184 and 185 have substantially the same configuration, the description of the remaining document areas 184 and 185 is omitted.
The fourth type layout screen 304 is used to display cards corresponding to a plurality of elements for a plurality of similar documents. The fourth type layout screen 304 may display element areas 192, 193 and 194 corresponding to the plurality of elements, respectively. The element areas 192, 193 and 194 may be listed in a second direction. Each of the element areas 192, 193 and 194 may display one or more cards of card groups including one or more phrases corresponding to the relevant elements for the plurality of similar documents and similar phrases similar to the one or more phrases.
The element area 192 includes a selection phrase area 192a, a card 192b, and a page cursor 192c. In this case, the card 192b may also display information on the similar document that includes the card. The element area 192 may be an area for displaying cards included in a first card group corresponding to a first element. The element area 192 may display the card 192b corresponding to the first element. The cards displayed in the element area 192 are not limited to cards included in one similar document and may be cards included in a plurality of similar documents. The fourth type layout screen 304 may display another card included in the first card group instead of the card 192b depending on a user's operation (e.g., selecting the page cursor 192c, touching the screen, scrolling, etc.). That is, depending on the user's operation, the card displayed in the element area 192 may be replaced with another card. The number of cards displayed in the element area 192 simultaneously is not limited to one, and the element area 192 may display one or more cards simultaneously. Since the element areas 192, 193 and 194 have substantially the same configuration, the description of the remaining element areas 193 and 194 is omitted.
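As a non-limiting illustration, replacing the card or cards shown in an element area in response to the page cursor may be sketched as follows. The class name, the `page_size` parameter, and the choice to stop at the first and last pages rather than wrap around are assumptions of this sketch.

```python
class CardPager:
    """Pages through the cards of one card group inside an element area."""

    def __init__(self, cards, page_size=1):
        if page_size < 1:
            raise ValueError("page_size must be at least 1")
        self.cards = list(cards)
        self.page_size = page_size  # number of cards displayed simultaneously
        self.page = 0

    @property
    def page_count(self):
        # Ceiling division; an empty group still has one (empty) page.
        return max(1, -(-len(self.cards) // self.page_size))

    def visible(self):
        """Cards currently displayed in the element area."""
        start = self.page * self.page_size
        return self.cards[start:start + self.page_size]

    def next_page(self):
        """Show the following cards, staying on the last page if already there."""
        self.page = min(self.page + 1, self.page_count - 1)
        return self.visible()

    def previous_page(self):
        """Show the preceding cards, staying on the first page if already there."""
        self.page = max(self.page - 1, 0)
        return self.visible()


pager = CardPager(["card from D1", "card from D2", "card from D3"], page_size=2)
shown_first = pager.visible()
shown_next = pager.next_page()
```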
Similar to what was described with reference to
For example, if a first area represents the fourth type layout screen 304, a second area may represent the first type layout screen 301. In this case, the first area may display a card corresponding to one element for the entire similar documents. If the first area includes a plurality of sub-areas, each of the sub-areas may display cards corresponding to different elements. The second area may display a card corresponding to one element for the selected similar document among the entire similar documents. As another example, if the first area represents the fourth type layout screen 304, the second area may display the similar document list. In addition, the first area may represent the fourth type layout screen 304, and the remaining areas may be configured as combinations described with reference to
In the areas and sub-areas described with reference to
Referring to
In this case, the similar phrase areas for each document 420 and 430 may be displayed so as to be distinguished by each corresponding element (for example, element 1, element 2, etc. included in the element area 410). Specifically, each of the similar phrase areas for each document 420 and 430 may display at least one card corresponding to the element 1 and at least one card corresponding to the element 2 so as to be distinguished from each other. That is, the similar phrase areas for each document 420 and 430 corresponding to the element 1 may be listed in the same row as each other, and may be listed in a different row from the similar phrase areas for each document 420 and 430 corresponding to the element 2. In addition, first cards included in the similar phrase area 420 may be listed in the same column as each other, and may be listed in a different column from second cards included in the similar phrase area 430.
The element area 410 and the similar phrase areas for each document 420 and 430 may be listed in a first direction. Sub-groups included in each of the element area 410 and the similar phrase areas for each document 420 and 430 may be listed in a second direction. The first direction and the second direction may be perpendicular to each other. For example, if the first direction is a horizontal direction, the second direction may be a vertical direction.
The document search method may display elements of the target invention on the element area 410. The elements of the target invention may be listed on the element area 410 in the second direction.
According to an embodiment of the present invention, a first similar document and a second similar document included in the claim chart screen 400 may be similar documents stored by the user on the search result screen 200 as shown in
According to an embodiment of the present invention, cards displayed on the similar phrase areas for each document 420 and 430 may be similar phrases selected as being similar by the user among the similar selection items on the search result screen 200 as shown in
The similar phrases displayed on the similar phrase areas for each document 420 and 430 may be divided by each similar element and listed in the second direction. That is, the document search method may label at least one similar phrase on the search result screen 200 according to the user's instructions, and may display at least one labeled card stored according to selection of the claim chart. Herein, the labeling may be made by allowing the user to click or select a selection item (e.g., a button, etc.) provided along with each of the cards included in the search result screen 200.
Although
Additionally, according to an embodiment of the present invention, the document search method may generate and provide contents of the claim chart screen 400 of
According to an embodiment of the present invention, in a document providing method for providing similar documents that disclose inventions similar to a target invention, the method is performed by at least one processor and comprises the steps of: searching for the similar documents based on at least one phrase input by a user; and listing first cards including at least one of a first phrase or a first similar phrase similar to the first phrase from one document of the searched similar documents, wherein the first phrase is a phrase corresponding to a first element among the at least one phrase input by the user.
According to an embodiment of the present invention, a claim chart providing method performed by at least one processor may comprise the steps of: searching for similar documents based on at least one phrase input by a user; and displaying first cards including at least one of a first phrase or a first similar phrase similar to the first phrase from the searched similar documents, wherein the first phrase is a phrase corresponding to a first element among the at least one phrase input by the user. The method may further comprise a step of displaying similarities and/or differences between the first cards and the first element.
The document search method of the present invention may be a method that is performed by at least one processor, and searches a database for similar documents disclosing similar inventions similar to a target invention. The document search method of the present invention may comprise the steps of: generating a similar document list regarding similar documents determined to be similar to the target invention among candidate documents, based on document similarity between the target invention and the candidate documents, and generating first phrase similarity information on first phrases determined to be similar to a first element among the phrases, based on phrase similarity between the first element among the elements of the target invention and the phrases included in the similar documents. In this case, each of the first phrases may consist of one or more words.
The step of generating the similar document list may include adding to the similar document list similar documents among the candidate documents whose document similarity with the target invention is greater than or equal to a threshold document similarity.
The step of generating the first phrase similarity information may include generating the first phrase similarity information on the first phrases among the phrases whose phrase similarity with the first element is greater than or equal to a threshold phrase similarity.
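The two thresholding steps above can be sketched as follows. This is a minimal illustration only: the function name `select_above_threshold`, the score dictionary, and the example threshold value are hypothetical and do not appear in the present disclosure.

```python
from typing import Dict, List


def select_above_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Return the keys whose score is >= threshold, highest score first.

    The same helper can serve both steps: keys may be candidate document
    identifiers (document similarity) or phrases (phrase similarity).
    """
    kept = [(key, score) for key, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [key for key, _ in kept]


# Hypothetical document similarities between the target invention and candidates.
doc_scores = {"doc_a": 0.91, "doc_b": 0.42, "doc_c": 0.75}
similar_documents = select_above_threshold(doc_scores, threshold=0.75)
```

A score exactly equal to the threshold is kept, matching the "greater than or equal to" wording above.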
The document search method of the present invention may further comprise a step of acquiring information on the target invention. In case information on the target invention is input in the form of a plurality of phrases, the plurality of input phrases may correspond one-to-one with the elements of the target invention.
The document search method of the present invention may further comprise a step of acquiring information on the target invention. In case information on the target invention is information on a patent document describing the target invention, the elements of the target invention may be expressed by one of the claims of the patent document.
The elements of the target invention and the phrases constituting one of the claims may correspond one-to-one.
The document search method of the present invention may further comprise a first listing step of listing a similar document list in a document area, and listing in a phrase area a 1st-1 phrase belonging to a first similar document among the first phrases. The first similar document may be one of the similar documents belonging to the similar document list.
The first listing step may include listing the 1st-1 phrase and the 1st-2 phrase in the phrase area, wherein the 1st-2 phrase is a phrase belonging to the first similar document among the second phrases determined to be similar to the second element among the elements of the target invention.
The document search method of the present invention may further comprise a second listing step of listing in the phrase area the 2nd-1 phrase belonging to the second similar document among the first phrases instead of the 1st-1 phrase when a user's input of selecting the second similar document from the similar document list listed in the document area is obtained.
The document search method of the present invention may further comprise a third listing step of listing in a matching area a portion including the 1st-1 phrase in the first similar document when a user's input of selecting the 1st-1 phrase listed in the first phrase area is obtained.
The third listing step may list similar documents included in the similar document list on the document area in a first direction, and the document area, the matching area, and the phrase area may be listed in a second direction perpendicular to the first direction.
The document search method of the present invention may further comprise a fourth listing step of rearranging similar documents so that the similar documents listed on the document area satisfy a listing criterion when a user's input for selecting the listing criterion for the similar documents is obtained.
The document search method of the present invention may further comprise a fifth listing step of filtering similar documents so that the similar documents listed on the document area satisfy a scope criterion when a user's input for selecting the scope criterion for the similar documents is obtained.
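The fourth and fifth listing steps may be pictured, as a rough sketch, as a sort followed by a filter over the listed documents. The record fields (`doc_id`, `similarity`, `country`) and the choice of country as the scope criterion are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class SimilarDocument:
    # Hypothetical record shape; the disclosure does not fix these fields.
    doc_id: str
    similarity: float
    country: str


def rearrange(documents: Iterable[SimilarDocument], criterion: str,
              descending: bool = True) -> List[SimilarDocument]:
    """Reorder documents by the attribute named by the listing criterion."""
    return sorted(documents, key=lambda d: getattr(d, criterion), reverse=descending)


def filter_by_scope(documents: Iterable[SimilarDocument],
                    countries: Set[str]) -> List[SimilarDocument]:
    """Keep only documents that satisfy the scope criterion, preserving order."""
    return [d for d in documents if d.country in countries]


docs = [
    SimilarDocument("KR-1", 0.62, "KR"),
    SimilarDocument("US-1", 0.88, "US"),
    SimilarDocument("KR-2", 0.75, "KR"),
]
ordered = rearrange(docs, "similarity")
korean_only = filter_by_scope(ordered, {"KR"})
```

Because the filter preserves order, applying it after the sort keeps the listing criterion in effect for the remaining documents.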
The first listing step may list similarity between the first element and the 1st-1 phrase on the phrase area so that it is adjacent to the 1st-1 phrase listed on the phrase area.
The first listing step may list similarity between each of the elements of the target invention and the first similar document on the document area so that it is adjacent to the first similar document listed on the document area.
The document search method of the present invention may further comprise a sixth listing step of listing on the final area one or more of final first phrases selected by the user among the first phrases.
The step of generating the similar document list may calculate each of the document similarities between the target invention and the candidate documents based on at least one of an average value of the phrase similarity of at least one phrase included in each of the candidate documents, a maximum value of the phrase similarity of the at least one phrase, a maximum value of the frequency of the at least one phrase, an average value of the frequency, and a minimum value of the frequency.
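As a hedged illustration of the aggregation described above, the sketch below derives a document-level score from per-phrase similarities using either the average or the maximum, and counts how many phrases reach a threshold as one possible reading of "frequency". The function names, the `mode` parameter, and that reading of frequency are assumptions.

```python
from statistics import mean
from typing import List


def document_similarity(phrase_sims: List[float], mode: str = "mean") -> float:
    """Aggregate per-phrase similarities of one candidate document."""
    if not phrase_sims:
        return 0.0  # a document with no scored phrase is treated as dissimilar
    if mode == "mean":
        return mean(phrase_sims)
    if mode == "max":
        return max(phrase_sims)
    raise ValueError(f"unknown mode: {mode}")


def phrase_frequency(phrase_sims: List[float], threshold: float) -> int:
    """Count phrases whose similarity reaches the threshold (assumed meaning of frequency)."""
    return sum(1 for s in phrase_sims if s >= threshold)
```

For example, `document_similarity([0.25, 0.75])` averages the two phrase similarities, while `mode="max"` lets a single strongly matching phrase dominate the document score.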
The document search method of the present invention may be a method that is performed by at least one processor and searches a database for similar documents describing similar inventions similar to a target invention. The document search method of the present invention may comprise the steps of: generating information on similar documents determined to be similar to the target invention among the candidate documents based on document similarity between the target invention and the candidate documents; and generating information on a table in which similar document blocks corresponding to the similar documents respectively are listed in a first direction, and final similar phrases determined to be similar to the elements of the target invention among similar phrases included in the similar documents corresponding to the similar document blocks respectively are listed in a second direction. Each of the phrases may consist of one or more words.
The document search method of the present invention may further comprise a step of listing the similar documents in a document area and listing the similar phrases in a phrase area on a search result screen.
According to the document search method of the present invention, the step of generating information on the table may further include the steps of: obtaining information on final similar phrases selected as similar by a user among the similar phrases listed in the phrase area on the search result screen, listing similar document blocks on a global claim chart screen in a first direction, and listing the final similar phrases by dividing them into each of similar elements in a second direction.
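The table described above, with similar document blocks along one direction and the final similar phrases divided by element along the other, can be sketched as a simple grid. The tuple layout of `selections` and the function name `build_claim_chart` are hypothetical; the sketch assumes the first direction is horizontal.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def build_claim_chart(elements: List[str],
                      selections: List[Tuple[str, str, str]]) -> List[List[str]]:
    """Build a grid: one column per similar document, one row per element.

    `selections` holds (document_id, element, phrase) triples for the phrases
    the user marked as similar on the search result screen.
    """
    documents: List[str] = []  # first-seen order of document blocks
    cells: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for doc_id, element, phrase in selections:
        if doc_id not in documents:
            documents.append(doc_id)
        cells[(element, doc_id)].append(phrase)

    header = ["element"] + documents
    rows = [[el] + ["; ".join(cells[(el, doc)]) for doc in documents]
            for el in elements]
    return [header] + rows


chart = build_claim_chart(
    ["element 1", "element 2"],
    [("D1", "element 1", "phrase a"),
     ("D2", "element 1", "phrase b"),
     ("D1", "element 2", "phrase c")],
)
```

A cell stays empty when the user selected no phrase of that document for that element, which makes gaps in coverage visible in the chart.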
The document search system 2000 illustrated in
The communication unit 2100 may include one or more components that allow the document search system 2000 to communicate with an external electronic device. The communication unit 2100 may include a short-range wireless communication unit (not shown), a mobile communication unit (not shown), or a broadcast receiving unit (not shown). The short-range wireless communication unit may include a Bluetooth communication unit, a Bluetooth low energy (BLE) communication unit, a near field communication unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, infrared Data Association) communication unit, a WFD (Wi-Fi Direct) communication unit, a UWB (Ultra Wideband) communication unit, an Ant+ communication unit, etc., but is not limited thereto. The mobile communication unit transmits and receives a wireless signal with at least one of a base station, an external terminal, and a server on a mobile communication network. Herein, the wireless signal may include various types of data according to transmission and reception of a voice call signal, a video call signal, or text/multimedia message. The broadcast receiving unit receives a broadcast signal and/or broadcast-related information from the outside through a broadcast channel. The broadcast channel may include a satellite channel or a terrestrial channel. According to an embodiment, the communication unit 2100 may not include the broadcast receiving unit. The document search system 2000 may also receive order logs for existing items, attribute information for existing items, and attribute information for new items from an external device through the communication unit 2100.
The memory 2200 may store a program for processing and controlling the processor 2300, and may also store data input to or output from the document search system 2000. In addition, the memory 2200 may store information on candidate documents searched by the document search system 2000.
The memory 2200 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, an SD or XD memory, etc.), a RAM (Random-Access Memory), a SRAM (Static Random-Access Memory), a ROM (Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a PROM (Programmable Read-Only Memory), a magnetic memory, a magnetic disk, and an optical disk.
The processor 2300 may typically control overall operation of the document search system 2000. The processor 2300 may perform the document search method described with reference to
The above-described contents are concrete embodiments for carrying out the present invention. The present invention will include not only the above-described embodiments, but also embodiments that may be simply designed and changed or be easily changed. In addition, the present invention will also include technologies that can be easily modified and implemented using the embodiments. Therefore, the scope of the present invention should not be limited to the above-described embodiments, but should be determined by the claims described below as well as the equivalents to the claims of the present invention.
The present invention may generate a document embedding that represents each of a target invention and patent documents by considering the structure of the target invention and the patent documents which are subjected to comparison, and may quickly obtain a similar document list based on the document embedding. The present invention may effectively compare similarity between the target invention and the patent documents by comparing sentence embeddings corresponding to the detailed contents of the target invention and the similar documents.
Referring to
Also, in the sentence embedding generation 1114, the document search method is configured to extract sentences and drawings to be used for the comparison from each of the target invention and similar patent documents, and convert the sentences and drawings into data. In the similarity determination 1115, the document search method is configured to determine a secondary similarity, which is a sentence-level similarity, based on the sentence embeddings of the target invention and the patent document. Herein, the primary similarity is used to determine whether or not a patent document is a similar document, and the secondary similarity is used as the similarity provided to a user.
That is, the search system based on the document comparison according to the embodiment of the present invention uses a document model and/or a sentence model, and can search for similar patent documents by documenting the input words and/or sentences. Furthermore, the search result according to the present invention can be utilized to list the similar documents for each similarity by comparing the sentences of the searched patent documents with the input sentences (including words).
Referring to
The sentence-document hierarchical transformer block 1121 receives primary embeddings 1120a including sentence embedding and/or drawing embedding as tokens, and generates embeddings for each comparison element using transformers of a hierarchical structure. Specifically, the sentence-document hierarchical transformer block 1121 applies a sentence transformer and/or an image feature extractor for each comparison element to the input tokens, and applies a document transformer to the results to generate a secondary embedding.
In other words, the sentence-document hierarchical transformer block 1121 tracks the relationship between the tokens included in the primary embeddings for each comparison element, and generates individual embeddings converted to express context and meaning. Then, the sentence-document hierarchical transformer block 1121 combines the individual embeddings, and generates an integrated embedding for the entire new document through an encoder and a decoder for the combined embeddings.
Referring to
In this case, although not shown in
In this case, the block for generating the primary embeddings may understand all languages as a single symbol by applying a byte-level tokenization technology of the patent domain. Accordingly, it is possible to grasp the same meaning without distinction of languages. In addition, in order to support data embedding of non-standard descriptions, the block for generating the primary embeddings may generate a knowledge graph based on an LLM (Large Language Model) and utilize a CAS (Chemical Abstracts Service) API (Application Programming Interface). Accordingly, it is possible to understand chemical formulas or DNA (deoxyribonucleic acid) sequences.
The document pooling block 1122 combines secondary embeddings and generates a document embedding 1120b by reducing the dimension. Through this, the document embedding 1120b representing the document can be obtained so as to compare the corresponding document.
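One simple way to picture the pooling operation is mean pooling, which collapses a list of secondary embeddings into a single vector of the same width. The disclosure does not state which pooling operator the document pooling block 1122 uses, so mean pooling and the function name `mean_pool` are assumptions for illustration.

```python
from typing import List


def mean_pool(embeddings: List[List[float]]) -> List[float]:
    """Average n vectors of width d into one vector of width d."""
    if not embeddings:
        raise ValueError("at least one embedding is required")
    dim = len(embeddings[0])
    if any(len(vec) != dim for vec in embeddings):
        raise ValueError("all embeddings must have the same width")
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


# Two hypothetical secondary embeddings reduced to one document embedding.
document_embedding = mean_pool([[1.0, 2.0], [3.0, 4.0]])
```

The reduction here is from an n-by-d collection to a single d-wide vector, so documents with different numbers of comparison elements become directly comparable.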
In order to utilize the document embedding model as shown in
The sentence transformer block 1131 receives primary embeddings 1130a including sentence embeddings and/or drawing embeddings as tokens, and generates embeddings for each comparison element using transformers of a hierarchical structure. Specifically, the sentence transformer block 1131 applies a sentence transformer and/or an image feature extractor to the input tokens for each comparison element, and applies a document transformer to the results, thereby generating secondary embeddings for each comparison element. In other words, the sentence transformer block 1131 tracks relationships between tokens included in the primary embeddings for each comparison element, and generates individual embeddings that are converted to express context and meaning.
In this case, although not shown in
The sentence-by-sentence pooling block 1132 generates sentence embeddings 1130b by combining the secondary embeddings 1120b for each comparison element. Through this, the sentence embeddings 1130b of the document can be obtained so as to compare with the target invention.
The number of primary embeddings 1130a of
Referring to
Herein, pairs of patent documents included in the learning data and the labels indicating their similarity may be generated from data related to examination, trial, and/or judgment of domestic and/or foreign patent agencies. For example, a notice of rejection issued from the Korean Intellectual Property Office, a trial decision issued from the Korean Patent Tribunal, and an invalidation decision issued from a U.S. court may be utilized. Since the documents of the relevant patent agencies include judgment on similarity or dissimilarity between patent documents of the patent agencies, the document search method may be configured to generate learning data from the relevant documents. Specifically, information including identification information of patent documents (e.g., application number, publication number, registration number, etc.), issuing agency, and similarity may be generated based on the documents issued by the patent agencies, and the learning data may be generated based on this information.
In an embodiment of
Contrary to this, if the document embedding model receives data of two patent documents and provides similarity as output data, the label may be information on similarity obtained through documents issued by the patent agency. For example, if the two patent documents are similar, the label is set to a probability value corresponding to similarity as the similarity of the two documents, and the loss function can be calculated. As another example, if the two documents are dissimilar, the label is set to a probability value corresponding to dissimilarity, and the loss function can be calculated.
In operation S110, the document search method is configured to document information on a user's input. That is, the document search method is configured to document words and/or sentences, etc. input by the user to search for similar documents. Specifically, the document search method may be configured to document words and/or sentences, etc. by repeating/listing them according to a predetermined algorithm. Accordingly, comparison elements corresponding to information on the user's input may be generated.
In operation S120, the document search method is configured to extract the comparison elements from candidate documents. For example, the document search method may be configured to extract the comparison elements such as abstracts and claims, predetermined from patent documents.
In operation S130, the document search method is configured to generate and compare document embeddings. The document search method is configured to generate a first document embedding from the comparison elements of the target invention using the document embedding model as shown
In operation S210, the document search method is configured to extract the comparison elements from the patent documents selected by the user. For example, the document search method may be configured to extract the comparison elements such as abstracts and claims that are predetermined from the patent documents. In this case, the invention described in the patent documents selected by the user becomes the target invention. That is, the document search method may be configured to generate the comparison elements of the target invention by extracting a portion of the selected patent documents as the comparison elements.
In operation S220, the document search method is configured to extract the comparison elements from the candidate documents. For example, the document search method may be configured to extract the comparison elements such as abstracts and claims that are predetermined from the patent documents.
In operation S230, the document search method is configured to generate and compare document embeddings. The document search method is configured to generate a first document embedding from the comparison elements of the selected patent documents using the document embedding model as shown in
Through the operations as shown in
In operation S310, the document search method is configured to determine similar documents. The similar documents may be determined based on words and/or sentences input by the user or based on patent documents selected by the user. As a specific example, the similar documents may be determined using the primary similarity according to the procedure described with reference to
In operation S320, the document search method is configured to determine similarity of the similar documents. In other words, the document search method may be configured to determine a secondary similarity of each of the similar documents based on words and/or sentences input by the user or based on the patent documents selected by the user. For example, the document search method may be configured to determine the secondary similarity of the similar documents by comparing sentences of the similar documents with words and/or sentences input by the user. As another example, the document search method may be configured to determine the secondary similarity of the similar documents by comparing sentences of the similar documents with at least one keyword of the patent documents selected by the user. Herein, a keyword is a word that is repeated many times in the patent documents selected by the user, and may be, for example, a word that appears at least a threshold number of times or at least at a threshold ratio.
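The keyword criterion described above can be sketched as a word count with a threshold on either the absolute count or the share of all words. The tokenization rule, the function name `extract_keywords`, and its parameters are hypothetical, and the sketch deliberately omits stop-word handling.

```python
import re
from collections import Counter
from typing import List, Optional


def extract_keywords(text: str, min_count: Optional[int] = None,
                     min_ratio: Optional[float] = None) -> List[str]:
    """Return words that meet the count threshold or the ratio threshold."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(words)
    total = len(words)
    keywords = []
    for word, count in counts.items():
        by_count = min_count is not None and count >= min_count
        by_ratio = min_ratio is not None and total > 0 and count / total >= min_ratio
        if by_count or by_ratio:
            keywords.append(word)
    return sorted(keywords)


sample = "sensor reads data; sensor sends data to sensor hub"
frequent = extract_keywords(sample, min_count=2)
```

In practice a stop-word list would likely be needed so that function words do not pass the threshold; that refinement is left out to keep the sketch short.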
In operation S330, the document search method is configured to display similarity of the similar documents. In addition, the document search method may be configured to further display information on the similar documents. That is, the document search method may be configured to display a list of the similar documents, and display a secondary similarity of each similar document together.
The secondary similarity may be determined and displayed as shown in
In operation S410, the document search method is configured to determine a reference sentence embedding. The reference sentence embedding may include a sentence embedding representing words and/or sentences input by the user, or a sentence embedding representing keywords of patent documents selected by the user. That is, the document search method may be configured to generate an embedding corresponding to a result of concatenating the input words and/or sentences into one sentence, or a result of concatenating at least one keyword.
In operation S420, the document search method is configured to determine comparison sentence embeddings of similar documents. The comparison sentence embeddings may include sentence embeddings for the sentences included in a range used for determination of a secondary similarity in the similar documents. Herein, the range used for determination of the secondary similarity may be the entirety or a portion of the patent documents. In case of a portion, the range used for determination of the secondary similarity may be the same as, partially the same as, or different from the range used for generating the document embedding.
In operation S430, the document search method is configured to determine the similarity based on the reference sentence embedding and the comparison sentence embeddings. According to an embodiment of the present invention, the document search method may be configured to calculate a distance between the reference sentence embedding and the comparison sentence embeddings, respectively, and determine the secondary similarity based on the comparison sentence embeddings having a distance less than or equal to a threshold value (for example, checking a ratio of the comparison sentence embeddings having a distance less than or equal to the threshold value). Alternatively, the document search method may be configured to calculate the distance between the reference sentence embedding and the comparison sentence embeddings, respectively, calculate an average of the distance values, and determine the secondary similarity based on the average value of the distances (for example, normalizing the reciprocal of the average value of the distances).
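Both alternatives in operation S430 can be sketched as below. Euclidean distance and the normalization 1 / (1 + average distance) are assumptions chosen for illustration; the disclosure specifies neither the distance metric nor the exact normalization.

```python
import math
from typing import List

Vector = List[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embeddings of equal width."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_by_ratio(reference: Vector, comparisons: List[Vector],
                        threshold: float) -> float:
    """Share of comparison sentence embeddings within the threshold distance."""
    if not comparisons:
        return 0.0
    close = sum(1 for c in comparisons if euclidean(reference, c) <= threshold)
    return close / len(comparisons)


def similarity_by_mean_distance(reference: Vector,
                                comparisons: List[Vector]) -> float:
    """Map the average distance into (0, 1]; smaller distance gives a higher score."""
    if not comparisons:
        return 0.0
    average = sum(euclidean(reference, c) for c in comparisons) / len(comparisons)
    return 1.0 / (1.0 + average)
```

The ratio variant rewards documents in which many sentences are close to the reference, whereas the mean-distance variant is also sensitive to how far the remaining sentences are.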
The present invention relates to a document search method for determining the similarity between a target invention and a candidate document, the method being performed by at least one processor. The document search method of the present invention may comprise the steps of: generating first comparison elements of the target invention, extracting second comparison elements from the candidate document, generating a first document embedding of the target invention and a second document embedding of the candidate document, based on the first comparison elements and the second comparison elements, and determining a primary similarity for determining whether the target invention and the candidate document are similar based on the first document embedding and the second document embedding.
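The final step above, deciding similarity from the two document embeddings, might look like the following sketch. Cosine similarity and the threshold value 0.8 are assumptions; the disclosure does not name the comparison metric or a specific threshold.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two document embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same width")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector carries no direction to compare
    return dot / (norm_a * norm_b)


def is_similar(first_embedding: List[float], second_embedding: List[float],
               threshold: float = 0.8) -> bool:
    """Primary similarity check: True when the candidate counts as a similar document."""
    return cosine_similarity(first_embedding, second_embedding) >= threshold
```

Because each document is reduced to one vector, this check is cheap enough to run over many candidate documents before the sentence-level comparison is attempted.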
The first document embedding and the second document embedding may be generated using a document embedding model.
The document embedding model may receive first embeddings including tokens generated from sentence-typed comparison elements and tokens generated from drawing-typed comparison elements, and may generate a document embedding representing the corresponding document.
The document embedding model may include a hierarchical transformer block which has a sentence transformer or an image feature extractor applied to the first embeddings, or a document transformer applied to the combination of outputs of the sentence transformer or the image feature extractor.
The document embedding model may further include a generating block which generates the document embedding by performing at least one pooling operation on the output of the hierarchical transformer block in the document embedding model.
The document embedding model may be trained using learning data generated based on information on the similarity between patent documents, the information being extracted from documents issued by a patent agency.
A label of the learning data may include whether or not the patent documents are similar to each other as identified from the documents issued by the patent agency.
The label of the learning data may be set to a median value of the document embeddings of the two documents when the patent documents are identified to be similar from the documents issued by the patent agency.
The label of the learning data may be set to a value separated by a certain distance in a direction opposite to the direction facing each other in each of the document embeddings of the two documents when the patent documents are identified to be dissimilar from the documents issued by the patent agency.
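One possible reading of the two labeling rules above is sketched here: for a similar pair, both embeddings are given the midpoint as their target, and for a dissimilar pair, each embedding is given a target shifted away from the other by a fixed distance. The interpretation of "median value" as the midpoint, the function name `target_labels`, and the `margin` parameter are assumptions.

```python
import math
from typing import List, Tuple

Vector = List[float]


def target_labels(a: Vector, b: Vector, similar: bool,
                  margin: float = 1.0) -> Tuple[Vector, Vector]:
    """Return target embeddings for the two documents of a labeled pair."""
    if similar:
        midpoint = [(x + y) / 2 for x, y in zip(a, b)]
        return midpoint, list(midpoint)

    # Unit vector pointing from b toward a, i.e. away from b as seen from a.
    diff = [x - y for x, y in zip(a, b)]
    norm = math.sqrt(sum(d * d for d in diff))
    if norm == 0.0:
        raise ValueError("identical embeddings define no direction")
    unit = [d / norm for d in diff]
    target_a = [x + margin * u for x, u in zip(a, unit)]
    target_b = [y - margin * u for y, u in zip(b, unit)]
    return target_a, target_b
```

With such targets, a regression-style loss would pull similar documents together and push dissimilar documents apart by twice the margin along the line joining them.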
The step of generating the first comparison elements may include a step of documenting at least one of words or sentences input by the user by repeating or listing them.
The document search method of the present invention may further comprise the steps of determining a secondary similarity between the target invention and the candidate document, and providing the secondary similarity to the user together with information on the candidate document.
The step of determining the secondary similarity may include the steps of determining a reference sentence embedding representing the target invention, determining comparison sentence embeddings corresponding to the sentences included in at least a portion of the candidate document, and determining the secondary similarity based on distances between the reference sentence embedding and the comparison sentence embeddings, respectively.
The present invention relates to a document search device for determining the similarity between a target invention and a candidate document. The document search device of the present invention may comprise a memory and a processor. The processor may control to generate first comparison elements of the target invention, extract second comparison elements from the candidate document, generate a first document embedding of the target invention and a second document embedding of the candidate document, based on the first comparison elements and the second comparison elements, and determine a primary similarity for determining whether the target invention and the candidate document are similar based on the first document embedding and the second document embedding.
The document search device 3000 shown in
The communication unit 3100 may include one or more components that allow the document search device 3000 to communicate with an external electronic device. The communication unit 3100 may include a short-range wireless communication unit (not shown), a mobile communication unit (not shown), or a broadcast receiving unit (not shown). The short-range wireless communication unit may include a Bluetooth communication unit, a Bluetooth low energy (BLE) communication unit, a near field communication unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, infrared Data Association) communication unit, a WFD (Wi-Fi Direct) communication unit, a UWB (Ultra Wideband) communication unit, an Ant+ communication unit, etc., but is not limited thereto. The mobile communication unit transmits and receives a wireless signal with at least one of a base station, an external terminal, and a server on a mobile communication network. Herein, the wireless signal may include various types of data according to transmission and reception of a voice call signal, a video call signal, or text/multimedia message. The broadcast receiving unit receives a broadcast signal and/or broadcast-related information from the outside through a broadcast channel. The broadcast channel may include a satellite channel or a terrestrial channel. According to an embodiment, the communication unit 3100 may not include the broadcast receiving unit. The document search device 3000 may also receive order logs for existing items, attribute information for existing items, and attribute information for new items from an external device through the communication unit 3100.
The memory 3200 may store a program for processing and controlling the processor 3300, and may also store data input to or output from the document search device 3000. In addition, the memory 3200 may store information on candidate documents searched by the document search device 3000.
The memory 3200 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (for example, an SD or XD memory, etc.), a RAM (Random-Access Memory), a SRAM (Static Random-Access Memory), a ROM (Read-Only Memory), an EEPROM (Electrically Erasable Programmable Read-Only Memory), a PROM (Programmable Read-Only Memory), a magnetic memory, a magnetic disk, and an optical disk.
The processor 3300 may typically control overall operation of the document search device 3000. The processor 3300 may perform the document search method described with reference to
The above-described contents are concrete embodiments for carrying out the present invention. The present invention will include not only the embodiments described above, but also embodiments that may be simply designed and changed or be easily changed. In addition, the present invention will also include technologies that can be easily modified and implemented using the embodiments. Therefore, the scope of the present invention should not be limited to the embodiments described above, but should be determined by the claims described below as well as the equivalents of the claims of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
10-2023-0142478 | Oct 2023 | KR | national |
10-2023-0142479 | Oct 2023 | KR | national |
10-2023-0195362 | Dec 2023 | KR | national |
10-2024-0137855 | Oct 2024 | KR | national |