1. Technical Field
Embodiments of the present disclosure relate generally to document analysis technologies, and particularly to a system and method for analyzing office actions of patent applications.
2. Description of Related Art
Patent offices, such as the United States Patent and Trademark Office (USPTO), European Patent Office (EPO), State Intellectual Property Office of the People's Republic of China (SIPO), and Japanese Patent Office (JPO), may send one or more office actions during the examination process of a patent application. An office action is a document written by a patent examiner, using a template, in response to the examiner's review of the patent application. When a patent applicant receives an office action, the office action must be processed to obtain patent information, such as an application number, a filing date, and a fee payment. The office action may be processed manually, or automatically using software programs. However, such software programs may produce unexpected errors when the template of the office action is changed. Therefore, a more efficient system and method for analyzing office actions of patent applications is desired.
The disclosure, including the accompanying drawings, is illustrated by way of example and not by way of limitation. It should be noted that references to “an” or “one” embodiment in this disclosure are not necessarily to the same embodiment, and such references mean at least one.
When an office action is downloaded from the patent office website 2, the parse module 100 parses the office action using predetermined regular expressions (RE) stored in the storage system 11, to obtain patent information of the patent application to which the office action relates. In the embodiment, the regular expressions provide a concise and flexible means for matching strings of text of the office action, such as particular characters, words, or patterns of characters. A regular expression is written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification. The patent information includes, but is not limited to, an application number, a filing date, a publication number, a publication date, a patent number, and fee payment of the patent application.
In one embodiment, the parse module 100 may parse the office action by the following steps. First, the parse module 100 converts the office action into a document of a predefined format, such as a JPG document or a TIF document. Second, the parse module 100 extracts characters from the converted document using a character recognition method, such as an optical character recognition (OCR) method. Third, the parse module 100 obtains the patent information from the extracted characters using the regular expressions. In the embodiment, the characters may be composed of numbers, letters, and other special characters of the office action.
The parse module 100 determines whether the office action is parsed successfully using the regular expressions. In one embodiment, if the desired patent information, such as the application number or the patent number, is obtained from the office action using the regular expressions, the parse module 100 determines that the office action is successfully parsed. If the desired patent information is not obtained, the parse module 100 determines that the office action fails to be parsed. The obtained patent information may be sent to the client computer 4 through the intranet 5.
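The parsing and the success check described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the field names, the two patterns, the filing date in the usage note, and the choice of the application number as the only required field are assumptions made for the example.

```python
import re

# Hypothetical predetermined regular expressions, keyed by field name.
# In the disclosure these would be read from the storage system 11.
PREDETERMINED_PATTERNS = {
    "application_number": re.compile(r"APPLICATION NO\.\s*(\d{2}/\d{3},\d{3})"),
    "filing_date": re.compile(r"FILING DATE\s*(\d{2}/\d{2}/\d{4})"),
}

# Assumption: parsing counts as successful when these fields are found.
REQUIRED_FIELDS = ("application_number",)


def parse_office_action(text, patterns=PREDETERMINED_PATTERNS):
    """Return a dict holding every field whose pattern matches the text."""
    info = {}
    for field, pattern in patterns.items():
        match = pattern.search(text)
        if match:
            info[field] = match.group(1)
    return info


def parsed_successfully(info, required=REQUIRED_FIELDS):
    """Return True when all required fields were obtained."""
    return all(field in info for field in required)
```

For text such as `"APPLICATION NO. 12/547,517 FILING DATE 08/26/2009"` (the date is made up for illustration) both fields are obtained and the check passes. If the template changes to `"Application Number: 12/547,517"`, neither pattern matches and the check fails, which is the case the remaining modules address.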
The extracted module 101 extracts the patent information of the patent application from the office action according to predetermined keywords of the patent information. In one example, the extracted module 101 may search the extracted characters for the keyword “APPLICATION NO.”, which labels the application number of the patent application, and extract the numbers following the keyword as the application number.
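A keyword-based extraction of this kind might look like the sketch below. The function name `extract_after_keyword` is hypothetical, and so is the assumption that the "numbers" after the keyword form a run of digits that may contain "/" and "," separators, as in the application number format used elsewhere in this disclosure.

```python
import re


def extract_after_keyword(text, keyword):
    """Return the first number-like run that follows the keyword, or None."""
    start = text.find(keyword)
    if start == -1:
        return None  # keyword not present in the extracted characters
    remainder = text[start + len(keyword):]
    # Assumption: a number starts with a digit and may contain '/' and ','.
    match = re.search(r"\d[\d/,]*", remainder)
    if not match:
        return None
    # Drop a trailing separator that belongs to the surrounding sentence.
    return match.group(0).rstrip("/,")
```

Calling `extract_after_keyword("APPLICATION NO. 12/547,517 FILING DATE", "APPLICATION NO.")` yields the string `12/547,517`. The search is literal, so it does not depend on the regular expressions that failed during parsing.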
The generation module 102 generates a regular expression of the extracted patent information according to determined rules. In one embodiment, the determined rules include, but are not limited to, the following: each digit of the extracted patent information is replaced by “\d”, each space character of the extracted patent information is replaced by “\s”, and each of the characters “a, b, . . . , z” and “A, B, . . . , Z” is replaced by “[A-Za-z]”. For example, if the extracted patent information is the application number “12/547,517”, the generated regular expression is “\d\d/\d\d\d,\d\d\d”, which may also be written as “\d{2}/\d{3},\d{3}”.
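The replacement rules above can be sketched in a few lines. The function names and the `compress` option are hypothetical. Escaping of characters that have a special meaning in regular expressions is an added assumption, since the disclosure does not say how other characters are handled.

```python
import re
import string
from itertools import groupby

REGEX_SPECIALS = r".^$*+?{}[]\|()"


def token_for(ch):
    """Map one character to its regular expression token."""
    if ch in string.digits:
        return r"\d"
    if ch.isspace():
        return r"\s"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    # Assumption: other characters are kept literally, escaped if special.
    return "\\" + ch if ch in REGEX_SPECIALS else ch


def generate_regex(value, compress=False):
    """Generalize an extracted value into a regular expression."""
    tokens = [token_for(ch) for ch in value]
    if not compress:
        return "".join(tokens)
    parts = []
    for token, run in groupby(tokens):
        count = len(list(run))
        # Collapse repeated tokens, e.g. \d\d\d becomes \d{3}.
        parts.append(token if count == 1 else f"{token}{{{count}}}")
    return "".join(parts)
```

With the application number from the example, `generate_regex("12/547,517")` returns `\d\d/\d\d\d,\d\d\d` and `generate_regex("12/547,517", compress=True)` returns `\d{2}/\d{3},\d{3}`. Both forms match the same strings, so the compressed form is only a matter of readability for the user who confirms the expression.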
The correction module 103 sends the generated regular expression to the client computer 4 to confirm whether the generated regular expression is correct. In one embodiment, the generated regular expression may be displayed on a display screen of the client computer 4 and confirmed by a user of the client computer 4. If the generated regular expression is not correct, the user corrects it using the client computer 4, and the correction module 103 obtains the corrected regular expression from the client computer 4.
The execution module 104 stores the generated regular expression or the corrected regular expression into the storage system 11, so that the patent information of other office actions can be obtained using the generated/corrected regular expression.
In block S10, the parse module 100 parses an office action of a patent application using predetermined regular expressions (RE) stored in the storage system 11, to obtain patent information of the patent application, when the office action is downloaded from the patent office website 2. The patent information includes, but is not limited to, an application number, a filing date, a publication number, a publication date, a patent number, and fee payment of the patent application.
In block S11, the parse module 100 determines whether the office action is parsed successfully using the regular expressions. If the office action is successfully parsed, the procedure ends. If the office action fails to be parsed, block S12 is implemented. In one embodiment, if the desired patent information, such as the application number or the patent number, is obtained from the office action using the regular expressions, the parse module 100 determines that the office action is successfully parsed. If the desired patent information is not obtained, the parse module 100 determines that the office action fails to be parsed.
In block S12, the extracted module 101 extracts patent information of the patent application from the office action according to predetermined keywords of the patent information. In one example, the extracted module 101 may search the extracted characters for the keyword “APPLICATION NO.”, which labels the application number of the patent application, and extract the numbers following the keyword as the application number.
In block S13, the generation module 102 generates a regular expression of the extracted patent information according to determined rules. In one embodiment, the determined rules include, but are not limited to, the following: each digit of the extracted patent information is replaced by “\d”, each space character of the extracted patent information is replaced by “\s”, and each of the characters “a, b, . . . , z” and “A, B, . . . , Z” is replaced by “[A-Za-z]”.
In block S14, the correction module 103 sends the generated regular expression to the client computer 4 to confirm whether the generated regular expression is correct. If the generated regular expression is not correct, block S15 is implemented. If the generated regular expression is correct, block S16 is implemented.
In block S15, the generated regular expression is corrected by the user using the client computer 4, and the corrected regular expression is obtained by the correction module 103.
In block S16, the execution module 104 stores the generated regular expression or the corrected regular expression into the storage system 11, so the patent information of other office actions can be obtained using the generated/corrected regular expression.
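Blocks S10 to S16 can be summarized by the following sketch. It is an illustration under stated assumptions rather than the disclosed implementation: the stored regular expressions are modeled as a plain list of pattern strings that match the value itself, the user confirmation of blocks S14 and S15 is modeled as a hypothetical callback `confirm` that returns either the same or a corrected expression, and all function names are invented for the example.

```python
import re
import string


def generalize(value):
    """Block S13: replace digits, spaces and letters by character classes."""
    out = []
    for ch in value:
        if ch in string.digits:
            out.append(r"\d")
        elif ch.isspace():
            out.append(r"\s")
        elif ch in string.ascii_letters:
            out.append("[A-Za-z]")
        else:
            # Assumption: keep other characters literally, escaped if special.
            out.append("\\" + ch if ch in r".^$*+?{}[]\|()" else ch)
    return "".join(out)


def analyze(text, stored_patterns, keyword, confirm):
    """Return the extracted value, learning a new pattern when parsing fails."""
    # Blocks S10 and S11: try the stored regular expressions first.
    for pattern in stored_patterns:
        match = re.search(pattern, text)
        if match:
            return match.group(0)
    # Block S12: fall back to a keyword search.
    start = text.find(keyword)
    if start == -1:
        return None
    found = re.search(r"\d[\d/,]*", text[start + len(keyword):])
    if not found:
        return None
    value = found.group(0).rstrip("/,")
    # Block S13: generate a regular expression from the extracted value.
    candidate = generalize(value)
    # Blocks S14 and S15: the user confirms or corrects the expression.
    final = confirm(candidate)
    # Block S16: store the expression for later office actions.
    stored_patterns.append(final)
    return value
```

A first call with an unfamiliar template goes through blocks S12 to S16 and appends the new expression to `stored_patterns`. A later office action in the same format is then handled in blocks S10 and S11 without the keyword search.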
All of the processes described above may be embodied in, and fully automated via, functional code modules executed by one or more general purpose computing devices or processors. The code modules may be stored in any type of non-transitory computer-readable medium or other storage device. Some or all of the methods may alternatively be embodied in specialized hardware. Depending on the embodiment, the non-transitory computer-readable medium may be a hard disk drive, a compact disc, a digital video disc, a tape drive or other suitable storage medium.
Although certain embodiments of the present disclosure have been specifically described, the present disclosure is not to be construed as being limited thereto. Various changes or modifications may be made to the present disclosure without departing from the scope and spirit of the present disclosure.
Number | Date | Country | Kind |
---|---|---|---|
201010596747.X | Dec 2010 | CN | national |