The present invention relates to systems and methods for collecting, compiling and analyzing data from an online database. The invention has particular use in environments where an online system has restrictions or other impediments to automated/robotic access of electronic information.
The applications referenced above describe a variety of embodiments for accessing and assessing online electronic information, including from the USPTO PAIR website. As these disclosures note, the PAIR system, as currently constituted and presented, does not contain any accessible electronic database that permits the general public to perform conventional search and inspection operations for cases. For instance, in its current incarnation the user is required to know in advance and specify a specific case number (which may be difficult or impossible to locate) before they can see the data associated with such case, and even then the data is not organized in a fashion that makes it easy to review. As an example, the user cannot search any of the underlying communications by the Examiners or applicants to understand or follow what is transpiring in the application. Other than the use of specific case numbers, the user is not permitted to search or identify information of interest by subject matter, inventor, Examiner, or any other convenient parameter.
The PAIR system has been in existence for several years and yet has not been improved upon despite its obvious limitations. In fact the PTO has made every effort to make the information difficult to obtain through a variety of access limiting mechanisms, including using CAPTCHA codes and timeout mechanisms. A US government document dated Sep. 24, 2009 published by the Office of the Chief Information Officer titled “Public Meeting on Data Dissemination—Request for Information” confirms (see page 5) that the US PTO online search systems are designed for single queries, and are not designed for a large amount of traffic. In the RFI attached to this document the authors confirm that the PTO has no present solution to this problem, and they were actively seeking assistance from third parties to research the problem and provide a solution within the next 6 years. Moreover the authors confirm that the USPTO system is designed to prevent machine access to the PAIR data through a CAPTCHA system.
The general entry screens available through PAIR are shown in
Finally, as seen in
Consequently, persons skilled in the art have been actively deterred if not discouraged from accessing and compiling any of this USPTO data. In turn this means that a large amount of very useful data is kept effectively hidden from the general public, which is undesirable and does not advance the purpose of the patent laws. The problem is most acute in cases of reexaminations, which are a form of post-issuance patent challenge. Since reexamination cases are frequently associated with ongoing litigation, the financial stakes are often high and the public interest factor much larger. Yet as with un-issued cases the public is stuck using the very limited PAIR system for obtaining information about ongoing cases. Other examples of organizational processes which do not lend themselves to public inspection and review are well-known, including for example the status of ongoing immigration applications.
Because the data is effectively inaccessible, it is difficult to predict basic information about cases, such as how long they will last, what strategies work or do not work, etc. The public, again, is left with mostly indirect guesswork and gross average statistics published by the PTO itself.
Clearly, there is a need for systems and methods that overcome the limitations of the current PAIR (and similar) systems; existing approaches may attempt to do so but are not sufficient. This need is increasing as Congress has only recently enacted even more post-grant and pre-grant challenge mechanisms for patents and applications in the America Invents Act (AIA) (which provisions are incorporated by reference herein). To fully avail themselves of these new procedures, the public requires a data discovery, review and presentation tool which provides greater transparency and oversight of USPTO (and similar governmental agency) proceedings.
Objects of the present invention, therefore, are to provide an improved system and method that overcomes the aforementioned limitations of the prior art.
As will be explained in more detail below, an organization analysis system permits third parties not otherwise authorized or affiliated with the target organization to better understand, characterize and predict the behavior of such entities, and the outcome of events associated with the same. Despite the apparent deterrents and discouragements Applicants have overcome the technological hurdles imposed by the USPTO, and have solved the long felt need to secure and maintain public data (including aggregate data) that the public is entitled to access but hitherto has been effectively stymied. Thus in the preferred embodiment of a United States Patent Office organization, the system preferably permits users to:
Other functions and capabilities are also described below. Again while the present preferred embodiment uses the example of the United States Patent Office (USPTO) and its personnel (examiners), it will be apparent to those skilled in the art that any number of target organizations with similar characteristics could be examined and monitored in accordance with the present teachings. For example another application could serve persons interested in studying the behavior and predicted outcome of trademark applications, immigration applications, SEC filings, etc. to help identify optimal documentation formatting, routing, etc., to increase the chances of success for a case. In other embodiments the invention could be used to mine and extract information from databases maintained by similar governmental organizations. For instance the PACER database is used by the United States Judicial branch to maintain events and records for ongoing litigations. Patent litigations could be identified and monitored in the same manner as described herein as events to be tracked. Since patent litigations are a rich source of materials for judicial opinions, declarations, technical subject matter, substantive pleadings, etc., all of these documents could be extracted and stored in a database as described below. The events for litigation cases could then be tracked in exactly the same manner as other events noted below. This would allow for a richer database of materials than that normally offered by traditional reporting services, which tend to focus solely on precedential opinions but not other content that may be of equal or greater interest to the community as a whole.
In the examples below, specific organizations might be described, but unless otherwise indicated, it should be apparent that in general an “organization” comprises some sort of entity that has personnel (including at least some human participants) who review submissions (typically in the form of documentation) in accordance with a set of procedures (which may be related to substance, format, time etc. and can vary). For each submission the organization typically generates a number of measurable events, which may be reflective of the status of the submission, a determination on the merits of the submission, etc. An “event” therefore may be as simple as providing an electronic flag/indication in a publicly accessible file/database that a submission has reached a certain status or treatment within the organization.
The submission 107 is then processed by order/processing support logic 112, which again can be a combination of human and/or machine handlers. In some environments the submission may be examined manually by personnel 114, and classified by them in accordance with rules 116 and/or procedures 118 to put into an electronic docket (not shown) for later uptake by other such personnel. In the case of a reexamination request, the reexamination documents would be scanned into an electronic system, and a variety of classification data would be captured or generated and stored for inclusion as part of an organizational record. For example, a control number can be assigned to the submission (reexamination request), along with other identifying information such as the name of the requester, the number of the patent for which reexamination is requested, the name of the representative for the requester, the inventor name, the title of any associated litigation, etc., etc.
As the personnel 114 (or an automated artificial intelligence agent) review and process submission 107, a variety of events (122, 124) are generated by event logic processor 120, which can be in the form of a combination of computer hardware and software control modules. The event logic processor 120 can reference the rules/regulations 116, procedures 118, etc., and determine a schedule of when personnel 114 are required to act on the submission. For example, an examiner may be required to generate an initial written report within a certain number of days after the submission is deemed complete within the rules and procedures.
As the personnel 114 act on the submission they (or logic 120) generate a set of n events 122, 124, etc., which can range from substantive to procedural. For instance, the examiner may draft and issue an initial determination that a submission 107 raises a substantial new question (SNQ) of patentability within 90 days of a reexamination request. This information is communicated from the target organization's internal computing system (not shown) to one or more externally accessible databases 126 and websites 128, such as the PAIR system described above. At this point the information acted upon and generated by the target organization is available for inspection and review by outside third parties through an electronic network 129, which in preferred approaches is the Internet.
During processing of the submission the target organization may also report out or contact entity 105 through link 109, which, again, may take any number of forms, including physical mail, electronic networks, etc. These communications between organization 115 and entity 105 may also be reported in database 126. At the end of the process organization 115 issues a final output report 130 to entity 105. In the case of a reexamination request this may be in the form of a reexamination certificate, a ruling on an appeal by the Board of Appeals, or some other terminating event which effectively ends participation by organization 115. Again for other contexts it will be apparent that other types of events will be appropriate, and it will be understood that the explanation has been simplified to highlight the important aspects of the invention.
As further seen in
Interface 150 is responsible for interacting with a client browser, and includes one or more routines responsible for presenting a graphical user interface to users embodied in a web page as is typical in a client/server system. An example of a preferred interface is shown in
Returning to
Returning to
Again with reference to
Alert logic 185 is responsible for identifying triggered alerts, classifying and issuing alerts, etc., as seen generally below in
Processing and Prediction Logic 190 includes routines for performing additional analytical operations on the content and substance of the submission and user queries. For example, a user can request an indication of an expected average time period that will elapse between two events associated with a submission 107—such as in the case of a reexamination request, some indication of when to expect an Office Action. This type of calculation module is described in more detail in
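By way of illustration only, the expected-interval calculation mentioned above can be sketched as a simple average over historical cases. The event names, the dictionary layout, and the function name are assumptions made for this example and are not taken from the described system.

```python
from datetime import date
from statistics import mean


def average_days_between(cases, first_event, second_event):
    """Mean number of days from first_event to second_event.

    `cases` is a list of dicts mapping a (hypothetical) event name to the
    date on which that event was recorded. Cases missing either event are
    skipped. Returns None when no case has both events.
    """
    gaps = [
        (events[second_event] - events[first_event]).days
        for events in cases
        if first_event in events and second_event in events
    ]
    return mean(gaps) if gaps else None


# Illustrative history: two completed intervals and one still pending.
history = [
    {"request_filed": date(2011, 1, 10), "office_action": date(2011, 4, 10)},
    {"request_filed": date(2011, 2, 1), "office_action": date(2011, 6, 1)},
    {"request_filed": date(2011, 3, 5)},
]
expected = average_days_between(history, "request_filed", "office_action")
```

A fuller implementation might condition the average on the Examiner, art unit, or current loading rather than pooling all cases.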
Profiling Logic 192 includes routines for performing additional analytical operations on the behavior, characteristics and performance of the personnel 114 associated with entity 115. For example, a user can request an indication of which reexamination cases an Examiner is working on, the state of such cases, and historical information on affirmance/rejection rates and the like. This type of profiling module is described in more detail in
Exceptions Logic 194 processes events which occur outside the normal course of the expected path of progress of a submission. For example in the context of a reexamination proceeding, one or more parties may file a petition seeking specific relief that is outside the parameters of the given rules, such as requesting more time, more pages for a response, etc. As discussed further below in connection with
An additional component 196 for crowd-community sourcing logic can also be employed in a preferred embodiment. The details of this are shown in
Looking at
At step 205 therefore, the user can specify a particular target to be profiled, which, in a preferred embodiment, can be a human examiner, or some larger logical group of personnel, such as an entire examining group, individuals within an art unit, etc. For example, the target to be studied may be Examiner S. Smith, or Art Unit XXX where he/she works.
In step 210 the target's prior and current cases are identified from information compiled in databases 142 as noted above. In preferred embodiments the totality of the content for the cases is stored, including not only the transactional data (indicating events and dates) but also the actual correspondences (Office Actions), submissions and other materials exchanged with the applicant in a case. The data is preferably text-indexed as well to allow for ease of review and querying.
Step 215 includes an optional load factor calculation for the target. This can be as simple as an identification of the total number of cases being currently handled by the target, to a more advanced analysis which considers a relative number of cases compared to historical norms, other examiners, etc. For example, it is possible to characterize an Examiner's current workload not only by reference to cases, but also a status of such cases. This is because he/she may have a certain number of cases in a state which does not require significant further input at this time by him/her. Stated another way, an Examiner with 10 cases that are completed is identified to have less loading than a second Examiner with half as many cases but more expected actions. In this respect, it is more accurate to calculate expected required actions across an Examiner's cases in a predefined time window to estimate the loading of such individual. Other examples will be apparent to those skilled in the art. It will be understood that the degree of sophistication and detail will be a function of a desired application's level of accuracy and complexity.
The loading factor can be a dynamic variable that affects other calculations as noted below. For example it may be discovered that a particular Examiner's page count or rejection rate varies significantly according to their current loading factor.
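One possible way to express the window-based loading estimate described above is to count the actions a target is expected to take within a predefined period. The sketch below assumes that each pending action has already been reduced to a due date; the `PendingAction` type and the 90-day default window are hypothetical choices for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class PendingAction:
    case_id: str  # identifier of the case the action belongs to
    due: date     # date by which the target is expected to act


def load_factor(pending, today, window_days=90):
    """Count the actions falling due between today and the end of the window."""
    cutoff = today + timedelta(days=window_days)
    return sum(1 for action in pending if today <= action.due <= cutoff)


today = date(2012, 1, 1)
# Examiner A: ten completed cases, so nothing is pending.
examiner_a = []
# Examiner B: five cases, each with two actions due inside the window.
examiner_b = [
    PendingAction(f"case-{i}", today + timedelta(days=30 * k))
    for i in range(5)
    for k in (1, 2)
]
```

Under these assumptions Examiner A scores 0 and Examiner B scores 10, matching the observation that a raw case count can misstate the actual workload.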
At step 220 the individual cases for the target are then analyzed with the results being stored in a reference target/case database 230, which, as alluded to above, may be part of databases 142. The analysis and data stored for each target may take into consideration any number of target specific factors 218, including:
Other types of data can of course be compiled. It will be understood that all or some of this data can be precompiled and thus made available very quickly in response to a query. Moreover since most cases proceed very slowly, it is relatively easy to keep up to date on the current behavior/profile of a target. Some of the data is useful for quick reference purposes (i.e., which representatives may have experience/knowledge with a particular Examiner) while other parts of the data are useful for understanding a personality, behavior, work rate, reputation, etc.
At step 240 the outcomes of the cases for the target are identified and classified. The outcome data is preferably stored in the case database 230 as well. The analysis and data stored for each case may take into consideration any number of case specific factors 242, including:
At step 250 the user can be presented with the data in accordance with any desired filter and/or visual preferences. Any of the parameters/factors noted above can be used to filter the profile report for the target in question. For example the user could query which cases an Examiner had been involved with and which reached appeal. Or the user could specify that this set should be further broken down graphically by size (length of pages) of the reexamination request, and so on. The data can be presented in list, tabular, or graphical form depending on the information in question. Some types of reports (such as comparing Examiners or art units by overall reexamination duration) may be plotted more easily in chart form. It should be noted that any number of known techniques can be used to generate the reports and form thereof. Furthermore, while the preferred embodiment uses the example of a patent reexamination and a patent examiner, it will be understood that the invention has wider scope and usage and will be implemented with different objects and personnel in other environments.
As seen in
Note that in some instances the system may permit the users (or some selected subset of profilers) to contribute additional individual scoresheets or personality/profiling data for the targets as seen in step 260. Preferably this data is collected anonymously and with protection for the privacy of the contributors to encourage full and fair disclosures on the personnel of the target entity. For example a scoresheet (not shown) may request information from participants on a scale of 1-10 on several aspects of the profilee, including:
From this data it is possible to compute a reputation or authoritativeness score for the target based on one or more of these factors, either alone or combined with the case disposition data below. For example an Examiner may have a reputation score that reflects an overall average of some subset of the figures above computed across all surveyed persons.
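As a minimal sketch of such a computation, the score can be taken as the average, across contributors, of a chosen subset of scoresheet factors. The factor names used here are placeholders, and equal weighting of the factors is an assumption rather than a stated requirement.

```python
from statistics import mean


def reputation_score(scoresheets, factors):
    """Average the selected factors across all scoresheets.

    Each scoresheet is a dict mapping a factor name to a 1-10 rating.
    Each factor is first averaged across contributors, then the factor
    averages are averaged with equal weight (an illustrative choice).
    Raises statistics.StatisticsError if no scoresheets are supplied.
    """
    per_factor = [mean(sheet[f] for sheet in scoresheets) for f in factors]
    return round(mean(per_factor), 2)


# Hypothetical contributions from two surveyed persons.
sheets = [
    {"professionalism": 9, "responsiveness": 7},
    {"professionalism": 8, "responsiveness": 6},
]
overall = reputation_score(sheets, ["professionalism", "responsiveness"])
single = reputation_score(sheets, ["professionalism"])
```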
To prevent gaming or distortion of the profiling, the credentials of users may be authenticated as a prerequisite. For example in the case of a patent application or reexamination, the name of the inventor, the serial number, or the representative registration number can be solicited. Contributions can be checked to prevent duplication and other efforts to manipulate the results. The data is preferably encrypted/de-personalized as it is stored in database 230 to avoid tracing of the profile contributions.
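The duplicate check and de-personalized storage described above could be approximated as follows. This sketch replaces the contributor's credential with a keyed hash (HMAC-SHA256 from the Python standard library) before storage; that is a pseudonymization technique chosen for illustration, not necessarily the encryption scheme contemplated here, and the class and field names are hypothetical.

```python
import hashlib
import hmac


class ContributionStore:
    """Stores one scoresheet per (contributor, target) without keeping the raw credential."""

    def __init__(self, secret_key: bytes):
        self._key = secret_key
        self._scores = {}  # (pseudonym, target) -> scoresheet

    def _pseudonym(self, credential: str) -> str:
        # A keyed hash lets duplicates be detected without storing the credential itself.
        return hmac.new(self._key, credential.encode("utf-8"), hashlib.sha256).hexdigest()

    def submit(self, credential: str, target: str, scoresheet: dict) -> bool:
        """Return True if the contribution was stored, False if it is a duplicate."""
        key = (self._pseudonym(credential), target)
        if key in self._scores:
            return False
        self._scores[key] = dict(scoresheet)
        return True


store = ContributionStore(b"example-secret")
first = store.submit("reg-12345", "Examiner S. Smith", {"professionalism": 8})
second = store.submit("reg-12345", "Examiner S. Smith", {"professionalism": 1})
```

The second submission is rejected because the same credential has already rated the same target.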
Conversely, to see the more detailed personnel profile information (or any of the other data/predictions the system can generate) it may be desirable to limit dissemination of such data to users with a particular status, or to users who have been authenticated, etc. In some embodiments it may be useful to “auction” such profile information to a limited number of users who provide a bid that exceeds some threshold number, or even limit the absolute number of users to some figure so that at any moment in time (and for a defined period) only a limited group has access to the detailed personnel profile information. Access to the information could be controlled on a rolling basis so that with each time cycle the users could participate in a new auction with the result of a different group being qualified to access the data in question. This restricted access may be used, again, for certain specific analyses so that the larger community still has access to the bulk aggregate information of interest. Other examples and variations will be apparent to those skilled in the art.
Also to preserve privacy it may be desirable in some embodiments to only permit review of aggregations of the user contributed profiles, and not individual reviews. Thus, for example, an Examiner may be revealed to have a professionalism score of 8.5 across all contributors. The system preferably also permits users to query/plot individual targets relative to each other so that a user can see at a glance the relative perception/scoring of the Examiner by a community of representatives. The data can be segmented as well to compute different reputation scores depending on the type of survey respondent (inventor, representative, etc.). Other applications of this technique for scoring reputations of targets will be apparent to those skilled in the art.
The main purpose of this process, as alluded to above, is to construct and maintain the databases 142 to ensure their currency. As an initial step 305 the system identifies and catalogs every case/application being handled by the target organization. The data acquisition process preferably employs a standard web browser (such as Internet Explorer).
In an alternative embodiment Firefox could be employed instrumented via a plugin mechanism to send a set of data from viewed pages to a data acquisition server (which may be part of databases 142). The injected code navigates a viewed page and any tabs (see
The data acquisition server code also maintains a list of requested data and responds to queries for instructions from the browser plugin with details (e.g. a reexamination control number) of the next data to be acquired. The plugin accomplishes the tasks of navigating the pages, selecting tabs as well as checking boxes and clicking on buttons and links as needed to view and capture and transmit the requested data. When solution of a CAPTCHA or other form of human interactive proof (HIP) is required the plugin can be configured to halt operation until a person provides the solution. If the acquisition server is authorized it is possible that in some instances the CAPTCHA can be solved via automated machine logic using known techniques.
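The server-side bookkeeping described above (a list of requested data from which the plugin is handed the next item) might be sketched as a simple work queue. The class and method names are hypothetical, the control numbers are made up for the example, and re-queuing failed items is an assumption added for illustration.

```python
from collections import deque


class AcquisitionQueue:
    """Tracks which identifiers still need to be acquired."""

    def __init__(self, identifiers):
        self._pending = deque(identifiers)
        self._in_progress = set()
        self.done = set()

    def next_task(self):
        """Return the next identifier to acquire, or None when nothing is pending."""
        if not self._pending:
            return None
        ident = self._pending.popleft()
        self._in_progress.add(ident)
        return ident

    def report(self, ident, succeeded):
        """Record the outcome reported by the plugin; failed items are re-queued."""
        self._in_progress.discard(ident)
        if succeeded:
            self.done.add(ident)
        else:
            self._pending.append(ident)


queue = AcquisitionQueue(["95/000,001", "95/000,002"])
task = queue.next_task()
queue.report(task, succeeded=False)  # goes to the back of the queue
```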
The basic modules, components, code, etc. of a data scraper system 1400 which implements a data acquisition (DAC) function are shown in
DAC Manager 1492
DAC Scan Engine 1494
DAC Application Scanner 1496
DAC Scheduler 1498
It will be understood that other functions and elements can be included, and that the present exposition is intended simply to denote the key features of a preferred embodiment.
DAC Manager 1492
DAC Manager 1492 is a set of one or more routines that launches, controls, and reports the status of DAC Scan Engine instances (described below). Preferably it has a web service interface that exposes commands, such as “start scan”, “pause scan,” “resume scan,” etc. Other exposed methods report on the status of a scan.
Through exposed services, DAC Manager 1492 can provide a very flexible interface through which humans or other application programs can control and track the status of scanning activities done in connection with a web portal 128 and associated databases 126. The system is preferably configured so that anyone with a browser can work with DAC Manager 1492. DAC Scheduler 1498 also is configured to trigger scanning activity through a separate DAC Manager's Web service interface (not shown). Any other software program, written on any platform, could also interact with DAC Manager 1492 through exposed services.
The highest level unit of work preferably used by data scraper system 1400 is a “scan batch.” In the case of PAIR data, a scan batch preferably consists of one or more identifiers (Application Numbers, Patent Numbers, Control Numbers, etc.). The identifiers in the batch can be an explicit list, or they may be defined by certain characteristics, such as “the next 200 serial numbers for an application prefix (e.g., 95/) greater than the highest number found previously by the DAC.” Batches have further attributes that guide how the data will be obtained. For example, one batch could include instructions to download copies of files, while others might not download files. There are many different types of batches, and DAC Manager 1492 preferably exposes methods to launch each of them. For the sake of history, each batch can also be given a name and a brief description.
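A scan batch of this kind could be represented by a small record type such as the one below. The field names, the six-digit serial format, and the `resolve` method are illustrative assumptions; the sketch only shows how an explicit list and a rule-defined range ("the next N serial numbers above the highest seen") can share one structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ScanBatch:
    name: str
    description: str
    identifiers: List[str] = field(default_factory=list)  # explicit list, if any
    prefix: Optional[str] = None   # e.g. "95/" for a rule-defined batch
    count: int = 0                 # how many serial numbers the rule covers
    download_files: bool = False   # whether copies of files should be fetched

    def resolve(self, highest_seen: int) -> List[str]:
        """Return the concrete identifiers to scan.

        An explicit list takes priority; otherwise the next `count` serial
        numbers above `highest_seen` are generated for `prefix`.
        """
        if self.identifiers:
            return list(self.identifiers)
        start = highest_seen + 1
        return [f"{self.prefix}{n:06d}" for n in range(start, start + self.count)]


explicit = ScanBatch("recheck", "Two known cases", identifiers=["95/001000", "95/001001"])
rolling = ScanBatch("discover", "Next three serial numbers", prefix="95/", count=3)
```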
When a user or program asks DAC Manager 1492 to start scanning a batch, the DAC Manager 1492 preferably performs the following high level operations:
DAC Scan Engine 1494
DAC Scan Engine 1494 is a set of one or more routines that manages all of the processing activity related to a scan batch. To do so, it preferably:
DAC Application Scanner 1496
DAC Application Scanner 1496 is a set of one or more routines that actually visit and interact with the web portal in question, which in a preferred embodiment is the PTO site. It is the component that obtains data and stores it to either a database or other file system on server 1491. Each DAC Application Scanner instance is preferably responsible for scanning and saving the data for one application.
To visit the PTO site, each DAC Application Scanner 1496 preferably instantiates an instance of Internet Explorer or some other conventional browser. DAC Application Scanner 1496 code causes the browser to interact with the various tabs, lists, links, buttons, etc. on the PTO site and retrieves information from the PTO Web pages for storage in the database. The DAC Application Scanner is sensitive to information that has been obtained on previous scans and preferably downloads only information that has changed since the previous scan of a particular application. DAC Application Scanner 1496 stores retrieved data in such a manner that a full history of changes can be produced, and the system can identify when each change was detected.
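One simple way to keep a full history of changes with detection times is to store a new version only when the retrieved content differs from the most recent stored version. The sketch below compares SHA-256 digests of the retrieved text; the class name and storage layout are hypothetical. Note that it detects changes after retrieval, whereas the scanner described above preferably avoids downloading unchanged information in the first place.

```python
import hashlib
from datetime import datetime, timezone


class ScanHistory:
    """Keeps every distinct version of each (application, tab) with its detection time."""

    def __init__(self):
        self._versions = {}  # (application, tab) -> [(detected_at, digest, content)]

    def record(self, application, tab, content, detected_at=None):
        """Store content if it changed since the last scan; return True if stored."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        versions = self._versions.setdefault((application, tab), [])
        if versions and versions[-1][1] == digest:
            return False  # unchanged since the previous scan
        when = detected_at or datetime.now(timezone.utc)
        versions.append((when, digest, content))
        return True

    def history(self, application, tab):
        """Return (detected_at, content) pairs, oldest first."""
        return [(when, text) for when, _, text in self._versions.get((application, tab), [])]


log = ScanHistory()
log.record("95/000,001", "transactions", "Request filed")
log.record("95/000,001", "transactions", "Request filed")  # ignored: no change
log.record("95/000,001", "transactions", "Request filed; Office Action mailed")
```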
DAC Scheduler 1498
DAC Scheduler 1498 preferably consists of two main parts:
1. A standard Windows Task Scheduler, which allows an operating system of server 1491 to trigger activities on a timed or scheduled basis.
2. A DAC Scheduler application, which preferably:
a. can accept a series of parameters defining a scan batch; and
b. can call an appropriate DAC Manager Web service method to launch the batch scan when the program is run.
Together, these components permit important scans to occur on a regularly scheduled basis, without human intervention.
A flow diagram of the operations performed by a preferred embodiment of a scraper system 1400 which can be used for retrieving data from online databases is provided in
As alluded to earlier in an alternative embodiment a data scraping system 1400 can include code injected into pages viewed by a web browser and operations running on a data collection server. The injected code in the alternative embodiment preferably selects, processes and sends information to the collection server and interrogates the collection server for further scraping tasks.
As noted above, a data collection server 1500 (
With reference to
a. It requires no additional third party tools, installation, or configuration.
b. The API stays in synch with new versions of the browser.
c. There is no reliance on templates, which may become dated if there are any structural modifications to pages on a target site. Even if significant modifications are made to the web portal page structures, such as the addition of a new navigation bar, navigation path changes, etc., the DAC 1400 can continue working, because it is programmed to look for specific individual page elements.
d. Unlike the alternative embodiment described below, the primary DAC 1400 embodiment does not rely on a Page Load event. Rather the primary embodiment relies instead on interfacing with browser APIs, not just the HTML/JavaScript code in the browser. Examples of browser level API events include OnNavigated, OnDocumentCompleted, and OnNavigateError. This provides much finer grained control and responsiveness than JavaScript, which operates outside of the primary DAC code and thus cannot provide the same level of error reporting and other services.
Nonetheless in an alternative embodiment again, the data scraper 1400 can be implemented through an add-on to programs with browser-like capability, including Firefox. The add-on permits the injection of code into selected web pages beginning with event 1401. The add-on filters pages by URL and injects code into pages that match a URL template. In this embodiment, as seen in
As noted above a primary embodiment of data scraper 1400 includes a number of individual data acquisition (DAC) application scanner 1496 instances, each of which is responsible for scanning and saving the data for one application. To visit a site, such as the PTO website, each DAC Application Scanner 1496 instantiates an instance of Internet Explorer. The DAC Application Scanner 1496 code causes the browser to interact with the various tabs, lists, links, buttons, etc. on the PTO site and retrieves information from the PTO Web pages for storage in various databases noted herein. Each DAC Application Scanner 1496 is preferably sensitive to information that has been obtained on previous scans so as to download only information that has changed since the previous scan of that application. The DAC Application Scanner 1496 also preferably stores the data in such a manner that a full history of changes can be produced, and the system can identify when each change was detected.
The DAC application scanner 1496 thus inspects the content of a matched page at 1405 and takes actions appropriate for that page. For example, in the case of a PAIR database or similar database maintained by a governmental agency, the PAIR pages will be either a search page, a CAPTCHA page or a data page. In the primary embodiment of the DAC 1400, a call is automatically made to a Web service which presents the image to a human who can resolve the CAPTCHA (or other similar Human Interactive Proof) at a remote location. The response from the Web service call contains the CAPTCHA solution, which is then passed directly back to the target site as a normal Web request.
In the alternative embodiment described earlier, if the presented page contains a CAPTCHA at 1415 the injected code creates an alarm 1410 (which may be visual, audible, etc.) and stops to allow a data collector to solve the CAPTCHA and continue the scraping process. As with the primary embodiment note that in some instances the CAPTCHA solving can be performed by another remote computing system (not shown). It will be understood that in the event a CAPTCHA is not utilized, this section of the scraper is not required. Furthermore it will be appreciated by those skilled in the art that the nature of the pages will be a function of the particular database being examined by the data scraper.
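The page-type decision described above (search page, CAPTCHA page, or data page) can be illustrated by looking for specific page elements rather than matching a whole-page template. The element ids `captcha_image` and `search_form` are invented for this sketch and are not taken from any actual site; the parser is the standard-library `html.parser`.

```python
from html.parser import HTMLParser


class _IdCollector(HTMLParser):
    """Collects the id attribute of every element on a page."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def classify_page(html):
    """Return 'captcha', 'search', or 'data' based on hypothetical marker ids."""
    collector = _IdCollector()
    collector.feed(html)
    if "captcha_image" in collector.ids:
        return "captcha"  # pause and hand off to a human or solving service
    if "search_form" in collector.ids:
        return "search"   # ask the collection server for the next case
    return "data"         # extract and transmit the page contents


kind = classify_page('<div><img id="captcha_image" src="challenge.png"/></div>')
```

Because only individual elements are inspected, unrelated layout changes elsewhere on the page do not affect the classification.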
If the presented page is a PAIR search page at 1430 the DAC Case Scanner 1496 sends a request 1435 to the data collection server 1500 asking for the next case (e.g., a patent application, a patent reexamination, or some other assigned reference number) to be scraped at 1445 and as seen in
Looking at
In some instances access to the database may be restricted by the online database proprietor to preclude or inhibit automated data collection. This is done typically to prevent an automated data acquisition from using excessive bandwidth or computing resources. To avoid triggering such exclusions or restrictions, embodiments of data scraper system 1400 can be imbued with logic to mimic human browsing patterns. In particular the DAC Case Scanner is configured so that:
a. It knows which tabs it is interested in. For performance and other reasons, it ignores all other material that is available.
b. It navigates to each relevant tab explicitly, but in a random order, just as a human would. Thus, the Case Scanner does not land upon random pages.
c. If a desired tab is not present (which happens randomly at certain sites, including the PAIR site), it moves to the next tab in the random sequence.
d. When the DAC has visited (or attempted to visit) each of the target tabs, it indicates to the Scan Engine that the scan is complete. This makes a thread available for the Scan Engine to launch a new DAC Case Scanner for the next sequence number.
e. It may visit pages other than the PAIR main tabs, as some of the links on these tabbed pages point to other sources of information.
f. A time period for selections of tabs, links, downloads, etc. can be configured to be comparable to that of a human being. In other instances the timing or sequence can be randomized.
Note that to determine the human characteristics to be emulated, surveys/monitoring of actual users can be performed, or in instances where server logs are available, they can be reviewed to identify a baseline human behavior for these actions. In this manner data scraper 1400 does not present any additional burden on a host computing system beyond that of a footprint presented by a typical human operator, and thus is less likely to be flagged for excessive use or blocked from access.
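The browsing behavior listed above can be illustrated with a minimal Python sketch. The tab names, the delay range, and the `visit` callback are assumptions introduced for illustration only; they are not taken from the PAIR site or from the described implementation.

```python
import random
import time
from typing import Callable, Iterable, List, Optional

# Hypothetical tab names; the real set of target tabs is site-specific.
TARGET_TABS = ["Application Data", "Transaction History", "Image File Wrapper"]


def scan_case(
    visit: Callable[[str], bool],
    tabs: Iterable[str] = TARGET_TABS,
    min_delay: float = 2.0,
    max_delay: float = 8.0,
    rng: Optional[random.Random] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Visit each target tab once, in random order, pausing between visits.

    `visit(tab)` is a caller-supplied function that returns False when the
    tab is not present, in which case the scanner moves on to the next tab.
    Returns the tabs that were actually visited.
    """
    rng = rng or random.Random()
    order = list(tabs)
    rng.shuffle(order)  # random order, but only over the known target tabs
    visited = []
    for tab in order:
        sleep(rng.uniform(min_delay, max_delay))  # assumed human-scale pause
        if visit(tab):
            visited.append(tab)
    return visited
```

In practice the delay bounds would presumably be set from the surveys or server logs mentioned above rather than from the placeholder values used here.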
PAIR (and similar governmental databases which have limited reliability) frequently fails to display requested data, often presenting a page containing an error message. The DAC Case Scanner 1496 preferably automatically attempts to navigate back to a usable page upon such a PAIR error. Note that prior to extracting data optional benchmarking tests can be performed to determine a reliability and/or loading of the associated PAIR system. In instances where such system appears unreliable or overloaded the scraper 1400 can defer data collection to another time to comply with any regulatory restrictions on data access limits and reduce errors.
An Image File Wrapper (IFW) tab is treated differently when examining PAIR records. When the IFW tab is selected additional code is executed that selects and initiates the download of portable document format (PDF) files. The downloaded PDF files are preferably held on the machine where the browser is running so that they can be collected and split up (if needed) and stored later on the machine where the data collection server is running. Other image or data files may be collected as well for the purpose of optical character recognition to enhance data review of the governmental agency database.
The DAC Case Scanner 1496 (
As alluded to above, in some embodiments a table, log or similar history structure can be maintained to allow reconstruction of what appeared at the website (in a preferred embodiment PAIR) each time an application was scraped, and without requiring the DAC application scanner to download the full set of data each time. If the data acquisition module detects no changes, it does not spend any further time downloading large items, e.g., IFW documents or the like. Moreover if changes are identified, the data acquisition module downloads only those specific elements that have changed. By marking any changes (deltas) in one or more of the data capture databases 142, very precise alerts can be logged/generated.
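One way to picture this change detection is the following Python sketch. It assumes that a lightweight summary of each element (for example, a row of a document listing) can be read without downloading the large item itself, and that a digest of each summary was stored at the previous scrape. The element names and the use of SHA-256 are illustrative assumptions, not details from the described system.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: bytes) -> str:
    """Return a stable digest used to compare two captures of an element."""
    return hashlib.sha256(content).hexdigest()


def changed_elements(previous: Dict[str, str], current: Dict[str, bytes]) -> List[str]:
    """List the elements whose content differs from the last recorded scrape.

    `previous` maps an element name to the digest stored at the last scrape.
    `current` maps an element name to the content seen now.
    Elements with no stored digest are treated as changed.
    """
    return sorted(
        name
        for name, content in current.items()
        if previous.get(name) != fingerprint(content)
    )
```

Only the names returned by `changed_elements` would then be queued for a full download, and each one could be logged as a delta for alerting.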
The data collection is driven by a data request queue 1565 (
The update request queue is managed by a DAC Scan Engine 1540 and is constructed preferably by assigning a priority to each known control number. For example in the preferred example of a post grant challenge such as a reexamination, cases that have been completed (e.g. with a Reexam Certificate Issued event) are given a small priority while active reexams may be given a higher priority. In some instances a priority level may be based on how recently there has been some activity—again other examples will be apparent to skilled artisans and it is expected that the particular implementation will be domain dependent.
A data collector 1500 can determine how many reexams to scrape, and submit the number at 1535 to the data collection server. In some embodiments the server can randomly select a priority scheme with DAC Scan Engine 1540 so that many different reexams to be scraped are assigned a likelihood of selection proportional to the priority of the reexam. In the case of customer driven selection schemes, a priority may be assigned in accordance with a particular customer's status with a service provider collecting the proceeding data. When all requests have been processed a completion message can be generated at 1550.
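The priority-proportional selection described above might be sketched in Python as follows. The function name, the sampling-without-replacement rule, and the example control numbers are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def select_cases(
    priorities: Dict[str, float],
    count: int,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick up to `count` distinct cases, each draw weighted by priority."""
    rng = rng or random.Random()
    # Cases with zero or negative priority are never selected.
    remaining = {case: p for case, p in priorities.items() if p > 0}
    chosen: List[str] = []
    while remaining and len(chosen) < count:
        cases = list(remaining)
        weights = [remaining[case] for case in cases]
        pick = rng.choices(cases, weights=weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # sample without replacement
    return chosen


# Hypothetical control numbers: one completed case (low priority), two active.
example = {"90/000001": 1.0, "90/000002": 10.0, "90/000003": 10.0}
batch = select_cases(example, 2)
```

With these weights the completed case is still selected occasionally, which matches the idea that low-priority cases are checked infrequently rather than never.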
The bootstrapping of the data is thus followed by periodic updates, which are preferably performed on a daily basis to ensure that new materials are brought to the attention of the system users as quickly as possible. For example in a reexamination context, the PTO transaction records and/or image files in databases 126 (
For example, the USPTO PAIR site is not searchable, and does not even identify the most recent cases that have been filed/initiated. Thus they must be identified automatically using an auto-discovery technique. To do this, the present system preferably starts with the most recent confirmed case (which hypothetically has an assigned reference number of XXX) and then increments the reference number by some constant (which can be 1, 2, etc.) to locate additional records/filings in the PAIR database. The incremented case number (XXX+1) is then “searched” (under control of monitoring logic 180 and catalog logic 170) to see if there is a corresponding record in database 126. Using this indirect approach the present system can glean the state of events within target organization 115 (here the PTO) without direct access to the latter's internal databases. Again, it will be understood that other techniques may be used in the event the target organization does make its data available through more conventional access routes.
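A minimal Python sketch of this auto-discovery loop follows. The `exists` callback stands in for a lookup of a reference number at the target site, and the `max_misses` stopping rule (tolerating gaps in the numbering) is an assumption added here, since the text does not say when probing should stop.

```python
from typing import Callable, List


def discover_new_cases(
    last_known: int,
    exists: Callable[[int], bool],
    step: int = 1,
    max_misses: int = 5,
) -> List[int]:
    """Probe reference numbers after `last_known` and return those found.

    Probing stops once `max_misses` consecutive candidates have no record.
    """
    found: List[int] = []
    misses = 0
    candidate = last_known + step
    while misses < max_misses:
        if exists(candidate):
            found.append(candidate)
            misses = 0  # a hit resets the run of misses
        else:
            misses += 1
        candidate += step
    return found
```

The highest number in the returned list would then become the most recent confirmed case for the next discovery pass.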
The updates are also preferably performed using some form of prioritization. That is, the system may designate certain cases as active or inactive depending on whether they have reached a termination event. In such instance, cases with a termination event may not be checked or serviced very frequently as there are unlikely to be further developments. It is possible, of course, that some exceptional events may occur, and for that reason it is desirable to check periodically, even if on an infrequent basis. However, to conserve resources, bandwidth, etc., it is preferable to prioritize updates in accordance with a probability model (discussed below in more detail) that determines which cases are most likely to be the subject of an update at any particular moment in time.
The operation of the data collection server is shown in
The probability model preferably studies events within the target organization to determine their relative temporal relationship. For example it may be determined, from analyzing all or at least selected ones of the organization's cases that a first event (E1) 122 generated within target organization 115 by logic 120 (
The present system uses this information to assign a priority for researching and updating databases 142 in accordance with an update schedule. In other words, in the absence of some over-riding consideration, the system would construct an update schedule by considering the highest likely event (E1, with a likelihood of X) to occur (for case Cx) and would start the review of databases 126 and updating of databases 142 using this case first, and then progress through the entire set of cases until completing the list of cases, or some other marking point. For example, it may be decided that only cases above a certain threshold probability Y should be evaluated on a daily basis, while cases below this should be checked less frequently, say on every second day, every week, etc.
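The threshold-based schedule described above can be sketched in Python as follows. The case identifiers, the default threshold, and the two-tier split are illustrative assumptions; a real schedule could use more tiers.

```python
from typing import Dict, List, Tuple


def build_update_schedule(
    event_probability: Dict[str, float],
    daily_threshold: float = 0.5,
) -> Tuple[List[str], List[str]]:
    """Split cases into a daily list and a less-frequent list.

    Both lists are ordered from most to least likely to have a new event,
    so the most likely case is always reviewed first.
    """
    ranked = sorted(
        event_probability, key=lambda case: event_probability[case], reverse=True
    )
    daily = [c for c in ranked if event_probability[c] >= daily_threshold]
    less_frequent = [c for c in ranked if event_probability[c] < daily_threshold]
    return daily, less_frequent
```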
Other schemes will be apparent to those skilled in the art from the present teachings for generating an update schedule. The system preferably is knowledgeable of the current state of internal rules 116 and procedures 118 as part of the probability model estimation evaluation. As these are typically published and available to the public, it is not difficult to incorporate them on a dynamic basis as part of the probability model. For example, after filing a reexamination, the USPTO has a certain number of days (90) fixed by regulations, to issue an initial determination. Therefore, there is a strong correlation between such events which is easily identifiable, and if such regulations are varied (to change the time to say 60 days), the system should quickly learn to revise the models based on this dynamic parameter rather than solely prior historical information. Other organizations may have similar temporal restrictions which can be gleaned and exploited this way to optimize a prioritization of cases in an update schedule.
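As a hedged illustration of treating a regulatory time limit as a dynamic parameter, the Python sketch below keeps the limit in a small configuration table that can be edited when a rule changes. The event names and the table layout are assumptions; the 90-day and 60-day figures are the ones used in the text.

```python
from datetime import date, timedelta
from typing import Dict, Tuple

# Assumed configuration: days allowed between two events under current rules.
REGULATORY_WINDOWS: Dict[Tuple[str, str], int] = {
    ("request_filed", "initial_determination"): 90,
}


def latest_expected_date(
    first_event: str,
    next_event: str,
    first_event_date: date,
    windows: Dict[Tuple[str, str], int] = REGULATORY_WINDOWS,
) -> date:
    """Return the last date on which `next_event` is due under the rules."""
    return first_event_date + timedelta(days=windows[(first_event, next_event)])


# If the rule changed to 60 days, only the table entry would need updating.
revised = {("request_filed", "initial_determination"): 60}
```

A case approaching the returned date could then be moved up the update schedule regardless of what the historical data alone would suggest.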
In other instances it may be desirable to monitor the usage patterns of the users of analyzer 110, and to base a prioritization of updates on such usage instead. In other words, if system 110 notes that certain cases are widely accessed, it may bump the priority of such case on an update schedule. To do this, the system can simply log the accesses/queries performed in connection with a certain case (say Ct). These can be sorted, again, into table form to identify a relative popularity/interest level within the community for the cases being handled by target organization 115:
Thus rather than using a probability score as noted above, the system may in some instances use a popularity or interest score for the update schedule. Alternatively some mix of the two can be used in accordance with system objectives, performance, etc., and after routine testing. In such embodiments the popularity/interest score may be used to augment, modify or modulate the probability determinations made for the cases as noted above, to generate a modified update schedule that reflects and incorporates community interest as well.
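One simple way to mix the two scores is a weighted linear blend, sketched below in Python. The linear form, the `weight` parameter, and the normalization of access counts by the largest count are assumptions chosen for illustration; the text leaves the exact mix to routine testing.

```python
from typing import Dict


def normalize_access_counts(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw access counts to the range 0..1 by the largest count."""
    top = max(counts.values(), default=0)
    if top == 0:
        return {case: 0.0 for case in counts}
    return {case: n / top for case, n in counts.items()}


def blended_score(probability: float, popularity: float, weight: float = 0.5) -> float:
    """Mix an event probability with a popularity score.

    weight=1.0 uses the probability only; weight=0.0 uses popularity only.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * probability + (1.0 - weight) * popularity
```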
Returning to
From the set of documents at the target organization (and other sources) the system identifies a subset of key documents at step 310. Again, the system may decide to filter, ignore, or prioritize the intake of documents to give more importance to some types of documents over others. As an example, within a particular case, a first type of document (e.g., a reexamination request) may be processed more quickly than a second type of document (e.g., a status request letter or the like). Other examples of prioritization will be apparent to skilled artisans.
At step 314 the documents are preferably coded in some convenient fashion to make them easier to index, sort, query and/or analyze as noted below. It may be desirable, for example, to segment and categorize the documents in a different fashion than they are found natively within database 126 (
As is to be expected, in some instances the documents relating to cases handled by the organization may not always be in readily accessible form in a database 126. In such instances it may be necessary to manually inspect, retrieve and scan the documents for a file/case to ensure completeness as seen in step 316. Since the USPTO records are available to the public, it is not expected that this would present a significant burden. Moreover in some cases it may be possible to secure copies of some documents in this fashion which are not otherwise accessible through the PAIR site.
A sorting operation and preferably an OCR operation is performed on selected documents at step 320. Since many of the target organization documents are not available in text form, this aspect of the invention permits users to search and locate information across cases in a manner that is not possible at this time.
From this data one or more customized case databases 335 and associated indices are constructed at step 330. The customized databases may be in the form of separate files, tables, etc., and may be configured using any number of known techniques.
A user may then query the customized database at step 340, using any number of desired terms or filters. As seen in
At step 350 the system preferably logs the user's request/query into a review database, which can be part of the databases 142 mentioned earlier. Here the system can compile user access to specific cases, specific documents, etc., which can be used for prioritizing an update schedule (noted above). In addition this information can be used for optimizing a content mix for the site. For example, it may be discovered that the cases for a particular company are widely studied, and this can affect and determine the types of press releases and other external data that is accessed, retrieved and catalogued for general consumption.
In some specific instances the user can configure (or the system may autoconfigure) an alert for the cases reviewed by the user, or even other specific cases designated by the user at step 360. In other words if the system notes that the user accessed and studied a particular case, it can then create a programmed alert to inform such person of future events for such case. The alerts can be programmed to use any number of desired delivery options as described below in connection with
In general, the system allows a user to ask questions or request prediction outputs for timing, outcomes and recommendations. As an example, the user can provide a case number, and ask for a prediction of timing for the next event in the case, or for timing of a specific event. More specifically, the user can provide a reexamination control number, and be informed that the next likely event is a patent owner response within the next N days. Alternatively the user can request a prediction of when a particular event (such as a reexamination certificate issuance) is likely to occur. In other instances the user can ask about cases that are still pending to get a sense of their timing, and an expected issue date for a case. Again other examples will be apparent from the discussion herein.
Thus at step 410 the user specifies one or more of the above variables using any well-known conventional graphical interface. The input mechanism can make use of any desired input tool, such as using a numeric input, a text input, a graphical input, etc. In some instances, as noted, the user can specify a document (by reference number) or even upload a specific file for the system to consider.
During step 415 the system analyzes its databases (see
In addition it is possible to segment a user/subscriber base in accordance with a user status level, so that different types of users are given prediction information that is a function of such level. A user with the highest level, for example, may be provided a detailed calculation that takes into account a first number of variables that is much larger than that used for a user with a lower status level. In the latter case a base simple calculation might be used from a lesser number of variables. In this respect, users can elect what type of treatment or information they wish to receive.
At step 420 the user can invoke the prediction engine to process the aforementioned input variable, and, using the comparables noted above, combined with simple Bayesian logic, Hidden Markov Modelling or other known technique generate one of any number of desired prediction results. As noted earlier the prediction engine 420 also preferably considers loading and regulatory changes as noted at 422 in rendering a prediction. The predictions can be logged within a database 570 (
The results can be generated/output at step 430 in accordance with any desired visual scheme appropriate for the data set, including list form, graphs, charts, etc. Where graphs/charts are used, the system can annotate the output to indicate statistical confidence levels and the like for the benefit of the user. As noted earlier, the prediction type may include one or more of:
a) a timing or date for an event 442; this may be expressed as a date, a number of days, or a window of time; in very basic instances this can be used to identify the likely date for issuance of a reexamination certificate, or expected date of issuance of a still pending application.
b) an outcome or a percentage likelihood of such outcome or event 444; this may be expressed as a positive or negative result, and/or in numeric or visual form for the different possible outcomes; the results may be classified in the aggregate, or may be broken down if desired on a claim by claim basis. The outcome may also indicate whether the system predicts that the claims will require modification (or amendment) to survive the reexamination process. In other embodiments the system may identify whether a pending application is likely to result in an issued patent.
c) a recommendation or suggestion 446; this may be used by practitioners and expressed as an express suggestion to add/remove a rejection type, amend one or more claims, or modify a proposed document in accordance with insights gleaned from comparable documents;
d) an estimate of the number of manhours and/or cost associated with the case, or which are required to complete the case. This can be based on evaluating the current state of the case, the number of documents/pages associated with the case, studies and/or surveys of average billing rates, reexamination costs, etc. Large entities may find this information particularly useful for budgeting and planning purposes, and it will be understood that the principle could be applied to regular prosecution as well, so that IP managers can get a firm grasp on expected upcoming legal expenses needed to support a group of patent applications.
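As one illustration of how the percentage likelihood in item b) could be derived from comparable cases using simple Bayesian logic, the Python sketch below computes the posterior mean of a Beta-Binomial model. The choice of this particular model and the uniform prior are assumptions made here; the text does not specify which Bayesian formulation is used.

```python
def outcome_likelihood(
    favorable: int,
    total: int,
    prior_alpha: float = 1.0,
    prior_beta: float = 1.0,
) -> float:
    """Estimate the probability of a favorable outcome from comparables.

    `favorable` is the number of comparable cases with the outcome of
    interest and `total` is the number of comparable cases considered.
    The Beta(prior_alpha, prior_beta) prior keeps the estimate away from
    0 or 1 when only a few comparables exist.
    """
    if total < 0 or not 0 <= favorable <= total:
        raise ValueError("need 0 <= favorable <= total")
    return (favorable + prior_alpha) / (total + prior_alpha + prior_beta)


# With no comparables the estimate falls back to the prior mean of 0.5.
baseline = outcome_likelihood(0, 0)
```

As actual results are fed back from pending cases, the counts grow and the estimate is pulled toward the observed rate.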
While some exemplary predictions were noted above, it will be apparent that based on the databases maintained in the system (see
The system can provide a prediction of all events expected as output for some or all of a set of cases currently pending across one or all Group Art units. In effect, this can be used to forecast a cumulative output of the target entity within any desired target window.
Alternatively, the system could predict when a future, hypothetical unfiled case would be likely finalized by the target entity. This could be used for planning to evaluate options vis-à-vis using the reexamination process versus a litigation action.
The system can provide a prediction of all events expected as output for some or all of a set of cases currently pending with an Examiner. In effect, this can be used to forecast a cumulative output of the Examiner within any desired target window.
The system can provide a prediction for certain types of documents which can be characterized or associated with a definite resolution or outcome. The resolution or outcome may be binary, or may be expressed as percentages, etc. The documents may be as simple as the result of a petition for an extension of time, or a more complex document such as the Request itself. The document in question can be compared statistically to prior documents similar to it using any number of metrics as part of the evaluation. In effect, this can be used by interested parties to understand the expected potential outcomes, and therefore how the target entity's process should be factored into other proceedings. For example the prediction tool 400 may estimate—based on analyzing the content of the Request itself—that the chances of a particular reexamination succeeding in invalidating a patent are extremely high or extremely low. This calculation is useful when the case is relatively young, and can be part of an evaluation process for estimating the value of an underlying patent, settlement with a patent owner, etc.
The system can provide a prediction for the resolution of a particular case at a later stage of proceedings, which, again, can be characterized or associated with a definite resolution or outcome. The resolution or outcome again may be binary, or may be expressed as percentages, pie chart allocations, etc. This may be based on a number of factors, including as noted a total number (or at least selected ones of) documents found in the reexamination proceedings. The documents in question can be compared individually statistically to prior documents similar to them using any number of metrics as part of the evaluation, or can be evaluated as a whole against other collections. As before this can be used by interested parties to understand the expected potential outcomes, and therefore how the target entity's process should be factored into other proceedings.
In the situation where a case represents a still pending application, some embodiments of the invention can monitor PTO documents to identify basic documents such as notices of allowance, issue date notifications, etc., to provide estimates for when a particular application will issue as a patent. This feature in effect acts as a form of early patent radar to alert a user to the potential impact of new intellectual property. It is expected that businesses will avail themselves of this option to gain competitive intelligence, insights, etc., on competitors who may be in the process of securing new patents in the immediate future.
Other types of predictions may be based on hypothetical or simulated cases or documents (or portions thereof) which have not been filed, for the purpose again of developing optimized strategies and understandings of the target entity. This type of “what if” analysis can be used by a myriad of interested persons, including decision makers, litigation attorneys, prosecution attorneys, etc., to understand and formulate appropriate strategies.
This prediction tool can look at a target patent, and, based on its characteristics, determine a potential outcome and timing for a resolution. As is well-known, patents can be analyzed with respect to a number of different characteristics, including general technology area, specific classification, specification word content, claim wording/content, inventor pedigree, assignee name, priority date, citations, prior art cited, underlying Examiner, and many other factors known in the art. Using these characteristics the system can compare the target patent against all (or some selected group) of patents which have been subjected to reexamination to determine the probability of success, timing, etc. It should be noted that the outcomes can be specified with different degrees of granularity, so that for example, specific target claims can be examined within the target patent, along with the patent as a whole.
The data predictions can be used, of course, in a converse manner to provide comparative reports to visualize a breakdown of timing/outcomes for patents based on the above characteristics. For example, a user can ask to have a plot of claim length versus rejection rate, rejection type, timing, etc.
This prediction tool allows the user to specify a quantity and type of rejection to determine potential timing and outcomes. For example the user can identify/compare the difference between having a single 102 rejection for a claim, compared to having two or more of the same type. Or, the user can specify multiple types of rejections (102, 103 and 112). In this manner the user can review relevant references, claims, etc., and determine an appropriate strategy for a contemplated filing.
As above the data predictions for rejections can be used, of course, in a converse manner to provide comparative reports to visualize a breakdown of timing/outcomes based on the rejections. For example, a user can ask to have a plot of rejection types versus outcomes, timing, etc. The user may use such information to identify an optimal mix of proposed rejections to make so that they can ensure a more careful and longer examination period in the target organization. This can save expected fees, as well, as the drafter of the reexamination may determine that certain rejections are unlikely to be successful for a particular patent, and thus if they are only marginal to begin with they can be removed.
Here the prediction tool allows the user to provide/upload a document that they propose introducing into a particular case. The document is analyzed against other comparable documents to identify characteristics that would tend to indicate its utility, potential for successful outcome, and timing. Both statistical and semantic processing can be employed to analyze word, sentence choices, etc. Natural language processing of documents to identify similarities and differences is well-known, and any number of suitable techniques could be employed herein. For example the user can author/generate a petition for an extension of time, upload an electronic file with the contents, and have the system predict a likelihood of success of such petition being granted. In addition, the system can identify/flag other petitions that are most like the user's (from a content perspective) to help him/her identify examples of outcomes that are favorable or unfavorable.
As with example #7, this prediction tool can parse a specific type of document, namely, a completed submission/request, to identify its potential for success, and an expected timing for resolution. As before, the Request can be analyzed against other comparable documents using natural language techniques to identify characteristics that would tend to indicate its potential for successful outcome, and timing.
Again it should be noted that all of the above predictions may be based on a variety of factors, including a current dynamic loading experienced by the personnel, historical/seasonal variations, etc. In a preferred embodiment the system continually receives feedback 450 of actual events and results from pending cases, as for example can be determined from database 560 (see
As can be seen above, the system allows the user to identify and exploit biases and predilections that would otherwise be obscured, if not invisible, based on day to day observation of the workings of the target organization. By analyzing the output and events in bulk and across the entire organization, the system identifies event correlations, document correlations, etc., which have hitherto not been seen since the data has not been compiled, maintained and dynamically evaluated in this manner. Using the prediction tool the user can also mix and match parameters to consider multiple constraint dimensions. For example the user can predict the optimal rejection types for a particular claim in a particular patent.
In addition to the USPTO as a target organization that can be evaluated for predictions, it will be apparent that other types of entities could be analyzed in the same way. For example, decisions of the Board of Appeals could be examined in the same manner, and the personnel of such entity (a panel of judges) similarly studied to identify correlations in behavior and processing of submissions (in this case, appeals). Thus, both the outcome and timing of appeals could be estimated by embodiments of the invention by analyzing appeal submissions, and a panel of judges reviewing the submission. Since the panels are frequently correlated with particular subject matter, it is not difficult to predict a panel composition for a particular case. In instances where the applicant requests oral argument the panel composition is in fact then defined for the user. Thus, given a set of judges, or an expected set of judges, the invention can be used to predict an outcome and timing for an appeal.
In other instances it is possible to further collect data on the personnel of the entity, such as by observing them in public when they are hearing oral arguments and the like. The resulting oral arguments are frequently captured in electronic form, and can be transcribed to identify content associated with the individuals. The oral arguments for the CAFC for example are kept in an online accessible database on their website and can be accessed there. The present system can be augmented with appropriate conventional speech recognition routines and supporting logic (not shown) to identify a prosody of each individual during a hearing (from the audio signal) on a continuous basis to determine a prosody or affinity score of the judge for the party in question. The content (questions, statements) made by the judge can be similarly identified using such routines and scored to determine a positive or negative valence/affinity for the party in question. The combination of content and prosody score can be used, in conjunction with historical information for the judges, to determine a likelihood of a resolution of the case in favor of one party or the other. Historical information can be compiled for each judge to identify signature word content choices, key prosody identifiers, etc., along with outcomes for the cases, and timing tags to indicate when/for whom the statements were made in the context of the hearing. For example it may be determined that a particular judge's use of certain expressions is strongly correlated with ruling for/against a certain party. By analyzing these correlations across a wide set of rulings the invention can be used to study a judicial organization for its predilections and observable biases.
In still other embodiments it can be expected that product reviews, stock reviews, etc., can be predicted from entities that provide such functions. Other administrative agencies which use well defined procedures could be modeled as well.
As seen In
A first type of database is a mirror/backup database 550. The purpose of this database is to try to emulate or approximate the content of database 126 as closely as possible. This allows for creating a second access path to the target organization data through bandwidth, routing, etc., that does not impose a processing load on the entity in question, but which may be more optimized for public consumption. For this reason it is expected that embodiments of the invention will be attractive to individuals and other companies who want to access the target organization's data in a more robust manner, and/or through faster connections, without worry that the data is not a 100% duplicate. Note that in some instances it may be desirable to “scrub” database 550 so that obvious errors are removed, while in some embodiments it may be required to duplicate the content exactly, including with any blatant errors.
In a preferred approach a main system master case file and document database system 551 is also maintained. This database (which is comprised of multiple databases) has a structure/format optimized for ensuring rapid retrieval of relevant documents for the cases, and thus may vary significantly from that used by the mirror database 550. Therefore the raw data for each case is maintained here (with events and links to events), which may be sanitized or corrected for obvious errors. In addition the original documents (or links to the documents for retrieval from another storage system) are stored within this database system to permit users access to the image files typically associated with PAIR, such as stored typically in Acrobat PDF versions. Furthermore, as noted above, text indices are also actively maintained and constructed for each document OCRd by the system to ensure text based and search predicate based querying of the underlying content in the submissions.
A master case db 515 includes records used by the system with data that is extracted from the original case records found in database 126 as well as other data generated by the system to identify a case. For example each case may be given a system reference number that is matched to a case submission number for the target organization (i.e., a control number within the PAIR context). Other fields within the master case db may include patent numbers, inventor names, Examiner names assigned to the case, attorney names, assignees, etc. Result data may be retained for each case as well, including disposition and histories of claims rejected, the basis therefor, and final dispositions of the cases, such as whether the reexamination/reissue resulted in a certificate, a final rejection, etc. In the case of pending applications similar information would be maintained as well.
An Examiner database 520 contains names and other profile data associated with the personnel of target entity 115, and their respective work groups (which can be art units). The profile data may include a list of cases worked on, and other survey data collected as noted earlier.
A production/loading database 525 tracks loading of the organization by case with time and by Examiner. This can be done in any convenient fashion, including through chronological snapshots which create a record, for each time period, of the cases being actively worked on, and the identity of the Examiners involved. Other techniques for determining the loading can also be used by correlating other data as noted below. It will be understood that the time period can be set to any desired value (daily, weekly, etc.) to monitor the organization's productivity and loading.
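By way of illustration only, the chronological snapshot approach might be sketched as follows. The `Snapshot` record, its field names and the `loading_by_examiner` helper are hypothetical and are not taken from the described system; the sketch simply counts the distinct cases each Examiner is actively working on in each period.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Snapshot:
    """One hypothetical snapshot row: a case observed as active in a period."""
    period_start: date  # start of the snapshot period (daily, weekly, etc.)
    case_id: str        # system reference number for the case
    examiner: str       # Examiner working on the case in that period


def loading_by_examiner(snapshots):
    """Return {period_start: {examiner: number of distinct active cases}}."""
    cases = defaultdict(lambda: defaultdict(set))
    for s in snapshots:
        cases[s.period_start][s.examiner].add(s.case_id)
    return {
        period: {examiner: len(ids) for examiner, ids in per_examiner.items()}
        for period, per_examiner in cases.items()
    }
```

Because the period length is only implied by the `period_start` values, the same helper can serve daily or weekly snapshots without modification.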
The attorneys and agents working on the cases are also tracked in database 530. This can include basic information such as names, addresses, firms, registration numbers, requester/patent owners represented, number of cases in active status, etc., and can also include more advanced data such as the identity of all cases worked on, links and references to documents authored, success or failure rate in cases, and so on.
Administrative information associated with the running of the analyzer is stored in database 535. This can include basic information about users and subscribers, and may also include more advanced data identifying the state of the currency of the data in the system, the level of accuracy measured for the system, and so on.
User subscription data is maintained in database 540, including identifying information, plan level data, account balances, company affiliations, contact information, profiling information 260 provided (see
Database 552 is used to store crowdsource/vote data as noted above. Again identifying information for each contributor is kept, along with an indication of a case and a vote value provided for such case. For example, a user can provide a “vote” which is multi-dimensional and specifies: a) that a particular Examiner; b) will reject a particular case identifier; c) under a particular theory; d) on a particular date. Other forms of data can be received and processed as desired, such as a probability of success predicted for a specific case by the voter, and so on. Other examples will be apparent to those skilled in the art. Authentication information can also be maintained to minimize and reduce vote fraud.
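A multi-dimensional vote of the kind just described could, as one possible sketch, be represented by a record such as the following. The `Vote` class, its field names and the `dedupe_votes` helper are assumptions made for illustration; in particular, keeping only the last vote per contributor and case is merely one simple guard against repeat voting and is not stated in the description.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Vote:
    """Hypothetical record for one crowdsourced prediction."""
    contributor_id: str                 # identifies the voter
    case_id: str                        # case the vote applies to
    examiner: str                       # a) the particular Examiner
    predicted_rejection: bool           # b) whether the case will be rejected
    theory: Optional[str]               # c) theory of the predicted rejection
    predicted_date: Optional[date]      # d) date the action is predicted for
    success_probability: Optional[float] = None  # optional probability of success

    def __post_init__(self):
        p = self.success_probability
        if p is not None and not 0.0 <= p <= 1.0:
            raise ValueError("success_probability must be between 0 and 1")


def dedupe_votes(votes):
    """Keep only the last vote seen for each (contributor, case) pair."""
    latest = {}
    for v in votes:
        latest[(v.contributor_id, v.case_id)] = v
    return list(latest.values())
```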
Events are logged for each case in database 560. In a preferred approach, this database contains an entry for each event generated by target entity 110 for any case being processed. The events are preferably logged with reference to multiple indicia including some or all of the following: an event number; an associated case reference number; a system reference id number; an event classifier (e.g., what type of event occurred) and a time stamp. Other types of data may also be included if desired, including an entity responsible for the event, links to any documents associated with the event, and so on.
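The event indicia listed above might be captured, purely as a sketch, in a record like the one below. The `CaseEvent` and `EventLog` names, the in-memory storage and the field names are hypothetical; an actual embodiment could use any conventional database.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class CaseEvent:
    """Hypothetical event record mirroring the indicia described above."""
    event_number: int
    case_reference: str           # case reference number of the target entity
    system_ref_id: int            # system reference id number
    classifier: str               # what type of event occurred
    timestamp: datetime
    responsible_entity: str = ""  # optional: entity responsible for the event
    document_links: tuple = ()    # optional: links to associated documents


class EventLog:
    """In-memory stand-in for an event database, indexed by case."""

    def __init__(self):
        self._by_case = defaultdict(list)

    def log(self, event):
        self._by_case[event.case_reference].append(event)

    def history(self, case_reference):
        """Return the events of one case in chronological order."""
        events = self._by_case.get(case_reference, [])
        return sorted(events, key=lambda e: e.timestamp)
```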
A prediction database 570 is used to maintain prediction data that is either generated in response to user requests, or auto-generated periodically in response to the former and/or program conditions as noted above in
A database of reexamination submission requesters and/or patent owners and their identifying information is also preferably maintained in a database 580. Like the attorney/agent database 530, other data can be maintained and cross referenced for each requester/patent owner, such as associated reexam numbers, patent numbers, assignees, attorney/agents, and so on, along with more advanced information, such as success rates, number of cases in active status, etc.
A patent database 582 may include basic bibliographical information for the patent as conventionally stored at the USPTO site, along with cross reference information to reexam numbers, system reference numbers, etc. The actual patent documents, in electronic form, can also be stored here. Document links can also be provided to be able to access and retrieve patent related documents easily within a graphical interface presented to the user. File history documents (such as may be present in a prior prosecution proceeding for the patent) can be maintained in a database 584.
In some embodiments it may be desirable to consolidate and maintain as much information concerning a patent in question as is feasible to provide a one-stop service. Accordingly additional types of documents (petitions, appeals, press releases, Internet content, etc.) can be also stored and text indexed as seen for database 554. Database 554 may also be linked to (or have data from) the USPTO's assignment database so that users can be informed of a current assignee and any prior records of transfer. Since this is normally kept outside the aforementioned PAIR system, this again allows for a better one-stop user experience by integrating multiple otherwise disconnected databases together.
While some current organizations offer basic patent services (such as obtaining copies of the patents themselves) it is extremely impractical if not impossible to easily glean the totality of data for an issued patent that is subject to reexamination in context, including for example previous prosecution, prior art cited against it, the communications with the Examiner, petitions, board decisions, etc.
An alert database 590 is used to store notifications which are sent or will be sent to subscribers. These may be indexed by subscriber, or case/event number. As noted above, subscribers can ask to be kept abreast of developments for a particular case, or if a particular prediction of interest has changed dramatically beyond a preset threshold. For example, if the timing between two events, or the likelihood of success is calculated to change by more than 10% the subscriber can be alerted. Users may choose to be alerted of specific events as well, such as a notice of intent to issue a reexamination certificate, or a notice of allowance (for a pending case), an issue date notification (for a pending case) and so on. Other examples will be apparent to those skilled in the art.
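The threshold test in the example above can be sketched as follows. The function name is hypothetical, and the sketch assumes that "more than 10%" means a relative change with respect to the previously stored value; an embodiment could equally measure the change in absolute percentage points.

```python
def should_alert(previous, current, threshold=0.10):
    """Return True when `current` differs from `previous` by more than
    `threshold`, measured relative to `previous` (an assumption)."""
    if previous == 0:
        # No baseline to compare against: treat any non-zero value as a change.
        return current != 0
    return abs(current - previous) / abs(previous) > threshold
```

For instance, a predicted likelihood of success moving from 0.50 to 0.56 is a 12% relative change and would trigger an alert under the default threshold, whereas a move to 0.54 would not.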
The alerts pass through an interface 595 where they can be directed (see
In general, the petitioner can ask to be excused from any mandate over which the target organization otherwise has jurisdiction to enforce. To obtain such relief they are required specifically to file a petition with the organization/entity and secure a favorable decision authorizing the rules exception. Depending on the nature of the petition, it may be handled by different groups or personnel within the PTO.
The exceptions handling aspect of the invention is also unique in that no current publicly accessible system exists for permitting practitioners and other entities to observe and monitor a collection of petitions and decisions. Thus, it is impossible to simply search databases 126 for petitions by subject matter, by filer, by decision, by date, etc., to gain wider scale insights and understandings into the operation of the USPTO. As seen below, users can select and filter such submissions quickly and efficiently to identify outcomes, timing, etc. and obtain reports on the same in text, chart and/or graphical forms. In addition the users can easily identify and retrieve the actual decisions by the organization relating to the petitions, so that a comprehensive dataset can be presented in one convenient interface to the user.
In a preferred embodiment users can identify at step 605 a particular target type of petition, such as petitions handled by a specific person/group within the PTO, petitions associated with a particular case, petitions associated with a particular submitter, petitions containing certain content (text) or more generally those classified in accordance with a particular type of relief sought, or by reference to a resolution (which may be favorable or unfavorable for example). In each case the user can be presented with any convenient form of pulldown menu to select a name, a subject matter/topic, etc. For instance, users can ask to see petitions associated with a particular type of relief, such as an extension of time, extra pages, a prior art submission, a declaration, etc. Alternatively the users can simply request to see every petition filed for every case, broken down logically according to type.
At step 610 any cases or petitions from a petitions database 619 (which may be information gleaned from one or more databases discussed in
Other identification/analytical data 618 associated with the petition submissions, such as the name of the submitter, the associated patent owner, the length or complexity of the document, and the timing associated with a related decision, can all be computed or determined as well at step 620, if it is not already conveniently available. The outcomes can then be summarized as well at step 625.
At step 640 the results of the petition/case search can be presented to the user in any convenient form as noted above. Comparisons can be made to identify particular correlations or trends of petitions or decision outcomes, timing with particular personnel, with particular art units, etc. The resulting report preferably includes embedded resource locator links to permit the user to easily find, review and utilize the actual content of the petitions and decisions, including text, graphics, etc.
Since the petitions and decision data is stored in both image form and text (OCRd) form, it can be searched for relevant text content. Accordingly users of the system are also able, for the first time: a) to search and consider such documents from a content perspective; and b) to search across an entire set of such documents spanning multiple cases. For example users can locate and review petitions which discuss a particular rule, regulation, etc., or which make mention of a particular patent, person or precedent across all cases handled by the PTO. This is in contrast to existing architectures (seen on the top of
It will be noted that while the discussion for
At step 805 the user is allowed to designate a set of cases for which he/she desires to receive alerts. In a preferred embodiment the user can specify not only specific case numbers, but also parameters associated with cases, such as patent number, inventor name, Examiner name, attorney name, requester name, etc. Any data associated with a submission may be considered to determine a set of cases to be monitored.
In some instances it may be desirable to subscribe to pre-configured specific “channels” 807 organized in some logical fashion by topic or subject. The channels may be identified with specific companies, specific Examiners, specific cases, specific events, etc. Any number of variants will be apparent to those skilled in the art. Alternatively a user may specify his/her customized channels 809, which, in some cases may correspond to a docket of cases that he/she (or their company) is affiliated with or responsible for. At this point it will be understood that the set of cases, as defined/filtered by the user, will be associated with a set of new potential events of interest that are generated as the target organization processes submissions.
The user can then specify or configure their requirements for delivery of the alerts at step 810, including defining individuals, email accounts, message accounts, portable devices, etc., which are to receive the alerts. Any number of different conventional options may be elected here.
At step 820 the user can specify alert parameters, including particular types of alerts and/or thresholds 825 that they wish to impose to filter the set of new potential events. For example, the “type” of event may be tied to a particular type of document being associated with an event, such as an Office Action, a Response, a Petition, etc. Alternatively the event may be based on a press release or litigation event (which may be derived from a separate news/litigation database or service) for a company in question that is associated with one of the cases.
All such actions, and others apparent to those skilled in the art, can result in a triggered/candidate alert. The user can specify that they nonetheless do not want to see such triggered alerts until a certain number of press releases (a threshold) are identified (which can be a form of confirmation) in recent news searches, and so on. In other embodiments the user can request that they only be sent an actual alert through one of the channels when the aggregate number of triggered alerts (across all channels) exceeds some number.
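One way the thresholds above might gate delivery is sketched below. The function and parameter names are hypothetical; the sketch assumes triggered alerts are held per channel and released only when their aggregate count exceeds a user-set number and a minimum number of confirming press releases has been found.

```python
def alerts_to_deliver(triggered, aggregate_threshold=0,
                      confirmations=0, min_confirmations=0):
    """Decide which triggered (candidate) alerts are actually sent.

    triggered: {channel name: [alert text, ...]} of candidate alerts.
    aggregate_threshold: total count across all channels that must be exceeded.
    confirmations / min_confirmations: e.g. press releases found versus required.
    """
    total = sum(len(alerts) for alerts in triggered.values())
    if total <= aggregate_threshold or confirmations < min_confirmations:
        return []  # keep holding the candidate alerts
    return [(channel, alert)
            for channel in sorted(triggered)
            for alert in triggered[channel]]
```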
In addition, users can specify that they wish to be alerted based on user-defined thresholds which may be associated with a prediction generated by the process noted above in
As the alerts are triggered and/or sent they are stored and updated in a database as seen in step 828. With this type of data the system can also monitor and develop correlations between users, channels and alerts to give recommendations to users at step 830 for new channels, cases, companies, alert types, etc. This can be implemented using any conventional collaborative filtering algorithm, corroborative filtering algorithm, etc. which uses some form of prediction. This technique can help users find and identify other subject matter of interest that they may have overlooked.
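As a minimal sketch of the recommendation step, the following uses a simple user-overlap form of collaborative filtering. This particular scoring scheme is an assumption chosen for brevity, not the algorithm of the described system, and the function and variable names are hypothetical.

```python
from collections import Counter


def recommend_channels(subscriptions, user, top_n=3):
    """Suggest channels the user lacks, weighted by subscription overlap.

    subscriptions: {user id: set of channel names}.
    A channel scores higher when it is held by users whose existing
    subscriptions overlap more with those of `user`.
    """
    mine = subscriptions.get(user, set())
    scores = Counter()
    for other, theirs in subscriptions.items():
        if other == user:
            continue
        overlap = len(mine & theirs)
        if overlap == 0:
            continue  # no shared interests, so no evidence to use
        for channel in theirs - mine:
            scores[channel] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [channel for channel, _ in ranked[:top_n]]
```

The same approach could be applied to cases, companies or alert types by substituting those items for channels.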
The alerts are then sent/reported to the users at step 840 in accordance with their preferred delivery mechanisms noted above. The data can be archived as desired for each user as well.
As seen at step 920, the user is then presented information from the relevant database (
Examples of the types of queries and reports possible with the present system are shown in
For example,
As seen in
In
From the above it will be understood that these are again but examples of the types of reports that can be easily extracted using conventional tools from the data collected in databases shown in
Accordingly at step 1005 the voter is classified according to their status, which may be one of multiple levels or labels. For example, a user may be designated as novice, average, expert, etc. In some instances the user's membership status may be factored into their voting capability/status.
At step 1010 the case or event is selected by the user, along with a relevant time period. Again the selection can be facilitated using any conventional tools, including the query interface discussed above. The user for example can vote on specific cases, or even specific events. As an example the user may contribute a vote on when an Examiner may render an Office Action, and/or if the Office Action is expected to be favorable or unfavorable on a specific claim. Other examples will be apparent.
The user is then presented with an entry prediction screen, which, as noted above, may take any convenient form, including one similar to that offered by the Piqqem site. The main difference, of course, is that instead of predicting the performance of securities, the present invention allows users to provide predictions associated with the entity's behavior. In other embodiments the users may be permitted to vote on related patent matters, such as the result of a litigation, trial, etc., or the amount of damages expected to be awarded, the likelihood of an injunction and so on. Again any factor associated with patents may be presented for prediction within an interface/voting screen.
In some embodiments it will be desirable to tally and recognize contributors based on their prediction performance. As with other sites which perform this function, the recognition can be calculated and presented to other members of the site in any convenient fashion.
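One convenient way to tally prediction performance is sketched below. Ranking by the fraction of correct predictions is an assumption made for illustration, since the description leaves the recognition scheme open; the `leaderboard` name and the input layout are hypothetical.

```python
def leaderboard(resolved):
    """Rank contributors by the share of their predictions that proved correct.

    resolved: iterable of (contributor, predicted_outcome, actual_outcome).
    Returns a list of (contributor, accuracy, number_of_predictions), best
    first; ties are broken by prediction count and then by name.
    """
    stats = {}
    for contributor, predicted, actual in resolved:
        correct, total = stats.get(contributor, (0, 0))
        stats[contributor] = (correct + int(predicted == actual), total + 1)
    rows = [(name, correct / total, total)
            for name, (correct, total) in stats.items()]
    return sorted(rows, key=lambda r: (-r[1], -r[2], r[0]))
```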
Unless otherwise stated, it is intended that the boxes and content shown in
Beginning in
In addition, other dynamic content, culled from the databases described above (
As seen in more detail in
Interface 730 shows an example of a more elaborate search implemented by embodiments of the invention. For example the user can specify a particular type of document (by code), a certain first phrase, and a logical predicate (search operator such as within x words) followed by another phrase, and other date restrictions. Other types of filters can clearly be implemented as desired.
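The "within x words" predicate can be illustrated with the sketch below, which operates on the OCR text of a single document. The tokenization rule, the function names and the choice to accept the two phrases in either order are assumptions; a production system would more likely delegate this to its text index.

```python
import re


def _tokens(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def _positions(tokens, phrase):
    """Start indices at which the word list `phrase` occurs in `tokens`."""
    k = len(phrase)
    return [i for i in range(len(tokens) - k + 1) if tokens[i:i + k] == phrase]


def within_words(text, first, second, max_gap):
    """True if the two phrases occur with at most `max_gap` words between them."""
    tokens = _tokens(text)
    a, b = _tokens(first), _tokens(second)
    if not a or not b:
        return False
    for i in _positions(tokens, a):
        for j in _positions(tokens, b):
            if j >= i + len(a):
                gap = j - (i + len(a))   # second phrase follows the first
            elif i >= j + len(b):
                gap = i - (j + len(b))   # second phrase precedes the first
            else:
                continue                 # overlapping occurrences are ignored
            if gap <= max_gap:
                return True
    return False
```

Filters on document code and date, as mentioned above, would simply be applied to the candidate documents before this predicate is evaluated.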
In a preferred approach the user can move throughout the result set using any convenient field, by selecting one or more of the column labels. For example, the user can scan and review cases using a control number, a filing date, patent number, inventor, assignee, status, etc. Other types of high level data could be presented of course. Sorting of the results can be achieved by selecting a control field (not shown) associated with the columns as well.
It can be seen that the tool 740 permits the user to review an entire dataset if desired as well in an extreme case where no filter is imposed. The user can thus immediately and at a glance move back and forth through a raw dataset of reexamination events in a manner that is not possible using any conventional tool, including PAIR. This flexibility allows the user to emulate the capability of an internal tool, which would otherwise be the only mechanism available for reviewing and analyzing the organization's data in this fashion. The productivity and time savings associated with this tool are also substantial, as it permits an outside user to perform analyses that would otherwise take several hundred man-hours of manual online access to perform using conventional tools. More importantly embodiments of the invention take the guesswork and speculation out of the picture by ensuring, through the automated updates noted above, that the events published in database 126 from the organization's internal data are accurately located and extracted.
As seen in the right hand side of the interface, the user can be given different types of selection buttons as well, so that he/she can navigate seamlessly across different case numbers, different patent numbers, etc., all within interface 750. This interface, therefore, integrates multiple elements of the prior art into a single location to further increase utility, productivity, etc., as the user does not lose access to fundamental data concerning the case whilst examining certain materials more closely. In some instances therefore it may be desirable to open any desired documents from an image file wrapper directly within the interface again for the user's convenience. While this illustrates one embodiment of a case-record review tool, it will be understood that there can be countless variations on this approach consistent with the present teachings.
Embodiments of the present invention can also be configured to imitate a functional interface presented by the prior art PAIR system discussed above, to allow a greater number of users access to this important public data. From direct observation it can be seen that the PAIR system throttles access to some extent using CAPTCHAs, timeouts, etc. Since the site has limited resources available to meet worldwide demand for this important data, it is apparent that it would benefit the public and the governmental agency to develop a more robust secondary access channel.
Thus in
Finally, another useful enhancement which can augment the emulated experience is an additional browsing button 762, which can cause successive cases to be presented within the interface using a single click. The active logical tab can be highlighted and linked as shown so that for example, selecting the arrow keys moves backwards/forwards by one case when the “Application Data” tab is highlighted. In other cases it may be desirable to skip forwards or backward using Attorney/Agent as the logical grouping, and so on. In still other instances it may be desirable to provide additional document linking or selection logic within the emulated interface. Other examples will be apparent to those skilled in the art.
The graphical editor allows the user to manipulate a data entry point in two dimensions, so that in this instance, the user can specify both a time prediction value (along a horizontal axis) and an outcome prediction value (along a vertical axis). The graph can be annotated with convenient labels to assist the user in inputting his/her vote. In some instances the graph/chart can be controlled to give visual feedback while the user is inputting data, to permit him/her to see more distinctly the values they are contributing for the parameter in question. The user can be shown his/her prediction along with a crowd prediction, an expert prediction, etc., using any convenient and conventional visual output tool appropriate for the data in question. While the example is shown for a two parameter vote, it will be understood that additional dimensions of data beyond two could be captured, and for other prediction value types.
A representative example of a Personnel Profiling interface 770 is shown in
On the right side the user can be presented with a variety of useful data metrics about the individual, along with a comparison of their data to their peers. Other types of data could be studied of course as well. Again it will be understood that the implementation can be done in any number of variations depending on the underlying data and events being studied.
In a similar fashion a representative example of an attorney/firm Profiling interface 790 is shown in
On the right side the user can be presented with a variety of useful data metrics about the attorney/firm, along with a comparison of their data to their peers. Other types of data could be studied of course as well. Again it will be understood that the implementation can be done in any number of variations depending on the underlying data and events being studied.
In some cases patent assets are intentionally abandoned by their owners because they are perceived (subjectively or objectively) to have little or no remaining value, or perhaps value that is not commensurate with the cost of obtaining such value. In some instances, however, a patent owner may not be aware of the maintenance fee requirement, or may not receive the fee notification, and the patent lapses due to inattention. This can cause valuable assets to be lost due to simple carelessness or lack of appreciation by the patent owner of the true value of the patent assets.
To remedy such mishaps there is a procedure by which patent owners presently can “revive” patents which have become abandoned due to lack of maintenance fee payments. To avail themselves of this option, however, the patent owner must meet certain requirements (such as establishing that the abandonment was unintentional or unavoidable) and pay an extra petition fee. Furthermore, in some cases the petition must be filed within 2 years of the abandonment.
The USPTO puts out an Official Gazette (OG) every week (in electronic and print form) which identifies a list of patents that have recently expired. Unfortunately the OG is usually a few weeks behind, so by the time it is published it is too late to remedy any missed payments. To assist patent owners, however, the OG does also publish a prospective list of patent numbers which will require payment in an upcoming period. By manually checking these two lists patent owners and other interested parties can learn of patents which have expired and/or which may soon expire. This information is useful as a means of discovering potential assets that may have gone unappreciated but which may still have useful value.
Clearly the above infrastructure is not optimal for preserving the value of patent assets, or helping third parties discover valuable patent assets. The method shown in
At step 1205 a customer or interested party can define their interest in potential target patent assets (and events surrounding the same) with reference to any number of criteria to a discovery service provider. In a preferred embodiment the user can specify that they wish to examine patents which fall within a certain class (e.g., class 705), or which belong to a certain entity, or which relate to certain subject matter or contain certain keywords. It should be apparent that any number of matching criteria can be used for this purpose.
At step 1210 additional filters and query logic can be imposed as needed to properly formulate the query to a database of potential patent assets. For example, time restrictions may be imposed to prevent discovery of assets for which there is no potential revival or use. Alternatively the user may specify that they are only interested in assets which have a certain priority date, or which issued within a certain time period, etc. Other examples will be apparent to those skilled in the art.
In the event multiple users are to be serviced by examining the databases, the user/client requests can be consolidated at step 1215 in a master list to avoid duplication of effort. That is, it is possible that there will be overlap in the search coverage, and more than one user may want to examine a particular database in more detail. By consolidating requests the system can reduce the amount of overhead and processing/bandwidth requirements.
The search can then proceed across multiple databases to discover patent items of interest that are on a master list generated at step 1215. As seen on the bottom of
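The consolidation at step 1215 might be sketched as follows, assuming each client request has already been resolved to a collection of patent numbers. The function name and the data layout are hypothetical.

```python
def build_master_list(requests):
    """Merge per-client requests into one de-duplicated master list.

    requests: {client id: iterable of patent numbers of interest}.
    Returns {patent number: sorted list of interested clients}, so each
    patent is looked up once and the result can be fanned out afterwards.
    """
    master = {}
    for client, patents in requests.items():
        for number in patents:
            master.setdefault(number, set()).add(client)
    return {number: sorted(clients) for number, clients in sorted(master.items())}
```

With two clients who both ask about the same patent, that patent appears once in the master list with both clients recorded against it, so the underlying database is queried only once.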
As seen on the far left therefore, a first option is performed there through step 1220. The system takes the master list as an input and searches for all patents on it which expired within N (preferably 2) years of the target date (TO) for failure to pay maintenance fees. It will be apparent that N and the target date can be set to any convenient value, but, in most cases, TO will be a present date or a future date. For example on January 1 a user may want to know all the patents which expired 2 years before February 1. This is because, as a practical matter, it is difficult to coordinate and prepare a petition on short notice to have a case revived (if necessary). Some users may also set N to be very small so that the search is only looking for recent cases. The search for expiration of patents for failure to pay maintenance fees can be done using any conventional database, including the USPTO PAIR system and from offerings by third parties such as DELPHION. Both of these systems (and others) have logical fields for identifying a maintenance status of patents.
In some instances it may be the case, however, that the indication for the patent is in error because the patent owner has remedied the deficiency. The data in the OG or other databases may be “stale” in some cases therefore. To ensure that the user receives the most up-to-date information on such expired cases and events for the same, the present invention can also automatically check a USPTO maintenance database at step 1225 to identify a status for one or more cases. To do this an automated script (akin to the one described above for the USPTO PAIR review) can be employed to work through the cases on the master list one at a time, ascertain their maintenance status events and record the same in an expired patent report list.
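The date filter of step 1220 can be sketched as below. The sketch assumes that "within N years of the target date" denotes the window running from N years before the target date up to the target date itself, and that each master list record already carries an expiration date and a flag indicating a lapse for non-payment; the function names and record layout are hypothetical.

```python
from datetime import date


def years_before(d, years):
    """Return the date `years` years before `d`."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        # 29 February has no counterpart in a non-leap year; use 28 February.
        return d.replace(year=d.year - years, day=28)


def expired_within(records, target_date, n_years=2):
    """Select patents that lapsed for non-payment inside the window.

    records: iterable of (patent_number, expiration_date, lapsed_for_nonpayment).
    """
    window_start = years_before(target_date, n_years)
    return [
        number
        for number, expired_on, nonpayment in records
        if nonpayment and window_start <= expired_on <= target_date
    ]
```

Setting the target date slightly in the future, as in the February 1 example, leaves time to prepare a petition before the window closes for the oldest matches.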
At step 1280 any number of desired relevant documents and data items can be collected based on the report list. For example, in the case of a report on expired patents the user can be given a report that includes the patent details, along with information on the current patent owner (which may be gleaned automatically from assignment records as noted below) as well as a copy of the patent in question, maintenance fee payment records, new maintenance fees due, petition fees required to bring the case into compliance, etc. Other documents can be collected of course as needed.
A lead report is generated at step 1285, which is packaged to include the additional supporting material/data items and communicated to the user for their consumption. At this point the user preferably has a full complement of materials to help them assess the value of the lead patent asset, along with sufficient lead information to contact the patent owner and, if desired, procure the same.
At step 1290 a service provider may update an overall “watch” list for the user (or the user base) so that a record is kept for each event and item presented to a user. In some cases users can be given notifications/alerts of individual events detected at this point, in the manner described above. This permits users to profile and identify acquisition leads far in advance of competitors. For example, a user could be informed through SMS, email, etc., of a recently discovered asset that has just gone abandoned (or changed status) since a last iteration through the applicable database(s).
In some instances a service provider may be given instructions by a user to automatically pay a maintenance fee for the patent if it otherwise meets certain criteria specified by the user concerning timing, cost, etc. For example the user may specify that the provider should pay the fee to reinstate (or even maintain) a patent if the cost is below some threshold, and so long as the time from expiration does not exceed some time period. This type of preemptive action may be useful in some cases given the cost/benefit analysis associated with reviving cases (compared to the cost of simply maintaining) and as potential leverage in discussions with the patent owner.
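The user-specified payment criteria could be evaluated, for example, as follows. The `AutoPayRule` structure, its field names and the use of days as the time unit are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AutoPayRule:
    """Hypothetical user instruction for automatic fee payment."""
    max_cost: float                 # pay only if the total cost is below this
    max_days_since_expiration: int  # and the patent lapsed no longer ago than this


def should_auto_pay(rule, total_cost, expiration_date, today):
    """Return True when the fee may be paid under the user's rule.

    A patent that has not yet expired counts as zero days lapsed, so the
    same rule also covers simply maintaining a patent.
    """
    days_lapsed = max((today - expiration_date).days, 0)
    return (total_cost < rule.max_cost
            and days_lapsed <= rule.max_days_since_expiration)
```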
Returning to step 1230 in the middle of
In addition it should be apparent that the system can also learn from prior historical behavior of specific patent owners (or based on subject matter, anticipated costs, etc.) to identify entities that are more or less likely to permit an asset to go abandoned. Accordingly at step 1240 the system can prioritize a search to identify assets in an ordered priority of expected likelihood of abandonment. This ordered priority list then is used at step 1245 to perform an automated search of the patent maintenance database. If a record is indicated as having been paid, the system ignores the item. Other ways of prioritizing the discovery process will be apparent from the present teachings.
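A minimal sketch of steps 1240 and 1245 follows. It assumes the likelihood of abandonment is estimated from each owner's historical rate of unpaid fees, which is only one of the signals mentioned above; the function names, the input layout and the `is_paid` lookup (standing in for the automated maintenance database search) are hypothetical.

```python
def lapse_rates(history):
    """Estimate, per owner, the fraction of past maintenance fees left unpaid.

    history: iterable of (owner, paid) pairs, with `paid` a boolean.
    """
    totals, lapses = {}, {}
    for owner, paid in history:
        totals[owner] = totals.get(owner, 0) + 1
        lapses[owner] = lapses.get(owner, 0) + (0 if paid else 1)
    return {owner: lapses[owner] / totals[owner] for owner in totals}


def prioritize(upcoming, rates, default_rate=0.0):
    """Order (patent_number, owner) pairs by expected likelihood of abandonment."""
    return sorted(upcoming,
                  key=lambda item: (-rates.get(item[1], default_rate), item[0]))


def unpaid_leads(ordered, is_paid):
    """Walk the ordered list and ignore items whose fee has been paid."""
    return [number for number, _ in ordered if not is_paid(number)]
```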
Otherwise, as previously explained for step 1280, a report and documentation package is prepared for the user for extant patent assets meeting the desired profile. These leads can be reported out at step 1285 as before. In some cases at step 1290 the user/client may ask that the service provider put the asset on a special watch list. Items on this watch list are monitored by the service provider within the maintenance database proactively, and up to the last minute, to determine if the patent owner has paid the fee. The client may authorize the service provider to pay the maintenance fee under user-defined parameters, as noted above, to prevent degradation of the asset. Note that in some cases where the user “rescues” the asset before it goes abandoned, they may nonetheless not be able to reach agreement with the patent owner, in which case they have lost the benefit of the payment without any return consideration. In many instances, however, due to the value of finding leads earlier, and the value of such leads, this type of loss may be more than acceptable in an overall acquisition program.
For some types of patent assets the client may also instruct the service provider to send an urgent communication to the patent owner to alert them to the impending expiration. This has the benefit of getting the patent owner's attention and, in the event a deal is not consummated but the patent asset nonetheless goes expired, the patent owner will have greater difficulty availing themselves of the benefit of the rule concerning “unintentional” abandonment since they were given notice prior to the expiry of the patent. The user/client therefore is somewhat protected against the patent owner concluding the opportunity with a third party since after expiration the asset would be impaired. The user can thus avoid having the opportunity spoiled by a third party, or in some instances where infringement exposure exists, the user (or an affiliated entity) can even avoid the risk of potential infringement since the patent asset may no longer be revivable within the requirements of the patent regulations/statutes. This technique therefore could be used by competitors to mitigate risk from patents that might otherwise be problematic if they were not to expire. That is, if the patent expires after the owner has been notified of the potential for expiration, or a petition to revive is not filed before such notification, it is possible the patent cannot be revived under the present standards. By automating this type of technique certain entities can reduce their overall exposure to competitive portfolios by optimizing the chances that these assets are not revived.
Returning again to the middle of
This information can be used by a variety of different entities for a variety of different purposes. For example, a first company may desire to monitor the expected issuances or allowances of another company. By seeing which cases are allowed, issued (or predicted to become such) the invention permits a competitor to assess whether it has prior art or other materials that are germane to the application. If it determines that these materials are relevant it can make the decision to submit/introduce these materials during the initial examination—as opposed to a post issuance challenge where the rules and timing may not be favorable. This improves the overall quality of examination as well since applications can be expected to be better vetted before they are issued.
In other instances the information for events can be mined for other purposes. For example the filing of a change in attorneys might be indicative of a quality change at a law firm, or a change in ownership of the patent application. The ownership change in turn may be reflective of an ongoing or prospective asset purchase that is not well-known. An entity status change from small entity to large entity may also reflect either a merger with (or purchase by) a larger company, an increase in personnel, or a successful licensing of the patent application to a larger company.
Similarly it is well-known that many publicly traded companies' stock is affected by public announcements of issued patents. In other instances the rejection of a patent application on a key aspect of the company's product line may similarly affect its economic prospects. The present invention can be used to automatically mine these situations and find prospective issuances/rejections before they are widely known, giving the trader an advantage against the rest of the market. Accordingly a stock trading decision can be based on an automatic identification and evaluation of events surrounding an application this way. Other events and data can be mined of course for similar reasons, and the invention is not limited in this respect.
Again at step 1260 the system can perform an optimization or prioritization operation to identify leads which are most germane to the user's request. For example, the system could use a priority date/filing date to conduct the in depth search. Other factors can be considered as well, for example, the system may be programmed to use a list of companies whose stock performance varies most dramatically in response to patent developments (pro or con). By assessing the most volatile companies first, the invention can thus find trading opportunities earlier as well.
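By way of illustration only, the prioritization of step 1260 could be sketched as follows. The `Lead` record, its field names, and the `volatility` score table are hypothetical and are not part of the described system; the sketch simply orders the master list so that owners whose stock is most sensitive to patent developments are examined first, with the earliest priority date used to break ties.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Lead:
    # Hypothetical record for one entry on the master list.
    case_number: str
    owner: str
    priority_date: date


def prioritize(leads, volatility):
    """Order leads so the most patent-sensitive owners are checked first.

    `volatility` maps an owner name to an assumed sensitivity score;
    owners that are not listed are treated as 0.0. Ties fall back to
    the earliest priority date.
    """
    return sorted(
        leads,
        key=lambda lead: (-volatility.get(lead.owner, 0.0), lead.priority_date),
    )
```

Other ordering factors could be added to the sort key in the same manner.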
At step 1265 the system then proceeds to work from the prioritized version of the master list to identify the current status and related events of the selected cases in the USPTO through the Public PAIR database, or any other convenient database containing such data. This automated tool works as noted above for the other software routines which can examine PAIR records to identify the current status of cases and events surrounding the same. The system can then consider any one of a number of desired status codes or events to select them for final inclusion on a lead list. For example, the application may have a status that the application has gone abandoned (for failure to respond to an Office Action for example) or that the application is expected to issue in the near future (from an issue notification or a notice of allowance), or that a change in status has been indicated (from small entity to large entity), or a change in attorneys, etc. While the assets in this case are merely pending applications, they may still have significant value to a third party. This aspect of the invention allows a third party to dig deeper into the USPTO and identify lucrative leads before they become issued patents through monitoring of these events.
The desired status/event codes are thus identified at step 1271 (abandoned), step 1272 (could be abandoned), step 1275 (has an issue notice or notice of allowance, change in attorneys, change in entity status, office action rejection, etc.) and so on. The user therefore has the option of pinpointing and selecting cases which meet a desired profile, which gives the user the opportunity to identify events about the patent owner and exploit a potential deal for the asset long before it shows up in the conventional US patent database for publication. For example, an issue notification event is typically generated several weeks before an actual issuance date. By availing themselves of the present invention, interested third parties can identify and develop leads much further ahead than their competitors. Notices of Allowance and issue fee payment events occur even further ahead of time, and can be similarly mined and exploited as desired to identify leads.
As suggested above an IP manager at a company can thus study and evaluate competitor patent developments before they become issued, and, as noted, take preemptive action in some cases to ensure further review of an application in light of new prior art that may not have been considered. This is a potentially superior option to having to wait for a patent to issue and then being forced to deal with it in an adversarial capacity while it enjoys a presumption of validity.
In other instances the system may detect that a particular application *should* have an abandoned status, even if one has not been specifically identified by the USPTO. This can be determined from an examination of the prior entries in the file history in PAIR. By analyzing the file history therefore and comparing it to other cases an automated system can predict a future event, such as the fact that an application is likely to go abandoned. This information, too, is useful since the invention does not have to rely on an explicit status indicator to classify the asset. Rather, the invention can assign a tentative status level based on an assessment of the overall file materials. This information may be useful to third parties, the patent owner, etc.
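As one simplified example of such a prediction, a tentative status could be assigned when the file history shows an Office Action with no later response and a reply window has run out. The event codes (`OFFICE_ACTION`, `RESPONSE`), the status labels, and the 183-day window below are assumptions chosen for illustration; an actual implementation would use the transaction codes found in PAIR and could compare against other cases as described above.

```python
from datetime import date, timedelta

# Assumed reply window, used here only for illustration.
RESPONSE_WINDOW = timedelta(days=183)


def tentative_status(history, today):
    """Assign a tentative status from a file history.

    `history` is a list of (date, event_code) tuples. An Office Action
    that is never followed by a response, and that is older than the
    reply window, marks the case as likely abandoned.
    """
    pending_since = None
    for when, code in sorted(history):
        if code == "OFFICE_ACTION":
            pending_since = when
        elif code == "RESPONSE":
            pending_since = None
    if pending_since is not None and today - pending_since > RESPONSE_WINDOW:
        return "LIKELY_ABANDONED"
    return "ACTIVE"
```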
For pending application lead identifications the same steps as before can be performed to collect relevant materials at step 1280 and populate a lead database. In this instance however the system may go a step further and collect more detailed information from the prosecution database in order to give the user/reviewer a richer picture of what events are transpiring in the USPTO with the application. As an example, the user could be given any Office Actions, amendments, petitions, decisions, prior art, status changes, etc. for the case in one convenient package, making an overall assessment much easier.
The report is then generated (with supporting documentation) at step 1285 as before. A watch list can be updated at step 1290 in the same manner. Here the watch list may include items that the system suspects (or predicts) beyond a threshold are likely to change status in an upcoming period, and therefore should be checked more regularly to see if such status does in fact change. Accordingly, in the prioritizing search of PAIR (step 1260) this expected change probability data for each item can be used as a factor to initiate accesses to the external database. Other uses will be apparent as well. Again, in selected cases, and to the extent permissible by law, a user may elect to rectify or cure any defects to remove an abandonment designation within the file. While not shown, it is possible that in some cases the service provider could be given a commission, payment, or some other form of remuneration for discovery of assets that are acquired by one of its clients.
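The threshold-based watch list described above could, for example, be assembled as in the following sketch. The probability table and the default threshold of 0.7 are hypothetical values; the text does not specify how the change probability is computed.

```python
def build_watch_list(change_probability, threshold=0.7):
    """Return case numbers predicted to change status, most likely first.

    `change_probability` maps a case number to an assumed probability
    (0.0 to 1.0) that its status changes in the upcoming period. Only
    cases strictly above `threshold` are placed on the watch list.
    """
    hits = [(p, case) for case, p in change_probability.items() if p > threshold]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [case for _, case in hits]
```

The resulting order can then be fed into the prioritized search of step 1260 so that the most volatile items are checked most regularly.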
From the above it can be seen that embodiments of the invention can be used to effectively perform lead generation (and entity data mining) at a more comprehensive and deeper level than prior art tools to unearth patent acquisition opportunities. Other embodiments will be apparent to skilled artisans from the present teachings.
As noted above, the current databases for reviewing assignments are somewhat difficult to access and cryptic as they are again maintained at the USPTO. While they permit a user to search for assignments across a number of parameters, they do not permit users to easily search and browse by time or entity. For example, there is no mechanism by which a user can simply ask to see the most recent assignments (irrespective of entity or patent) from a certain date. This information is useful since entities may be affected by the transfer of rights in patent assets, and yet not receive notice of the same in a timely fashion.
Accordingly, at step 1305 the invention develops a list of existing reel and frame numbers from studying the USPTO database. In one embodiment the system can simply query the database with a range of reel/frame numbers, starting with 0001/0001 for example, and automatically incrementing these figures until it reaches a reel/frame combination that matches a current date (or a most recent date) and/or that is no longer valid (because there is no current record corresponding to an entry yet). For example the system may determine that the most recent reel/frame combination is MMMM/NNNN; in the next update process the system would pick up from there and look for a next valid frame/reel combination. Thus the invention can figure out where the personnel within the organization have left off and are expected to proceed anew in a next document recording cycle.
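A minimal sketch of this incremental scan is given below. The `lookup` callable stands in for a query against the assignment database and is hypothetical, as is the `frames_per_reel` rollover limit; the sketch assumes, for simplicity, that valid reel/frame combinations are contiguous.

```python
def next_position(reel, frame, frames_per_reel):
    """Advance to the next frame, rolling over to the next reel when needed."""
    if frame < frames_per_reel:
        return reel, frame + 1
    return reel + 1, 1


def scan_assignments(lookup, reel, frame, frames_per_reel=9999):
    """Collect records from (reel, frame) onward until none is found.

    `lookup(reel, frame)` is an assumed function returning a record, or
    None when no record exists yet. Returns the records found and the
    first invalid position, which is where the next update resumes.
    """
    found = []
    while True:
        record = lookup(reel, frame)
        if record is None:
            return found, (reel, frame)
        found.append(record)
        reel, frame = next_position(reel, frame, frames_per_reel)
```

The returned resume point can be stored and passed back in on the next update cycle, so that the scan picks up where the recording personnel left off.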
At step 1310 the entries are logged and maintained in a separate database that includes all the relevant fields from the assignment database and preferably additional items as well (such as US classifications, patent text, etc.).
As with the reexamination records noted above, the assignment recordings are thus now accessible through the service provider and do not require a user to have to navigate through the USPTO system. With this data the system can provide an interface (see
For users who want to do more comprehensive searches and receive alerts of assignment activity, the system permits a customer to define their target interests/criteria at step 1315. The user therefore can specify the name of an entity, a US class for the patent, a patent number, keywords in the patent, etc. Other examples and fields can be used of course as well depending on the desired functionality.
At step 1320 the system then uses the user defined filter to construct the desired query. This is then executed against the assignment database at step 1330, and documents can then be gathered at step 1335 in the same manner as noted above for
A report is then generated for the user at step 1340, and alerts can be provided to those persons who wish to be kept abreast of new developments in this database based on their customized criteria as noted above for
For example, a typical query can specify whether the user wants to examine applications that are pending, issued, abandoned, etc. within a certain date range, and which match a certain class (705, 710, etc.) or contain one or more user selectable keywords. The user can further specify whether they want to filter based on a particular Examiner, entity (i.e., company or individual), or representative (patent attorney, agent, etc.), or alternatively ask for all or a subset. Finally the user can specify the target event to be identified for the applications, which event may range from a simple indication that the application has been filed to a more mature event, such as the fact that the application has received a Notice of Allowance (NOA). Any of the available tags used by the USPTO within PAIR to designate specific events can be used for this purpose. Alternatively, as mentioned above, a transaction record history, or text/content of submissions can be mined and indexed to respond to the query.
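One possible way of representing such a query is sketched below. The `Application` and `Query` records, their field names, and the event tag `NOA` are hypothetical; an empty filter set is assumed to mean "all". Unlike the either/or wording above, this sketch requires both the class filter and the keyword filter to be satisfied when both are supplied.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Application:
    # Hypothetical, simplified view of one indexed application.
    number: str
    status: str
    filed: date
    us_class: str
    examiner: str
    events: set
    text: str = ""


@dataclass
class Query:
    statuses: set
    filed_from: date
    filed_to: date
    classes: set = field(default_factory=set)    # empty means all classes
    keywords: set = field(default_factory=set)   # empty means no keyword filter
    examiner: Optional[str] = None
    target_event: Optional[str] = None


def matches(app, q):
    """Apply the status, date, class, examiner and keyword filters."""
    if app.status not in q.statuses:
        return False
    if not (q.filed_from <= app.filed <= q.filed_to):
        return False
    if q.classes and app.us_class not in q.classes:
        return False
    if q.examiner and app.examiner != q.examiner:
        return False
    if q.keywords and not any(k.lower() in app.text.lower() for k in q.keywords):
        return False
    return True


def run_query(apps, q):
    """Return (matching applications, those that also show the target event)."""
    hits = [a for a in apps if matches(a, q)]
    if q.target_event is None:
        return hits, hits
    return hits, [a for a in hits if q.target_event in a.events]
```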
The user can also filter the output by means of a threshold, so that for example only matching classes which have a particular number of applications, target events, etc., or ratio of target events/applications in excess of some figure are presented in the output. In other instances a timing relationship can be requested, to identify average times of prosecution in each of the targeted classes or categories.
The resulting heat map 1600 is then presented to the user in visual form as noted, so that the relative number of applications matching the query is represented by a size of the corresponding image block. For example, the number of applications in class 715 could be perceived to be much larger than the number of applications found in class 700. The sizes of the matching classes could be normalized and scaled to fit within a defined area of a window using any number of conventional techniques. While rectangular blocks are shown, it will be apparent that the heat map 1600 may be embodied in other visual forms using a pie chart (with wedges) or some other polygonal shape.
A shading of the respective blocks can be used to denote a magnitude of an absolute or relative number within each class that matches the target event. Thus, a darker shading may indicate more matches to the target event, a lesser shading may indicate fewer matches, and so on. Colors or other indicators may also be employed of course.
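The sizing, thresholding and shading described above can be illustrated with the following sketch, which assumes the per-class counts have already been computed. The function name, the `min_apps` threshold and the choice of the target event/application ratio as the shading value are illustrative assumptions; the layout of the blocks on screen is not addressed.

```python
def heat_map_blocks(stats, total_area=1.0, min_apps=1):
    """Compute a block area and shade for each class.

    `stats` maps a class to (applications, target_events). Classes with
    fewer than `min_apps` applications are filtered out. Areas are
    scaled so that they sum to `total_area`; shade is the ratio of
    target events to applications (0.0 lightest, 1.0 darkest).
    """
    floor = max(min_apps, 1)  # also avoids dividing by zero below
    kept = {c: v for c, v in stats.items() if v[0] >= floor}
    total = sum(apps for apps, _ in kept.values())
    blocks = {}
    for cls, (apps, events) in kept.items():
        blocks[cls] = (total_area * apps / total, events / apps)
    return blocks
```

An absolute count of target events could be used for the shade instead of the ratio, depending on which magnitude the user wishes to emphasize.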
The raw statistical information may be optionally presented directly on the heat map, or in some cases might be more conveniently presented by a mouseover type action as shown in
For example, a query can be made to identify and sort areas within the PTO by a ratio of allowances to patent applications. This can be used to understand, plan and budget for prosecution related activities, or make predictions and projections on the number of cases that may be allowed in a particular set of patent applications in a portfolio.
In another instance, the behavior of particular classes of subject matter can be tracked to identify trends in filings and allowances. The number of filings can be used for competitive intelligence to identify areas of exploitation by competitors. Areas where allowances are higher can be targeted so as to maximize prosecution efforts in particular areas of technology which have the highest potential for securing protection.
Individual examiners can be profiled of course to see an overall behavior of the organization. The examiners may be sorted into a spectrum as shown in
Representatives can be similarly examined to identify areas of expertise, success, etc. For example, a prospective customer may desire to identify practitioners, firms, etc., who have a large number (or some threshold number) of applications in a particular area, and/or who have achieved a certain degree of success in particular areas. The provided reports allow for an objective measure of the performance of such individuals and firms to guide better selection of assistance.
Competitors (entities) can also be studied, to identify a breakdown in applications by subject matter area, and corresponding success (as measured by target events) in such classes (or technology areas). This information, too, can be used for driving decision making in companies for patent acquisitions, planning, etc., by focusing on or avoiding areas where examination resistance, behavior or timing is poor.
The tools of the invention can also consider changes over time, so that differences between different periods can be mapped. For example, a user may inquire and identify which classes, technology areas, etc., have shown a greatest change in application numbers, target event/application ratio, etc. This information, too, can enlighten decision makers to understand which personnel or subgroups within the entity are changing behavior, or to better understand a collective behavior of one or more competitors.
Additional queries and reports can of course be created, and the above will be understood as simply exemplars. Any number of combinations of filters and variables can be specified to provide a desired visual and numerical report of interest. The present invention, by permitting a deeper and more thorough analysis of the US patent system, allows for greater insights, planning and prediction than prior art approaches.
A preferred embodiment of an on-demand case fulfillment system 1700 and its operations is shown in
The automatic, autonomous data acquisition system 1700 is intended to accomplish three purposes:
1) to fill gaps (or update stale data) in the existing database in real time in response to a customer request for a new record;
2) to fill gaps (or update stale data) in the existing database based on real-time prediction of queries a customer can be anticipated to submit in the future for particular records;
3) to continuously, automatically, and autonomously maintain the completeness and currency of the database (taking into account that some data become stale more quickly than others).
In this way fresh and complete data are always available to customers while computing and bandwidth resource requirements are reduced.
As seen in
One component of system 1710 is a preferably parallelizable data acquisition worker module 1711. Instances of worker 1711 can be implemented in parallel on a single CPU, a single machine or multiple, or on virtual distributed machines (e.g., a Cloud). Each worker 1711 preferably repeatedly queries a prioritized acquisition request queue 1716 for a highest priority request (randomly selected in the case of a tie). As seen in
The acquisition workers 1711 further cooperate with a human interaction proof resolution module 1712, which passes on requests for decoding CAPTCHAs and the like to an external resource or service (not shown) as described above for
A Request log 1713 acts as a repository of case requests. Generally speaking, when a customer submits a query for data which do not yet exist in database 515 (or are stale according to some predefined criteria) web application 1707 preferably will insert an urgent data request into a prioritized acquisition queue 1716, and then defer responding to customer 1705 long enough to allow an on-demand acquisition by system 1710 to complete or until some maximum time has elapsed.
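The deferred response described above could be sketched as follows, here using a `threading.Event` to wait for the on-demand acquisition with an upper time limit. The `PendingRequest` class, the `enqueue_urgent` callable and the dictionary standing in for database 515 are hypothetical; the staleness check is omitted for brevity.

```python
import threading


class PendingRequest:
    """An urgent acquisition request that a worker signals when finished."""

    def __init__(self, case_number):
        self.case_number = case_number
        self.done = threading.Event()


def answer_query(case_number, database, enqueue_urgent, max_wait_seconds=5.0):
    """Serve a record, acquiring it on demand if it is missing.

    If the record is absent, an urgent request is queued and the reply
    is deferred until a worker signals completion or `max_wait_seconds`
    elapses. Returns None if the data could not be acquired in time.
    """
    if case_number in database:
        return database[case_number]
    pending = PendingRequest(case_number)
    enqueue_urgent(pending)
    pending.done.wait(timeout=max_wait_seconds)
    return database.get(case_number)
```

A worker that fulfils the request would store the record in the database and then call `pending.done.set()`.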
As seen in
It can be seen readily that system 1710 alleviates the navigational and timing burden placed on a typical user who desires to access one or more records at web portal 1720. In instances where the latter imposes additional sign-in or access constraints (including a CAPTCHA or the like), it is tedious and cumbersome for users to have to navigate to the site, wait for an access control page 1715 to load, and then solve a CAPTCHA 1716. Instead proxy 1710 negotiates and performs all expected and necessary navigation operations on behalf of the requester, and keeps these connections open and available for users as they perform queries against cases. The access control page, or at least those critical portions needed for navigation to a query page, is effectively prefetched for the user. To solve the CAPTCHA at page 1715, a request can be made to a third party service that specializes in such tasks as noted earlier.
After the data have been acquired (or the time-out has elapsed) web application 1707 will process the customer query with the data available, including rendering it into appropriate format useable by the customer. In this instance the request may be satisfied by presenting the customer with data in a format that is shown in
Queue monitor 1717 is a software module that is primarily responsible for assessing the state of the prioritized acquisition queue 1716 and ensuring that requests are being timely handled. The queue monitor 1717 ensures that prioritized acquisition queue 1716 always has enough requests so that most of the available workers 1711 are kept busy while at the same time at least one worker is waiting for a request and ready to respond so that an urgent request can be fulfilled immediately. The queue monitor 1717 also monitors the status (working or free) and state of health of the acquisition workers 1711 so that the number of truly available workers is always accurately known. In addition, in the case that a worker fails to fulfill a request the queue monitor will re-queue the request so that another worker will pick it up.
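For illustration, a prioritized queue with random tie-breaking and re-queuing of failed requests could be built on a binary heap as sketched below. The class and function names are hypothetical, lower numbers are assumed to denote more urgent requests, and the re-queue step that the text assigns to queue monitor 1717 is folded into the worker step here for brevity.

```python
import heapq
import random


class AcquisitionQueue:
    """Priority queue in which equal-priority requests are picked at random."""

    def __init__(self, rng=None):
        self._heap = []
        self._rng = rng or random.Random()

    def put(self, case_number, priority):
        # The random second key breaks ties between equal priorities.
        heapq.heappush(self._heap, (priority, self._rng.random(), case_number))

    def get(self):
        """Pop the most urgent request, or return None if the queue is empty."""
        if not self._heap:
            return None
        priority, _, case_number = heapq.heappop(self._heap)
        return priority, case_number

    def __len__(self):
        return len(self._heap)


def work_once(queue, fetch):
    """Run one worker step; re-queue the request if `fetch` raises."""
    item = queue.get()
    if item is None:
        return None
    priority, case_number = item
    try:
        return fetch(case_number)
    except Exception:
        queue.put(case_number, priority)
        return None
```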
System 1710 also preferably includes a predictive acquisition prioritization (PAP) module 1715. This tool complements (and in some cases implements) some of the predictive functions described above in connection with
A preferred process used by an on-demand system 1710 is shown in
Beginning with step 1730, again one useful aspect referred to earlier is that the inventive system preferably automatically initiates and maintains one or more sessions with a web portal 1720, so that ongoing open and persistent connections may be used by entities at a later time. Further as alluded to already, one reason for these connections is simple: in instances where web portal 1720 imposes additional sign-in or access constraints (including a CAPTCHA or the like) it is tedious and cumbersome for users to have to navigate to the site, wait for an access control page 1723 to load, and then solve a CAPTCHA 1724. Instead acquisition workers 1711 at step 1730 negotiate and perform all expected and necessary navigation operations on behalf of the requester, and keep connections open and available for users as they perform queries against cases. The access control page 1723, or at least those critical portions needed for navigation to a query page, is effectively prefetched for the user.
To solve the CAPTCHA 1724 at page 1715, and open the link, a request can be made at step 1735 to a third party service that specializes in such tasks as noted earlier.
At step 1740 a determination is made to see if a threshold number of links or connections are open/available. If not, the aforementioned steps are repeated until a desired target number of connections is achieved.
Unlike conventional prior art proxy systems, which only initiate a session and open a connection after a client has made a request to a third party site, preferred embodiments of the present invention anticipate user needs and establish a certain required number of connections to be available at all times. In effect it prefetches data for the desired portal pages before they are needed, and renders them useable (i.e. by solving the CAPTCHAs) so they are easier to interact with than existing solutions.
By studying user request traffic it is also possible to configure system 1710 with scheduling control logic so that at any moment in time it is maintaining an additional set of open—and preferably persistent—connections beyond the current demand. For example, an additional fixed number, or an additional target % of connections beyond an existing demand can be maintained. This means that the number of connections should respond dynamically up or down in accordance with case demands.
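The target number of open connections tested at step 1740 could be computed, for example, as in the following sketch. The parameter names and the `minimum` floor of one connection are assumptions made for illustration.

```python
import math


def target_connections(current_demand, extra_fixed=0, extra_fraction=0.0, minimum=1):
    """Connections to hold open: current demand plus a fixed and/or
    proportional surplus, never fewer than `minimum`."""
    surplus = extra_fixed + math.ceil(current_demand * extra_fraction)
    return max(minimum, current_demand + surplus)


def connections_to_open(open_now, current_demand, **surplus_options):
    """Positive result: open this many more sessions.
    Negative result: this many sessions may be allowed to close."""
    return target_connections(current_demand, **surplus_options) - open_now
```

Because the target is recomputed from the current demand, the pool of connections grows and shrinks dynamically as case demand changes.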
At step 1745 proxy 1710 receives a request from a client or API call for services, in this instance, a request to access a case at the web portal 1720. The request can include a case identifier, a range, etc. as noted above. A determination is made at step 1750 to see if the record(s) is available already in database 515. If not, a request is initiated at step 1755 and logged in the request log 1713. In a preferred embodiment, if the record requested is new, meaning there is no data for it in database 515, the request can be marked urgent in prioritized acquisition queue 1716 so that an acquisition worker 1711 will pick it up quickly, and use an open connection to retrieve target data 1728 from a webpage 1727. Again since the CAPTCHA is preferably already resolved this mechanism accelerates the user experience over that which is available through a conventional prior art simple URL link to the portal 1720 in question.
Proxy 1710 would then connect the client request directly at step 1750 to one of the preexisting open connections so as to establish an ongoing data session through the proxy connection.
As part of the request a specific case number or other query parameter (normally inserted at 1719) optionally can be passed along/provided as well. This additional piece of data can cause a case number to be injected into the appropriate search query box on a web portal page so as to initiate retrieval of the case in question. In such instance the client or API would see a retrieved record page 1717 and thus bypass two layers of the web portal navigation. In some applications it may be possible to specify a longer list of case numbers so as to cause retrieval of multiple records.
When a client or API accesses a data record 1717, additional code at proxy 1717 can be used to automatically download the contents of such at step 1755 for later use. This can be done independently of the user request or interaction with the data record, to take advantage of the existence and bandwidth of the connection.
In addition as cases are retrieved they can of course be cached so that if a proxy detects a request to the same case number it can simply bypass both the access control page 1715 and case access/query page 1718 and present the contents of the case application pages (as shown above) including relevant tabs, links, files, etc.
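A simple form of such caching, with a staleness limit, is sketched below. The `CaseCache` class, the `fetch_via_portal` callable and the age-based staleness rule are hypothetical; any of the customizable staleness criteria mentioned herein could be substituted.

```python
import time


class CaseCache:
    """Keeps retrieved case pages; entries older than max_age_seconds are stale."""

    def __init__(self, max_age_seconds, clock=time.time):
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries = {}  # case number -> (time stored, pages)

    def put(self, case_number, pages):
        self._entries[case_number] = (self._clock(), pages)

    def get(self, case_number):
        """Return cached pages, or None if missing or stale."""
        entry = self._entries.get(case_number)
        if entry is None:
            return None
        stored_at, pages = entry
        if self._clock() - stored_at > self._max_age:
            return None
        return pages


def fetch_case(case_number, cache, fetch_via_portal):
    """Serve from the cache when possible; otherwise go through the portal."""
    pages = cache.get(case_number)
    if pages is None:
        pages = fetch_via_portal(case_number)
        cache.put(case_number, pages)
    return pages
```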
At some point 1760 the user may be presented with a new challenge item and associated access control page, again such as a CAPTCHA. In some instances it may be possible to intercept such requests and have them handled again, automatically by code at proxy 1710. In other cases it may be necessary to have the originator of the request provide/satisfy the access control parameters.
It will be understood from the discussion above that while the embodiment is described in connection with accessing cases at a patent database, the inventive system and processes will be useable with any number of different environments where it is desirable to expedite and facilitate user interaction with a third party site which employs access controls. Note that the invention further offers the benefit in that the connections are in effect used by human operators during interaction and engagement with the web portal, and thus they should not run afoul of conventional restrictions governing or constraining automated access to such sites. In other words, by acting in an assistive capacity the inventive system becomes an extension of the user instead of an entirely separate automated proxy.
Thus when a request is made for a case the system 1710 links it logically to any one of the open connections when such case is not available, or is stale according to any number of customizable criteria. The user nonetheless transparently and seamlessly accesses a second level portal web page 1725, into which a search parameter can be specified at query field 1726. Thus this page (or critical elements thereof) is also now effectively prefetched for the user. This bypasses the user's exposure to the access control page 1715, the query page 1725 and the need for the user to engage with the same.
To implement the above functions a server computing system used by the described embodiments is preferably a collection of computing machines and accompanying software modules of any suitable form known in the art for performing the operations described above and others associated with typical website support. The software modules described below (referenced usually in the form of a functional engine) can be implemented using any one of many known programming languages suitable for creating applications that can run on client systems, and large scale computing systems, including servers connected to a network (such as the Internet). Such applications can be embodied in tangible, machine readable form for causing a computing system to execute appropriate operations in accordance with the present teachings. The details of the specific implementation of the present invention will vary depending on the programming language(s) used to embody the above principles, and are not essential to an understanding of the present invention.
From the present teachings it can be seen that embodiments of the present invention effectively implement solutions to the problems identified in the prior art, including RFPs put out by the US government for public access to certain key data in PAIR and PALM databases that is not otherwise available in bulk or network access form. One additional benefit of the invention is the fact that it offloads substantial traffic from US PTO networks, and thus allows a separate channel of access to benefit the public at no cost to the government. By opening up this previously inaccessible information on a wider basis the invention can also further facilitate the identification of potential technical experts, prior art, etc.
The present application claims priority to and is a continuation-in-part of the following applications, all filed Jan. 20, 2012: Ser. No. 13/355,218; 13/355,232; 13/355,241; 13/355,298; 13/355,342; and 13/355,392, all of which claim priority to provisional application Ser. No. 61/434,588 filed Jan. 20, 2011 and Ser. No. 61/442,049 filed Feb. 11, 2011 and all of which are hereby incorporated by reference.
Number | Date | Country | |
---|---|---|---|
61434588 | Jan 2011 | US | |
61442049 | Feb 2011 | US | |
Relation | Number | Date | Country |
---|---|---|---|
Parent | 13355218 | Jan 2012 | US |
Child | 13828923 | | US |
Parent | 13355232 | Jan 2012 | US |
Child | 13355218 | | US |
Parent | 13355241 | Jan 2012 | US |
Child | 13355232 | | US |
Parent | 13355298 | Jan 2012 | US |
Child | 13355241 | | US |
Parent | 13355342 | Jan 2012 | US |
Child | 13355298 | | US |
Parent | 13355392 | Jan 2012 | US |
Child | 13355342 | | US |