A portion of the disclosure of this patent document contains material, which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent files or records, but otherwise reserves all copyright rights whatsoever. The following notice applies to this document: Copyright © 2018 Thomson Reuters.
This application claims priority to U.S. Provisional Application 62/519,230 filed on Jun. 14, 2017, the contents of which are incorporated herein in their entirety.
This disclosure relates generally to performing legal research. More specifically, the disclosure is directed towards systems and methods for conducting prospective legal research.
Traditionally, in order to conduct legal research on a particular subject, researchers were required to carefully craft search strategies and apply such strategies to existing court opinions, motions, briefs, transcripts, secondary sources such as treatises or articles, statutes, web pages, etc. While such processes returned relevant results that provided a plethora of information to legal researchers on the particular subject, such research would not necessarily allow a legal researcher to identify future trends and prospective critical issues regarding the particular subject. Accordingly, there exists a need for systems and methods that provide for prospective legal research, which identifies future relevant court opinions, motions, briefs, transcripts, secondary sources such as treatises or articles, statutes and web pages, which can subsequently be grouped according to relevant categories, and which in turn allows legal researchers to identify future trends and upcoming issues pertaining to a legal topic.
The present invention is directed towards systems and methods for conducting prospective legal research, which comprises receiving an initiated user question at a graphical user interface comprising one or more search terms and performing query expansion on the received search query. One or more documents that are responsive to the expanded search query are then identified, and from the set of responsive documents, a subset of documents that reference future development are then identified. The one or more responsive documents that reference future development are grouped into one or more document clusters and a topic is identified for each of the one or more document clusters. The one or more document clusters and the associated topics are then presented at the graphical user interface.
According to one embodiment of the present invention, identifying one or more responsive documents that reference future development further comprises determining whether one or more documents contain at least one of a future date, which comprises at least one of an explicit future date, a future date phrase and a future date range; a future term, which comprises at least one of a modal verb, a common prospective term and an uncommon prospective phrase; and a relevant feature, which comprises at least one of a prospective legal phrase, a rare phrase, an entity tag and a part-of-speech tag. According to another embodiment of the present invention, grouping the one or more responsive documents that reference future development into one or more document clusters is completed based on at least one of matching keywords, matching subjects, matching entities, matching unstructured text, matching authorship, matching quotes, matching dates, related dates, volume of documents, tagging relationships and direct connections between documents.
A system, as well as articles that include a machine readable medium storing machine-readable code for implementing the various techniques, are disclosed. Details of various embodiments are discussed in greater detail below.
Additional features and advantages will be readily apparent from the following detailed description, the accompanying drawings and the claims.
Like reference symbols in the various drawings indicate like elements.
In the following description, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration specific embodiments in which the disclosure may be practiced. It is to be understood that other embodiments may be utilized and structural changes may be made without departing from the scope of the present disclosure.
In general, the systems and methods described herein may relate to improvements to aspects of searching for information using a computer. These improvements not only improve how such a computer (or any number of computers employed in the search) operates to serve the user's research goals, but also improve the accuracy, efficiency and usefulness of the search results that are returned to the searcher.
The present system may be described in the context of information being comprised in “documents.” In this sense, a document is simply a logical container for information. Examples of documents in the legal research field may include court opinions, motions, briefs, transcripts, secondary sources such as treatises or articles, statutes, web pages, etc. Documents may also comprise issue summaries or index headings rather than judicial opinions, briefs, secondary source chapters or other longer-format documents. For example, a document that is returned by the system may be a Westlaw Key Number, headnote or American Law Reports (“ALR”) article. It is also possible that one document may exist within another document; for example, a book may be considered a document and each chapter within that book may also be considered a document.
Turning now to
For example, the present disclosure is operational with numerous other general purpose or special purpose computing consumer electronics, network PCs, minicomputers, mainframe computers, laptop computers, as well as distributed computing environments that include any of the above systems or devices, and the like.
The disclosure may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, loop code segments and constructs, and other computer instructions known to those skilled in the art that perform particular tasks or implement particular abstract data types. The disclosure can be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules are located in both local and remote computer storage media including memory storage devices. Tasks performed by the programs and modules are described below and with the aid of figures. Those skilled in the art may implement the description and figures as processor executable instructions, which may be written on any form of computer readable media. In one embodiment, with reference to
According to one embodiment, processor 112 is a central processing unit (“CPU”) that uses communicative circuits to pass binary encoded signals acting as instructions to enable various operations. These instructions may be operational and/or data instructions containing and/or referencing other instructions and data in various processor accessible and operable areas of memory 529 (e.g., registers, cache memory, random access memory, etc.). Such communicative instructions may be stored and/or transmitted in batches (e.g., batches of instructions) as programs and/or data components to facilitate desired operations. These stored instruction codes, e.g., programs, may engage the CPU circuit components and other motherboard and/or system components to perform desired operations. One type of program is a computer operating system, which may be executed by the CPU on a computer; the operating system enables users to access and operate computer information technology and resources. Some resources that may be employed in information technology systems include: input and output mechanisms through which data may pass into and out of a computer; memory storage into which data may be saved; and processors by which information may be processed. These information technology systems may be used to collect data for later retrieval, analysis, and manipulation, which may be facilitated through a database program. These information technology systems provide interfaces that allow users to access and operate various system components.
As shown in the
According to one embodiment, the suggestion module 124 is utilized to automatically suggest question components or segments, such as expanded words or phrases, suggested secondary or alternate words or phrases, related date terms and tagged entity terms, in response to a user-initiated question. The clustering module 126 serves to identify appropriate groups or clusters of the search results. Additional details of modules 122 through 126 are discussed in connection with
As shown in
In one embodiment, the network 140 uses wired communications to transfer information between an access device 180, the server device 110, a news content data store 150, a legal content data store 160 and a supplemental content data store 170. In another embodiment, the network 140 employs wireless communication protocols to transfer information between the access device 180, the server device 110, the news content data store 150, the legal content data store 160 and the supplemental content data store 170. For example, the network 140 may be a cellular or mobile network employing digital cellular standards including but not limited to the 3GPP, 3GPP2 and AMPS family of standards such as Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), CDMAOne, CDMA2000, Evolution-Data Optimized (EV-DO), LTE Advanced, Enhanced Data Rates for GSM Evolution (EDGE), Universal Mobile Telecommunications System (UMTS), Digital Enhanced Cordless Telecommunications (DECT), Digital AMPS (IS-136/TDMA), and Integrated Digital Enhanced Network (iDEN). The network 140 may also be a Wide Area Network (WAN), such as the Internet, which employs one or more transmission protocols, e.g. TCP/IP. As another example, the network 140 may employ a combination of digital cellular standards and transmission protocols. In yet other embodiments, the network 140 may employ a combination of wired and wireless technologies to transfer information between the access device 180, the server device 110, the news content data store 150, the legal content data store 160 and the supplemental content data store 170.
According to one embodiment, the news content data store 150 is a repository that maintains and stores news documents from one or more news organizations, such as REUTERS. In one embodiment, the legal content data store 160 is a repository of legal documents, such as WESTLAW, that maintains court decisions, litigation dockets and filings, legal treatises, law review articles and annotations thereto. According to one embodiment, the supplemental content data store 170 is a representative repository of non-news and non-legal documents that are relevant to the search query and are generally available on the accessible Internet.
In one embodiment, the data store 130 is a repository that maintains and stores information utilized by the before-mentioned modules 122 through 126. In one embodiment, the data store 130 is a relational database. In another embodiment, the data store 130 is a directory server, such as a Lightweight Directory Access Protocol (“LDAP”) server. In yet another embodiment, the data store 130 is an area of non-volatile memory 120 of the server device 110.
In one embodiment, as shown in the
Although the data store 130 shown in
The access device 180, according to one embodiment, is a computing device comprising: a touch-sensitive graphical user interface (“GUI”) 184; a digital signal processor (“DSP”) 182 having an access application module 182A that allows a user to access the server 110; transient and persistent storage devices (not shown); an input/output subsystem (not shown); and a bus to provide a communications path between components comprising the general purpose or special purpose computer (not shown). According to one embodiment, access application module 182A is web-based and uses thin client applications (not shown), such as a web browser, which allows a user to access the server 110. Examples of web browsers are known in the art, and include well-known web browsers such as MICROSOFT® INTERNET EXPLORER®, GOOGLE CHROME™, MOZILLA FIREFOX® and APPLE® SAFARI®. According to another embodiment, access device 180 is a mobile electronic device having a GUI; a DSP having an access application module; internal and external storage components; a power management system; an audio component; audio input/output components; an image capture and process system; an RF antenna; and a subscriber identification module (SIM) (not shown). Although system 100 is described generally herein as comprising a single access device 180, it should be appreciated that the present invention is not limited to a single access device. Indeed, system 100 can include multiple access devices.
Further, it should be noted that the system 100 shown in
Turning now to
Once entered, the search query initiated by the user is submitted to query module 122 over the network 140. The query module 122, upon receipt of the initiated user search query, signals the suggestion module 124 to perform one or more suggestion processes upon the received search query utilizing the defined grammar and the linguistic and data constraints encoded in the grammar maintained in the suggestion data store 134, step 220. In one embodiment, a suggestion process may include expansion of the search query using well known techniques in the field, including but not limited to stemming techniques, tokenization, Word2Vec and term frequency-inverse document frequency (TF-IDF). Continuing from the previous example for the search query comprising the single search term “drones,” stemming techniques may be used to expand the search query to include the terms “drone” and “drones,” and Word2Vec modeling can be used to generate the Word2Vec terms “remotely-piloted,” “remotely-piloted-aircraft-systems,” and “RPA” for expansion of the search query. Additionally, a suggestion process can also include the suggestion of secondary or alternate search terms by the suggestion module 124. Continuing from the previous example, search terms “Unmanned-aerial-vehicles” and “Unmanned-aerial-vehicle” are also included in the expanded search query.
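By way of illustration only, the expansion described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the naive `stem` function stands in for a real stemming technique, and the `RELATED_TERMS` and `ALTERNATE_TERMS` tables are hypothetical hand-written stand-ins for the output of a trained Word2Vec model and of the suggestion module 124, populated only with the terms from the “drones” example.

```python
# Hypothetical lookup tables; a real system would derive these from a
# trained Word2Vec model and the suggestion data store, not hard-code them.
RELATED_TERMS = {
    "drone": ["remotely-piloted", "remotely-piloted-aircraft-systems", "RPA"],
}
ALTERNATE_TERMS = {
    "drone": ["Unmanned-aerial-vehicles", "Unmanned-aerial-vehicle"],
}


def stem(term: str) -> str:
    """Very naive stemmer: lower-case and strip a trailing plural 's'."""
    term = term.lower()
    if term.endswith("s") and len(term) > 3:
        return term[:-1]
    return term


def expand_query(terms: list[str]) -> list[str]:
    """Return the stems and original terms plus related and alternate terms."""
    expanded: list[str] = []
    for term in terms:
        root = stem(term)
        candidates = [root, term.lower()]
        candidates += RELATED_TERMS.get(root, [])
        candidates += ALTERNATE_TERMS.get(root, [])
        for candidate in candidates:
            if candidate not in expanded:  # drop duplicates, preserve order
                expanded.append(candidate)
    return expanded
```

Under these assumptions, calling `expand_query(["drones"])` yields the seven terms named in the example, beginning with “drone” and “drones.”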
Returning to
At step 240, the query module 122 identifies which of the responsive documents maintained in the combined content data store 132 reference future development. In one embodiment, documents that reference future development include documents that include future dates as compared to the publication date of the documents, as well as documents that include future indicative terms, such as modal verbs. Additional details regarding terms that reference future development are discussed in relation to
Returning to
In one embodiment, the grouping in step 250 is performed according to a combination of the factors in conjunction with known clustering techniques, such as document similarity (cosine similarity) based on the unstructured text of each individual document.
At step 260, a topic for each of the one or more document clusters is identified by the clustering module 126 and stored in the results data store 136. According to one embodiment, statistical modeling, such as latent Dirichlet allocation (LDA) statistical modeling, is used on the unstructured text of the individual documents to identify a relevant topic for each document cluster. For example, a subset of the responsive documents to the search term “drones” that indicate future development may be clustered on the basis that the subset of responsive documents contained the matching person entity, “David Cameron.” Using LDA modeling, a topic for the subset of documents could be identified as “Government and European Focus” based on the statistical modeling performed on the unstructured text of each document. Table 3 illustrates the relevant unstructured text of each document used to generate the relevant topic of “Government and European Focus.”
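As a rough illustration of grouping by cosine similarity over unstructured text, consider the sketch below. It is an assumption-laden example rather than the disclosed method: the bag-of-words vectors, the greedy single-pass grouping and the 0.5 threshold are all illustrative choices, and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts for a document's unstructured text."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_documents(texts: list[str], threshold: float = 0.5) -> list[list[int]]:
    """Greedy single-pass grouping (an assumed strategy): a document joins
    the first cluster whose seed document is at least `threshold` similar,
    otherwise it starts a new cluster. Returns lists of document indices."""
    vectors = [term_vector(t) for t in texts]
    clusters: list[list[int]] = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vectors[cluster[0]], vector) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters
```

In practice the text similarity score would be only one of the factors combined in step 250, alongside matching entities, dates and the like.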
Returning to
Turning now to
Once entered, the search query initiated by the user is submitted to query module 122 over the network 140. The query module 122, upon receipt of the initiated user search query, signals the suggestion module 124 to perform one or more suggestion processes upon the received search query, step 320. In one embodiment, the search query is expanded using well known techniques in the field as discussed in conjunction with
At step 330, the query module 122 executes a search using the expanded search query against one or more data sets, such as the news content data store 150, the legal content data store 160 and the supplemental content data store 170. One or more documents that are responsive to the expanded search query are retrieved from the one or more datasets by the query module 122 and stored in the combined content data store 132. At step 350, each of the one or more responsive documents is parsed into individual sentences by the query module 122.
A determination is then made by the query module 122 as to whether the individual sentence contains a future date as compared to the publication date of the given document, step 360. This is the first step in determining whether a given document references future development. According to one embodiment, determination of whether the individual sentence contains a future date comprises identifying whether the individual sentence comprises (i) an explicit future date as compared to the publication date of the document, e.g. the parsed sentence includes the explicit date Nov. 20, 2020 in a news article from May 31, 2017; (ii) a future date phrase, e.g. “next month” or “following year”; or (iii) a future date range, e.g. 2020-2030 in legislation text from 2015. As stated previously, the suggestion data store 134 maintains a listing of data constraints, which are directed in part to identifying explicit future dates, future date ranges and future date phrases.
Table 4 presented below illustrates an exemplary set of results from the subset of relevant documents responsive to the search query “drones” that contain a future date.
If the individual sentence does contain a future date, the document is labeled as relevant, step 362, and the document is added to the data set for document clustering and presentation stored in results data store 136, step 364.
If a determination is made that the individual sentence does not contain a future date, the process moves to step 370, where a determination is made as to whether the individual sentence contains a future term. According to one embodiment, determination of whether the individual sentence contains a future term comprises identifying whether the individual sentence contains (i) a modal verb, e.g. “could,” “would,” “should”; (ii) a common prospective term, e.g. “expect”; or (iii) an uncommon prospective phrase, e.g. “prospect of,” “seeks views,” “to ban,” “proposals to” and “new law.” The suggestion data store 134 maintains a repository of future terms that are utilized by the query module 122 in executing this determination.
If the sentence does contain a future term, then the document is labeled as relevant, step 362, and the document is added to the data set for document clustering and presentation, step 364. Alternatively, if a determination is made that the individual sentence does not contain a future term, the process moves to step 380, where a determination is made by the query module 122 as to whether the individual sentence contains a relevant feature. According to one embodiment, determination of whether the individual sentence contains a relevant feature comprises identifying whether the individual sentence contains (i) a prospective legal phrase, e.g. “new law” and “upcoming legislation,” or (ii) a rare phrase, e.g. “plans being considered” and “call for an end.”
If the sentence does contain a relevant feature, then the document is labeled as relevant, step 362, and the document is added to the data set for document clustering and presentation, step 364. However, if the individual sentence does not contain a relevant feature, then the document is labeled as not relevant and will not be included as part of the potential results, step 385.
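The three kinds of future date check can be illustrated with the following sketch. It is a simplified example under stated assumptions: the regular expressions cover only dates written like “Nov. 20, 2020” and year ranges written like “2020-2030,” the phrase list contains only the two phrases given above, and the function name `has_future_date` is hypothetical. The data constraints actually maintained in the suggestion data store 134 are not reproduced here.

```python
import re
from datetime import date

# Only the two example phrases from the description; a real list would be longer.
FUTURE_DATE_PHRASES = ("next month", "following year")

MONTHS = {name: number for number, name in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}

# Matches dates written like "Nov. 20, 2020" or "November 20, 2020".
EXPLICIT_DATE = re.compile(r"\b([A-Za-z]{3})[a-z]*\.?\s+(\d{1,2}),\s+(\d{4})\b")
# Matches year ranges written like "2020-2030".
YEAR_RANGE = re.compile(r"\b(\d{4})\s*-\s*(\d{4})\b")


def has_future_date(sentence: str, published: date) -> bool:
    """True if the sentence refers to a date later than the publication date."""
    # (i) explicit future date
    for month, day, year in EXPLICIT_DATE.findall(sentence):
        number = MONTHS.get(month.lower())
        if number is None:
            continue  # three letters that are not a month abbreviation
        try:
            if date(int(year), number, int(day)) > published:
                return True
        except ValueError:
            continue  # e.g. "Feb. 31, 2020" is not a real calendar date
    # (iii) future date range: the range ends after the publication year
    for _start, end in YEAR_RANGE.findall(sentence):
        if int(end) > published.year:
            return True
    # (ii) future date phrase
    lowered = sentence.lower()
    return any(phrase in lowered for phrase in FUTURE_DATE_PHRASES)
```

With the examples from the description, a sentence containing “Nov. 20, 2020” in an article published May 31, 2017 is flagged, as is “2020-2030” in text published in 2015.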
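The sequence of determinations in steps 360 through 385 can be sketched as a simple cascade. This is an illustrative sketch only: the term lists contain just the examples quoted above, matching is by crude substring or whole-word lookup, the sentence splitter is naive (it would, for instance, split after the abbreviation “Nov.”), and treating a document as relevant as soon as any one sentence passes is an assumption. The future date check is passed in as a callable so the sketch stays self-contained; all function names are hypothetical.

```python
import re
from typing import Callable

# Term lists taken from the examples in the description; a deployed system
# would read these from the suggestion data store instead.
MODAL_VERBS = ("could", "would", "should")
PROSPECTIVE_TERMS = ("expect", "prospect of", "seeks views", "to ban",
                     "proposals to", "new law")
RELEVANT_FEATURES = ("new law", "upcoming legislation",
                     "plans being considered", "call for an end")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def contains_future_term(sentence: str) -> bool:
    """Modal verbs are matched as whole words, other terms as substrings."""
    lowered = sentence.lower()
    words = set(re.findall(r"[a-z]+", lowered))
    return bool(words & set(MODAL_VERBS)) or any(
        term in lowered for term in PROSPECTIVE_TERMS)


def contains_relevant_feature(sentence: str) -> bool:
    lowered = sentence.lower()
    return any(phrase in lowered for phrase in RELEVANT_FEATURES)


def is_relevant(text: str, has_future_date: Callable[[str], bool]) -> bool:
    """Apply the three checks in order to each sentence; the document is
    treated as relevant as soon as any sentence passes any check."""
    for sentence in split_sentences(text):
        if has_future_date(sentence):             # step 360
            return True
        if contains_future_term(sentence):        # step 370
            return True
        if contains_relevant_feature(sentence):   # step 380
            return True
    return False                                  # step 385
```

Ordering the checks from the most specific signal (a date) to the least specific keeps the cheaper, higher-precision tests first, which mirrors the order of the steps described above.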
Turning now to
At step 450, each of the one or more responsive documents are parsed into individual sentences. A determination is then made as to whether the individual sentence contains a future date by the query module 122 in a similar fashion as described in relation to step 360 of
If a determination is made that the individual sentence does not contain a future date, the process moves to step 470, where a determination is made by the query module 122 as to whether the individual sentence contains a rare phrase. According to one embodiment, a rare phrase is a specific phrase that has a temporal attribute tied to an event type, e.g. “first person arrested,” “first person to be convicted,” “first arrest,” “Government to publish,” “could be banned,” “call for an end” and “plans being considered.” As stated previously, the suggestion data store 134 maintains a listing of data constraints, which are directed in part to identifying rare phrases.
If the individual sentence does contain a rare phrase, the document is labeled as relevant, step 462, and the document is added to the data set for document clustering and presentation, step 464. Otherwise, if the individual sentence does not contain a rare phrase, then the process flow continues to step 480, where a determination is made as to whether the individual sentence contains a common phrase or modal verb, both of which denote a future indication. As stated previously, the suggestion data store 134 maintains a listing of data constraints, which are directed in part to identifying common phrases or modal verbs. According to one embodiment, a common phrase is a phrase containing common prospective terms, e.g. “proposals to,” “new law,” “new legislation,” “could face” and “Government plans.” Examples of modal verbs include “can,” “could” and “may.” If the individual sentence does not contain a common phrase or modal verb, then the document is labeled as not relevant and will not be included as part of the potential results, step 482.
In the event that the individual sentence does contain a common phrase or modal verb, process flow continues to step 484, where a determination is made as to whether the individual sentence contains a combination of future terms and relevant features. According to one embodiment, future terms include common phrases or modal verbs and relevant features include entity tags, such as persons or organizations, and part-of-speech tags. For example, if the individual sentence references a common phrase such as “new law” and an organizational entity name, “Federal Aviation Administration,” a determination would be made by the query module 122 that the individual sentence does indeed have a combination of future terms and relevant features. In that case, process flow would continue to step 462, where the document is labeled as relevant and is then added to the data set for document clustering and presentation, step 464.
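The alternative flow of steps 470 through 484 can be sketched in the same spirit. The sketch below makes several assumptions that the description does not settle: the future date result is supplied by the caller as a boolean, the `KNOWN_ENTITIES` tuple is a hypothetical stand-in for the output of an entity tagger, part-of-speech tags are omitted, and a sentence that has a future term but no relevant feature at step 484 is treated as not relevant.

```python
# Phrase lists copied from the examples in the description.
RARE_PHRASES = ("first person arrested", "first person to be convicted",
                "first arrest", "government to publish", "could be banned",
                "call for an end", "plans being considered")
COMMON_PHRASES = ("proposals to", "new law", "new legislation",
                  "could face", "government plans")
MODAL_VERBS = ("can", "could", "may")
# Hypothetical stand-in for a named-entity tagger's output.
KNOWN_ENTITIES = ("federal aviation administration",)


def classify_sentence(sentence: str, has_future_date: bool) -> bool:
    """Return True if the sentence marks its document as relevant."""
    lowered = sentence.lower()
    if has_future_date:                                          # future date check
        return True
    if any(phrase in lowered for phrase in RARE_PHRASES):        # step 470
        return True
    words = lowered.replace(",", " ").replace(".", " ").split()
    has_future_term = (any(p in lowered for p in COMMON_PHRASES)
                       or any(m in words for m in MODAL_VERBS))  # step 480
    if not has_future_term:
        return False                                             # step 482
    # Step 484: require a relevant feature (here, a known entity) as well.
    # Treating a miss here as "not relevant" is an assumption.
    return any(entity in lowered for entity in KNOWN_ENTITIES)
```

Note that lower-casing makes the modal verb “may” indistinguishable from the month “May”; a part-of-speech tagger, as contemplated by the relevant features above, would be one way to resolve that ambiguity.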
While various embodiments of the present disclosure have been described above, it should be understood that they have been presented by way of example, and not as limitations. It would be apparent to one skilled in the relevant art(s) that various changes in form and detail could be made therein without departing from the spirit and scope of the disclosure. Thus, the present disclosure should not be limited by any of the above-described exemplary embodiments, but should be defined only in accordance with the following claims and their equivalents. For example, it should be noted that the processes described in relation to
Further,
In software implementations, computer software (e.g., programs or other instructions) and/or data is stored on a machine readable medium as part of a computer program product, and is loaded into a computer system or other device or machine via a removable storage drive, hard drive, or communications interface. Computer programs (also called computer control logic or computer readable program code) are stored in a main and/or secondary memory, and executed by one or more processors (controllers, or the like) to cause the one or more processors to perform the functions of the disclosure as described herein. In this document, the terms “machine readable medium,” “computer program medium” and “computer usable medium” are used to generally refer to media such as a random access memory (RAM); a read only memory (ROM); a removable storage unit (e.g., a magnetic or optical disc, flash memory device, or the like); a hard disk; or the like.
Notably, the figures and examples above are not meant to limit the scope of the present disclosure to a single embodiment, as other embodiments are possible by way of interchange of some or all of the described or illustrated elements. Moreover, where certain elements of the present disclosure can be partially or fully implemented using known components, only those portions of such known components that are necessary for an understanding of the present disclosure are described, and detailed descriptions of other portions of such known components are omitted so as not to obscure the disclosure. In the present specification, an embodiment showing a singular component should not necessarily be limited to other embodiments including a plurality of the same component, and vice-versa, unless explicitly stated otherwise herein. Moreover, the applicants do not intend for any term in the specification or claims to be ascribed an uncommon or special meaning unless explicitly set forth as such. Further, the present disclosure encompasses present and future known equivalents to the known components referred to herein by way of illustration.
The foregoing description of the specific embodiments so fully reveals the general nature of the disclosure that others can, by applying knowledge within the skill of the relevant art(s), readily modify and/or adapt for various applications such specific embodiments, without undue experimentation, without departing from the general concept of the present disclosure. Such adaptations and modifications are therefore intended to be within the meaning and range of equivalents of the disclosed embodiments, based on the teaching and guidance presented herein. It is to be understood that the phraseology or terminology herein is for the purpose of description and not of limitation, such that the terminology or phraseology of the present specification is to be interpreted by the skilled artisan in light of the teachings and guidance presented herein, in combination with the knowledge of one skilled in the relevant art(s).
Number | Date | Country
---|---|---
62519230 | Jun 2017 | US