The field of the invention is database searching.
Current data search paradigms are quite good at performing trivial (level 0) correlation analyses, i.e., identifying records that contain matching terms. For example, if one searches the Internet for the string [“Joe Peterson” and “John Mitchell”], Google™, Bing™, Yahoo™ and other search engines find records that contain the names of both individuals.
Level 1 correlations are much more difficult. For example, if both individuals attended UCLA, but there are no records containing both of their names, finding that correlation between the individuals could be challenging.
Level 2 and higher order correlations are even more difficult. For example, if Joe Peterson attended Stanford, where Mary Golden went to school, and Mary married John Mitchell, the correlation between Joe Peterson and John Mitchell would be extremely difficult to find using current search tools.
One of my earlier applications teaches use of concordances to facilitate searching in some circumstances, but that application does not contemplate successive (iterative) concordances. See US 2007/0219983 (Fish).
What is needed are computer systems, methods and models for assisting searchers in mining databases to identify Level 1 and higher order correlations.
The inventive subject matter provides apparatus, systems and methods in which correlations are located through matching of terms found in successive concordances.
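By way of a non-limiting illustration, the matching of terms across successive concordances might be sketched as follows. The sketch makes several assumptions that are not part of the disclosure: each record is reduced to a set of already-extracted terms, the function names `concordance` and `find_chain` are hypothetical, and the successive concordances are expanded automatically in breadth-first order rather than by a user selecting entries. Under these assumptions, a chain with one intermediate term corresponds roughly to a Level 1 correlation, and a chain with two intermediate terms to a Level 2 correlation.

```python
def concordance(records, term):
    """Collect every other term that shares a record with `term`."""
    terms = set()
    for record in records:
        if term in record:
            terms |= record - {term}
    return terms


def find_chain(records, start, target, max_rounds=3):
    """Expand successive concordances from `start` until `target` appears.

    Returns the list of terms linking `start` to `target`, or None if no
    link is found within `max_rounds` successive concordances.
    """
    paths = {start: [start]}   # term -> chain of terms that reached it
    frontier = [start]
    for _ in range(max_rounds):
        next_frontier = []
        for term in frontier:
            # sorted() only makes the expansion order deterministic
            for found in sorted(concordance(records, term)):
                if found in paths:
                    continue
                paths[found] = paths[term] + [found]
                if found == target:
                    return paths[found]
                next_frontier.append(found)
        frontier = next_frontier
    return None


# Hypothetical records; no single record names both individuals.
records = [
    {"Joe Peterson", "Stanford"},
    {"Mary Golden", "Stanford"},
    {"Mary Golden", "John Mitchell"},
]

chain = find_chain(records, "Joe Peterson", "John Mitchell")
print(chain)
```

With the three hypothetical records above, the chain found is Joe Peterson, Stanford, Mary Golden, John Mitchell, which requires three successive concordances; limiting `max_rounds` to 2 yields no chain.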
Various objects, features, aspects and advantages of the inventive subject matter will become more apparent from the following detailed description of preferred embodiments, along with the accompanying drawing figures in which like numerals represent like components.
It should be noted that while the following description is drawn to a computer/server based work package processing system, various alternative configurations are also deemed suitable and may employ various computing devices including servers, interfaces, systems, databases, agents, peers, engines, controllers, or other types of computing devices operating individually or collectively. One should appreciate that the computing devices comprise a processor configured to execute software instructions stored on a tangible, non-transitory computer readable storage medium (e.g., hard drive, solid state drive, RAM, flash, ROM, etc.). The software instructions preferably configure the computing device to provide the roles, responsibilities, or other functionality as discussed below with respect to the disclosed apparatus. In especially preferred embodiments, the various servers, systems, databases, or interfaces exchange data using standardized protocols or algorithms, possibly based on HTTP, HTTPS, AES, public-private key exchanges, web service APIs, known financial transaction protocols, or other electronic information exchanging methods. Data exchanges preferably are conducted over a packet-switched network, such as the Internet, a LAN, WAN, VPN, or other type of packet-switched network.
The following discussion provides many example embodiments of the inventive subject matter. Although each embodiment represents a single combination of inventive elements, the inventive subject matter is considered to include all possible combinations of the disclosed elements. Thus if one embodiment comprises elements A, B, and C, and a second embodiment comprises elements B and D, then the inventive subject matter is also considered to include other remaining combinations of A, B, C, or D, even if not explicitly disclosed.
In
In
Although the interfaces are shown here as being visual interfaces, such as might be used on a desktop, laptop or tablet, it is contemplated that one could additionally or alternatively use an auditory interface, or even some other interface.
Readers should appreciate that all names of persons are hypothetical, and that any connection with real persons is entirely coincidental. The same is true for web URLs.
The ellipses indicate that there could be many additional items in the corresponding list, and the “========” markings indicate where other terms would be shown. Throughout the figures, the marked boxes indicate that the user has chosen the corresponding entry.
As used herein, the term “concordance” means a collection of words or other terms used in a body of work, within a context. The context here is preferably a window of x number of words about the search term, where x can be any reasonable number. Contemplated windows can include anywhere between 10 and 1000 characters on one or both sides of the search terms, but more preferably between 15 and 500 characters, still more preferably between 20 and 100 characters, and most preferably between 25 and 250 characters.
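As a minimal sketch of one way such a window might be taken, the following extracts a fixed number of characters on both sides of each occurrence of a search term. The function name `context_windows`, the default width of 50 characters, and the case-insensitive matching are assumptions made for illustration only.

```python
import re


def context_windows(text, term, width=50):
    """Return the text lying within `width` characters on each side of
    every (case-insensitive) occurrence of `term` in `text`."""
    windows = []
    for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
        left = max(0, match.start() - width)        # clamp at start of text
        right = min(len(text), match.end() + width)  # clamp at end of text
        windows.append(text[left:right])
    return windows


sample = "Joe Peterson attended UCLA in 1990."
print(context_windows(sample, "UCLA", width=9))
```

A window measured in words rather than characters could be taken in the same way by splitting the record into words before slicing.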
The body of work can be any set of database records, which should be interpreted to include their equivalents in non-database data structures, including for example the databases mentioned above with respect to the search engine companies.
Contemplated concordances can be based on any suitable number of records within the body of work, preferably between 10 and 1000 records, more preferably between 20 and 500 records, and most preferably between 50 and 100 records.
Concordances shown to a user need not be, and indeed preferably are not, complete listings of all words located within the windows of the examined records. For example, connector words such as “the”, “and”, “or”, and “therefore”, etc., should be ignored. Also, one might want to ignore words that include numerals. Concordances preferably, but do not necessarily, include frequencies or numbers of occurrences. Concordances are preferably, but not necessarily, derived from windows disposed about a search term. For example, concordances could be derived from all the words in a record or other document, or perhaps only from emphasized or frequently used words and phrases. Concordances might also be derived only from main text in a record or document, perhaps ignoring advertising.
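The filtering and counting described above might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the stop-word list is a short hypothetical sample, the function name `build_concordance` is invented here, and words are compared in lower case.

```python
import re
from collections import Counter

# Illustrative connector words only; a practical list would be far longer.
STOP_WORDS = {"the", "and", "or", "therefore", "a", "an", "of", "in", "to"}


def build_concordance(windows, stop_words=STOP_WORDS):
    """Count the terms found in `windows`, ignoring connector words and
    any word that includes a numeral."""
    counts = Counter()
    for window in windows:
        for token in re.findall(r"[A-Za-z0-9']+", window):
            word = token.lower()
            if word in stop_words:
                continue  # connector word
            if any(ch.isdigit() for ch in word):
                continue  # word includes a numeral
            counts[word] += 1
    return counts


example = build_concordance(
    ["The dean and the provost met in 2012", "Dean of UCLA"]
)
print(example.most_common())
```

The resulting counts can be displayed alongside each term, or used to order the concordance by number of occurrences.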
On the other hand, when constructing the concordance, phrases can advantageously be included. For example, names of places and things, such as “University of Pennsylvania” and “President Obama”, can be used instead of the individual words comprising the phrase. Commercially available concordance programs already use phrases (see e.g., http://www.concordancesoftware.co.uk/), and the various search engines should all have extensive lists of phrases that could be used. Readers will note that in the examples shown in the figures, some of the concordance terms are single words, and some are phrases.
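One possible way to treat known phrases as single concordance terms is a greedy longest-match pass over the words of a record, sketched below. The function name `tokenize_with_phrases` and the supplied phrase list are hypothetical; the disclosure does not specify how phrases are recognized.

```python
import re


def tokenize_with_phrases(text, phrases):
    """Split `text` into terms, keeping any listed phrase as one term.
    Longer phrases are tried before shorter ones at each position."""
    words = re.findall(r"[\w']+", text)
    # Map each phrase's lower-cased word tuple to its display form.
    known = {tuple(p.lower().split()): p for p in phrases}
    max_len = max((len(key) for key in known), default=1)
    terms = []
    i = 0
    while i < len(words):
        # Try candidate phrases of decreasing length, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            candidate = tuple(w.lower() for w in words[i:i + n])
            if candidate in known:
                terms.append(known[candidate])
                i += n
                break
        else:
            # No phrase starts here; keep the single word.
            terms.append(words[i])
            i += 1
    return terms


terms = tokenize_with_phrases(
    "She met President Obama at the University of Pennsylvania.",
    ["University of Pennsylvania", "President Obama"],
)
print(terms)
```

In this sketch the phrase list is passed in by the caller, so a larger list, such as one maintained by a search engine, could be substituted without changing the function.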
In
In
It should be apparent to those skilled in the art that many more modifications besides those already described are possible without departing from the inventive concepts herein. The inventive subject matter, therefore, is not to be restricted except in the scope of the appended claims. Moreover, in interpreting both the specification and the claims, all terms should be interpreted in the broadest possible manner consistent with the context. In particular, the terms “comprises” and “comprising” should be interpreted as referring to elements, components, or steps in a non-exclusive manner, indicating that the referenced elements, components, or steps may be present, or utilized, or combined with other elements, components, or steps that are not expressly referenced. Where the specification or claims refer to at least one of something selected from the group consisting of A, B, C . . . and N, the text should be interpreted as requiring only one element from the group, not A plus N, or B plus N, etc.
This application claims priority to U.S. Provisional Application No. 61/730,149 filed Nov. 27, 2012. That application, as well as all other referenced extrinsic materials, are incorporated herein by reference in their entirety. Where a definition or use of a term in a reference that is incorporated by reference is inconsistent or contrary to the definition of that term provided herein, the definition of that term provided herein is deemed to be controlling.
Number | Name | Date | Kind |
---|---|---|---|
3868675 | Firmin | Feb 1975 | A |
5251316 | Anick | Oct 1993 | A |
5752424 | Rosene et al. | May 1998 | A |
5809242 | Shaw et al. | Sep 1998 | A |
5948061 | Merriman et al. | Sep 1999 | A |
6662177 | Martino | Dec 2003 | B1 |
7076443 | Emens et al. | Jul 2006 | B1 |
7664313 | Sproat | Feb 2010 | B1 |
7752243 | Hoeber | Jul 2010 | B2 |
7970750 | Goel | Jun 2011 | B2 |
9471672 | Walters | Oct 2016 | B1 |
10169484 | Bellville | Jan 2019 | B2 |
20020156776 | Davallou | Oct 2002 | A1 |
20020194166 | Fowler | Dec 2002 | A1 |
20030163302 | Yin | Aug 2003 | A1 |
20040015783 | Lennon | Jan 2004 | A1 |
20040093321 | Roustant et al. | May 2004 | A1 |
20040215612 | Brody | Oct 2004 | A1 |
20040267725 | Hank | Dec 2004 | A1 |
20050004949 | Trepess | Jan 2005 | A1 |
20050071325 | Bem | Mar 2005 | A1 |
20050071328 | Lawrence | Mar 2005 | A1 |
20050108001 | Aarskog | May 2005 | A1 |
20050154713 | Glover | Jul 2005 | A1 |
20050187916 | Levin | Aug 2005 | A1 |
20050222989 | Haveliwala et al. | Oct 2005 | A1 |
20050289100 | Dettinger | Dec 2005 | A1 |
20060184357 | Ramsey | Aug 2006 | A1 |
20060242130 | Sadri et al. | Oct 2006 | A1 |
20060253418 | Charnock | Nov 2006 | A1 |
20070130310 | Batke | Jun 2007 | A1 |
20070214132 | Grubb et al. | Sep 2007 | A1 |
20070219983 | Fish | Sep 2007 | A1 |
20080040324 | Sadri et al. | Feb 2008 | A1 |
20080222099 | Morgana | Sep 2008 | A1 |
20090234826 | Bidlack | Sep 2009 | A1 |
20100145676 | Rogers | Jun 2010 | A1 |
20120036478 | Boguraev | Feb 2012 | A1 |
20130007124 | Sweeney | Jan 2013 | A1 |
20130191365 | van Putten | Jul 2013 | A1 |
Number | Date | Country |
---|---|---|
61730149 | Nov 2012 | US |