This application relates to information services, such as information services for facts extracted from the meaning of content across differing sources on a wide area network. Content meaning can be derived through linguistic analysis, metadata, or other approaches.
Many approaches for extracting and using information from large networking environments, such as the Internet, have been proposed and implemented. Search engines and manually generated indexes are among the most common tools used for this purpose today, but there are literally hundreds of other specialized and/or complex data mining techniques that have been developed. And a large amount of effort is constantly being expended to improve and reengineer existing approaches as well as to develop new ones.
In one general aspect, the invention features a network fact information service system that includes a real time database that stores information about facts on the network by recording at least an identifier and an occurrence timepoint for each fact, wherein the occurrence timepoint identifies a time at which the fact occurred, fact-based expression logic operative to interact with expressions that define relationships between facts based on both their identifiers and their timepoints, a relationship database for storing representations of the relationships that satisfy the expressions, and a service interface operative to allow a service consumer to query the database of stored relationships.
In preferred embodiments, the fact-based expression logic can be operative to define different types of relationships, with the relationship database being operative to store information identifying a type for at least some of the representations of relationships, and with the service interface being responsive to queries that include relationship type identifiers. The service interface can include a timeline display interface operative to display a timeline that graphically shows a temporal relationship between facts. The service interface can be operative to present scheduled future facts on the timeline. The system can further include storage for future facts and current facts. The system can include prediction logic operative to generate predictions of future facts. The service interface can include a timeline display interface operative to display a timeline that presents at least one predicted future fact and graphically shows a temporal relationship between facts. The timeline display interface can be operative to present likelihood indicators in association with the presentation of predicted future facts. The timeline display interface can be operative to present relatedness indicators that visually indicate an association between correlated facts.
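As an illustration of this first aspect, the following Python sketch stores facts as (identifier, timepoint) pairs, evaluates typed expressions over pairs of facts, and answers queries by relationship type. It is a minimal in-memory sketch under stated assumptions, not the system's implementation: the `Fact` class, the prefix-based identifier format, the `precedes_within` expression, and the `evaluate` and `query` helpers are all hypothetical, and a Python list stands in for the relationship database.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Fact:
    identifier: str      # hypothetical "type:entity" identifier format
    timepoint: datetime  # time at which the fact occurred


# A hypothetical expression: a relationship type name plus a predicate over two facts
# that may inspect both their identifiers and their timepoints.
Expression = Tuple[str, Callable[[Fact, Fact], bool]]


def precedes_within(prefix_a: str, prefix_b: str,
                    window: timedelta) -> Callable[[Fact, Fact], bool]:
    """True if fact a (matching prefix_a) occurs before fact b (matching prefix_b) within window."""
    def pred(a: Fact, b: Fact) -> bool:
        return (a.identifier.startswith(prefix_a)
                and b.identifier.startswith(prefix_b)
                and timedelta(0) < b.timepoint - a.timepoint <= window)
    return pred


def evaluate(facts: List[Fact], expressions: List[Expression]) -> List[Tuple[str, Fact, Fact]]:
    """Record every (type, a, b) triple that satisfies one of the expressions."""
    relationships = []
    for rel_type, pred in expressions:
        for a in facts:
            for b in facts:
                if a is not b and pred(a, b):
                    relationships.append((rel_type, a, b))
    return relationships


def query(relationships: List[Tuple[str, Fact, Fact]], rel_type: str) -> List[Tuple[Fact, Fact]]:
    """Service-interface style lookup by relationship type identifier."""
    return [(a, b) for t, a, b in relationships if t == rel_type]


# Illustrative data only.
facts = [
    Fact("filing:ACME", datetime(2008, 3, 1)),
    Fact("lawsuit:ACME", datetime(2008, 3, 5)),
    Fact("lawsuit:ACME", datetime(2008, 5, 1)),
]
rels = evaluate(facts, [("filing_then_lawsuit",
                         precedes_within("filing:", "lawsuit:", timedelta(days=7)))])
print(query(rels, "filing_then_lawsuit"))
```

The pairwise loop is quadratic in the number of facts and is used here only for clarity; a real relationship database would index facts by identifier and time.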
In another general aspect, the invention features a wide area network fact information service system that includes a fact information extraction interface operative to extract information about facts from different kinds of textual sources that include information about those facts, a database that stores at least some of the extracted information about the facts from the different types of information by recording at least an identifier and an occurrence timepoint for each fact, wherein the occurrence timepoint identifies a time at which the fact occurred, ranking logic operative to associate a ranking with at least some of the facts, and a service interface operative to enable a service consumer to access the stored facts based on at least their timepoints and their associated rankings.
In preferred embodiments, the service interface can be available via the internet. The system can further include timepoint extraction logic operative to extract the occurrence timepoints for the facts from documents on the network. The fact-based network interaction engine can include search logic operative to find facts that satisfy one or more of the expressions. The fact-based network interaction engine can include search logic operative to find sets of facts that satisfy one or more of the expressions. The search logic can be operative to find one or more past, current, and/or future facts. The fact-based network interaction engine can include monitoring logic operative to find one or more sets of facts that satisfy one or more of the expressions as they occur. The fact-based network interaction engine can include personal fact aggregation logic operative to aggregate facts for a user based on one or more of the expressions. The fact-based network interaction engine can be applied to news stories. The system can further include sending logic operative to issue an alert or message when one or more of the expressions is satisfied. The alert or message can be machine-readable. The alert or message can be human-readable. The alert logic can issue the alerts or messages using an RSS format. The fact-based network interaction engine can include logic operative to define actions to be taken based on the detected sets. The actions can include the initiation of a commercial transaction. The actions can include the initiation of a security purchase transaction. The fact-based network interaction engine can further include logic operative to automatically initiate the actions. The actions can include financial transactions. The facts can be stored and monitored in real-time.
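The monitoring and action embodiments can be pictured as a small streaming check. The Python sketch below assumes a hypothetical two-fact sequence expression ("fact A followed by fact B within a time window") and a caller-supplied action callback; the callback could, in principle, issue an alert or start a transaction, but here it only records a human-readable message. The `SequenceMonitor` class, the identifiers, and the in-order arrival of facts are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, List


@dataclass(frozen=True)
class Fact:
    identifier: str
    timepoint: datetime


@dataclass
class SequenceMonitor:
    """Fires `action` when a `first` fact is followed by a `second` fact within `window`."""
    first: str
    second: str
    window: timedelta
    action: Callable[[Fact, Fact], None]
    pending: List[Fact] = field(default_factory=list)

    def observe(self, fact: Fact) -> None:
        # Facts are assumed to arrive in timepoint order.
        # Forget earlier `first` facts that can no longer complete the sequence.
        self.pending = [p for p in self.pending
                        if fact.timepoint - p.timepoint <= self.window]
        if fact.identifier == self.second:
            for p in self.pending:
                self.action(p, fact)
        if fact.identifier == self.first:
            self.pending.append(fact)


# Illustrative usage: the action just records a human-readable alert.
alerts: List[str] = []
monitor = SequenceMonitor(
    "earnings:ACME:beat", "analyst:ACME:upgrade", timedelta(days=2),
    lambda a, b: alerts.append(f"ALERT {a.identifier} -> {b.identifier} at {b.timepoint}"),
)
for f in [Fact("earnings:ACME:beat", datetime(2008, 4, 1, 9)),
          Fact("analyst:ACME:upgrade", datetime(2008, 4, 2, 10)),
          Fact("analyst:ACME:upgrade", datetime(2008, 4, 9, 10))]:
    monitor.observe(f)
print(alerts)
```

Only the second fact completes the sequence; the third arrives after the pending earnings fact has aged out of the two-day window.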
The facts can include news flashes, blog modifications, weather data, or organizational information releases. The facts can be scraped from the internet, read from RSS feeds, or obtained or uploaded through other sources. The database can be part of a scalable relational data warehouse. The network can be the internet. The service interface can include a list display interface that is operative to display a ranked list of results. The identifier can include information about both source and content for the fact. The identifier can include meta-data for the fact. The service interface can be a user interface to allow human end users to interact with the service as service consumers. The service interface can be a software interface to allow software to interact with the service as service consumers. The system can be operative to select facts to store information about based on input from the service consumer. The system can be operative to interact with information about facts from a plurality of different types of sources. The fact system can be operative to interact with facts from RSS feeds. The system can further include a search expression sales interface operative to allow service consumers to purchase predefined search expressions. The system can further include an entity extractor. The entity extractor can be operative to extract some information about facts based on formal linguistic processing and some information about facts based on entity-verb clustering. Fact information can be stored in a real time cache for a predetermined amount of time and then be moved to the database. The service interface can include display logic operative to display information about the facts in a continuously updated sub-area of a computer display.
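The real time cache embodiment, in which fact information is held for a predetermined amount of time and then moved to the database, can be sketched as a time-ordered queue. In this hypothetical Python sketch a plain list stands in for the relational data warehouse, and the class name, the `ttl` parameter, and the assumption of non-decreasing arrival times are illustrative only.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Any, Deque, List, Tuple


class RealTimeFactCache:
    """Keeps newly arrived facts in memory for `ttl`, then moves them to `warehouse`."""

    def __init__(self, ttl: timedelta, warehouse: List[Any]) -> None:
        self.ttl = ttl
        self.warehouse = warehouse  # stand-in for the relational data warehouse
        self._recent: Deque[Tuple[datetime, Any]] = deque()  # oldest arrival first

    def add(self, fact: Any, now: datetime) -> None:
        # Arrival times are assumed to be non-decreasing.
        self._recent.append((now, fact))
        self.flush(now)

    def flush(self, now: datetime) -> None:
        # Move every fact that has been cached for at least `ttl` into the warehouse.
        while self._recent and now - self._recent[0][0] >= self.ttl:
            self.warehouse.append(self._recent.popleft()[1])

    def recent(self) -> List[Any]:
        return [fact for _, fact in self._recent]


warehouse: List[Any] = []
cache = RealTimeFactCache(timedelta(minutes=10), warehouse)
t0 = datetime(2008, 5, 29, 12, 0)
cache.add("score update", t0)
cache.add("weather alert", t0 + timedelta(minutes=5))
cache.flush(t0 + timedelta(minutes=12))
print(warehouse, cache.recent())  # ['score update'] ['weather alert']
```

Because arrivals are time-ordered, only the head of the queue ever needs to be checked, so each flush does work proportional to the number of facts actually moved.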
The service interface can include display logic operative to display information about the facts in a sub-area of a computer display, wherein that sub-area is operative to display information relating to entities and/or facts for which information is displayed in another sub-area of the computer display. The service interface can include a timeline display interface operative to display a timeline that shows a temporal relationship between facts. The timeline display interface can be operative to update the timeline in real time as new future facts occur or are predicted. The timeline display interface can display the temporal relationships graphically. The service interface can be operative to present scheduled or predicted future facts on the timeline. The system can further include storage for future facts and current facts. The system can further include prediction logic operative to generate predictions or inferences of future facts. The system can further allow end users to submit predictions, together with their likelihood of occurring, to the database. The ranking logic can be operative to derive rankings based on a third-party source document ranking. The ranking logic can be operative to derive rankings based on occurrence position in a document. The ranking logic can be operative to derive rankings for information about facts based on the source of that information. The service interface can include a timeline display interface operative to display a timeline that presents at least one predicted future fact and graphically shows a temporal relationship between facts. The timeline display interface can be operative to present likelihood indicators in association with the presentation of predicted future facts.
The timeline display interface can be operative to present relatedness indicators that visually indicate an association between correlated facts. The system can further include ontology management logic operative to maintain an ontology for classifying the information about facts. The fact information extraction interface can be operative to extract estimated timepoints.
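The ranking embodiments combine a third-party document ranking, the position at which a fact occurs in a document, and the source of the information. The Python sketch below shows one hypothetical way to blend these signals and to serve facts by timepoint window and rank. The equal weighting, the `1 / (1 + position)` decay, the `SOURCE_WEIGHT` table, and all names are assumptions; the text does not specify a formula.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class FactMention:
    identifier: str
    timepoint: datetime
    source: str       # kind of source the information came from
    doc_rank: float   # third-party document ranking, assumed normalized to [0, 1]
    position: int     # index of the sentence mentioning the fact (0 = first)


# Hypothetical per-source weights; unknown sources get a low default weight.
SOURCE_WEIGHT: Dict[str, float] = {"wire": 1.0, "blog": 0.5}


def rank_score(m: FactMention) -> float:
    # Earlier mentions count more: 1.0 at position 0, then 1/2, 1/3, ...
    position_score = 1.0 / (1 + m.position)
    return SOURCE_WEIGHT.get(m.source, 0.25) * (0.5 * m.doc_rank + 0.5 * position_score)


def ranked_between(mentions: List[FactMention], start: datetime,
                   end: datetime) -> List[FactMention]:
    """Return mentions whose timepoints fall in [start, end], best-ranked first."""
    hits = [m for m in mentions if start <= m.timepoint <= end]
    return sorted(hits, key=rank_score, reverse=True)


# Illustrative data only.
mentions = [
    FactMention("offering:FortOrange", datetime(2008, 3, 3), "blog", 0.9, 0),
    FactMention("offering:FortOrange", datetime(2008, 3, 3), "wire", 0.6, 1),
    FactMention("lawsuit:Yahoo", datetime(2008, 3, 17), "wire", 0.9, 0),
]
march_first_half = ranked_between(mentions, datetime(2008, 3, 1), datetime(2008, 3, 15))
print([(m.source, round(rank_score(m), 3)) for m in march_first_half])
```

Here the wire mention (score 0.55) outranks the blog mention (about 0.475) despite its lower document ranking, because the source weight dominates; the Yahoo fact falls outside the time window.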
In a further general aspect, the invention features a network fact information service system, including a real time database that stores information about facts on the network by recording at least an identifier and an occurrence timepoint for each fact, wherein the occurrence timepoint identifies a time at which the fact occurred, fact-based expression logic operative to interact with expressions that define relationships between facts based on both their identifiers and their timepoints, and a timeline display interface operative to display a timeline that shows a temporal relationship between facts.
In preferred embodiments, the timeline display interface can be operative to present scheduled future facts on the timeline. The system can further include storage for future facts and current facts. The system can further include prediction logic operative to generate predictions of future facts. The timeline display interface can present at least one predicted future fact and graphically show a temporal relationship between facts. The timeline display interface can be operative to present likelihood indicators in association with the presentation of predicted future facts. The timeline display interface can be operative to present relatedness indicators that visually indicate an association between correlated facts. The system can further include an advertising engine operative to associate advertising with past, current, or future facts. The advertising engine can include a reverse auction engine that can set prices based on the length of the time period before a fact, wherein shorter periods are associated with higher costs.
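The text does not say how the reverse auction engine computes prices. The following Python sketch shows only one possible price schedule that satisfies the stated constraint that shorter periods before a fact cost more; the linear ramp, the `max_multiplier` and `horizon_days` parameters, and the function name are assumptions, and the bidding side of a reverse auction is omitted entirely.

```python
from datetime import datetime


def ad_slot_price(fact_time: datetime, now: datetime, base_price: float,
                  max_multiplier: float = 5.0, horizon_days: float = 30.0) -> float:
    """Price for advertising attached to a future fact.

    Hypothetical schedule: the price is `base_price` when the fact is `horizon_days`
    or more away and rises linearly to `base_price * max_multiplier` at the fact itself,
    so shorter remaining periods cost more.
    """
    days_before = max((fact_time - now).total_seconds() / 86400.0, 0.0)
    closeness = 1.0 - min(days_before, horizon_days) / horizon_days  # 0 far away, 1 at the fact
    return base_price * (1.0 + (max_multiplier - 1.0) * closeness)


fact_time = datetime(2008, 11, 4)
for now in (datetime(2008, 9, 1), datetime(2008, 10, 20), datetime(2008, 11, 4)):
    print(now.date(), ad_slot_price(fact_time, now, 100.0))
```

With these illustrative parameters the price is 100.0 more than thirty days out, 300.0 at fifteen days, and 500.0 on the day of the fact.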
Systems according to the invention can be beneficial in that they can allow users to approach temporal information about facts in new and powerful ways, enabling them to search, analyze, and trigger external events based on complicated relationships in their past, present, and future temporal characteristics.
Referring to
The system 10 can also include research, monitoring, analysis, and execution machinery 30, which is responsive to the information sources 20. This part of the system can cooperate with a fact data warehouse 50, as well as several external interfaces. A data cache 40 can also be provided to speed up data retrieval in certain circumstances.
The external interfaces include a temporal-logic-based user interface 60 for searching historical, present, and future facts and a user interface 70 for defining complex sequences of facts. The external interfaces also include a temporal-logic-based Web services interface 80 for searching historical, present, and future facts and a Web services-based programming interface 90 for defining complex sequences of facts. The system 10 can also generate a “subscribable” fact stream for facts that it generates in the “real world” (e.g., buying a stock, creating a news story, triggering a supply chain update).
Facts are pieces of information about occurrences that can take place anywhere and can then be described, reported, or otherwise manifested or revealed in some form on a computer network. For example, a sports feed can report facts from a game, such as by updating a score tally. A sports blog can focus on different facts from the same game and/or describe the same facts in different ways.
The facts themselves can also be network-based. In the case of an electronic corporate securities filing, for example, the occurrence on the network of the filing itself can be a fact. And it can also act as a source of descriptive material for facts that it describes, such as a company's product release dates.
Facts and information about them are typically acquired by applying software, such as entity and event extractors, to text documents and sources. One approach to extraction is to linguistically analyze plain text, such as through the use of services from Reuters, ClearForest, InXight, and/or Attensity. Extraction can also involve simple harvesting where the content already contains meta-data, such as Resource Description Framework (RDF) tags.
If, for example, an article includes the following sentence:
“Fort Orange financial completes $3.3M stock offering.”
the system can use linguistic analysis to map the document date to the investment fact. Note that in some circumstances, techniques amounting to less-than-perfect linguistic analysis, such as entity-verb clustering, can be used without excessive loss of performance.
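To make the entity-verb clustering idea concrete, the following Python sketch treats everything before a known trigger verb as the entity, everything after it as the object, and stamps the resulting fact with the document date. The verb lexicon, the tuple layout, and the document date used in the example are hypothetical; commercial extractors such as those named above perform far more thorough linguistic analysis.

```python
from datetime import date
from typing import List, Tuple

# Hypothetical lexicon mapping trigger verbs to fact types.
VERB_TO_FACT = {"completes": "investment", "files": "filing", "announces": "announcement"}


def entity_verb_facts(sentence: str, doc_date: date) -> List[Tuple[str, str, str, date]]:
    """Crude entity-verb clustering over one sentence.

    Everything before a known verb is treated as the entity and everything after it as
    the object. With no explicit time expression, the fact is stamped with the document date.
    """
    tokens = sentence.strip().rstrip(".").split()
    facts = []
    for i, token in enumerate(tokens):
        fact_type = VERB_TO_FACT.get(token.lower())
        if fact_type and i > 0:
            entity = " ".join(tokens[:i])
            obj = " ".join(tokens[i + 1:])
            facts.append((fact_type, entity, obj, doc_date))
    return facts


print(entity_verb_facts("Fort Orange financial completes $3.3M stock offering.",
                        date(2008, 3, 11)))  # document date assumed for illustration
```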
In another example, an article includes the following sentence:
“Look for a barrage of shareholder lawsuits against Yahoo next week”
In this case, the system can map the lawsuit fact to a “next week” timepoint (a scheduled future fact).
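Mapping “next week” to a timepoint requires resolving the phrase against the document date. The Python sketch below handles a few relative phrases by returning a (start, end) date interval; the Monday-to-Sunday week convention, the small phrase list, and the example document date are assumptions, and a production temporal tagger would cover far more expressions.

```python
from datetime import date, timedelta
from typing import Optional, Tuple


def resolve_relative(phrase: str, doc_date: date) -> Optional[Tuple[date, date]]:
    """Map a few relative time phrases to a (start, end) date interval.

    The interval is anchored on the document date, and weeks are assumed to run
    Monday through Sunday. Unrecognized phrases return None.
    """
    text = phrase.lower()
    monday = doc_date - timedelta(days=doc_date.weekday())  # start of the document's week
    if "next week" in text:
        start = monday + timedelta(days=7)
        return start, start + timedelta(days=6)
    if "this week" in text:
        return monday, monday + timedelta(days=6)
    if "tomorrow" in text:
        day = doc_date + timedelta(days=1)
        return day, day
    return None


sentence = "Look for a barrage of shareholder lawsuits against Yahoo next week"
print(resolve_relative(sentence, date(2008, 3, 12)))  # a Wednesday
```

Returning an interval rather than a single date lets the timeline show the scheduled future fact as a span when the source is no more precise than “next week”.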
Future facts can be scheduled facts, such as the expected Yahoo lawsuits or events extracted from an Internet calendar. They can also be predicted based on a variety of prediction methods. These can range from complex statistical forecasting methods to simple inferences, such as where a company's next annual meeting is predicted to be on the same day as all of its past annual meetings.
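The simple inference mentioned above, predicting a company's next annual meeting from the dates of its past meetings, can be sketched as follows. This Python sketch interprets “the same day” as the same calendar month and day (an assumption; a weekday-based rule would be equally plausible) and reports the share of past meetings on that day as a rough likelihood. The function name and example dates are hypothetical.

```python
from collections import Counter
from datetime import date
from typing import List, Optional, Tuple


def predict_next_annual(past: List[date], after: date) -> Optional[Tuple[date, float]]:
    """Predict the next occurrence of a yearly event from its past dates.

    Simple inference: reuse the most common calendar (month, day) of past occurrences,
    and report the share of past occurrences on that day as a rough likelihood.
    February 29 is not handled in this sketch.
    """
    if not past:
        return None
    (month, day), count = Counter((d.month, d.day) for d in past).most_common(1)[0]
    candidate = date(after.year, month, day)
    if candidate <= after:
        candidate = date(after.year + 1, month, day)
    return candidate, count / len(past)


past_meetings = [date(2005, 5, 10), date(2006, 5, 10), date(2007, 5, 10)]
print(predict_next_annual(past_meetings, date(2008, 3, 11)))
```

The likelihood value is the kind of figure a timeline could show through the likelihood indicators described earlier.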
Referring to
Above the fact loading layer 100 is a fact transformation layer 108, which can operate based on linguistics, semantics, and/or mathematics/statistics. Above the fact transformation layer are relations storage 110, a fact data warehouse 112, a fact in-memory segment 114 (cache), and an inverted future (timelines) module 116. At the next level is a fact modeling and computation engine 118, which can work with prediction, correlation, and probabilities. Layered above the fact modeling and computation engine is a temporal-based fact query language 120. A text search/modeling user interface 122, a graphical user interface framework 124, and an application programming interface/software development kit 126 are all layered over the temporal-based fact query language. Domain-specific applications 128 are in turn layered above these modules.
Examples of domain-specific applications can include:
Referring to
The system can present its results to the user in a variety of formats. It can present them in a simple hit list-based result output, similar to that of a traditional search engine, or it can use a temporally oriented format, such as a timeline. It can also use any other suitable user-oriented or machine-oriented format, such as more elaborate graphical user interfaces, RSS feeds, e-mail alerts, XML documents, or proprietary binary formats. Advertising can be associated with results, and this advertising can be targeted based on the specific facts and/or entities involved.
The system can provide a variety of types of services. A fact-based searching system can be provided for use by the general public or by a specific market segment. Fully customized, minimally filtered, or even raw fact feed subscriptions can also be provided. More quantitative searching solutions can be provided as well, such as for financial services applications.
One type of service is a news service. The service receives a user profile, which allows a user to specify interests. Information about facts relevant to these interests can then be provided to the user in a variety of formats, such as feeds, or an electronic newspaper format.
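The news service can be sketched as a filter from a user profile to a personal edition. In the Python sketch below, a profile lists followed entities and, optionally, fact types; matching facts are rendered as plain-text lines standing in for a feed or electronic newspaper. The `UserProfile` fields, the rendering format, and the example facts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Fact:
    entity: str
    fact_type: str
    headline: str


@dataclass
class UserProfile:
    entities: Set[str]                                 # entities the user follows
    fact_types: Set[str] = field(default_factory=set)  # empty set means "any type"

    def matches(self, fact: Fact) -> bool:
        return fact.entity in self.entities and (
            not self.fact_types or fact.fact_type in self.fact_types)


def personal_edition(profile: UserProfile, facts: Iterable[Fact]) -> List[str]:
    """Render the facts that match a profile as lines of a plain-text 'newspaper'."""
    return [f"[{f.entity}] {f.headline}" for f in facts if profile.matches(f)]


# Illustrative data only.
facts = [
    Fact("Yahoo", "lawsuit", "Shareholder lawsuits expected next week"),
    Fact("Yahoo", "earnings", "Quarterly results scheduled"),
    Fact("Pfizer", "debt", "Debt issue announced"),
]
print(personal_edition(UserProfile({"Yahoo"}, {"lawsuit"}), facts))
print(personal_edition(UserProfile({"Yahoo"}), facts))
```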
Mapping facts to temporal information in the database allows the system to answer questions that may be difficult to answer with traditional search engines. Here are some examples:
What will the pollen situation be in Boston next week?
Will terminal five be open next month?
What's happening in New York City this week?
When will movie X be released?
When is the next SARS conference?
When is Pfizer issuing debt next?
Where will George Bush be next week?
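Questions such as “When is the next SARS conference?” reduce to finding the earliest stored fact of a given type and entity whose timepoint lies after the present. The Python sketch below shows that query over an in-memory list; the `TimedFact` layout, the `next_fact` name, and the example dates are hypothetical, and a real system would run the equivalent query against the fact database and its future timelines.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class TimedFact:
    entity: str
    fact_type: str
    timepoint: datetime  # may lie in the future for scheduled or predicted facts


def next_fact(facts: List[TimedFact], entity: str, fact_type: str,
              now: datetime) -> Optional[TimedFact]:
    """Answer 'When is the next <fact_type> involving <entity>?' from stored facts."""
    upcoming = [f for f in facts
                if f.entity == entity and f.fact_type == fact_type and f.timepoint > now]
    return min(upcoming, key=lambda f: f.timepoint, default=None)


# Illustrative data only.
facts = [
    TimedFact("SARS", "conference", datetime(2007, 4, 2)),
    TimedFact("SARS", "conference", datetime(2009, 5, 1)),
    TimedFact("SARS", "conference", datetime(2008, 10, 6)),
    TimedFact("Pfizer", "debt issue", datetime(2008, 6, 15)),
]
answer = next_fact(facts, "SARS", "conference", now=datetime(2008, 3, 11))
print(answer.timepoint if answer else "no scheduled or predicted occurrence")
```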
Systems according to the invention can also answer more complex questions about the relationship between facts, such as “what happened to similar entities in similar chains of events?”
Referring to
A software development kit 166 allows developers to iterate facts, perform transformations and predictions, and implement user interface elements. The system can also provide a search/query engine 168 as well as user experience templates 170 and rendering 172 to produce different types of interfaces, such as search, timeline, and newspaper interfaces. RSS feeds 174 can also be generated from the database.
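A feed like the RSS feeds 174 could be produced by serializing stored facts. The Python sketch below uses only the standard library's `xml.etree.ElementTree` and `email.utils.format_datetime` to emit an RSS 2.0 document; the `(headline, url, timepoint)` tuple layout, the channel text, and the example.com URLs are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Iterable, Tuple


def facts_to_rss(title: str, link: str,
                 facts: Iterable[Tuple[str, str, datetime]]) -> str:
    """Render (headline, url, timepoint) tuples as an RSS 2.0 document."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = "Facts generated from the fact database"
    for headline, url, when in facts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = headline
        ET.SubElement(item, "link").text = url
        # RSS 2.0 expects RFC 822 dates; an aware datetime yields an explicit offset.
        ET.SubElement(item, "pubDate").text = format_datetime(when)
    return ET.tostring(rss, encoding="unicode")


feed = facts_to_rss(
    "Fact stream", "http://example.com/facts",
    [("Fort Orange financial completes $3.3M stock offering",
      "http://example.com/facts/1",
      datetime(2008, 3, 11, 9, 0, tzinfo=timezone.utc))],
)
print(feed)
```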
The system described above has been implemented in connection with a special-purpose software program running on a general-purpose computer platform, but it could also be implemented in whole or in part using special-purpose hardware. And while the system can be broken into the series of modules and steps shown in the various figures for illustration purposes, one of ordinary skill in the art would recognize that it is also possible to combine them and/or split them differently to achieve a different breakdown.
The present invention has now been described in connection with a number of specific embodiments thereof. However, numerous modifications which are contemplated as falling within the scope of the present invention should now be apparent to those skilled in the art. It is therefore intended that the scope of the present invention be limited only by the scope of the claims appended hereto. In addition, the order of presentation of the claims should not be construed to limit the scope of any particular term in the claims.
This application is a divisional of U.S. application Ser. No. 12/156,455, filed May 29, 2008, which claims the benefit under 35 U.S.C. 119(e) of U.S. provisional application Ser. No. 61/068,967, filed Mar. 11, 2008, and U.S. provisional application Ser. No. 60/940,643, filed May 29, 2007. This application is also related to another divisional application being filed today and having the same title as this application. All of these related applications are herein incorporated by reference.
Number | Date | Country
---|---|---
61068967 | Mar 2008 | US
60940643 | May 2007 | US

Relation | Number | Date | Country
---|---|---|---
Parent | 12156455 | May 2008 | US
Child | 13621154 | | US