The present invention relates generally to the automated generation of documents. The present invention further relates to information “push” systems which provide electronic documents to end users.
The number of personalized information service providers including personalized news providers is growing rapidly. However, the level of personalization presently provided is primitive and typically constrained to the selection of a set of predefined categories and topics by the personalized information service provider.
Current information “push” systems are typically not automated and are limited in scope. Generally a user is required to complete certain portions (or even all) of a given workflow, including such items as: gathering the content; filtering it for applicability; and laying it out. The user does not have a lot of freedom to specify his or her real interests. Furthermore, the provider is generally not using the user's actual experience and behavior in the information consumption process to improve the user experience. Finally, many of the information service providers focus only on web publishing, or email, and thus the print functionality is not easily accessible at a low cost. The resulting documents are thereby necessarily human constructed and so are time consuming and costly to produce, as well as lacking much in the way of personalization.
The current state of the art for information push takes several forms. One such form is typified by “portal” services, such as that found on the internet at myYahoo.com, where a user can choose certain categories of interest and make some decisions about how that information is laid out. Two examples are shown in
In U.S. Pat. No. 5,754,939 to Herz, herein incorporated by reference in its entirety for its teachings, the invention described relates to customized electronic identification of desirable objects, such as news articles, in an electronic media environment, and in particular to a system that automatically constructs both a “target profile” for each target object in the electronic media based, for example, on the frequency with which each word appears in an article relative to its overall frequency of use in all articles, as well as a “target profile interest summary” for each user, which target profile interest summary describes the user's interest level in various types of target objects. The system then evaluates the target profiles against the users' target profile interest summaries to generate a user-customized rank ordered listing of target objects most likely to be of interest to each user so that the user can select from among these potentially relevant target objects, which were automatically selected by this system from the plethora of target objects that are profiled on the electronic media. Users' target profile interest summaries can be used to efficiently organize the distribution of information in a large scale system consisting of many users interconnected by means of a communication network. Additionally, a cryptographically-based pseudonym proxy server is provided to ensure the privacy of a user's target profile interest summary, by giving the user control over the ability of third parties to access this summary and to identify or contact the user.
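The word-frequency profiling and rank ordering summarized above can be illustrated with a minimal sketch. It is not the implementation of the cited patent: the function names, the simple tokenizer, and the dot-product scoring are assumptions made only for illustration.

```python
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    """Lowercase words with surrounding punctuation removed (simplistic on purpose)."""
    words = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in words if w]

def target_profiles(articles):
    """Map each article id to {word: frequency in article / frequency in all articles}."""
    counts = {aid: Counter(tokenize(text)) for aid, text in articles.items()}
    overall = Counter()
    for c in counts.values():
        overall.update(c)
    total = sum(overall.values())
    profiles = {}
    for aid, c in counts.items():
        n = sum(c.values())
        if n == 0:
            profiles[aid] = {}
            continue
        profiles[aid] = {w: (k / n) / (overall[w] / total) for w, k in c.items()}
    return profiles

def rank(profiles, interest_summary):
    """Order article ids by the weighted overlap with a user's interest summary."""
    scores = {
        aid: sum(p.get(w, 0.0) * weight for w, weight in interest_summary.items())
        for aid, p in profiles.items()
    }
    return sorted(scores, key=lambda aid: scores[aid], reverse=True)

articles = {
    "a": "Handheld devices gain new handheld features.",
    "b": "Markets fall as rates rise.",
}
ranking = rank(target_profiles(articles), {"handheld": 1.0})
```

Here `interest_summary` stands in for a user's target profile interest summary, reduced to a hypothetical mapping of words to interest weights.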
Another information push service example is in the area of company newsletters that are collated and sent out to company employees on a regular basis. Most such newsletters are created without an automated process, and are not personalized. A further example is in the area of web pages with changing content. Services exist where a user can sign up to be notified if any of a set of web pages of interest changes in any way. The information about what has changed is then pushed to the subscriber. This information is typically a simple list of changes; it is not supplied as a formatted document synthesizing the information about all of the changes.
In short, portal-based information services such as those described above offer only a limited, predefined set of categories from which the user must choose, and a tightly limited layout capability.
Thus, it would be desirable to provide a methodology by which personalized information service providers can offer individually personalized, customized report documents. Such a report document would be built from the wide variety of diverse results returned by a simple query, with those results filtered against a particular user profile and the diverse content pieces laid out, without human intervention, into a user personalized deliverable report document format, the layout also being specified by the user profile. These user personalized report documents need to be less costly to produce, minimize the user time consumed in their setup, and improve the user experience by employing the user's actual responses and behavior in the information consumption process.
Disclosed in embodiments herein is a method for personalized report document generation comprising: profiling user interests into a user profile; querying various data repositories for content matching user interests; filtering the results, returned from the querying step, for scoring and profiling against the user profile for relevant content results; applying automated document layout techniques to the relevant content results to yield a personalized report document; and delivering the personalized report document. Further, the user's actual usage of the report document is tracked and fed back into the user profile.
Also disclosed in embodiments herein is a method for custom report document generation involving profiling user interests into a user profile and querying various data repositories for content matching those user interests. This is followed by filtering the results, returned from the querying step, against the user profile for relevant content results. Automated document layout techniques are then applied to the relevant content results to yield a custom document, and the resultant custom document is delivered. The user's actual usage of the report document is tracked and fed back into the user profile.
Further disclosed in embodiments herein is a system for personalized report document generation comprising: a user interface profiler to capture user interests into a user profile; a query module for querying various data repositories for content matching user interests; a content filter for filtering the results returned from the querying step for scoring and profiling against the user profile for relevant content results; an automated document layout module for applying automated document layout techniques to the relevant content results to yield a personalized report document; and a delivery system for delivering the personalized report document to the user, tracking the user's actual usage of the report document and feeding that usage back into the user profile.
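The sequence of steps recited in these embodiments (profile, query, filter, lay out, deliver, and feed usage back) can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the `UserProfile` fields, the item dictionaries, the keyword-count relevance measure, and the plain-text layout are all hypothetical stand-ins for the modules named above.

```python
from dataclasses import dataclass, field
from typing import Set

@dataclass
class UserProfile:
    keywords: Set[str]                 # assumed to be lowercase interest terms
    delivery: str = "email"            # hypothetical delivery channel name
    viewed: Set[str] = field(default_factory=set)

def relevance(item, profile):
    """Assumed relevance measure: how often profile keywords occur in the text."""
    text = item["text"].lower()
    return sum(text.count(k) for k in profile.keywords)

def query(repositories, profile):
    """Locate candidate content in every repository; selection happens later."""
    return [item for repo in repositories for item in repo
            if relevance(item, profile) > 0]

def filter_results(candidates, profile, limit=3):
    """Score candidates against the profile and keep the most relevant ones."""
    ranked = sorted(candidates, key=lambda i: relevance(i, profile), reverse=True)
    return ranked[:limit]

def lay_out(items):
    """Placeholder for automated layout: one titled block per item."""
    return "\n\n".join(f"{i['title']}\n{i['text']}" for i in items)

def deliver(document, profile):
    return {"channel": profile.delivery, "body": document}

def generate_report(repositories, profile):
    relevant = filter_results(query(repositories, profile), profile)
    return deliver(lay_out(relevant), profile), [i["id"] for i in relevant]

def record_usage(profile, opened_ids):
    """Feed the user's actual usage back into the profile."""
    profile.viewed.update(opened_ids)

profile = UserProfile(keywords={"handheld"})
repositories = [
    [{"id": "r1", "title": "Handheld sales rise",
      "text": "Handheld shipments grew; handheld prices fell."}],
    [{"id": "b1", "title": "Weather", "text": "Rain expected."}],
]
report, included_ids = generate_report(repositories, profile)
record_usage(profile, included_ids)
```

Keeping `query` and `filter_results` separate mirrors the division between locating candidate content and selecting content for inclusion.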
The teachings provided herein disclose a method to automatically search for, filter, and lay out information content into a personalized report document. Heretofore, there has been no notion of taking a simple web query that returns a wide variety of diverse results, filtering those results against a particular user profile, and laying out the diverse content pieces, without any human intervention, into a deliverable report document whose layout may further be altered dynamically depending upon the delivery medium chosen. As described herein, a user can submit a profile containing a description of the kinds of information she is interested in, and the system will then “push” a document out to the user that contains the appropriate content, laid out into a pleasing document design. As will be understood by those skilled in the art, this invention can be applied to many types of information and report documents. However, for the purposes of disclosure, a personal newspaper or news service that may be provided in hardcopy or electronic form has been chosen as but one embodiment to illustrate the claimed teachings.
As depicted in
The user 300 in
By using key words derived from this user profile 303, a meta search engine 310 then searches the news repositories and gives an initial ranking to the results. When so invoked, a query is made, in one embodiment, of various web-based providers, which may include, for example, CNN.com 306, BBC.com 307, and Reuters.com 308, or any other web-based, local area network, wide area network, or otherwise connected data repository. In this example instance, HTML/NewsML 309 is provided to the content generation module 310. At content collection 311, each of the chosen top results is then condensed into a set of information entities and compared against the pool of information entities stored in the user profile 303 through knowledge profiling technology 312. The most relevant results 316 are chosen and sent, after text generation 313, summarization 314, and merging 315, to the automatic layout module 318, and a best layout style is applied via the advanced layout technology in view of layout document model 319. The layout document model 319 derives its parameters from the user profile 303 and the intended publishing delivery mechanism. The produced document is finally published 320 as a PDF 321, HTML 322, or email 323, and sent via digital printing 208, web publishing 206, or email 207, with the ADL adjusting the MyNewsPaper layout to fit each publishing type as appropriate. The entire workflow in this example embodiment is automated via industry standards such as PDF or JDF.
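As an illustration of how a layout document model might derive its parameters from both the user profile and the intended delivery mechanism, consider the following sketch. The pairing of PDF with printing, HTML with web publishing, and email with email follows the description above; the column counts, page size, and the `layout_preferences` key are hypothetical values chosen only for the example.

```python
# Hypothetical per-channel defaults; only the format pairing comes from the text.
CHANNEL_DEFAULTS = {
    "print": {"format": "PDF", "columns": 3, "page_size": "letter"},
    "web": {"format": "HTML", "columns": 2, "page_size": None},
    "email": {"format": "email", "columns": 1, "page_size": None},
}

def layout_model(profile, delivery):
    """Combine channel defaults with any layout preferences stored in the profile."""
    if delivery not in CHANNEL_DEFAULTS:
        raise ValueError(f"unknown delivery mechanism: {delivery}")
    model = dict(CHANNEL_DEFAULTS[delivery])            # copy, so defaults stay intact
    model.update(profile.get("layout_preferences", {}))  # profile overrides defaults
    return model

profile = {"layout_preferences": {"columns": 2}}
print_model = layout_model(profile, "print")
email_model = layout_model({}, "email")
```

Because the model is recomputed from the delivery mechanism on each call, the same content can be laid out again if the user later selects a different channel.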
In
The content query module 410 will seek to perform a keyword match against the content of various database repositories 420 (for example Reuters.com) for interesting content and collect results thereby. In this embodiment, the responsibility of content query module 410 is to locate and identify candidate content to be included in the delivered document, not to select content for inclusion. This division of responsibility is required because content query module 410 may return the same content across multiple query invocations of the report document system. These query results are then passed to the content filtering module 430 for profiling and scoring against the user profile 400. In one embodiment the content filtering module 430 is implemented by product software as is taught in U.S. Pat. No. 5,754,939, SYSTEM FOR GENERATION OF USER PROFILES FOR A SYSTEM FOR CUSTOMIZED ELECTRONIC IDENTIFICATION OF DESIRABLE OBJECTS; U.S. Patent Publications: US20030069877, SYSTEM FOR AUTOMATICALLY GENERATING QUERIES; US20030061201, SYSTEM FOR PROPAGATING ENRICHMENT BETWEEN DOCUMENTS; US20030033288, DOCUMENT-CENTRIC SYSTEM WITH AUTO-COMPLETION AND AUTO-CORRECTION; US20030033287, META-DOCUMENT MANAGEMENT SYSTEM WITH USER DEFINABLE PERSONALITIES; and EPO Patent Publications, EP1143356A3, META-DOCUMENT AND METHOD OF MANAGING META-DOCUMENTS; which are herein incorporated by reference in their entirety for their teaching.
An alternative approach for content filtering module 430 is implemented by a profile scheme. The profiles considered here concern documents, users, communities, and information sources, and more generally any objects that can each be associated with textual information. The profiles are composed of Atomic Profile Elements (APEs). An APE typically contains the most important concepts concerning a document, a user interest, a community interest, or the information covered by an information source. One APE contains only terms of one language, but any object associated with textual information in different languages can be profiled by several APEs (one for each language). Note that the concepts in the APE can be stored as terms with a corresponding weight, as in the classical vector space model. The concepts can also be represented at a finer granularity as terms, noun phrases, entities, etc. Instead of storing terms independently in vectors, text phrases can also be represented in contextual graphs, thus keeping knowledge about relations between words or about possible translations of words. A monolingual document may then be represented by one single APE. A multilingual document may be represented by several APEs, one per language used in the document. For more complex entities (user, community, information source), it may be preferable to use several APEs, each describing an aspect of the information of interest. In an integration development environment, many applications track, in a variety of different ways, which textual data is relevant for the entity. Therefore, the profile is structured along those applications. The data of each application that is tracking information about the entity is used to build one part of the profile. One profile part concerning an application can again contain several APEs.
Thus, the profile scheme is extensible, as new parts can be added to the profile as soon as there is a new application which is gathering data about the entity. The final profile scheme may then be represented as a tree with APE's at its leaves.
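One way to picture such a profile tree is sketched below. It is only an illustration: the class names `APE` and `ProfileNode`, the language codes, and the term weights are assumptions, and the weighted-term representation is just one of the representations mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Union

@dataclass
class APE:
    """Atomic Profile Element: weighted terms in exactly one language."""
    language: str
    weights: Dict[str, float]

@dataclass
class ProfileNode:
    """Inner node of the profile tree: an entity, an application part, or a sub-part."""
    name: str
    children: List[Union["ProfileNode", APE]] = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

    def leaves(self) -> Iterator[APE]:
        """Walk the tree depth-first and yield the APEs at its leaves."""
        for child in self.children:
            if isinstance(child, APE):
                yield child
            else:
                yield from child.leaves()

# One part per application; a new application simply adds a new part.
user = ProfileNode("user")
filtering = user.add(ProfileNode("collaborative filtering"))
handhelds = filtering.add(ProfileNode("Handhelds"))
handhelds.add(APE("fr", {"assistant": 0.8}))     # illustrative weights only
handhelds.add(APE("en", {"handheld": 0.9}))
bookmarks = user.add(ProfileNode("knowledge sharing"))
bookmarks.add(APE("en", {"profiles": 0.6}))

leaf_languages = [ape.language for ape in user.leaves()]
```

Extending the profile for a newly added application amounts to one more `user.add(ProfileNode(...))` call; existing parts are left untouched.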
We can illustrate the profile definition with an example user profile. The user is using two applications, a collaborative filtering system and a knowledge-sharing tool capturing an organization-related view of the WWW. The user is in this example a member of the communities “Handhelds” and “Profiles” in the collaborative filtering system. Here both applications, the collaborative filtering system and the knowledge-sharing tool, will gather information about the user. The collaborative filtering system will keep the list of documents that the user submitted to his communities as well as the appreciation (the score) which he gave to the reviewed documents. The knowledge-sharing tool will store the bookmarks for the user. The information gathered by the collaborative filtering system and the knowledge-sharing tool can then be used to deduce the interests of the user. Based on the documents and their scores, and possibly other available information, we can extract APEs for each collaborative filtering system community the user is active in, and also for the set of documents bookmarked through the knowledge-sharing tool. For example, suppose that the user reviewed documents in French and English for the community “Handhelds”. The result will then be two APEs in the user's profile for the community “Handhelds”: one APE capturing the information of interest from the French documents and another from those that are in English.
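The extraction of one APE per community and language can be sketched as below. The input format (tuples of community, language, score, and text) and the rule of weighting each term by the review score are assumptions made for illustration; the text does not prescribe how term weights are computed.

```python
from collections import defaultdict

def extract_apes(reviews):
    """Group reviewed documents by (community, language) and accumulate term weights.

    `reviews` is an iterable of (community, language, score, text) tuples.
    Each term is weighted by the score the user gave the document (an assumption).
    """
    apes = defaultdict(lambda: defaultdict(float))
    for community, language, score, text in reviews:
        for term in text.lower().split():
            apes[(community, language)][term] += score
    return {key: dict(terms) for key, terms in apes.items()}

reviews = [
    ("Handhelds", "en", 4, "pda battery"),
    ("Handhelds", "en", 2, "pda screen"),
    ("Handhelds", "fr", 5, "assistant personnel"),
]
apes = extract_apes(reviews)
```

With this input the user's profile holds two APEs for the community “Handhelds”, one keyed by `("Handhelds", "en")` and one by `("Handhelds", "fr")`.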
The content filtering module 430 is responsible for the selection of relevant content results to be included in the delivered document, and as such it may use a variety of algorithms and data to make that determination. In particular, it may use information about the user's interests found in the user profile, together with historical data about what the user has previously seen and possibly responded to. A weighted scoring algorithm that weights previously viewed content low, updates to previously viewed content high, content containing keywords used to select previously viewed content moderately high, and content containing keywords identified in the user profile medium yields a suitable yet dynamic content set. These results are then in turn passed on to the document layout module 440. In one embodiment the document layout module 440 is implemented by ADL (Automated Document Layout) software as is taught in U.S. Patent Applications Attorney Docket No. A1456-US-NP entitled “CONSTRAINT-OPTIMIZATION SYSTEM AND METHOD FOR DOCUMENT COMPONENT LAYOUT GENERATION”, Patent Application Attorney Docket No. A1583-US-NP entitled “SYSTEM AND METHOD FOR CONSTRAINT-BASED DOCUMENT GENERATION”, Patent Application Attorney Docket No. A1586-US-NP entitled “SYSTEM AND METHOD FOR DYNAMICALLY GENERATING A STYLE SHEET”, Patent Application Attorney Docket No. A1699-US-NP entitled “CASE-BASED SYSTEM AND METHOD FOR GENERATING A CUSTOM DOCUMENT”, as previously cited above and incorporated herein by reference in their entirety. The ADL may be utilized interactively and dynamically so that, as user interests are fed back, better identifying both reports and advertisements of interest to the user, the MyNewsPaper will reflect that feedback in content and layout as well as in delivery service. Once the page layout is complete, it is routed on its way to the user by the delivery service 450, to print, web browser display, email, etc.
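The weighted scoring described here can be sketched as follows. The text gives only the relative levels (low, medium, moderately high, high); the numeric values, the item fields, and the way the factors are combined (additive terms, with unchanged previously viewed content scaled down) are assumptions for illustration.

```python
# Hypothetical numeric values for the relative levels named in the text.
WEIGHTS = {
    "viewed": 0.1,            # previously viewed, unchanged content: low
    "profile_keyword": 0.5,   # keywords identified in the user profile: medium
    "history_keyword": 0.75,  # keywords that selected earlier content: moderately high
    "update": 1.0,            # update to previously viewed content: high
}

def score(item, profile_keywords, history_keywords, seen_versions):
    """Score one candidate; `seen_versions` maps item id to the last version shown."""
    words = set(item["text"].lower().split())
    total = 0.0
    if words & profile_keywords:
        total += WEIGHTS["profile_keyword"]
    if words & history_keywords:
        total += WEIGHTS["history_keyword"]
    seen = seen_versions.get(item["id"])
    if seen is not None:
        if item["version"] > seen:
            total += WEIGHTS["update"]   # changed since the user last saw it
        else:
            total *= WEIGHTS["viewed"]   # already seen and unchanged
    return total

candidates = [
    {"id": "a", "version": 1, "text": "handheld launch"},
    {"id": "b", "version": 2, "text": "handheld recall"},
    {"id": "c", "version": 1, "text": "handheld launch event"},
    {"id": "d", "version": 1, "text": "handheld"},
]
seen_versions = {"a": 1, "b": 1}
ranked = sorted(
    candidates,
    key=lambda i: score(i, {"handheld"}, {"launch"}, seen_versions),
    reverse=True,
)
ranked_ids = [i["id"] for i in ranked]
```

In this example the updated item `b` ranks first and the unchanged, already viewed item `a` ranks last, so repeated query results do not crowd out new material.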
If the user interactively changes the desired delivery service (for example, requesting a hardcopy of the emailed report), the ADL will dynamically provide the report in the layout most appropriate to that request.
The teachings provided herein and discussed above use automated search, filtering, and layout technologies to provide an end-to-end information push service. As such, they enable complete personalized report documents to be created automatically, thereby reducing cost in existing personalized document workflows, as well as enabling higher value documents to be created to increase consumer satisfaction and knowledge worker productivity.
The claims, as originally presented and as they may be amended, encompass variations, alternatives, modifications, improvements, equivalents, and substantial equivalents of the embodiments and teachings disclosed herein, including those that are presently unforeseen or unappreciated, and that, for example, may arise from applicants/patentees and others.
Attention is directed to commonly owned and assigned co-pending application Ser. Nos.: Patent Application Attorney Docket No. A4048-US-NP entitled “AN INDIVIDUALLY PERSONALIZED CUSTOMIZED REPORT DOCUMENT SYSTEM”; Patent Application Attorney Docket No. A1456-US-NP entitled “CONSTRAINT-OPTIMIZATION SYSTEM AND METHOD FOR DOCUMENT COMPONENT LAYOUT GENERATION”; Patent Application Attorney Docket No. A1583-US-NP entitled “SYSTEM AND METHOD FOR CONSTRAINT-BASED DOCUMENT GENERATION”; Patent Application Attorney Docket No. A1586-US-NP entitled “SYSTEM AND METHOD FOR DYNAMICALLY GENERATING A STYLE SHEET”; Patent Application Attorney Docket No. A1699-US-NP entitled “CASE-BASED SYSTEM AND METHOD FOR GENERATING A CUSTOM DOCUMENT”.