Method and system for automatic summarization and digest of celebrity news

Information

  • Patent Application
  • Publication Number
    20070174343
  • Date Filed
    January 25, 2007
  • Date Published
    July 26, 2007
Abstract
A system and method for automatically creating a digest of celebrity information received, over an adjustable time range, from publicly available electronic data sources. Automated qualitative and quantitative analytical methods are applied to select information for inclusion in the digest and to convert that information to summary form.
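The digest "over an adjustable time range" described in the abstract can be illustrated with a small Python sketch. It assumes each story already carries a publication timestamp and the celebrities it was tagged with; the `Story` type and the `select_window` and `group_by_celebrity` functions are hypothetical names, not taken from the application.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Story:
    celebrities: tuple[str, ...]  # celebrities the story was tagged with (assumed)
    published: datetime
    text: str


def select_window(stories: list[Story], end: datetime, days: int) -> list[Story]:
    """Keep stories published in the adjustable window (end - days, end]."""
    start = end - timedelta(days=days)
    return [s for s in stories if start < s.published <= end]


def group_by_celebrity(stories: list[Story]) -> dict[str, list[Story]]:
    """Digest skeleton: celebrity name -> stories mentioning that celebrity."""
    digest: dict[str, list[Story]] = defaultdict(list)
    for story in stories:
        for name in story.celebrities:
            digest[name].append(story)
    return dict(digest)


# Example: a seven-day digest ending January 25, 2007 (made-up stories).
stories = [
    Story(("Jane Doe",), datetime(2007, 1, 25), "premiere"),
    Story(("Jane Doe", "John Roe"), datetime(2007, 1, 20), "engagement"),
    Story(("John Roe",), datetime(2007, 1, 10), "album"),
]
digest = group_by_celebrity(select_window(stories, datetime(2007, 1, 25), days=7))
```

Changing `days` widens or narrows the digest without touching the grouping step, which is one simple way to make the time range adjustable.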
Description

BRIEF DESCRIPTION OF THE DRAWINGS

The above and other features, aspects, and advantages of the present invention are described in more detail in the following description of embodiments thereof, shown in the accompanying drawings, in which:

FIG. 1 is a block diagram showing database generation according to a first embodiment of the present invention;

FIG. 2 is a block diagram showing information summarization according to a first embodiment of the present invention; and

FIG. 3 is a block diagram showing sentiment analysis according to a first embodiment of the present invention.

Claims
  • 1. A method of summarizing information concerning celebrities, comprising the steps of:
      establishing a relational database for holding such information wherein such information contains data selected from the group consisting of: name; gender; and age;
      gathering one or more stories from a plurality of sources;
      parsing the stories to determine specific indicators of one or more celebrities;
      storing said stories in said relational database with appropriate tags to enable retrieval by a user of the database;
      automatically summarizing the stories based on story text for each celebrity; and
      presenting said story summaries for viewing by said user.
  • 2. The method of claim 1, wherein said stories are gathered over a global communication network.
  • 3. The method of claim 1, wherein the step of parsing the stories further comprises the steps of:
      tagging each story based on date, title, and abstract information;
      matching patterns in the stories to a predetermined list of known celebrities;
      identifying celebrity names using domain-specific terminology; and
      identifying keywords in the story indicative of celebrity information.
  • 4. The method of claim 1, wherein the step of summarizing said stories further comprises the steps of:
      generating a corpus of data formed by concatenating all story text associated with a selected celebrity;
      stripping predetermined incidental words from said corpus of data;
      generating a plurality of bigrams of the remaining text in the corpus of data;
      performing term frequency-inverse document frequency analysis on the plurality of bigrams;
      assigning a weight value to each bigram; and
      selecting one or more sentences from said corpus of data based on the weighted value of said bigrams.
  • 5. The method of claim 1, further comprising the step of performing sentiment analysis on said one or more stories.
  • 6. The method of claim 5, further comprising the steps of:
      generating a corpus of data formed by concatenating all story text associated with a selected celebrity;
      parsing said corpus of data into parts of speech that can be assigned qualitative values;
      assigning a qualitative value to each said part of speech;
      performing term frequency analysis on said corpus of data; and
      assigning a weighted value to said data based on said qualitative values for each part of speech.
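Claim 1 calls for a relational database holding celebrity name, gender, and age, with stories stored under tags that permit later retrieval, and claim 3 tags each story by date, title, and abstract. A minimal sketch using Python's built-in `sqlite3` module follows; the table layout and the helpers `open_db`, `add_celebrity`, `add_story`, and `stories_for` are illustrative assumptions, not the schema disclosed in the application.

```python
import sqlite3

# Assumed schema: one table per entity plus a link table acting as the story tags.
SCHEMA = """
CREATE TABLE celebrity (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL UNIQUE,
    gender TEXT,
    age    INTEGER
);
CREATE TABLE story (
    id        INTEGER PRIMARY KEY,
    published TEXT NOT NULL,  -- ISO-8601 date, so text order is date order
    title     TEXT NOT NULL,
    abstract  TEXT,
    body      TEXT NOT NULL
);
CREATE TABLE story_celebrity (
    story_id     INTEGER REFERENCES story(id),
    celebrity_id INTEGER REFERENCES celebrity(id),
    PRIMARY KEY (story_id, celebrity_id)
);
"""


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_celebrity(conn, name, gender=None, age=None):
    cur = conn.execute(
        "INSERT INTO celebrity (name, gender, age) VALUES (?, ?, ?)", (name, gender, age)
    )
    return cur.lastrowid


def add_story(conn, published, title, abstract, body, celebrity_ids):
    """Store a story and tag it with every celebrity it mentions."""
    cur = conn.execute(
        "INSERT INTO story (published, title, abstract, body) VALUES (?, ?, ?, ?)",
        (published, title, abstract, body),
    )
    story_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO story_celebrity (story_id, celebrity_id) VALUES (?, ?)",
        [(story_id, cid) for cid in celebrity_ids],
    )
    return story_id


def stories_for(conn, name, start, end):
    """Titles of stories tagged with `name` and published in [start, end]."""
    rows = conn.execute(
        """SELECT s.title FROM story s
           JOIN story_celebrity sc ON sc.story_id = s.id
           JOIN celebrity c ON c.id = sc.celebrity_id
           WHERE c.name = ? AND s.published BETWEEN ? AND ?
           ORDER BY s.published""",
        (name, start, end),
    )
    return [row[0] for row in rows]
```

Keeping the tags in a separate link table lets one story be retrieved under every celebrity it mentions, and storing dates as ISO-8601 text keeps the `BETWEEN` comparison chronological.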
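The parsing steps of claim 3 (matching a predetermined list of known celebrities, spotting names through domain-specific terminology, and flagging keywords) could be approximated with regular expressions, as sketched here. The name list, keyword set, and role-word cue are hypothetical placeholders; in particular, treating a capitalized phrase after words such as "actress" or "singer" as a candidate name is only one assumed reading of "domain-specific terminology".

```python
from __future__ import annotations

import re

# Hypothetical reference data; a deployed system would load far larger lists.
KNOWN_CELEBRITIES = ["Jane Doe", "John Roe"]
CELEBRITY_KEYWORDS = {"premiere", "red carpet", "engaged", "divorce"}

# Assumed domain-specific cue: a role word followed by a capitalized multi-word name.
# The scoped flag (?i:...) makes only the role words case-insensitive.
ROLE_CUE = re.compile(
    r"\b(?i:actor|actress|singer|rapper|model)\s+([A-Z][a-z]+(?:\s+[A-Z][a-z]+)+)"
)


def find_known_celebrities(text: str, names: list[str]) -> list[str]:
    """Known names found in text, in order of first appearance."""
    canonical = {n.lower(): n for n in names}
    # Longer names first, so a full name wins over a shorter name it contains.
    alternatives = sorted((re.escape(n) for n in names), key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(alternatives) + r")\b", re.IGNORECASE)
    found: list[str] = []
    for match in pattern.finditer(text):
        name = canonical.get(match.group(1).lower(), match.group(1))
        if name not in found:
            found.append(name)
    return found


def find_keywords(text: str, keywords: set[str]) -> list[str]:
    """Keywords that occur as whole words or phrases, sorted alphabetically."""
    lowered = text.lower()
    return sorted(k for k in keywords if re.search(r"\b" + re.escape(k) + r"\b", lowered))


def parse_story(date: str, title: str, abstract: str, body: str) -> dict:
    """Tag a story by date, title, and abstract, plus the celebrity indicators found."""
    text = " ".join([title, abstract, body])
    return {
        "date": date,
        "title": title,
        "abstract": abstract,
        "celebrities": find_known_celebrities(text, KNOWN_CELEBRITIES),
        "candidate_names": ROLE_CUE.findall(text),
        "keywords": find_keywords(text, CELEBRITY_KEYWORDS),
    }
```

Names picked up only by the role-word cue (such as a newcomer missing from the known list) are kept separate from confirmed matches, so they could be reviewed before being added to the database.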
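Claim 4 describes an extractive summarizer: concatenate a celebrity's stories, strip incidental (stop) words, form bigrams, weight them by TF-IDF, and pick sentences by their bigram weights. The Python sketch here is one possible reading. Several details are assumptions the claim does not specify: the stop-word list, the naive sentence splitter, computing IDF over the set of per-celebrity corpora (the selected celebrity's corpus being one document among several), the smoothed IDF formula log((1 + N) / (1 + df)) + 1, and scoring each sentence by the mean weight of its bigrams.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical stand-in for the claim's "predetermined incidental words".
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "at", "for", "with",
    "is", "was", "were", "be", "has", "had", "her", "his", "she", "he", "it",
    "that", "this", "by", "as", "from",
}


def tokens(text: str) -> list[str]:
    """Lowercase word tokens with stop words stripped."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def bigrams(words: list[str]) -> list[tuple[str, str]]:
    return list(zip(words, words[1:]))


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bigram_weights(corpus: str, other_corpora: list[str]) -> dict[tuple[str, str], float]:
    """TF-IDF weight for each bigram of corpus; IDF is taken over corpus + other_corpora."""
    docs = [set(bigrams(tokens(d))) for d in [corpus, *other_corpora]]
    tf = Counter(bigrams(tokens(corpus)))
    total = sum(tf.values()) or 1
    weights = {}
    for bigram, count in tf.items():
        df = sum(1 for d in docs if bigram in d)
        idf = math.log((1 + len(docs)) / (1 + df)) + 1  # smoothed IDF
        weights[bigram] = (count / total) * idf
    return weights


def summarize(stories: list[str], other_corpora: list[str], k: int = 2) -> list[str]:
    """Pick the k sentences with the highest mean bigram weight, in original order."""
    corpus = " ".join(stories)  # all story text for the selected celebrity
    weights = bigram_weights(corpus, other_corpora)
    sentences = split_sentences(corpus)

    def score(sentence: str) -> float:
        bgs = bigrams(tokens(sentence))
        return sum(weights.get(b, 0.0) for b in bgs) / len(bgs) if bgs else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

Averaging rather than summing bigram weights keeps long sentences from winning on length alone; a summed score is an equally plausible reading of the claim. A sentence whose bigrams also appear in other celebrities' corpora (for example, generic weather remarks) receives a lower IDF and tends to drop out of the summary.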
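Claims 5 and 6 add sentiment analysis: parse the celebrity's corpus into parts of speech that can carry qualitative values, assign those values, count term frequencies, and combine the results into a weighted value. The following Python sketch assumes a small hand-made lexicon that maps each sentiment-bearing word to a part of speech and a value in [-2, 2], standing in for a real part-of-speech tagger; the lexicon entries and the per-part-of-speech multipliers are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

# Hypothetical lexicon: word -> (part of speech, qualitative value in [-2, 2]).
# A real system would run a part-of-speech tagger; this lookup stands in for it.
LEXICON: dict[str, tuple[str, int]] = {
    "brilliant": ("adj", 2), "happy": ("adj", 1), "awful": ("adj", -2),
    "love": ("verb", 1), "slammed": ("verb", -1),
    "scandal": ("noun", -2), "triumph": ("noun", 2),
}
# Hypothetical multipliers expressing how strongly each part of speech counts.
POS_WEIGHT = {"adj": 1.0, "verb": 0.75, "noun": 0.5}


def sentiment(stories: list[str]) -> float:
    """Frequency-weighted average value of the sentiment-bearing words in the corpus."""
    corpus = " ".join(stories)  # all story text for the selected celebrity
    words = re.findall(r"[a-z']+", corpus.lower())
    tf = Counter(w for w in words if w in LEXICON)  # term frequency of lexicon words
    if not tf:
        return 0.0
    total = sum(tf.values())
    weighted = sum(
        count * LEXICON[w][1] * POS_WEIGHT[LEXICON[w][0]] for w, count in tf.items()
    )
    return weighted / total
```

A positive score suggests mostly favorable coverage and a negative score mostly unfavorable coverage; a corpus with no lexicon words scores 0.0.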
Provisional Applications (1)
Number      Date       Country
60762083    Jan 2006   US