SBIR Phase I: Statistical Inference for Advanced Entity Resolution

Information

  • NSF Award 1143373
Owner
  • Award Id
    1143373
  • Award Effective Date
    1/1/2012
  • Award Expiration Date
    12/31/2012
  • Award Amount
    $179,992.00
  • Award Instrument
    Standard Grant

This Small Business Innovation Research (SBIR) Phase I project addresses the problem of integrating information about named entities, such as people, companies, and products, from numerous data sources. Integrating information about entities from multiple sources can be difficult because sources may use different formats and terminology to describe the same entity, a problem referred to as "entity resolution". Most existing commercial enterprise systems rely on rule-based matching techniques for entity resolution. This project investigates statistical learning techniques that allow a system to estimate the probability of a match, rather than computing a score based on ad hoc rules or weights. Because the approach is based on sound statistical principles and uses evidence compiled from large datasets, it can produce more accurate results than existing methods. Moreover, these advantages are amplified when handling data that has highly variable, missing, or noisy attributes, such as data extracted from Web sites.

The broader impact/commercial potential of this project lies in enabling enterprises to perform more accurate and reliable data integration. There are many potential target markets that need better technology for integrating information about businesses, products, people, locations, and other entities. This capability is critical for some of the nation's largest companies and institutions, from search engines to the U.S. intelligence and law enforcement communities to financial institutions. In particular, large enterprises often have difficulty using data extracted from news, foreign-language data sources, and social media, because the extracted data is noisy and poorly structured. The technology developed in this project will help enterprises make use of the growing amount of information on the Web, so that they can take advantage of the network of relationships that link people, companies, and other entities to serve their customers better.
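The abstract contrasts rule-based match scores with estimating the probability that two records describe the same entity. The award text does not say which statistical model the project used, so the sketch below swaps in the classic Fellegi-Sunter (naive Bayes) record-linkage model purely as an illustration. The field names, the `m`/`u` values in `FIELD_PARAMS`, the `prior`, and the helpers `field_agrees` and `match_probability` are all hypothetical. Missing values are skipped rather than penalized, which is one way a probabilistic model can cope with the incomplete, noisy attributes the abstract mentions.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical per-field parameters (illustrative values only, not from the project):
#   m = P(field agrees | the two records refer to the same entity)
#   u = P(field agrees | the two records refer to different entities)
FIELD_PARAMS: Dict[str, Tuple[float, float]] = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "phone": (0.85, 0.001),
}


def field_agrees(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Return True/False for agreement, or None if either value is missing."""
    if not a or not b:
        return None
    # Deliberately simple comparison: case- and whitespace-insensitive equality.
    return a.strip().lower() == b.strip().lower()


def match_probability(r1: Dict[str, str], r2: Dict[str, str], prior: float = 0.01) -> float:
    """Posterior P(match | field comparisons) under a Fellegi-Sunter style model
    that assumes the fields are conditionally independent given match status."""
    log_odds = math.log(prior / (1.0 - prior))  # prior odds of a match
    for field, (m, u) in FIELD_PARAMS.items():
        agree = field_agrees(r1.get(field), r2.get(field))
        if agree is None:
            continue  # missing value: contributes no evidence either way
        if agree:
            log_odds += math.log(m / u)  # agreement weight
        else:
            log_odds += math.log((1.0 - m) / (1.0 - u))  # disagreement weight
    return 1.0 / (1.0 + math.exp(-log_odds))  # convert log-odds to a probability
```

For example, two records that agree on name and city but where one lacks a phone number come out at roughly 0.9. A record whose only known field (the name) disagrees falls well below 0.01. Unlike a hand-tuned rule score, each field's weight follows directly from its `m` and `u` probabilities. In practice those probabilities would be estimated from data (for instance with the EM algorithm) rather than set by hand as they are here.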

  • Program Officer
    Juan E. Figueroa
  • Min Amd Letter Date
    11/29/2011
  • Max Amd Letter Date
    6/15/2012

Institutions

  • Name
    InferLink Corporation
  • City
    El Segundo
  • State
    CA
  • Country
    United States
  • Address
    326 Loma Vista Street
  • Postal Code
    90245-2901
  • Phone Number
    (310) 383-9234

Investigators

  • First Name
    Steven
  • Last Name
    Minton
  • Email Address
    steven.n.minton@gmail.com
  • Start Date
    11/29/2011