The field of the invention is analyzing natural language text files for extraction and structuring of meaningful search terms.
Extremely large databases, such as may be used for marketing purposes, are being aggregated at an increasingly rapid pace. These databases are becoming progressively more complex as a result. Such databases may contain thousands of categories of data (i.e., data segments), each concerning hundreds of millions of individuals, households, or other entities that are tracked in the database. These databases are in fact becoming so large and complex that they exceed the capacity of human search and recall to utilize them effectively. This is because a human curator of such a database is inherently limited to his or her own set of interpretive heuristics which, however intuitive they may be, can never be so comprehensive as to take into account the full richness of such complex databases. In addition, the slow speed at which human curators are able to formulate appropriate searches, combined with the rapidly increasing number of searches that must be completed and the rapid turnaround necessary in a constantly evolving, dynamic marketing environment, means that human searches are no longer practical because they cannot be completed in a timeframe in which the results will still be relevant.
Marketing databases such as described above are used by data buyers, who seek to purchase data from the marketing database provider for the purpose of directing a marketing message to a particular targeted audience. For example, if the data buyer is a provider of a weight loss service, the data buyer may be interested in identifying an audience of persons who seek to lose weight. Data buyers thus are typically interested in exploring audience segment offerings that are related to specific marketing campaign audience criteria. These criteria are generally listed, in natural-language format, in a request for proposal (RFP) document prepared by the data buyer. The RFP will include terms describing behavior (including buying patterns or “propensities”) that are desired in a particular audience for the data buyer's marketing message, and outline specifications such as age, gender, income, and other demographic requirements of the desired audience.
Because the data buyer is working from an RFP or other natural-language document that sets out the desired audience characteristics, the data buyer is required to somehow translate his or her own natural-language description of a desired target audience into search terms. Two alternatives exist today. First, the data buyer can personally transform the RFP into SQL queries to search the desired marketing database. The other alternative is for the data buyer to send the request to human curators whose task is then to find relevant data segments based on the data buyer's descriptions. Either way, the accuracy of the result depends upon the ability of a human to recall a massive number of constantly evolving data segments and identify the optimal ones for this particular RFP. The inevitable human error means that data buyers are receiving sub-optimal audience results, given the available data in the marketing database that could be used to target the marketing message. In addition, the human element adds greatly to the cost of the process (particularly when a separate human curator is employed). Finally, this also means that the turnaround time for the overall process is highly variable, typically taking hours or days depending on the human curators' familiarity with the database, whether the data is housed in one or more databases, and whether there are special permissions associated with the data to be returned. As noted above, this turnaround time has now rendered human-curated searches impractical, because targeted messages must be constructed quickly to be meaningful as the data is constantly evolving and being updated and expanded in the fast-moving, Internet-based marketing environment of today.
What is desired then is a faster and more accurate method of translating an RFP or other natural-language document or text identifying a target audience into a request identifying optimal data elements to be searched in a marketing database, where the improved method leads to optimal results from the marketing database and also provides a sufficiently quick turnaround time such that the results are commercially meaningful and actionable.
References mentioned in this background section are not admitted to be prior art with respect to the present invention.
The present invention is directed to a system and method for extracting search terms for corresponding data elements from a natural language description of a desired audience, such as might be found in an RFP from a data buyer. In certain implementations, the invention is directed to a system that identifies particularly meaningful words within the context of a written (or potentially verbal) data order; identifies and structures the keywords; expounds on the keywords to optimize the search results; and captures the most relevant data elements from the marketing database. The system uses transformations to structure the natural language input and generate from it the optimal search terms for the desired data elements. Two general types of information are structured in this process: pre-determined demographic characteristics, and short (one- or two-word) search phrases that capture descriptors of behavioral characteristics. Pre-determined demographic characteristics may include, for example, age, income, gender, and geographic location. The behavioral characteristics may include, for example, a propensity for buying certain categories of products or services. The completed process yields a parameter set naming demographic and behavioral characteristics along with a structure that is optimized for search within a database comprising a large number of data elements.
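The parameter set described above can be pictured as a small data structure. The following is a minimal sketch only: the class names, field names, and the `add_phrase` helper are hypothetical and are chosen for illustration, not taken from any particular implementation of the invention.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DemographicCriteria:
    """Pre-determined demographic characteristics (hypothetical fields, all optional)."""
    min_age: Optional[int] = None
    max_age: Optional[int] = None
    gender: Optional[str] = None
    min_income: Optional[int] = None
    location: Optional[str] = None


@dataclass
class AudienceParameterSet:
    """Demographic criteria plus short behavioral search phrases."""
    demographics: DemographicCriteria = field(default_factory=DemographicCriteria)
    behavioral_phrases: List[str] = field(default_factory=list)

    def add_phrase(self, phrase: str) -> None:
        # Accept only one- or two-word phrases and skip duplicates.
        words = phrase.lower().split()
        if 1 <= len(words) <= 2:
            normalized = " ".join(words)
            if normalized not in self.behavioral_phrases:
                self.behavioral_phrases.append(normalized)


params = AudienceParameterSet(DemographicCriteria(min_age=21, max_age=34, gender="female"))
params.add_phrase("Weight loss")
params.add_phrase("weight loss")           # duplicate, ignored
params.add_phrase("wants to lose weight")  # longer than two words, ignored
```

Keeping the demographic criteria separate from the behavioral phrases mirrors the two general types of information that the process structures.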
The invention, in various implementations, may be utilized in a stand-alone fashion to create an optimized search from a document such as an RFP, or may be used as an add-on process to improve results derived from the typical search process whereby a data buyer enters search terms directly.
These and other features, objects and advantages of the present invention will become better understood from a consideration of the following detailed description of the preferred embodiments and appended claims in conjunction with the drawings as described following:
The present invention will be described below with reference to one or more specific implementations; it is understood, however, that these implementations are not limiting to the invention, and the full scope of the invention is as will be set forth in any claims directed to the invention in this or a subsequent application directed to the invention.
Referring now to
Further processing as shown in
The original plain text input is also subject to key search feature extraction, wherein specific patterns, words, and descriptors are detected and structured lists of supplementary search information are derived from them. The key search features are cues in the text that are particularly relevant for sorting, scanning, or otherwise searching through the data being ordered, including (but not limited to) demographic callouts (e.g., “women ages 21-34”), geolocation recognition (e.g., “mid-westerner”), and entity recognition (e.g., “Weight Watchers” in the previous example input text). If any of the key search features exist in the text set, they are captured and extracted in step 6 and then, if applicable, terms on the key phrase search term list that are composed of text collected as a key search feature are removed in step 7. With repeated phrases and key search feature-related terms removed, the key phrase search term list, ordered by importance, is structured for and submitted to the search system along with the structured list of key search features, if applicable.
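Steps 6 and 7 can be illustrated with a short sketch that uses plain string patterns. It covers only an age-range callout and a gender word. The regular expression, the `GENDER_WORDS` table, and both function names are assumptions made for this example, and a phrase is treated as "composed of" feature text when every one of its words was captured as part of a feature.

```python
import re

# Hypothetical pattern for callouts such as "ages 21-34" or "age 21 to 34".
AGE_RANGE = re.compile(r"\bages?\s+(\d{1,3})\s*(?:-|to)\s*(\d{1,3})\b", re.IGNORECASE)
# Hypothetical mapping from gender words to a normalized value.
GENDER_WORDS = {"women": "female", "woman": "female", "female": "female",
                "men": "male", "man": "male", "male": "male"}


def extract_key_search_features(text):
    """Step 6 (sketch): return (features, consumed_words) found in the text."""
    features = {}
    consumed = set()
    match = AGE_RANGE.search(text)
    if match:
        features["age_range"] = (int(match.group(1)), int(match.group(2)))
        consumed.update(match.group(0).lower().split())
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in GENDER_WORDS:
            features["gender"] = GENDER_WORDS[word]
            consumed.add(word)
            break  # keep only the first gender callout
    return features, consumed


def remove_feature_terms(key_phrases, consumed):
    """Step 7 (sketch): drop phrases whose words were all captured as features."""
    return [p for p in key_phrases if not set(p.lower().split()) <= consumed]


request = "Looking for women ages 21-34 interested in weight loss"
features, consumed = extract_key_search_features(request)
remaining = remove_feature_terms(["weight loss", "women ages", "ages 21-34"], consumed)
```

Here `features` holds the age range and gender, and `remaining` keeps only the behavioral phrase "weight loss", because the other two phrases consist entirely of text already captured as demographic callouts.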
After all of the processing described above is performed, processing moves to step 8, where the structured list of key search features and the structured list of key phrase search terms are fed into the marketing database in order to extract the corresponding data elements.
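As a rough illustration of step 8, the sketch below submits an ordered key phrase list to a small in-memory SQLite database. The `data_elements` table, its columns, the sample rows, and the `find_data_elements` function are all hypothetical; an actual marketing database would have its own schema and search interface, and the key search features would be applied as well.

```python
import sqlite3


def find_data_elements(conn, key_phrases, limit_per_phrase=5):
    """Return names of data elements matching each phrase, in phrase order."""
    results = []
    for phrase in key_phrases:  # the list is assumed to be ordered by importance
        # Note: "%" or "_" inside a phrase would act as LIKE wildcards here.
        rows = conn.execute(
            "SELECT name FROM data_elements "
            "WHERE description LIKE ? ORDER BY name LIMIT ?",
            (f"%{phrase}%", limit_per_phrase),
        ).fetchall()
        for (name,) in rows:
            if name not in results:  # skip elements already matched
                results.append(name)
    return results


# Hypothetical sample data standing in for a marketing database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE data_elements (name TEXT, description TEXT)")
conn.executemany(
    "INSERT INTO data_elements VALUES (?, ?)",
    [("diet_program_buyers", "purchased a weight loss program"),
     ("gym_members", "holds an active fitness membership"),
     ("luxury_auto_intenders", "likely to buy a luxury car")],
)
matches = find_data_elements(conn, ["weight loss", "fitness"])
```

Because the phrases are processed in order of importance, the data elements matched by the most important phrase appear first in `matches`.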
Turning now to
The invention may, in various implementations, utilize commercial and open-source libraries, tools, and infrastructure to perform its task. The service may utilize an in-memory cache store for word lookups and word meaning extraction. The process may also use an in-memory processing tool or library to calculate word relationships and context for word association and range or mapping relationships. The tools should provide the programming throughput and speed to perform the calculations and analysis. Important components in certain implementations are the fast cache and in-memory processing tools that provide faster throughput for the service.
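One simple way to picture the in-memory cache for word lookups is Python's standard `functools.lru_cache`. This is a sketch under stated assumptions: the `_LEXICON` dictionary stands in for a slower lexicon or word-meaning service, and `related_terms` is a hypothetical name. A production service might instead use a dedicated cache store.

```python
from functools import lru_cache

# Hypothetical stand-in for a slower lexicon or word-meaning service.
_LEXICON = {"slim": ["weight loss"], "dieting": ["weight loss", "nutrition"]}
lookup_calls = 0  # counts how often the slow path is actually taken


def _slow_lookup(word):
    global lookup_calls
    lookup_calls += 1
    return tuple(_LEXICON.get(word, ()))


@lru_cache(maxsize=100_000)
def related_terms(word):
    """Cached lookup: repeated words are served from memory."""
    return _slow_lookup(word)


first = related_terms("dieting")
second = related_terms("dieting")  # served from the cache, no second slow lookup
```

Results are returned as tuples so that cached values cannot be modified by callers.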
In alternative implementations, the invention could be changed to detect many additional demographic or psychographic features, or any other features directly corresponding with information represented in the marketing database. The method by which feature-related information is detected and stored could be altered to include additional functions beyond string pattern and entity recognition. These changes would be implemented as additional processes at step 6 of the extraction process illustrated in
Python code for one implementation of the routine for search term extraction from the audience request language may be as follows. In this example, demographic features searched include gender, income, age, and presence of children. Bigrams (pairs of consecutive words) of interest are identified and then prioritized.
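The bigram step alone can be sketched as follows. This is a minimal illustration and not the listing of the implementation referred to above: the stopword list, the frequency-based prioritization, and the `prioritized_bigrams` function name are assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stopword list.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "or",
             "the", "to", "who", "with"}


def prioritized_bigrams(text, top_n=5):
    """Return up to top_n bigrams, most frequent first.

    Bigrams with equal counts keep their order of first appearance.
    """
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for first, second in zip(words, words[1:]):
        # Skip pairs that contain a stopword; they rarely describe behavior.
        if first in STOPWORDS or second in STOPWORDS:
            continue
        counts[f"{first} {second}"] += 1
    return [bigram for bigram, _ in counts.most_common(top_n)]


request = "Women interested in weight loss and weight loss programs"
top_two = prioritized_bigrams(request, top_n=2)
```

In this example "weight loss" occurs twice and is therefore ranked ahead of the bigrams that occur once.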
Unless otherwise stated, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. All terms used herein should be interpreted in the broadest possible manner consistent with the context. When a grouping is used herein, all individual members of the group and all combinations and subcombinations possible of the group are intended to be individually included in the disclosure. All references cited herein are hereby incorporated by reference to the extent that there is no inconsistency with the disclosure of this specification. If a range is expressed herein, such range is intended to encompass and disclose all sub-ranges within that range and all particular points within that range.
The present invention has been described with reference to certain preferred and alternative embodiments that are intended to be exemplary only and not limiting to the full scope of the present invention, as set forth in the appended claims.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2018/054331 | 10/4/2018 | WO | |

Publishing Document | Publishing Date | Country | Kind |
---|---|---|---|
WO2019/070954 | 4/11/2019 | WO | A |
Number | Date | Country | |
---|---|---|---|
20200285972 A1 | Sep 2020 | US | |

Number | Date | Country | |
---|---|---|---|
62568374 | Oct 2017 | US | |