The present disclosure relates to data analysis. Particularly, the present disclosure relates to identifying and classifying text in a database. More particularly, the present disclosure relates to the use of values, categories, core trends, concepts, and clusters for evaluating a plurality of data records to present a comprehensive view of the data.
The background description provided herein is for the purpose of generally presenting the context of the disclosure. Work of the presently named inventors, to the extent it is described in this background section, as well as aspects of the description that may not otherwise qualify as prior art at the time of filing, are neither expressly nor impliedly admitted as prior art against the present disclosure.
Consumers, marketers, advertisers, content managers, other businesspersons, and internet users regularly search the internet, or other databases, for information related to a particular search topic using a search engine. Depending on the search engine used, the search engine will return the most relevant documents or links related to the search topic entered. These documents or links can take the form of various information sources such as articles, blogs, videos, tweets, status updates, social media posts, webpages, and other sources of data and information. The search engine may display relevant documents in some order, but these documents are not organized in a way that an internet user such as a consumer or marketer can readily ascertain meaning. Instead, the internet user must open or otherwise access these documents or links and read the information to uncover their meaning. The user can then repeat this process to develop a more comprehensive meaning of all of the relevant documents from the search. While this manual process of searching and reading can result in a more comprehensive understanding of the topic of interest, it is not optimal.
One way that search engines have attempted to categorize documents is by creating tag clouds or word clouds based on search results or trending topics. Tag clouds are a way to visually represent data from a document or a set of documents. Tag clouds use semantic technology to identify the most pertinent words in that document or set of documents. The most pertinent words are then displayed within the tag cloud in different font sizes based on their relevance. The tag cloud thus tells the user which words are most relevant by displaying the words that appear most frequently in larger text than the words surrounding them. Tag clouds therefore may provide information about the search topic, but they do not allow the user to derive meaning from the search. Further, tag clouds do not allow the user to define what meaning they are looking to derive from the search.
Additionally, companies routinely review and analyze customer data, product review data, social media data, company review data, and/or other data related to the company. In some cases, this company data may be reviewed or analyzed by reviewing each data entry or item individually. However, individual review may require a relatively substantial amount of time. Moreover, in order to obtain a comprehensive understanding of the company data, each data entry or item would need to be individually read and analyzed.
Thus, there is a need in the art for search systems and methods and data analysis systems and methods that capture the meaning of documents and data, and convey that deeper understanding to the user. More particularly, there is a need for systems and methods that quantify meaning from data or search results and display it in a usable way to, for example, consumers, marketers, and internet users.
The following presents a simplified summary of one or more embodiments of the present disclosure in order to provide a basic understanding of such embodiments. This summary is not an extensive overview of all contemplated embodiments, and is intended to neither identify key or critical elements of all embodiments, nor delineate the scope of any or all embodiments.
The present disclosure, in one embodiment, relates to a system for analyzing and presenting data. The system may include a mapping engine in communication with an information database storing a plurality of records. The mapping engine may create a map defining a value, category, or core trend from at least a first record by applying a text mining tool to the first record. The mapping engine may determine whether a second record relates to the value, category, or core trend by comparing the second record to the map using a semantics analysis. Moreover, the mapping engine may assign a score to the second record based on the semantics analysis. The score may signal how well the second record relates to the value, category, or core trend. The mapping engine may additionally associate the second record with the value, category, or core trend based on the score. In some embodiments, the system may additionally have an analysis module in data communication with the mapping engine. The analysis module may create graphic visualizations of records associated with the value, category, or core trend, and may present the graphic visualizations via a user interface. In some embodiments, associating the second record with the value, category, or core trend may include comparing the score to a threshold, and, if the score meets or exceeds the threshold, associating the second record with the value, category, or core trend. The system may have a custom collections database in some embodiments. The custom collections database may have a plurality of records from which the mapping engine creates the map. The custom collections database may additionally have a values library, a categories library, and a core trends library. In some embodiments, the mapping engine may store the map in the custom collections database. In some embodiments, the system may have a clustering engine in data communication with the information database. 
The clustering engine may derive a concept and/or cluster from the second record. The mapping engine may create a map for each of a plurality of values, a plurality of categories, and a plurality of core trends, in some embodiments. The user interface may be configured to receive a search term entered by a user for searching the information database. Moreover, the second record may be a search result from a search of the information database in some embodiments.
The present disclosure, in another embodiment, relates to a method for retrieving and interpreting search results from a search of an information database storing a plurality of records. The method may include creating a map defining a value, category, or core trend, receiving a search term at a user interface, searching the information database for the search term to generate a plurality of search results, and determining whether the search results relate to the value, category, or core trend by comparing each of the search results to the map using a semantics analysis. Moreover, the method may include assigning a score to each search result based on the semantics analysis. The score may signal how well the search result relates to the value, category, or core trend. The method may include associating at least one of the search results with the value, category, or core trend based on the assigned scores. The method may additionally include presenting the search results, at the user interface, based on associated concepts, clusters, values, categories, and/or core trends. The step of creating a map may include applying a text mining tool to a plurality of records from the information database. Moreover, the method may include deriving concepts and/or clusters from the search results using a clustering engine. In some embodiments, the method may include storing the map in a custom collections database comprising a plurality of records from which the mapping engine creates the map. The custom collections database may additionally have a values library, a categories library, and a core trends library. In some embodiments, the method may include creating a map for each of a plurality of values, a plurality of categories, and a plurality of core trends.
The present disclosure, in another embodiment, relates to a method of analyzing a plurality of data records. The method may include receiving the plurality of data records, creating a map defining a value, category, or core trend, and determining whether each data record relates to the value, category, or core trend by comparing each data record to the map using a semantics analysis. Moreover, the method may include assigning a score to each data record based on the semantics analysis. The score may signal how well the data record relates to the value, category, or core trend. The method may include associating at least one of the data records with the value, category, or core trend based on the assigned scores. The method may additionally include presenting the data records, at a user interface, based on associated concepts, clusters, values, categories, and/or core trends. The step of creating a map may include applying a text mining tool to a plurality of data records. In some embodiments, the method may include deriving concepts and/or clusters from the data records using a clustering engine. The method may include storing the map in a custom collections database storing a plurality of records from which the mapping engine creates the map. In some embodiments, the method may include creating a map for each of a plurality of values, a plurality of categories, and a plurality of core trends.
While multiple embodiments are disclosed, still other embodiments of the present disclosure will become apparent to those skilled in the art from the following detailed description, which shows and describes illustrative embodiments of the invention. As will be realized, the various embodiments of the present disclosure are capable of modifications in various obvious aspects, all without departing from the spirit and scope of the present disclosure. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not restrictive.
While the specification concludes with claims particularly pointing out and distinctly claiming the subject matter that is regarded as forming the various embodiments of the present disclosure, it is believed that the invention will be better understood from the following description taken in conjunction with Figures that are displayed within the text below of this provisional application.
The present disclosure relates to novel and advantageous systems and methods for analyzing data. Particularly, the present disclosure relates to systems and methods for analyzing a plurality of data records to provide a comprehensive understanding of the data. For example, in some embodiments, systems and methods of the present disclosure relate to searching one or more public or private databases based on a user's search term(s). The results from the search may be analyzed to determine values, categories, and/or core trends within the search results. The values, categories, and/or core trends may be pre-defined, and in some embodiments may be customized based on user needs. The search results may additionally or alternatively be analyzed to determine concepts and/or clusters present within the search results. The search results may be grouped or organized based on the values, categories, core trends, clusters, and/or concepts, and may be presented to a user via a user interface. The user interface may provide a comprehensive understanding of the search results, by graphically illustrating the search results according to the values, categories, core trends, concepts, and/or clusters. Additionally, in some embodiments, systems and methods of the present disclosure relate to analyzing public or private company data, and tracking or monitoring the data over time. For example, customer reviews, complaints, social media posts, or other data related to a company or product may be analyzed to determine values, categories, core trends, concepts, and/or clusters. One or more databases may be monitored, such that new reviews, complaints, posts, or other data entries may be received and analyzed in real time, substantially real time, periodically, or at any other interval. A user dashboard may provide a graphic display of real time analysis of the company data, according to values, categories, core trends, concepts, and/or clusters. 
As used herein, “real time” may generally refer to at or substantially near the same time. For example, where data or search results are received for analysis, the data or search results may be analyzed, and results of the analysis may be made available to a user, substantially near the time the data or search results are received.
Turning now to
The user interface 102 may generally allow a user to access the various components of the system 100. The user interface 102 may allow a user to input information, as well as view and modify information. The user interface 102 may be provided via a desktop computer, laptop computer, tablet computer, smartphone, or any other suitable computing device. In some embodiments, the user interface 102 may be a website. In other embodiments, the user interface 102 may be a computer program. In some embodiments, the user interface 102 may provide a login page wherein a user may input a username and password, for example, to log into the system 100. In some embodiments, analyses performed by the system 100 may be conducted based on information input by a user at the user interface 102. For example, a user may input a search term for searching the one or more information databases 104 via the user interface 102. Moreover, the results of analyses performed by the system 100 may be displayed at the user interface 102. For example, the user interface 102 may provide a real time dashboard showing up-to-date analyses performed by the system 100.
The information database(s) 104 may generally comprise data that is accessible and/or searchable by the mapping engine 108, clustering engine 110, and/or analysis module 112. One or more information databases 104 may comprise public data, such as articles, blogs, videos, webpages, social media posts, government data, trusted data collections, and/or websites. Additionally or alternatively, one or more information databases 104 may comprise client data, which may include client-specific data including marketing data, product information, or research and development information, which may or may not be private data. Each information database 104 may be local or remote to other system components.
The custom collection database 106 may comprise stored data, such as articles, documents, webpages, social media posts, etc., or extrapolated data therefrom, for at least one of a search topic, a core trend, a category, a value, or a concept. The documents and data in the custom collection database 106 may be derived from the one or more information databases 104. The custom collection database 106 may collect and store relevant data or documents, from the one or more information databases 104, for each core trend, value, category, or concept. For example, the system 100 may collect the 1,000, or other suitable number, data entries or documents most related to a particular value, and then store those relevant entries or documents for the particular value within the custom collection database 106. In at least one embodiment, the custom collection database 106 may have one or more relevant data entries or documents for each core trend, value, category, and/or concept of the mapping engine 108. In some embodiments, the documents or data stored for each trend, value, category, or concept may be selected manually or partially manually by a user, for example. In other embodiments, the documents or data stored for each trend, value, category, or concept may be selected automatically by, for example, the mapping engine 108. In some embodiments, the system 100 may continually, periodically, or randomly search the one or more information databases 104 to change or increase the stored data in the custom collection database 106. This stored data or documents within the custom collection database 106 may be retrievable by at least the mapping engine 108, as described below.
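By way of illustration only, the collection of the documents most related to a particular value might be sketched as follows. The function names, the keyword-counting relevance measure, and the sample documents are assumptions made for this sketch and are not taken from the disclosure; any suitable relevance measure could be substituted.

```python
import heapq

def keyword_relevance(keywords):
    """Build a hypothetical relevance function that counts keyword hits."""
    wanted = {k.lower() for k in keywords}

    def score(document):
        return sum(1 for token in document.lower().split() if token in wanted)

    return score

def collect_top_documents(documents, relevance, limit=1000):
    """Return up to `limit` documents, most relevant first."""
    return heapq.nlargest(limit, documents, key=relevance)

# Hypothetical documents drawn from an information database.
documents = [
    "Seat belts and airbags improve safety",
    "New phone released today",
    "Safety recall issued after safety audit",
]

# Keep the two documents most related to the value "safety".
safety_collection = collect_top_documents(
    documents, keyword_relevance(["safety", "recall"]), limit=2
)
```

In this sketch the limit plays the role of the "1,000, or other suitable number" of entries stored for each value.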
The mapping engine 108 may generally be configured to determine correlations between data in the information database(s) 104 and one or more values, one or more categories, and/or one or more core trends.
A “value” may be defined in some embodiments as a criterion of importance or significance (for example, authenticity, safety, fear, power, freshness, health, etc.). Values may have positive, negative, or neutral connotations. Values may include for example, but are not limited to, Access, Achievement, Adventure, Affluence, Ambition, Aspiration, Assistance, Authenticity, Balance, Beauty, Belief, Belonging, Celebration, Challenge, Change, Choice, Civility, Comfort, Commitment, Community, Compassion, Competition, Confidence, Connectivity, Conservation, Contentment, Control, Convenience, Cool, Courage, Creativity, Curiosity, Deliciousness, Dependability, Design, Desire, Detail, Devotion, Dignity, Discipline, Discovery, Diversity, Efficiency, Empowerment, Endurance, Energy, Entertainment, Equality, Escape, Excellence, Exclusivity, Experience, Expertise, Faith, Family, Fantasy, Fear, Fitness, Fragility, Freedom, Fresh, Friendship, Fun, Future, Growth, Happiness, Harmony, Health, Honesty, Honor, Hope, Idealism, Identity, Image, Independence, Individuality, Indulgence, Information, Ingenuity, Innovation, Inspiration, Integrity, Intelligence, Intimacy, Intuition, Joy, Justice, Knowledge, Leadership, Learning, Legacy, Logic, Love, Loyalty, Luck, Luxury, Maturity, Nature, Natural, Nostalgia, Novelty, Optimism, Order, Originality, Passion, Patience, Patriotism, Peace, Performance, Personalization, Pleasure, Populism, Power, Prevention, Pride, Privacy, Prosperity, Purity, Quality, Relaxation, Respect, Responsibility, Romance, Safety, Security, Sensuality, Serenity, Sexuality, Sharing, Simplicity, Speed, Spirituality, Stability, Status, Stealth, Strength, Style, Subversion, Success, Sustainability, Teamwork, Thrift, Thrill, Transparency, Trust, Truth, Uniqueness, Unity, Value, Vitality, Wealth, Wellness, Whimsy, Wisdom, and Wonder.
A “category” may be defined as a classification of similar things (for example, sports, health & beauty, fashion, home furnishings, automotive, manufacturing). Categories may include for example, but are not limited to, categories used or defined by the standard categories of the Open Directory Listing Project (ODLP), subsets thereof, or any other defined category of textual meaning. In some embodiments, categories may be derived from the Bureau of Labor Statistics (BLS). A category in this system 100 may also refer to sub-categories within a broader defined category. The system 100 may also comprise tiered categories of multiple categories, wherein each category has a different level of scope or importance. In some embodiments, a category may also be a value (for example, safety). The mapping engine 108 may define a finite number of values and/or categories. In at least one embodiment, the mapping engine 108 may be in data communication with the user interface 102 such that the user may select the at least one value and/or category that the user wishes to measure or compare with respect to the data.
A “core trend” may be societal or cultural metrics that may help to indicate where a document or data entry coincides with one or more forces of the consumer or cultural landscape. For example, core trends may include, but are not limited to society, technology, economy, environment, and politics (i.e., “STEEP”). In other embodiments, core trends may be any other suitable societal or cultural metrics. Core trends may be selectable by a user in some embodiments.
In some embodiments, the mapping engine 108 may have a categories library, a values library, and/or a core trends library. For example, a categories library may comprise a number of categories, which in some embodiments may be derived from categories used or defined by the standard categories of the Open Directory Listing Project (ODLP), any other suitable category listing, or subsets thereof. In some embodiments, the system 100 may use an algorithm such as Naïve Bayes Classifier or another algorithm to assist with categorization. The custom collection database 106 may include multiple data entries (documents, articles, social media posts, and other data) associated with each category of the categories library. For example, the custom collection database 106 may comprise up to 500, or other suitable number, or more relevant documents for each category of the categories library. Using a text mining tool such as open source NLP utility LingPipe, Apache Mahout, Mallet, or similar tool, the mapping engine 108 may create a map for one or more categories in the categories library. A map may generally be or include an association between a category and documents in the categories library. In one embodiment, the map for a category may include the particular documents in the categories library that correspond with, or have been found via the text mining tool or another tool to relate to, the category. In another embodiment, the map for a category may include a numerical value or score, or a plurality of numerical values or scores, which may be produced by one or more algorithms. The maps created for each category may be stored in the custom collection database 106. In at least one embodiment, the mapping engine 108 has at least one stored map in the custom collection database 106 for each category.
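For illustration only, one simple form a numerical category map could take is a set of term weights computed from the documents stored for the category. The sketch below uses plain relative term frequency as a stand-in for the output of a text mining tool; the stopword list, function names, and sample documents are assumptions of this sketch, not features of the disclosure or of any named tool.

```python
import re
from collections import Counter

# Hypothetical stopword list; a real text mining tool would supply its own.
STOPWORDS = {"a", "an", "and", "of", "the", "to"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def build_map(documents, top_k=50):
    """Return {term: relative frequency} for the top_k most frequent terms."""
    counts = Counter()
    for document in documents:
        counts.update(tokenize(document))
    total = sum(counts.values())
    if total == 0:
        return {}
    return {term: n / total for term, n in counts.most_common(top_k)}

# Hypothetical documents stored for a "sports" category.
sports_documents = ["The team won the match", "The match went to extra time"]
sports_map = build_map(sports_documents, top_k=3)
```

A map of this kind could be stored alongside the documents for the category and later compared against other documents or data entries.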
In at least one embodiment, a Categories library may be pre-loaded with a number of categories, which may have up to three tiered levels or sub-levels: a broad category, zero, one, or more sub-categories of each broad category, and zero, one, or more sub-categories of each sub-category. In one example embodiment, the Categories library may have up to 755 categories, up to 770 categories, or more. The system 100 may then retrieve, for example, up to 500 or more relevant results for each of the categories of the Categories library and store these category results in the custom collection database 106. Using an NLP utility such as Apache Open NLP, LingPipe, Apache Mahout, Mallet, or other tool, the system 100 may create a map of the data that characterizes the content for each category, and may then store that category map in the custom collection database 106.
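As a non-limiting sketch, a three-tier Categories library could be represented as a nested mapping in which each category name maps to its sub-categories. The category names below are hypothetical and chosen only to show the structure.

```python
MAX_TIERS = 3  # broad category, sub-category, sub-sub-category

# Hypothetical category names; each key maps to its own sub-categories.
CATEGORIES = {
    "Sports": {
        "Team Sports": {"Soccer": {}, "Basketball": {}},
        "Running": {},
    },
    "Automotive": {},
}

def category_paths(tree, prefix=()):
    """List every category as a tuple path, e.g. ("Sports", "Running")."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if len(path) > MAX_TIERS:
            raise ValueError(f"category nested deeper than {MAX_TIERS} tiers: {path}")
        paths.append(path)
        paths.extend(category_paths(children, path))
    return paths

paths = category_paths(CATEGORIES)
```

Each resulting path identifies one category for which results may be retrieved and a map created.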
The mapping engine 108 in some embodiments may similarly comprise a values library comprising a number of values, and may map the values to documents in the custom collection database 106. The custom collection database 106 may include stored articles, documents, etc., or other data associated with each value of the values library. For example, the custom collection database 106 may comprise up to 500, or other suitable number, or more relevant documents for each value of the values library. Using a text mining tool such as Apache Open NLP, LingPipe, Apache Mahout, Mallet, or similar tool, the mapping engine 108 may create a value map for each value to the data in the custom collection database 106. A map may generally be or include an association between a value and documents in the values library. In one embodiment, the map for a value may include the particular documents in the values library that correspond with, or have been found via the text mining tool or another tool to relate to, the value. In another embodiment, the map for a value may include a numerical value or score, or a plurality of numerical values or scores, which may be produced by one or more algorithms. In at least one embodiment, the mapping engine 108 has at least one stored value map in the custom collection database 106 for each value.
In some embodiments, the mapping engine 108 may similarly have a core trends library comprising a number of core trends and a number of values and/or categories associated with each core trend. For example, each of the STEEP core trends may be associated with a number of values. The mapping engine may additionally map the core trends to documents in the custom collection database 106. For example, the custom collection database 106 may comprise up to 500, or other suitable number, or more relevant documents for each value associated with at least one of the core trends. Using a text mining tool such as open source NLP utility LingPipe, Apache Mahout, Mallet, or similar tool, a core trend map can be generated from the data for each core trend and then stored. A map may generally be or include an association between a core trend and documents in the core trends library. In one embodiment, the map for a core trend may include the particular documents in the core trends library that correspond with, or have been found via the text mining tool or another tool to relate to, the core trend. In another embodiment, the map for a core trend may include a numerical value or score, or a plurality of numerical values or scores, which may be produced by one or more algorithms. In at least one embodiment, the mapping engine 108 has stored core trend maps in the custom collection database 106 for each core trend.
The mapping engine 108 may additionally be configured to map documents or data entries in the one or more information databases 104. Specifically, in some embodiments, the mapping engine 108 may be configured to determine one or more scores for one or more data entries or documents in the information database(s) 104. A score may be determined for a particular data entry or document by comparing the document to a value, category, or core trend map provided by the mapping engine 108. In some embodiments, a score may provide an indication of the degree of correlation between a value, category, or core trend, and a data entry or document in a database. For example, a score may be or include a numerical value in some embodiments, signifying or quantifying the degree or amount of correlation between the value, category, or core trend, and the data entry or document. Each data entry or document may correspond with a plurality of scores for a plurality of values, categories, or core trends. In some embodiments, the score may be determined based on a textual analysis and/or linguistic technology used to help quantify the text or meaning in a document or data entry. A semantic analysis may be performed to compare each data entry or document to each value, category, and core trend map. In at least one embodiment of this system 100, however, the metadata, the content data, or any of the data (e.g., documents, articles, social media posts, etc.) in the information database(s) 104 may be compared against and/or matched with the documents or maps stored in the custom collection database 106. This matching can be done with any existing linguistic technology. By matching the data entries or documents in the one or more information databases 104 to the stored documents or maps in the custom collection database 106 via the mapping engine 108, it can be determined which values, categories, and core trends are most or least relevant to each data entry or document in the information database(s) 104.
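For illustration only, the comparison of a document against several stored maps might be sketched as a cosine similarity between the document's term counts and each map's term weights. Cosine similarity is an assumption chosen for this sketch; the disclosure leaves the linguistic technology open. The map contents and the sample document are likewise hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def score(document, term_map):
    """Cosine similarity between document term counts and map term weights."""
    counts = Counter(tokenize(document))
    dot = sum(counts[term] * weight for term, weight in term_map.items())
    norm_doc = math.sqrt(sum(n * n for n in counts.values()))
    norm_map = math.sqrt(sum(w * w for w in term_map.values()))
    if norm_doc == 0 or norm_map == 0:
        return 0.0
    return dot / (norm_doc * norm_map)

# Hypothetical stored maps for two values.
maps = {
    "safety": {"safety": 0.5, "recall": 0.3, "airbag": 0.2},
    "energy": {"energy": 0.6, "solar": 0.4},
}

document = "Airbag recall raises safety concerns"
scores = {name: score(document, term_map) for name, term_map in maps.items()}

# Most relevant map first, least relevant last.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Under these assumptions each document receives one score per map, so the most and least relevant values, categories, or core trends can be read off the ranking.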
In some embodiments, the system 100 may additionally include a clustering engine 110, such as Carrot2, a clustering engine that uses the Lingo or STC Clustering Algorithm, LingPipe, another open source clustering engine, or any other suitable clustering engine. The clustering engine 110 may be configured to determine concepts and/or clusters present in the data entries or documents in the information database(s) 104. In one embodiment, “concepts” are words or phrases that identify the prominent subject represented in a data entry or document, such as an article, blog, social media post, etc. A “cluster” may be a collection of words that appear frequently across a plurality of documents or data entries.
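As a simplified, non-authoritative sketch of the "cluster" notion described above, words that appear in several documents may be found by counting document frequency, and the documents may then be grouped under those words. This is not the Lingo or STC algorithm and does not use Carrot2 or LingPipe; the stopword list, the minimum-document threshold, and the sample documents are assumptions of the sketch.

```python
import re
from collections import Counter

# Hypothetical stopword list used only for this sketch.
STOPWORDS = {"a", "and", "is", "the"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())

def frequent_terms(documents, min_docs=2):
    """Return {term: number of documents containing it} for recurring terms."""
    doc_frequency = Counter()
    for document in documents:
        doc_frequency.update(set(tokenize(document)) - STOPWORDS)
    return {term: n for term, n in doc_frequency.items() if n >= min_docs}

def group_by_term(documents, terms):
    """Map each recurring term to the indices of documents that contain it."""
    return {
        term: [i for i, document in enumerate(documents) if term in tokenize(document)]
        for term in terms
    }

documents = [
    "Battery life is great",
    "Battery drains fast",
    "Great screen and great camera",
]
clusters = group_by_term(documents, frequent_terms(documents))
```

Here the recurring words label the groups, and a document may belong to more than one group.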
The analysis module 112 may generally compile and/or analyze information received from the mapping engine 108 and clustering engine 110. In some embodiments, the analysis module 112 may provide an output, such as a display or a report, to present data from the information database(s) 104 to a user according to the values, categories, core trends, concepts, and/or clusters. In some embodiments, the analysis module 112 may provide relevant documents or data entries, or links thereto, from the information database(s) 104. As indicated above, for a given set of data or search results the analysis module 112 can identify and display top matching categories, top matching values, and matching core trends for the search results as a whole. In further embodiments, relationships may be created or analyzed between the resulting categories, values, and/or core trends. For example only, after identifying the top matching categories, the analysis module 112 may identify the data entries, search results, documents, etc. that match the top matching categories or a subset thereof. Then, the analysis module 112 can identify the values and/or core trends associated with those entries or documents to create or identify relationships or associations between the categories and the values and/or core trends, thereby using the data entries or documents to link categories, values, and/or core trends and see how these are aligned with one another. The foregoing is just one example. Of course, this type of relationship analysis could alternatively start with the data entries or documents that match the top matching values or a subset thereof (or that match one or more specific core trends). 
Then, the analysis module 112 can identify the categories and/or core trends (or categories and/or values) associated with those data entries or documents to create or identify relationships or associations between the values and the categories and/or core trends (or between the core trends and the categories and/or values).
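By way of example only, the relationship analysis described above might be sketched as follows: the top matching labels in one field are identified, the records matching those labels are selected, and the labels those records carry in another field are counted. The record layout, field names, and sample associations are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical records already associated with categories and values.
records = [
    {"id": 1, "categories": {"automotive"}, "values": {"safety", "trust"}},
    {"id": 2, "categories": {"automotive"}, "values": {"safety"}},
    {"id": 3, "categories": {"fashion"}, "values": {"style"}},
    {"id": 4, "categories": {"automotive", "fashion"}, "values": {"style", "safety"}},
]

def top_labels(records, field, n):
    """Return the n labels that occur in the most records for the given field."""
    counts = Counter(label for record in records for label in record[field])
    return [label for label, _ in counts.most_common(n)]

def link(records, source_field, target_field, n=1):
    """For each top source label, count the target labels on matching records."""
    links = defaultdict(Counter)
    for label in top_labels(records, source_field, n):
        for record in records:
            if label in record[source_field]:
                links[label].update(record[target_field])
    return {label: dict(counts) for label, counts in links.items()}

# Start from the top category and find the values aligned with it.
category_to_values = link(records, "categories", "values")

# Alternatively, start from the top value and find the aligned categories.
value_to_categories = link(records, "values", "categories")
```

The same function serves both directions of the analysis because only the source and target fields change.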
Systems of the present disclosure may be configured to perform one or more methods. For example,
As indicated above, the method 500 may include developing value, category, and/or core trend maps 502. As described above, values, categories, and/or core trends may be words or phrases that may convey an idea, feeling, or other parameter. The maps may be developed based on documents in the custom collections database, for example, which may be selected based on their relevance to a particular value, category, or core trend. In some embodiments, a map may include a series of words, phrases, or semantic combinations that correlate with the particular value, category, or core trend. In some embodiments, a map may be developed for each value, category, and core trend. The values, categories, and/or core trends for which maps are created may be predefined in some embodiments. The values, categories, and/or core trends may be defined by one or more standards, or may be defined by a user or a client, as indicated above.
The method 500 may additionally include receiving a search term 504. The search term may be received from a user, for example. A user may enter or select a search term via a user interface. In other embodiments, a search term may be received from a different source, or may be automatically generated, for example, based on one or more parameters or conditions. The search term may be a word or phrase, for example. Moreover, in some embodiments, multiple search terms may be received and searched simultaneously.
Turning back to
The method 500 may include deriving concepts and/or clusters from the search results 508. As described above, for example, a clustering tool such as Carrot2, a clustering engine that uses the Lingo and STC Clustering Algorithms, LingPipe, another open source clustering engine, or any other suitable clustering tool may be used to derive concepts and clusters from the search results. In some embodiments, the search results may be organized or grouped by clusters and/or concepts.
The method 500 may include comparing the search results to the value, category, and/or core trend maps to determine correlation between the search results and the values, categories, and core trends 510. As indicated above, a mapping engine, for example, may compare each search result to each value, category, and core trend map. Semantics tools may be used to compare the search results to the maps to determine a correlation between each search result and each map.
In some embodiments, comparing the search results to the value, category, and/or core trend maps may include assigning a score to each search result with respect to each value, category, and/or core trend. For example, each document may be assigned a score with respect to the value of energy. The score may be determined by comparing each document with the map for the value of energy. Algorithms such as, but not limited to, Naïve Bayes may be used to compare the document with the map to produce a numbered score, for example. The score may signal how strongly the document correlates with the value of energy. In some embodiments, a higher score may signal a stronger correlation with the value, category, or core trend, whereas a lower score may signal a weaker correlation with the value, category, or core trend. In one embodiment, if there is no evidence of a correlation between a particular search result and a particular value, category, or core trend, the search result may receive a score of zero for that particular value, category, or core trend. In another embodiment, a clear correlation between a document and a map may produce a score of zero, and lesser correlations or a lack of correlation may result in negative scores. Each search result or document may thus be assigned multiple scores, one for each value, category, and/or core trend. In some embodiments, a threshold may be applied to the value, category, and/or core trend scores for the search results. A threshold may determine whether each search result will be presented as associated with a particular value, category, or core trend. If a document meets or exceeds the threshold for the value of energy, for example, the document may be associated with the value of energy. 
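As a simplified, non-limiting sketch of the scoring and threshold steps described above, the following example scores a document against each map by summing the weights of matching terms. This weighted-overlap score is a stand-in chosen for brevity and is not the Naïve Bayes comparison mentioned above. The map contents, the function names, and the threshold of 0.5 are hypothetical.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_document(text, term_map):
    """Sum the map weight of every token in the document.

    A document sharing no terms with the map scores zero, matching the
    embodiment in which no evidence of correlation yields a zero score.
    """
    return sum(term_map.get(tok, 0.0) for tok in tokenize(text))


def associate(text, maps, threshold):
    """Return {name: score} for every map whose score meets the threshold."""
    scores = {name: score_document(text, m) for name, m in maps.items()}
    return {name: s for name, s in scores.items() if s >= threshold}


# Hypothetical maps; the weights are illustrative only.
maps = {
    "energy": {"solar": 0.5, "power": 0.25, "wind": 0.25},
    "billing": {"invoice": 0.5, "refund": 0.5},
}
doc = "Wind and solar power are growing"
matched = associate(doc, maps, threshold=0.5)
```

Here the document is associated only with the value of energy, because its score against the billing map is zero and falls below the threshold. An embodiment using log-probabilities would instead produce scores at or below zero, consistent with the alternative scoring scale described above.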
In some embodiments, the scores assigned to a search result or document may be ordered, such that the value, category, and/or core trend having the highest correlation to the document may be selected as a predominant or most relevant value, category, or core trend for the search result.
In at least one embodiment, each relevant result may be limited to matching with a maximum number of values, categories, and/or core trends, which may be a pre-selected or pre-configured number, and may be configurable by a user. For example, in one embodiment, each relevant result may be matched with a maximum of five categories and five values. For even further example, where the relevant results from a user's search amount to 100 results and the maximum number of categories that each result may be matched with is five categories, there will be a maximum of 500 category data points for the user's search result. Once matched, the categories analysis can be displayed visually in various forms that show the relative importance or ranking of the categories matched with the search results.
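The ordering of scores and the limit on matches per result may be illustrated by the following minimal sketch. The category names, the scores, and the function name `top_matches` are hypothetical; the limit of five and the count of 100 results follow the example above.

```python
def top_matches(scores, max_matches=5):
    """Return up to max_matches (name, score) pairs, highest score first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:max_matches]


# Hypothetical category scores for a single search result.
category_scores = {
    "health": 0.9, "finance": 0.2, "travel": 0.7, "sports": 0.4,
    "food": 0.6, "music": 0.1, "tech": 0.8,
}
kept = top_matches(category_scores, max_matches=5)
predominant = kept[0][0]  # category with the highest correlation

# Upper bound on category data points for a search with 100 results.
num_results = 100
max_data_points = num_results * 5
```

With seven candidate categories, only the five highest-scoring categories are retained for the result, and the first of these may be treated as the predominant category. The product of 100 results and five categories per result gives the maximum of 500 category data points noted above.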
The method 500 may additionally include presenting the search results to a user 512. In some embodiments, the search results may be presented via a dynamic, interactive user interface. In other embodiments, the search results may be presented via a report or other output.
In some embodiments, systems and methods described herein may comprise a tracking feature that allows the user to set up an alert for additional information relating to the documents or data entries. A user may select a specific search topic for tracking, and the system may continually, periodically, or at another suitable interval, retrieve new search results for the search function and store the search results in a tracking database. The mapping engine and analysis module may update periodically, or at another suitable interval, to reflect the new search results. In at least one embodiment, the analysis module may compare the analysis with the new search results. If the analysis changes, the system may send the user an alert or otherwise flag the changed analysis. In at least one embodiment, the analysis module may compare a score for the search term with a new score for the search term. If the score changes, the system may send the user an alert or otherwise flag the changed score. As a particular example, if a top scoring or most relevant value for a search term changes over time, as evidenced by the tracking, an alert may be sent. When the user enters the system, the user may view, through the user interface, displayed information regarding the changes and may retrieve and access the supporting data.
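One possible form of the comparison underlying the tracking feature is sketched below, assuming that each tracking interval produces a snapshot of value scores for the tracked search term. The snapshot contents, the function names, and the alert wording are hypothetical.

```python
def predominant(scores):
    """Return the name with the highest score, or None if scores is empty."""
    return max(scores, key=scores.get) if scores else None


def check_for_alert(previous_scores, new_scores):
    """Compare two score snapshots for one tracked search term.

    Returns an alert message when the top-scoring value has changed,
    otherwise None.
    """
    old_top = predominant(previous_scores)
    new_top = predominant(new_scores)
    if old_top != new_top:
        return f"Top value changed from {old_top} to {new_top}"
    return None


# Hypothetical snapshots stored in the tracking database.
last_week = {"energy": 0.8, "security": 0.5}
this_week = {"energy": 0.4, "security": 0.7}
alert = check_for_alert(last_week, this_week)
```

This sketch flags only a change in the top-scoring value. An embodiment that alerts on any score change could instead compare the individual scores in the two snapshots.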
As indicated above, the method 2000 may include receiving data 2002. The data may be private or public company data in some embodiments. Moreover, the data may include a plurality of data entries or records. For example, the data may be provided as a database of reviews, complaints, social media posts, or other data related to a company or a company's product. The data may be provided by the company to which it pertains, or it may be publicly accessible or retrievable. In some embodiments, the data may be received from multiple databases, such as multiple social media databases, for example. In some embodiments, the data may consist of historical data, such as historical reviews, complaints, social media posts, or other data related to a company or a company's product.
The method 2000 may additionally include developing value, category, and/or core trend maps 2004. As described above, values, categories, and/or core trends may be words or phrases that may convey an idea, feeling, or other parameter. The maps may be developed based on documents in a custom collections database, for example, which may be selected based on their relevance to a particular value, category, or core trend. In some embodiments, a map may include a series of words, phrases, or semantic combinations that correlate with the particular value, category, or core trend. In some embodiments, a map may be developed for each value, category, and core trend. The values, categories, and/or core trends for which maps are created may be predefined in some embodiments. The values, categories, and/or core trends may be defined by one or more standards, or may be defined by a company, for example. In one particular example, a company may wish to organize customer complaints by responding department, and thus may define categories as “information technology,” “billing,” and/or other departments. By evaluating the complaints received in the received data, maps may be developed for each of these custom categories. In some embodiments, the maps may be created based on a portion of the received data. For example, where the received data contains 1000+ customer complaints, the data maps may be developed based on an analysis of 100, or another suitable number of, customer complaints.
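As a non-limiting sketch of developing custom category maps from a portion of the received data, the following example draws a random sample of complaints and accumulates term counts per category. The sketch assumes that the sampled complaints have already been labeled with a responding department, which the disclosure does not specify; the function names, the `seed` parameter, and the sample complaints are likewise hypothetical.

```python
from collections import Counter, defaultdict
import random
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_category_maps(labeled_complaints, sample_size, seed=0):
    """Build one term-count map per category from a random sample.

    `labeled_complaints` is a list of (category, text) pairs; the labels
    are an assumption of this sketch.
    """
    rng = random.Random(seed)
    k = min(sample_size, len(labeled_complaints))
    sample = rng.sample(labeled_complaints, k)
    maps = defaultdict(Counter)
    for category, text in sample:
        maps[category].update(tokenize(text))
    return dict(maps)


# Hypothetical complaints labeled by responding department.
complaints = [
    ("billing", "I was charged twice on my invoice"),
    ("information technology", "The login page keeps crashing"),
    ("billing", "My refund never arrived"),
]
maps = build_category_maps(complaints, sample_size=3)
```

Where the received data contains more than 1000 complaints, `sample_size` could be set to 100 so that the maps are developed from only a portion of the data, as in the example above.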
The method 2000 may additionally include deriving concepts and/or clusters from the received data 2006. As described above, for example, a clustering tool such as Carrot2, a clustering engine that uses the Lingo Clustering Algorithm, LingPipe, another open source clustering engine, or any other suitable clustering tool may be used to derive concepts and clusters from the received data. In some embodiments, the data may be organized or grouped by clusters and/or concepts.
In some embodiments, presenting the data may include comparing each of the data entries or records in the received data to the developed value or category maps 2008 to determine correlation between the data and the values, categories, and core trends. As indicated above, a mapping engine, for example, may compare each data entry or document to each value, category, and core trend map. Semantics tools may be used to compare the documents or data entries to the maps to determine a correlation between each data entry and each map.
In some embodiments, similar to step 510 described above with respect to
In at least one embodiment, each relevant document or data entry may be limited to matching with a maximum number of values, categories, and/or core trends, which may be a pre-selected or pre-configured number, and may be configurable by a user. For example, in one embodiment, each document or data entry may be matched with a maximum of five categories and five values.
The method 2000 may additionally include presenting the received data in terms of values, categories, concepts, and/or clusters 2010. The data may generally be presented according to the values, categories, clusters, and/or concepts. In some embodiments, the data may be presented via a dynamic, interactive user interface, such as via a dashboard view. In some embodiments, any or all of the outputs described above with respect to
Systems and methods of the present disclosure may generally allow a user to view and understand data in a more comprehensive manner. For example, the user may have the ability to view common themes or trends among data. The analyses described herein may additionally allow a user to view changes to data over time. By comparing values, categories, clusters, concepts, and/or other data, such as BLS spending data, a user may have the ability to see new market opportunities, opportunities for better customer relationships or understanding, and other information. In general, the systems and methods of the present disclosure may allow a user to develop a comprehensive understanding of a large quantity of data or data entries in an efficient and effective manner, without the need to read, review, or analyze individual data entries manually.
For purposes of this disclosure, any system described herein may include any instrumentality or aggregate of instrumentalities operable to compute, calculate, determine, classify, process, transmit, receive, retrieve, originate, switch, store, display, communicate, manifest, detect, record, reproduce, handle, or utilize any form of information, intelligence, or data for business, scientific, control, or other purposes. For example, a system or any portion thereof may be a minicomputer, mainframe computer, personal computer (e.g., desktop or laptop), tablet computer, mobile device (e.g., personal digital assistant (PDA) or smart phone) or other hand-held computing device, server (e.g., blade server or rack server), a network storage device, or any other suitable device or combination of devices and may vary in size, shape, performance, functionality, and price. A system may include volatile memory (e.g., random access memory (RAM)), one or more processing resources such as a central processing unit (CPU) or hardware or software control logic, ROM, and/or other types of nonvolatile memory (e.g., EPROM, EEPROM, etc.). A basic input/output system (BIOS) can be stored in the non-volatile memory (e.g., ROM), and may include basic routines facilitating communication of data and signals between components within the system. The volatile memory may additionally include a high-speed RAM, such as static RAM for caching data.
Additional components of a system may include one or more disk drives or one or more mass storage devices, one or more network ports for communicating with external devices as well as various input and output (I/O) devices, such as a keyboard, a mouse, touchscreen and/or a video display. Mass storage devices may include, but are not limited to, a hard disk drive, floppy disk drive, CD-ROM drive, smart drive, flash drive, or other types of non-volatile data storage, a plurality of storage devices, a storage subsystem, or any combination of storage devices. A storage interface may be provided for interfacing with mass storage devices, for example, a storage subsystem. The storage interface may include any suitable interface technology, such as EIDE, ATA, SATA, and IEEE 1394. A system may include what is referred to as a user interface for interacting with the system, which may generally include a display, mouse or other cursor control device, keyboard, button, touchpad, touch screen, stylus, remote control (such as an infrared remote control), microphone, camera, video recorder, gesture systems (e.g., eye movement, head movement, etc.), speaker, LED, light, joystick, game pad, switch, buzzer, bell, and/or other user input/output device for communicating with one or more users or for entering information into the system. These and other devices for interacting with the system may be connected to the system through I/O device interface(s) via a system bus, but can be connected by other interfaces such as a parallel port, IEEE 1394 serial port, a game port, a USB port, an IR interface, etc. Output devices may include any type of device for presenting information to a user, including but not limited to, a computer monitor, flat-screen display, or other visual display, a printer, and/or speakers or any other device for providing information in audio form, such as a telephone, a plurality of output devices, or any combination of output devices.
A system may also include one or more buses operable to transmit communications between the various hardware components. A system bus may be any of several types of bus structure that can further interconnect, for example, to a memory bus (with or without a memory controller) and/or a peripheral bus (e.g., PCI, PCIe, AGP, LPC, etc.) using any of a variety of commercially available bus architectures.
One or more programs or applications, such as a web browser and/or other executable applications, may be stored in one or more of the system data storage devices. Generally, programs may include routines, methods, data structures, other software components, etc., that perform particular tasks or implement particular abstract data types. Programs or applications may be loaded in part or in whole into a main memory or processor during execution by the processor. One or more processors may execute applications or programs to run systems or methods of the present disclosure, or portions thereof, stored as executable programs or program code in the memory, or received from the Internet or other network. Any commercial or freeware web browser or other application capable of retrieving content from a network and displaying pages or screens may be used. In some embodiments, a customized application may be used to access, display, and update information. A user may interact with the system, programs, and data stored thereon or accessible thereto using any one or more of the input and output devices described above.
A system of the present disclosure can operate in a networked environment using logical connections via a wired and/or wireless communications subsystem to one or more networks and/or other computers. Other computers can include, but are not limited to, workstations, servers, routers, personal computers, microprocessor-based entertainment appliances, peer devices, or other common network nodes, and may generally include many or all of the elements described above. Logical connections may include wired and/or wireless connectivity to a local area network (LAN), a wide area network (WAN), hotspot, a global communications network, such as the Internet, and so on. The system may be operable to communicate with wired and/or wireless devices or other processing entities using, for example, radio technologies, such as the IEEE 802.xx family of standards, and includes at least Wi-Fi (wireless fidelity), WiMax, and Bluetooth wireless technologies. Communications can be made via a predefined structure as with a conventional network or via an ad hoc communication between at least two devices.
Hardware and software components of the present disclosure, as discussed herein, may be integral portions of a single computer or server or may be connected parts of a computer network. The hardware and software components may be located within a single location or, in other embodiments, portions of the hardware and software components may be divided among a plurality of locations and connected directly or through a global computer information network, such as the Internet. Accordingly, aspects of the various embodiments of the present disclosure can be practiced in distributed computing environments where certain tasks are performed by remote processing devices that are linked through a communications network. In such a distributed computing environment, program modules may be located in local and/or remote storage and/or memory systems.
As will be appreciated by one of skill in the art, the various embodiments of the present disclosure may be embodied as a method (including, for example, a computer-implemented process, a business process, and/or any other process), apparatus (including, for example, a system, machine, device, computer program product, and/or the like), or a combination of the foregoing. Accordingly, embodiments of the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, middleware, microcode, hardware description languages, etc.), or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present disclosure may take the form of a computer program product on a computer-readable medium or computer-readable storage medium, having computer-executable program code embodied in the medium, that define processes or methods described herein. A processor or processors may perform the necessary tasks defined by the computer-executable program code. Computer-executable program code for carrying out operations of embodiments of the present disclosure may be written in an object oriented, scripted or unscripted programming language such as Java, Perl, PHP, Visual Basic, Smalltalk, C++, or the like. However, the computer program code for carrying out operations of embodiments of the present disclosure may also be written in conventional procedural programming languages, such as the C programming language or similar programming languages. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, an object, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, etc. may be passed, forwarded, or transmitted via any suitable means including memory sharing, message passing, token passing, network transmission, etc.
In the context of this document, a computer readable medium may be any medium that can contain, store, communicate, or transport the program for use by or in connection with the systems disclosed herein. The computer-executable program code may be transmitted using any appropriate medium, including but not limited to the Internet, optical fiber cable, radio frequency (RF) signals or other wireless signals, or other mediums. The computer readable medium may be, for example but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device. More specific examples of suitable computer readable media include, but are not limited to, an electrical connection having one or more wires or a tangible storage medium such as a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a compact disc read-only memory (CD-ROM), or other optical or magnetic storage device. Computer-readable media includes, but is not to be confused with, computer-readable storage medium, which is intended to cover all physical, non-transitory, or similar embodiments of computer-readable media.
Various embodiments of the present disclosure may be described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products. It is understood that each block of the flowchart illustrations and/or block diagrams, and/or combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-executable program code portions. These computer-executable program code portions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a particular special purpose machine, such that the code portions, which execute via the processor of the computer or other programmable data processing apparatus, create mechanisms for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. Alternatively, computer program implemented steps or acts may be combined with operator or human implemented steps or acts in order to carry out an embodiment of the invention.
Additionally, although a flowchart or block diagram may illustrate a method as comprising sequential steps or a process as having a particular order of operations, many of the steps or operations in the flowchart(s) or block diagram(s) illustrated herein can be performed in parallel or concurrently, and the flowchart(s) or block diagram(s) should be read in the context of the various embodiments of the present disclosure. In addition, the order of the method steps or process operations illustrated in a flowchart or block diagram may be rearranged for some embodiments. Similarly, a method or process illustrated in a flow chart or block diagram could have additional steps or operations not included therein or fewer steps or operations than those shown. Moreover, a method step may correspond to a method, a function, a procedure, a subroutine, a subprogram, etc.
As used herein, the terms “substantially” or “generally” refer to the complete or nearly complete extent or degree of an action, characteristic, property, state, structure, item, or result. For example, an object that is “substantially” or “generally” enclosed would mean that the object is either completely enclosed or nearly completely enclosed. The exact allowable degree of deviation from absolute completeness may in some cases depend on the specific context. However, generally speaking, the nearness of completion will be so as to have generally the same overall result as if absolute and total completion were obtained. The use of “substantially” or “generally” is equally applicable when used in a negative connotation to refer to the complete or near complete lack of an action, characteristic, property, state, structure, item, or result. For example, an element, combination, embodiment, or composition that is “substantially free of” or “generally free of” an element may still actually contain such element as long as there is generally no significant effect thereof.
In the foregoing description various embodiments of the present disclosure have been presented for the purpose of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise form disclosed. Obvious modifications or variations are possible in light of the above teachings. The various embodiments were chosen and described to provide the best illustration of the principles of the disclosure and their practical application, and to enable one of ordinary skill in the art to utilize the various embodiments with various modifications as are suited to the particular use contemplated. All such modifications and variations are within the scope of the present disclosure as determined by the appended claims when interpreted in accordance with the breadth they are fairly, legally, and equitably entitled.
The present application claims priority to U.S. Provisional Application No. 62/346,602, entitled Systems and Methods for Identifying and Classifying Text, and filed on Jun. 7, 2016, the content of which is hereby incorporated by reference herein in its entirety.
Number | Date | Country
---|---|---
62346602 | Jun 2016 | US