The present disclosure relates generally to systems and methods for managing opinion networks with interactive opinion flows and more particularly, but not exclusively, to systems and methods for collecting and analyzing electronic opinion data.
Web-based systems and data networks provide users with an interactive experience, for example, through contributions to Web-based content (e.g., Web pages). Web-logs (“blogs”), online forums, and so on allow users to interact with each other by creating/editing Web content accessible to other users. A large portion of this Web content reflects a user's sentiment/opinion toward various objects (e.g., electronic commerce products, politics, and celebrities). To facilitate an understanding of the increasing volume of sentiment/opinion data, opinion mining (or sentiment analysis) is often used to process and extract subjective information from the data.
Approaches to opinion mining, aggregation, and sentiment analysis have conventionally attempted to perform broad sentiment analysis on larger blocks of text. These approaches have text classification as a primary aim, and endeavor to identify overall sentiment polarity, with best results typically obtained on review sites where the object is easily identified. These conventional approaches rely heavily upon “bag-of-words” statistical relevance and prior-polarity tagging of specific subjective keywords. The “bag-of-words” model quantizes extracted text—such as from a sentence or a document—as an unordered collection of words. Polarity-tagging includes classifying certain text as positive, negative, or neutral. Similar methods have been applied in blogs and news articles, or on micro-blogging platforms (e.g., Twitter® and so on), with varying results.
One drawback of these conventional approaches is a lack of precision in identifying the entity or concept which is the object of the opinion. Some conventional approaches use a triangulation method to calculate proximity of subjective keywords with known entities within a text. These approaches have more success in identifying sentiment around particular objects, but limited understanding of the actual opinion. For example, the term “big” may not have an associated prior-polarity, yet may find meaning in a particular context that traditional methods fail to capture. Other conventional approaches are restricted to hand-annotated training data, which quickly becomes outdated.
In view of the foregoing, a need exists for an improved opinion network and method for opinion mining, aggregation, and sentiment analysis in an effort to overcome the aforementioned obstacles and deficiencies of prior art systems.
The field of the disclosure relates generally to systems and methods for managing opinion networks with interactive opinion flows and more particularly, but not exclusively, to systems and methods for collecting and analyzing electronic opinion data. In one embodiment, a method for analyzing opinion data includes the steps of receiving electronic opinion data, wherein the opinion data includes words of a natural language; mapping the opinion data to unifying opinion objects, the unifying opinion objects provided as a controlled natural language; and providing a presentation having at least one portion corresponding to at least one of said unifying opinion objects.
In an alternative embodiment, the method further includes ranking the unifying opinion objects in an opinion graph to generate per-user relevance.
This summary is provided to introduce the subject matter of the disclosure and is not intended to identify essential features of the claimed subject matter, nor is it intended for use in determining the scope of the claimed subject matter. Other systems, methods, features, and advantages of the disclosure will be or will become apparent to one with skill in the art upon examination of the following figures and detailed description. It is intended that all such additional systems, methods, features, and advantages be included within this description, be within the scope of the disclosure, and be protected by the accompanying claims.
In order to better appreciate how the above-recited and other advantages and objects of the disclosure are obtained, a more particular description of the embodiments briefly described above will be rendered by reference to specific embodiments thereof, which are illustrated in the accompanying drawings. It should be noted that the components in the figures are not necessarily to scale, emphasis instead being placed upon illustrating principles of the disclosure. Moreover, in the figures, like reference numerals designate corresponding parts throughout the different views. However, like parts do not always have like reference numerals. Moreover, all illustrations are intended to convey concepts, where relative sizes, shapes, and other detailed attributes may be illustrated schematically rather than literally or precisely.
In accordance with at least one embodiment of the disclosure, a network-based computing system may be used to maintain and analyze a rich opinion network. As opinion networks grow, a method for enabling users to express their ideas, connect them to a wider community of related users, content, and opinions, and provide a platform to interact can mobilize communities and impact the wider world. This result can be achieved, according to one embodiment disclosed herein, by an opinion network-based computing system 100 as illustrated in
The opinion network-based computing system 100 includes a data network 101, configured to access a variety of Internet services, such as the World-Wide Web (“Web”)—a well-known data exchange system over the Internet. The Web is commonly used to access electronic content using a Web browser application. By way of illustration, the data network 101 may include one or more Local Area Networks (“LANs”), a Wide Area Network (“WAN”) (e.g., Internet Protocol (“IP”) network), and/or mobile/cellular wireless networks connected to one another. Communication/data exchange with network 101 may occur via any common high-level protocols (e.g., Transmission Control Protocol (“TCP”)/IP, User Datagram Protocol (“UDP”), and so on) and may comprise differing protocols of multiple networks connected through appropriate gateways. The communication/data exchange supports both wired and wireless connections.
Web service users 105 can access various network resources—such as Web services 102, opinion capture server 103, and opinion-enhanced Web services 104—over data network 101 using user devices 105A, 105B, 105C, and 105N. In one embodiment, Web services 102 and opinion-enhanced Web services 104 represent Web pages, each uniquely identifiable via Uniform Resource Locators (“URL”), accessible using any common networking protocol (e.g., HyperText Transfer Protocol (“HTTP”), HTTP Secure (“HTTPS”), Transport Layer Security (“TLS”), and Secure Sockets Layer (“SSL”)) requests.
User devices 105A, 105B, 105C, and 105N are preferably Internet-based communication systems and include, but are not limited to, desktop computers, laptop computers, mobile phones, personal digital assistants (“PDAs”), multimedia players, set top boxes, and other programmable consumer electronics, multiprocessor systems, microprocessor-based systems, and distributed computing environments.
As discussed above, conventional approaches to opinion mining, aggregation, and analysis perform broad sentiment analysis on larger blocks of text, rely heavily on “bag-of-words” statistical relevance and prior polarity tagging, calculate proximity of subjective words using a triangulation method with known entities, and so on. While these approaches may be effective for an object, entity, or concept that is easily identifiable, these techniques continue to lack precision in identifying the object of unstructured opinions and variable entities. Approaches restricted to hand-annotated data for fully understanding the opinion data are quickly outdated. Accordingly,
Turning to
Both the system memory 222 and the fixed disk 208 may embody tangible computer-readable mediums. As one of ordinary skill in the art would appreciate, system memory 222 and fixed disk 208 may also be any type of mass storage device or storage medium, such as, for example, magnetic hard disks, floppy disks, cloud storage, optical disks (e.g., CD-ROMs), flash memory, DRAM, and a collection of devices (e.g., Redundant Array of Independent Disks (“RAID”)). Although shown in
The system, apparatus, methods, processes, and operations for processing electronic opinion data 301 described herein may be wholly or partially implemented in the form of a set of instructions executed by one or more programmed computer processors (e.g., processor(s) 220), including a central processing unit (“CPU”) or microprocessor. The set of instructions may be stored on a computer readable medium, such as memory 222 or fixed disk 208. For example,
Returning to
Additionally, the electronic input data 301 includes words of a natural language (e.g., English), sentence fragments of a natural language, sentences, and graphics/video/audio corresponding to words of a natural language. As used herein, “words of a natural language” should be understood to include phrases of a natural language (e.g., “over the moon”). For graphics/video/audio corresponding to words of a natural language, well known graphics processing, optical character recognition, audio processing (e.g., voice recognition and speech-to-text analysis), and video processing can be used to translate a variety of opinion data to electronic input data 301.
Opinion capture server 103 passes the electronic input data 301 to a core service engine 302 through an application programming interface (“API”). This interface allows users 105 to quickly and easily create opinion structures for precise data and accurate aggregation. APIs describe the ways in which a particular task is performed and are specifications intended to be used as an interface by software components to communicate with each other. APIs may include specifications for routines, data structures, object classes, and variables. Each specification may include a complete interface, a single function, or a set of APIs. The use of APIs is well known and understood by those of ordinary skill in the art.
As input data 301 includes opinions from various sources, users 105 often provide input data 301 in a variety of structures. For example, input data 301 may be highly structured (e.g., opinions via Last.fm); whereas, in other cases, input data 301 lacks any consistent structure (e.g., opinions via Twitter®). In one embodiment, a controlled natural language interface may guide user 105 to capture and model human opinions of input data 301 in a structured, machine-readable form. The natural language interface extracts the essence of the opinion from input data 301 without devaluing the content or imposing significant constraints on expressivity. Users 105 may actively structure their opinions through the guided input flow, in accordance with the natural language interface, or by using predefined syntax.
In one example, users 105 submit opinions to server 103 using a Web browser on their user device 105A, 105B, 105C, and 105N. Server 103 provides an opinion entry interface that incorporates predictive text and/or “auto-complete” techniques. A user 105 may start typing a first few letters in a text entry box on a Web page or in an application for a mobile device. In response, auto-complete options may be presented, which include a combination of stored entities and opinion words. User 105 can then decide to complete the word or use the auto-complete suggestion. As a specific example, if the user 105 inputs an entity word (e.g., “trains”), server 103 would then require an opinion word (e.g., “love” or “hate”) to apply to the entity. Server 103 therefore provides auto-complete suggestions for either the top 5 trending opinion words used in conjunction with that entity or the user's 105 frequently used opinions. Similarly, if the user 105 entered an opinion word, the server 103—requiring an entity to apply it to—would suggest the top 5 trending topics used in conjunction with the opinion word from a user's opinion graph (e.g., a user 105 entering “love” is presented with films and cameras in the user's 105 opinion graph), which will be further discussed below with reference to
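By way of illustration only, a minimal Python sketch of such an auto-complete routine is shown below. The dictionaries, user identifier, counts, and function name are hypothetical placeholders standing in for the trending statistics and per-user history that server 103 would maintain (e.g., in database 302B); suggestions are drawn first from opinion words trending for the typed entity and then padded from the user's own frequently used opinion words, mirroring the two suggestion sources described above.

```python
from collections import Counter

# Hypothetical in-memory stores; in the disclosed system these would be
# backed by server-side storage rather than Python dictionaries.
TRENDING_OPINIONS_BY_ENTITY = {
    "trains": Counter({"love": 42, "hate": 30, "late": 12, "crowded": 9, "fast": 5, "slow": 4}),
}
USER_FREQUENT_OPINIONS = {
    "user_105A": Counter({"love": 17, "brilliant": 6, "boring": 2}),
}

def suggest_opinion_words(entity, user_id, limit=5):
    """Suggest opinion words for a typed entity: top trending words for that
    entity, padded with the user's own frequently used opinion words."""
    trending = TRENDING_OPINIONS_BY_ENTITY.get(entity, Counter())
    suggestions = [word for word, _ in trending.most_common(limit)]
    for word, _ in USER_FREQUENT_OPINIONS.get(user_id, Counter()).most_common():
        if len(suggestions) >= limit:
            break
        if word not in suggestions:
            suggestions.append(word)
    return suggestions

print(suggest_opinion_words("trains", "user_105A"))
# ['love', 'hate', 'late', 'crowded', 'fast']
```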
In order to create structured opinion data, an example opinion entry interface of server 103 may capture the following dimensions of each opinion, which are collected into a single illustrative structure in the sketch following the list:
Object: This is the entity about which the opinion is being expressed. These are uniquely identified and related to one another in an entity graph. This is linked to open datasets—such as Freebase (and consequently the Linked Open Data graph)—and, therefore, is continually being updated and extended. The object may also be a geographical place (e.g., city or neighborhood) or venue (e.g., restaurant, café, bar, park, attraction, etc.). Additionally, users 105 can upload photographs or videos that then become Objects in database 302B, or users 105 can refer to existing resources on the Web via hyperlinks (e.g., articles, videos, pages, etc.).
Subject: This is the opinion-giver (i.e., user 105). The server 103 may draw on data from the opinion-giver's existing profile on a social media platform, activities on the Web, location, and profile information to add relevance/detail to the data presented. Users 105 in the system may be individuals, groups, organizations, or companies.
Affect: This is the subjective content within the opinion (i.e., the meaning of the opinion word). Server 103 may capture the semantic meaning of this word and related words (e.g., synonyms, antonyms, and hypernyms). In one embodiment, affect is derived from links to a lexical database (e.g., WordNet), which semantically clusters concepts and relates them to a hypernym taxonomy. For example, the affect may reference one or more synsets. A synset is a group of opinion words that are synonyms or have sufficiently similar meaning.
Intensity: This is the intensity with which opinions are expressed. This is captured at the point of opinion entry to server 103 on an intensity slider, which forms part of the opinion entry user interface (“UI”), or through natural language analysis of the text. Words that contain intrinsic intensity are marked up in a function table, but more commonly intensity is derived from particular modifiers (e.g., “very”), which map the function along an intensity spectrum.
Polarity: This is the sentiment polarity of the opinion itself—as compared to the individual opinion word—as a whole, taking into account negation and modifiers. All functions are stored in a database and tagged with prior-polarity (i.e., they contain intrinsic sentiment data, such as from a hand annotated dataset). However, the server 103 can also redress the overall polarity of the opinion based on the modifiers used or the entity it relates to.
Context: This is the location on the Web where the opinion is being expressed. This might be a Web page or an article/item of media identified on the Web. Context also includes reactions to another opinion. Context may form a node within an opinion graph to allow a user 105 to see which opinions have been prompted by that particular page.
Condition: The opinion can be qualified using a trigger word (e.g., “because” or “when”) followed by a natural language statement to add extra metadata to the opinion.
Reasons: This is a natural language comment attached to an opinion to provide additional justification or explanation for the expressed opinion. Users 105 may express multiple reasons for holding an opinion.
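For illustration, the dimensions listed above can be collected into a single record. The following Python sketch is a hypothetical representation; the field names, types, default values, and example values are assumptions chosen for readability and do not reflect an actual schema of database 302B.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class StructuredOpinion:
    """Hypothetical container for one captured opinion, following the
    dimensions described above (Object, Subject, Affect, and so on)."""
    subject: str                      # the opinion-giver, e.g. a user 105 identifier
    obj: str                          # the entity the opinion is about
    affect: str                       # the opinion word, e.g. "love"
    intensity: float = 0.5            # 0.0-1.0, from the intensity slider or modifiers
    polarity: str = "neutral"         # overall polarity after negation/modifiers
    context: Optional[str] = None     # URL or media item where the opinion was made
    condition: Optional[str] = None   # e.g. "when it's rainy"
    reasons: List[str] = field(default_factory=list)

opinion = StructuredOpinion(
    subject="Helen",
    obj="Barack Obama",
    affect="love",
    intensity=0.9,
    polarity="positive",
    context="https://example.com/article-about-obama",
    reasons=["great orator"],
)
print(opinion)
```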
The opinion entry interface also offers the ability to model discourse surrounding an idea over time. For example, server 103 detects when a user 105 has reacted to another user 105, whether they agreed or disagreed, the opinion reaction, and the resulting action taken. Server 103 isolates temporal moments, which prompted shifts in opinion, and attaches that to meaning, rather than tracing the frequency of a particular string from the Web. This facilitates development of a rhizomatic opinion network system 100 around conversation, which grows in intelligence over time and with extensive use.
As mentioned above, a controlled natural language interface may be provided to guide the user when inputting opinions and enforce a particular structure. In a preferred embodiment, this controlled natural language is modeled on Resource Description Framework (RDF) triples. RDF is a standard model for data interchange on the Web and is well understood and appreciated. By way of example, a controlled natural language interface may encode opinions into various forms including, but not limited to, the following (a minimal encoding sketch appears after the list):
Status: [User 105]:[adjective]
Intent: [User 105]:[verb]:[noun phrase]
Property: [User 105]:[noun phrase]:[adjective]
Connection: [User 105]:[noun phrase]:[verb]:[noun phrase]
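Purely as a sketch, the four forms above can be written as simple tuples and joined into the colon-delimited surface form used in the examples that follow. The user identifier and phrases are hypothetical, and a production system modeled on RDF would more likely store these statements as triples in an RDF store rather than as Python tuples.

```python
# Hypothetical encodings of the four opinion forms as plain tuples.
status     = ("user_105A", "happy")                              # [User]:[adjective]
intent     = ("user_105A", "visit", "the Grand Canyon")          # [User]:[verb]:[noun phrase]
prop       = ("user_105A", "red iPods", "brilliant")             # [User]:[noun phrase]:[adjective]
connection = ("user_105A", "slow trains", "ruin", "my commute")  # [User]:[noun phrase]:[verb]:[noun phrase]

def encode(opinion_tuple):
    """Join an opinion tuple into the colon-delimited surface form."""
    return ":".join(opinion_tuple)

print(encode(prop))  # user_105A:red iPods:brilliant
```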
These basic structures are extendable, and constantly evolving in response to user 105 activity. For example, users 105 may add a condition to their opinion with a trigger word that is either pre-defined or parsed to provide additional information surrounding these statements. This may include temporal or geographical restrictions on the validity of the opinion (e.g., “hate:London when it's rainy”) or a reason for the opinion (e.g., “hate:London because it's rainy”). If a particular (i.e., unknown) trigger word becomes statistically significant, server 103 elevates the trigger word and similar conditions are aggregated around it, such that the qualifiers are constantly evolving through user 105 interaction.
Users 105 may also impose a qualifier on the individual components of the opinion (e.g., “hate:slow trains” or “red iPods:brilliant”). Additional opinion structures include conjunctions—either subordinating or coordinating—that allow multiple opinions to be tied together, or reliant on each other.
Alternatively, where users 105 are not guided by a controlled natural language interface, opinion capture server 103 is configured to translate unstructured, natural language input data 301 into the aforementioned structures. The core service engine 302, therefore, includes an opinion encoding module 302A for translating the electronic input data 301 into a unifying model (e.g., aided by the constraints of the controlled natural language interface).
In a preferred embodiment,
From the POS tagger (action block 4004), verbs/adverbs/adjectives are tied to the appropriate part of speech that they qualify (action block 4005). In one embodiment, a stemmer subsequently reduces each verb/adverb/adjective to its root word (e.g., “fishing,” “fished,” and “fishes” are each reduced to “fish”) to facilitate mapping variations of each word (action block 4006). Each root word is then mapped to a database, accessible over data network 101 (e.g., database 302B or a third-party database, such as Freebase) (action block 4007). Mapping to a third-party database provides instant references to similar topics across the Web, thereby providing users 105 immediate access to additional resources related to an opinion's topic. Conjunctions (either subordinating or coordinating) (action block 4008), which allow multiple opinions to be tied together or to rely on each other, are also reflected in the resultant structured opinion (end block 4012).
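A minimal sketch of the tagging and stemming steps is shown below, using the NLTK library purely as a stand-in for whatever tagger and stemmer the system actually employs; the example sentence, the Porter stemmer, and the function name are assumptions for illustration.

```python
import nltk
from nltk.stem import PorterStemmer

# One-time downloads; resource names vary across NLTK versions, so several
# candidates are attempted and missing ones are silently skipped.
for resource in ("punkt", "punkt_tab",
                 "averaged_perceptron_tagger", "averaged_perceptron_tagger_eng"):
    nltk.download(resource, quiet=True)

stemmer = PorterStemmer()

def tag_and_stem(text):
    """Tokenize and POS-tag an opinion, then reduce each token to its root so
    that variations of the same word can be mapped to a single database entry."""
    tokens = nltk.word_tokenize(text)
    tagged = nltk.pos_tag(tokens)                          # e.g. ('love', 'VBP')
    stemmed = [(stemmer.stem(word), tag) for word, tag in tagged]
    return tagged, stemmed

tagged, stemmed = tag_and_stem("I love fishing on slow trains")
print(tagged)
print(stemmed)   # "fishing" -> "fish", "trains" -> "train"
```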
As an additional input 301 source, structured opinions from elsewhere on the Web can be translated into specific opinions within the site and claimed by users 105. For example, a user 105 may convert Facebook® “likes” or “digg”-ed articles from digg.com into structured opinions. The user 105 provides authentication credentials (e.g., username and password) to server 103 to access the user's 105 “liked” or “digg”-ed items. Once parsed and tagged, spotter and disambiguation engine 32B identifies entities, disambiguates, and maps the opinion entity to a Freebase topic based on a corresponding Web page (e.g., Facebook® Web page or Wikipedia® entry). A confidence level may be maintained for each identified entity based on the method of disambiguation. A confidence threshold is then used to filter out less confident imported opinions. Optionally, the proposed opinions may be presented to the user 105 for manual filtering/selection. A similar mechanism may be provided for topic-based services, where users 105 can import positive or negative ratings, such as consumer media/product reviews (e.g., Last.fm, Netflix®, and Amazon®). Users 105 will then be able to view their collected opinions, expressed on multiple platforms and in multiple networks, in a centralized location.
Returning to
The spotter and disambiguation engine 32B draws on both statistical methods and linguistic parsers to identify relevant entities within input data 301, and selects an appropriate disambiguation for a given term based on the context in which it is found. Spotting/identifying relevant entities creates a layer of meta-data on top of the original source input (e.g., Web page or article), which subsequently allows for disambiguation of the various spotted entities. In addition to this domain-based contextual disambiguation, however, the relevance of the disambiguation is also influenced by an opinion graph, creating a relevant, trending entity dictionary which is ranked according to the activity of the entities within the system 100 and in the Web as a whole. Accordingly, the spotter and disambiguation engine 32B may assist a user 105 in expressing opinions on topics expressed in an article or page (e.g., Web page) that the user 105 is reading, importing statistics about entities and opinions to improve the background relevance statistics for the system 100 (e.g., the relevance of entities and opinion words generally in the world at a given time, rather than specific to a particular user 105 or context of the opinion), and automatically creating collections of entities within entity dictionary database 32C based on spotting entities from the Web (e.g., news streams).
Similar analysis may also be performed on content that is associated with a user 105 (e.g., data spotted using a bookmarklet tool or shared in a Twitter® “tweet”). In addition to text content described above, content 301B may further include data extracted from the group consisting of machine readable tags, metadata, images, external data APIs, and combinations thereof.
In
Once the topics are spotted in process 5000, server 103 may optionally disambiguate topics using disambiguation engine 32B based on the detected domain or category that the input data belongs to. Specifically, to detect the domain or category, the entities from the entire page are ranked in order of relevance for the article, which will be further discussed below. As previously mentioned, disambiguation results are enhanced over time based on continual feedback of relevant topics/domains.
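A simplified sketch of this domain-based disambiguation is shown below. The candidate senses, domain labels, and voting scheme are hypothetical stand-ins for entity dictionary database 32C and the relevance ranking described in the disclosure; the sketch illustrates only the general idea of choosing, for each ambiguous term, the sense consistent with the dominant domain of the page.

```python
from collections import Counter

# Hypothetical candidate senses for ambiguous surface forms; in the disclosed
# system these would come from the entity dictionary / third-party databases.
CANDIDATES = {
    "apple": [("Apple Inc.", "technology"), ("Apple (fruit)", "food")],
    "jobs":  [("Steve Jobs", "technology"), ("Employment", "economics")],
}

def disambiguate(spotted_terms):
    """Vote on the dominant domain of the page, then prefer the candidate
    sense of each spotted term that belongs to that domain."""
    domain_votes = Counter(
        domain for term in spotted_terms
        for _, domain in CANDIDATES.get(term, [])
    )
    page_domain = domain_votes.most_common(1)[0][0] if domain_votes else None
    resolved = {}
    for term in spotted_terms:
        candidates = CANDIDATES.get(term, [])
        in_domain = [c for c in candidates if c[1] == page_domain]
        resolved[term] = (in_domain or candidates or [(term, None)])[0][0]
    return page_domain, resolved

print(disambiguate(["apple", "jobs"]))
# ('technology', {'apple': 'Apple Inc.', 'jobs': 'Steve Jobs'})
```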
After the input data 301 is translated into a unifying, structured model, nodes extracted from this model may be inserted into database 302B within the core service engine 302. As discussed, these opinion structures may correspond to a controlled natural language, creating a framework and a vocabulary for opinion analysis. In one embodiment of opinion analysis, capturing the contextual and semantic data surrounding an opinion enables the server 103 to populate and navigate an opinion graph. An opinion graph is a network of entities connected by subjective statements. This opinion graph may include the mapping to similarly related topics on the Web, thereby overlaying the developing structured Web of entities, such as from Linked Open Data. The Linked Open Data project refers to a set of well-known best practices for publishing and connecting structured data on the Web.
The opinion graph can be advantageously explored from the perspective of any node within it, including: user 105, function, entity, sentiment, context, and intensity. In one embodiment, the opinion graph contains three sub-graphs: (1) a social graph containing relationships between users 105 (e.g., friend-of-a-friend); (2) a function graph containing links between related words; and (3) an entity graph containing semantic relationships between entities and links into the Linked Open Data cloud. Opinion graph 600 provides the additional advantage of directional relationships between users 105 and entities (e.g., an opinion is applied towards an entity). Defining relationships in this way facilitates analysis of the opinions (e.g., clustering similar users and so on). A sample opinion graph 600 in accordance with at least one embodiment of the disclosure is illustrated in
As shown, opinion graph 600 (i.e., for structured opinion “Helen:love:Barack Obama”) contains three sub-graphs including social graph 601, function graph 602, and entity graph 603. Social graph 601 is a social network derived from the asynchronous relationship created when users 105 “follow” or “subscribe to” other users 105 within system 100. When a user 105 joins the system 100, they also have the option to draw/import relationships from various social networking platforms. Examples of known social networking platforms include, but are not limited to, Facebook®, Twitter®, LinkedIn®, and MySpace®.
Function graph 602 is an internal lexicon composed of a rich clustering of words in semantic categories. This is linked to a lexical database (e.g., WordNet), which provides connections between the functions (e.g., “love”) and equivalents in other languages. Functions and their equivalents provide a semantic clustering for enabling aggregation of opinions. Each function is stored in database 302B and marked with a polarity and intensity score as described above (where applicable).
Entity graph 603 diagrams the relationships of the extracted entity to which the opinion applies (e.g., “Obama”). Entities are connected by virtue of the opinions expressed about them. As previously mentioned, entities are uniquely referenced in server 103 and linked to an equivalent entity in a well-known database, accessible over data network 101 (e.g., Freebase). This provides access to rich semantic links between objects in the Linked Open Data graph and may be constantly updated. In addition to structural relationships, entities are categorized such that, for example, the spouse, location of birth, or occupation of a given entity can be shown. Entity graph 603 not only structurally links “Obama” to an opinion reflecting “love,” but also categorizes Obama based on occupation and spouse. These relationships may be exploited in order to fuel a suggestions engine and add to relevance calculations.
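A toy sketch of this three-part opinion graph for the structured opinion “Helen:love:Barack Obama” follows, using the NetworkX library purely as an illustrative in-memory substitute for database 302B; the node labels and edge relation names are assumptions chosen for readability, not the actual schema.

```python
import networkx as nx

G = nx.DiGraph()

# Social graph 601: users and follow relationships.
G.add_edge("user:Helen", "user:Tom", relation="follows")

# Function graph 602: opinion words linked to lexical equivalents.
G.add_edge("function:love", "function:adore", relation="synonym")

# Entity graph 603: entities and their semantic links (e.g. from Freebase).
G.add_edge("entity:Barack Obama", "entity:Michelle Obama", relation="spouse")
G.add_edge("entity:Barack Obama", "entity:Politician", relation="occupation")

# The opinion itself ties the three sub-graphs together with directed edges.
G.add_edge("user:Helen", "function:love", relation="expresses")
G.add_edge("function:love", "entity:Barack Obama", relation="applies_to")

# Example query: every node reachable from the opinion word "love".
print(list(nx.descendants(G, "function:love")))
```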
The entity graph 603 may also reflect trending topics pulled from the Web. An RSS aggregator 32A provides disambiguation engine 32B with topics pulled from the Web (e.g., RSS feeds). The engine 32B statistically ranks entities per domain to provide a base relevance for particular disambiguation of a given entity, thereby allowing isolation of trending groups of entities. Analysis of the data drawn from the RSS aggregator 32A enables users to explore collections of entities that are derived from both queries into the entity graph and the statistical analysis from RSS aggregator 32A. For example, a collection of entities might include “books currently trending in London” or “most popular people in politics.” Ultimately, users 105 may generate collections by framing any query into the opinion graph (e.g., “most hotly debated movies”).
Because input data 301 includes a broad scope of opinions from multiple contexts and networks described above, server 103 is configured to aggregate similar opinions across multiple platforms for an accurate and comprehensive opinion summary. Users 105 can publish opinion structures and associated data out to any network, increasing the scope of system 100 growth. The community of users 105 collected around a similar idea is known as a “cosm,” and includes all the users 105 who have contributed to that opinion. When a user 105 makes an opinion, they enter an implicit group together with other members of that “cosm.” Opinion graph 600 illustrates a “macro-cosm” 604, which is a clustering of all the similar attitudes towards a given entity (e.g., the users 105 that all love Obama), or of all the similar types of objects/entities. Conversely, “micro-cosms” can be shown, which consist of all the particular reasons that have been expressed for a given opinion. Users 105 may also elect to share a particular “cosm” with selected users 105, or with users 105 within another “cosm,” to structurally link unrelated opinions. Over time, “cosm networks” are created that contain users 105 with broadly similar ideas, from which other social communities are formed. Accordingly, server 103 provides the additional advantage of graphically analyzing and navigating large amounts of opinion data from different platforms easily.
For example, any organization, political party, group, or individual can form “cosm networks” to broaden their support base or publicize their campaign to specific targeted interest groups. Other users 105 can cluster around particular ideas and take collective or individual action on the basis of an expressed opinion. Advertisers similarly can create or select specific “cosm networks” based on opinions regarding their own products, services, areas of interests, and so on to communicate directly with an audience group having a specific, similar interest. The audience group can be further filtered according to the geographic location of individual members of the audience group, specific opinions, or demographic information (e.g., age or gender). In this way, an advertiser can choose to show advertisements to, for instance, all members of an audience group who have stated positive opinions on skiing and are based in the UK. In one embodiment, users 105 must choose to take part in a “cosm network.”
As each opinion is aggregated into “cosms,” server 103 further is configured to notify (e.g., via e-mail, mobile, application, and so on) the respective users 105, whose opinions were aggregated, that their opinions have been counted and published. In one embodiment, this notification includes a link to the location of the published aggregate opinion to allow the user to view the relative impact of their submitted opinion. This constant feedback to the user 105, therefore, provides the advantage of attracting new users to a new location (e.g., Web page) for both reinforcing that the opinion is heard and establishing a new, relevant audience.
In order to compute opinion similarity—such as to generate a “cosm”—server 103 may draw on both a linguistic understanding of opinion words and statistical analysis of the usage patterns stored at server 103 (e.g., database 302B or 32C). Words stored at server 103 are mapped to a lexical database (e.g., WordNet) to provide semantic relationships between words. For example,
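The following minimal sketch shows how such a lexical lookup could be performed with the NLTK interface to WordNet; it is illustrative only, and the use of NLTK, the function name, and the example word are assumptions rather than a description of how server 103 actually queries its lexical database.

```python
import nltk
from nltk.corpus import wordnet as wn

nltk.download("wordnet", quiet=True)

def related_opinion_words(word):
    """Collect synonyms and antonyms of an opinion word from WordNet so that
    opinions using semantically equivalent words can be aggregated."""
    synonyms, antonyms = set(), set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            synonyms.add(lemma.name().replace("_", " "))
            for antonym in lemma.antonyms():
                antonyms.add(antonym.name().replace("_", " "))
    return synonyms, antonyms

syns, ants = related_opinion_words("love")
print(sorted(syns)[:5], sorted(ants))
```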
Server 103 can also learn based on user 105 activity. If an unknown word is repeatedly used in reaction to, or conjunction with, another cluster of words, server 103 may infer a strong link between the words, which may be a basis for aggregation. In this way, new words are continually adapted into the server 103 database (e.g., database 302B, 32C), and the internal lexicon may evolve as organically as natural language trends outside of system 100.
In an alternative embodiment, the server 103 can improve the accuracy of the clusters of words and semantic relationships using statistical techniques based on the co-occurrence of words within opinion objects. For example, word A and word B commonly are used together (e.g., by users forming opinions). If word A and word C similarly are used together, server 103 can infer a relationship between words B and C. However, any similar statistical technique may be used for clustering and aggregation; such techniques are well known in the fields of machine learning and data mining. It should similarly be understood by those of ordinary skill that this process can apply both to user-submitted opinion data to server 103 and to opinion data derived from corpora of text and Web pages, for example, representing larger discussions over longer periods of time.
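A compact sketch of this second-order co-occurrence inference is given below; the sample opinions and word sets are fabricated for illustration, and in practice the counts would come from the opinions stored in database 302B.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical opinions, each reduced to the opinion words it contains.
opinions = [
    {"love", "brilliant"},       # word A with word B
    {"love", "adore"},           # word A with word C
    {"hate", "boring"},
]

def infer_related(opinions):
    """Count co-occurrences and infer second-order links: if A co-occurs with
    both B and C, then B and C become candidates for the same cluster."""
    cooc = defaultdict(set)
    for words in opinions:
        for a, b in combinations(sorted(words), 2):
            cooc[a].add(b)
            cooc[b].add(a)
    inferred = set()
    for a, neighbours in cooc.items():
        for b, c in combinations(sorted(neighbours), 2):
            if c not in cooc[b]:
                inferred.add((b, c))
    return inferred

print(infer_related(opinions))  # {('adore', 'brilliant')}
```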
In yet another alternative embodiment, deriving relationships between words and sentiment/polarity scoring may include manually ranking and processing sample sets. A plurality of manual ranking scores is averaged to account for “wisdom of crowds.” To facilitate this process, well-known human intelligence Web services, such as Mechanical Turk from Amazon®, may be used.
Opinion words stored in the database 302B, 32C are also closely tied to suggested actions which arise from particular “cosms.” Users 105 are able to suggest actions which relate to opinions, enabling users 105 to act upon the ideas stimulated by and expressed within the system. In one example, user 105 may be an organization or company, which could “sponsor” an action that would be suggested to particular “cosms.” Server 103 statistically analyzes word usage patterns within and outside the server 103 to indicate potential actions which can be tied to an opinion.
In an alternative embodiment, once the structured opinion is ranked—based on domain recognition (i.e., via disambiguation engine 32B)—and graphed (e.g.,
In one embodiment, relevance is calculated per user 105 on the basis of the activity of their specific network. For example, relevance may reflect a user's 105 ideas based on the creation of “cosm networks” above. Recommendations based on this type of relevance typically are centered on a user's 105 social graph 601. As discussed above, users 105 may also draw/import relationships from various social networking platforms, which ultimately enables users 105 to receive recommendations from multiple social networking platforms in a centralized location.
For every user 105 in system 100, relevance engine 303 isolates the nodes within their opinion graph 600 to calculate individual scores based on an n-dimensional matrix, where each dimension represents a different relevance parameter. These parameters include, but are not limited to, type/domain of the entity, “SocRank” (i.e., weight in the social graph based on opinions made by a user 105's social network), “CosmRank” (i.e., weight in the opinion graph based on opinions that the user has made in the past), “PageRank” (i.e., based on matching the text in an article opined on with descriptions of an entity—derived from manual input or third-party database—to create text-based representations of user opinions), “GeoSpatial Rank” (i.e., based on geographical location where opinions are made), “Trend Rank” (i.e., ranking opinion/entity nodes from followers and influencers higher than other opinions), “Tracking Rank” (i.e., ranking specific users, entities, and categories higher when a user optionally follows/tracks it), ranking related entities and categories, and “opinion activity rank” (i.e., higher ranking reflecting greater activity, such as responses). Users' 105 input may also be used to specify ranking parameters to server 103. In a preferred embodiment, weight is assigned to each of the aforementioned parameters on a numerical scale from 1-10.
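By way of illustration, the weighted combination of these parameters might be sketched as follows; the parameter names, the weights on the 1-10 scale, and the per-node scores below are hypothetical placeholders, not values used by relevance engine 303.

```python
# Hypothetical weights on the 1-10 scale described above.
WEIGHTS = {
    "soc_rank": 8,
    "cosm_rank": 7,
    "page_rank": 5,
    "geo_rank": 3,
    "trend_rank": 6,
    "tracking_rank": 9,
    "activity_rank": 4,
}

def relevance_score(node_scores, weights=WEIGHTS):
    """Combine per-parameter scores for one node into a single weighted score."""
    return sum(weights.get(name, 0) * score for name, score in node_scores.items())

node = {"soc_rank": 0.6, "cosm_rank": 0.9, "trend_rank": 0.2, "activity_rank": 0.4}
print(relevance_score(node))   # weighted sum, approximately 13.9
```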
In one embodiment, relevance engine 303 calculates relevance scores as an offline process rather than at the point of user 105 interaction. Any number of scores can be added for new parameters, such as, for example, data based on new relationships or temporal information.
In an alternative embodiment, relevance engine 303 retrieves relevant nodes from the opinion graph 600 immediately after user 105 submits a new opinion. These nodes are aggregated to be presented as “opinion results” to user 105. The “opinion results” illustrate to the user 105 the many connections and interesting paths to follow in the opinion network as a direct result of the currently submitted opinion. These connections and paths may include, but are not limited to, relevant entities, users, opinions, actions, articles, or combinations thereof.
As discussed, electronic input data 301 includes generic/worldwide topics 801, user submitted information 802, and various opinion streams 803. Through analysis of articles 801A in the news/throughout the web (e.g., via a RSS news feeder), processing 800 spots entities from the text, populates entity dictionary database 32C, and ranks each entity according to the degree to which the entity is trending globally, and per domain (e.g., using spotter and disambiguation engine 32B). Similarly, server 103 parses and disambiguates trending entities 801B of a generic/worldwide type (e.g., trending Twitter® topics) to calculate a ranking score based on global trends.
Relevance calculations also occur for user 105 submitted information 802 including: user submitted URLs 802A (i.e., where a user 105 has directly indicated their interest in a particular site); user-shared URLs 802B (i.e., where a user 105 shares a link with other users 105 of their social network); and the user's 105 activity 802C pulled from their other accounts from the Web (e.g., a played track on Last.fm, a book bought on Amazon.com®, or a movie from Netflix®). Server 103 matches these entities to generate background relevance data.
When a user 105 actively creates an entity 802D within server 103, server 103 is also configured to generate related entities that may be of relevance to the user 105, such as by semantic relationships. Users 105 may also activate a bookmarklet 802E on an article or post for server 103 to record the context (i.e., domain name) and add a ranking accordingly. Articles 801A, user submitted URLs 802A, user-shared URLs 802B, and bookmarklet 802E articles are run through spotter and disambiguation engine 32B (action block 704) to identify the relevant entity and disambiguate based on the context.
Furthermore,
Based on the calculated relevance scores, users 105 may also browse and discover new relevant content, not yet suggested. When users 105 make opinions in the context of an article, for example, server 103 may provide the user 105 other sources (e.g., articles and other contexts) where the opinion has been made for uniquely relevant content suggestions. Conversely, users 105 can similarly browse other opinions that a particular piece of content has prompted.
In one embodiment, once a user 105 views specific information or opinions about an entity, associated and related entities that may also be of interest to the user may be displayed (i.e., based on relevance score). Accordingly, the association of one entity to another may come from multiple sources, such as the text matching described above. However, the association of two or more entities may be compiled from manually curated associations (e.g., a curator or an administrative panel). Some associations of two or more entities are formed based on context of a previously submitted opinion, which formed a bidirectional relationship between two or more entities (e.g., a news article opinion on the topic “football” would form a bidirectional relationship between “football” and the article). Associations between entities may be formed in response to an opinion on a different topic, nonetheless, forming a bidirectional relationship (e.g., an opinion on “cake” receiving a response of an opinion that “donuts” are “better” would create a bidirectional relationship between “donuts” and “cake”). These associations are scored and ranked based on popularity, semantics, and so on. In one embodiment, associations may be reflected in entity graph 603.
Once the input data 301 is translated to a unifying, structured model, graphed, and ranked according to relevance scores, an opinion network is generated such that users 105 can interact with a large volume of opinion data. Users 105 are able to better understand what a community is saying about a specific entity, product, brand, or issue from multiple platforms across the Web. More specifically, users 105 have the option for understanding the opinion/recommendation from like-minded users with similar interests, which may increase the propensity to make purchases and promote consumer transactions. Capturing structured, rich opinion data allows, as another example, companies to discover specific opinions about their products or brands with associated reasons that are mapped and organized at various levels of aggregation. Therefore, both individual opinion-givers and trends can be identified, including key influencers and opinion leaders, while users and companies can engage directly with supporters, customers, and critics.
In one embodiment, this data can be shown to the users 105 on their user devices 105A, 105B, 105C, and 105N. Specifically, users 105 can access Web services 102 from their user devices 105A, 105B, 105C, and 105N. Web services 102 may include various Web sites such as social networking platforms, media pages, blogs, and electronic commerce (“e-commerce”) sites. However, processed opinion data, such as by opinion capturing server 103, enables users 105 to experience Web services 102 as opinion enhanced Web services 104. Users 105 request access to opinion enhanced Web services 104 (e.g., via Web browser) to view opinion graphs 600, browse social networks, receive recommended opinions and products (e.g., targeted advertising), analyze cognitive/linguistic data, and so on.
In addition to browsing a rich opinion network, opinion enhanced Web services 104 provide a discourse model to trace propositions, justifications, responses, resolutions, and actions taken in response to an opinion. As a specific example, opinions can be presented in the form of a debate. A debate is identified when there are at least a predefined (i.e., configurable) threshold number of opinions with respect to a particular entity that use function words from two or more opposing synsets (e.g., synsets with opposing meanings). The different sides of the debate may be named using the most frequently used opinion word from each synset associated with the entity. Users 105 with opinions that contribute to the identified debate may be notified of the debate.
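A minimal sketch of this debate-detection rule is shown below; the synset identifiers, the table of opposing synset pairs, the threshold value, and the sample opinions are hypothetical examples rather than data maintained by server 103.

```python
from collections import Counter

# Hypothetical opinions on one entity, each reduced to (opinion word, synset id).
opinions_on_entity = [
    ("love", "synset:like"), ("adore", "synset:like"), ("enjoy", "synset:like"),
    ("hate", "synset:dislike"), ("loathe", "synset:dislike"),
]
OPPOSING = {("synset:like", "synset:dislike")}   # known antonymous synset pairs
THRESHOLD = 2                                    # configurable minimum opinions per side

def detect_debate(opinions):
    """Flag a debate when two opposing synsets each meet the threshold, and
    name each side after its most frequently used opinion word."""
    by_synset = Counter(syn for _, syn in opinions)
    for a, b in OPPOSING:
        if by_synset[a] >= THRESHOLD and by_synset[b] >= THRESHOLD:
            side_a = Counter(w for w, s in opinions if s == a).most_common(1)[0][0]
            side_b = Counter(w for w, s in opinions if s == b).most_common(1)[0][0]
            return f"{side_a} vs. {side_b}"
    return None

print(detect_debate(opinions_on_entity))   # love vs. hate
```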
Users 105 are encouraged to interact with opinion enhanced Web services 104 (e.g., participating in interactive flows of the opinion network) for promoting growth of system 100. In one embodiment, users 105 can invite friends and other users to join their social network and participate in one or more opinion flows. For example, upon seeing an opinion, a user 105 can elect to respond to the opinion in at least three ways: (1) agree/disagree; (2) ask “why?” and (3) comment. If a user 105 chooses to agree or disagree, an option is also provided to generate a new opinion. The new opinion maintains a link (e.g., agreement/disagreement relationship stored in database 302B and reflected in opinion graph 600, for example) with the original opinion. For the original opinion word, the controlled natural language interface, discussed above, prompts synonyms (i.e., in the case of agreement), antonyms (i.e., in the case of disagreement), or free-form opinion guidance (i.e., in the case of responding with “ask why?”) to assist the user 105 in creating the structured input for their new opinion. The chosen opinion word may be used to clarify the confidence of the semantic relationship (e.g., synonym/antonym) to the original opinion word. The author of the original opinion is then notified that another user 105 has replied to their opinion.
Similarly, specific opinions may be shared among users 105. For example, user 105A elects to share an opinion or ask for an opinion about a particular entity. User 105A chooses to share the opinion with user 105B. Sharing channels include, but are not limited to, social networking platforms, e-mail, and short message service (“SMS”) communication. A notification is sent to user 105B, for example, via e-mail, SMS communication, push notification to user device 105B, or upon user's 105B subsequent request for opinion enhanced Web services 104. User 105B includes both users registered with server 103 and users who have not registered with server 103. User 105B then follows the notification (e.g., via hyperlink) and server 103 maintains history that user 105A successfully prompted user 105B to access opinion enhanced Web services 104. User 105B can similarly respond to user's 105A opinion in the manner described above.
In order to further incentivize users 105 to interact with an opinion network and enter opinions, users 105 may earn rewards for their participation. These rewards include special achievements, impact scores, and gaining status roles. A user 105 receives achievements whenever they hit a particular milestone. Achievements are intended to encourage users to take specific actions. Some examples include: an achievement for being the first user to publish an opinion for a given topic; a “one-sided debate” achievement for a user elaborating on a created opinion without enticing others to participate; a “debate” achievement for users participating in a debate; “opinion count milestones” for various thresholds (e.g., 10, 25, 100, and so on for the number of submitted opinions from a single user); “category milestones” for various opinion thresholds for a specific entity/category; “reason milestones” for generating an opinion that includes responses surpassing various thresholds; a “polarized agreement” achievement when a threshold ratio (e.g., 90%) of the opinions for an entity agree with a user's opinion; a “polarized disagreement” achievement when a threshold ratio (e.g., 10%) of the opinions for an entity agree with the user's opinion; a “thought leader comparison” achievement when a user's opinion disagrees with the opinion of a thought leader, which will be further described below; and a “friend comparison” achievement when a user's opinion disagrees with the opinion of another user within their social graph for a particular entity.
Similarly, impact scores are used to quantify a specific user's influence in the system 100. In one embodiment, points to determine an impact score are accrued as shown in Table 1:
For each action represented in Table 1, the impact score is then the total number of points accrued over a pre-defined time period (e.g., 120 days).
Similar to achievement awards and impact scores, individual users 105 can attain “thought leadership” status when their opinion generates the highest number of agreements for that topic. To become a thought leader, the number of agreements for that topic must exceed a minimum threshold (e.g., 5 users), and the thought leader's total impact score must exceed that of any other user 105 by at least a threshold number of points (e.g., 2 points). In one embodiment, thought leaders are identified—including the thought leader's specific opinion and number of agreements prompted—when any user 105 views the particular entity topic. However, in an alternative embodiment, the top 5 users 105 may appear as thought leaders on a given topic. Identifying a thought leader occurs when there is a threshold number of associated users 105 (e.g., 1 user) that have prompted at least one agreement. Each user 105 is similarly associated with the number of thought leader roles the user holds, the number of agreements the user has prompted, and the number of topics for which they may become thought leaders (e.g., 3 user agreements away).
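For illustration only, the criteria above might be checked as follows; the candidate dictionary, the example threshold values, and the function name are assumptions for the sketch and do not describe the actual computation performed by server 103.

```python
MIN_AGREEMENTS = 5       # example minimum number of agreements for the topic
MIN_IMPACT_MARGIN = 2    # example margin over every other user's impact score

def thought_leader(candidates):
    """candidates: {user_id: {"agreements": int, "impact": int}} for one topic.
    Returns the thought leader's id, or None if no user meets the criteria."""
    # Leader candidate: the user whose opinion prompted the most agreements.
    leader_id = max(candidates, key=lambda u: candidates[u]["agreements"])
    leader = candidates[leader_id]
    others_max_impact = max(
        (c["impact"] for u, c in candidates.items() if u != leader_id), default=0)
    if (leader["agreements"] >= MIN_AGREEMENTS
            and leader["impact"] - others_max_impact >= MIN_IMPACT_MARGIN):
        return leader_id
    return None

print(thought_leader({
    "user_105A": {"agreements": 7, "impact": 42},
    "user_105B": {"agreements": 3, "impact": 12},
}))   # user_105A
```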
Similar to “thought leaders,” in an alternative embodiment, server 103 may assign additional roles to specific users 105, which create a unique experience for that type of user 105. These roles include, but are not limited to:
Advocates: These are individuals that rally support and act as an “advocate” for a particular opinion. An advocate role enables other users 105 effectively to add support, weight, or backing to the advocate user on that particular opinion, thereby allowing the advocate user to speak and emote on another user's 105 behalf. Representatives can emerge within system 100, and the community can form a democratic support system for specific opinions.
Thought Leaders: Particular users 105 can be thought leaders based on their specific influence within system 100. When a user 105 stimulates another user 105 to give an opinion/change their mind, server 103 rewards that user 105 by giving that user greater visibility to other users 105 (e.g., highlighting the user on cosm pages or providing direct rewards, such as badges).
Administrators: Trusted users 105 have the ability to act as administrators to moderate data and behavior in system 100. Administrative duties include moderating disputes and abusive behavior, correcting existing opinions presented about entities or functions, and mapping new words as they emerge (e.g., slang). Administrators may be democratically promoted or rewarded with privileges based on activity in system 100.
Groups: Users 105 may create or join groups gathered around a particular idea, entity, or context. These groups can be led by specific organizations, companies, or individual users. Groups are administered by the community and serve as hubs which stimulate further conversation and action.
Personas: Personas are a type of implicit group formed by virtue of a user's 105 opinions. For example, an opinion profile may demonstrate a user 105 to be a Republican, a movie buff, or a dog-lover. These “personas” may also form the basis for an action or query into the system, such as generating a collection based on the opinions trending amongst a specific political party, or sharing an action or “cosm” directly with all animal-lovers.
At any given time, server 103 also is configured to communicate globally with all users 105 of server 103. This provides the advantage of providing opinions/messages that relate to all users (e.g., global, philanthropic messages such as flu vaccine notifications), thereby promoting particular causes and educating any user 105. Opinions and reactions of users 105 can be posted dynamically.
In an alternative embodiment, a dashboard widget operates at the input 301 level to provide quick access to opinion enhanced Web services 104 from a user device 105A, 105B, 105C, and 105N. A dashboard is a display intended to show interesting/specific aspects of the opinion network to a particular user 105. The dashboard includes at least one “widget,” which is a contained area of Web or application content for providing various summaries of the opinion network. Widgets are typically moveable or resizable to scale according to the size of a user 105 display or for customizable layouts. The dashboard may appear on a Web page (e.g., opinion enhanced Web services 104 and third-party Web pages from partner owners), a mobile application, electronic public displays, and so on. Additionally, a set of dashboard widgets can be used to show interesting information from the opinion network to a user immediately after making an opinion (e.g., opinion results described above). For example, an electronic article or Web page incorporates scripting code (e.g., JavaScript) for integrating a specific widget. The specific widget is uniquely identified and communicates through an API to process various input opinion data 301. The use of dashboards and widgets is well understood and appreciated by those of ordinary skill in the art. A dashboard widget allows users 105 to seamlessly make opinions, such as through opinion enhanced Web services 104, view opinions, explore user profiles, and browse various topics.
For example, dashboard widgets may be used to display a polar topic category. A category is defined as a semantic grouping of entities (e.g., presidents, public speakers, and people). To entice users 105 to make opinions on topics in a given category which have prompted highly positive or negative opinions, a dashboard widget may be used to show the topic in a given category (e.g., “ridiculous politicians”) which has the highest average positive or negative opinion. For a given category, the dashboard widget can show a cluster of all entities in that category with a similar overall sentiment.
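A toy sketch of selecting the most polarized topic in a category follows; the topic names and polarity scores (here assumed to lie on a -1.0 to 1.0 scale) are fabricated for illustration and do not correspond to any dataset of system 100.

```python
from statistics import mean

# Hypothetical polarity scores of opinions per topic within one category.
CATEGORY_OPINIONS = {
    "politician A": [-0.9, -0.95, -0.85],
    "politician B": [0.2, -0.1, 0.3],
    "politician C": [0.9, 0.8],
}

def most_polarized_topic(category_opinions):
    """Return the topic with the highest average absolute polarity, i.e. the
    topic prompting the most strongly positive or negative opinions."""
    return max(category_opinions, key=lambda t: abs(mean(category_opinions[t])))

print(most_polarized_topic(CATEGORY_OPINIONS))   # politician A
```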
In another example, a dashboard widget may be used for a single function, such as for enabling a user to submit an opinion without leaving a Web page. For example, an aggregated opinion relating to an article/Web page/product may be placed as a widget next to the respective entity/topic (e.g., an “Overstated” button next to an article link). Users 105 click on the widget to automatically attach their opinion to the article.
Dashboard widgets are effective not only for users providing opinions but also for publishers and bloggers who wish to aggregate opinions and responses to their published content. For example, a publisher widget works at the article level (i.e., the published content) and creates a layer of metadata on top of the published text. The publisher widget is integrated into the published text (i.e., script within the page source) and includes a unique identification code. For each opinion or comment on the article, the URL of the published text is communicated to server 103 along with the unique identification code of the publisher widget. Once server 103 receives the data, the spotter and disambiguation engine 32B determines relevant topics/entities from the published text article (e.g., using natural language processing and text-mining described above, while ignoring advertisements). Each relevant topic/entity is linked to any relevant topics/entities, such as, from a third-party database (e.g., Freebase), thereby connecting the topic/entity to similar references for additional information. As a community of readers, as well as the author of the article, read the article and form opinions, the publisher widget is also configured to retrieve the entity list from database 302B for creating aggregate views. Accordingly, publisher widgets provide the additional advantage for gaining insight about the context of the article, relative opinions, and the profiles for other readers and authors.
Dashboard widgets also may be used for, but not exclusively:
In the foregoing specification, the disclosure has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the disclosure. For example, the reader is to understand that the specific ordering and combination of process actions described herein is merely illustrative, and the disclosure may be performed using different or additional process actions, or a different combination or ordering of process actions. For example, this disclosure is particularly suited for analyzing opinion data from a Web-based server; however, the disclosure can be used for a variety of opinion mining systems. Additionally and obviously, features may be added or subtracted as desired. Accordingly, the disclosure is not to be restricted except in light of the attached claims and their equivalents.
This application claims priority to: U.S. Provisional Application Ser. No. 61/523,823, filed on Aug. 15, 2011; U.S. Provisional Application Ser. No. 61/625,560, filed on Apr. 17, 2012; and U.S. Provisional Application Ser. No. 61/650,240, filed on May 22, 2012. Priority to these provisional applications is expressly claimed, and the disclosures of respective provisional applications are hereby incorporated by reference in their entireties and for all purposes.
Filing Document | Filing Date | Country | Kind | 371(c) Date
PCT/IB2012/001581 | 8/14/2012 | WO | 00 | 10/8/2014
Number | Date | Country
61523823 | Aug 2011 | US
61625560 | Apr 2012 | US
61650240 | May 2012 | US