The present application claims priority to a Patent Application Serial Number 3599/MUM/2014 filed before the Indian Patent Office on Nov. 11, 2014, which is incorporated herein by reference in its entirety.
The present subject matter described herein, in general, relates to identifying e-mavens for any specific industry or brands.
Several companies are constantly endeavoring to devise innovative ways of advertising their products and services. One of the ways is to use social media in innovative ways to promote such products or services. These products and services are generally more acceptable to people on social media when referred by their connections. The social media users referring brand/industry/product on the social media may be identified as e-mavens, mavens, knowledge spreaders, social influencers, information spreaders, market mavens, market connoisseurs, and market diffusers.
Precisely identifying the e-mavens on the social media is a recent practice used for advertisement. The e-mavens may be identified precisely upon analyzing data of the social media users from the social media. The data may comprise posts uploaded by the social media users, and likes and comments on posts uploaded by friends. The data may be analyzed for learning about the social media users.
Further, identifying the e-mavens based only on an activity of the social media users on the social media may not always be a perfect approach. This approach may not consider an industry relevance of the social media users. The e-mavens thus identified may not always spread a positive word about a brand/industry/product that is not related to the e-mavens.
This summary is provided to introduce aspects related to systems and methods for identifying an industry specific e-maven by analyzing textual data from social media and the aspects are further described below in the detailed description. This summary is not intended to identify essential features of the claimed subject matter nor is it intended for use in determining or limiting the scope of the claimed subject matter.
In one implementation, a method for identifying an industry specific e-maven by analyzing textual data from social media is disclosed. The method may comprise collecting textual data posted by users on social media. The textual data may be collected using an industrial corpus. The industrial corpus may comprise a collection of industry specific keywords. The method may comprise determining characteristics of the users by analyzing the textual data. The textual data may be analyzed using an adaptively self-learning database. The characteristics of the users may comprise socio-networking characteristics, socio-behavioral characteristics, and psychometric characteristics. The method may further comprise calculating maven scores for the users. The maven scores may be calculated based on the scores for each characteristic of the users. The method may further comprise calculating industrial maven scores for the users by using the maven scores and an industry adjustment factor. The method may further comprise identifying at least an industry specific e-maven based on the industrial maven scores.
In one implementation, a system for identifying an industry specific e-maven by analyzing textual data from social media is disclosed. The system may comprise a processor and a memory coupled to the processor for executing programmed instructions stored in the memory. The processor may collect textual data posted by users on social media. The textual data may be collected using an industrial corpus. The industrial corpus may comprise a collection of industry specific keywords. The processor may further determine characteristics of the users by analyzing the textual data. The textual data may be analyzed using an adaptively self-learning database. The characteristics of the users may comprise socio-networking characteristics, socio-behavioral characteristics, and psychometric characteristics. The processor may further calculate maven scores for the users based on the characteristics of the users. The processor may further calculate industrial maven scores for the users by using the maven scores and an industry adjustment factor. The processor may further identify at least an industry specific e-maven based on the industrial maven scores.
In one implementation, a non-transitory computer readable medium embodying a program executable in a computing device for identifying an industry specific e-maven by analyzing textual data from social media is disclosed. The program may comprise a program code for collecting textual data posted by users on social media. The textual data may be collected using an industrial corpus. The industrial corpus may comprise a collection of industry specific keywords. The program may further comprise a program code for determining characteristics of the users by analyzing the textual data. The textual data may be analyzed using an adaptively self-learning database. The characteristics of the users may comprise socio-networking characteristics, socio-behavioral characteristics, and psychometric characteristics. The program may further comprise a program code for calculating maven scores for the users based on the characteristics of the users. The program may further comprise a program code for calculating industrial maven scores for the users by using the maven scores and an industry adjustment factor. The program may further comprise a program code for identifying at least an industry specific e-maven based on the industrial maven scores.
The detailed description is described with reference to the accompanying Figures. In the Figures, the left-most digit(s) of a reference number identifies the Figure in which the reference number first appears. The same numbers are used throughout the drawings to refer to like features and components.
Systems and methods for identifying an industry specific e-maven by analyzing textual data from social media are described in the present subject matter. The system may collect textual data posted by users on social media. It should be appreciated that the terms “social media users” and “users” are used interchangeably in the present invention. The system may collect the textual data using an industrial corpus. The industrial corpus may be programmed by an administrator to store industry specific keywords. The industry specific keywords may be related to a brand of product or service, a category of product or service, and industry jargon. The system may identify the industry specific keywords present in the text posted by the users. Identifying the industry specific keywords may subsequently help the system in identifying an industry specific e-maven.
Further, the system may determine characteristics of the users. The system may analyze the textual data posted by the users for determining the characteristics of the users. The system may analyze the textual data using an adaptively self-learning database. Further, determining the characteristics may comprise determining socio-networking characteristics, socio-behavioral characteristics, and psychometric characteristics of the users.
Post determining the characteristics, the system may calculate maven scores for the users. The system may calculate the maven scores based on the characteristics of the users. The system may further calculate industrial maven scores using the maven scores and an industry adjustment factor. Finally, the system may identify at least an industry specific e-maven based on the industrial maven scores. In an example, the users having the highest industrial maven scores may be identified by the system as the industry specific e-mavens.
While aspects of described system and method for identifying an industry specific e-maven by analyzing textual data from social media may be implemented in any number of different computing systems, environments, and/or configurations, the embodiments are described in the context of the following exemplary system.
Referring now to
In one embodiment, as illustrated using
The I/O interfaces 204 may include a variety of software and hardware interfaces, for example, a web interface, a graphical user interface, and the like. The I/O interfaces 204 may allow the system 102 to interact with a user directly. Further, the I/O interfaces 204 may enable the system 102 to communicate with other computing devices, such as web servers and external data servers (not shown). The I/O interfaces 204 can facilitate multiple communications within a wide variety of networks and protocol types, including wired networks, for example, LAN, cable, etc., and wireless networks, such as WLAN, cellular, or satellite.
The memory 206 may include any computer-readable medium known in the art including, for example, volatile memory, such as static random access memory (SRAM) and dynamic random access memory (DRAM), and/or non-volatile memory, such as read only memory (ROM), erasable programmable ROM, flash memories, hard disks, optical disks, and magnetic tapes.
In one implementation, the system 102 may collect textual data posted by users on social media. The textual data posted by the users, on the social media, are well known in the art as posts. The social media that may be used for collecting the textual data are FACEBOOK®, LINKEDIN®, TWITTER®, GOOGLE+®, and FLICKR® social networking online forums and other blogs and microblogs. The system 102 may collect the textual data using an industrial corpus. In a case, the industrial corpus may be a programmable database. The industrial corpus may be programmed by an administrator to store a set of keywords. The keywords may be industry specific and may be related to a brand of product or service, a category of product or service, and industry jargons. The industry specific keywords may be related to geography of interest for an industry. In an example, a cell phone manufacturer may be interested in identifying the e-mavens. The industrial corpus may then be programmed to store the keywords related to the cell phone industry. For an example, the keywords related to the cell phone industry may comprise cell phone, smart phone, mobile, and phablet. Further, the keywords related to different brands of cell phones and some models of the different brands may also be stored in the industrial corpus. Thus, the system 102 may scan for the industry specific keywords, using the industrial corpus, in the posts. Scanning for the industry specific keywords may help in identifying the posts relevant to a specific industry.
In one embodiment, the system 102 may scan for the industry specific keywords on a post of the user. The post of the user may be present on the FACEBOOK® social networking online forum. In an example, the post of the user may be “my smart phone now helps me to connect with my friends and family.” The system 102 may identify the keyword “smart phone” because smart phone represents a product and the keyword may already be stored on the industrial corpus.
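The keyword scan described above can be illustrated with a minimal Python sketch. The corpus contents, the function name `find_industry_keywords`, and the whole-word matching rule are assumptions made for illustration; the description does not specify how the system 102 performs the match.

```python
import re

# Hypothetical industrial corpus for the cell phone example; a real corpus
# would be programmed by an administrator.
INDUSTRIAL_CORPUS = {"cell phone", "smart phone", "mobile", "phablet"}


def find_industry_keywords(post, corpus):
    """Return the corpus keywords that occur in the post as whole words or phrases."""
    text = post.lower()
    found = set()
    for keyword in corpus:
        # Word-boundary anchors keep "mobile" from matching inside "automobile".
        pattern = r"\b" + re.escape(keyword.lower()) + r"\b"
        if re.search(pattern, text):
            found.add(keyword)
    return found


post = "my smart phone now helps me to connect with my friends and family."
matches = find_industry_keywords(post, INDUSTRIAL_CORPUS)
print(sorted(matches))  # ['smart phone']
```

A post that yields a non-empty match set would be treated as relevant to the industry, and its author would become a candidate for further analysis.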
Upon identifying the post of the user including the industry specific keywords, the system 102 may extract a user identification code (user ID). The system 102 may extract the user ID from information about the user. Post extracting the user ID, the system 102 may acquire entire textual data associated with the user ID of the user. The entire textual data may correspond to other posts of the user on same social website or on other social media websites used by the user. Further, the system 102 may collect a screen name of the user and social demographics of the users.
Post collecting the entire textual data, the system 102 may refine the entire textual data by removing stop words, tokenizing the textual data, and correcting spelling of the textual data. Subsequently, the system 102 may perform language style analysis and language content analysis of the entire textual data. The system 102 may perform the language style analysis for identifying different language fields. The different language fields may comprise count statistics, pronouns, emotions, and grammar. The different language fields may be identified based on a set of parameters. In one embodiment, the set of parameters identified in the count statistics of the entire textual data are as follows.
In one embodiment, pronouns associated with the entire textual data may be as follows.
In one embodiment, the set of parameters determined during grammar analysis of the entire textual data are as follows.
In one embodiment, types of emotions associated with the entire textual data may be identified as follows.
Post performing the language style analysis of the entire textual data, the system 102 may perform language content analysis of the entire textual data. Language content analysis of the entire textual data may be performed using the adaptively self-learning database 208. While performing the language content analysis using the adaptively self-learning database 208, the system 102 may determine characteristics of the users. The characteristics may comprise three components, namely socio-networking characteristics, socio-behavioral characteristics, and psychometric characteristics of the users.
The system 102 may further determine the socio-behavioral characteristics of the user by using a socio-behavior lexical repository 210. The socio-behavior lexical repository 210 may comprise a socio-behavior seed word dictionary. The socio-behavior seed word dictionary may comprise a prestored set of seed words. Each set of the seed words may correspond to a socio-behavior characteristic. The system 102 may match keywords present in the entire textual data related to the users with seed words of the socio-behavior seed word dictionary. A section of the socio-behavior seed word dictionary comprising set of seed words, a socio-behavior characteristic corresponding to each set of the seed words, and a description of the socio-behavior characteristic is illustrated below.
In an embodiment, the variables determined from the socio-behavioral characteristics of the users may be as follows:
The system 102 may further determine the psychometric characteristics of the user by using a psychometric lexical repository 212. The psychometric lexical repository 212 may comprise a psychometric seed word dictionary. The psychometric seed word dictionary may comprise a prestored set of seed words. Each set of the seed words may correspond to a psychometric characteristic. The system 102 may match keywords present in the entire textual data related to the users with seed words of the psychometric seed word dictionary. Further, a section of the psychometric seed word dictionary comprising set of seed words, a psychometric characteristic corresponding to each set of the seed words, and a description regarding the psychometric characteristic is illustrated below.
In an embodiment, variables apart from the language fields derived during language style analysis may be derived. Cumulatively, the variables derived from the psychometric characteristics may be as mentioned below:
Further, the system 102 may also be connected to standard lexical databases available online over the World Wide Web. In one embodiment, the system 102 may be connected to a standard lexical database like WORDNET™. The standard lexical database may help the system 102 in identifying words similar to the seed words, i.e., synonyms of the seed words. In this manner, the standard lexical databases may augment word coverage of the socio-behavior seed word dictionary and the psychometric seed word dictionary and thus improve the adaptively self-learning database 208. Thus, the adaptively self-learning database 208 learns and adapts in the above described manner.
While determining the socio-behavioral characteristics and psychometric characteristics of the users, the adaptively self-learning database 208 of the system 102 may come across a new topic or a new word. The new topic may not be already present in the adaptively self-learning database 208. During such a situation, the system 102 may communicate with a lexical database which may be available online over the World Wide Web. In one embodiment, the system 102 may communicate with the lexical database DBPEDIA™. The system 102 may communicate with DBPEDIA™ using WIKIPEDIA™ Application Programming Interfaces (APIs). APIs refer to a set of program instructions for performing a dedicated task. Using DBPEDIA™, the system 102 may classify the new word into one of predefined categories. The predefined categories used for classifying the new word may be related to characteristics of a name, place or a thing. A section illustrating the predefined categories for the text classification is as shown below.
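The seed word matching and the synonym-based augmentation described above can be sketched as follows. This is a minimal illustration only: the characteristic names, the seed words, and the in-memory `SYNONYMS` map (standing in for a lookup against a lexical database) are all hypothetical, and the actual dictionaries of the repositories 210 and 212 are not reproduced here.

```python
from collections import Counter

# Hypothetical seed-word dictionary: characteristic -> prestored seed words.
SEED_WORDS = {
    "helpfulness": {"help", "advise", "recommend"},
    "novelty_seeking": {"new", "launch", "latest"},
}

# Hypothetical synonym map standing in for a lexical database lookup.
SYNONYMS = {
    "recommend": {"suggest", "endorse"},
    "new": {"fresh"},
}


def augment(seed_words, synonyms):
    """Return a copy of the dictionary with synonyms added to each seed set."""
    augmented = {}
    for trait, words in seed_words.items():
        expanded = set(words)
        for word in words:
            expanded |= synonyms.get(word, set())
        augmented[trait] = expanded
    return augmented


def count_trait_matches(tokens, dictionary):
    """Count, per characteristic, how many tokens match its seed words."""
    counts = Counter()
    for token in tokens:
        for trait, words in dictionary.items():
            if token in words:
                counts[trait] += 1
    return dict(counts)


tokens = "i suggest the latest phablet it is a fresh design".split()
dictionary = augment(SEED_WORDS, SYNONYMS)
print(count_trait_matches(tokens, dictionary))
```

Without the augmentation step, the tokens "suggest" and "fresh" would go unmatched, which is the coverage gap the synonym lookup is meant to close.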
Post determining the socio-behavioral characteristics and psychometric characteristics of the users, the system 102 may determine the socio-networking characteristics of the users. The system 102 may determine the socio-networking characteristics of the users from social network activity of the users. The parameters used for defining the social network activity/socio-networking characteristics of the users are as follows:
Based on the language content analysis of the entire textual data, the system 102 may derive a structured set of textual data. The structured set of textual data may comprise a plurality of variables derived from the entire textual data present in an unstructured form. In one embodiment, the plurality of variables derived from the unstructured form of textual data are as follows:
Post deriving the plurality of variables, the system 102 may store the plurality of variables against a corresponding user ID. This leads to the derivation of a structured form of textual data from an unstructured form of textual data. Further,
After deriving the structured form of textual data, the system 102 may compute a maven score for the users. In one embodiment, the maven score may depend on number of connections of the users. The number of connections of the users may help in determining a reach of the users into the social media. Types of the connections that the users may have are primary connections, secondary connections, and tertiary connections. The primary connections may refer to friends present in a friend list of the users. The secondary connections may refer to friends of friends of the users. The tertiary connections may refer to persons viewing and/or commenting on the posts of the users and the persons may not be connected to the users. In one case, the primary connections of the user may only be considered for computing the interim maven score for the users. An increase in a number of the connections of the users may correspond to an increase in the interim maven score. Thus, an equation 1, as mentioned below, may be derived using the present relation.
$$\text{Interim maven score} \propto \text{Number of connections } (N) \quad \text{(Equation 1)}$$
Further, a reach of the posts of the users may depend on a number of views of the posts of the users. The number of views of a post of a user may be determined based on a number of comments on the post, a number of likes for the post, and a number of retweets of the post. The number of views of the posts may help in determining a quality of the information provided by the users. Thus, equations 2, 3, 4, and 5 as mentioned below, may be used for calculating the interim maven score.
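$$\text{Interim maven score} \propto (\text{Number of product mentions})^{(\text{Average number of comments} + \text{average number of likes} + \text{average number of retweets} + \text{posts velocity})} \quad \text{(Equation 2)}$$
$$\text{Interim maven score} \propto (\text{Number of brand mentions})^{(\text{Average number of comments} + \text{average number of likes} + \text{average number of retweets} + \text{posts velocity})} \quad \text{(Equation 3)}$$
$$\text{Interim maven score} \propto (\text{Number of new product mentions})^{(\text{Average number of comments} + \text{average number of likes} + \text{average number of retweets} + \text{posts velocity})} \quad \text{(Equation 4)}$$
$$\text{Interim maven score} \propto (\text{Number of deals/offers})^{(\text{Average number of comments} + \text{average number of likes} + \text{average number of retweets} + \text{posts velocity})} \quad \text{(Equation 5)}$$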
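As an illustration of Equations 2 to 5, the following Python sketch evaluates the right-hand side of each proportionality for one user. Several points are assumptions rather than part of the description: the proportionality constant `k` is taken as 1, the same engagement averages are applied to all four mention types, and the names `Engagement` and `interim_scores` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    """Engagement statistics of a user's posts (field names are illustrative)."""
    avg_comments: float
    avg_likes: float
    avg_retweets: float
    posts_velocity: float

    def exponent(self) -> float:
        # The sum that appears as the exponent in Equations 2 to 5.
        return (self.avg_comments + self.avg_likes
                + self.avg_retweets + self.posts_velocity)


def interim_scores(mention_counts, engagement, k=1.0):
    """Evaluate k * count ** exponent for each mention type."""
    exponent = engagement.exponent()
    return {kind: k * count ** exponent for kind, count in mention_counts.items()}


counts = {"product": 3, "brand": 2, "new_product": 1, "deal": 0}
scores = interim_scores(counts, Engagement(0.5, 1.0, 0.25, 0.25))
print(scores)  # {'product': 9.0, 'brand': 4.0, 'new_product': 1.0, 'deal': 0.0}
```

Because the engagement sum acts as an exponent, the value grows very quickly with realistic like and comment counts, so an implementation would probably need to scale or normalize the inputs; the description does not state how.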
Further, the system 102 may calculate a high media consumption score of the users by acquiring information from media sources. The media sources may comprise books, newspapers, Television (TV) soaps, news, and movies. The high media consumption score may be calculated based on the following factors:
Thus equation 6, as mentioned below, may be derived for calculating the high media consumption score.
$$\text{High media consumption score} = \text{Percentage of words associated with the social media} \quad \text{(Equation 6)}$$
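One possible reading of Equation 6 is sketched below: the score is taken as the percentage of a user's tokens that appear in a media-related word list. The word list `MEDIA_WORDS` and the function name are hypothetical, and treating the score as a simple token percentage is an assumption, since the factors behind the score are not enumerated here.

```python
# Hypothetical list of media-related words; the actual factors are not
# specified in the description.
MEDIA_WORDS = {"book", "newspaper", "tv", "news", "movie"}


def media_consumption_score(tokens, media_words):
    """Return the percentage of tokens that appear in the media word list."""
    if not tokens:
        return 0.0  # Avoid division by zero for users with no text.
    hits = sum(1 for token in tokens if token.lower() in media_words)
    return 100.0 * hits / len(tokens)


tokens = "watched a movie then read the newspaper today".split()
print(media_consumption_score(tokens, MEDIA_WORDS))  # 25.0
```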
Equation 7, as mentioned below, may be used in an embodiment for calculating a maven score.
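$$\text{Maven Score} = \text{Psychometric characteristic} + \text{High media consumption} + \text{Number of connections} \times \{(\text{Number of brands})^{a} \times (\text{Number of Products})^{b} \times (\text{Number of New Products})^{c} \times (\text{Number of Deals})^{d}\} \quad \text{(Equation 7)}$$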
The variables used in the equation 7 have their significances as described henceforth. Here,
The system 102 may also determine an area of interest factor. The area of interest factor indicates the social media users' interest in a specific industry. The area of interest factor may be used as an industry adjustment factor in calculating industry specific maven scores. The area of interest factor of the social media users may be based on the following features:
Thus equation 8, as mentioned below, may be used for determining an area of interest factor of the users.
In one embodiment, the system 102 may use the area of interest factor of the users, as an industry adjustment factor. The industry adjustment factor may then be used for calculating an industrial maven score of the users. The industrial maven score may be used for identifying the industry specific e-mavens. Equation 9, as mentioned below, may be used for calculating the industrial maven score.
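$$\text{Industrial Maven Score} = \text{Psychometric characteristic} + \text{High Media Consumption} + \text{Industry Adjustment Factor} \times \text{Number of Connections} \times \{(\text{Number of Brands})^{a} \times (\text{Number of Products})^{b} \times (\text{Number of New Products})^{c} \times (\text{Number of Deals})^{d}\} \quad \text{(Equation 9)}$$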
The variables used in the equation 9 have their significances as described henceforth. Here,
Hence, the industrial maven score, so derived, may be used for ranking the users and thus identifying the e-mavens. The e-mavens identified in this manner may be privileged by the brands, products, or services. The e-mavens may make an effort to publicize the brands, products, or services on the social media in the manner explained hereinabove.
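The scoring and ranking described above can be sketched in Python as follows. The sketch reads Equations 7 and 9 literally, with ordinary operator precedence, and treats the exponents a, b, c, and d as caller-supplied parameters because their definitions are not reproduced here. The class `UserFeatures`, its field names, and the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class UserFeatures:
    """Per-user inputs to Equations 7 and 9 (field names are illustrative)."""
    user_id: str
    psychometric: float
    media_consumption: float
    connections: int
    brands: int
    products: int
    new_products: int
    deals: int
    industry_adjustment: float


def mention_product(u, a, b, c, d):
    # The braced product of powered mention counts in Equations 7 and 9.
    return (u.brands ** a) * (u.products ** b) * (u.new_products ** c) * (u.deals ** d)


def maven_score(u, a, b, c, d):
    # Equation 7, read literally.
    return (u.psychometric + u.media_consumption
            + u.connections * mention_product(u, a, b, c, d))


def industrial_maven_score(u, a, b, c, d):
    # Equation 9: the industry adjustment factor scales the connections term.
    return (u.psychometric + u.media_consumption
            + u.industry_adjustment * u.connections * mention_product(u, a, b, c, d))


def rank_users(users, a, b, c, d):
    """Sort users by industrial maven score, highest first."""
    return sorted(users, key=lambda u: industrial_maven_score(u, a, b, c, d),
                  reverse=True)


users = [
    UserFeatures("u1", 2.0, 1.0, 10, 2, 3, 1, 1, 0.5),
    UserFeatures("u2", 1.0, 1.0, 5, 2, 2, 1, 1, 2.0),
]
exponents = (1.0, 1.0, 1.0, 1.0)  # Assumed values for a, b, c, d.
for u in rank_users(users, *exponents):
    print(u.user_id, maven_score(u, *exponents), industrial_maven_score(u, *exponents))
```

With these sample values, user u1 has the higher maven score (63.0 against 22.0) but user u2 ranks first on the industrial maven score (42.0 against 33.0), which shows how the industry adjustment factor can reorder users relative to activity alone.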
Referring now to
The order in which the method 400 is described is not intended to be construed as a limitation, and any number of the described method blocks can be combined in any order to implement the method 400 or alternate methods. Additionally, individual blocks may be deleted from the method 400 without departing from the spirit and scope of the subject matter described herein. Furthermore, the method can be implemented in any suitable hardware, software, firmware, or combination thereof. However, for ease of explanation, in the embodiments described below, the method 400 may be considered to be implemented in the above described system 102.
At block 402, textual data posted by users on a social media may be collected. User identification (user ID) may be extracted using the textual data. Further textual data related to the user on other social media may be collected and stored against the user ID. This textual data may further be analyzed.
At block 404, characteristics of the users may be determined by analyzing the textual data. The characteristics may be determined by performing language style and language content analysis of the textual data related to the users. The characteristics may comprise socio-networking characteristics, socio-behavioral characteristics, and psychometric characteristics of the user. The socio-behavioral characteristics may be determined using the socio-behavioral lexical repository 210. Further, the psychometric characteristics may be determined using the psychometric lexical repository 212. In one implementation, the characteristics may be determined by the processor 202.
At block 406, maven scores may be calculated based on the characteristics. The maven scores may be calculated using a set of equations (Equations 1-7). In one implementation, the maven scores may be calculated by the processor 202.
At block 408, industrial maven scores may be calculated using the maven scores and an industry adjustment factor. In one implementation, the industrial maven scores may be calculated by the processor 202, using equations 8 and 9.
At block 410, e-mavens may be identified using the industrial maven scores. The users having a higher value of the industrial maven score may be identified as the industrial e-mavens. In one implementation, the e-mavens may be identified by the processor 202.
Although implementations for methods and systems for identifying an industry specific e-maven by analyzing textual data from social media have been described in language specific to structural features and/or methods, it is to be understood that the appended claims are not necessarily limited to the specific features or methods described. Rather, the specific features and methods are disclosed as examples of implementations for identifying industry specific e-mavens by analyzing textual data from social media.
Number | Date | Country | Kind |
---|---|---|---|
3599/MUM/2014 | Nov 2014 | IN | national |