Social media may represent a source of user-generated data, which may be harnessed to extract subtle insights that have the potential to bring significant market differentiation. Social media data may be used for various purposes such as user feedback driven product feature design, psychological intervention for mental disorders, etc.
Features of the present disclosure are illustrated by way of example and not limited in the following figure(s), in which like numerals indicate like elements, in which:
For simplicity and illustrative purposes, the present disclosure is described by referring mainly to examples. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present disclosure. It will be readily apparent, however, that the present disclosure may be practiced without limitation to these specific details. In other instances, some methods and structures have not been described in detail so as not to unnecessarily obscure the present disclosure.
Throughout the present disclosure, the terms “a” and “an” are intended to denote at least one of a particular element. As used herein, the term “includes” means includes but is not limited to, and the term “including” means including but not limited to. The term “based on” means based at least in part on.
Data-driven social media analytics application synthesis apparatuses, methods for data-driven social media analytics application synthesis, and non-transitory computer readable media having stored thereon machine readable instructions to provide data-driven social media analytics application synthesis are disclosed herein. The apparatuses, methods, and non-transitory computer readable media disclosed herein provide for the design and generation of social media analytics applications. In this regard, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for synthesizing of a design template for new social media analytics applications by identifying a correct set of features, ensemble of machine learning methods, performance metrics, and a machine learning architecture pipeline.
With respect to the apparatuses, methods, and non-transitory computer readable media disclosed herein, social media may represent a major source of user-generated data, which may be harnessed to extract subtle insights that have the potential to bring significant market differentiation. Social media data may be used for various purposes such as user feedback driven product feature design, psychological intervention for mental disorders, etc.
In this regard, it is technically challenging to design and develop machine learning (ML) based data-driven social media analytics (SMA) applications. For example, design and development of machine learning based data-driven social media analytics applications may require significant experience in multiple domains.
The apparatuses, methods, and non-transitory computer readable media disclosed herein may address at least the aforementioned technical challenges by implementing a feature identification process or selection of machine learning method ensembles. In this regard, automated support in the design and development phases may accelerate the overall new application development process, and reduce the delay from ideation to production through reuse of machine learning designs across semantically correlated applications.
According to examples disclosed herein, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide technical benefits such as generation of a new application that is robust and less prone to failure. For example, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for implementation of a feature identification process or selection of machine learning method ensembles for development of a new application. In this regard, the apparatuses, methods, and non-transitory computer readable media disclosed herein provide for synthesizing of a design template for new social media analytics applications by identifying a correct set of features, ensemble of machine learning methods, performance metrics, and a machine learning architecture pipeline.
For the apparatuses, methods, and non-transitory computer readable media disclosed herein, the elements of the apparatuses, methods, and non-transitory computer readable media disclosed herein may be any combination of hardware and programming to implement the functionalities of the respective elements. In some examples described herein, the combinations of hardware and programming may be implemented in a number of different ways. For example, the programming for the elements may be processor executable instructions stored on a non-transitory machine-readable storage medium and the hardware for the elements may include a processing resource to execute those instructions. In these examples, a computing device implementing such elements may include the machine-readable storage medium storing the instructions and the processing resource to execute the instructions, or the machine-readable storage medium may be separately stored and accessible by the computing device and the processing resource. In some examples, some elements may be implemented in circuitry.
Referring to
According to examples disclosed herein, each social media analytics application of the plurality of social media analytics applications 104 may include a problem description, a social media platform, a feature set, an ensemble of machine learning methods, a performance metric, and an application identification.
A corpus builder and term normalizer 108 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
An actor, action, and object extractor 112 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
An application and embedding space mapper 116 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A semantic cohesion network generator 120 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A semantic cohesion estimator 124 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A semantically cohesive group identifier 128 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A new application synthesizer 132 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
An affinity group identifier 136 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A design template generator 140 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A feature set synthesizer 144 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
An association matrix generator 146 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A machine learning method ensemble synthesizer 150 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A performance metric identifier 152 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
A machine learning pipeline identifier 154 that is executed by at least one hardware processor (e.g., the hardware processor 602 of
According to examples disclosed herein, the new application synthesizer 132 may synthesize the new social media analytics application 134 based on the identified semantically cohesive groups and on the feature set, the ensemble of machine learning methods, the performance metrics, and the machine learning pipeline architecture determined for the new social media analytics application 134.
Operation of the apparatus 100 is described in further detail with reference to
With respect to characterization of social media analytics applications, characterization may include components such as problem specification, social media platform, feature set, ensemble of machine learning methods, performance parameters, and machine learning architectural pipeline.
Problem specification may include, for example, a text based description specifying key objectives of the social media analytics application and problem. For example, a problem specification may indicate: SMA1: “Identify level of drug addiction of a user from his/her twitter feeds”. According to another example, a problem specification may indicate: SMA2: “Find how popular a product is using user feedback data”.
With respect to the social media platform, an identifier may indicate the social media platform where users post their feeds, for example, Twitter™, Reddit™, Facebook™, Amazon Product Feedback™, etc.
A feature set may include a set of features together with their performance association scores (PAS). Examples of feature sets may include lexicon features, syntactic features, dependency features, statistical features, image features, audio features (e.g., MFCC), time series, etc.
The ensemble of machine learning methods may include the machine learning methods used for solving the underlying learning challenges for building a social media analytics application. Examples may include linear classifiers and non-linear classifiers.
With respect to performance parameters, a set of one or more parameters may be used for measuring performance of the ML methods underlying the social media analytics application. Examples of the performance parameters may include accuracy, precision, recall, F1 score, etc.
With respect to the machine learning architectural pipeline, a sequence of steps to be executed on an AutoML platform may utilize a feature set as a specification to extract feature values from data, train machine learning methods, and estimate performance parameters.
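As an illustration of the characterization components listed above, the following Python sketch models one characterized application as a record. The class name `SMAApplication`, its field names, and the sample PAS values are hypothetical and chosen only for illustration; they are not part of the disclosure.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SMAApplication:
    """Hypothetical record holding the characterization of one SMA application."""

    app_id: str                       # application identification
    problem_spec: str                 # text describing the key objectives
    platform: str                     # social media platform, e.g. "Twitter"
    feature_set: dict[str, float]     # feature name -> performance association score (PAS)
    ml_ensemble: list[str]            # ML methods used together as an ensemble
    performance_metrics: list[str]    # one or more metrics, e.g. ["precision", "recall"]
    pipeline: list[str] = field(default_factory=list)  # ordered pipeline steps

    def is_singleton_ensemble(self) -> bool:
        # A single ML method is treated as a singleton ensemble.
        return len(self.ml_ensemble) == 1


# Illustrative values only (not taken from the disclosure).
sma1 = SMAApplication(
    app_id="SMA1",
    problem_spec="Identify level of drug addiction of a user from his/her twitter feeds",
    platform="Twitter",
    feature_set={"lexicon": 0.8, "syntactic": 0.5},
    ml_ensemble=["linear classifier"],
    performance_metrics=["precision", "recall", "F1"],
    pipeline=["extract feature values", "train ML methods", "estimate performance parameters"],
)
```

Keeping the feature set as a mapping from feature to PAS mirrors the description of a feature set as features "together with" their scores, which the later synthesis steps read back.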
With respect to social media analytics applications, examples may include applications for product planning based upon customer feedback analysis, stock prediction using Twitter sentiment analysis, mentor identification through Reddit™ analysis, social media analytics for crisis management, social media analytics for smart health, social media analytics for tourism planning and investment, social media analytics for ad selection, social media analytics for product marketing, social media analytics for mental health, etc.
Feature types may include text features, contextual features, and non-text features. Text features may be extracted from the text of a user feed in social media, and may include lexical, syntactic, semantic, and statistical features. Contextual features may be extracted from the meta information associated with the user feed, such as location, language, date-time, gender, and topic of the user feed. Non-text features may be extracted from the non-text content of the user feed, such as image features, audio/voice features, video features, hyperlinks, and crosslinks.
Referring to
Referring to
With respect to measurement of semantic cohesion among social media analytics applications, at block 204 of
With respect to extracting actors, actions, and objects, at block 206 of
With respect to mapping terms to embedding space, at block 208 of
$$e(w) \leftarrow \mathrm{bm25}(w) \cdot e(w)$$
For each multi-word term $z = w_1 \ldots w_n$, the application and embedding space mapper 116 may generate a term embedding by summing the embeddings of the constituent words as follows:
$$e(z) = \sum_{i=1}^{n} e(w_i)$$
With respect to mapping of social media analytics applications to embedding space, the application and embedding space mapper 116 may generate application embeddings. For example, for each social media analytics application smaapp in the application database smaDB, the application and embedding space mapper 116 may populate a list of entity terms in its description as en[smaapp], a list of action terms as act[smaapp], and remaining words as r[smaapp]. The application and embedding space mapper 116 may map smaapp onto the embedding space as follows:
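The word weighting and term summation described above can be sketched in Python as follows, assuming embeddings are plain lists of floats and BM25 scores are precomputed for every word. The functions `weight_word_embeddings`, `term_embedding`, `mean_vector`, and `application_embedding` are hypothetical names. Because the exact formula used to map smaapp onto the embedding space is not reproduced here, `application_embedding` assumes, purely for illustration, a concatenation of the mean entity, mean action, and mean remaining-term embeddings.

```python
def weight_word_embeddings(emb, bm25):
    """Apply e(w) <- bm25(w) * e(w) to every word in the vocabulary."""
    return {w: [bm25[w] * x for x in vec] for w, vec in emb.items()}


def term_embedding(term, emb):
    """Embed a (possibly multi-word) term z = w1 ... wn as the sum of its word embeddings."""
    words = term.split()
    total = [0.0] * len(emb[words[0]])
    for w in words:
        total = [a + b for a, b in zip(total, emb[w])]
    return total


def mean_vector(vectors, dim):
    """Component-wise mean; a zero vector when the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def application_embedding(entities, actions, rest, emb, dim):
    """HYPOTHETICAL mapping: concatenate mean entity, action, and remaining-term embeddings."""
    parts = []
    for terms in (entities, actions, rest):
        parts.extend(mean_vector([term_embedding(t, emb) for t in terms], dim))
    return parts


# Toy two-dimensional vectors and BM25 scores, for illustration only.
emb = {"drug": [1.0, 0.0], "addiction": [0.0, 1.0], "identify": [1.0, 1.0]}
bm25 = {"drug": 2.0, "addiction": 3.0, "identify": 0.5}
weighted = weight_word_embeddings(emb, bm25)
app_vec = application_embedding(["drug addiction"], ["identify"], [], weighted, dim=2)
```

Here the multi-word entity "drug addiction" becomes [2.0, 3.0] after weighting and summation, and the application vector is that value followed by the action vector [0.5, 0.5] and a zero vector for the empty remaining-word list.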
With respect to generation of a semantic cohesion network, at block 210 of
For all $e = (v_1, v_2) \in E_{smaDB}$:
$$\mathrm{bonding}(v_1, v_2) = \mathrm{semCh}(smaapp_{v_1}, smaapp_{v_2})$$
With respect to determination of semantic cohesion among applications, at block 212 of
e(smaapp
and
e(smaapp
The semantic cohesion estimator 124 may determine semantic cohesion between smaapp
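A minimal sketch of the semantic cohesion network, assuming that semCh is the cosine similarity between application embeddings and that an edge is created only when that similarity reaches a threshold τ. Both the cosine measure and τ are assumptions, since the corresponding definitions are not reproduced in this text; the names `sem_ch` and `cohesion_network` are hypothetical.

```python
import math
from itertools import combinations


def sem_ch(u, v):
    """ASSUMED semantic cohesion: cosine similarity of two application embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cohesion_network(app_embeddings, tau):
    """Return {(v1, v2): bonding} with bonding(v1, v2) = semCh(smaapp_v1, smaapp_v2).

    ASSUMPTION: an edge is kept only when semCh reaches the hypothetical threshold tau.
    """
    edges = {}
    for v1, v2 in combinations(sorted(app_embeddings), 2):
        score = sem_ch(app_embeddings[v1], app_embeddings[v2])
        if score >= tau:
            edges[(v1, v2)] = score
    return edges


# Toy application embeddings, for illustration only.
apps = {"A": [1.0, 0.0], "B": [1.0, 1.0], "C": [0.0, 1.0]}
network = cohesion_network(apps, tau=0.5)
```

With these toy vectors, A and C are orthogonal (cohesion 0.0), so the network keeps only the A-B and B-C edges, each with bonding of about 0.707.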
With respect to identification of semantically cohesive application groups, at block 214 of
$$\Phi' = \Phi \setminus \{V \subseteq V_{smaDB} \mid \exists\, v, v' \in V \text{ with } v \neq v' \text{ and } (v, v') \notin E_{smaDB}\}$$
Further, the semantically cohesive group identifier 128 may remove from Φ′ all those subsets of nodes which are included in any other subset in Φ′ as follows:
$$\Phi'' = \Phi' \setminus \{V \in \Phi' \mid \exists\, V' \in \Phi' \text{ s.t. } V \subset V'\}$$
In this regard, V′ may represent another subset of nodes that strictly contains the set V (i.e., all nodes in V are also included in V′). Thus, the semantically cohesive group identifier 128 may identify all the maximal groups of social media analytics applications such that all the applications within each group are semantically cohesively connected to each other. Further, these groups may include the maximum possible number of applications in that, if any other application is added to a group, it may not be semantically cohesive with at least one of the existing applications in the group. In this regard, Φ″ may represent the list of these maximal groups of social media analytics applications.
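The two set-difference steps can be sketched by brute force in Python, following the definitions literally. This assumes that Φ is the collection of all non-empty subsets of the network's nodes (the definition of Φ is not reproduced in this text); under that assumption Φ′ is the set of cliques and Φ″ the set of maximal cliques of the semantic cohesion network. The function name is hypothetical, and the enumeration is exponential in the number of applications, so for larger application databases a maximal-clique algorithm such as Bron–Kerbosch would be a more practical substitute.

```python
from itertools import combinations


def maximal_cohesive_groups(nodes, edges):
    """Compute Phi'' by brute force (exponential; suitable only for small smaDB)."""
    adj = {frozenset(e) for e in edges}  # undirected edges of the cohesion network
    nodes = sorted(nodes)
    # ASSUMED: Phi is every non-empty subset of the nodes.
    phi = [frozenset(c) for r in range(1, len(nodes) + 1) for c in combinations(nodes, r)]
    # Phi': drop subsets containing two distinct nodes with no edge between them.
    phi1 = [V for V in phi if all(frozenset(p) in adj for p in combinations(V, 2))]
    # Phi'': drop subsets strictly contained in another member of Phi'.
    return [V for V in phi1 if not any(V < W for W in phi1)]


edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
groups = maximal_cohesive_groups(["A", "B", "C", "D"], edges)
```

For this toy network the result is the two maximal groups {A, B, C} and {C, D}: adding D to the first group fails because D has no edge to A or B.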
Next, with respect to synthesizing a design template for a new social media analytics application, at block 216 of
$$\Gamma(Q) = \operatorname{mean}\{\mathrm{semCh}(newsmaAPP, X) \mid X \in Q\}$$
In this regard, X may represent a variable that ranges over the group Q (e.g., X takes, one by one, the value of each element of group Q so that semCh(newsmaAPP, X) may be estimated for that element). The affinity group identifier 136 may sort application groups in descending order of Γ(·). In this regard, the affinity group identifier 136 may specify Qmax∈Φ″ to be the group having the maximum Γ(·) as follows:
$$\Gamma(Q_{max}) = \max\{\Gamma(Q) \mid Q \in \Phi''\}$$
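The affinity group selection can be sketched in Python as below, assuming the cohesion of newsmaAPP with each existing application has already been computed into a dictionary. `affinity_group` and the sample cohesion values are hypothetical.

```python
from statistics import mean


def affinity_group(groups, cohesion_with_new):
    """Rank groups Q by Gamma(Q) = mean{semCh(newsmaAPP, X) | X in Q}.

    cohesion_with_new maps each existing application X to semCh(newsmaAPP, X).
    Returns (groups in descending Gamma order, Q_max, Gamma per group).
    """
    gamma = {Q: mean(cohesion_with_new[x] for x in Q) for Q in groups}
    ranked = sorted(groups, key=lambda Q: gamma[Q], reverse=True)  # descending Gamma
    return ranked, ranked[0], gamma


groups = [frozenset({"A", "B", "C"}), frozenset({"C", "D"})]
cohesion = {"A": 0.9, "B": 0.6, "C": 0.3, "D": 0.8}  # illustrative values
ranked, q_max, gamma = affinity_group(groups, cohesion)
```

Here Γ({A, B, C}) is about 0.6 and Γ({C, D}) is about 0.55, so {A, B, C} is selected as Qmax. Ties in Γ(·) keep the input order because Python's sort is stable.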
With respect to generation of a design template for the new social media analytics application, at block 218 of
With respect to synthesizing a feature set for the new social media analytics application, at block 220 of
$$\{smaApp_1, \ldots, smaApp_k\}$$
The feature set synthesizer 144 may specify the feature sets for this group of applications to be:
$$Fset_1: \{f_{1,1}, \ldots, f_{1,n_1}\}$$
$$\vdots$$
$$Fset_k: \{f_{k,1}, \ldots, f_{k,n_k}\}$$
The feature sets may be specified such that the feature set of the 1st application in Qmax has n1 features, the feature set of the 2nd application has n2 features, and so on.
With respect to feature relevance for the new social media analytics application, the feature set synthesizer 144 may populate a feature relevance matrix by specifying
In this regard, Fset may represent the set of all unique features across all feature sets in Qmax. For each feature Fi∈Fset, the feature set synthesizer 144 may specify βi,j to be its performance association score (PAS) for smaAppj as follows:
In this regard, $[\beta_{i,j}]_{m \times k}$ may represent the feature relevance matrix, where m is the number of unique features in Fset and k is the number of applications in Qmax. The feature set synthesizer 144 may determine (e.g., estimate) a probable performance association score to filter features. The feature set synthesizer 144 may determine a likely performance association score of each feature Fi for the new application newsmaApp based on its performance association scores for existing applications in Qmax and their semantic cohesions with newsmaApp as follows:
The feature set synthesizer 144 may remove from Fset all of the features having a PAS less than a threshold parameter π∈[0,1], where 0 indicates that all the features are to be considered, and 1 indicates that only those features which are likely to contribute to the performance are to be considered, as follows:
$$Fset \leftarrow Fset \setminus \{F_a \in Fset \mid PAS(F_a) < \pi\}$$
The feature set synthesizer 144 may sort features in Fset based upon their PAS estimates. The final set of features in Fset may represent the 1st component of the design template for the new application newsmaAPP.
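A minimal Python sketch of this feature-set synthesis follows, with two loudly labeled assumptions. The equations for βi,j and for the likely PAS of a feature are not reproduced in this text, so the sketch takes βi,j to be the PAS of feature Fi in application smaAppj (0.0 when the application does not use the feature) and estimates PAS(Fi) as the semantic-cohesion-weighted mean of βi,j over Qmax. The function name and sample values are hypothetical.

```python
def synthesize_feature_set(q_max, feature_sets, cohesion_with_new, pi):
    """Return (Fset sorted by estimated PAS, PAS estimates) for newsmaApp.

    q_max: ordered ids of the applications in Q_max.
    feature_sets[app]: {feature: PAS} for that application.
    cohesion_with_new[app]: semCh(newsmaApp, app).
    """
    # Fset: all unique features across the feature sets in Q_max (rows of beta).
    fset = sorted({f for app in q_max for f in feature_sets[app]})
    # ASSUMED beta[i][j]: PAS of feature i for application j, 0.0 if unused (m x k).
    beta = [[feature_sets[app].get(f, 0.0) for app in q_max] for f in fset]
    weights = [cohesion_with_new[app] for app in q_max]
    total = sum(weights)
    if total == 0:
        raise ValueError("newsmaApp has zero cohesion with every application in Q_max")
    # ASSUMED estimator: cohesion-weighted mean of each beta row.
    pas = {f: sum(w * b for w, b in zip(weights, row)) / total for f, row in zip(fset, beta)}
    # Fset <- Fset \ {F_a in Fset | PAS(F_a) < pi}, then sort by estimated PAS.
    kept = [f for f in fset if pas[f] >= pi]
    return sorted(kept, key=lambda f: pas[f], reverse=True), pas


feature_sets = {
    "A": {"lexicon": 0.8, "syntactic": 0.4, "statistical": 0.6},
    "B": {"lexicon": 0.6, "image": 0.9, "statistical": 0.8},
}
final_fset, pas = synthesize_feature_set(["A", "B"], feature_sets, {"A": 0.75, "B": 0.25}, pi=0.5)
```

With π = 0.5, the image feature (estimate 0.225) and the syntactic feature (about 0.3) are filtered out, leaving lexicon (about 0.75) ahead of statistical (about 0.65).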
With respect to the machine learning method ensemble for the new social media analytics application, the feature set synthesizer 144 may collect machine learning methods from the ensembles of the affinity group. In this regard, each social media analytics application in smaDB may be associated with a set of machine learning methods to address the learning challenge underlying the application. A set of machine learning methods together may work as an ensemble. When there is one machine learning method, it may be considered as a singleton ensemble. The feature set synthesizer 144 may specify the ensembles of machine learning methods associated with the social media analytics applications in the affinity group of newsmaApp corresponding to Qmax to be:
$$MLset_1: \{ML_{1,1}, \ldots, ML_{1,t_1}\}$$
$$\vdots$$
$$MLset_k: \{ML_{k,1}, \ldots, ML_{k,t_k}\}$$
Thus, the ensemble of machine learning methods for the 1st application in Qmax may have t1 methods, the ensemble for the 2nd application may have t2 methods, and so on.
With respect to feature-methods associations for the new social media analytics application, at block 222 of
MLset may represent the set of all unique machine learning methods across all ensembles for Qmax. Each machine learning method in the ensemble for a social media analytics application may be associated with all the features for the application. Further, across all applications in the affinity group Qmax, each method may be associated with multiple features because it may appear in multiple ensembles, which may provide the basis for the three-dimensional feature-method-application association matrix. For the three-dimensional feature-method-application association matrix, the 1st dimension may be specified such that each row of the matrix represents a feature appearing in the final feature set Fset. The 2nd dimension may be specified such that each column of the matrix is for a machine learning method in MLset. Further, the 3rd dimension may be specified such that each aisle of the matrix is for an application in Qmax. With respect to contents of the three-dimensional feature-method-application association matrix $[\varphi_{a,b,c}]_{m \times y \times k}$, where m is the number of features in Fset, y is the number of machine learning methods in MLset, and k is the number of applications in Qmax, each cell φa,b,c may be specified as:
In this regard, φa,b,c may represent the performance association scores (PAS) of feature Fa∈Fset for application smaAppc∈Qmax if machine learning method MLb∈MLset is also in the ensemble set MLsetc of application smaAppc.
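The association matrix can be sketched as nested Python lists indexed [a][b][c] (feature, method, application). The sketch assumes that a cell is 0.0 when method MLb is not in the ensemble of smaAppc, or when smaAppc does not use feature Fa; the defining equation is not reproduced in this text. The function name and sample data are hypothetical.

```python
def association_matrix(fset, ml_set, q_max, feature_sets, ensembles):
    """Build phi[a][b][c] (feature x method x application, i.e. m x y x k).

    phi[a][b][c] is the PAS of feature F_a for application c when method ML_b is in
    that application's ensemble. ASSUMED: 0.0 otherwise, or when c lacks F_a.
    """
    return [
        [
            [
                feature_sets[app].get(f, 0.0) if m in ensembles[app] else 0.0
                for app in q_max  # 3rd dimension: aisles (applications)
            ]
            for m in ml_set  # 2nd dimension: columns (methods)
        ]
        for f in fset  # 1st dimension: rows (features in the final Fset)
    ]


feature_sets = {"A": {"lexicon": 0.8, "statistical": 0.6}, "B": {"lexicon": 0.6, "statistical": 0.8}}
ensembles = {"A": {"linear classifier"}, "B": {"linear classifier", "non-linear classifier"}}
phi = association_matrix(
    ["lexicon", "statistical"],
    ["linear classifier", "non-linear classifier"],
    ["A", "B"],
    feature_sets,
    ensembles,
)
```

Because application A uses only the linear classifier, every non-linear-classifier cell for A is 0.0, while B contributes its PAS values to both method columns.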
With respect to the ensemble for the new social media analytics application, at block 224 of
The machine learning method ensemble synthesizer 150 may remove from MLset all those machine learning methods having dsig(·) less than a threshold parameter ϵ∈[0,1]:
With respect to the performance metric for the new social media analytics application, at block 226 of
overlap(smaAppj)=ΣF
The performance metric identifier 152 may sort applications in descending order of overlap(·). The performance metric identifier 152 may specify smaAppz to be the application at the top of the list. Further, the performance metric identifier 152 may specify pz to be the performance metric (single or multiple) for smaAppz. In this regard, pz may represent the 3rd component of the design template for newsmaApp.
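The selection of pz can be sketched in Python as follows. The definition of overlap(·) is truncated in this text after its summation sign, so the sketch assumes that overlap(smaAppj) sums the estimated PAS of the final features in Fset that smaAppj also uses; treat this as a hypothetical reading rather than the disclosed formula. The function name and sample data are also hypothetical.

```python
def select_performance_metric(final_fset, pas, q_max, feature_sets, metrics):
    """Return (p_z, overlap) where smaApp_z has the largest overlap with the final Fset.

    ASSUMED overlap(smaApp_j): sum of estimated PAS over the final features used by smaApp_j.
    """
    overlap = {app: sum(pas[f] for f in final_fset if f in feature_sets[app]) for app in q_max}
    ranked = sorted(q_max, key=lambda app: overlap[app], reverse=True)  # descending overlap
    top = ranked[0]  # smaApp_z
    return metrics[top], overlap  # p_z may hold one or several metrics


p_z, overlap = select_performance_metric(
    ["lexicon", "statistical"],
    {"lexicon": 0.75, "statistical": 0.65},
    ["A", "B"],
    {"A": {"lexicon", "statistical"}, "B": {"lexicon", "image"}},
    {"A": ["precision", "recall"], "B": ["accuracy"]},
)
```

Application A shares both final features (overlap about 1.4) while B shares only lexicon (0.75), so the metrics of A become pz.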
With respect to the design architecture for the new social media analytics application, at block 228 of
Referring to
The processor 602 of
Referring to
The processor 602 may fetch, decode, and execute the instructions 608 to generate, for each social media analytics application of a plurality of social media analytics applications 104, a corpus 110.
The processor 602 may fetch, decode, and execute the instructions 610 to perform, for the corpus 110 for each social media analytics application of the plurality of social media analytics applications 104, term normalization.
The processor 602 may fetch, decode, and execute the instructions 612 to generate, for the corpus 110 for each social media analytics application of the plurality of social media analytics applications 104, and based on the term normalization, a normalized corpus.
The processor 602 may fetch, decode, and execute the instructions 614 to generate, for the normalized corpus for each social media analytics application of the plurality of social media analytics applications 104, an actor, an action and an object 114.
The processor 602 may fetch, decode, and execute the instructions 616 to map, based on the actor, the action and the object 114, each social media analytics application of the plurality of social media analytics applications 104 into an embedding space 118.
The processor 602 may fetch, decode, and execute the instructions 618 to generate, for each social media analytics application of the plurality of social media analytics applications 104 mapped into the embedding space 118, a semantic cohesion network 122.
The processor 602 may fetch, decode, and execute the instructions 620 to determine, for each social media analytics application of the plurality of social media analytics applications 104, and based on the semantic cohesion network 122, a pair-wise semantic cohesion 126.
The processor 602 may fetch, decode, and execute the instructions 622 to identify, for each social media analytics application of the plurality of social media analytics applications 104, and based on the pair-wise semantic cohesion 126, semantically cohesive groups 130.
The processor 602 may fetch, decode, and execute the instructions 624 to synthesize, based on the identified semantically cohesive groups 130, a new social media analytics application 134.
Referring to
At block 704, the method may include synthesizing, based on the identified semantically cohesive groups 130, a new social media analytics application 134.
Referring to
The processor 804 may fetch, decode, and execute the instructions 808 to synthesize, based on the semantic cohesion analysis, a new social media analytics application 134.
What has been described and illustrated herein is an example along with some of its variations. The terms, descriptions and figures used herein are set forth by way of illustration only and are not meant as limitations. Many variations are possible within the spirit and scope of the subject matter, which is intended to be defined by the following claims—and their equivalents—in which all terms are meant in their broadest reasonable sense unless otherwise indicated.
Number | Date | Country |
---|---|---|
20220292264 A1 | Sep 2022 | US |