This application relates in general to personality analysis, and in particular, to a computer-implemented system and method for personality analysis based on social network images.
Knowledge of personality traits of individuals can be valuable in many contexts. For example, for organizations, regardless of whether they are business entities, educational institutions, or governmental units, such knowledge can be useful to improve workplace compatibility, reduce conflicts, and detect anomalous or malicious behavior. Similarly, such knowledge may be useful in a medical context, such as for monitoring effects of a treatment on the psychological wellbeing of a patient, detecting depression, and preventing suicide.
Obtaining this knowledge about individuals can be difficult, as people may not be willing to provide unbiased information about themselves to interested parties, such as a potential employer. Furthermore, this challenge is exacerbated when a large number of individuals of interest is involved and the information about the personality of each of the individuals needs to be kept up-to-date. Whereas an evaluation by a trained psychologist may be conducted for a small number of individuals, such an evaluation becomes impractical as the number of individuals grows and there is a need for reevaluations.
Current technology does not provide an efficient and accurate way to deal with these challenges. For example, currently one way a person's personality is evaluated is by having the subject of the evaluation fill out specifically-designed surveys. Such surveys tend to be time-consuming and potentially intrusive, discouraging the individuals from thoroughly and completely answering the questions of the surveys. Furthermore, such surveys provide information about an individual's personality only at the time the survey is taken; detecting any changes in the individual's personality would require the individual to take the survey again, which is impracticable. Finally, the results of such surveys may be subject to manipulation, with the individuals answering the survey having the opportunity to misrepresent information about themselves in the answers.
Therefore, there is a need for an objective and efficient way to evaluate personality traits of a large number of individuals and detect changes in the individuals' personality over time.
An analysis of the images that a person posts on his or her social networking pages provides clues towards the person's personality. The individuals appearing in the images, the scenes in the images, and the objects in the images can be analyzed, and a trained supervised machine learning algorithm can perform an evaluation of the person's personality based on the analysis of the images. Such an evaluation does not require direct input from the person whose personality is evaluated, allowing evaluations of multiple people to be conducted simultaneously, allowing an evaluation to be easily repeated at a later point in time, and reducing the possibility that a person can influence the evaluation through misrepresentations. The efficiency of the evaluations makes them easily applicable to a multitude of areas, such as monitoring the well-being of individuals, detecting malicious insiders, and advertising products or services best-suited for a person's personality.
In one embodiment, a computer-implemented system and method for personality analysis based on social network images are provided. A plurality of images posted to one or more social networking sites by a member of the sites are accessed. An analysis of the images is performed. The personality of the member is evaluated based on the analysis of the images.
Still other embodiments of the present invention will become readily apparent to those skilled in the art from the following detailed description, wherein are described embodiments of the invention by way of illustrating the best mode contemplated for carrying out the invention. As will be realized, the invention is capable of other and different embodiments and its several details are capable of modifications in various obvious respects, all without departing from the spirit and the scope of the present invention. Accordingly, the drawings and detailed description are to be regarded as illustrative in nature and not as restrictive.
An individual's personality can be described using one or more personality models, with each model being a description of an individual's personality using one or more traits. For example, the Five Factor model, also known as the “Big Five” factors model, defines five broad factors, or dimensions, of personality: openness to experience, conscientiousness, extraversion, agreeableness, and neuroticism. Each of these factors is a cluster of particular personality traits. For example, extraversion is a cluster of traits such as assertiveness, excitement seeking, and gregariousness. An assessment of an individual's personality using the model includes calculating a score for each of the five factors, with the score for each of the factors representing the strength of the traits clustered under that factor in the individual's personality. While personality models such as the Five Factor model aim for a comprehensive assessment of a person's personality, other models can describe specific aspects of a person's personality, such as how strong the person's family ties are, whether the person likes sports, and what kind of activities the person likes to engage in. These models can be quantitative, such as including a score that indicates the strength of a particular trait in a person, or binary, such as indicating whether a person has a particular trait, such as a liking for football, or not.
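By way of a hypothetical illustration only (the 0.0-to-1.0 score range and the field names below are assumptions, not part of this disclosure), a Five Factor assessment can be represented as one score per factor:

```python
from dataclasses import dataclass

@dataclass
class BigFiveScores:
    """Scores for the five broad factors of the Five Factor model.

    Each score represents the strength of the traits clustered under
    that factor; the 0.0-1.0 range is an illustrative assumption.
    """
    openness: float
    conscientiousness: float
    extraversion: float
    agreeableness: float
    neuroticism: float

    def dominant_factor(self) -> str:
        """Return the name of the factor with the highest score."""
        factors = vars(self)
        return max(factors, key=factors.get)

profile = BigFiveScores(0.7, 0.4, 0.9, 0.5, 0.2)
print(profile.dominant_factor())  # extraversion
```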
While there are multiple ways to evaluate a person's personality, images that a person posts on social networking sites of which he or she is a member are an excellent source of clues towards that member's personality, and can be used to evaluate the member's personality based on one or more personality models. For instance, a member posting photographs with his wife and children is a clue to the member being attached to his family. On the other hand, a member posting photographs of being in a pub with his friends can be a clue that the member likes to socialize and is not yet married. Processing these pictures makes it possible to evaluate the member's personality. As the processing of the images does not require member input, an analysis of the personalities of multiple members can be performed efficiently and repeated as often as necessary. Furthermore, the lack of direct member input in the analysis reduces the likelihood of intentional manipulation of the evaluation results by members. In addition, as many people like to document almost every day of their life on a social networking site, the social networking sites can provide a wealth of detail regarding a person's life.
In one embodiment, the server 11 accesses publicly-available images 14; in a further embodiment, the server 11 receives from the member access to all of the images 14 stored on the sites 13.
The server 11 implements an image analyzer program 15 that can analyze images 14 accessed from the social networking sites 13 and evaluate the member's personality based on the analyzed images, as further described below.
The server 11 is further connected to a database 17 that can store results of the personality evaluations 18 performed by the server 11. Other data can also be stored in the database 17. For example, the accessed images 14 can be downloaded by the server 11 and stored in the database 17. Still other data can be stored in the database 17.
The server 11 can include components conventionally found in general purpose programmable computing devices, such as a central processing unit, memory, input/output ports, network interfaces, and non-volatile storage, although other components are possible. The central processing unit can implement computer-executable code, such as the image analyzer program 15 and the personality analyzer program 16, which can be implemented as modules. The modules can be implemented as a computer program or procedure written as source code in a conventional programming language and presented for execution by the central processing unit as object or byte code. Alternatively, the modules could also be implemented in hardware, either as integrated circuitry or burned into read-only memory components. The various implementations of the source code and object and byte codes can be held on a computer-readable storage medium, such as a floppy disk, hard drive, digital video disk (DVD), random access memory (RAM), read-only memory (ROM) and similar storage mediums. Other types of modules and module functions are possible, as well as other physical hardware components.
By recognizing and analyzing people, objects, and scenes present in the social networking images 14 posted by the member, the member's personality can be evaluated and monitored.
After the images are accessed, the images 14 are analyzed (step 22), as further described below.
The analysis of the objects is further described below.
Based on the analysis of the images 14, the personality of the member who posted the images is evaluated, as further described below.
In a still further embodiment, the system 10 can determine an approximate time when the member's personality experiences a change. As described below, the analysis of the images 14 can produce statistics regarding people, scenes, and objects present in the images 14. A probability distribution of these statistics can be associated with a set of personality traits. Comparing two such distributions, obtained by analyzing multiple sets of images 14, the sets having been posted at different times, can indicate whether or not the subject's personality has changed. Finding a point in time such that the distribution of statistics before that point is substantially different from the distribution of statistics after that point allows determining the approximate time a change in personality has occurred. This information can be used for monitoring for malicious insiders as well as other purposes.
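The change-detection idea above can be sketched as follows. This is a minimal sketch that compares the means of a single scalar statistic before and after each candidate split point, whereas the embodiment described compares full probability distributions of the image statistics; the per-period counts are invented for illustration:

```python
def change_point(stats):
    """Find the index t that maximizes the gap between the mean of the
    per-period statistics before t and the mean after t, approximating
    the point where the distribution of image statistics changes."""
    best_t, best_gap = None, -1.0
    for t in range(1, len(stats)):
        before = sum(stats[:t]) / t
        after = sum(stats[t:]) / (len(stats) - t)
        gap = abs(after - before)
        if gap > best_gap:
            best_t, best_gap = t, gap
    return best_t

# hypothetical people-per-image counts from image sets posted over
# time; the sharp drop after index 4 suggests a change around then
counts = [5, 6, 5, 7, 1, 0, 1, 0]
print(change_point(counts))  # 4
```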
Any detected changes are output to a user of the system 10 (step 28), ending the method 20. Output can include displaying the changes to the user of the system or sending a message to the user of the system; for example, a message can be sent to the user when a change in personality is detected. Other ways to alert the user of the system 10 to a change in the personality of the member, and other ways to output the data, are possible.
Analyzing the faces present in the images 14 makes it possible to obtain information regarding the individuals that a member of a social network spends time with, which can provide clues to the member's personality. For example, many different individuals appearing in the images 14 can be a clue towards the member's extraversion, while a small number or a complete absence of people in the images can be a clue towards the member's introversion.
Once the faces are detected, features of the faces are extracted (step 32) using techniques such as described by N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” International Conference on Computer Vision and Pattern Recognition, pp. 886-893, 2005, and D. G. Lowe, “Distinctive Image Features from Scale-Invariant Keypoints,” International Journal of Computer Vision, 60(2), pp. 91-110, 2004, the disclosures of which are incorporated by reference. The extracted features can describe the whole face, local regions around fiducial points on the faces (such as eyes, nose, mouth, and chin), or regions around automatically determined ‘interest points’ (see, e.g., the Lowe reference cited above). Such techniques allow recognizing facial features regardless of the pose and expression of the individual whose facial features are extracted, as well as the lighting present in the images 14. Other ways to detect and extract the faces of individuals from the images 14 are possible.
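In the spirit of the oriented-gradient features described in the Dalal and Triggs reference, a toy descriptor can be sketched as follows. This is a minimal illustration only: it omits the cell grid and overlapping block normalization of the actual technique, and the patch is synthetic:

```python
import numpy as np

def orientation_histogram(patch, bins=9):
    """Toy histogram of oriented gradients for a single image patch:
    accumulate gradient magnitude into unsigned-orientation bins and
    L2-normalize the result."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 180.0  # unsigned orientation
    hist = np.zeros(bins)
    bin_width = 180.0 / bins
    for m, a in zip(magnitude.ravel(), angle.ravel()):
        hist[int(a // bin_width) % bins] += m
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist

# a horizontal intensity ramp: all gradient energy points along x,
# so everything lands in the first orientation bin
patch = np.tile(np.arange(8.0), (8, 1))
descriptor = orientation_histogram(patch)
print(descriptor.argmax())  # 0
```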
In addition to the features, the age and gender of the faces, and consequently of the individuals whose faces have been extracted, are determined (step 33). The age and gender of the individuals can be determined using a supervised machine learning algorithm that was previously trained on training images. Other ways to determine the age and gender of the individuals, and the faces, are possible.
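A minimal sketch of such a determination follows, with a 1-nearest-neighbour rule standing in for whatever supervised learner is actually trained; the feature vectors and labels are invented for illustration:

```python
def nearest_neighbor_predict(train, labels, query):
    """Predict the (age group, gender) label of a face from its
    feature vector by finding the closest labelled training face."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best = min(range(len(train)), key=lambda i: dist(train[i], query))
    return labels[best]

# toy 2-D face features paired with (age group, gender) labels
train = [(0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
labels = [("adult", "F"), ("child", "M"), ("adult", "F")]
print(nearest_neighbor_predict(train, labels, (0.15, 0.85)))
```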
Constraints for the clustering of the faces are also set (step 34). The constraints include a prohibition on two faces appearing in the same image 14 being clustered into the same cluster. A match between the age and gender of the faces being clustered can also be used as another constraint. As age and gender determination may be prone to error, in one embodiment, the age and gender are used as a soft clustering constraint; an exact match between the age and gender of the faces may not be required for faces to be put in the same cluster. In a further embodiment, a match between the age and gender may be required for faces to be in the same cluster. In addition, if there is prior information regarding the age and gender of a particular individual whose face has been extracted, the information can be used as a constraint instead of the determined age and gender. For example, if a particular face can be associated with an owner of a social networking profile, such as through a tag present on the image 14, and the profile lists the age and gender of the owner, the age and gender from the profile can be used instead of the determined age and gender. Such use of prior information, also known as priors, as a clustering constraint helps minimize the age and gender variation between faces in the same cluster. Other constraints are possible.
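The cannot-link constraint can be sketched as follows, with hypothetical face and image identifiers; each pair of faces detected in the same image is barred from sharing a cluster:

```python
from itertools import combinations

def cannot_link_constraints(image_faces):
    """Build cannot-link pairs: two faces detected in the same image
    must not be placed in the same cluster, since a person's face
    appears at most once per photograph."""
    pairs = set()
    for faces in image_faces.values():
        pairs.update(combinations(sorted(faces), 2))
    return pairs

# face ids detected per image (illustrative)
detections = {"img1": ["f1", "f2"], "img2": ["f3"], "img3": ["f2", "f4", "f5"]}
constraints = cannot_link_constraints(detections)
print(("f1", "f2") in constraints)  # True
```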
Once the constraints are set, the faces are clustered based on the similarity of the extracted features for each of the faces and the clustering constraints (step 35). The clusters can be built using a probability distribution, such as a Gaussian or a t-distribution, using techniques such as described by M. Andreetto, L. Zelnik-Manor, and P. Perona, “Non-Parametric Probabilistic Image Segmentation,” ICCV, 2007, and G. J. McLachlan and K. E. Basford, “Mixture Models: Inference and Applications to Clustering,” Marcel Dekker, Inc., 1988, Series: Statistics: Textbooks and Monographs, Volume 84, the disclosures of which are incorporated by reference. Other clustering techniques can also be used. For example, k-means clustering can be used to generate the clusters of faces. In one embodiment, all features are given equal weight during the clustering. In a further embodiment, some features may be weighted more heavily than others during the clustering. The clustering process further uses an additional background uniform probability distribution to filter out outliers, such as non-face images, from the clusters, as described in the Andreetto et al. reference cited above. At the end of the clustering process, each cluster should have faces associated with the same individual, with an individual being associated with a single cluster.
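A greatly simplified sketch of clustering under a cannot-link constraint follows; the greedy nearest-centroid assignment and the distance threshold stand in for the probabilistic mixture-model techniques cited above, and the feature vectors are invented:

```python
def constrained_cluster(features, cannot_link, threshold=1.0):
    """Greedy constrained clustering: assign each face to the nearest
    existing cluster unless that would violate a cannot-link pair or
    exceed the distance threshold; otherwise start a new cluster."""
    clusters = []   # each cluster is a list of face ids
    centroids = []  # running mean feature vector per cluster
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    for face, feat in features.items():
        best, best_d = None, threshold
        for i, members in enumerate(clusters):
            if any((min(face, m), max(face, m)) in cannot_link for m in members):
                continue  # faces from the same image cannot share a cluster
            d = dist(centroids[i], feat)
            if d < best_d:
                best, best_d = i, d
        if best is None:
            clusters.append([face])
            centroids.append(list(feat))
        else:
            clusters[best].append(face)
            n = len(clusters[best])
            centroids[best] = [(m * (n - 1) + x) / n
                               for m, x in zip(centroids[best], feat)]
    return clusters

features = {"a": (0.0, 0.0), "b": (0.1, 0.0), "c": (5.0, 5.0)}
cannot_link = {("a", "b")}  # a and b appear in the same image
print(constrained_cluster(features, cannot_link))  # [['a'], ['b'], ['c']]
```

Note that face "b" is near "a" in feature space but is still forced into its own cluster by the cannot-link constraint.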
Statistics for each cluster are calculated (step 36). Such statistics can include the count of faces in each cluster and the total number of clusters, which corresponds to the number of individuals in the images 14. As an individual's face can appear only once in a single image 14, the count of faces in a cluster equals the number of images 14 in which the individual associated with the cluster appears. Based on the number of faces in the cluster, a frequency of appearance of the individual associated with the cluster can also be calculated by comparing the number of images 14 in which the individual appears to the total number of images 14 evaluated. Still other statistics for the clusters can be calculated.
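The per-cluster statistics can be sketched as follows, with illustrative cluster contents:

```python
def cluster_statistics(clusters, total_images):
    """Per-cluster counts and appearance frequencies: since a face can
    appear only once per image, the face count of a cluster equals the
    number of images in which that individual appears."""
    stats = {}
    for individual, faces in clusters.items():
        count = len(faces)
        stats[individual] = {
            "images": count,
            "frequency": count / total_images,
        }
    return stats

clusters = {"person_A": ["f1", "f4", "f7"], "person_B": ["f2"]}
stats = cluster_statistics(clusters, total_images=10)
print(stats["person_A"]["frequency"])  # 0.3
```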
Once the statistics are calculated, the server 11 deduces information about the individuals in the images 14 (step 37), ending the routine 30. For example, the server 11 can deduce which of the individuals associated with the faces in the clusters is the member who posted the images 14 and the relationships, or connections, between the member and the individuals associated with the other clusters. The deductions can be based on factors such as the statistics for the clusters and any data known about the member who posted the images 14 and the individuals associated with the member. For example, if the age and gender of the member are known, either from the social networking profile of the member or from another source, the age and gender of the member can be compared to the age and gender of the individuals associated with the clusters. A cluster with the greatest count of faces and an age and gender of faces matching the age and gender of the member, or having an age within a predefined limit of the member's age, can be deduced to be associated with the member who posted the social networking images 14. Similarly, an individual with a gender opposite to the gender of the member, whose age is either the same as or within a predefined limit of the age of the member, and with the highest frequency of appearance among individuals of that age and gender, can be deduced to be a significant other of the member, such as a spouse. Likewise, an individual who is younger than the member and who appears in the images 14 with a frequency that satisfies a predefined threshold can be deduced to be a child of the member.
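The deduction rules above can be sketched as follows; the age limit, frequency threshold, and record fields are hypothetical stand-ins for the predefined limits and thresholds the embodiment describes:

```python
def deduce_relationships(member, individuals, age_limit=10, child_threshold=0.2):
    """Heuristic relationship deduction: an individual of the opposite
    gender within the age limit and with the highest appearance
    frequency is deduced a likely spouse; a sufficiently younger
    individual whose frequency meets the threshold, a likely child."""
    deductions = {}
    spouse_candidates = [
        p for p in individuals
        if p["gender"] != member["gender"]
        and abs(p["age"] - member["age"]) <= age_limit
    ]
    if spouse_candidates:
        spouse = max(spouse_candidates, key=lambda p: p["frequency"])
        deductions[spouse["id"]] = "spouse"
    for p in individuals:
        if p["id"] in deductions:
            continue
        if p["age"] < member["age"] - age_limit and p["frequency"] >= child_threshold:
            deductions[p["id"]] = "child"
    return deductions

member = {"age": 40, "gender": "M"}
individuals = [
    {"id": "A", "age": 38, "gender": "F", "frequency": 0.6},
    {"id": "B", "age": 8, "gender": "M", "frequency": 0.4},
    {"id": "C", "age": 41, "gender": "M", "frequency": 0.3},
]
print(deduce_relationships(member, individuals))  # {'A': 'spouse', 'B': 'child'}
```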
In a further embodiment, one of the individuals in the images 14 can be deduced to be the member based on deductions regarding other individuals. For example, if one of the individuals in the images 14 has been deduced to be the member's spouse, an individual of the opposite gender with an age matching, or within a predefined limit of, the age of the member, and appearing in the same images 14 as the spouse more frequently than other individuals of a similar age and the same gender, can be deduced to be the member.
While an exact relationship between the member and other individuals appearing in the images 14 may not always be possible to determine, the frequency of appearance of the individuals in the images 14 can indicate the importance of the individuals to the member. Thus, if an individual associated with one of the clusters appears in the images 14 with a frequency, or another statistic, satisfying a predefined threshold, the individual can be marked as an important individual to the member.
In a further embodiment, the scenes in which the individuals appear, which are recognized as further described below, can be used to deduce the connections between the member and the important individuals. For example, if an important individual appears mostly in an office scene, the individual can be identified as a coworker. Similarly, if an individual appears mostly in a social scene, such as a pub or a restaurant, the important individual can be identified as a friend.
In a still further embodiment, information present either in the images or in the metadata associated with the images can be used to deduce which of the individuals is the member and the connections between the member and other individuals. For example, if individuals appearing in the images 14 are tagged with names, the tags can be compared to information known about the member, such as through the member's profiles, to identify the individuals. For example, if the social networking profile of the member lists certain individuals as close friends, the individuals identified by the tags can be deduced to be close friends of the member. Other ways to deduce which of the individuals is the member and the connections between the member and other individuals are possible.

Scenes appearing in the images 14 can also provide important information regarding the social network member's personality.
Optionally, the recognized scenes can be categorized by type (step 42). For example, the scenes can be categorized as indoor scenes, such as an office or a home, or as outdoor scenes, such as a mountain or a river. Other types of scenes can also be present. Finally, one or more statistics are generated for the scenes, and optionally, for the types of scenes, recognized in the images 14 (step 43), ending the routine 40. For example, such statistics can include the count of different scenes present in the analyzed images 14, the number of times each scene appears in the images 14, the types of scenes present in the images 14, and the number of scenes for each of the types present in the images 14. The counts can also be expressed as a frequency of appearance of a scene or a type of scene in the images 14 by comparing the counts to the total number of images 14. Other statistics can be calculated.
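A sketch of the scene statistics follows, with illustrative scene labels and an assumed scene-to-type mapping:

```python
from collections import Counter

def scene_statistics(scene_labels, scene_types):
    """Counts of recognized scenes and scene types (indoor/outdoor),
    expressed as frequencies over the total number of images."""
    total = len(scene_labels)
    scene_counts = Counter(scene_labels)
    type_counts = Counter(scene_types[s] for s in scene_labels)
    return {
        "scene_frequency": {s: c / total for s, c in scene_counts.items()},
        "type_frequency": {t: c / total for t, c in type_counts.items()},
    }

# one recognized scene per image, plus a scene-to-type mapping
scene_types = {"office": "indoor", "pub": "indoor", "mountain": "outdoor"}
labels = ["office", "pub", "office", "mountain"]
stats = scene_statistics(labels, scene_types)
print(stats["type_frequency"]["indoor"])  # 0.75
```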
In one embodiment, the scenes can be recognized based on the kinds of objects that appear in the images 14, as described in the Quattoni et al. reference cited above. In a further embodiment, the objects appearing in the images 14 can be used to evaluate the personality of the member directly.
Using a supervised machine learning classifier that has previously been trained on social networking images 14 makes it possible to utilize the results of the analysis of the images 14 to evaluate the personality of the social network member.
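A minimal sketch of how the analysis results might map to a personality score follows; a linear model with hand-picked weights stands in for the trained supervised classifier, and every feature name and weight below is invented for illustration:

```python
def evaluate_extraversion(image_stats, weights, bias=0.0):
    """Score one personality dimension as a weighted sum of image
    statistics, clamped to [0, 1]; a trained classifier would learn
    the weights from images paired with known personality scores."""
    score = bias + sum(weights[k] * v for k, v in image_stats.items())
    return max(0.0, min(1.0, score))

# statistics produced by the image analysis (illustrative)
image_stats = {
    "distinct_individuals_per_image": 0.8,  # many different people
    "social_scene_frequency": 0.6,          # pubs, restaurants
    "solo_image_frequency": 0.1,            # few images with no people
}
weights = {
    "distinct_individuals_per_image": 0.5,
    "social_scene_frequency": 0.5,
    "solo_image_frequency": -0.5,
}
print(evaluate_extraversion(image_stats, weights))
```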
While the invention has been particularly shown and described as referenced to the embodiments thereof, those skilled in the art will understand that the foregoing and other changes in form and detail may be made therein without departing from the spirit and scope of the invention.