Fast-food is a 200-billion-dollar industry in the United States alone, with close to 200,000 quick-service restaurants and over 70 different restaurant chains. Most of these fast-food (FF) chains are on Facebook, Pinterest, Instagram, or TikTok, spending an increasing fraction of their $3+ billion advertising budget on social media marketing. They also face the shared challenge of cutting through the clutter with image ads that are engaging, that create favorable thoughts, and that are remembered. For example, measures of image-ad success in terms of engagement and favorable thoughts are available on major social media platforms (e.g., Facebook, Instagram, Twitter) via their ‘share’ or ‘like’ buttons, as well as similar actions such as ‘saving’ on Pinterest. Because only a subset of visual ads succeed in being memorable, liked, and shared, predicting what makes an image ad successful is highly relevant for the industry.
An illustrative system to predict success of advertisements includes a memory configured to store an advertisement, and a processor operatively coupled to the memory. The processor is configured to identify one or more features of an image included in the advertisement, and to identify similarities between the one or more features and one or more defined attributes. The processor also determines a typicality of the advertisement based at least in part on the similarities between the one or more features and the one or more defined attributes. The processor also assigns a likelihood of success score to the advertisement based at least in part on the typicality.
In one embodiment, the one or more features include one or more object words. In another embodiment, the one or more features include a depiction of an item of food. In an illustrative embodiment, the processor is further configured to identify a relational feature between objects within the image included in the advertisement. In such an embodiment, the processor determines the typicality of the advertisement based at least in part on the relational feature between the objects within the image.
In another embodiment, the processor is configured to estimate a level of interest in the advertisement based at least in part on the similarities between the one or more features and the one or more defined attributes. The likelihood of success score can be based at least in part on the estimated level of interest. In another embodiment, the processor is configured to estimate a level of enjoyment that the advertisement provides based at least in part on the similarities between the one or more features and the one or more defined attributes. In such an embodiment, the likelihood of success score is based at least in part on the estimated level of enjoyment. The processor is also configured to estimate a fluency level of the advertisement based at least in part on the similarities between the one or more features and the one or more defined attributes. In such an embodiment, the likelihood of success score is based at least in part on the estimated fluency level. In another embodiment, the processor is configured to estimate an amount of complexity of the advertisement based at least in part on the similarities between the one or more features and the one or more defined attributes. The likelihood of success score can also be based at least in part on the estimated amount of complexity.
In one embodiment, the one or more defined attributes indicate a likelihood of sharing. In another embodiment, the one or more defined attributes indicate a likelihood of being liked on social media. In another embodiment, the one or more defined attributes indicate a likelihood of being remembered.
An illustrative method of predicting success of advertisements includes storing, in a memory, an advertisement. The method also includes identifying, by a processor operatively coupled to the memory, one or more features of an image included in the advertisement. The method also includes identifying, by the processor, similarities between the one or more features and one or more defined attributes. The method also includes determining, by the processor, a typicality of the advertisement based at least in part on the similarities between the one or more features and the one or more defined attributes. The method further includes assigning, by the processor, a likelihood of success score to the advertisement based at least in part on the typicality.
In another embodiment, the method includes identifying, by the processor, a relational feature between objects within the image included in the advertisement and determining, by the processor, the typicality of the advertisement based at least in part on the relational feature between the objects within the image. In another embodiment, the method includes estimating, by the processor, a level of interest in the advertisement based at least in part on the similarities between the one or more features and the one or more defined attributes, where the likelihood of success score is based at least in part on the estimated level of interest. In one embodiment, the method includes estimating, by the processor, a level of enjoyment that the advertisement provides based at least in part on the similarities between the one or more features and the one or more defined attributes, where the likelihood of success score is based at least in part on the estimated level of enjoyment.
Other principal features and advantages of the invention will become apparent to those skilled in the art upon review of the following drawings, the detailed description, and the appended claims.
Illustrative embodiments of the invention will hereafter be described with reference to the accompanying drawings, wherein like numerals denote like elements.
Described herein are methods and systems to measure ad effectiveness. Specifically, the proposed system predicts the degree to which consumers like, share, and remember image ads. Such a system is of much interest to social media marketing agencies. These agencies face the challenge of developing image ads for social media platforms and other clients that are engaging, memorable, and able to create favorable thoughts in viewers. The proposed system allows these agencies to make rational image ad selections in an automated way. As discussed in more detail below, the proposed system uses scalable computational measures of a plurality of different psychological constructs that, in conjunction, predict the success of image ads. These computational measures are validated using human survey data.
Successful image ads are liked, shared, and remembered. Although the success of an image ad can often be understood with the benefit of hindsight, it is less clear how to predict whether an image ad will be received favorably by its target group. Described herein is a framework to disentangle image-related factors that influence liking, sharing, and recognition responses. The proposed framework is tested with the help of Facebook image ads posted by three fast-food chains. Both human judgments and computational measures of aesthetic features of image ads allow for important insights into how image ads are processed perceptually. Moreover, the framework in combination with the computational measures provides significant benefits for image-ad effectiveness research. Namely, it is highly predictive of an image ad's performance, easy to implement, and scalable.
To test the proposed theoretical framework, consumers were asked to consider various sets of fast-food image ads and to assess which of them is shared most often (
Although the success of an image ad can often be understood with the benefit of hindsight, the examples demonstrate that the task of predicting whether an image ad will be received favorably is difficult. Described herein is a new approach that may prove effective in predicting whether an image ad will be liked, shared, and recognized on the basis of image features alone. Specifically, the inventors identify predictive image features using a theoretical framework in which the three behavioral success outcomes are caused in part by internal aesthetic responses to features of the image ads. Subjective experiences of interest and enjoyment are the two major dimensions of these aesthetic responses, which in turn are driven by the fluency, complexity, and typicality features of the images.
The inventors utilized a structural measurement approach to estimate the effects of the five constructs and to test the proposed theoretical framework. All five constructs—interest, enjoyment, fluency, complexity, and typicality—are measured via human judgments in multiple studies. In alternative embodiments, fewer, additional, and/or different constructs may be used. A path model that utilizes measures of the five constructs at the image level is used to test and validate the theoretical framework. To facilitate the scalability of the approach, the inventors also examined computational measures of the five constructs and proposed new ones to obtain a more complete representation of the considered constructs. These algorithms are based on perceptual (pixel-based) and semantic (object-based) image-ad characteristics, and yield readily interpretable measures of the five constructs. Computational measures provide a number of important benefits, one of which is the lack of potential confounds introduced when working with real-life stimuli. Using these computational measures, the inventors replicated most of the findings obtained with human judgments. In addition, the inventors show that the approach can be employed readily for the identification of image ads that score jointly above average on the considered ad success measures.
This work brings together several lines of research, including how visual factors shape human perception and interpretation of image ads, how they affect the liking, sharing, and memorability of image ads, and what computer-vision features predict perceptual and interpretational processes. Specifically, it is shown that for image ads for three fast-food (FF) chains, Chipotle, McDonald's and Wendy's, the interest elicited by an ad as well as its level of typicality and complexity predicts its memorability. In contrast, the enjoyment and interest elicited by image ads predict their liking and sharing and mediate fluency, complexity, and typicality effects.
Initially presented herein is a theoretical framework for the joint prediction of the three ad-success measures. The inventors also reviewed the literature about the considered constructs of typicality, complexity, fluency, interest, and enjoyment, reported the empirical approach to test the framework, and presented the empirical findings. Subsequently, the inventors examined and developed computational measures of the five constructs. These computational measures are used in place of human judgments and their performance in predicting the three ad-success measures has been assessed. The description concludes with a discussion of the key results and an outlook for the future.
As mentioned above, to test the proposed theoretical framework, consumers were asked to consider sets of fast-food image ads (
Included below is a description of a conceptual framework to relate the considered constructs of image perceptions to the three ad-success metrics of liking, sharing, and memorability.
Many of these conjectured links (and lack of links) are based on studies of aesthetic judgments that look at subsets of these constructs in isolation. However, as noted below, because the links between interest and enjoyment and their effects on the memorability of images have received little attention in the literature, the inventors also include previously unexamined hypotheses in the path diagram. These hypotheses turn out to be important because fast-food images often have appetizing features that may elicit both enjoyment and interest reactions which, in turn, may affect their memorability.
The inventors focused first on enjoyment and interest as the central predictors of the considered ad-success measures. Subsequently, the inventors explored the effects of fluency on enjoyment and interest and the effects of typicality and complexity on fluency, enjoyment, and interest. Also examined are direct effects on the ad-success measures and links to computational measures of the constructs, a topic that is discussed in more detail below.
Regarding enjoyment and interest, there is much research to suggest that feelings of enjoyment and interest play a central role in affecting the memorability of images as well as their liking and sharing. Interest and enjoyment have been shown to be the two dominant dimensions in studies of aesthetic judgments of photographs, paintings, music, polygons, and anagrams. Enjoyment, measured by self-rated feelings of pleasure and warmth, may not only trigger image liking, but it may also help predict recognition and sharing of image ads when enjoyment is caused by the ease in the perceptual processing of the images. One key stage in this perceptual processing is the segmentation of an image into objects and background features. For example, figure-ground compositions having a focal figure with a rather uniform background are perceptually simpler than image scenes that lack a clear central figure or uniform background. Such perceptual distinctiveness facilitates memorability because it requires less effort to process. As a result, memorability has been found to be higher for images that contain one or a few objects that are presented both close-up and in an uncluttered setting. Moreover, the resulting relative ease in image processing is experienced as pleasant. The inventors return to these positive hedonic processing reactions below when discussing the fluency construct.
When enjoyment is experienced in the form of amusement, it is often associated with an arousal-boosting mechanism that facilitates deeper processing that can increase both an ad's memorability and its sharing. For example, it has been demonstrated that video-induced feelings of amusement made content sharing more likely than video-induced feelings of contentment did.
Interest is defined as an affective state that results from a joint appraisal of the novelty-complexity dimensions of a stimulus and a person's ability to make sense of it. Thus, interest motivates people to explore image ads that are not understood but are understandable. An implication of this definition is that interest requires more effort than does enjoyment. Making sense of novel or complex image ads by elaborating on new and existing schemas and knowledge requires cognitive resources. This sense-making process can facilitate the recognition of image ads. Specifically, it has been shown that visual memory is positively affected by the meaning of displayed objects and scenes. In fact, a person's capacity to remember visual information may depend more on conceptual than perceptual distinctiveness.
The majority of the considered image ads in the studies depict fast-food products. It is likely that arousal and attention may be higher for food-product images because one of the primary brain functions is to locate nutritious and high-energy food. Thus, image ads of fast-food can be particularly potent in creating interest because they are appetizing. Importantly, it has been shown that content which appears interesting and arousing (or novel) is more likely to be shared. Moreover, in an analysis of a random sample of Instagram photos, it has been observed that food images are one category of photos that are most often shared.
Feelings of enjoyment and interest are important predictors of liking, sharing, and memorability for an image. Enjoyment and interest also share similar processing mechanisms. In particular, they share arousal-boosting processes triggered by feelings of amusement and excitement. Image ads that elicit these feelings are likely to elicit both enjoyment and interest. In this case, enjoyment and interest are likely to be positively related and similarly predictive of liking, sharing, and memorability. Moreover, enjoyment and interest may also respond to similar antecedents albeit in different ways. Enjoyment has been found to be reduced for more complex stimuli but increased for more familiar stimuli. In contrast, by demanding investment in cognitive resources, complexity can increase interest, whereas images that are easily processed can decrease interest but increase enjoyment. The inventors discuss these antecedents in more detail below.
Regarding fluency, as depicted in
Both perceptual and conceptual fluency can elicit positive affect because the ease in processing both visual details and the meaning of an image is pleasurable. It is less clear whether fluency also increases interest. It has been argued that disfluency can feel stimulating and increase interest, especially in novelty-seeking contexts. However, this observation may depend on the product category. In the automobile category, it was found that fluency measures were effective in predicting interest in frontal-car characteristics.
The extant evidence on a direct link between processing fluency and memorability is mixed. There is strong evidence for the perceptual fluency hypothesis, which states that easily processed items are remembered better. However, when taking memory performance into account, it appears that processing fluency can hinder or not affect recall and recognition memory. For example, in several behavioral and electrophysiological studies, it was found that perceptual fluency impairs subsequent recollection by reducing later episodic encoding activities. Similarly, it was shown that disfluency, as opposed to fluency interventions, led to improved memory performance in educational outcome measures. In contrast, other studies have shown that perceptual fluency did not affect subsequent recognition memory. In view of these inconsistent results, no strong predictions can be made about a direct effect of image ad fluency on subsequent recognition memory.
Regarding typicality, the ability to categorize information is fundamental to the acquisition of knowledge and accumulation of experiences. Categories involve mental representations that encode key aspects of category members. Here, typicality is a critical property of a category that reflects its graded structure, with category members differing in the degree to which they are viewed as typical for the category. For example, a hamburger is more often judged to be a typical member of the fast-food category than corn on the cob. Similarly, in category-verification tasks, typical members of a category are identified more quickly and more accurately than less typical ones.
It is well established that perceived typicality—the degree to which objects are viewed as representative of their category—facilitates fast and efficient processing of visual information. Thus, typicality plays a significant role in the processing of image ads. In a fraction of a second, people can identify brands in ads and infer the meaning of the ad. This speed advantage holds in particular for ads that are typical of their product category. Moreover, the resulting efficiency gain gives rise to an experience of perceptual ease or fluency which in turn may lead to a positive hedonic reaction (e.g., enjoyment). In other words, processing ease often mediates the influence of typicality on enjoyment.
It has also been demonstrated in studies involving ads from different product categories that the increased speed and ease with which typical ads are processed are already associated with feelings of interest after a single eye fixation and mediated by the gist of an ad. In contrast, atypical ads, which may require more mental resources to process, are less likely to generate immediate interest. Although these findings are limited to severe exposure time constraints with a single eye fixation, the processing advantages of typical ads remain valid in general. It is less clear, however, whether these initial typicality advantages carry over to an increased feeling of interest under self-terminating exposure situations. The inventors have tested for this possibility and included a link between typicality and interest in
The inventors also analyzed ad complexity. Because perception and perceptual memory are based on capacity-limited systems, mechanisms for efficient data compression are essential to optimize the amount of information that can be processed and stored. One example of such a data compression mechanism is reflected by the joint photographic experts group (JPEG) image standard, which exploits several biological properties of human sight. By taking into account the insensitivity to variation in color compared to luminance and the relative insensitivity to high spatial frequencies, JPEG compression provides substantial storage savings by omitting fidelity that cannot be perceived by human vision.
Visual complexity is a broad construct that has been defined in multiple ways across a wide range of disciplines including computer vision, psychophysics, and cognitive psychology. Definitions relate to the level of intricacy and details in an image, the level of difficulty a human observer encounters in describing an image, or the degree of visual clutter and amount of information conveyed in an image as measured by image-compression algorithms. In view of these multiple distinctions, it is likely that complexity is a multidimensional construct with possibly different effects caused by each dimension.
Support for this observation was provided by the introduction of the important distinction between feature complexity, which depends on basic image features, and design complexity, which further depends on such creative design choices as shapes, objects, and their arrangements. In a large-scale eye-tracking study of image ads, it was found that feature complexity reduced both attention and attitudes towards an ad, whereas design complexity increased both attention and attitudes towards an ad, which in turn improved ad comprehensibility. Since the processing of complex pictures requires more cognitive resources, it is possible that observers allocate more resources to design complexity because design complexity may facilitate an understanding of an image ad more effectively than basic feature complexity. Thus, design complexity can increase interest whereas feature complexity can reduce enjoyment.
To date, theoretical and empirical research on the role of stimulus complexity on memory for affective material has been scant and contradictory. The contrasting findings may partly be explained by the type of stimuli (e.g., natural scenes vs. line drawings) and differences in the definitions of perceptual complexity. However, there is support for the hypothesis that image ads that are more difficult to process and less well understood, are also less well recognized. The inventors therefore conjecture that ad complexity can affect interest, fluency, and enjoyment. In view of these multiple pathways, it is also likely that complexity affects memorability both directly and indirectly. Specifically, it has been suggested that visual complexity may have a direct influence on memorability that is not mediated by enjoyment or interest. It was found that visually more complex images are also more likely to be recognized in a memory task. For this reason, the inventors include a direct path between complexity and memorability in
Included below are the results of empirical studies (n=6,900) designed to estimate the links among the five constructs. Facebook Graph API was used to download image ads and their likes and comments. The inventors downloaded all images from the first day the page was created for three fast-food chains, McDonald's, Wendy's, and Chipotle. The images were sorted into five quantiles based on the number of “likes” they received. Subsequently, the inventors randomly picked 20 images from each group. The resulting 100 images for each of the three fast-food chains were used in five studies, which are described below.
The first study focused on measuring the perception of the images with regard to the proposed five constructs. The images were presented to three groups of 600 Amazon Mechanical Turk (AMT) workers (with an approval rate of at least 95%) according to a PBIB (Partially Balanced Incomplete Block) design. Under this design, each image and all pairs of images were evaluated by the same number of respondents. Specifically, each participant rated five images on the following seven items, each measured on a seven-point bipolar scale: How interesting do you find the image? How complex does it feel to look at the image? How pleasant does it feel to look at the image? How typical is this image for <McDonald's, Wendy's, Chipotle>? How easy is the image to understand? What is your overall feeling towards this image? How engaging is this image?
It is noted that single items were used for measuring fluency, complexity, and typicality. This was done for two reasons. First, as demonstrated herein, single items are sufficient to assess measurement error at the ad-image level. Second, it has been shown that fluency and typicality, respectively, can be measured reliably with single items at the person level. However, to gain additional power in predicting the three ad-success measures, the inventors used two items for measuring enjoyment and interest.
After their assessments of the five images, participants were asked two binary questions about their willingness to “like” and “share” each image if they saw it on Facebook. If respondents clicked on the share button, they were asked to type a comment to accompany the shared image. As noted below, the inventors did not use the original liking and sharing counts in the analyses because of endogeneity issues. Asking for “likes” and “shares” in this way may lead to a higher response rate than the ads would obtain in the original social media setting. However, as shown, the quantile categorization that gave rise to the initial selection of the images is consistent with the quantile categorization based on the “likes” and “shares” of the samples.
The inventors conducted a second study to measure the memorability of the images. This study followed a memory game procedure. Participants were shown 130 image ads (2 seconds per image, with a 0.5 second interstimulus interval) and asked to press the space bar to indicate that the image ad was repeated. Participants were told they would receive a bonus for accurately identifying images as new or as repeated. Over 85% of the participants received this bonus by satisfying our internal accuracy criterion of 80%. The following analyses were conducted with and without the participants who did not earn a bonus. No qualitative differences in the results were obtained. To assess memory performance, 10 of the 130 images were selected as target items and repeated once with a spacing of 91 to 123 images. This approximately four-minute time window for each target image pair allowed the inventors to measure memorability beyond short-term and working memory effects. An additional 20 images were selected as filler items and repeated once with a spacing of 4 to 8 images, and the remaining 70 images were shown only once.
The images were presented following a PBIB design such that all pairs of the target images and single images were presented to the same number of participants. Five hundred AMT workers with an approval rate of at least 95% participated in the study for each of the three FF chains. Each target image was presented twice to 50 workers and once to 350 workers, and these observations were used to arrive at the image's overall hit rate (HR) and false alarm rate (FAR). The inventors converted these measures to a discriminability statistic d′ = Φ⁻¹(HR) − Φ⁻¹(FAR), where Φ⁻¹ is the inverse of the normal CDF. In the remainder of this document, this measure is referred to as memorability.
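As a concrete illustration of this conversion, a minimal Python sketch follows; the log-linear correction for extreme rates and the example counts are illustrative assumptions rather than details reported in the studies.

```python
from scipy.stats import norm

def memorability_dprime(hits, misses, false_alarms, correct_rejections):
    """Discriminability d' = Phi^-1(HR) - Phi^-1(FAR) for one target image.

    The log-linear correction (adding 0.5 to each cell) guards against hit or
    false alarm rates of exactly 0 or 1, for which the inverse normal CDF is
    infinite; this correction is a common convention, not a detail from the text.
    """
    hr = (hits + 0.5) / (hits + misses + 1.0)
    far = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    return norm.ppf(hr) - norm.ppf(far)

# Illustrative counts only: 50 repeated presentations and 350 single presentations
print(memorability_dprime(hits=41, misses=9, false_alarms=28, correct_rejections=322))
```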
A third and fourth study were conducted to obtain liking and sharing judgments without image ratings. In Study 3, 100 images were presented in random order for two-second intervals and respondents had to indicate that they liked an image by pressing the space bar. Five hundred AMT workers evaluated the images of each FF chain, so a total of n=1,500 AMT workers participated. Study 4 followed a similar approach, except that respondents were asked to indicate that they would share an image by pressing the space bar. When respondents chose to share, they were asked to type a comment to accompany the shared image. As in the “liking” task, a total of n=1,500 AMT workers participated, so that the estimated “sharing” proportions of each image are based on 500 respondents.
In a fifth study, the inventors asked AMT workers to judge the design features of the images. Specifically, after receiving detailed instructions n=600 AMT workers provided binary judgments on the following six image characteristics: a) Quantity of objects—few or many?, b) Details of objects—little or much?, c) Shape of objects—regular or irregular?, d) Similarity of objects—similar or dissimilar?, e) Arrangement of objects—symmetric or asymmetric?, and f) Arrangement of objects—regular or irregular pattern? Each participant rated five images and, similar to the PBIB design in Study 1, each image and all pairs of images were evaluated by the same number of participants. The inventors controlled for AMT workers participating in multiple studies or in the same study by recording their IP addresses.
A reliability analysis, reported below, showed that the measurements of the aesthetic, memorability (discrimination), liking, and sharing characteristics of the images are internally consistent and sufficiently accurate to support tests that predict the success of an image from its aesthetic characteristics.
The inventors also explored predicting memorability, liking, and sharing using human image judgments. Specifically, the inventors combined the three data sets to simultaneously analyze the liking proportions, sharing proportions, and memorability measures of the images for each FF chain. The images' liking and sharing proportions were computed from the third and fourth studies and the memorability statistic from the second study.
As described herein, the inventors standardized the ratings of the respondents and aggregated them at the image level. To correct for dependencies at the image level that were not removed by the standardization and aggregation, all reported models were fit with robust standard errors. Moreover, to take into account that the endogenous and exogenous variables are measured with error, reliability estimates were used to adjust the variables' error variances. This approach has been shown to maintain good coverage and Type I error rates in small samples.
The path model, depicted in
The inventors fit the measurement-error corrected path model to the data of each FF chain separately as well as jointly.
Although the estimated relationships are consistent with prior research that investigated subsets of these constructs, the inventors also obtained additional results. In particular, the direct effects of typicality and complexity on memorability when allowing for enjoyment and interest as mediators add to the literature. However, in contrast to previous studies, it was found that the direct effect of complexity is negative (−0.82 [0.350]) whereas its indirect effect is positive (0.57 [0.277]) in predicting memorability. The finding that enjoyment and interest fully mediate the effects of fluency, typicality, and complexity is also new and shows the importance of these constructs in understanding liking and sharing behaviors.
The data replicate previous results in the literature showing that typicality can mediate the effect of fluency in predicting enjoyment. It is also noteworthy that although complexity is a positive predictor of enjoyment (0.43 [0.059]), it is a much more important predictor of interest (0.92 [0.068]). In contrast, typicality is a more important predictor of fluency (0.60 [0.031]) than complexity (−0.15 [0.064]). It was also found that less typical image ads trigger more interest. Finally, the predicted variance of the liking, sharing, and memorability measures is substantial, with an average of R2=0.66; the only exception is the memorability predictions for Chipotle (R2=0.33). Taken together, these results provide further support for the path-analytic specification in
The inventors also focused on predicting memorability, liking, and sharing using computational image measures. To facilitate the scalability of the proposed approach, the inventors examined and developed computational measures of typicality, complexity, fluency, interest, and enjoyment. Subsequently, these computational measures were analyzed in place of the human ratings to estimate the path-analytic model and predict images that will score better than average on all three ad-success measures.
Computational measures have four important advantages: First, they provide useful insights into how judgments of the five considered constructs are formed. As discussed below, a number of the considered computational measures take into account human processing characteristics that contribute to well-known phenomena of human perception. Second, if a direct link between computational measures and human ratings can be found, on-the-fly image-ad scoring is possible without the need for human judgment or other sources of information. Third, only a relatively small number of images are required to validate the usefulness of a computational measure—in this regard, the approach provides an alternative to deep learning methods that may require a substantial number of well-selected images for the purpose of building robust prediction models. Fourth, computational measures provide an opportunity for causal work. By measuring and manipulating, for example, features of an image ad that are known to be related to the considered constructs, one can explore and develop further psychological theories on how human judgments of these image features are formed.
These benefits are especially important when working with real-life stimuli because in this case endogeneity issues cannot always be avoided. For example, exposure to image ads of a brand caused by brand liking may increase typicality experiences and thus confound the hypothesized typicality-liking link. Computational measures are free of such contextual confounders, which simplifies the interpretation of their effects. However, although the focus is on computational measures, deep learning methods can also be applied when computational measures are not available.
Typicality scoring was also performed. A common method to assess the typicality of an exemplar is to ask participants to provide a category goodness-of-fit rating as done in Study 1. Another related method is to give participants a category and ask them to list features of that category (e.g., wings, beak, feathers, etc. for the category bird). Typical members will have more features that overlap with the generated list, whereas atypical members have fewer features. The inventors adopted the second method to create a computational approach that scores the typicality of an image. Because features may be based on objects depicted in an image, on the design relationships among these objects, or on such structural features as pixel intensities, the inventors computed typicality scores using all three considerations.
To obtain a list of objects presented in the images, the inventors used Google Cloud's Vision API. Compared with image recognition services such as Amazon AWS Rekognition, IBM Watson Visual Recognition, and Microsoft Azure Computer Vision, Google Vision is reported to provide the highest accuracy in tagging image contents, and the tags generated by Google Vision are closest to what a human thinks. Next, and separately for each FF chain, the inventors analyzed the resulting list of detected objects for each image using Latent Semantic Analysis (LSA). The first LSA dimension was used as a measure of semantic typicality of the images because it captures the overlap in common terms across the images. Images that score high on this measure show more of the objects that commonly occur in the image ads of each FF chain.
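A minimal sketch of this scoring step, assuming the object tags have already been retrieved, might look as follows; the tag strings are hypothetical placeholders for the Vision API output, and TruncatedSVD over a count matrix is one standard way to implement LSA.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

# Hypothetical object-tag strings, one per image for a single FF chain; in the pipeline
# described above these tags would come from the Google Cloud Vision API.
image_tags = [
    "hamburger fries drink logo",
    "burrito bowl rice beans",
    "hamburger drink table logo",
]

# Term-document matrix over the detected object words
X = CountVectorizer().fit_transform(image_tags)

# The first latent semantic dimension serves as the semantic-typicality score:
# images loading high on it share the objects that occur most often across the chain's ads.
svd = TruncatedSVD(n_components=2, random_state=0)
semantic_typicality = svd.fit_transform(X)[:, 0]
print(semantic_typicality)
```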
The second measure of typicality was based on the design features of the images. Here the inventors used DenseNet (Densely Connected Convolutional Networks) to arrive at predictions of the six design features (i.e., see Study 5). Specifically, the inventors formulated the prediction as a classification problem with four possible outcomes for each of the six design features based on the quartiles of the observed response proportions. Since DenseNet was pre-trained on image classification tasks that differ from the present task (design-feature classification), it was fine-tuned by first randomly splitting the 300 fast-food images into a 200-image training set and a 100-image testing set. The inventors used the 200 training images to fine-tune the pre-trained DenseNet. The resulting prediction accuracy scores of the six design features (i.e., the image characteristics mentioned in Study 5: quantity, details, shape, similarity, and symmetric and regular arrangement types) for the 100-image testing set are 0.7, 0.743, 0.684, 0.61, 0.62, and 0.653, respectively. The inventors also considered XGBoost and VGG19 with raw image pixels as input but found that the fine-tuned DenseNet performed the best. The inventors used the DenseNet predictions to compute the first dimension of a multidimensional scaling solution based on the distances of these measures across the 300 image ads. This dimension measures the relational similarity across the images as captured by the design features.
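One way to set up such fine-tuning with torchvision is sketched below; the specific DenseNet variant (densenet121) and the single multi-head classifier layout are assumptions, as the description does not specify them, and data loading and the training loop are omitted.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_FEATURES = 6  # the six design features from Study 5
NUM_CLASSES = 4   # quartile bins of the observed response proportions

# Start from an ImageNet-pretrained DenseNet and replace the classifier head with
# one 4-way output per design feature (framed here as 6 * 4 logits).
backbone = models.densenet121(weights=models.DenseNet121_Weights.DEFAULT)
backbone.classifier = nn.Linear(backbone.classifier.in_features, NUM_FEATURES * NUM_CLASSES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(backbone.parameters(), lr=1e-4)

def loss_fn(logits, targets):
    # logits: (batch, 6*4) reshaped to (batch, 4, 6); targets: (batch, 6) quartile labels in 0..3
    return criterion(logits.view(-1, NUM_CLASSES, NUM_FEATURES), targets)
```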
The third measure of typicality was derived from the structural similarity measure (SSIM). This measure compares the similarity in the structural information between two images, controlling for luminance and contrast. SSIM is symmetric and takes on the value 1 when two images have identical pixel intensities. The inventors computed SSIM for all pairs of images within a FF chain and extracted the first eigenvector of the resulting matrix as a structural typicality measure of the images. Images that score high on this measure exhibit more structural information than commonly occurs in the chain's image ads.
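A sketch of this computation, assuming equally sized grayscale images scaled to [0, 1] and using scikit-image's SSIM implementation, is shown below.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def structural_typicality(gray_images):
    """First eigenvector of the pairwise SSIM matrix as a structural-typicality score.

    `gray_images` is a list of equally sized 2-D grayscale arrays scaled to [0, 1],
    all belonging to the same FF chain.
    """
    n = len(gray_images)
    S = np.eye(n)  # SSIM of an image with itself is 1
    for i in range(n):
        for j in range(i + 1, n):
            S[i, j] = S[j, i] = ssim(gray_images[i], gray_images[j], data_range=1.0)
    # Leading eigenvector of the symmetric SSIM matrix (eigh sorts eigenvalues ascending)
    _, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, -1]
```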
The proposed approach differs from that of Mayer and Landwehr (2018), who also utilized computational methods to arrive at typicality measures of car design images. In their context, it was possible to arrive at a prototypical car design by averaging characteristic feature points of a given design: The authors computed typicality based on the Euclidean distance between the averaged and the respective feature points. This approach is not feasible in the proposed application because the used images are not sufficiently homogeneous for an average to be meaningful. However, the results show that there are multiple ways to assess typicality by capturing similarity at the pixel level, the object level, and between the relational features of the objects.
The strongest predictor of the three computational measures is based on the semantic similarity of the identified objects in each image. Specifically, image ads are judged as more typical if they contain objects that can also be found in other images. The second strongest predictor is based on the relational features of the objects. Structural similarity features are also found to affect typicality assessments of the images. The importance order of these predictors is a potentially important finding because it suggests systematic ways to manipulate the typicality of image ads. For example, by varying the semantic overlap of images, one can assess experimentally the causal effect of this image characteristic on typicality assessments. However, it is noted that the computational typicality scoring is specific to the selected sample of image ads. It is an empirical question whether the obtained scoring is stable and remains invariant over time. For example, variations in the ad strategy and/or product offerings can have a significant impact on the typicality scores. In the same way, as human typicality judgments can be task-dependent and context-specific, the scoring may need to be updated and revised to adapt to a changing marketplace.
Complexity scoring was also performed. The experience of image complexity is related to the perceptual load in processing an image. Over the years, a substantial number of measures have been proposed to capture this perceptual load, demonstrating that the construct of experienced image complexity is probably multi-determined. In the present application, several types of computational measures of image complexity were considered.
First, previous research has shown that pixel-based features of an image that capture the brightness of each pixel in relation to neighboring pixels predict subjective complexity ratings. These features can be measured via intensity contrast, homogeneity, and dissimilarity. The intensity contrast is defined between pixels and their neighbors over the whole image, i.e., Σ_{i,j} |i−j|² p(i,j), where p(i,j) is element (i,j) of the normalized symmetric GLCM (Grey Level Co-occurrence Matrix). Since an image is composed of pixels, each with an intensity (a specific gray level), the GLCM is a tabulation of how often different combinations of gray levels co-occur in an image or image section. The contrast measure is 0 for a constant image. Homogeneity represents the closeness of the distribution of the elements in the GLCM with respect to the GLCM diagonal and is computed as Σ_{i,j} p(i,j)/(1 + |i−j|).
The homogeneity measure is 1 for a diagonal GLCM. The dissimilarity measure indicates the degree to which an image forms clusters in the feature space. Contrast and homogeneity are computed on grayscale images and do not take into account color information. Both measures capture important properties of the GLCM and are widely used for image texture analysis and classification. In contrast, the dissimilarity measure considers the color information of an image (e.g., its raw pixel values) and is calculated based on its distance to the centroid of the FF images.
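A sketch of these pixel-based measures is given below; scikit-image is used for the GLCM, the single distance/angle setting is an assumption, and the centroid-based color dissimilarity follows the description above.

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import graycomatrix
from skimage.util import img_as_ubyte

def glcm_contrast_homogeneity(rgb_image):
    """GLCM-based contrast and homogeneity for one image (grayscale, horizontal neighbors)."""
    gray = img_as_ubyte(rgb2gray(rgb_image))
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]               # normalized co-occurrence matrix
    i, j = np.indices(p.shape)
    contrast = np.sum(np.abs(i - j) ** 2 * p)        # sum |i-j|^2 p(i,j)
    homogeneity = np.sum(p / (1.0 + np.abs(i - j)))  # sum p(i,j) / (1 + |i-j|)
    return contrast, homogeneity

def centroid_dissimilarity(rgb_images):
    """Color dissimilarity: Euclidean distance of each image's raw pixel vector
    to the centroid of all images (assumes equally sized images)."""
    X = np.stack([img.astype(float).ravel() for img in rgb_images])
    centroid = X.mean(axis=0)
    return np.linalg.norm(X - centroid, axis=1)
```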
Second, in addition to these low-level or unstructured complexity measures, research has also shown that more high-level visual features such as color, luminance, and edges, measured by the percentage of edge pixels (e.g., edge density), clutter relations (e.g., sub-band entropy), and compression ratios (e.g., JPEG compression), also affect human complexity perceptions. In addition, compression-based and spatial information-based (SI-based) methods have been developed. Here, the inventors use the SI mean, which is an indicator of edge energy, as a measure of image complexity. This measure is calculated as the mean of the SI values across all the pixels in an image, where SI represents the magnitude of spatial information at every pixel and is defined as SI = √(s_h² + s_v²), with s_h and s_v denoting the grayscale image filtered with horizontal and vertical Sobel kernels, respectively.
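A minimal sketch of the SI mean, using scipy's Sobel filters, is shown below.

```python
import numpy as np
from scipy.ndimage import sobel

def si_mean(gray_image):
    """Mean spatial information (edge energy): SI = sqrt(sh^2 + sv^2) at every pixel."""
    img = gray_image.astype(float)
    sh = sobel(img, axis=1)  # horizontal gradient (horizontal Sobel kernel)
    sv = sobel(img, axis=0)  # vertical gradient (vertical Sobel kernel)
    return np.sqrt(sh ** 2 + sv ** 2).mean()
```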
Third, structural features of images as represented by shapes, objects, and their relationships have been shown to increase the experienced visual complexity of an image. Specifically, researchers have identified several Gestalt categories capturing the number of objects, the details of the objects, the shape of the objects, the similarity among the objects, and arrangement of objects. As noted above with respect to typicality scoring, the inventors implemented the fine-tuned DenseNet to arrive at predictions of the six design features.
As a semantic measure of object overlap, the inventors computed the average similarity of the objects within an image. Images that score high on semantic similarity may be viewed as more complex. Google Cloud Vision API was used to obtain up to 10 objects for each image. Each object word was then represented as a 300-dimensional vector using one of the most popular word embedding models (i.e., word2vec) developed by Google. This word2vec model is pre-trained on the Google News data set (about 100 billion words). The similarity of two object words is measured by the cosine of their corresponding 300-dimensional vectors learned via word2vec. The inventors define the semantic complexity of an image as the average similarity over all possible pairs of the object words.
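A sketch of this semantic-complexity score using gensim is given below; the local file name for the pre-trained GoogleNews vectors and the example object words are assumptions.

```python
from itertools import combinations
import numpy as np
from gensim.models import KeyedVectors

# Assumed local copy of the pre-trained GoogleNews 300-d word2vec vectors
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

def semantic_complexity(object_words):
    """Average cosine similarity over all pairs of object words detected in one image."""
    words = [w for w in object_words if w in w2v]
    if len(words) < 2:
        return float("nan")
    return float(np.mean([w2v.similarity(a, b) for a, b in combinations(words, 2)]))

print(semantic_complexity(["hamburger", "fries", "drink", "table"]))
```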
To investigate the internal relationships among these different measures of complexity, a principal component analysis with a varimax rotation was conducted.
Regarding fluency scoring, it has been noted that images that are processed more fluently receive a more positive aesthetic response. It has also been found that such aesthetic features as the goodness of form, symmetry, or figure-ground contrast provide measures of perceptual fluency, whereas features that simplify mental operations targeted at the meaning of a stimulus may be used as measures of conceptual fluency. Following these observations, the inventors created two distinct variable groups to capture the subjective feeling of ease associated with mental processing.
As indicators of perceptual fluency, the first variable group includes eleven attributes designed to assess image aesthetics. An aesthetics and attributes database (AADB) of 10,000 images from the Flickr website was used as part of this analysis. This database is useful because it includes a wide range of images from distinct domains, including fast-food. The authors asked AMT workers to annotate each image with an overall aesthetic score and the following eleven attributes that measure distinct aesthetic features: a) “balancing element”—whether the image contains balanced elements, b) “content”—whether the image has good/interesting content, c) “color harmony”—whether the overall color of the image is harmonious, d) “depth of field”—whether the image has a shallow depth of field, e) “lighting”—whether the image has good/interesting lighting, f) “motion blur”—whether the image has motion blur, g) “object emphasis”—whether the image emphasizes foreground objects, h) “rule of thirds”—whether the photography follows the rule of thirds, i) “vivid color”—whether the photo has vivid color, not necessarily harmonious color, j) “repetition”—whether the image has repetitive patterns, and k) “symmetry”—whether the photo has symmetric patterns.
The inventors applied a trained model to obtain scores on the eleven image attributes and the overall aesthetic evaluation of the FF images. The trained model was developed by first fine-tuning a basic front-end architecture using a Euclidean loss function for predicting the overall aesthetic ratings. Next, the model was trained through a Siamese network, which takes image pairs as input and predicts which image obtained higher aesthetic ratings. Finally, the model incorporated a predictor for subsequent applications (e.g., a fully connected feedforward neural network for aesthetic scoring or image attribute classification).
The second variable group focuses on measuring conceptual fluency and includes two variables, the first indicating whether food is depicted in an image and the second indicating whether the image contains a tag line. The depiction of food item(s) in an FF image as well as the availability of tag lines simplify the interpretation of the image and should make it easier to understand. Google Cloud's Vision API was used to identify the presence of food items and tag lines in the image ads.
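A sketch of these two indicators is given below; the particular client calls and the simple keyword heuristic for detecting food labels are assumptions about one possible implementation, not a description of the inventors' exact code.

```python
from google.cloud import vision

client = vision.ImageAnnotatorClient()

def conceptual_fluency_indicators(image_bytes):
    """Two binary indicators for one image: food depicted, and text (a possible tag line) present."""
    image = vision.Image(content=image_bytes)
    labels = client.label_detection(image=image).label_annotations
    texts = client.text_detection(image=image).text_annotations
    has_food = any("food" in label.description.lower() for label in labels)
    has_text = len(texts) > 0
    return has_food, has_text
```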
The last column of
Interest and enjoyment scoring is discussed below. In contrast to the typicality, complexity, and fluency constructs, no prior research has focused on developing pixel-based indicators of enjoyment and interest. The inventors therefore developed an ad-hoc approach to obtain scores that are predictive of the image ads' interest and enjoyment evaluations. Specifically, the inventors developed a simple but effective word2vec-based algorithm where each feature (i.e., object word) of an image identified by Google Cloud's Vision API and each of four attribute words—interesting, engaging, warm, and pleasant—are represented as a 300-dimensional vector using the pre-trained Google word2vec model. The inventors computed the cosine similarities between an image feature and attribute words and averaged these similarity measures for each image. Thus, images that received a high positive score on an attribute contained at least one image feature that received a high similarity score for that attribute. The resulting similarity scores proved to be useful as predictors of the human ratings in a random forest regression. The average R2 under 10-fold cross-validation for interesting, engaging, warm, and pleasant were 0.825, 0.789, 0.826, and 0.832, respectively. Other machine learning models such as XGBoost yielded similar results.
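A sketch of the attribute-scoring step is given below; the vector file name and example object words are assumptions, and the downstream random forest regression is only indicated in a comment.

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumed local copy of the pre-trained GoogleNews 300-d word2vec vectors
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)
ATTRIBUTES = ["interesting", "engaging", "warm", "pleasant"]

def attribute_scores(object_words):
    """Mean cosine similarity between each attribute word and the image's object words."""
    words = [w for w in object_words if w in w2v]
    if not words:
        return [0.0] * len(ATTRIBUTES)
    return [float(np.mean([w2v.similarity(a, w) for w in words])) for a in ATTRIBUTES]

print(attribute_scores(["hamburger", "fries", "drink", "table"]))

# In a second step, the four scores per image would serve as predictors of the human
# ratings, e.g., with sklearn's RandomForestRegressor under 10-fold cross-validation.
```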
To assess the predictive performance of the computational approach, the inventors withheld 33% of the image ads for each FF chain. For this hold-out sample, the predicted computational scores were obtained and the path model presented
Comparing
Identifying the best ads was also considered. To further examine the usefulness of the computational scores for predicting ad success, the inventors created an indicator variable that identified the brand images that are simultaneously in the top half on liking, sharing, and memorability. For the hold-out sample, this resulted in 17 images. A Bayesian probit path-analytic model was used to predict this set of images. This model yielded a posterior mean AUC statistic of 0.80 (0.05). This result further demonstrates that the computational scoring provides an acceptable approach to identify images that perform well on all of the three image-ad success measures.
As noted above, the inventors did not use the original liking and sharing counts associated with each image ad because, due to ad-targeting algorithms, these statistics are likely to be biased measures of the true liking and sharing characteristics of the images. However, significant positive relationships were found between the original measures and the ones obtained in the survey. A bivariate loglinear regression model with the original liking and sharing counts as dependent variables, the survey liking and sharing proportions as independent variables, and food-chain intercepts yielded β̂_Liking = 1.69 (p<0.016; R²=0.29) and β̂_Sharing = 6.74 (p<0.003; R²=0.21).
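For illustration, a simplified version of this check is sketched below; the data frame holds fabricated placeholder values, only the liking equation is shown, and ordinary least squares on log counts stands in for the bivariate loglinear specification.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-image data: original Facebook like counts, survey liking proportions,
# and the chain label (placeholders, not the study data).
df = pd.DataFrame({
    "orig_likes":  [1200, 45, 310, 87, 2300, 16],
    "survey_like": [0.61, 0.22, 0.40, 0.31, 0.72, 0.15],
    "chain":       ["McDonalds", "McDonalds", "Wendys", "Wendys", "Chipotle", "Chipotle"],
})

# Loglinear specification: log(original count) on the survey proportion with chain intercepts.
model = smf.ols("np.log(orig_likes) ~ survey_like + C(chain)", data=df).fit()
print(model.params["survey_like"], model.rsquared)
```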
Additionally, to examine the internal consistency of these proportions and the other measures obtained in our empirical studies in discriminating among the FF chain images, the inventors also conducted a detailed reliability analysis. Since the number of images is large, the reliability of the memorability (discrimination) measure, aesthetic characteristics, and the liking and sharing variables were estimated at the image level by using the split-half method. The inventors split the group of participants randomly into two halves and computed the Pearson correlation between them across the 100 images for each of the three FF chains. This procedure was repeated 1,000 times.
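A sketch of this split-half procedure is given below; it assumes a complete participants-by-images rating matrix, whereas the PBIB design yields a sparse one in which each participant rated only five images.

```python
import numpy as np

def split_half_reliability(ratings, n_reps=1000, seed=0):
    """Image-level split-half reliability.

    `ratings` is a (participants x images) array. Participants are randomly split into two
    halves, image means are computed within each half, and the Pearson correlation across
    images is recorded; the mean over `n_reps` random splits is returned.
    """
    rng = np.random.default_rng(seed)
    n_part = ratings.shape[0]
    corrs = []
    for _ in range(n_reps):
        perm = rng.permutation(n_part)
        half1 = ratings[perm[: n_part // 2]].mean(axis=0)
        half2 = ratings[perm[n_part // 2:]].mean(axis=0)
        corrs.append(np.corrcoef(half1, half2)[0, 1])
    return float(np.mean(corrs))
```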
The processor 1905 can be any type of computer processor known in the art, and can include a plurality of processors and/or a plurality of processing cores. The processor 1905 can include a controller, a microcontroller, an audio processor, a graphics processing unit, a hardware accelerator, a digital signal processor, etc. Additionally, the processor 1905 may be implemented as a complex instruction set computer processor, a reduced instruction set computer processor, an x86 instruction set computer processor, etc. The processor 1905 is used to run the operating system 1910, which can be any type of operating system.
The operating system 1910 is stored in the memory 1915, which is also used to store programs, algorithms, network and communications data, peripheral component data, the advertisement assessment application 1930, and other operating instructions. The memory 1915 can be one or more memory systems that include various types of computer memory such as flash memory, random access memory (RAM), dynamic RAM, static RAM, a universal serial bus (USB) drive, an optical disk drive, a tape drive, an internal storage device, a non-volatile storage device, a hard disk drive (HDD), a volatile storage device, etc.
The I/O system 1920 is the framework which enables users and peripheral devices to interact with the computing system 1900. The I/O system 1920 can include a mouse, a keyboard, one or more displays, a speaker, a microphone, etc. that allow the user to interact with and control the computing system 1900. The I/O system 1920 also includes circuitry and a bus structure to interface with peripheral computing devices such as power sources, USB devices, peripheral component interconnect express (PCIe) devices, serial advanced technology attachment (SATA) devices, high definition multimedia interface (HDMI) devices, proprietary connection devices, etc. In an illustrative embodiment, the I/O system 1920 presents an interface to the user such that the user is able to control the system.
The network interface 1925 includes transceiver circuitry that includes a receiver and a transmitter, and that allows the computing system 1900 to transmit and receive data to/from other devices such as remote computing systems, servers, websites, etc. The network interface 1925 also enables communication through a network 1935, which can be one or more communication networks. The network 1935 can include a cable network, a fiber network, a cellular network, a wi-fi network, a landline telephone network, a microwave network, a satellite network, etc. The network interface 1925 also includes circuitry to allow device-to-device communication such as Bluetooth® communication.
The advertisement assessment application 1930 can include software in the form of computer-readable instructions which, upon execution by the processor 1905, performs any of the various operations described herein such as receiving image ads, segmenting the image ads to identify features, comparing identified features to known attributes and/or thresholds, determining a likelihood of success of the advertisement, running algorithms, solving equations, etc. The advertisement assessment application 1930 can utilize the processor 1905 and/or the memory 1915 as discussed above. In an alternative implementation, the advertisement assessment application 1930 can be remote or independent from the computing system 1900, but in communication therewith.
In general, predicting the success of image ads is a difficult task. A purpose of the proposed system is to simplify this task by considering several key constructs that play important roles in liking an image, in a decision to share it, and in remembering it. The inventors focused on such perceptual constructs as the typicality and complexity of an image and examined how they can affect the ease of understanding an image. The inventors also showed that these constructs are important in eliciting judgments of interest and enjoyment which, ultimately, are the key drivers in liking and sharing an image.
Although liking, sharing, and remembering image ads are a result of multiply determined processes, the proposed conceptual framework plays a significant role in this context. The framework allows multiple constructs to be studied jointly, which is necessary to understand the separate mediating and moderating roles of each construct. Moreover, the investigation and application of computational measures of the constructs instead of human ratings show that the framework is scalable, which as discussed herein, has significant implications for both theoretical and applied work.
The conceptual framework is informed by the vision and aesthetics literature as well as by numerous studies that examined subsets of the considered constructs in the context of image perception. Importantly, the approach ignores contextual features of ad perception and operates at the image level alone. The inventors focus on image features and not on consumer differences, past experiences, or other factors that can affect whether an image ad is liked, shared, or remembered. However, it is noted that this approach can allow for consumer-related sources of variation, although these sources were not considered in the work because many computational measures of image content are not sufficiently advanced to consider individual differences in perception. In sum, it is believed that the presented theoretical framework offers a rich set of opportunities to deepen our understanding of why image ads are liked, shared, and remembered and that this framework will ultimately advance practices in image-ad effectiveness research.
Any of the embodiments described herein can be implemented by a computing system that includes a processor, a memory, user interface, transceiver, etc. Specifically, any of the operations described herein can be implemented as computer-readable instructions stored in the memory. Upon execution of these computer-readable instructions by the processor, the computing system performs the actions described herein.
The word “illustrative” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “illustrative” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Further, for the purposes of this disclosure and unless otherwise specified, “a” or “an” means “one or more.”
The foregoing description of illustrative embodiments of the invention has been presented for purposes of illustration and of description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed, and modifications and variations are possible in light of the above teachings or may be acquired from practice of the invention. The embodiments were chosen and described in order to explain the principles of the invention and as practical applications of the invention to enable one skilled in the art to utilize the invention in various embodiments and with various modifications as suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims appended hereto and their equivalents.
The present application claims the priority benefit of U.S. Provisional Patent App. No. 63/281,996 filed on Nov. 22, 2021, the entire disclosure of which is incorporated herein by reference.