This invention relates to a technique for assessing users' responses to content in accordance with electro-dermal activity signals.
Assessing the reaction of viewers to content they consume has importance for a wide variety of applications. Examples of such applications range from movie recommendation systems, which utilize user reaction to obtain users' preferences, to market research, where content creators conduct surveys and focus groups with test audiences to predict the success of movie productions or ad campaigns. While these applications traditionally obtain explicit user feedback via ratings and survey forms, numerous factors constrain these traditional approaches for gathering user feedback. For example, existing movie recommendation systems ask viewers to provide only a single rating for the entire movie. Survey forms have space limitations and rely on viewer memory, which fades over time. Participation costs and time limitations constrain the use of focus groups. Thus, traditional approaches for gathering user feedback do not afford detailed (e.g., “fine grain”) user response to content.
The advent of wearable biometric sensors now enables capturing users' responses to content with much finer granularity than past techniques. Consumer electronic equipment like watches and fitness devices now include embedded biometric sensors for heart rate and Electro-Dermal Activity (EDA) for continuously monitoring the physiological responses of the user. Such consumer electronic equipment records EDA as the conductance between a pair of electrodes placed over a user's skin near concentrations of sweat glands, hereinafter referred to as Skin Conductance Response or SCR. An individual's EDA has a well-known correlation to brain activation from emotional reactions to stimulus, which causes sudomotor neuron bursts and results in the expulsion of sweat from eccrine glands, causing conductance variations across the individual's skin.
Scientists have studied the psychological correlation between an individual's emotional reactions and resultant changes in EDA since the early 20th century. Signals generated from EDA provide a rich source of implicit feedback useful for inferring individuals' reactions to content at various granularities. Unfortunately, no straightforward method presently exists for direct inference of user opinion of content using EDA signals. Current approaches suffer from several important challenges. Signals obtained from EDA carry noise, and stimuli that are not part of the content (e.g., distractions in the environment) will adversely affect such signals. Additionally, the responses contained within the signals vary considerably based on the type of stimuli. Further, such responses depend on the individual's physiological and psychological state. Various other factors also complicate EDA signal interpretation, such as potentially overlapping events, attenuation of event activity amplitude for repeated stimulus, varying sweat burst responses, and, underlying these factors, slowly varying skin conductance levels.
Thus, a need exists for a technique for assessing fine-grain user responses from EDA signals.
Briefly, in accordance with the present principles, a method for determining user responses to content commences by collecting Electro-Dermal Activity (EDA) signals from a user via a collection system as the user consumes (e.g., views) the content. From the collected EDA signals, the amplitudes of the users' responses are extracted at particular times. The extracted amplitudes undergo processing with demographic information for the user and parameters of the collection system obtained during training to predict feedback of the user to the content.
The system 10 of
As discussed in detail hereinafter, the system 10, in accordance with another aspect of the present principles, can process multiple streams of EDA signals from individuals as they consume content. The system 10 can capture these streams in parallel for real-time analysis for a whole audience who consume the content simultaneously, or during multiple sessions with separate groups of individuals for offline analysis. Stream synchronization occurs using external methods (e.g., marking the EDA signals) with reference to a known event, such as the beginning of the movie.
Referring to
At this point, the system 10 now has for each user: (1) demographic information; (2) extracted and aggregated EDA responses collected with respect to the stimulus (e.g., the consumed content); and (3) known explicit user feedback. Using the aggregated EDA signal amplitudes from the blocks 141-14N, the system 10 establishes a set of parameters p for a set of ensemble classification trees at block 16 to predict content ratings from EDA signals collected from users. The block 16 typically corresponds to one or more processing cycles of the processor but could comprise a separate hardware element.
Each classification tree constitutes a model that predicts a value of a target variable based on the value of various input variables. Each tree has one or more interior nodes, each node corresponding to an input variable. Each node has one or more edges (branches) that represent paths taken in the tree based on the value of the input variable at that node. Each path terminates at a “leaf” that represents the value of a target variable resulting from the value of the input variable. In accordance with an aspect of the present disclosure, the system 10 thus trains itself, thereby creating the ensemble classification parameters (p) by learning from: (1) demographic information; (2) extracted and aggregated EDA responses collected with respect to the stimulus; and (3) known explicit feedback of that user. Using the trained parameters (p), the system 10 can determine subsets of variables (i.e., aggregated EDA user responses and demographics) relevant for discriminating among explicit user feedback.
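By way of illustration only, the node, edge, and leaf structure described above can be sketched as follows. The feature names (`bin1_energy`, `age`), the thresholds, and the rating labels are hypothetical and do not come from the trained parameters (p) of the system 10; the sketch also assumes binary splits, whereas the description allows one or more edges per node.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # value of the target variable at the end of a path


@dataclass
class Node:
    feature: str                # input variable tested at this interior node
    threshold: float            # hypothetical split point between the two edges
    low: Union["Node", Leaf]    # edge followed when the value is below the threshold
    high: Union["Node", Leaf]   # edge followed otherwise


def predict(tree, sample):
    """Follow edges from the root until a leaf is reached, then return its label."""
    while isinstance(tree, Node):
        tree = tree.low if sample[tree.feature] < tree.threshold else tree.high
    return tree.label


# Illustrative tree only: low early response energy leads to a low rating.
tree = Node(
    "bin1_energy", 0.5,
    low=Leaf("low rating"),
    high=Node("age", 40.0, low=Leaf("high rating"), high=Leaf("medium rating")),
)
```

A call such as `predict(tree, {"bin1_energy": 0.9, "age": 30})` walks two interior nodes before reaching a leaf.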
In accordance with another aspect of the present principles, the system 10 of
In accordance with another aspect of the present principles, the system 10 can make use of a user's (1) EDA signals, and (2) demographic information, along with (3) learned system parameters to infer unknown explicit feedback of a user for whom the system 10 has only collected EDA signals. To better understand the manner in which the system 10 makes such an inference, refer to
The method of
Equation 1 can parameterize the specific dictionary basis functions as follows:
such that λ1 relates to the geometric decay of the impulse, λ2 constitutes the log-linear decay slope, and t0 corresponds to the response start. From empirical examination of the EDA signals, the system 10 constructs the signal dictionary D using all signals for the parameter space:
$\lambda_1 \in \{1.1, 1.25, 1.5, 1.75, 2, 2.5, e\}$,
$\lambda_2 \in \{0.3, 0.5, \ldots, 3.7, 3.9\}$. (Equation 2)
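As a rough sketch, the shape-parameter grid of Equation 2 can be enumerated as shown below. The step of 0.2 for λ2 is inferred from the listed values (0.3, 0.5, ..., 3.7, 3.9) and should be treated as an assumption. The variable names are illustrative, and the start time t0 is omitted because its range depends on the length of the recorded signal.

```python
import math
from itertools import product

# Values of lambda_1 as listed in Equation 2 (the last entry is Euler's number e).
lambda1_values = [1.1, 1.25, 1.5, 1.75, 2.0, 2.5, math.e]

# Assumed step of 0.2: 0.3, 0.5, ..., 3.7, 3.9. Built from an integer index and
# rounded so that floating-point drift does not accumulate.
lambda2_values = [round(0.3 + 0.2 * k, 1) for k in range(19)]

# Every (lambda_1, lambda_2) pair; each pair would be combined with each
# candidate start time t0 to form one dictionary signal.
shape_grid = list(product(lambda1_values, lambda2_values))
```

Under that assumed step, the grid holds 7 × 19 = 133 shape combinations before start times are taken into account, which suggests why the dictionary becomes large for long signals.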
To represent each EDA signal from this large collection of dictionary signals requires solving a standard linear inverse problem. Unfortunately, using ordinary least squares approaches will consume very large amounts of memory for large dictionaries, and will also destroy the inherent desired sparsity of the SCR event process. Using an orthogonal matching pursuit technique (a greedy algorithm) to resolve the set of dictionary components that best describe the observed EDA trace will avoid such limitations.
This matching procedure begins with the raw EDA signal, rx, a signal component dictionary D (constructed using the equation above), and an empty constructed dictionary, $\hat{D} = \{\}$. During step 304, the system 10 initializes the residual signal r with the high-pass filtered EDA signal x, such that r = x. During step 306, the system 10 determines the single dictionary component that best fits the observed EDA signal using the relationship set forth in Equation (3):
During step 308, the system 10 updates the dictionary by adding this dictionary component to the inferred dictionary:
$\hat{D} = \{\hat{D}, \hat{d}\}$. (Equation 4)
During step 310, the system 10 removes contributions of this dictionary component from the observed EDA signal, creating a new residual signal in accordance with Equation 5:
$r = x - \hat{D}(\hat{D}^{T}\hat{D})^{-1}\hat{D}^{T}x$. (Equation 5)
This process repeats for a specified number of iterations by first incrementing a time value t by unity during step 312 and then determining during step 314 whether the value of t exceeds a maximum time value Tmax. If so, the process ends. If not, the process 300 branches to step 306. Performing the desired number of iterations thus yields a collection of dictionary components that fits to the observed signal. In summary, for each EDA signal of a given user, the adaptive decomposition approach of the process 300 executed by the system 10 yields a collection of user reaction dictionary components, represented by a set of time offsets (the time-start of each occurrence of a dictionary component) and the coefficient amplitudes of the user response events, respectively.
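The loop of steps 306 through 314 can be sketched along the following lines. This is a minimal illustration under stated assumptions, not the implementation used by the system 10. The selection rule (largest absolute inner product with the residual) is the conventional orthogonal matching pursuit choice and is assumed here. The function `toy_dictionary` is a hypothetical stand-in for the basis functions of Equation 1; it uses plain decaying impulses so that the example is self-contained.

```python
import numpy as np


def toy_dictionary(T, decay=0.5):
    """Hypothetical placeholder dictionary: one unit-norm decaying impulse per
    start time t0. It does NOT reproduce the basis functions of Equation 1."""
    t = np.arange(T)
    atoms = []
    for t0 in range(T):
        atom = np.where(t >= t0, np.exp(-decay * (t - t0)), 0.0)
        atoms.append(atom / np.linalg.norm(atom))
    return np.stack(atoms, axis=1)  # shape (T, T), one column per component


def omp(x, D, n_iter):
    """Greedy orthogonal matching pursuit over the columns of D.

    Returns the indices of the selected components and the final residual."""
    r = x.copy()                      # step 304: residual starts as the signal
    selected = []
    for _ in range(n_iter):           # steps 312/314: fixed iteration count
        scores = np.abs(D.T @ r)      # step 306 (assumed rule): best match to r
        if selected:
            scores[selected] = 0.0    # never pick the same component twice
        selected.append(int(np.argmax(scores)))
        D_hat = D[:, selected]        # step 308: grow the constructed dictionary
        beta, *_ = np.linalg.lstsq(D_hat, x, rcond=None)
        r = x - D_hat @ beta          # step 310: Equation 5 via least squares
    return selected, r


# Usage: a synthetic trace made of two components is decomposed in two iterations.
D = toy_dictionary(50)
x = 2.0 * D[:, 10] + 1.0 * D[:, 30]
selected, residual = omp(x, D, n_iter=2)
```

Solving the least-squares problem with `np.linalg.lstsq` yields the same residual as the explicit projection in Equation 5 while avoiding the formation of the matrix inverse.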
The system 10, as thus described, addresses the challenge of obtaining fine-grain user responses by using electro-dermal activity (EDA) signals of users consuming content and accurately mapping such signals to self-reported explicit feedback provided by such users. This approach not only improves existing approaches to calibrate audience feedback, but also enables a range of new applications such as indexing and searching individual content, and providing content recommendation systems that can propose content that best matches the physiological state of the user. To this end, the system 10 advantageously decomposes raw EDA signals (rx) into responses that accurately pinpoint the times and intensities of viewer responses to the stimuli in the content. Further, the system 10 provides a machine-learning framework that uses the EDA responses to accurately predict the explicit feedback provided by a user.
In accordance with another aspect of the present principles, the system 10 can advantageously characterize the changes in user electro-dermal activity (EDA) as such users respond to stimuli during content consumption. In this regard, the system 10 can accurately map implicit EDA feedback to the explicit feedback provided by the viewers in the form of ratings and survey forms. To that end, the system 10 can make use of one or more EDA sensors, such as the EDA sensor 500 of
The system 10 of
In accordance with another aspect of the present principles, the system 10 has the capability of analyzing user EDA signal responses to stimuli (e.g., content viewing).
In accordance with another aspect of the present principles, the system 10 can advantageously predict explicit feedback from EDA signals and address the problem of assessing user reactions to stimulus (e.g., viewing content) using EDA signals. In contrast to other approaches that focus on isolated experiments on individual users, the system 10 advantageously provides concurrent, audience-level evaluation of SCR events previously decomposed by the signal processing method described above.
In accordance with another aspect of the present principles, the system 10 advantageously processes EDA signals collected from viewers consuming (e.g., viewing) different types of audio-video content. In particular, the system 10 has successfully collected EDA signals from an audience at scale in an environment with minimal distractions from external stimuli. In this regard, the system 10 has collected data in commercial movie theaters while audience members viewed feature-length films. The controlled temperature, lighting and immersive nature of a movie theater enabled measuring EDA signals that mainly represented user reaction to stimuli in the movie. In addition to EDA signals, the system 10 collected explicit feedback from the audience for mapping the implicit feedback in EDA responses to the explicit feedback.
As mentioned previously,
As discussed above, the system 10 of
During each data collection operation described above, the system 10 obtains raw EDA signals from the users wearing sensors, such as the sensor 500 depicted in
The EDA signals depicted in
An example of the results obtained during an exemplary second data collection operation appears in Table 1 below. The data collection operation represented in Table 1 resulted from three separate audiences viewing three feature-length films labeled A through C herein. The movies A-C had different genres (e.g., drama, thriller, foreign) to avoid limiting the scope of data collection to genre-specific phenomena. Participants in the data collection operation comprised individuals solicited from the movies' regular audiences who signed a consent form before participating.
Table 2 shows the demographics of the participants of each screening.
In addition to the audience-wide EDA signals collected for implicit audience feedback, participants were also asked to provide explicit feedback at the end of each movie screening. The explicit feedback provided input data that enabled mapping the implicit feedback in the EDA signals to the explicit feedback. The collection of explicit feedback entailed distributing survey forms that asked the participants to provide: (1) their gender and age, and (2) an overall rating for the movie based on a 5-point scale. The survey left interpretation of what this rating implied (e.g., enjoyment, engagement, etc.) to the user's discretion.
Advantageously, the system 10 of the present principles makes use of an adaptive decomposition methodology which processes raw EDA signals to extract precise SCR events showing exactly when and how much the viewer responds to a stimulus. As depicted in
In accordance with the present principles, the system 10 addresses the aforementioned problems by performing signal decomposition that automatically adapts to the variations in the user's physiology. The signal decomposition performed by the system 10 takes account of the varying DC component of each user's signal. Often called the “tonic” signal, this component corresponds to the user's physiological response to sweat saturation-levels of the user's skin and has little correlation with the underlying fine-scale user reactions of interest. As discussed previously in connection with the flow chart of
The specific dictionary basis functions can be parameterized by:
such that λ1 relates to the geometric decay of the impulse, λ2 is the log-linear decay slope, and t0 is the response start. From empirical examination of EDA signals, the system 10 constructs the signal dictionary, D, using all signals dλ
$\lambda_1 \in \{1.1, 1.25, 1.5, 1.75, 2, 2.5, e\}$, (2)
$\lambda_2 \in \{0.3, 0.5, \ldots, 3.7, 3.9\}$. (3)
Specifically, this matching pursuit procedure begins with the high-pass filtered EDA signal x, a signal component dictionary D constructed using Equation 1, and an empty constructed dictionary $\hat{D} = \{\}$. First, the system 10 determines the single dictionary component ($\hat{d} \in D$) that best fits the observed EDA signal:
The system 10 adds this dictionary component to the constructed dictionary, $\hat{D} = \{\hat{D}, \hat{d}\}$, and then removes the contributions of this dictionary component from the observed EDA signal, creating a new residual signal:
$r = x - \hat{D}(\hat{D}^{T}\hat{D})^{-1}\hat{D}^{T}x$. (5)
The system 10 repeats this process using the residual signal (i.e., setting x=r) for a specified number of iterations.
After completing the desired number of iterations, the system 10 obtains a collection of dictionary components that fits the observed signal. Using standard least squares, the system 10 calculates the best coefficient vector β such that the observed EDA signal is represented by a combination of elements from the constructed dictionary, $x \approx \hat{D}\beta$, where the amplitudes of the non-zero elements of β correspond to the intensities of the user's reactions.
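A minimal sketch of this final least-squares step follows, assuming the constructed dictionary is already available as a matrix whose columns are the selected components. The function name `fit_amplitudes` and the small numeric matrix are illustrative only.

```python
import numpy as np


def fit_amplitudes(x, D_hat):
    """Return the coefficient vector beta that minimizes ||x - D_hat @ beta||."""
    beta, *_ = np.linalg.lstsq(D_hat, x, rcond=None)
    return beta


# Illustrative constructed dictionary with two components (columns).
D_hat = np.array([
    [1.0,   0.0],
    [0.5,   0.0],
    [0.25,  1.0],
    [0.125, 0.5],
])

# A synthetic signal built from known amplitudes, so the fit can be checked.
x = D_hat @ np.array([3.0, 0.5])
beta = fit_amplitudes(x, D_hat)
```

Each entry of `beta` then plays the role of the reaction intensity attached to the corresponding dictionary component.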
In summary, for each EDA signal, the adaptive decomposition approach performed by the system 10 returns $\{t_i, s_i\}$, namely the set of time offsets (i.e., the time-start of each SCR event) and the coefficient amplitudes of the SCR events (i.e., the intensity of each SCR event), respectively.
As discussed previously, the system 10 advantageously accomplishes machine learning to predict explicit feedback of users to content (e.g., movie ratings) from the decomposed SCR events provided by an EDA signal decomposition in accordance with the present principles. The ground-truth data of ratings for the movie comes from the user surveys taken immediately following content consumption (e.g., film viewing).
The prediction accuracy of the system 10 was compared to the accuracy achieved by using the demographic information provided by the users, e.g., age and gender information provided by a set of the study participants. Table 2 summarizes the results of such a study for thirty-four study participants along with their demographic information for three films. While the comparison against demographic information may seem naive, movie studios produce feature-length films refined to target specific demographic groups. Therefore, an expectation exists for a large correlation between demographics and the resulting user responses to the films.
In the course of decomposing the SCR data of users, the system 10 obtains time-stamp and coefficient values of the SCR events for each user of length T (where T>>N). From this information, the system 10 constructs an [N×T]-implicit user response matrix S, such that the matrix element, Si,t
To mitigate this inherent sparsity in the user response matrix S, the system 10 extracts the coarse-scale user response information by aggregating the information into a reduced number of time-aggregated bins. For each time bin, the system 10 records the sum of SCR coefficient energies for that time period. For the experiments described above, the system 10 combined the user SCR events over the course of the entire stimulus into five equal-sized bins, denoting the aggregated [N×5] user response matrix as SA.
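The time aggregation described above can be sketched as follows. The sketch assumes that the "energy" of an SCR event is its squared coefficient amplitude, which the description does not define explicitly. The function name `aggregate_bins` and its parameters are illustrative.

```python
import numpy as np


def aggregate_bins(times, amplitudes, duration, n_bins=5):
    """Sum SCR event energies into n_bins equal-width time bins.

    times      : start time of each SCR event (same unit as duration)
    amplitudes : coefficient amplitude of each SCR event
    duration   : total length of the stimulus
    Energy is taken as amplitude squared (an assumption of this sketch)."""
    times = np.asarray(times, dtype=float)
    energy = np.asarray(amplitudes, dtype=float) ** 2
    edges = np.linspace(0.0, duration, n_bins + 1)
    # Weighted histogram: each event adds its energy to the bin containing it.
    sums, _ = np.histogram(times, bins=edges, weights=energy)
    return sums


# One user's events over a 100-unit stimulus, reduced to one row of S_A.
row = aggregate_bins([5, 15, 25, 95, 100], [1, 2, 3, 1, 2], duration=100)
```

Stacking one such row per user would give the aggregated [N×5] matrix SA.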
Combining the user response matrix SA with the user demographic information yields a complete response matrix, SC=[SA C]. The matrix C comprises an [N×2] matrix constructed from the element $C_{i,1}$, the gender of the user $u_i$, and the element $C_{i,2}$, the age of the user $u_i$.
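As a small illustration, the column-wise concatenation SC=[SA C] might look as follows. The numeric values are invented for the example, and the encoding of gender as 0 or 1 is an assumption, since the description does not specify an encoding.

```python
import numpy as np

# Aggregated response matrix S_A: N = 2 users, five time bins each (made-up values).
S_A = np.array([
    [5.0, 9.0, 0.0, 0.0, 5.0],
    [1.0, 0.0, 2.0, 4.0, 0.0],
])

# Demographic matrix C: column 0 is gender (assumed 0/1 code), column 1 is age.
C = np.array([
    [0, 34],
    [1, 52],
])

# Complete response matrix S_C = [S_A C], one row per user, seven columns.
S_C = np.hstack([S_A, C])
```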
To solve the problem of inferring explicit user feedback information (e.g., film ratings), the system 10 will classify the decomposed user responses, SC, using bagged classification trees. Bagged classification trees enable the system 10 to learn an ensemble of simple tree classifiers over multiple subsamples of a held-out training set. Specifically, to classify a particular user's rating, the system 10 uses leave-one-out cross validation such that the EDA signals from the remaining users serve as training data only. From this collection of training data, the system 10 chooses a random subsample of training users and learns a single classification tree with respect to that training subset ground truth. For example, the system 10 may learn that if the response energy in the first time bin lies below a learned value, then the user will rate the film poorly. During each iteration, the system 10 will learn weights with respect to the classification accuracy on the training set in addition to learning the classification tree. Ultimately, the system 10 applies a weighted combination of all the learned trees to the specified test user data to classify the underlying explicit feedback for that user. The system 10 performs this bagged classifier approach on both the processed EDA data (the matrix SC) and the demographics-only information (the matrix C).
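The bagged, leave-one-out procedure described above can be approximated by the following sketch. Several simplifications are assumed. Each "tree" is a depth-one stump with a single threshold. Each tree's weight is its accuracy on the training users. Class labels are non-negative integers. All function names, the subsample fraction, and the number of trees are hypothetical choices for illustration, not values taken from the system 10.

```python
import numpy as np


def fit_stump(X, y):
    """Learn a depth-one tree: the feature/threshold split with the best accuracy."""
    best = None
    for j in range(X.shape[1]):
        vals = np.unique(X[:, j])
        for thr in (vals[:-1] + vals[1:]) / 2:       # midpoints between values
            left, right = y[X[:, j] < thr], y[X[:, j] >= thr]
            l_lab, r_lab = np.bincount(left).argmax(), np.bincount(right).argmax()
            acc = (np.sum(left == l_lab) + np.sum(right == r_lab)) / len(y)
            if best is None or acc > best[0]:
                best = (acc, j, thr, l_lab, r_lab)
    if best is None:                                 # no split possible: majority class
        lab = np.bincount(y).argmax()
        return (0, -np.inf, lab, lab)
    return best[1:]


def stump_predict(stump, X):
    j, thr, l_lab, r_lab = stump
    return np.where(X[:, j] < thr, l_lab, r_lab)


def bagged_fit(X, y, n_trees=25, frac=0.8, seed=0):
    """Learn one stump per random subsample, weighted by training accuracy."""
    rng = np.random.default_rng(seed)
    n = len(y)
    ensemble = []
    for _ in range(n_trees):
        idx = rng.choice(n, size=max(2, int(frac * n)), replace=True)
        stump = fit_stump(X[idx], y[idx])
        weight = np.mean(stump_predict(stump, X) == y)
        ensemble.append((weight, stump))
    return ensemble


def bagged_predict(ensemble, x_row, n_classes):
    """Weighted vote of all learned stumps for a single user."""
    votes = np.zeros(n_classes)
    for weight, stump in ensemble:
        votes[stump_predict(stump, x_row[None, :])[0]] += weight
    return int(votes.argmax())


def leave_one_out_accuracy(X, y, n_classes):
    """Hold out each user in turn, train on the rest, and score the prediction."""
    hits = 0
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        ensemble = bagged_fit(X[mask], y[mask])
        hits += int(bagged_predict(ensemble, X[i], n_classes) == y[i])
    return hits / len(y)
```

Running `leave_one_out_accuracy` once on the rows of SC and once on the rows of C alone would mirror the comparison between the EDA-based and demographics-only classifiers.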
The foregoing describes a technique for assessing users' responses to content in accordance with electro-dermal activity signals.
This application claims priority under 35 U.S.C. 119(e) to U.S. Provisional Patent Application Ser. No. 61/839,669 filed Jun. 26, 2013, the teachings of which are incorporated herein.
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/US2014/022275 | 3/10/2014 | WO | 00 |
Number | Date | Country | |
---|---|---|---|
61839669 | Jun 2013 | US |