Companies' online interfaces (websites and apps) are becoming increasingly important to their revenue and profit streams. As a result, company strategy is shifting away from traditional brick-and-mortar stores and toward serving clients online through websites and apps. Unlike a traditional in-store journey, where a retailer can engage one-on-one with the shopper, the online shopping experience makes it crucial to identify behavioral trends remotely.
Accordingly, there is a need to identify the points along a consumer's online journey that create either attachment to, or abandonment of, the online visit and the overall long-term relationship with the company.
This invention is based on the principle of multi-modal data capture to identify points of interest along the human-computer interaction journey. It focuses on accessing computer and mobile phone webcams (with user consent) and capturing high-resolution video of the person's face.
Capturing this video allows the tool to identify, each on its own merits: 1. the user's emotions, by analyzing hundreds of points on the person's face to ascertain the emotions associated with the event; and 2. the movement of the person's eyes, through remotely generated eye-tracking, allowing important visual-attention cues along the person's journey to be understood. While each of these metrics has value on its own for assessing someone's experience, our invention combines the two measures into a single unique moment, compounding the individual findings into a stronger, more predictive understanding of that moment.
Understanding how a person reacts to a moment is crucial to creating better experiences and gathering more meaningful insights into how the respondent was impacted by the event. A person cannot report how they feel about something without personal biases coming into play, which renders their account of events loose and of limited value.
In the past, consumer research has relied almost exclusively on System 2 (the mind's slower, analytical mode, where reason dominates) to identify how a customer truly feels about a brand and the experiences it delivers. Over the years, science has shown that people cannot accurately report on their own experiences without many mitigating factors emerging, leaving the self-reported data without the validity needed to derive reliable insights.
Over time, as the technology matured and research into System 1 (our faster, automatic, intuitive, and emotional mode of thinking) progressed, the validity of tracking System 1 in consumer behavior has grown, and it has become a valuable component in identifying insights into consumer experience. Two of the most widely used technologies for identifying System 1 responses in consumer behavior have been driven by the growth of eye-tracking and facial emotion detection.
Having been around since the early 1900s, eye-tracking has become an important way to identify paths along a consumer journey and interaction. Normally conducted in labs, due to the complexity of the equipment and the need for controlled conditions, eye-tracking on its own can deliver insights that allow researchers to identify trends in experiences.
By analyzing micro-movements in people's faces, science has been able to identify a set of basic emotions being felt by a participant. Cheek movement, eyebrows, the forehead, and other parts of the face are analyzed during the interaction. The output is insight into how a person's emotions are actually affected by a given event.
The disclosure herein is an improvement over the status quo of using unimodal biometric responses, such as eye-tracking or facial emotion, to gain insight into a person's reaction to an event. Our solution combines remotely captured eye-tracking and facial emotion data and uses the combined, time-aligned responses to identify when an abnormal occurrence is happening. An abnormal occurrence is one in which both independently captured biometric responses show an upper-percentile (for example, upper 75th percentile) or lower-percentile (for example, lower 10th percentile) output at the same time. This allows understanding of, and insights into, that moment to be based on both facial emotion and eye-tracking data rather than either one alone.
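As a minimal illustration only (not the claimed implementation), assuming per-epoch gaze-distance and emotional-valence series are already available (as defined in the following paragraphs), a percentile rule of the kind described above could be sketched as follows; the threshold values, function name, and variable names are assumptions for illustration.

```python
import numpy as np

def flag_abnormal_epochs(distance, valence, upper_pct=75, lower_pct=10):
    """Flag epochs where BOTH biometric signals are simultaneously extreme.

    distance: per-epoch total gaze distance (1D array-like)
    valence:  per-epoch average emotional valence (1D array-like)

    An epoch is treated as an abnormal occurrence when both signals fall in
    the upper percentile (e.g. >= 75th) or both fall in the lower percentile
    (e.g. <= 10th) at the same time.
    """
    distance = np.asarray(distance, dtype=float)
    valence = np.asarray(valence, dtype=float)

    # Percentile thresholds computed over the whole recording
    d_hi, d_lo = np.percentile(distance, [upper_pct, lower_pct])
    v_hi, v_lo = np.percentile(valence, [upper_pct, lower_pct])

    both_high = (distance >= d_hi) & (valence >= v_hi)
    both_low = (distance <= d_lo) & (valence <= v_lo)

    # Indices (epoch numbers) of the abnormal moments
    return np.flatnonzero(both_high | both_low)
```

The returned epoch indices can then be mapped back to timestamps in the session recording to inspect what the participant was seeing at each flagged moment.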
For each epoch of one second (to be validated empirically by Cube, based on the sampling frequency, in Hertz, to ensure that enough data points are available for each epoch), we propose to calculate the total gaze distance during that epoch (distance), as well as the average emotional valence. This distance is simply the sum of the Euclidean distances between consecutive data points within the epoch. For example, if there are K data points in one epoch, each at coordinates (U_i, V_i), i = 1, . . . , K, then the total distance is given by

distance = Σ_{i=1}^{K-1} √((U_{i+1} - U_i)² + (V_{i+1} - V_i)²)
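A minimal sketch of this per-epoch calculation, assuming gaze samples arrive as (U, V) screen coordinates and each sample carries a valence score (the function and parameter names are illustrative, not part of the disclosure):

```python
import math

def epoch_metrics(gaze_points, valence_scores):
    """Total gaze distance and mean valence for one 1-second epoch.

    gaze_points:    list of (U, V) coordinates, K samples in the epoch
    valence_scores: list of K facial-emotion valence values for the same epoch
    """
    # Sum of Euclidean distances between consecutive gaze samples
    distance = sum(
        math.dist(gaze_points[i], gaze_points[i + 1])
        for i in range(len(gaze_points) - 1)
    )
    # Average emotional valence over the epoch
    avg_valence = sum(valence_scores) / len(valence_scores)
    return distance, avg_valence
```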
Building on previous literature, we hypothesize that distance supports a psychophysiological inference of the user's hesitation in a given interaction. The greater the distance, the more the participant has hesitated to perform the interaction (e.g., unsure what to focus on to complete a task, or looking for specific information).
This calculation yields a number of epochs for every participant, one (distance, valence) coordinate per epoch. For example, for a given participant, a 60-second recording yields 60 data coordinates of distance and valence to be plotted.
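For illustration only, assuming timestamped gaze and valence samples are available for a full recording, the per-epoch series could be produced as in the sketch below; the one-second epoch length and the sample layout are assumptions.

```python
from collections import defaultdict
import math

def epoch_series(samples, epoch_len=1.0):
    """Group samples into fixed-length epochs and compute (distance, valence) per epoch.

    samples: iterable of (timestamp_seconds, u, v, valence) tuples
    Returns a list of (distance, avg_valence) pairs, one per epoch;
    e.g. a 60-second recording yields 60 coordinates to plot.
    """
    epochs = defaultdict(list)
    for t, u, v, val in samples:
        epochs[int(t // epoch_len)].append((u, v, val))

    series = []
    for idx in sorted(epochs):
        pts = epochs[idx]
        # Path length of the gaze within the epoch
        distance = sum(
            math.hypot(pts[i + 1][0] - pts[i][0], pts[i + 1][1] - pts[i][1])
            for i in range(len(pts) - 1)
        )
        avg_valence = sum(p[2] for p in pts) / len(pts)
        series.append((distance, avg_valence))
    return series
```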
Given a data collection from n participants, the dataset will allow identification of moments of interaction that deviate from the ideal experience:
Based on our experience, the insights can be classified into the following categories:
Number | Date | Country
---|---|---
63153008 | Feb 2021 | US