With the emergence of eCommerce, global retailers are in a race to modernize the shopping experience. Firms that have historically operated on-line seek to play to their advantages of selection, convenience, and web-based analytics to overcome the high volume of returns, extreme comparison shopping, high-tech fraud, and the propensity of many customers to abandon purchases in the final stages. Similarly, historically store-based retailers seek to exploit their advantages of immediate gratification, merchandise interaction, and human relationships to counter eCommerce sales erosion, infrastructure costs, and inventory loss. Both are rapidly moving to integrate the best aspects of on-line and in-store shopping, and over time these models will continue to converge, influenced by additional forces such as the rise of 5G networks and shifting population demographics. Tools that help integrate the on-line and in-store shopping experience will be central to realizing this transition.
Fueled by large on-line retailers like Amazon, retail analytics has become a significant global market segment, valued at $3 billion in 2018 and expected to grow to over $8 billion by 2024. Typical products in the space might include chat bots for customer care, application of machine learning to Customer Relationship Management (CRM) data, machine vision for fraud prevention, and predictive ordering to minimize inventory costs.
The present technology will now be described with reference to the figures, which in general relate to the generation of personal trait feature vectors using cameras or other sensors to sense physical attributes of one or more people. The feature vectors may then be analyzed using artificial intelligence algorithms to both define persona groups within which different feature vectors fit, as well as to match a given feature vector in real-time to the plurality of economically-significant personas that it most closely resembles.
A persona group, or simply persona, is a fictional character or characters created to represent a user type that might use a site, brand, or product in a similar way. Marketers may use personas together with market segmentation, where the qualitative personas are constructed to be representative of specific segments. The present technology uses personas rather than uniquely-identifying user accounts to derive useful insights from the feature vectors created by the in-store sensors and a Person Matching and Profiling Service as explained below. These personas are usually defined based on a clustering of actual profile data derived from the Person Matching and Profiling Service over time and will likely be enriched to reflect revenue targets for this subset of customers and products likely to appeal to them.
It is understood that the present technology may be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete and will fully convey the technology to those skilled in the art. Indeed, the technology is intended to cover alternatives, modifications and equivalents of these embodiments, which are included within the scope and spirit of the technology as defined by the appended claims. Furthermore, in the following detailed description of the present technology, numerous specific details are set forth in order to provide a thorough understanding of the present technology. However, it will be clear to those of ordinary skill in the art that the present technology may be practiced without such specific details.
Augmented Video Profile Feature Vector Generation
The augmented video profile feature vector (or simply feature vector) will now be explained with reference to
Common implementations of machine vision might focus on facial recognition for security applications, or time-and-space motion tracking for industrial applications. For the present technology, at least three distinct domains of machine vision are applied simultaneously to develop the augmented version of the feature vectors Fya. In embodiments, these are:
Facial Recognition: In general, these algorithms operate by measuring distinctive attributes about a subject's face, such as relative size, shape and relative orientation of key facial features. These algorithms can be sub-divided into geometric approaches that focus on distinguishing characteristics or photometric approaches which distill the image into a set of distinguishing metrics. This approach uses photometrics to create a vector of measurements as well as a composite metric referred to as the faceID used to identify the record on an ongoing basis. Vectors with a very similar faceID will have many facial characteristics in common and so will likely be of the same person.
Fashion Feature Extraction: In contrast to other machine vision approaches, the present disclosure augments facial information with precise observations about the subject's style and grooming preferences. These details are extracted by fine feature detectors and then subjected to one or more deep learning algorithms trained to identify various items of clothing at varying levels of specificity based on resolution, down to the type (brown pants), cut (boot-cut, athletic cut, etc.), and brand (Dockers, Gap, etc.). This deep learning is structured based on a multi-tier hierarchical data model created for this technology referred to as the fashion genome, with an extensive library of imagery for each distinction allowing for progressively more detailed disclosures about an object based on available resolution (i.e. from “brown pants” to “athletic cut Dockers in Signature Khaki”). This genomic model is extended to include key grooming attributes such as hair style, luxury accessories, team logos, and body markings. Each isolated observation is appended to the feature vector as a distinguishing attribute of that visitor.
Demographic Categorization: Other machine vision techniques can be applied to assign other distinguishing attributes to the visitor, such as height, gender, age-range, and skin tone. These can be estimated using deep learning based on libraries of reference imagery or by real-time comparison with known objects in a scene (say, a person walking by a display of known height to precisely estimate the height of the visitor). By taking fashion features into consideration a general assessment of the visitor fashion aesthetic can be developed—say “urban minimalist,” “athletic casual,” “business formal.” Similar to Fashion features, the level of detail will vary by resolution, with all successfully measured features added to the resulting feature vector and referenced by faceID.
This multifaceted analysis will be conducted recursively for as long as the visit continues, with additional observations added and previous features amended based on improved data. Metadata such as arrival time, departure time, and dwell time in certain locations may also be added to the augmented feature vector.
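The structure of such an augmented feature vector, and the recursive amendment of its features over the course of a visit, might be sketched as follows (a hypothetical illustration only; the field and method names are assumptions, not part of the present disclosure):

```python
from dataclasses import dataclass, field

@dataclass
class AugmentedFeatureVector:
    """Hypothetical container for the augmented video profile feature vector."""
    face_id: str                                          # composite photometric metric
    fashion_features: dict = field(default_factory=dict)  # e.g. {"pants": "brown pants"}
    demographics: dict = field(default_factory=dict)      # e.g. {"height_cm": 178}
    dwell_times: dict = field(default_factory=dict)       # location -> seconds (metadata)
    arrival_time: float = 0.0
    departure_time: float = 0.0

    def amend(self, domain: str, key: str, value) -> None:
        """Add a new observation, or revise a prior one as improved data arrives."""
        getattr(self, domain)[key] = value

# Recursive augmentation during a visit: later, higher-resolution data amends earlier features.
fv = AugmentedFeatureVector(face_id="f_0042")
fv.amend("fashion_features", "pants", "brown pants")
fv.amend("fashion_features", "pants", "athletic cut Dockers in Signature Khaki")
```

The same `amend` path serves both cases described above: appending a new distinguishing attribute and overwriting a previous estimate once better imagery becomes available.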
Persona-Based Empirical Data
Data-driven customer personas will be utilized to derive valuable insights from the augmented feature vectors associated with actual human visitors.
A “persona” as used herein can be described as a fictional character defined in substantive detail to assist with the holistic design of a product or service encompassing multiple domains of human concern. A conventional application of user personas might be in software product development, where the framing of the multiple concerns representative of a typical user has been shown to greatly improve the usability of resulting products. A long-standing issue regarding the applicability of personas is that they don't represent “actual customers” and so have limited applicability to sustaining operations like sales and marketing. This is underscored by the fact that the “demographics” associated with most personas are loosely assumed in order to establish consistency with practical attributes like the persona's motivation to purchase a new product, their specific goals, and assumed relevant competency. Actual insight into the true demographics of the broad user base is generally limited to focus groups or self-reported surveys.
In the context of machine vision, the actual demographics of customers can be directly observed, and their goals can be strongly inferred by their shopping activity within the store or venue. It is their motivations and competencies that must be approximated, but this challenge can be mitigated to a large extent by the intrinsic motivations that draw all humans from time to time to the fashion marketplace. Further, retail stores themselves will reliably draw in customers within a given proximity to the store, and so that group will reflect a subset of the demographics of the local community. In this context, then, personas provide a powerful yet efficient means of extracting economically useful insights with a minimum of privacy concerns.
Consider the situation of
This sort of rich, high-dimension data is well suited for application of deep learning algorithms to discover underlying structural relationships, especially when further enriched by primary marketing research and sales history associated with known sets of visitors. Based on one or more iterations of analysis, an initial set of “customer personas” 120 can be defined, as illustrated in
These initial customer personas 120, then, can be readily quantified in n-dimensional space as a hybrid of the actual customer profiles they encompass, further enriched with information such as historical sales loosely associated with this group and store merchandise that is intended to appeal to this customer base. Four personas 120 are shown here for purposes of illustration, but the actual number would be larger in practice. The quantity would still be much smaller, however, than other competing approaches that might be focused on uniquely associating each customer to a complex profile with data such as their loyalty account ID. Persona-based analysis is therefore faster and much more compliant with emerging privacy regulations.
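One plausible way to derive such persona centroids from accumulated profile vectors is ordinary k-means clustering over the n-dimensional profile space (a sketch under the assumption that profiles are already encoded as fixed-length numeric vectors; the disclosure does not mandate this particular algorithm):

```python
import numpy as np

def derive_personas(profiles: np.ndarray, k: int, iters: int = 50, seed: int = 0) -> np.ndarray:
    """Cluster n-dimensional customer profiles into k persona centroids (plain k-means)."""
    rng = np.random.default_rng(seed)
    centroids = profiles[rng.choice(len(profiles), size=k, replace=False)]
    for _ in range(iters):
        # Assign each profile to its nearest persona centroid.
        dists = np.linalg.norm(profiles[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned profiles.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = profiles[labels == j].mean(axis=0)
    return centroids

# Two obvious clusters in 2-D; the derived "personas" sit near (0, 0) and (10, 10).
data = np.array([[0.0, 0.1], [0.2, 0.0], [10.0, 10.1], [9.8, 10.0]])
personas = derive_personas(data, k=2)
```

In practice each centroid would then be enriched with the business information described above (revenue targets, associated merchandise) before serving as an operational persona.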
The above example focuses on persona groups specific to known customer visitors, but this model also readily allows for incorporation of other personas, say celebrity influencers known to have a significant association with the particular market segment. This could include celebrities directly engaged by the brand to serve as a spokesperson or non-engaged celebrities known to have significant influence on fashion trends more generally. Further, a persona 120 can be defined that is referenced to a target persona that the establishment wants to attract over time. This influence can be incorporated in persona-based image analysis by simply defining a new enriched feature vector based on imagery of the celebrity influencer, spokesperson, or fashion trend of interest.
With an initial set of personas defined, persona-based image analysis can then be conducted by the repeated execution of an Affinity function A(Fya,Pax), measuring how closely the observed feature matrix aligns with the representative feature vectors designating the persona 120. This is repeated for all of the personas for each visitor, and basic thresholding is applied as shown in
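A minimal sketch of this repeated affinity evaluation A(Fya, Pax) and thresholding (cosine similarity is assumed for A here, consistent with the cosine distance of the exemplary embodiment described later; the threshold value and names are illustrative):

```python
import numpy as np

def affinity(f: np.ndarray, p: np.ndarray) -> float:
    """A(F, P): cosine similarity between an observed feature vector and a persona vector."""
    return float(np.dot(f, p) / (np.linalg.norm(f) * np.linalg.norm(p)))

def matching_personas(f: np.ndarray, personas: dict, threshold: float = 0.8) -> dict:
    """Evaluate A against every persona for one visitor; keep those clearing the threshold."""
    scores = {name: affinity(f, p) for name, p in personas.items()}
    return {name: s for name, s in scores.items() if s >= threshold}

# Illustrative persona vectors (assumed encodings, not from the disclosure).
personas = {
    "urban minimalist": np.array([1.0, 0.0, 0.2]),
    "athletic casual":  np.array([0.0, 1.0, 0.1]),
}
visitor = np.array([0.9, 0.1, 0.2])
matches = matching_personas(visitor, personas)
```

Because thresholding rather than a single argmax is applied, a visitor may align with zero, one, or several personas, which is what permits the multiple distinct outcomes discussed next.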
This approach elegantly allows for multiple distinct outcomes based on many factors, including the availability of camera imagery, the brand's relationship with the visitor, and of course, the customer's own sense of style and fashion. Several possibilities are illustrated in
An iterative version of this technology can further be applied to fashion consulting on-line, as shown in
Similar to the in-store feature vector, the on-line feature vector may include features from a customer's loyalty account and will continue to be augmented throughout the shopping session. The output of the “Membership function M” will still be a set of Personas satisfying a minimum threshold of alignment with the on-line feature vector Fo.
Curated Persona Sets
The third aspect of this present technology is referred to as automated curation of personas. In this phase, the defined persona groups are regularly reviewed by various machine learning algorithms with the objective of improving the alignment of the persona vectors themselves with the data derived from actual visitors. This process can involve the definition of new personas, updates to existing personas, or deletion of personas that contributed significantly to clustering errors. Personas may also be regularly added corresponding to new trends or influencers drawing significant attention and that might impact near-term purchasing decisions.
Technical intuition suggests that the effectiveness of this persona-based Machine Vision solution hinges on definition of personas that in some quantifiable sense reflect the shopping and purchasing habits of actual customers. Indeed, this relationship can become tenuous depending on the specific market of interest. Fortunately, fashion and style are forms of individual self-expression deeply rooted in human nature with established patterns of stylistic evolution that are also well understood and often driven by the industry itself. That makes this marketplace particularly amenable to application of Machine Learning techniques to predict economic outcomes even at the transactional level. The personas themselves, though, will need to evolve with changes in demographics, sought-after styles, and seasonal merchandise in order to maintain and ideally improve the performance of the technology over time.
Consider the situation of
As part of the persona curation phase of the current technology, the errors associated with affinity mapping for all of the observed feature vectors over some time period will be aggregated, and alternative configurations of the personas considered that would reduce this error score to a local minimum while also constraining the extent of the changes introduced by each iteration. In the case of
The revised persona definitions driven by repeated appearances of F201a are shown in
Machine learning-based persona curation will continue as a background process on a regular basis, but the results of this remapping as implemented in the current technology will be restricted to administrative logs until certain thresholds of performance improvement are fulfilled. At this point the persona curation module will recommend an update to the operational personas being used for machine vision along with an estimated improvement in the results. It will then be up to the human operators to approve these updates in whole or in part for use in active machine vision assessments of customer interests. Test personas are also supported by the current technology which can be run on limited sets of historical feature vectors to assess the performance of alternate root and branch personas.
Note that a version of the on-line affinity assessment procedure shown in
In one exemplary implementation, detailed marketing information is derived from video and imagery analysis utilizing a collection of elements on the customer premise, collectively referred to as the In-Store Video Subsystem. The modules comprising the In-store Video Subsystem are shown in
Retail Video Feed. The video output from the premise Closed-Circuit Television (CCTV) system or in-store cameras. This usually involves multiple cameras 138 and a local recording solution with an API for retrieval of recorded imagery associated with each CCTV camera.
Person Matching and Profiling Service (PMPS). The Person Matching and Profiling Service 134 employs multiple image processing and deep learning modules to generate value-added feature vectors based on imagery extracted from the Retail Video Feed. The detailed image processing and deep learning modules comprising the PMPS are detailed below.
Session Database (SDB). The Session Database 136 stores the data generated by each module of the PMPS and provides an interface for querying and updating it. The SDB data is kept only for the duration of a session (e.g., a day) and is completely erased once the session has been processed and the relevant information sent by an Event Triggering Client to cloud-based storage.
Event Triggering Client (ETC). The Event Triggering Client 132 extracts from the SDB the finalized information vectors generated by the PMPS and updates centralized profiles on cloud-based servers.
A portion of the innovative practices disclosed in this present technology are centralized within the PMPS. The general flow of information through the PMPS is illustrated in
The function of each module in this exemplary implementation is described below.
Video Capture Interface (VCI). The VCI provides an interface to capture input feeds from several sources including:
This subsystem captures frames from the In-Store Cameras at a certain frame rate, processes them (scaling, color normalization, etc.), and generates a processed frame in a structure [frame RGB image, frame_id, camera_id, timestamp].
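The processed-frame structure emitted by the VCI might be represented as follows (an illustrative sketch; the type and field names are assumptions beyond the bracketed structure given above):

```python
import time
from typing import NamedTuple

class ProcessedFrame(NamedTuple):
    """VCI output structure: [frame RGB image, frame_id, camera_id, timestamp]."""
    rgb_image: bytes   # scaled, color-normalized frame (raw bytes here for illustration)
    frame_id: int
    camera_id: str
    timestamp: float

# One frame as it would be handed to the Person Detector downstream.
frame = ProcessedFrame(rgb_image=b"\x00" * 12, frame_id=1,
                       camera_id="cam-03", timestamp=time.time())
```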
Person Detector (PD). The PD detects the presence of a person or multiple people within the video frames forwarded by the VCI and generates an image crop for each of the detected visitors. This process is implemented via a convolutional neural network (CNN) for efficient processing, as shown for example in
Short-Term Tracker (STT). The STT works in batches of body crops that belong to frames from the same camera. Associations are processed over a configurable interval of time.
The STT groups the image crops generated by the PD within the scope of a single camera to associate them to a single visitor based on visual persistence of similar body crop, position and predicted trajectories of an individual moving through the pre-referenced camera space. It outputs a list of tracklets per camera, individual, and time interval.
Day-Term Tracker (DTT). The DTT works on longer time windows than the STT, hierarchically grouping tracklets until it finds the optimal solution in which all tracklets are grouped by a unique person to a threshold of confidence. This procedure integrates tracklets from multiple cameras.
Once the tracklets are grouped by unique IDs, the DTT selects a subset of head and body crops that best describe that visitor. It does so by clustering each person's associated visual descriptors into K groups and taking the centroid of each cluster as a representative crop. In doing so, the DTT reduces the amount of data to be processed in subsequent steps while maximizing useful information.
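The representative-crop selection described above might be sketched as follows, clustering the descriptors into K groups and keeping the actual crop nearest each centroid (an illustration under the assumption of Euclidean k-means over numeric descriptors; the disclosure does not fix the clustering method):

```python
import numpy as np

def representative_crops(descriptors: np.ndarray, k: int, iters: int = 30, seed: int = 0) -> list:
    """Cluster one person's visual descriptors into k groups; return the index of the
    descriptor nearest each cluster centroid, i.e. the representative crops to keep."""
    rng = np.random.default_rng(seed)
    centroids = descriptors[rng.choice(len(descriptors), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = descriptors[labels == j].mean(axis=0)
    # The nearest actual descriptor to each centroid stands in for its cluster.
    d = np.linalg.norm(descriptors[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(int(i) for i in d.argmin(axis=0)))

# Six descriptors forming two visual groups; only two crops survive as representatives.
desc = np.array([[0.0, 0], [0.1, 0], [0.2, 0], [5.0, 5], [5.1, 5], [5.2, 5]])
keep = representative_crops(desc, k=2)
```

Downstream modules then process only the `keep` subset, which is the data reduction the DTT is designed to achieve.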
Tracking Analysis (TA). From the tracking information generated by the DTT, the TA subsystem extracts the following commercially-valuable information for each person identified:
The TA also provides functionality to calibrate the in-store cameras and store layout. If that information is set up, the TA module can also extract the following information for each person detected:
Spear the Video (STV). Complex feature vector development is accomplished in this exemplary implementation by means of a series of deep learning models in the form of multiple CNNs collectively referred to in this embodiment as Spear the Video (STV). The process is driven by the imagery packages provided by the DTT and PD as best representing the visitor in question, with parallel feature-extraction procedures in place to build both the Demographic and Fashion vectors as efficiently as possible. Deep learning is applied at four different tiers of resolution simultaneously—whole-frame, body crop, facial crop, and the product-specific sub crops. Each of these submodules is described below.
Feature extraction based on the full uncropped image capture is shown in
At the second tier, face crops from the PD form the basis for most augments to the Demographic Feature vector. This processing is implemented in the current embodiment using a series of pre-trained CNN's, as shown in
In addition to age and physical attributes, this exemplary implementation also includes a dedicated CNN to develop a 256-dimension parametric description of the visitor's face, which is then associated to a unique ID referred to as a faceID. While not uniquely-identifying, this faceID and its associated parametric vector can be used as a locally-reliable means of associating repeat visits from the same customer in the absence of other confirming information. When used in concert with the other two parametric descriptions developed by the PMPS—the Body Visual Descriptor and one or more Product Visual Descriptors—it forms a sound basis for fine-grain market segment analysis in the fashion market.
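One conceivable way to derive a compact, locally-stable faceID from the 256-dimension parametric description (purely an assumption for illustration; the disclosure does not specify the construction) is to quantize the embedding and hash the resulting bit pattern, so that very similar embeddings usually map to the same ID:

```python
import hashlib
import numpy as np

def face_id(embedding: np.ndarray) -> str:
    """Hypothetical faceID: sign-quantize the 256-d face embedding to one bit per
    dimension, then hash the bit pattern into a short identifier."""
    signs = (embedding >= 0).astype(np.uint8)  # 1 bit per dimension
    return hashlib.sha1(signs.tobytes()).hexdigest()[:16]

rng = np.random.default_rng(7)
embedding = rng.standard_normal(256)  # stand-in for the CNN's parametric output
fid = face_id(embedding)
```

Because the ID is derived from the embedding rather than stored reference imagery, it supports repeat-visit association without retaining any face images, consistent with the privacy posture described later.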
Note that the class detail shown in
At the third tier, body crops from the PD form the basis for Fashion vector augments, as illustrated in
Note that the class detail shown in
At the deepest level of processing, individual fashion products are subjected to deep learning-based analysis, as illustrated in
The class detail shown in
In one aspect of this embodiment, progressively more detail will be derived based on the fidelity of the imagery obtained. Lower-resolution imagery such as from a legacy security camera might yield demographic information as well as an estimate of the overall style the individual is attempting to convey. Slightly higher resolution such as from modern color security cameras might identify a varying number of fashion products, and high-resolution imagery such as that from a magic-mirror type application could lead to a detailed breakdown of multiple items being worn by the individual. The image processing stages will be tailored by product detection, explained below, to avoid unproductive image analysis.
In another aspect of this embodiment, note that multiple potential label weightings are carried throughout the process rather than casting the individual as any single stylistic type. This is especially important in the fashion market, as the identification of multiple products similar to those the individual is wearing or has historically purchased can be used directly by fashion consultants or within a search engine application to advise on other products the customer might be interested in. Engaging the customer in a positive and helpful way is central to this current technology.
A third aspect of this embodiment is that the level of detail can be product- and domain-specific. A shoe retailer, for example, would naturally want to focus on footwear down to the brand and model level to maximize the opportunity for product sales across their inventory. Other clothing items can be assessed at a higher level without the need to train specific Feature and Product Descriptor models related to those fashion items.
A fourth aspect of the current embodiment is that product detection and characterization may be applied to several images provided by the DTT representing the same session. This procedure allows for generalization across multiple models to identify the most products and in the most detail possible.
A fifth aspect of the current embodiment is that the Facial Descriptors and Body Descriptors can be employed with a significant degree of accuracy to associate visits by the same customer to the same location, thereby enabling comparisons of how the person's stylistic interests and preferences have likely changed over time. This information is also central to curation of persona groups as outlined below.
In a sixth aspect of the current embodiment, all information derived by the PMPS associated with a given visitor and visit is compiled and stored under a uniquely-assigned faceID. The record of the feature vectors with visit-related metadata such as store ID and record start time will be transferred to cloud storage for longer-term analysis. Local imagery related to visitors and the interim crops analyzed by the PMPS will be destroyed at the end of the processing session, a clear advantage over competitive approaches relying on large repositories of persistent reference imagery.
In a seventh aspect of the current embodiment, the processing of commercially-valuable information is directly under the purview of the retailer. This readily allows the retailer to extend to the customer the ability to review, correct, or delete the information collected related to their visit, and opt into or out of the process going forward.
Persona-Based Preference Analysis
A persona (also user persona, customer persona, buyer persona) in user-centered design and marketing is a fictional character created to represent a user type that might use a site, brand, or product in a similar way. Marketers may use personas together with market segmentation, where the qualitative personas are constructed to be representative of specific segments. The term persona is used widely in online and technology applications as well as in advertising, where other terms such as pen portraits may also be used.
As discussed in the overview, this exemplary embodiment uses personas rather than uniquely-identifying user accounts to derive economically useful insights from the feature vectors created by the In-Store Video Subsystem via the PMPS. These personas are usually defined based on a clustering of actual profile data derived from the PMPS over time and will likely be enriched to reflect revenue targets for this subset of customers and products likely to appeal to them.
Consider the notional shopper represented in
This resulting feature vector will then be scored against a set of pre-defined persona files, as shown in
The affinity function utilized in this exemplary embodiment is a cosine distance algorithm at full vector dimensionality, with rule-based preprocessing to compensate for information that may be missing or incomplete. Full agreement among all parameters in the feature vectors being compared would result in an effective similarity “alignment” of 1, analogous to the cosine of 0 degrees. The agreement of each member of the vector contributes to a result between 0 and 1, with additional weighting across the full set of personas under evaluation to yield the final similarity score at varying levels of dimensionality. Several examples of this similarity scoring between the shopper represented by the vector in
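A sketch of such an affinity computation with simple rule-based handling of missing features (the NaN-masking rule here is an assumption; the embodiment states only that preprocessing compensates for incomplete information):

```python
import numpy as np

def similarity(feature: np.ndarray, persona: np.ndarray) -> float:
    """Cosine similarity at full vector dimensionality; dimensions the cameras could
    not measure are marked NaN and excluded (rule-based preprocessing). Full agreement
    on observed dimensions yields an alignment of 1, the cosine of 0 degrees."""
    mask = ~np.isnan(feature)
    f, p = feature[mask], persona[mask]
    denom = np.linalg.norm(f) * np.linalg.norm(p)
    return float(np.dot(f, p) / denom) if denom else 0.0

shopper = np.array([0.8, np.nan, 0.6])  # middle feature unobserved by the cameras
persona = np.array([0.8, 0.9, 0.6])
score = similarity(shopper, persona)    # perfect agreement on the observed dimensions
```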
As desired, the resulting output will include a diversity of possible associations, with locally derived personas having the strongest affinity and celebrity influencers included to inject stylistic diversity and fashion trend-following. In this implementation, a vector of all non-zero similarity scores is associated to the visitor's profile, identified by faceID. By referencing the multiple personas with the strongest associations to each visitor, products can be suggested by sales associates or on-line search algorithms that are most likely to appeal to that customer, and also to stimulate period-relevant context for fashion recommendations that might otherwise hinge on the fashion expertise of the individual associate.
Beyond positioning of best-fit individual products for a given customer, this exemplary embodiment allows for macro-level analysis of significant commercial metrics, including:
Machine Learning-Based Persona Management
In this exemplary embodiment, the process of curating the mix of personas relies on a similar cosine distance method as that used for initial matching of customer-derived Feature Vectors to relevant personas. Over the course of a non-real-time working interval, a history is maintained of the aggregate relative misalignment between actual observed visitor data and the persona segment representations. The objective of ongoing optimization is to maximize aggregate similarity while simultaneously minimizing similarity variability with a minimum number of economically significant personas. Relatively high error compared to historical precedent is a reliable indicator that the chosen personas need to be modified, while significant periodic variability indicates special causes for additional investigation. As a last resort, performance can be improved by the introduction of additional personas with a resulting increase in computational overhead.
The primary method of persona curation in the present technology is the evaluation of “pseudo personas” based on actual measured feature vectors that are “close” in cosine space alignment to the existing personas. The resulting aggregate error is then recalculated based on historical visitor feature vector repositories over some time period and the resulting error is tracked as if these vectors were in use as commercial personas. In this way “persona migration” can be tracked as a statistical entity and the semi-static personas modified over time to continuously minimize error. If necessary, additional “leaf” personas can be defined to drive a commensurate reduction in aggregate error, variability or both.
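The pseudo-persona evaluation described above might be sketched as follows (illustrative only; it assumes the cosine similarity of the matching stage and a simple mean-misalignment score as the aggregate error):

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def aggregate_error(history: list, personas: list) -> float:
    """Mean misalignment (1 - best similarity) of historical visitor feature
    vectors against a candidate persona set."""
    return float(np.mean([1.0 - max(cos_sim(f, p) for p in personas) for f in history]))

# Candidate "pseudo persona": replace one persona with a close, actually-observed
# vector and keep the swap only if it lowers aggregate error over the history window.
history  = [np.array([1.0, 0.2]), np.array([1.0, 0.3]), np.array([0.1, 1.0])]
current  = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
pseudo   = [np.array([1.0, 0.25]), np.array([0.0, 1.0])]  # drawn from observed vectors
improved = aggregate_error(history, pseudo) < aggregate_error(history, current)
```

Tracked over successive working intervals, such swaps realize the gradual "persona migration" described above, with additional leaf personas introduced only when no swap sufficiently reduces error or variability.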
Rather than being noise, these migrations in the number and quantification of the personas represent extremely valuable marketing information, as the driver is commonly rooted in relevant factors such as changes in customer mix, inventory turn-over, and competitor initiatives. Reduction in the error associated with personas representing new fashion trends, for example, will reliably indicate adoption of that trend within the geographic affinity zone of that store. Changes in persona alignment surrounding certain promotional activities can serve as a statistical measure of that marketing campaign's practical effectiveness. Errors between the actual observed persona mix and the assumed profitability inherent to those personas will help to understand the relative buying power of the various groups.
In the extreme, this solution does elegantly support “personas of one,” or personas related to very small groups of individuals that may warrant VIP treatment. Even these customers, though, can expect a fundamentally more enriching shopping experience than competing “unique-match” models that primarily exploit purchase history to promote the same products and offer tiered discounts in an often-abrasive attempt to maximize spend.
The foregoing detailed description of the technology has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.
The present application claims priority to U.S. Provisional Patent Application No. 62/979,959, filed on Feb. 21, 2020, entitled “MACHINE LEARNING FOR RAPID ANALYSIS OF IMAGE DATA VIA CURATED CUSTOMER PERSONAS,” which application is incorporated by reference herein in its entirety.