This application claims priority to application Ser. No. ______ entitled “SYSTEMS AND METHODS FOR LINKING A PRODUCT TO EXTERNAL CONTENT” and application Ser. No. ______ entitled “SYSTEMS AND METHODS FOR ANALYZING CUSTOMER REVIEWS”, both of which are filed concurrently herewith and the contents of which are incorporated by reference.
The present invention relates to machine learning of business operation parameters and management thereof.
Decision making can be difficult for reasons ranging from vague reporting structures to the complexities that naturally arise as an organization matures and more decisions and decision makers are involved. The result is often wasted time, confusion, and frustration. Individually, everyone's intentions are good, yet the whole performs poorly. To counter this growing information overload, management software such as executive information systems (EIS), group decision support systems (GDSS), and organizational decision support systems (ODSS) has been developed to help organizations focus on data-driven decision making.
A decision support system (DSS) is a computerized program used to support determinations, judgments, and courses of action in an organization or a business. A DSS sifts through and analyzes massive amounts of data, compiling comprehensive information that can be used to solve problems and to support decision-making. The DSS can help decision makers use communications technologies, data, documents, knowledge, and/or models to complete decision process tasks. The DSS is a class of computerized information systems that support decision-making activities. Historically, a DSS has been run by analysts who collect and massage the data before generating reports for management.
Systems and methods are disclosed for automated business intelligence from business data to improve operations of the business. The system extracts signals from any unstructured data source. The system identifies anomalies in customer data and global trends for retail companies that present opportunities and crises to avoid, and suggests optimal courses of action and estimated financial impact. The system also alerts individuals to opportunities and predicts customers' needs.
Advantages of the system may include one or more of the following. The system enables users to understand what customers are thinking by extracting insights from any open-ended text, including chat logs, product reviews, transcripts, and more. The system enables users to perform Data-Driven Merchandising, for example, to answer which product attributes are most likely to surge or underperform in the next season, and why. The system also enables users to identify Marketing ROI and answer questions such as “what are the products and customer segments that would benefit the most from marketing, and what are the right assortments to highlight?” The system enables users to identify the buying process that aligns the voice of the customer with the needs of the enterprise. Customer Experience is improved, and new needs can be anticipated. The system further identifies customer segment churns and how to re-engage customers. The system enables users to perform Dynamic Markdown, answering which items should be put on clearance and, if so, when and by how much. In other uses, the system excels in finding behavioral patterns and early signals of surges and declines from any data source, combining signals ranging from text reviews to clickthroughs, among others. The system stitches together exhaustive personas and their behavioral shifts, showing how they interact with the offerings and how this impacts the bottom line. The system can handle large amounts of data and saves users from manually mining such data to understand what customers want; it predicts trends and capitalizes on future demand by finding anomalies and patterns in sales data. The system helps users know which products appear most often across social media (comments, posts, videos, etc.) to stay on top of what's trending. Sales opportunities can be accelerated, as the system can predict when customers will interact with brands and turn consumer behavior into sales opportunities and margin improvements. The system helps to optimize customer engagement, maps each customer to the products they actually want to buy, and minimizes markdowns by engaging them at the times they are most likely to purchase. The system increases revenue through proper inventory allocation and reduces carry-over across the product catalog by capitalizing on niche buying and merchandising opportunities. The system improves decision making, identifies demand drivers, and improves product development by unifying transaction data with external information about market trends. Bringing together applied machine learning, data science, social science, and managerial science, the system automatically recommends options to reduce the effort required for users to make higher-quality decisions.
A detailed description of preferred embodiments of the present invention will be given below with reference to the accompanying drawings. In the following description of the present invention, when it is determined that a detailed description of a related well-known function or element may make the gist of the present invention unnecessarily vague, the detailed description will be omitted.
As shown in
The multi-source data collection module 10 collects data from a variety of data sources. For example, the present system and method collect data from e-commerce websites, retail brick-and-mortar stores, and social media, including review sites. Such systems may include any number of analysis engines that enable management or other users to generate one or more analyses. The results of these analyses can be stored in one or more databases as well as used to facilitate decision making, where the system can predict customer behavior, that is, what customers will do in the future based on past actions. The system can predict customer behavior based on multiple sources of information including unstructured text, video images, maps, weather, time of day, purchase history, competitor activity, social network activities, trends, forums, blogs, specific categories of product purchases, warranty claims, responses to advertisements, interactions with manufacturers, credit card transactions, voice calls, GPS/geographic location, market research, sensor information, email, news, and trends. These predictions are made across all industries, not just retail, which enables greater depth of insight into customers' needs and wants. As these needs and wants can change quickly, the data is continuously updated to support real-time understanding of customer behavioral changes over time.
As the amount of unstructured data can be large, the conversion of these sources into structured data can be a daunting task. As shown in
In one embodiment, a schema defines the data structures for storing data internally. All structured, unstructured (text, image, among others), and tabular data is saved in this format. After ingesting raw data both from the clients and from external sources, transformers convert the data into the schema (a minimal sketch of such a schema record is given after the list below). All further metrics and entities are created from it. Once standardized, the data enables the system to process data regardless of where it was ingested from, focusing only on what the data contains. This can be standardized across every company or retailer. The system can scale the data for a mom-and-pop shop all the way to large retailers. The system also ensures that code can run for different users. In one implementation for retailers, the system can:
Analyze internal systems used by retailers to see what data they store
Analyze use cases from data science teams to see what data is needed
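As referenced above, a non-limiting Python sketch of an internal-schema record is given below; the field names (order_id, sku, quantity, unit_price, channel, timestamp) are illustrative assumptions rather than a prescribed schema.

from dataclasses import dataclass, asdict
from datetime import datetime

import pandas as pd


@dataclass
class TransactionLine:
    """One invoice line item, normalized regardless of the source platform."""
    order_id: str
    sku: str
    quantity: int
    unit_price: float
    channel: str
    timestamp: datetime


def to_schema_frame(lines: list[TransactionLine]) -> pd.DataFrame:
    """Collect normalized records into a DataFrame ready to be saved as a parquet file."""
    return pd.DataFrame([asdict(line) for line in lines])


frame = to_schema_frame([
    TransactionLine("A-1001", "SKU-42", 2, 19.99, "web", datetime(2023, 1, 5)),
])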
The pipeline comprises a series of tasks, each of which has some inputs and some outputs. The inputs can be marked as required or optional. If a required input is missing, the task is not run (for example, no review analysis is performed). If an optional input is missing, the task is run without the missing data. For example, if the product has good reviews to explain why it is trending, there is no need to also explain the trend using the sales data.
Finally, marking fields as optional makes the pipeline self-sufficient. With a large number of data sources, such options enable future data sources to be added iteratively over time. As the pipeline checks for requirements at every step, failed cases can still work whenever data or reviews are added, and no manual intervention is needed.
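By way of a non-limiting illustration, the required/optional input handling described above may resemble the following Python sketch, which runs a task only when its required inputs are present; the task name (trend_explanation) and input names are hypothetical.

from typing import Callable, Optional

import pandas as pd


def run_task(
    task: Callable[..., pd.DataFrame],
    inputs: dict[str, Optional[pd.DataFrame]],
    required: set[str],
) -> Optional[pd.DataFrame]:
    """Run a pipeline task only when all of its required inputs are present."""
    missing = [name for name in required if inputs.get(name) is None]
    if missing:
        return None                       # required data absent: skip the task entirely
    return task(**inputs)                 # optional inputs may still be None


def trend_explanation(sales: pd.DataFrame, reviews: Optional[pd.DataFrame] = None) -> pd.DataFrame:
    """Hypothetical task: explain a sales trend, enriched by reviews when available."""
    out = sales.copy()
    out["has_review_context"] = reviews is not None
    return out


sales = pd.DataFrame({"sku": ["SKU-42"], "units_sold": [120]})
result = run_task(trend_explanation, {"sales": sales, "reviews": None}, required={"sales"})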
The schema module addresses the difficulty of industrializing the work of data scientists when independent teams of data scientists serve each client. The instant standard schema may represent all the data that is necessary and that will ever be needed.
In one exemplary embodiment, the data collection and transformation module 20 performs entity creation and transformed data creation as illustrated in the following exemplary operations:
Create Entities:
Create Base Transformed Data:
In
Next, the anomaly detection is detailed. The system identifies shifts in the business by analyzing large amounts of data to detect anomalies, which are identified by observing irregular, abrupt, unexpected, or inexplicable variations in metrics from normal. In other implementations, identifying anomalies involves the identification of unusual values in time series or time sequence data. In yet other implementations, statistical forecasting can be used for making inferences about future events on the basis of past events. Time series forecasting differs from most other forecasting techniques in that time series techniques focus primarily on historical information rather than external sources of information such as trends or predictions of related companies or markets; it is therefore less affected by unforeseen events, as only historical information is used.
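By way of a non-limiting example of one simple anomaly detector consistent with the above, the following Python sketch flags points whose rolling z-score deviates sharply from the recent history; the window size and threshold are illustrative assumptions, not prescribed values.

import pandas as pd


def detect_anomalies(metric: pd.Series, window: int = 28, threshold: float = 3.0) -> pd.Series:
    """Flag points that deviate sharply from the recent historical pattern."""
    rolling = metric.rolling(window, min_periods=window)
    mean = rolling.mean().shift(1)                          # use only past values
    std = rolling.std().shift(1).replace(0.0, float("nan"))
    z_score = (metric - mean) / std
    return z_score.abs() > threshold                        # True where the metric is anomalous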
An exemplary processing of data for a retailer is detailed as: Data Sources→Internal Schema→Metrics→Insights
Entities form the core backbone of the platform. An entity can be a store, a product, a SKU, a category, a customer, a customer segment, or anything that is a tangible unit that can be acted on and has metrics. Each insight is generated for an entity. Each metric is an attribute generated for an entity.
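A non-limiting Python sketch of such an entity, including its metrics and a parent link expressing hierarchy, is shown below; the field names and hierarchy levels are illustrative assumptions.

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A tangible unit (store, product, SKU, customer segment, ...) that has metrics."""
    entity_id: str
    kind: str                               # e.g. "product", "store", "segment"
    parent: Optional["Entity"] = None       # hierarchy link, if any
    metrics: dict[str, float] = field(default_factory=dict)


# Hypothetical hierarchy: category -> product -> SKU.
sku = Entity("SKU-42-RED-M", "sku",
             parent=Entity("PROD-42", "product",
                           parent=Entity("CAT-JACKETS", "category")))
sku.metrics["weekly_revenue"] = 1250.0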
As the system generates any insight or any metric, they are all connected to an entity. Entities can also define hierarchy. With an example of products, the system may have:
Once anomalies are detected, insights can be generated to aid users in correcting course.
In one embodiment, a migration utility provides centralized, automated migration from disparate systems to a single view of the customer across a variety of business functions and sales channels. Users can transform massive amounts of information to gain the comprehensive, real-time views of the customer that are needed to run a successful business today. The user interface supports data analysis by simplifying its extraction, representation, manipulation, storage, retrieval, transmission, and visualization.
An exemplary flow showing operations from Data Sources→Internal Schema→Metrics→Insights is detailed next.
1. Transformers: Data Sources→Internal Schema
With the growing use of SaaS platforms, every company uses a slightly different stack, leading to a diverse set of data sources, all of which eventually need to be processed by the data science pipeline to generate insights. For example, the data can be from third-party email platforms such as MovableInk, Klaviyo, SendGrid, or MailChimp, or from third-party store management platforms such as WooCommerce, BigCommerce, Shopify, or a SQL database, among others. The underlying data they contain is the same. An invoice line item will always have the same set of core fields irrespective of where the system gets the data from. The source of the data and its initial format should be abstracted away from the data scientists and the data pipeline. The system can ingest a number of data sources, and they can be categorized into different information types, e.g., inventory, transaction lines, page views/conversion, and item catalog information.
The system then applies a defined Internal Schema—a common framework that can work for a wide variety of clients. The transformer functions convert each data source (or combination of data sources) into one (or multiple) internal schema formats. The transformed internal schema files are all saved as Pandas parquet files which are timestamped by their date. These are devised to be modular. For example, transaction lines and returns can be represented in the same DataFrame, but the system can separate them to ensure that the system can easily check if a client has specific returns information or not and proceed accordingly.
The modularity also means that the system can add new data sources without breaking any existing clients. New clients need only the bare minimum (transaction lines, product catalog) to start working with the system. The system can define dependencies and break every information piece into its atomic units.
For third-party platform integrations such as Shopify and Google Analytics, the system needs to write the transformer only once per platform. For custom enterprise integrations, a custom transformer is used for each integration. For example, the transformer can read a customer's CSV dumps about their item catalog. The system runs jobs that pull the data from all these sources and store the raw data in JSON/CSV/Pandas parquet files. These jobs are arranged in a directed acyclic graph because they can depend on each other. For example, the system would need to have the item catalog before the system can pull the reviews.
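By way of a non-limiting illustration, a per-platform transformer may resemble the following Python sketch, in which the source column names in the mapping are hypothetical and the output is a parquet file timestamped by date as described above.

from datetime import date
from pathlib import Path

import pandas as pd

# Hypothetical mapping from one store platform's export columns to the internal schema.
SOURCE_COLUMN_MAP = {
    "Name": "order_id",
    "Lineitem sku": "sku",
    "Lineitem quantity": "quantity",
    "Lineitem price": "unit_price",
}


def transform_transaction_lines(raw: pd.DataFrame, out_dir: Path) -> Path:
    """Convert a raw platform export into the internal transaction-line schema."""
    schema_frame = raw.rename(columns=SOURCE_COLUMN_MAP)[list(SOURCE_COLUMN_MAP.values())]
    out_path = out_dir / f"transaction_lines_{date.today():%Y-%m-%d}.parquet"
    schema_frame.to_parquet(out_path)       # internal-schema file timestamped by date
    return out_path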
2. Metric Generation: Internal Schema→Metrics
This is the step for all statistical analyses, machine learning, and deep learning models that work with all the data that is present. Various examples are shown in
The benefit of the internal schema is that:
This enables the system to collate all computationally intensive tasks together. Furthermore:
Each metric file has some required and some optional internal schema files it needs. It then generates a set of metrics relevant to a particular “metric-set”. For example, a “Channel Conversion” metric file uses the transaction and conversion internal schema files to generate a DataFrame where, for each product, the system can see the views, orders, quantity sold, revenue, and fraction of sales coming in through each channel. The metrics also include deep-learning models for forecasting and the NLP stack. The goal is to do a majority of the computation at this step so that, once the metrics are generated, doing any processing on top of them is computationally trivial. In one embodiment, the metrics are all saved as Pandas parquet files which are timestamped by their creation date.
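A non-limiting Python sketch of such a “Channel Conversion” metric file is shown below; the internal-schema column names (sku, channel, order_id, quantity, unit_price, views) are illustrative assumptions.

import pandas as pd


def channel_conversion_metrics(transactions: pd.DataFrame, conversions: pd.DataFrame) -> pd.DataFrame:
    """Per product and channel: views, orders, quantity sold, revenue, and share of sales."""
    tx = transactions.assign(revenue=transactions["quantity"] * transactions["unit_price"])
    sales = (tx.groupby(["sku", "channel"])
               .agg(orders=("order_id", "nunique"),
                    quantity=("quantity", "sum"),
                    revenue=("revenue", "sum"))
               .reset_index())
    views = conversions.groupby(["sku", "channel"])["views"].sum().reset_index()
    metrics = sales.merge(views, on=["sku", "channel"], how="left")
    metrics["sales_fraction"] = metrics["revenue"] / metrics.groupby("sku")["revenue"].transform("sum")
    return metrics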
3. Insight Generation: Metrics→Insights
The system processes all structured and unstructured data sources together to create actionable insights. The insight is the final output of the data pipeline. An insight consists of three parts:
what is happening
what needs to be done
what will happen if the recommended action is taken
Insights are created for entities. On the dashboard, there are different pages for different kinds of entities. Insights can be for Products, Product Categories, Customer Segments, and Stores, among others. This can be seen in the screenshot below on the left sidebar of
Step 1: New Insights
These are generated every week, for example (the frequency can be tailored to client needs). Insights can fall into two categories:
Step 2: Triaging Insights
Step 3: Executing Insights
Insights that are approved contain actions that need to be taken. There are two ways an action can be completed:
The user opens an approved insight and follows the instructions manually. Once this is complete, they mark the insight as done, e.g., “Order more inventory for this item.” They add any comments they have about the insight. An exemplary UI is shown in
The user opens an approved insight and has a one-click integration to take action on the insight, as shown in
Step 3a: Snoozing Insights
Each of these three points is backed by “section cards” that showcase facts which led to the conclusion. Each section card is a fact followed by contextual information.
The section card processing is separated from the metric creation process because insight scripts are simpler parts of the stack that lean more on narrative and explicit logic, and the section card generation is computationally light. They use the thresholds and numbers already calculated from the metrics. Each insight script develops a type of insight (e.g., discount to reach milestone) for all products that fit that bucket. This runs in a few minutes and allows the insight creators to iterate on insights faster and avoid re-running computationally intensive metrics while creating a new insight or generating insights for clients. The approach also abstracts the computational logic away from the insight creators, who only need to focus on creating the insight logic. For example, they will get a field called “Churn Probability”, which is the probability of a particular customer churning from this brand. This is computed with deep learning models, but the insight creator can take it at face value.
Each insight file has required and optional metrics it can use. For example, while giving a discount insight, it is essential to have the discount metric, but the insight can add more explanation by also showing the conversion statistics (if this metric isn't present, the section related to conversions will not be populated).
In addition to creating the sections (these are kept separate so that sections can be reused), the insight files have logic to write the narrative for the insight and the action to be taken, in addition to calculating the incremental revenue that would be generated if the action is taken as planned.
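By way of a non-limiting illustration, an insight script with required and optional metrics may resemble the following Python sketch; the metric column names, the insight type, and the payload fields are hypothetical.

from typing import Optional

import pandas as pd


def discount_insights(discount_metrics: pd.DataFrame,
                      conversion_metrics: Optional[pd.DataFrame] = None) -> list[dict]:
    """Build insight payloads: narrative sections, action, and estimated incremental revenue."""
    insights = []
    for _, row in discount_metrics.iterrows():
        sections = [f"Sales of {row['sku']} are {row['units_short']} units short of the milestone."]
        if conversion_metrics is not None:                     # optional metric: enrich when present
            match = conversion_metrics.loc[conversion_metrics["sku"] == row["sku"]]
            if not match.empty:
                sections.append(f"Conversion rate is {match['conversion_rate'].iloc[0]:.1%}.")
        insights.append({
            "type": "discount_to_reach_milestone",
            "entity": row["sku"],
            "narrative": sections,
            "action": f"Apply a {row['suggested_discount']:.0%} discount to reach the milestone.",
            "incremental_revenue": float(row["incremental_revenue"]),
        })
    return insights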
The output from all insight files goes to the master insight controller
Final Step: At the End of the Pipeline
Once all the insight scripts contribute their insights, the insights of each type are sorted by their incremental revenue. The master insight controller then picks N insights to show in a round-robin fashion and pushes the payload to the database, which the Cerebra Dashboard uses to track insights shown to clients.
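A non-limiting Python sketch of the round-robin selection performed by the master insight controller is shown below, assuming each insight is represented as a dictionary with "type" and "incremental_revenue" fields as in the earlier sketch.

from itertools import cycle


def select_insights(insights: list[dict], n: int) -> list[dict]:
    """Sort each insight type by incremental revenue, then pick N insights round-robin."""
    by_type: dict[str, list[dict]] = {}
    for insight in insights:
        by_type.setdefault(insight["type"], []).append(insight)
    for queue in by_type.values():
        queue.sort(key=lambda item: item["incremental_revenue"], reverse=True)

    selected: list[dict] = []
    for insight_type in cycle(list(by_type)):
        if len(selected) == n or not any(by_type.values()):
            break
        if by_type[insight_type]:
            selected.append(by_type[insight_type].pop(0))
    return selected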
In another implementation, the system not only converts text to structured form but also extracts signals and generates useful insights in its analysis. For example, the system identifies anomalies and patterns in customer data and global trends for retail companies, presenting opportunities and crises to avoid, and suggests optimal courses of action and estimated financial impact. The system enables users to understand what customers are thinking by extracting insights from any open-ended text, including chat logs, product reviews, transcripts, and more.
The system helps users perform data-driven merchandising, for example, to answer which product attributes are most likely to surge or underperform in the next season and why. The system can identify marketing ROI and answer questions such as what are the products and customer segments that would benefit the most from operational changes and what are the right assortments to highlight. The system extracts signals from unstructured data sources and identifies anomalies in customer data and global trends for companies such as retail companies that present opportunities, and suggests optimal courses of action and estimated financial impact, among others.
what is the event that triggered this insight
what is the action that should be taken and why
how to calculate the potential business impact of taking this action
a tag showing what category this card falls into, and
contextual information or data that was used to make the claim
An Executed Action Lookback can be provided. These lookbacks give detailed information about how the specific action taken by the customer impacted their sales. While developing any new action set for a client, the lookback logic is created hand-in-hand. Every action comes with an associated timeframe. For example, “Market this product on the Affiliate channel for 14 days. This will generate $10,000 in additional revenue”.
The system analyzes the sales to see how they performed relative to previous expectations and whether the action had the intended impact. In
One embodiment performs content analysis (K-Means) to ensure that the predicates cover all the labels covered in the customer reviews. A text processing pipeline performs the following:
The exemplary results below can allow a search by label selection, for example, if a retailer wants to have all the reviews for which the color is great but, after a wash, the item has shrunk or has been deformed.
Title: He's Worn it Daily for Two Years!
Title: Almost Perfect Jacket
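By way of a non-limiting illustration, the K-Means content analysis referenced above may resemble the following Python sketch, which clusters TF-IDF features of review text; the sample reviews (echoing the exemplary titles above) and the cluster count are illustrative assumptions, and the resulting clusters can be compared against the predicate labels to check coverage.

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

reviews = [
    "Great color but it shrank after one wash.",
    "He's worn it daily for two years, still looks new.",
    "Almost perfect jacket, zipper feels flimsy.",
    "Color faded and the fabric deformed after washing.",
]

vectorizer = TfidfVectorizer(stop_words="english")
features = vectorizer.fit_transform(reviews)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(features)
for review, label in zip(reviews, kmeans.labels_):
    print(label, review)                  # reviews grouped by shared content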
One embodiment generates a UCF (Universal Customer Fingerprint) which tracks user behavior in detail. The UCF has 3 modules:
1. Detecting content consumption based on where the user spent time.
2. Calculating a user's fingerprint based on their browsing behavior so we can tie multiple sessions from the same device together (instead of using third-party cookies)
3. Determining a user vector based on their text consumption
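As a non-limiting sketch of the second module, sessions from the same device can be tied together by hashing stable browsing attributes; the chosen attributes in the following Python listing are illustrative assumptions rather than a prescribed fingerprinting scheme.

import hashlib


def device_fingerprint(user_agent: str, screen: str, timezone: str, language: str) -> str:
    """Derive a stable identifier from browsing attributes reported by the client."""
    raw = "|".join([user_agent, screen, timezone, language])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


# Two sessions reporting the same attributes map to the same fingerprint.
session_a = device_fingerprint("Mozilla/5.0 ...", "1440x900", "America/New_York", "en-US")
session_b = device_fingerprint("Mozilla/5.0 ...", "1440x900", "America/New_York", "en-US")
assert session_a == session_b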
In another embodiment, as the system pushes insights to users or clients, the system checks that:
The actions have a significant business impact
There are no missing data sources
There is no ‘bad data’ coming from data sources
There were no errors or drops anywhere in the data pipeline
These are challenges that come up with having fully automated and scalable systems that can adapt to any combination of data sources sent to them. To this end, after the insights are generated but before they are pushed to the clients' dashboards, they are run through an automated QA process. The automated QA process checks for the following:
Make sure all must-have sections are present and are not empty
Make sure the incremental revenue is higher than the minimum set by the client (e.g., only show insights with at least $2,000 in incremental revenue)
Make sure no blacklisted products are present (e.g., those that have been discontinued or are one-time sales)
Make sure the forecasts are reasonable and there are no unexplainable trends
Check if the product link is valid
Check for any computational issues, for example NaN and infinity values
Automated QA 1
Make sure all the variants in the product are included and their inventory values add up
Make sure there are no duplicate insights shown recently
Make sure there are no contradicting insights
Make sure there are no product duplicates
Make sure none of the core input data sources are empty
The automated QA tests can be defined explicitly with a simple config file that is created for each client, which makes it easy for anyone to update the rules and thresholds for all the QA tests. In doing so, the system maintains a single automated QA framework for the entire platform while at the same time allowing for client-level modifications.
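By way of a non-limiting illustration, such a per-client QA configuration and one of the checks it drives may resemble the following Python sketch; the keys, thresholds, and blacklist entries are illustrative assumptions.

CLIENT_QA_CONFIG = {
    "min_incremental_revenue": 2000,                 # only show insights above this value
    "blacklisted_products": {"SKU-DISCONTINUED-7"},  # hypothetical discontinued item
    "required_sections": ["narrative", "action", "incremental_revenue"],
}


def passes_qa(insight: dict, config: dict = CLIENT_QA_CONFIG) -> bool:
    """Apply the client-level QA rules before an insight reaches the dashboard."""
    if insight["entity"] in config["blacklisted_products"]:
        return False
    if insight["incremental_revenue"] < config["min_incremental_revenue"]:
        return False
    return all(insight.get(section) for section in config["required_sections"])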
In another embodiment detailed in the incorporated by reference applications, the system provides a method to automatically associate a product or a service with external content by:
The text extraction includes selecting a predetermined number of text terms identified by TF-IDF (term frequency-inverse document frequency).
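A non-limiting Python sketch of selecting a predetermined number of TF-IDF keywords is shown below; the corpus handling and the count k are illustrative assumptions.

from sklearn.feature_extraction.text import TfidfVectorizer


def top_keywords(product_text: str, corpus: list[str], k: int = 5) -> list[str]:
    """Return the k highest-weighted TF-IDF terms for one product description."""
    vectorizer = TfidfVectorizer(stop_words="english")
    vectorizer.fit(corpus + [product_text])
    weights = vectorizer.transform([product_text]).toarray()[0]
    terms = vectorizer.get_feature_names_out()
    ranked = sorted(zip(terms, weights), key=lambda pair: pair[1], reverse=True)
    return [term for term, weight in ranked[:k] if weight > 0]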
The text extraction includes applying explainability techniques to an attention model to see whether the attention model provides one or more keywords or tokens to keep.
The text extraction includes obtaining a primary keyword from a search term and obtaining a secondary keyword from the primary keyword and labeling the product text by word-set-match or by zero-shot learning (ZSL).
The text extraction can also include:
The method includes representing the product or service as a multimedia file; extracting meta data for the product or service corresponding to the multimedia file; and discovering keywords that connect the image to external signals coming from social media, news articles, or search.
The multimedia file comprises a picture or a video. The external content comprises one or more words in a search term. The method includes extracting signals from a social media site or from a search engine.
Another method can link a product or service to external content by discovering one or more keywords associated with the product or service and linking the product or service with the external content from social media. The text extraction can include selecting a predetermined number of text terms identified by TF-IDF (term frequency-inverse document frequency). The text extraction comprises applying explainability techniques to an attention model to see whether the attention model provides one or more keywords or tokens to keep. The text extraction comprises obtaining a primary keyword from a search term, obtaining a secondary keyword from the primary keyword, and labeling the product text by word-set-match or by zero-shot learning (ZSL).
In another embodiment detailed in the incorporated by reference applications, the system can incorporate data from a customer review of a product. This is done by extracting product categories and predicates from the customer review; extracting product features from the customer review; extracting an activity with the product features from the customer review; performing sentiment analysis using a learning machine on the customer review; determining a life scene from the customer review; and analyzing a customer opinion from the customer review.
In implementations, the system includes applying a language model to detect a language of the customer review. The system includes extracting the customer opinion from a review title or review content. The system includes extracting categories and predicates from a review title or review content. The system includes determining a polarity of the product category and electing the category. The system includes extracting product features from a review title or review content. The system includes extracting a user activity with the product from a review title or review content. The system includes performing sentiment analysis from a review title or review content. The system includes performing chunk extraction on a review title or review content. The system includes extracting a life scene from a review title or review content. The system can modify the preprocessed text by using coreference resolution.
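By way of a non-limiting illustration, the sentiment analysis over a review title and review content may resemble the following Python sketch, which uses an off-the-shelf transformer classifier as a stand-in for the learning machine; the sample review is illustrative.

from transformers import pipeline

sentiment = pipeline("sentiment-analysis")     # downloads a default English model

review = {
    "title": "Almost Perfect Jacket",
    "content": "Great color, but it shrank and lost its shape after one wash.",
}
for part in ("title", "content"):
    result = sentiment(review[part])[0]
    print(part, result["label"], round(result["score"], 3))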
It is to be understood that the above description is intended to be illustrative, and not restrictive. For example, the above-described embodiments (and/or aspects thereof) may be used in combination with each other. Many other embodiments will be apparent to those of skill in the art upon reviewing the above description. The scope of the invention should, therefore, be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled. In the appended claims, the terms “including” and “in which” are used as the plain-English equivalents of the respective terms “comprising” and “wherein.” Also, in the following claims, the terms “including” and “comprising” are open-ended, that is, a system, device, article, or process that includes elements in addition to those listed after such a term in a claim are still deemed to fall within the scope of that claim. Moreover, in the following claims, the terms “first,” “second,” and “third,” etc. are used merely as labels, and are not intended to impose numerical requirements on their objects. The Abstract of the Disclosure is provided to comply with 37 C.F.R. § 1.72(b), requiring an abstract that will allow the reader to quickly ascertain the nature of the technical disclosure. It is submitted with the understanding that it will not be used to interpret or limit the scope or meaning of the claims. In addition, in the foregoing Detailed Description, various features may be grouped together to streamline the disclosure. This method of disclosure is not to be interpreted as reflecting an intention that the claimed embodiments require more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive subject matter may lie in less than all features of a single disclosed embodiment. Thus, the following claims are hereby incorporated into the Detailed Description, with each claim standing on its own as a separate embodiment.