Current computing systems provide a certain amount of ability to match promotional content to end-users. Such systems attempt to tailor promotional content to the wants and needs of the user, to present him or her with offers for desired products or services. However, such systems are currently subject to limitations. In particular, matching promotional content to users continues to be limited in its ability to reach audiences with high conversion rates. Contemporary systems often simply play promotional content at predetermined intervals, play promotional content selected according to user-defined preferences, or attempt to divine these user preferences indirectly such as via past purchases, user search history, or the like. These and other approaches have demonstrated a limited ability to predict true user preferences at any particular point in time, and have thus shown limited ability to select promotional content that accurately matches a user's preferences or interests at the time this promotional content would be displayed.
Accordingly, to overcome the limited ability of computer based systems to match users with effective promotional content, systems and methods are described herein for a computer-based process that classifies content into specified categories as it is being played, and selects promotional content matching these categories. Thus, for example, matched promotional content may be played for the user in real time while the content still matches the specified categories, or matching promotional content may be played at a transition in which the played content shifts categories. In this manner, systems of embodiments of the disclosure may play promotional content in real time, which matches what the user is currently watching. This increases the likelihood that the promotional content is targeted to something of current interest to the user, thus increasing the effectiveness of such promotional content.
In more detail, systems of embodiments of the disclosure may determine classifications of content as that content is being consumed, such as by classifying each content frame as it is displayed for consumption. When the content maintains a similar set of classifications for a period of time, such as during a particular scene in which the setting and/or subject remains the same, a period of time in which the same or similar products are being shown, or the like, the system may determine that the user is interested in content with those particular classifications. Accordingly, promotional content having one or more of the same or similar classifications, or any one or more classifications that correspond thereto, may then be selected and transmitted for display to the user. This promotional content may be displayed for the user at any time, although in some situations it may be desirable to display the promotional content while, or shortly after, the consumed content contains those particular classifications.
Content may be classified according to one or more machine learning models. For example, the system may employ one or more known machine learning classifiers, such as a recurrent neural network trained to receive content frames as input and to generate various features of those input frames as output. Further machine learning models may be employed for classification based on these features. Any type or types of machine learning models suitable for classification are contemplated. In one embodiment of the disclosure, the classification process may be broken into steps each handled by a different model or models. For instance, relevant machine learning features used for classification may first be determined, and those features may then be used to generate classifications of the content. These features may also be used to update a user profile, so that user profiles maintain stored features of content the user has consumed. These stored features may then be classified to determine the types of content the user has consumed in the past, which may in turn indicate the types of content he or she is interested in, and thus the types of promotional content that may be effective.
Additional machine learning models may be employed to match promotional content to the content currently being consumed by the user. In some embodiments of the disclosure, a set of machine learning models may be trained to generate a yes/no promotional content match output from inputs that include the determined content classifications, that is, to recommend promotional content that matches certain classifications. These models may be trained using labeled sets of classifications that are deemed to match, or not to match, promotional content. In this manner, producers of promotional content may specify certain classifications they deem as effective matches for their promotional content, and the machine learning models may then be trained to determine whether the user is currently consuming content that is a match for their promotional content. If so, this promotional content may be deemed as a good match for the user, and may be played for the user accordingly.
To improve the ability of such models to match user-consumed content to promotional content, user behavior information may be employed as an additional input. More specifically, the promotional content matching models may be configured and trained to take in user behavior information as an input, in addition to content classifications. Behavior information may include any aspect of user behavior, such as applications the user has open, websites the user is currently viewing, and the like. The model may thus be trained on both classifications deemed effective matches for promotional content and user behaviors found to be effective predictors of interest in that promotional content.
As above, promotional content may be displayed for the user at any time deemed appropriate. For example, promotional content may be displayed after a particular content segment bearing particular classifications is completed, e.g., at the transition between one segment or scene matching the promotional content, and the next segment or scene. As another example, promotional content may instead be played immediately upon matching with a particular content segment. That is, once matching promotional content is determined, the content the user is currently viewing or consuming may be interrupted for play of the promotional content.
Embodiments of the disclosure may be applied to match promotional content to current consumption of any type of content. This includes both content such as video and audio comprising time-varying images or other signals, as well as content such as web pages which are largely time-invariant but for which only a portion may be viewed at a time. Promotional content may thus be matched with any currently-displayed portion or segment of any type of content that may be consumed by a user.
The above and other objects and advantages of the disclosure will be apparent upon consideration of the following detailed description, taken in conjunction with the accompanying drawings, in which like reference characters refer to like parts throughout, and in which:
In one embodiment, the disclosure relates to systems and methods for real-time matching of promotional content to content that a user is currently consuming. Content that is currently being consumed is classified into descriptive categories, such as by determining a vector of content features where this vector is in turn used to classify the currently-played content. Promotional content having classifications that match the classifications of the currently-played content is then determined. Matching promotional content may then be played for the user in real time. In this manner, systems and processes of embodiments of the disclosure may identify promotional content matching what the user is currently watching, so as to present users promotional content tailored to subject matter the user is currently interested in.
Embodiments of the disclosure contemplate content classification and subsequent promotional content matching in any suitable manner. Many such methods exist. In embodiments of the disclosure, content may be classified by determining relevant textural or visual features, and assembling these features into a vector that may be accompanied by supplemental information such as the sequence position (e.g., timestamp) of the content frame and the duration of the current segment. A machine learning model may then classify these feature vectors, with the resulting classifications matched to classifications of promotional content. Exemplary embodiments of the content classification and matching process are described in U.S. patent application Ser. No. 16/698,618, filed on Nov. 27, 2019, which is hereby incorporated by reference in its entirety. Further embodiments are described in
As referred to herein, the term “signature analysis” refers to the analysis of a generated feature vector corresponding to at least one frame of a video using a machine learning model. As referred to herein, a signature analysis for video includes signature analysis for one or more static images (e.g., at least one frame of a video). As referred to herein, a video signature includes a feature vector generated based on texture, shape intensity, and temporal data corresponding to at least one frame of a video. As referred to herein, the term “content item” should be understood to mean an electronically consumable user asset, such as television programming, as well as pay-per-view programs, on-demand programs, Internet content (e.g., streaming content, downloadable content, or Webcasts), video, audio, playlists, electronic books, social media, applications, games, any other media, or any combination thereof. Content items may be recorded, played, displayed or accessed by devices. As referred to herein, “content providers” are digital repositories, conduits, or both of content items. Content providers may include cable sources, over-the-top content providers, or other sources of content.
At least one frame of video 201 is used to generate feature vector 203. In some embodiments, the system 200 determines a texture associated with the at least one frame of video 201 using the texture analyzer of signature analyzer 202. The texture analyzer may use a statistical texture measurement method such as edge density and direction, local binary patterns, co-occurrence matrices, autocorrelation, Laws texture energy measures, any suitable approach to generating texture features, or any combination thereof. Texture determination is discussed in the description of
Feature vector 203 is analyzed using machine learning model 205 to produce a machine learning model output. In some embodiments, a machine learning model includes a neural network, a Bayesian network, any suitable computational characterization model, or any combination thereof. In some embodiments, a machine learning model output includes a value, a vector, a range of values, any suitable numeric representation of classifications of a content item, or any suitable combination thereof. For example, the machine learning model output may be one or more classifications and associated confidence values, where the classifications may be any categories into which content may be classified or by which it may be characterized. These may include, for instance, genres, products, backgrounds, settings, volumes, actions, any objects, or the like. As is known, machine learning model 205 may be trained in any suitable manner to generate any types or categories of classifications.
In some embodiments, matching engine 206 determines whether a match exists between the output of machine learning model 205 and any promotional content. For instance, classifications output from machine learning model 205 are compared to predetermined classifications of promotional content. Matches between promotional content classifications and classifications of frames of video 201 may be determined in any manner, such as by the number of identical classifications, the degree of similarity of a number of classifications, or in any other manner. Embodiments of the disclosure also contemplate implementation of a machine learning model within matching engine 206, which may determine whether particular promotional content matches the output classifications of machine learning model 205. In embodiments of the disclosure, this machine learning model may be any model capable of determining a match between two sets of classifications. Such a model may, for example, be any machine learning classifier, such as a K-nearest neighbor classifier, a multilayer perceptron, a CNN, or the like. In embodiments of the disclosure, classifiers may be trained on input labeled classification sets, to output a match between the determined classification spaces and the classifications of user content. Classifiers may also be trained in an unsupervised manner, such as on predetermined classifications of promotional content.
The machine learning model of matching engine 206 may also be configured to consider user behavior information. That is, various user behavior information may be an input to the model, so that the model is trained to consider user behavior as one or more variables in addition to content classifications. Behavior information may include any aspect of user behavior that may correlate with likelihood of purchasing any product or service, such as applications the user has open, websites the user is currently viewing, purchases made recently or historically, or the like. Labeled user behavior information may thus be used in training the machine learning classifier of engine 206. User behavior information may be stored in any manner, such as in a user profile that may itself be stored in storage 508 or in any other accessible location such as a remote server. Such user profiles may also contain other information used in the content matching processes of embodiments of the disclosure. This other information may, for example, include feature vectors previously generated as above by signature analyzer 202, so that user profiles contain records of the types of content (e.g., classifications) that the user has shown interest in.
Once a match is determined, matching engine 206 may retrieve and transmit the matched promotional content for display to the user, such as by insertion into the content stream of video 201. Matched promotional content may be displayed for the user in any manner, and at any time, including as above immediately upon determining matching promotional content or at the end of the video segment of segmented video 204.
In some embodiments, the deep recommendation system uses local binary patterns (LBP) to determine a texture associated with at least one frame of a video. For example, each center pixel in image 301 is examined to determine whether the intensity of each of its eight nearest neighbors is greater than the pixel's intensity. The eight nearest neighbors of pixel 303 have the same intensity. The LBP value of each pixel is an 8-bit array. A value of 1 in the array corresponds to a neighboring pixel with a greater intensity. A value of 0 in the array corresponds to a neighboring pixel with the same or lower intensity. For pixel 303 and pixel 304, the LBP value is an 8-bit array of zeros. For pixels 305 and 306, the LBP value is an 8-bit array of 3 zeros and 5 ones (e.g., 11100011), corresponding to the 3 pixels of the same or lower intensity and the 5 pixels of higher intensity. A histogram of the LBP values for each pixel of the image may be used to determine the texture of the image.
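The neighbor comparison described above can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the function names, the clockwise neighbor ordering, the decision to skip border pixels, and the sample images are all assumptions made for illustration.

```python
from collections import Counter

# Neighbor offsets as (row, column), clockwise from the top-left neighbor.
# This ordering is an assumption; any fixed ordering yields consistent codes.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """Return the 8-bit LBP value of interior pixel (r, c) as a bit string.

    A bit is 1 when the neighbor is strictly brighter than the center pixel,
    and 0 when the neighbor has the same or lower intensity.
    """
    center = image[r][c]
    return "".join(
        "1" if image[r + dr][c + dc] > center else "0" for dr, dc in OFFSETS
    )


def lbp_histogram(image):
    """Count LBP values over all interior pixels (border pixels are skipped)."""
    rows, cols = len(image), len(image[0])
    return Counter(
        lbp_code(image, r, c)
        for r in range(1, rows - 1)
        for c in range(1, cols - 1)
    )


# Hypothetical 3x3 grayscale patches.
flat = [[5, 5, 5], [5, 5, 5], [5, 5, 5]]   # all neighbors equal the center
edge = [[9, 9, 9], [1, 5, 9], [1, 1, 9]]   # five brighter, three darker

flat_code = lbp_code(flat, 1, 1)
edge_code = lbp_code(edge, 1, 1)
```

For the uniform patch every bit is 0, and for the edge patch the code contains five ones and three zeros. The histogram returned by `lbp_histogram` is the quantity that could be placed into a texture element of a feature vector.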
Co-occurrence matrices may be used to determine a texture associated with at least one frame of a video. A co-occurrence matrix is a histogram indicative of the number of times a first pixel value (e.g., a gray tone or color value) co-occurs with a second pixel value in a certain spatial relationship. For example, a co-occurrence matrix counts the number of times a color value of (0, 0, 0) appears to the left of a color value of (255, 255, 255). The histogram from a co-occurrence matrix may be used to determine the texture of the image. Resulting textures may be output as an element of feature vector 203.
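As a hedged illustration of the counting described above, the following Python sketch tallies how often one pixel value appears at a fixed offset from another. The function name, the sparse dictionary representation, and the sample image are assumptions, not part of the disclosure.

```python
from collections import Counter


def cooccurrence(image, dr=0, dc=1):
    """Count pairs (a, b) where value b lies at offset (dr, dc) from value a.

    With the default offset (0, 1), b is the pixel immediately to the right
    of a, so the count for (a, b) is the number of times a appears to the
    left of b.
    """
    rows, cols = len(image), len(image[0])
    counts = Counter()
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[(image[r][c], image[r2][c2])] += 1
    return counts


BLACK, WHITE = (0, 0, 0), (255, 255, 255)

# Hypothetical 2x3 color image.
image = [
    [BLACK, WHITE, BLACK],
    [BLACK, WHITE, WHITE],
]

counts = cooccurrence(image)
black_left_of_white = counts[(BLACK, WHITE)]
```

Here `black_left_of_white` is 2, because (0, 0, 0) appears immediately to the left of (255, 255, 255) once in each row. A dense matrix could be built from the same counts when the set of pixel values is small.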
Line 402, depicted as defining the trunk of a car, is extended over the lines of the car for clarity. A perpendicular line at an angle α1 and at distance d1 intersects line 402. The perpendicular line angles, α, and distances, d, define the axes of the GHT space. The line defining the trunk of the car in image 301 is mapped to point 403 in the GHT space. Line 402 and other determined geometric elements may be output as an element of feature vector 203.
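The mapping of a line to a single point in an (α, d) space can be sketched as follows. This assumes the normal-form line parameterization x·cos(α) + y·sin(α) = d commonly used by Hough-style transforms, with d measured from the image origin; the disclosure does not state which convention it uses, and the function name and sample coordinates are hypothetical.

```python
import math


def line_to_hough_point(p1, p2):
    """Map the line through p1 and p2 to (alpha, d) in normal form.

    Every point (x, y) on the line satisfies
    x * cos(alpha) + y * sin(alpha) = d, with d >= 0.
    """
    (x1, y1), (x2, y2) = p1, p2
    # A vector perpendicular to the line direction (x2 - x1, y2 - y1).
    nx, ny = y2 - y1, -(x2 - x1)
    norm = math.hypot(nx, ny)
    if norm == 0:
        raise ValueError("p1 and p2 must be distinct points")
    nx, ny = nx / norm, ny / norm
    d = nx * x1 + ny * y1
    if d < 0:
        # Flip the normal so that the distance is non-negative.
        nx, ny, d = -nx, -ny, -d
    return math.atan2(ny, nx), d


# Hypothetical horizontal line y = 2, e.g., the top edge of an object.
alpha, d = line_to_hough_point((0, 2), (4, 2))
```

For this horizontal line the perpendicular points straight along the y axis, so α is π/2 and d is 2. A full transform would accumulate votes from many edge pixels over a grid of (α, d) cells rather than compute the point from two known endpoints.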
In some embodiments, the methods and systems described in connection with
Device 500 may receive content and data via input/output (hereinafter “I/O”) path 502. I/O path 502 may provide content (e.g., broadcast programming, on-demand programming, Internet content, content available over a local area network (LAN) or wide area network (WAN), and/or other content) and data to control circuitry 504, which includes processing circuitry 506 and storage 508. Control circuitry 504 may be used to send and receive commands, requests, and other suitable data using I/O path 502. I/O path 502 may connect control circuitry 504 (and specifically processing circuitry 506) to one or more communications paths (described below). I/O functions may be provided by one or more of these communications paths, but are shown as a single path in
Control circuitry 504 may be based on any suitable processing circuitry such as processing circuitry 506. As referred to herein, processing circuitry should be understood to mean circuitry based on one or more microprocessors, microcontrollers, digital signal processors, programmable logic devices, field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), etc., and may include a multi-core processor (e.g., dual-core, quad-core, hexa-core, or any suitable number of cores) or supercomputer. In some embodiments, processing circuitry may be distributed across multiple separate processors or processing units, for example, multiple of the same type of processing units (e.g., two Intel Core i7 processors) or multiple different processors (e.g., an Intel Core i5 processor and an Intel Core i7 processor). In some embodiments, control circuitry 504 executes instructions that cause deep recommendations to be provided based on image or signature analysis.
An application on a device may be a stand-alone application implemented on a device or a server. The application may be implemented as software or a set of executable instructions. The instructions for performing any of the embodiments discussed herein of the application may be encoded on non-transitory computer-readable media (e.g., a hard drive, random-access memory on a DRAM integrated circuit, read-only memory on a BLU-RAY disk, etc.) or transitory computer-readable media (e.g., propagating signals carrying data and/or instructions). For example, in
In some embodiments, an application may be a client-server application where only the client application resides on device 500 (e.g., device 602), and a server application resides on an external server (e.g., server 606). For example, an application may be implemented partially as a client application on control circuitry 504 of device 500 and partially on server 606 as a server application running on control circuitry. Server 606 may be a part of a local area network with device 602, or may be part of a cloud computing environment accessed via the Internet. In a cloud computing environment, various types of computing services for performing searches on the Internet or informational databases, gathering information for a display (e.g., information for providing deep recommendations for display), or parsing data are provided by a collection of network-accessible computing and storage resources (e.g., server 606), referred to as “the cloud.” Device 500 may be a cloud client that relies on the cloud computing capabilities of server 606 to gather data to populate an application. When executed by control circuitry of server 606, the system may instruct the control circuitry to provide content matching on device 602. The client application may instruct control circuitry of the receiving device 602 to provide matched promotional content. Alternatively, device 602 may perform all computations locally via control circuitry 504 without relying on server 606.
Control circuitry 504 may include communications circuitry suitable for communicating with a content server or other networks or servers. The instructions for carrying out the above-mentioned functionality may be stored and executed on server 606. Communications circuitry may include a cable modem, a wireless modem for communications with other equipment, or any other suitable communications circuitry. Such communications may involve the Internet or any other suitable communication network or paths. In addition, communications circuitry may include circuitry that enables peer-to-peer communication of devices, or communication of devices in locations remote from each other.
Memory may be an electronic storage device provided as storage 508 that is part of control circuitry 504. As referred to herein, the phrase “electronic storage device” or “storage device” should be understood to mean any device for storing electronic data, computer software, or firmware, such as random-access memory, read-only memory, hard drives, optical drives, solid state devices, quantum storage devices, gaming consoles, or any other suitable fixed or removable storage devices, and/or any combination of the same. Nonvolatile memory may also be used (e.g., to launch a boot-up routine and other instructions). Cloud-based storage (e.g., on server 606) may be used to supplement storage 508 or instead of storage 508.
Control circuitry 504 may include display generating circuitry and tuning circuitry, such as one or more analog tuners, one or more MP3 decoders or other digital decoding circuitry, or any other suitable tuning or audio circuits or combinations of such circuits. Encoding circuitry (e.g., for converting over-the-air, analog, or digital signals to audio signals for storage) may also be provided. Control circuitry 504 may also include scaler circuitry for upconverting and downconverting content into the preferred output format of the device 500. Circuitry 504 may also include digital-to-analog converter circuitry and analog-to-digital converter circuitry for converting between digital and analog signals. The tuning and encoding circuitry may be used by the device to receive and to display, to play, or to record content. The tuning and encoding circuitry may also be used to receive guidance data. The circuitry described herein, including for example, the tuning, audio generating, encoding, decoding, encrypting, decrypting, scaler, and analog/digital circuitry, may be implemented using software running on one or more general purpose or specialized processors. Multiple tuners may be provided to handle simultaneous tuning functions. If storage 508 is provided as a separate device from device 500, the tuning and encoding circuitry (including multiple tuners) may be associated with storage 508.
A user may send instructions to control circuitry 504 using user input interface 510 of device 500. User input interface 510 may be any suitable user interface, such as a touch-screen, touchpad, or stylus, and may be responsive to external device add-ons such as a remote control, mouse, trackball, keypad, keyboard, joystick, voice recognition interface, or other user input interfaces. User input interface 510 may be a touchscreen or touch-sensitive display. In such circumstances, user input interface 510 may be integrated with or combined with display 512. Display 512 may be one or more of a monitor, a television, a liquid crystal display (LCD) for a mobile device, amorphous silicon display, low temperature poly silicon display, electronic ink display, electrophoretic display, active matrix display, electro-wetting display, electro-fluidic display, cathode ray tube display, light-emitting diode display, electroluminescent display, plasma display panel, high-performance addressing display, thin-film transistor display, organic light-emitting diode display, surface-conduction electron-emitter display (SED), laser television, carbon nanotubes, quantum dot display, interferometric modulator display, or any other suitable equipment for displaying visual images. A video card or graphics card may generate the output to the display 512. Speakers 514 may be provided as integrated with other elements of device 500 or may be stand-alone units. Display 512 may be used to display visual content while audio content may be played through speakers 514. In some embodiments, the audio may be distributed to a receiver (not shown), which processes and outputs the audio via speakers 514.
Control circuitry 504 may allow a user to provide user profile information or may automatically compile user profile information. For example, control circuitry 504 may track user preferences for different video signatures and deep recommendations. In some embodiments, control circuitry 504 monitors user inputs, such as queries, texts, calls, conversation audio, social media posts, etc., to detect user preferences. Control circuitry 504 may store the user preferences in the user profile. Additionally, control circuitry 504 may obtain all or part of other user profiles that are related to a particular user (e.g., via social media networks), and/or obtain information about the user from other sources that control circuitry 504 may access. As a result, a user can be provided with real-time matched promotional content.
Device 500 of
In system 600, there may be multiple devices but only one of each type is shown in
As depicted in
At Step 704, system 200 transforms the at least one frame of the video to generate a shape intensity. A method as described in the description of
At Step 706, the deep recommendation system generates a feature vector based on the texture, the shape intensity, and temporal data corresponding to the at least one frame of the video. The texture determined in step 702 and shape intensity determined in Step 704 may be structured in a feature vector with temporal data indicative of a change in texture and shape intensity over time. Temporal data corresponding to at least one frame of a video includes the time to display the at least one frame (e.g., segment sequence positions, timestamp information, or the like), the number of frames (or, e.g., segment duration or the like), a difference in texture and/or shape intensity over the time or number of frames, any suitable value of change over feature vector values for frames over time, or any combinations thereof.
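One possible way to structure such a feature vector is sketched below in Python. The flat-list layout, the function name, and the sample values are assumptions for illustration; the disclosure does not prescribe a particular layout.

```python
def build_feature_vector(texture, shape, timestamp, frame_count, previous=None):
    """Concatenate texture and shape values with temporal data.

    `previous` is an optional (texture, shape) pair from an earlier frame.
    The trailing elements hold the per-value change since that frame, or
    zeros when no earlier frame is available.
    """
    spatial = list(texture) + list(shape)
    if previous is None:
        delta = [0.0] * len(spatial)
    else:
        prev_spatial = list(previous[0]) + list(previous[1])
        delta = [now - before for now, before in zip(spatial, prev_spatial)]
    return spatial + [float(timestamp), float(frame_count)] + delta


# Hypothetical values: two texture bins, one shape intensity, a frame shown
# at 12.5 seconds in a 30-frame segment, compared against an earlier frame.
vector = build_feature_vector(
    texture=[0.5, 0.25],
    shape=[3.0],
    timestamp=12.5,
    frame_count=30,
    previous=([0.25, 0.25], [1.0]),
)
```

The resulting vector is `[0.5, 0.25, 3.0, 12.5, 30.0, 0.25, 0.0, 2.0]`: the spatial values, the temporal data, and then the change in each spatial value over time.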
At Step 708, the deep recommendation system analyzes the feature vector using a machine learning model 205 to produce a machine learning model output. For example, as above, the feature vector is analyzed using a neural network to produce classifications of the video frame.
At Step 710, the system 200 determines whether any promotional content matches the classifications output at Step 708. As above, classifications output at Step 708 may be matched against classifications or classification spaces of promotional content, such as by a trained machine learning model. If a match is found, i.e., the classifications output at Step 708 are sufficiently similar to classifications of particular promotional content, that promotional content may be transmitted for display to the user.
Machine learning model 205 may be trained in any suitable manner.
In some embodiments, the feature vectors received in Step 802 include information indicative of a texture associated with at least one frame of a video, a shape intensity based on a transform of the at least one frame of the video, and temporal data corresponding to the at least one frame of the video. For example, the feature vectors include a value corresponding to the texture of at least one frame (e.g., as determined by methods described in connection with
At Step 804, the training system trains the machine learning model using the labeled feature vectors to produce a trained machine learning model for classifying content feature vectors. In some embodiments, training the machine learning model includes iteratively determining weights for a neural network while minimizing a loss function to optimize the weights, such as by use of a gradient descent method.
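The iterative weight update can be illustrated with a minimal sketch. To stay self-contained it substitutes a single-layer logistic model for the neural network and trains it by batch gradient descent on the mean log loss; the function names, learning rate, epoch count, and toy data are all assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(vectors, labels, learning_rate=0.5, epochs=500):
    """Fit weights and a bias by batch gradient descent on mean log loss."""
    n_samples, n_features = len(vectors), len(vectors[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(vectors, labels):
            # Derivative of the log loss with respect to the pre-activation.
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xj in enumerate(x):
                grad_w[j] += error * xj
            grad_b += error
        # Step against the mean gradient.
        weights = [w - learning_rate * g / n_samples for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n_samples
    return weights, bias


def predict(weights, bias, x):
    """Return True when the model assigns the positive label to x."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) >= 0.5


# Hypothetical labeled feature vectors: label 1 marks a category of interest.
vectors = [[0.0, 0.1], [0.2, 0.0], [0.9, 1.0], [1.0, 0.8]]
labels = [0, 0, 1, 1]

weights, bias = train(vectors, labels)
predictions = [predict(weights, bias, x) for x in vectors]
```

On this small, linearly separable set the trained model labels the first two vectors negative and the last two positive. A multi-layer network would apply the same update rule to every layer's weights via backpropagation.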
The system 200 then selects promotional content having one or more classifications corresponding to the classifications output by machine learning model 205 (Step 910). This is accomplished by matching promotional content to the classification output of model 205. As above, matching may be performed in any manner, such as by determination of greater than some predetermined number of identical or similar (within a predetermined difference metric) classifications, or via use of a machine learning model trained to determine whether classified content falls within the classification space of various promotional content.
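The count-based variant of this matching can be sketched as follows. The catalog structure, promotion identifiers, and threshold are hypothetical, and the sketch counts only identical classifications, not similar ones.

```python
def match_promotions(content_labels, catalog, min_shared=2):
    """Return ids of promotions sharing at least `min_shared` labels.

    Results are ordered by the number of shared labels, largest first, with
    ties broken alphabetically by id.
    """
    content = set(content_labels)
    scored = [
        (len(content & set(labels)), promo_id)
        for promo_id, labels in catalog.items()
    ]
    ranked = sorted(scored, key=lambda item: (-item[0], item[1]))
    return [promo_id for shared, promo_id in ranked if shared >= min_shared]


# Hypothetical promotional content with predetermined classifications.
catalog = {
    "suv_ad": {"car", "road", "outdoor"},
    "sofa_ad": {"indoor", "furniture"},
    "tire_ad": {"car", "road"},
}

matches = match_promotions({"car", "road", "outdoor", "daytime"}, catalog)
```

With these inputs `matches` is `["suv_ad", "tire_ad"]`: the first shares three classifications with the content, the second shares two, and the third shares none and is excluded.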
The system 200 then transmits, or causes to be transmitted, any matched promotional content for display to the user (Step 920). Matched promotional content may be displayed at any time and in any manner, such as after display of a particular content portion being played, e.g., after (including immediately after) the currently-consumed segment 204. Alternatively, or in addition, the content being consumed may be interrupted for immediate display of the matched promotional content. In this manner, system 200 may determine matching promotional content in real time, which matches characteristics of those portions of content that are currently being consumed. This promotional content may then be displayed for the user while the user is still viewing the matching content. In this manner, promotional content may be played to match the user's immediate interests, increasing the likelihood of conversion.
System 200 then determines classifications of the content portions, such as via one or more machine learning models that take as input the generated features of the content portions and generate the classifications as output (Step 1010). As above, machine learning model 205 may be trained to classify input feature vectors, yielding classifications for each video frame or group of frames.
Once classifications of the currently consumed video portion are determined, system 200 matches promotional content to these video portions (Step 1020). As above, in some embodiments this may be accomplished through use of a machine learning model such as a K-nearest neighbor or other classifier trained to determine whether content classifications fall into the classification space of various promotional content. In these embodiments, the machine learning model would receive as input the classifications of the currently consumed video portion, and would determine as output the identity of any matching promotional content. In certain embodiments, the machine learning model would also receive as input user behavior information describing current user behavior relevant to the likelihood of purchasing any product or service. In these embodiments, the classifier would match the classifications of the currently consumed video portion and the user's current behavior to classifications and corresponding behavior positively correlated with specified promotional content. User behavior may be, for example, determined from current user behavior, retrieved from a stored user profile, or otherwise determined in any manner.
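A K-nearest neighbor matcher that takes both content classifications and user behavior as input might look like the sketch below. The binary encoding, the Hamming distance, the `app:` prefix for behavior terms, and the training examples are all assumptions; a deployed classifier would also need a way to report that no promotional content matches.

```python
from collections import Counter


def encode(labels, behaviors, vocabulary):
    """Encode classifications and behaviors as one binary vector."""
    present = set(labels) | set(behaviors)
    return [1 if term in present else 0 for term in vocabulary]


def knn_predict(query, examples, k=3):
    """Return the majority promotion id among the k nearest examples.

    `examples` is a list of (vector, promo_id) pairs, and distance is the
    number of positions in which two vectors differ (Hamming distance).
    """
    nearest = sorted(
        examples,
        key=lambda example: sum(a != b for a, b in zip(query, example[0])),
    )[:k]
    return Counter(promo_id for _, promo_id in nearest).most_common(1)[0][0]


# Hypothetical vocabulary: four classifications and two behavior terms.
VOCABULARY = ["car", "road", "kitchen", "food", "app:maps", "app:recipes"]

# Hypothetical labeled examples: (classifications, behaviors, promotion id).
labeled = [
    ({"car", "road"}, {"app:maps"}, "suv_ad"),
    ({"car"}, {"app:maps"}, "suv_ad"),
    ({"road"}, set(), "suv_ad"),
    ({"kitchen", "food"}, {"app:recipes"}, "cookware_ad"),
    ({"food"}, {"app:recipes"}, "cookware_ad"),
    ({"kitchen"}, set(), "cookware_ad"),
]
examples = [(encode(c, b, VOCABULARY), promo) for c, b, promo in labeled]

query = encode({"kitchen", "food"}, {"app:recipes"}, VOCABULARY)
matched = knn_predict(query, examples)
```

The query's three nearest neighbors are all labeled `cookware_ad`, so that promotion is returned. Because behavior terms share the vector with classifications, a user with a recipe application open moves closer to the cookware examples even when the content classifications alone are ambiguous.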
As above, embodiments of the disclosure may be applied to match promotional content to current consumption of time-varying content such as video, audio, or the like. It is noted, though, that embodiments of the disclosure may also be applied to match promotional content to any other type of user-consumed content. This may include content such as web pages and the like, which users may scroll through and thus view only a portion of such content at any given time, even though the content itself is largely time-invariant.
System 200 may also determine whether the content page portion currently being displayed is different from the content page portion submitted as input to the system 200 (Step 1120). That is, system 200 may determine whether the user has scrolled to a different portion of the content page since the classification of Step 1100 was performed. This determination may be made by a comparison of the image input at Step 1100 to a subsequent image received from the user device.
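As one possible illustration of the comparison of Step 1120, the two images may be compared pixel by pixel and the user treated as having scrolled when the average difference exceeds a threshold. This is a minimal sketch under stated assumptions. Images are assumed to be equal-sized grayscale grids of values from 0 to 255, and the function names and the threshold of 10.0 are hypothetical rather than taken from the disclosure.

```python
def mean_abs_diff(img_a, img_b):
    """Mean absolute per-pixel difference between two grayscale images.

    Each image is a list of rows, and each row is a list of 0-255 values.
    """
    same_shape = len(img_a) == len(img_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(img_a, img_b)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")
    count = sum(len(row) for row in img_a)
    if count == 0:
        raise ValueError("images must contain at least one pixel")
    total = sum(
        abs(p - q)
        for row_a, row_b in zip(img_a, img_b)
        for p, q in zip(row_a, row_b)
    )
    return total / count


def has_scrolled(classified_img, current_img, threshold=10.0):
    """Assumed rule: a large average pixel change means a new page portion."""
    return mean_abs_diff(classified_img, current_img) > threshold


if __name__ == "__main__":
    earlier = [[0, 0], [255, 255]]
    later = [[255, 255], [0, 0]]
    print(has_scrolled(earlier, later))
    print(has_scrolled(earlier, earlier))
```

A simple pixel difference is sensitive to animated page elements, so a practical system might instead compare scroll offsets reported by the user device or compare the generated feature vectors of the two images.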
If the user has scrolled to a different portion of the content page, system 200 may transmit the matched promotional content for display on that portion of the content page to which the user has scrolled, i.e., the portion of the page which the user is currently consuming (Step 1130). This increases the likelihood that the user will actually view the matched promotional content. Embodiments of the disclosure contemplate display of matched promotional content in any manner, so long as such display occurs on the portion of the page which the user is currently consuming. For example, matched promotional content may be displayed in an overlying popup window, such as when the content page is a web page. As another example, matched promotional content may be displayed in a picture-in-picture (PiP) window. Any manner of display is contemplated.
System 200 then determines classifications of the currently displayed content portions, such as via one or more machine learning models that take as input the generated features of the content portions and generate the classifications as output (Step 1210). As above, machine learning model 205 may be trained to classify input feature vectors, yielding classifications for each content page portion.
Once classifications of the currently displayed content page portion are determined, system 200 matches promotional content to these content page portions (Step 1220). As above, in some embodiments this may be accomplished through use of a machine learning model such as a K-nearest neighbor or other classifier trained to determine whether content classifications fall into the classification space of various promotional content. In these embodiments, the machine learning model would receive as input the classifications of the currently displayed content page portion, and would determine as output the identity of any matching promotional content. In certain embodiments, the machine learning model would also receive as input user behavior information describing current user behavior relevant to the likelihood of purchasing any product or service. In these embodiments, the classifier would match the classifications of the currently displayed content page portion and the user's current behavior to classifications and corresponding behavior positively correlated with specified promotional content.
The foregoing description, for purposes of explanation, used specific nomenclature to provide a thorough understanding of the disclosure. However, it will be apparent to one skilled in the art that the specific details are not required to practice the methods and systems of the disclosure. Thus, the foregoing descriptions of specific embodiments of the present invention are presented for purposes of illustration and description. They are not intended to be exhaustive or to limit the invention to the precise forms disclosed. Many modifications and variations are possible in view of the above teachings. For example, any content may be classified, whether time-varying content such as audio and/or video, or generally time-invariant content such as web pages and the like. Matching promotional content can be determined in real time and displayed for the user at any time and in any manner, whether by insertion into a content stream, via a popup or PiP window, immediately upon matching, at the conclusion of a determined segment, or the like. The embodiments were chosen and described in order to best explain the principles of the invention and its practical applications, to thereby enable others skilled in the art to best utilize the methods and systems of the disclosure and various embodiments with various modifications as are suited to the particular use contemplated. Additionally, different features of the various embodiments, disclosed or otherwise, can be mixed and matched or otherwise combined so as to create further embodiments contemplated by the disclosure.