The present invention relates to pasting an object to a video, and more particularly to a system and method for pasting an advertisement to a video.
It is known that one or more objects are allowed to be pasted to a video. The one or more objects may be advertising materials such as a 2D advertising banner/tag or a 2D advertising image. The 2D advertising banner/tag may occlude one or more objects in the video when pasted to the video. For example, the 2D advertising banner/tag occludes a performer in some scenes of the video, with the result that the video becomes unnatural and unreal. Audiences may be upset by such an unnatural and unreal video and stop viewing it.
The present invention is directed to improvements that address the foregoing issues and provide related advantages.
Various embodiments of the present invention are described below to provide methods for pasting an advertisement to a video via a video advertisement platform.
Example methods and apparatus are disclosed herein. An example apparatus has an AI engine to receive a video having a plurality of video frames, in which an ending video frame is included. A first video frame of the plurality of video frames is scanned, wherein the first video frame has one or more first target objects and one or more second target objects. The AI engine determines whether the first video frame of the plurality of video frames is the ending video frame, based on a video frame index. When the first video frame is not the ending video frame, the AI engine determines whether corresponding predetermined video frame information associated with the first video frame is identified in a database. When the corresponding predetermined video frame information associated with the first video frame is identified in the database, the AI engine segments the one or more second target objects and extracts the one or more segmented second target objects from the first video frame. One or more predetermined objects are pasted to the one or more first target objects in the first video frame, based on the corresponding predetermined video frame information associated with the first video frame. The extracted one or more second target objects are pasted to the first video frame.
The present application can be best understood by reference to the following description taken in conjunction with the accompanying drawing figures, in which like parts may be referred to by like numerals.
The following description is presented to enable a person of ordinary skill in the art to make and use the various embodiments. Descriptions of specific devices, techniques, and applications are provided only as examples. Various modifications to the examples described herein will be readily apparent to those of ordinary skill in the art, and the general principles defined herein may be applied to other examples and applications without departing from the spirit and scope of the present invention. Thus, the disclosed invention is not intended to be limited to the examples described herein and shown, but is to be accorded the scope consistent with the claims.
In one embodiment, one or more video providers is/are registered user(s) of video advertisement platform 120. The one or more video providers use a video filming device, such as a smartphone, a tablet computer, a handheld computer, a camcorder, a video recorder, a camera or any device having a video filming function, to make videos. Merely by way of example, first video provider 160a uses his/her smartphone to film one or more videos. Second video provider 160b uses a video recorder to film one or more videos. First video provider 160a and second video provider 160b are registered users and are allowed to upload one or more videos to video advertisement server 122. Both first video provider 160a and second video provider 160b are influencers. Each of first video provider 160a and second video provider 160b has his/her own login name, for example his/her email address. There is no limitation on the format of the login name. The login name may be any combination of letters and numbers. Each of first video provider 160a and second video provider 160b has his/her own login password.
Merely by way of example, login interface 300 includes login name field 301 and login password field 302. First video provider 160a is allowed to enter his/her login name in login name field 301. First video provider 160a is then allowed to enter his/her login password in login password field 302.
Once first video provider 160a enters the login name and login password successfully, first video provider 160a is allowed to access upload video interface 303 as illustrated in
Upload video interface 303 includes box 304 for first video provider 160a to open a video file (which is also referred to as an original video) to be uploaded. The original video is then arranged to be uploaded to video advertisement server 122 and to be stored in storage 128.
Once the original video is uploaded to the video advertisement server 122 successfully, the original video will be displayed on video library interface 305 as illustrated in
Merely by way of example, first video provider 160a is allowed to update his/her profile information at profile interface 309 as illustrated in
Create campaign interface 311 includes campaign name field 312a, description field 312b, campaign period field 312c, referred KOL field 312d, broadcast location field 312e and preferred video streaming platform field 312f. There is no limitation on what fields are included in create campaign interface 311. For one example, create campaign interface 311 may further include a category field (for example sport, fitness, music, lifestyle, food, technology and travel) and/or a preferred language field. For another example, create campaign interface 311 may include campaign name field 312a and description field 312b only. First service subscriber 180a is allowed to enter information in the corresponding field. In addition, create campaign interface 311 provides a field or a box for first service subscriber 180a to upload one or more objects. First service subscriber 180a is allowed to upload one or more advertising materials via assets upload box 312g.
Merely by way of example, first service subscriber 180a is allowed to update profile information via profile interface 313 as illustrated in
The first video includes a plurality of video frames. Before the first video is scanned by AI engine 124, the plurality of video frames is arranged to be examined manually in order to identify one or more first target objects. For example, the first target object may be a quadrilateral object such as a picture frame, a monitor, a display or a television. There is no limitation on the shape of the first target object. The first target object may be a triangular, hexagonal, or octagonal object. There is no limitation on the nature of the first target object. The first target object may be a table, a cabinet, a wall, a bed or any object with a plain surface.
The video frames may be manually examined one by one or may be manually examined in a collective manner. For example, the first video includes N video frames, with a video frame index n (n is from 0 to N−1). A beginning video frame of the N video frames has the video frame index equal to 0 (n=0) and an ending video frame of the N video frames has the video frame index equal to N−1 (n=N−1).
In one embodiment, the video frames are manually examined one by one. When one or more first target objects are identified in the examined video frame, location(s) and shape(s) of the one or more first target objects is/are annotated, which will be considered as predetermined video frame information associated with the examined video frame and stored in a database in storage 128.
One or more objects provided by service subscribers will be selected and retrieved from the database. The selection of the one or more objects may be automatically made by AI engine 124, based on content of the first video or may be manually made.
In one example, one or more objects provided by a service subscriber are one or more advertising materials, which are arranged to be retrieved and displayed on the examined video frame, based on the location(s) of one or more first target objects. In one example, the one or more objects are manually reshaped and aligned with the one or more identified first target objects. The one or more reshaped objects lie on a transparent plain surface.
The one or more reshaped objects together with the transparent plain surface are associated with the examined video frame and are stored in storage 128 as one or more predetermined objects associated with the examined video frame. The location(s) and the shape(s) of the one or more objects is/are the same as the location(s) and shape(s) of the one or more annotated first target objects. The same procedure will be applied to other video frames to be examined, in which one or more first target objects are identified.
In one embodiment, as illustrated in
Locations and shapes of first target objects 410a and 410b are annotated respectively. The locations and shapes of first target objects 410a and 410b will be considered as predetermined video frame information associated with the video frame with n=1000 and stored in the database.
One or more objects provided by service subscribers will be selected and retrieved from the database. In one example, two advertising materials 414a and 414b provided by first service subscriber 180a are arranged to be retrieved and displayed on the video frame with n=1000, based on the locations of first target objects 410a and 410b as illustrated in
Two advertising materials 414a and 414b are manually reshaped and aligned with two identified first target objects 410a and 410b to become two reshaped advertising materials 414c and 414d as illustrated in
Two reshaped advertising materials 414c and 414d are associated with the video frame with n=1000 and are stored in storage 128 as one or more predetermined advertising materials associated with video frame with n=1000. The locations and the shapes of advertising materials 414c and 414d are the same as the locations and shapes of first target objects 410a and 410b. The same procedure will be applied to other video frames to be examined, in which one or more first target objects are identified.
Alternatively, two first target objects 410a and 410b are identified in the video frame with n=1000. The video frame with n=1000 is considered as a plain surface with an x axis and a y axis. For example, the x axis is from 0 to K and the y axis is from 0 to L. The value of K and the value of L depend on the resolution of the video frame. If the resolution is 720×480, the x axis is from 0 to 720 and the y axis is from 0 to 480. As illustrated in
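Merely by way of illustration, the reshaping and aligning of an advertising material with an annotated quadrilateral first target object may be sketched in Python as follows, assuming the OpenCV and NumPy libraries; the file names, corner ordering and the example 720×480 resolution are assumptions made for this sketch only and do not limit the described embodiments.

```python
# Illustrative sketch only: reshape an advertising material so that it aligns
# with the annotated corners of a first target object, then place it on a
# transparent plain surface the same size as the video frame.
# File names and corner coordinates are hypothetical examples.
import cv2
import numpy as np

frame_w, frame_h = 720, 480  # example resolution (x axis 0..720, y axis 0..480)

# Annotated corners of first target object 410a (order: TL, TR, BL, BR)
corners_410a = np.float32([[99, 19], [125, 23], [98, 64], [124, 65]])

ad = cv2.imread("advertising_material_414a.png")        # hypothetical file
h, w = ad.shape[:2]
src = np.float32([[0, 0], [w, 0], [0, h], [w, h]])      # ad corners, same order

# Perspective transform that maps the ad onto the annotated quadrilateral
M = cv2.getPerspectiveTransform(src, corners_410a)

# Transparent plain surface (RGBA) the size of the video frame
surface = np.zeros((frame_h, frame_w, 4), dtype=np.uint8)

ad_rgba = cv2.cvtColor(ad, cv2.COLOR_BGR2BGRA)
ad_rgba[:, :, 3] = 255                                  # the ad itself is opaque
warped = cv2.warpPerspective(ad_rgba, M, (frame_w, frame_h))

# Keep only the warped ad pixels; everything else stays transparent
mask = warped[:, :, 3] > 0
surface[mask] = warped[mask]

cv2.imwrite("predetermined_object_414c.png", surface)   # reshaped material
```

The resulting image corresponds to a reshaped advertising material lying on a transparent plain surface, such as reshaped advertising material 414c described above.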
In one embodiment, the video frames are manually examined in a collective manner. One or more first target objects may appear for the full duration of the first video, or may appear and disappear throughout the first video. The first video includes N video frames (such as N=10000), with video frame index from n=0 to n=9999, and has a full duration of 400 seconds.
In one embodiment, one or more first target objects appear for the full duration of the first video. The one or more first target objects may have different location(s) and shape(s) throughout the first video. For example, for video frame with n=0 to video frame with n=3000 (first batch), one or more first target objects having first location(s) and first shape(s) throughout video frame with n=0 to video frame with n=3000 are identified. For example, video frame with n=1000 is manually examined.
The first locations and first shapes of the one or more first target objects are annotated, which will be considered as predetermined video frame information associated with the video frame with n=1000. The first locations and first shapes of the one or more first target objects will be associated with each video frame from video frame with n=0 to video frame with n=3000 to form predetermined video frame information associated with the corresponding video frame.
One or more advertising materials are arranged to appear in the video frame with n=1000, based on the first location(s) of the one or more first target objects. The one or more advertising materials are manually reshaped and aligned with the one or more first target objects. The one or more reshaped advertising materials lie on a transparent plain surface.
The one or more reshaped advertising materials together with the transparent plain surface are associated with the video frame with n=1000 and are stored in storage 128 as one or more predetermined advertising materials associated with the video frame with n=1000.
The one or more reshaped advertising materials together with the transparent plain surface will be associated with each video frame from video frame with n=0 to video frame with n=3000 to form one or more predetermined advertising materials associated with the corresponding video frame.
For video frame with n=3001 to video frame with n=6000 (second batch), one or more first target objects having second location(s) and second shape(s) throughout video frame with n=3001 to video frame with n=6000 are identified. For example, video frame with n=4000 is manually examined. The same process above will be implemented from video frame with n=3001 to video frame with n=6000.
For video frame with n=6001 to video frame with n=9999 (third batch), one or more first target objects having third location(s) and third shape(s) throughout video frame with n=6001 to video frame with n=9999 are identified. For example, video frame with n=7000 is manually examined. The same process above will be implemented from video frame with n=6001 to video frame with n=9999.
In another embodiment, one or more first target objects appear and disappear throughout the first video. For example, one or more first target objects with first location(s) and first shape(s) are identified throughout video frame with n=0 to video frame with n=3000 (first batch). Also, one or more first target objects with second location(s) and second shape(s) are identified throughout video frame with n=6001 to video frame with n=9999 (second batch). No first target objects are identified throughout video frame with n=3001 to video frame with n=6000.
For video frame with n=0 to video frame with n=3000, video frame with n=1000 is manually examined. The first locations and first shapes of the one or more first target objects are annotated, which will be considered as predetermined video frame information associated with the video frame with n=1000. The first locations and first shapes of the one or more first target objects are associated with each video frame from video frame with n=0 to video frame with n=3000 to form predetermined video frame information associated with the corresponding video frame.
One or more advertising materials are arranged to appear in video frame n=1000, based on the first location(s) of one or more first target objects. The one or more advertising materials are manually reshaped and aligned with the one or more first target objects. The one or more reshaped advertising materials lie on a transparent plain surface.
The one or more reshaped advertising materials together with the transparent plain surface are associated with the examined video frame and are stored in storage 128 as one or more predetermined advertising materials associated with the video frame with n=1000. The one or more reshaped advertising materials together with the transparent plain surface will be associated with each video frame from video frame with n=0 to video frame with n=3000 to form one or more predetermined advertising materials associated with the corresponding video frame.
For video frame with n=6001 to video frame with n=9999 (second batch), video frame with n=7000 is manually examined. The same process above will be implemented from video frame with n=6001 to video frame with n=9999. For video frame with n=3001 to video frame with n=6000, no action will be performed.
In one embodiment, the video frames are manually examined in a collective manner, based on coordinates of one or more first target objects. For example, one or more first target objects are identified from video frame with n=0 to video frame with n=3000 (first batch), and the coordinates of the one or more first target objects remain the same from video frame with n=0 to video frame with n=3000. Take, for example, video frame with n=1000 being manually examined. Coordinates of the four corners of the one or more first target objects are manually annotated. For instance, a first set of coordinates of first target object 410a is manually annotated as (99,19), (125,23), (98,64) and (124,65). A first set of coordinates of first target object 410b is (162,41), (183,44), (163,82) and (183,82).
The first set of coordinates of first target objects 410a and 410b will be considered as predetermined video frame information associated with the video frame with n=1000 and stored in the database. The predetermined video frame information of each of the video frames from n=0 to n=3000 will be updated with the first set of coordinates of first target objects 410a and 410b.
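Merely by way of illustration, the association of one annotated coordinate set with every video frame of a batch may be sketched in Python as follows; the in-memory dictionary is a hypothetical stand-in for the database in storage 128 and does not limit the described embodiments.

```python
# Illustrative sketch: associate one manually annotated set of corner
# coordinates with every video frame in a batch (here n = 0..3000), forming
# the predetermined video frame information stored in the database.
# The dictionary below is a hypothetical stand-in for the database in storage 128.

first_set_of_coordinates = {
    "410a": [(99, 19), (125, 23), (98, 64), (124, 65)],
    "410b": [(162, 41), (183, 44), (163, 82), (183, 82)],
}

predetermined_video_frame_info = {}   # video frame index n -> annotation

for n in range(0, 3001):              # first batch: video frames n = 0..3000
    predetermined_video_frame_info[n] = first_set_of_coordinates

# Later, a scanned frame can be looked up by its index, e.g. n = 1000
info = predetermined_video_frame_info.get(1000)
```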
For other batches having one or more of first target objects, the same process above will be implemented.
Once the manual examination of the first video is completed successfully, the first video will be scanned by AI engine 124. AI engine 124 will scan the plurality of video frames of the first video one by one, from the beginning video frame to the ending video frame. The first video includes N video frames, with the video frame index n (n is from 0 to N−1). The beginning video frame of the N video frames has the video frame index equal to 0 and the ending video frame of the N video frames has the video frame index equal to N−1. For example, the first video includes 10000 video frames and the video frame index n is from 0 to 9999.
Merely by way of example, the video frame with n=1000 is scanned by AI engine 124 as illustrated in
If the predetermined video frame information associated with the video frame with n=1000 is identified in the database, predetermined advertising materials 414c and 414d associated with the video frame with n=1000 will be retrieved from the database.
AI engine 124 will automatically perform segmentation and extraction when one or more second target objects are identified in the scanned video frame (the video frame with n=1000). For example, a second target object is a human being. Two second target objects 512a and 512b are identified in the video frame with n=1000 by AI engine 124. AI engine 124 will perform segmentation on the two second target objects 512a and 512b to obtain segmented second target objects 512c and 512d. AI engine 124 will then perform extraction to obtain extracted second target objects 512e and 512f from the video frame with n=1000.
Predetermined advertising materials 514a and 514b are arranged to be pasted to first target objects 510a and 510b (which is named as an AD pasted video frame with n=1000) by inserting transparent plain surface 418, based on the predetermined video frame information associated with the video frame with n=1000. The two second target objects 512e and 512f are pasted to the original locations (where the two second target objects were segmented and extracted) in the AD pasted video frame with n=1000 to form a processed video frame with n=1000.
AI engine 124 is configured to scan a next video frame and the video frame index is incremented by 1 (n=n+1). The same procedure will be implemented on upcoming video frames. When AI engine 124 finishes scanning all video frames of the first video, advertising materials 414c and 414d are pasted to the first video. The first video will become a processed video and will be shown on processed video display box 307.
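Merely by way of illustration, the frame-by-frame scanning procedure described above may be sketched in Python as follows; the callables passed into the sketch (reading a frame, segmenting and extracting second target objects, pasting predetermined objects, and pasting the extracted objects back) are hypothetical placeholders for the operations described above, not a definitive implementation.

```python
# Illustrative sketch of the frame-by-frame scanning procedure.
# All callables are hypothetical placeholders supplied by the caller;
# `database` maps a video frame index n to its predetermined video frame information.

def process_video(read_frame, database, N,
                  segment_and_extract, paste_predetermined_objects,
                  paste_extracted_objects):
    processed_frames = []
    n = 0
    while n <= N - 1:                       # ending video frame has index N - 1
        frame = read_frame(n)
        info = database.get(n)              # predetermined video frame information
        if info is not None:
            # Segment the second target objects (e.g. human beings) and extract them
            extracted, mask = segment_and_extract(frame)
            # Paste the predetermined objects onto the one or more first target objects
            frame = paste_predetermined_objects(frame, info)
            # Paste the extracted second target objects back at their original locations
            frame = paste_extracted_objects(frame, extracted, mask)
        processed_frames.append(frame)
        n = n + 1                           # scan the next video frame
    return processed_frames
```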
In one embodiment, for segmentation, AI engine 124 processes the first video by using a deep neural network. AI engine 124 is configured to collect pixels of second target objects 512a and 512b (both are human beings). Different deep neural networks may be used, such as Mask RCNN, RVOS and Deeplabv3+. For example, Mask RCNN is used for segmentation. A core network in the Mask RCNN is "resnet101", which includes 100 convolutional layers. The core network is pretrained on the COCO dataset for segmenting 1000 different objects. For example, segmentation is configured to be applied to human beings. Thus, human being images are selected from the COCO dataset: a total of 6000 human being images are selected, 5000 of which are used for training and 1000 of which are used for validation. Mask RCNN is retrained using these 6000 images. After the training, the deep neural network is configured to segment second target objects 512a and 512b. As illustrated in
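Merely by way of illustration, obtaining a single-channel binary mask for the human beings in one video frame may be sketched in Python as follows, assuming the publicly available torchvision Mask R-CNN model (used here as a stand-in for the retrained network described above); the score threshold and function name are assumptions made for this sketch only.

```python
# Illustrative sketch: segment human beings in one video frame with the
# off-the-shelf torchvision Mask R-CNN (a stand-in for the retrained network
# described above).  In the torchvision COCO label mapping, class 1 is "person".
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

model = maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

def human_mask(frame_rgb_float):
    """frame_rgb_float: torch tensor of shape (3, H, W), values in [0, 1]."""
    with torch.no_grad():
        pred = model([frame_rgb_float])[0]
    person = (pred["labels"] == 1) & (pred["scores"] > 0.5)   # assumed threshold
    if person.sum() == 0:
        return torch.zeros(frame_rgb_float.shape[1:], dtype=torch.uint8)
    # Union of all detected person masks -> single-channel matrix of 1s and 0s
    masks = pred["masks"][person, 0] > 0.5
    return masks.any(dim=0).to(torch.uint8)
```

The returned single-channel matrix of 1s and 0s corresponds to Matrix M described below.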
Masked second target objects 512c and 512d are used to obtain human being pixels from the video frame with n=1000. For example, the video frame with n=1000 is a 3-dimensional (3D) matrix F. The 1st (F1) and 2nd (F2) dimensions represent the height and width of the video frame with n=1000 respectively. The 3rd dimension (F3) represents the color channels. Let F3(0) denote the red channel (R), F3(1) denote the green channel (G) and F3(2) denote the blue channel (B).
Masked human beings 512c and 512d are represented by a single-channel matrix M. The height and width of Matrix M are identical to those of Matrix F. Human being pixels (for both second target objects 512a and 512b) on Matrix F are extracted by using Matrix M. An output human being image (H) is obtained. The human being image (H) also includes 3 color channels (RGB). The extraction follows the formulas below:
H(0)=F3(0)·M
H(1)=F3(1)·M
H(2)=F3(2)·M
In the formulas above, the multiplication sign "·" represents element-wise (pixel-wise) multiplication between Matrix F and Matrix M. The masked pixel values are "1" and the unmasked pixel values are "0". Pixel values of Matrix F multiplied by "1" keep their original values, and pixel values of Matrix F multiplied by "0" become "0". Therefore, the human being image (H) displays the colorful second target objects 512e and 512f and a black background as illustrated in
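Merely by way of illustration, the extraction of the human being image (H) from Matrix F using the single-channel Matrix M may be sketched in Python (NumPy) as follows.

```python
# Illustrative sketch of H(c) = F3(c) · M for the three color channels.
# F: video frame as a (height, width, 3) array; M: (height, width) array of
# 1s (masked, i.e. human being pixels) and 0s (unmasked).
import numpy as np

def extract_human_image(F, M):
    H = np.zeros_like(F)
    for c in range(3):                 # c = 0 (R), 1 (G), 2 (B)
        H[..., c] = F[..., c] * M      # element-wise (pixel-wise) multiplication
    return H                           # human beings in color on a black background
```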
In one example, in order to mitigate the unwanted occlusion, second target objects 512i and 512j are extracted from Matrix P (the video frame image with predetermined advertising materials 514a and 514b pasted) as illustrated in
B(0)=P3(0)·(1−M)
B(1)=P3(1)·(1−M)
B(2)=P3(2)·(1−M)
B represents a background image, which displays the background of the video frame image (P) with predetermined advertising materials 514a and 514b pasted. Masked second target objects 512c and 512d will become black in this background image. The (1−M) operation reverses the pixel values for masked second target objects 512c and 512d, which become 512i and 512j as shown in
Second target objects 512k and 512l (i.e. the human being image represented by H) and the background (i.e. the background image represented by B) are then merged to get the final result video frame (R):
R=H+B
The formula above adds each corresponding element (pixel) from H and B. R is the processed video frame with n=1000, shown in
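Continuing the illustrative sketch above, the background image (B) and the final result video frame (R) may be computed in the same element-wise manner; P denotes the video frame with the predetermined advertising materials already pasted and M is the same single-channel mask.

```python
# Illustrative sketch of B(c) = P3(c) · (1 − M) and R = H + B.
# P: advertisement-pasted video frame, shape (height, width, 3);
# H: extracted human being image from the sketch above;
# M: single-channel mask of 1s (human being pixels) and 0s (background).
import numpy as np

def merge_with_background(P, H, M):
    B = np.zeros_like(P)
    for c in range(3):                   # c = 0 (R), 1 (G), 2 (B)
        B[..., c] = P[..., c] * (1 - M)  # background with human regions blacked out
    return H + B                         # processed video frame R
```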
In another example, as illustrated in
Turning now to
At Step 604, if the corresponding predetermined video frame information associated with the scanned video frame is not identified in the database, Step 610 will be performed.
At Step 605, if the one or more second targets are not identified in the scanned video frame, Step 611 will be performed. Step 611 is the same as Step 608.
At Step 603, if the scanned video frame is the ending video frame (n=N−1), Step 612 will be performed by ending the scanning process.
Turning to
At Step 704, if one or more second targets are not identified in the scanned video frame, AI engine 124 will determine whether one or more first targets are identified in the scanned video frame by the first trained deep neural network at Step 711. If one or more first targets (first target objects 150a and 150b) are identified in the scanned video frame, Step 712 will be performed and then Step 710 will be performed. Step 712 is the same as Step 708.
At Step 711, if one or more first targets are not identified in the scanned video frame, Step 710 will be performed.
At Step 703, if the scanned video frame is the ending video frame (n=N−1), Step 713 will be performed by ending the scanning process.
The disclosed and other embodiments, modules and the functional operations and modules described in this document can be implemented in digital electronic circuitry, or in computer software, firmware, or hardware, including the structures disclosed in this document and their structural equivalents, or in combinations of one or more of them. The disclosed and other embodiments can be implemented as one or more computer program products, i.e., one or more modules of computer program instructions encoded on a computer readable medium for execution by, or to control the operation of, data processing apparatus. The computer readable medium can be a machine-readable storage device, a machine-readable storage substrate, a memory device, a composition of matter effecting a machine-readable propagated signal, or a combination of one or more of them. The term “data processing apparatus” encompasses all apparatus, devices, and machines for processing data, including by way of example a programmable processor, a computer, or multiple processors or computers. The apparatus can include, in addition to hardware, code that creates an execution environment for the computer program in question, e.g., code that constitutes processor firmware, a protocol stack, a database management system, an operating system, or a combination of one or more of them. A propagated signal is an artificially generated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode information for transmission to suitable receiver apparatus.
A computer program (also known as a program, software, software application, script, or code) can be written in any form of programming language, including compiled or interpreted languages, and it can be deployed in any form, including as a standalone program or as a module, component, subroutine, or other unit suitable for use in a computing environment. A computer program does not necessarily correspond to a file in a file system. A program can be stored in a portion of a file that holds other programs or data (e.g., one or more scripts stored in a markup language document), in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub programs, or portions of code). A computer program can be deployed to be executed on one computer or on multiple computers that are located at one site or distributed across multiple sites and interconnected by a communication network.
The processes and logic flows described in this document can be performed by one or more programmable processors executing one or more computer programs to perform functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application specific integrated circuit).
Processors suitable for the execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer. Generally, a processor will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a processor for performing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto optical disks, or optical disks. However, a computer need not have such devices. Computer readable media suitable for storing computer program instructions and data include all forms of nonvolatile memory, media and memory devices, including by way of example semiconductor memory devices, e.g., EPROM, EEPROM, and flash memory devices; magnetic disks, e.g., internal hard disks or removable disks; magneto optical disks; and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this document contains many specifics, these should not be construed as limitations on the scope of an invention that is claimed or of what may be claimed, but rather as descriptions of features specific to particular embodiments. Certain features that are described in this document in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable sub-combination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a sub-combination or a variation of a sub-combination. Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results.
Only a few examples and implementations are disclosed. Variations, modifications, and enhancements to the described examples and implementations and other implementations can be made based on what is disclosed.
This application is a continuation application of PCT/CN2021/078595, filed on Mar. 2, 2021, which claims the benefit of U.S. Provisional Application No. 62/991,498 filed on Mar. 18, 2020. The contents of the above-mentioned applications are all hereby incorporated by reference.
Provisional application data:

| Number | Date | Country |
| --- | --- | --- |
| 62991498 | Mar 2020 | US |

Related parent and child application data:

| Relationship | Number | Date | Country |
| --- | --- | --- | --- |
| Parent | PCT/CN2021/078595 | Mar 2021 | US |
| Child | 17946552 | | US |