This invention relates to producing and adapting video.
It is possible to adapt a video stream to change some of the content in it. For example, it is known to identify an object such as a billboard in a video stream, and replace what appears to be displayed on that object. The object can be identified manually or automatically. To make the object reliably identifiable it is known to ensure it is in a predetermined colour, most conventionally green. One use of this technology is to allow a video stream to contain advertisements that are targeted to a specific viewer or are up-to-date for the time when the video stream is played out. Another use is to modify a story being portrayed by the video: for example words in a book displayed in the video could be adapted to be in a language suitable for a specific viewer or set of viewers or they could be adapted to give a different message that changes the meaning of the video.
There are several difficulties with implementing this technology. Taking the example of a billboard, a suitable billboard for adaptation must first be identified in the original video stream. Then, in order for any new information that is to appear to be displayed on the billboard to look realistic, the position, size and distortion of that information must over time match changes in the position and angle of the camera that originally captured the video. Typically these adjustments are done manually, which is time-consuming. In addition, it may be difficult for the person who originally makes the video to include suitable opportunities to adapt the video content.
There is a need for an improved way of producing and adapting video.
According to one aspect there is provided a system for capturing a video stream, the system comprising: a camera; and an encoding device configured to store video captured by the camera together with metadata indicating a location in the video at which a predesignated substitution object appears.
The metadata may indicate times during the video at which the substitution object appears.
The metadata may indicate regions of the video occupied by the substitution object over time.
The metadata may indicate a size and shape of the substitution object.
The metadata may indicate one or more characteristics of a lens of the camera at one or more times when the substitution object appears in the video.
The metadata may indicate one or more colour characteristics of the video at one or more times when the substitution object appears in the video.
The system may comprise an input device whereby a user can input at least some of the metadata to the system.
According to a second aspect there is provided a system for processing video to replace substitutable content in the video with alternative content, the system comprising a processor configured to: process metadata associated with the video to identify a region in the video in which the substitutable content appears; select, in dependence on the metadata, an item of alternative content from a datastore storing alternative content; and process the video to replace regions of the video defined by the metadata with substituted content formed in dependence on the alternative content.
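By way of illustration only, the second aspect may be sketched as follows. The frame representation (a 2D list of pixels), the metadata keys and the helper names are illustrative assumptions, not part of the specification.

```python
# Sketch of the second aspect: metadata identifies a region occupied by
# substitutable content, an item of alternative content is selected from a
# datastore, and the region's pixels are replaced with that content.
# All names here are illustrative assumptions.

def select_alternative(metadata, datastore):
    """Select an item of alternative content keyed by the slot type."""
    return datastore[metadata["slot_type"]]

def replace_region(frame, metadata, datastore):
    """Overwrite the rectangular region named in the metadata."""
    x0, y0, x1, y1 = metadata["region"]          # inclusive-exclusive bounds
    content = select_alternative(metadata, datastore)
    for row in range(y0, y1):
        for col in range(x0, x1):
            frame[row][col] = content[row - y0][col - x0]
    return frame
```

For example, with a 4x4 frame, a region of (1, 1, 3, 3) and a datastore entry for a billboard, the billboard pixels replace the designated region while the rest of the frame is untouched.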
The metadata may indicate a pose of a camera that captured the video at a time when the substitutable content appears in the video. The processor may be configured to spatially distort the alternative content in dependence on the indicated pose to form the substituted content.
The metadata may indicate one or more characteristics of a lens of the camera at a time when the substitutable content appears in the video. The processor may be configured to spatially distort the alternative content in dependence on the indicated lens characteristics to form the substituted content.
The metadata may indicate one or more colour characteristics of the video at a time when the substitutable content appears in the video. The processor may be configured to chromatically distort the alternative content in dependence on the indicated colour characteristics to form the substituted content.
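By way of illustration only, the spatial and chromatic distortion steps may be sketched as below. A nearest-neighbour resampling stands in for the full pose- and lens-driven transformations, and a simple brightness gain stands in for the colour correction; all names are illustrative assumptions.

```python
# Sketch of distorting alternative content before substitution, assuming the
# metadata yields a target size (derived from camera pose/lens) and a
# brightness gain (derived from the colour characteristics of the video).

def spatially_distort(content, target_w, target_h):
    """Resample content to the target size with nearest-neighbour lookup."""
    src_h, src_w = len(content), len(content[0])
    return [[content[r * src_h // target_h][c * src_w // target_w]
             for c in range(target_w)]
            for r in range(target_h)]

def chromatically_distort(content, gain):
    """Scale pixel intensities to match the video's colour characteristics."""
    return [[min(255, int(px * gain)) for px in row] for row in content]
```

A full implementation would also apply trapezoidal, barrel or pincushion corrections as discussed later in the description.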
According to a third aspect there is provided a method for playing out a video stream, the method comprising: forming a first video stream for playout, the first video stream depicting at least one space for substitution by an overlay; forming a second video stream for playout, the second video stream having an omission corresponding to the first video stream; playing out the second video stream; stopping playout of the second video stream at the omission; subsequently, playing out the first video stream with the space substituted by an overlay; subsequently playing out a further portion of the second video stream.
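By way of illustration only, the playout order of the third aspect may be sketched as follows. Streams are modelled as frame lists, the omission as an index range, and the space as a sentinel frame value; all of these are illustrative assumptions.

```python
# Sketch of the third aspect's playout order: the second stream plays up to
# its omission, the first stream plays in the gap with its space substituted
# by an overlay, and playout of the second stream then resumes.

def play_out(first_stream, second_stream, omission_start, omission_end, overlay):
    """Yield frames in playout order, filling the omission from first_stream."""
    yield from second_stream[:omission_start]          # play until the omission
    for frame in first_stream[omission_start:omission_end]:
        yield overlay if frame == "SPACE" else frame   # substitute the space
    yield from second_stream[omission_end:]            # play the further portion
```

Because the substituted segment is taken from the first stream at the same indices as the omission, the played-out sequence has the same duration regardless of the overlay chosen.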
The method may comprise storing video captured by a camera together with metadata indicating a location in the video at which a predesignated substitution object appears.
According to a fourth aspect there is provided a method for processing a video stream to replace substitutable content in the video stream with alternative content, the method comprising: processing metadata associated with the video stream to identify a region in the video stream in which the substitutable content appears; selecting, in dependence on the metadata, an item of alternative content from a datastore storing alternative content; and processing the video stream to replace regions of the video stream defined by the metadata with substituted content formed in dependence on the alternative content.
The method may comprise processing the video stream to determine whether the video stream contains data indicating that it complies with one or more standard formats, and replacing regions of the video stream as set out above only if the video stream contains such data.
Any processor may be constituted by a single CPU or may be distributed between multiple CPUs, which may be located together or at different locations.
Apparatus may be provided for implementing the methods set out above. The methods may be implemented by one or more suitably programmed computers.
According to a fifth aspect there is provided a method for processing a video stream to replace substitutable content in the video stream with alternative content, the method comprising: processing the video stream using a computer programmed to implement an image recognition algorithm to identify in the video stream depictions of an environment having a propensity to contain one or more predetermined objects; retrieving from a data store a model of one of the predetermined objects; and processing the video stream to replace regions of the video stream depicting the identified environment with substituted content formed in dependence on the retrieved model.
The present invention will now be described by way of example with reference to the accompanying drawings.
In the drawings:
In the system of
The playout system may play out the original video as captured by the camera, or it may play out an adapted version of the original video. An adapted version of the video may be adapted in numerous ways. One example will be described for illustration. The end-user device 6 transmits context information to the playout system over a channel 8. The context information represents the context of the user device 6: for example its location, or information about the past behaviour of the device, for example in the form of cookies. The playout system has a processor 8 and a memory 9 which stores in non-transient form code for execution by the processor 8 to cause it to make the playout system function as described herein. The playout system 4 has access to an advertisement database 7 which stores a series of advertisements. In dependence on the context information received from the user and/or on other information which can be stored in database 7, such as indications of which of the advertisements are suitable for inclusion in a specific video stream and which advertisements are to be prioritised for inclusion (which may be dependent on the level of bids from potential advertisers), the playout system selects an advertisement for inclusion in the video stream as played out to the user of device 6. The playout system retrieves that advertisement from database 7. A region 12 of the scene in the video has been reserved for placement of advertisements. The playout system forms an adapted video which is based on the originally captured video but in which the selected advertisement has been placed in the part of the video corresponding to region 12. The way in which this is done will be described in more detail below. Then the adapted video is played out to the device 6 for presentation there to the user. In this way, the user receives a customised advertisement.
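By way of illustration only, the selection from database 7 may be sketched as below. The record fields ("regions", "bid", "suitable") are illustrative assumptions about how suitability and bid data might be stored.

```python
# Sketch of selecting an advertisement from the database using the context
# information received from device 6 together with per-advertisement
# suitability and bid data. Field names are illustrative assumptions.

def select_advertisement(ads, context):
    """Pick the highest-bidding ad marked suitable for the viewer's location."""
    candidates = [ad for ad in ads
                  if ad["suitable"] and context["location"] in ad["regions"]]
    return max(candidates, key=lambda ad: ad["bid"]) if candidates else None
```

If no stored advertisement matches the context, the playout system could fall back to playing out the unadapted video.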
The advertisement is integrated into the video so that it appears to have been present when the video was originally shot. The same approach may be used to adapt visual elements in the video for different languages (e.g. by changing text to a language suitable for the user as indicated by the context data) or to provide different storylines.
Other information may be used to select an advertisement for playout in a specific slot. For example, the advertisement may be selected such that its principal or highlight colour matches the colour of a prominent object depicted in the video alongside the overlain advertisement. Or the advertisement may be selected such that the character of its brand matches a characteristic of such a prominent object.
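By way of illustration only, colour-matched selection may be sketched as choosing the advertisement whose highlight colour is nearest, by Euclidean distance in RGB space, to the colour of the prominent object. The field names are illustrative assumptions.

```python
# Sketch of selecting an advertisement whose principal/highlight colour best
# matches a prominent object depicted alongside the overlay slot.

def colour_distance(a, b):
    """Euclidean distance between two RGB triples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def select_by_colour(ads, prominent_colour):
    """Choose the ad with the highlight colour closest to the object's colour."""
    return min(ads, key=lambda ad: colour_distance(ad["highlight"],
                                                   prominent_colour))
```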
An advertisement may communicate branding or marketing information or may communicate other information such as educational information, public service information or equipment test information. An advertisement may take the form of a still image or a video segment. An advertisement could be amplified through supporting exposures on the same screen, e.g. corner bugs, scrollers or squeezebacks, or through watermarks such as audio codes.
The video may be stored in a compressed and/or video encoded format. To overlay the advertisement or other replacement content on the video, the video may be decompressed and/or decoded to yield a series of video frames or part-frames. The frames or part-frames that are to display the replacement content are adapted by overlaying that content on the respective frames or part-frames. Then the video may be re-compressed and/or re-encoded and stored and/or transmitted to the end-user device.
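By way of illustration only, the decode-overlay-re-encode pipeline may be sketched as below. A toy run-length code stands in for a real video codec, and frames are modelled as 1D pixel lists; these are illustrative assumptions only.

```python
# Sketch of the overlay pipeline: decode the stored video to frames, overlay
# the replacement content on the designated frames, then re-encode.

def rle_encode(frame):
    """Toy run-length encoding standing in for a real video codec."""
    out, run, count = [], frame[0], 1
    for px in frame[1:]:
        if px == run:
            count += 1
        else:
            out.append((run, count))
            run, count = px, 1
    out.append((run, count))
    return out

def rle_decode(encoded):
    """Inverse of rle_encode."""
    return [px for px, count in encoded for _ in range(count)]

def overlay_pipeline(encoded_frames, target_indices, start, content):
    """Decode, overlay content at [start, start+len(content)), re-encode."""
    result = []
    for i, enc in enumerate(encoded_frames):
        frame = rle_decode(enc)
        if i in target_indices:
            frame[start:start + len(content)] = content
        result.append(rle_encode(frame))
    return result
```

In practice the decode and re-encode stages would use the stream's actual compression and video-coding formats, and only the frames or part-frames carrying the replacement content need be adapted.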
When the replacement content is overlain on the video, it is preferable that this is done in a way that causes the replacement content to appear as if it was originally present when the video was shot. To achieve this, the replacement content can be distorted (e.g. by one or more of hue adjustment, brightness adjustment, contrast adjustment, scaling, trapezoidal transformation, rotation, barrel transformation and pincushion transformation) to match any changes in the video resulting from motion of the camera when the video was captured, lens distortion etc. Mechanisms to achieve this will be discussed further below.
The camera 2 provides a feed of captured video to preview unit 20. A display 21 is provided to allow the captured video to be viewed. The display 21 could be integrated with the camera, to allow an operator of the camera to see the images on the display whilst capturing the video. The camera is equipped with a monitoring unit 23. The monitoring unit determines one or more of (i) the position of the camera relative to the scene 1, (ii) the direction of the field of view of the camera relative to the scene 1, (iii) the optical state of the camera. The optical state of the camera could include one or more of the focal length of the lens being used by the camera, the aperture of that lens, the make of that lens, the model of that lens and colour parameters being used by the camera (e.g. white balance or colour space). The monitoring unit provides that information to the preview unit 20.
The preview unit comprises a processor 24 and a memory 25. The memory stores in a non-transient way code executable by the processor 24 to cause the preview unit to execute the functions described herein. The preview unit receives information from an input device, such as console 26, indicating one or more spaces in the scene that are—like space 12—to be allocated for the addition of information during post-processing by adapting the captured video. The preview unit may also receive from the input device an indication of what information is to be added at a space: e.g. an image of a billboard, an image of a bus shelter or an image of a delivery van. The images may be captured by cameras or may be computer-generated. The preview unit receives the captured video from the camera 2 and forms a preview video stream in which the captured video has been adapted to show the designated type of object, or a neutral pattern such as cross-hatching, at the designated space. The inserted object or pattern is referred to as an overlay. That preview video stream is provided to display 21. In that way, an operator at the video capture facility can gain an impression of the scene as it will appear once the captured video has been adapted at the playout system 4. This can help the operator to compose the captured video stream.
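By way of illustration only, the neutral cross-hatch overlay may be sketched as below. The frame-as-2D-list representation and the region tuple are illustrative assumptions.

```python
# Sketch of the preview unit's neutral overlay: fill the designated space
# with a cross-hatch pattern so the operator can compose the shot around it.

def crosshatch_preview(frame, region, mark="X", blank="."):
    """Fill the region with an alternating cross-hatch pattern."""
    x0, y0, x1, y1 = region
    for row in range(y0, y1):
        for col in range(x0, x1):
            frame[row][col] = mark if (row + col) % 2 == 0 else blank
    return frame
```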
When the preview unit adapts the captured video in this way it may do so in dependence on the information received from the monitoring unit. The preview unit determines the scale, position, distortion, colour and angle of the inserted content in dependence on the information received from the monitoring unit. For example:
The camera 2 and/or the preview unit and/or another unit associated with the capture system stores, with the video, information about the timing(s) at which a space 12 appears in the video stream, the position(s) at which it appears and any desired additional information such as, for each relevant point in the video: the pose/direction of the camera, the lens being used, its focal length and the white balance being used. The type of space 12 may also be stored, for example indicating whether it can readily represent a billboard, a van, a bus shelter or another substitute entity. Because this information is stored with the video, the post-production system that is to replace space 12 with alternative content can readily find places in the video where content can be replaced, readily select replacement content for such a space and readily replace the content in a way that allows it to appear as if it was present in the originally shot video. Further metadata may be added, for example the time of capture and the location of capture.
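By way of illustration only, such a metadata record might be serialised alongside the video as below. The key names and structure are illustrative assumptions; the description requires only that timings, regions and any desired camera, lens and colour data be stored with the video.

```python
# Sketch of a per-space metadata record stored with the captured video.
# All field names are illustrative assumptions.
import json

record = {
    "space_id": 12,
    "space_type": "billboard",          # billboard, van, bus shelter, ...
    "appearances": [
        {"time_s": 4.0,
         "region": [120, 80, 320, 200],                 # x0, y0, x1, y1
         "camera_pose": {"pan_deg": 10.0, "tilt_deg": -2.0},
         "lens": {"model": "example-lens", "focal_length_mm": 35.0},
         "white_balance_k": 5600},
    ],
    "capture": {"time": "2020-04-01T10:00:00Z", "location": "51.5N,0.1W"},
}

serialized = json.dumps(record)          # stored alongside the video
restored = json.loads(serialized)
```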
The preview unit transmits the captured video stream to the storage unit 3, using connection 22. The information as transmitted includes:
In order to form an overlay that represents a substitute object, the preview unit may store images of a range of objects in memory 25. It may then transform the shape and colour of a selected stored image and superimpose it on the captured video to form the adapted video stream.
The operator capturing the video or setting up scene 1 may be provided with guidelines indicating preferred positions for space 12. These guidelines may be selected so as to allow flexibility in the positioning of space 12, to make it easy to define spaces such as space 12 for a sufficient proportion of the length of the video stream that is being formed or to make the spaces suitable for adaptation to include desired content such as advertisements. The guidelines may provide recommendations as to one or more of the following aspects of how a substitutable space 12 may appear in a video:
The playout system 4 has access to advertisements stored in database 7 and also, if the video stream transmitted from preview unit 20 does not incorporate the overlays, images of suitable overlay objects. The overlay objects that are available and/or used may depend on the application of the system. For example:
Other uses of the system are possible.
When the playout system 4 is processing the video stream for playout to device 6 it performs the following steps:
It will be appreciated that the steps above can be applied to overlay content other than advertisements.
At the editing stage, a computer (which may be a distributed computer) forming part of an editing suite may process a video stream to prepare it for the overlay of advertisements (e.g. by inserting objects in the video stream), or to insert overlay in the stream. Prior to doing so, the computer may process the video stream to assess whether the video stream is suitable for this processing. That may involve checking whether the video stream contains predetermined metadata or indicia that indicate it complies with one or more standardised formats making it readily possible to process the stream in that way.
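By way of illustration only, the suitability check may be sketched as below. The format identifiers and the metadata key are illustrative assumptions and do not correspond to any real standard.

```python
# Sketch of the pre-processing check: proceed with overlay processing only if
# the stream carries metadata declaring a recognised substitution format.

SUPPORTED_FORMATS = {"subst-meta/1.0", "subst-meta/1.1"}   # assumed identifiers

def may_process(stream_metadata):
    """Return True only if the stream declares a supported format."""
    return stream_metadata.get("substitution_format") in SUPPORTED_FORMATS
```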
One example workflow may proceed as follows, and is illustrated in
This approach allows a video stream to be played out using current streaming schemas (with a played-out video being paused for the insertion of advertisements at appropriate times) but with the content provided in advertising breaks being integral with the principal content. This can increase the engagement and enjoyment of viewers. Unlike existing systems and methods that stream personalised video to a user, where the advertisements or other personalised content are in addition to the original intended video, the present idea allows the personalisation to be part of the intended video rather than an addition to it. As a result, the duration of the gap between the stopping and restarting of the second video stream is constant, no matter what personalisation has been added, because the originally created scene(s) form the base for the overlain video stream. Thus the personalised video stream is always the same duration irrespective of the advertisement added. Furthermore, as the overlain scene E has come from the original video stream 40, such that the durations of the omission and the overlain video stream (E1-E4) are the same, there is no need for any handshake at the end of the overlain video stream (E1-E4).
Furthermore, by using original scene(s), the transition between the distribution video stream 41 and the overlain video stream (E1-E4), i.e. between D and E in
The selection of the appropriate content to apply as an overlay is described above. In any of those methods, some identification information is required to determine what content should be applied for a given end user. This information may come from the user directly, for example as a signal from the end device 6, or may come from data held by the distributor of one or more of the video streams, or may come from an Internet Service Provider or other company hosting the cloud 5 or physical devices into which the video streams are provided and/or combined prior to onward transmission to the end user.
When a video stream is to be processed to bear an overlay, one option is for the original video stream to include a space depicting an object (e.g. a bus shelter), and for the overlay to be placed over that object as it appears in the video stream. A second option is for the original video stream not to depict such an object but to incorporate a visible space dedicated for insertion of an object of that type or of a predetermined set of types. For example, a flat area of ground may be left unoccupied by actors so that a bus shelter, a van or an advertising hoarding may be inserted there and overlain, e.g. by advertisements. A third option is to analyse the original video stream to identify spaces suitable for the insertion of objects. This may be done manually or automatically by a computer implementing image analysis software. The object(s) to be inserted may be selected manually, or automatically in dependence on the environment depicted in the video stream. For example, if the video stream depicts a highway then objects such as a bus shelter or a van, which might normally be expected to be seen in such an environment, may be selected for insertion. This selection can be done by suitably trained machine learning software. When an object has been selected, an image or model of such an object can be retrieved from a database or library of objects, e.g. as a three-dimensional model. Then the object may be inserted as depicted in that model.
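By way of illustration only, the environment-driven object selection may be sketched as below. The environment labels, propensity table and library contents are illustrative assumptions, and a label stands in for the output of the image-recognition step.

```python
# Sketch of the fifth aspect: an environment label maps to objects with a
# propensity to appear in that environment, and a model of one such object is
# retrieved from an object library for insertion. All names are assumptions.

OBJECT_PROPENSITY = {
    "highway": ["bus shelter", "van", "advertising hoarding"],
    "living room": ["television", "poster"],
}

def select_object(environment, library):
    """Return the first catalogued model suited to the identified environment."""
    for name in OBJECT_PROPENSITY.get(environment, []):
        if name in library:
            return library[name]
    return None
```

A trained classifier would supply the environment label, and the returned model (e.g. a three-dimensional model) would then be rendered into the identified space.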
In examples given above, the original video stream is captured by a camera. The original video stream may be a computer-generated stream, or may be a combination of a real stream captured by a camera and a computer-generated stream (formed e.g. by rotoscoping). The video stream may be a conventional (2D) video stream or a 3D and/or virtual reality video stream.
The applicant hereby discloses in isolation each individual feature described herein and any combination of two or more such features, to the extent that such features or combinations are capable of being carried out based on the present specification as a whole in the light of the common general knowledge of a person skilled in the art, irrespective of whether such features or combinations of features solve any problems disclosed herein, and without limitation to the scope of the claims. The applicant indicates that aspects of the present invention may consist of any such individual feature or combination of features. In view of the foregoing description it will be evident to a person skilled in the art that various modifications may be made within the scope of the invention.
Number | Date | Country | Kind |
---|---|---|---|
2004965.6 | Apr 2020 | GB | national |
Filing Document | Filing Date | Country | Kind |
---|---|---|---|
PCT/GB2021/050826 | 4/1/2021 | WO |