Advances in computer technology and software have made possible the generation of richly featured augmented reality (AR) experiences for users. AR experiences can merge virtual objects or characters with real-world features in a way that can, in principle, provide a powerfully interactive experience. AR can further be used to extend content from displays into people's homes and personal environments.
However, while AR aligned with static elements (images, planes, and objects) is common, there is a need in the art for systems and methods designed to generate AR imagery that conforms to moving images, aligning the AR imagery both spatially and temporally with those moving images.
The following description contains specific information pertaining to implementations in the present disclosure. One skilled in the art will recognize that the present disclosure may be implemented in a manner different from that specifically discussed herein. The drawings in the present application and their accompanying detailed description are directed to merely exemplary implementations. Unless noted otherwise, like or corresponding elements among the figures may be indicated by like or corresponding reference numerals. Moreover, the drawings and illustrations in the present application are generally not to scale, and are not intended to correspond to actual relative dimensions.
The present application discloses systems and methods for providing augmented reality (AR) enhancement of moving images. It is noted that, as defined in the present application, the term “moving images” refers to imagery produced by playout of a sequence of video frames. Moreover, as defined herein, the term “anchor image” refers to an image serving as a two-dimensional (2D) image template upon which one or more AR effects may be overlaid, or from which one or more AR effects may extend into an environment in which a display screen displaying the anchor image is located. In various use cases, an anchor image may be a single video frame in its entirety, an image included in a portion of a single video frame that is less than the entire video frame, or a sequence of multiple video frames. It is further noted that the AR enhancement solution disclosed in the present application may be implemented as automated systems and methods.
As used in the present application, the terms “automation,” “automated,” and “automating” refer to systems and processes that do not require the participation of a human administrator. Although in some implementations the AR enhancements provided by the systems and methods disclosed herein may be reviewed or even modified by a human editor or system administrator, that human involvement is optional. Thus, the methods described in the present application may be performed under the control of hardware processing components of the disclosed systems.
As further shown in
Although
Moreover, although the present application refers to software code 110 and one or both of AR effects generator 120 and AR effects database 122 as being stored in memory 106 for conceptual clarity, more generally, memory 106 may take the form of any computer-readable non-transitory storage medium. The expression “computer-readable non-transitory storage medium,” as defined in the present application, refers to any medium, excluding a carrier wave or other transitory signal, that provides instructions to processing hardware 104 of AR device 102. Thus, a computer-readable non-transitory storage medium may correspond to various types of media, such as volatile media and non-volatile media, for example. Volatile media may include dynamic memory, such as dynamic random access memory (dynamic RAM), while non-volatile media may include optical, magnetic, or electrostatic storage devices. Common forms of computer-readable non-transitory storage media include, for example, optical discs, RAM, programmable read-only memory (PROM), erasable PROM (EPROM), and FLASH memory.
Processing hardware 104 may include multiple hardware processing units, such as one or more central processing units, one or more graphics processing units, one or more tensor processing units, one or more field-programmable gate arrays (FPGAs), custom hardware for machine-learning training or inferencing, and an application programming interface (API) server, for example. By way of definition, as used in the present application, the terms “central processing unit” (CPU), “graphics processing unit” (GPU), and “tensor processing unit” (TPU) have their customary meaning in the art. That is to say, a CPU includes an Arithmetic Logic Unit (ALU) for carrying out the arithmetic and logical operations of AR device 102, as well as a Control Unit (CU) for retrieving programs, such as software code 110, from memory 106, while a GPU may be implemented to reduce the processing overhead of the CPU by performing computationally intensive graphics or other processing tasks. A TPU is an application-specific integrated circuit (ASIC) configured specifically for artificial intelligence (AI) applications such as machine learning modeling.
As defined in the present application, the expression “machine learning model” may refer to a mathematical model for making future predictions based on patterns learned from samples of data or “training data.” Various learning algorithms can be used to map correlations between input data and output data. These correlations form the mathematical model that can be used to make future predictions on new input data. Such a predictive model may include one or more logistic regression models, Bayesian models, or neural networks (NNs). Moreover, a “deep neural network,” in the context of deep learning, may refer to a NN that utilizes multiple hidden layers between input and output layers, which may allow for learning based on features not explicitly defined in raw data.
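By way of illustration only, the following sketch shows a deep neural network in the sense defined above, i.e., an input layer, multiple hidden layers, and an output layer. It is not part of the disclosed implementations; the layer sizes, random weights, and ReLU activation are assumptions introduced here solely for clarity.

```python
# Minimal sketch of a feed-forward deep NN: multiple hidden layers between input and output.
# Layer sizes and weights are illustrative placeholders, not values from the disclosure.
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Propagate input x through the hidden layers and a final linear output layer."""
    activation = x
    for w, b in zip(weights[:-1], biases[:-1]):
        activation = relu(activation @ w + b)          # hidden layers
    return activation @ weights[-1] + biases[-1]       # output layer (raw scores)

rng = np.random.default_rng(0)
layer_sizes = [8, 16, 16, 2]                           # input, two hidden layers, output
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]
print(forward(rng.normal(size=8), weights, biases))
```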
Transceiver 128 of system 100 may be implemented as any suitable wireless communication unit. For example, transceiver 128 may be implemented as a fourth generation (4G) wireless transceiver, or as a 5G wireless transceiver. In addition, or alternatively, transceiver 128 may be configured for communications using one or more of Wireless Fidelity (Wi-Fi), Worldwide Interoperability for Microwave Access (WiMAX), Bluetooth, Bluetooth low energy, ZigBee, radio-frequency identification (RFID), near-field communication (NFC), and 60 GHz wireless communications methods.
Camera(s) 234a may include various types of cameras, such as one or more red-green-blue (RGB) still image cameras, video cameras, RGB-D cameras that include a depth sensor, infrared (IR) cameras, or combinations thereof, to name a few examples. P/L sensor(s) 234f may include one or more accelerometers, one or more gyroscopes, a Global Positioning System (GPS) receiver, a magnetometer, or any combination of such features, for example. In some implementations, P/L sensor(s) 234f may be implemented as an inertial measurement unit (IMU).
Input unit 230 corresponds in general to input unit 130, in
Output unit 240 corresponds in general to output unit 140, in
The functionality of media enhancement system 100 will be further described by reference to
Referring to
In addition, or alternatively, action 361 may include using one or more of lidar detector 234b, OR module 234e, P/L sensor(s) 234f, and microphone(s) 235 to provide location data 124 for use in determining a position of display screen 154 in relation to AR device 102, such as a position including one or more of an x, y, or z location coordinate of display screen 154 in relation to AR device 102. Moreover, where location data 124 includes audio data obtained by microphone(s) 235 as a result of monitoring media content 152, location data 124 may further include microphone metadata describing the angle of arrival of sound at microphone(s) 235. Action 361 may be performed by software code 110, executed by processing hardware 104 of AR device 102, and using features of input unit 130/230 as noted above.
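By way of a hedged example, one way location data of the kind described above could yield an x, y, z position of display screen 154 relative to AR device 102 is to solve a perspective-n-point problem from the screen's detected corners. The sketch below is not the disclosed implementation; the screen dimensions, corner pixel coordinates, and camera intrinsics are assumed values, and the use of OpenCV's solvePnP is an illustrative choice.

```python
# Illustrative pose estimation: recover the display screen's position relative to the AR
# device's camera from four detected screen corners. All numeric values are assumptions.
import numpy as np
import cv2

screen_w, screen_h = 1.21, 0.68  # assumed physical screen size in meters
object_points = np.array([       # screen corners in the screen's own coordinate frame
    [0.0, 0.0, 0.0], [screen_w, 0.0, 0.0],
    [screen_w, screen_h, 0.0], [0.0, screen_h, 0.0]], dtype=np.float64)
image_points = np.array([        # the same corners as detected in the AR camera image (pixels)
    [410.0, 220.0], [905.0, 231.0], [898.0, 517.0], [405.0, 503.0]], dtype=np.float64)
camera_matrix = np.array([[800.0, 0.0, 640.0],
                          [0.0, 800.0, 360.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)        # assume an undistorted image for simplicity

ok, rvec, tvec = cv2.solvePnP(object_points, image_points, camera_matrix, dist_coeffs)
if ok:
    # tvec gives the x, y, z coordinates of the screen origin in the camera frame (meters)
    print("display screen position relative to AR device:", tvec.ravel())
```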
Continuing to refer to
Playhead data 156 indicates the present playback state of media playout device 150, such as play, pause, fast forward, or rewind, for example, and may further indicate a timestamp or frame number of a presently displayed moving image of the sequence of moving images displayed on display screen 154. In addition, in some implementations, playhead data 156 may include one or more of a variety of playback parameters, such as audio level, as well as display screen parameters such as hue, saturation, brightness, contrast, and tint, for example. Playhead data 156 may be received in action 362 by software code 110, executed by processing hardware 104 of AR device 102, and using one of transceiver 128 or microphone(s) 235. It is noted that although flowchart 360 lists action 362 as following action 361, that representation is merely exemplary. In various implementations, action 362 may precede action 361, may follow action 361, or may be performed in parallel with action 361, i.e., contemporaneously with action 361.
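As a purely illustrative sketch of how playhead data 156 might be represented on the AR device side, the structure below captures the playback state, timestamp, frame number, and playback parameters described above. The field names, types, and defaults are assumptions introduced here, not part of the disclosure.

```python
# Hypothetical representation of playhead data: playback state, timestamp, frame number,
# and optional playback/display parameters. Field names are illustrative assumptions.
from dataclasses import dataclass, field
from enum import Enum

class PlaybackState(Enum):
    PLAY = "play"
    PAUSE = "pause"
    FAST_FORWARD = "fast_forward"
    REWIND = "rewind"

@dataclass
class PlayheadData:
    state: PlaybackState      # present playback state of the media playout device
    timestamp_s: float        # timestamp of the presently displayed moving image
    frame_number: int         # frame number within the sequence of moving images
    audio_level: float = 1.0  # example playback parameter
    display_params: dict = field(default_factory=lambda: {
        "hue": 0.0, "saturation": 1.0, "brightness": 1.0, "contrast": 1.0, "tint": 0.0})

playhead = PlayheadData(PlaybackState.PLAY, timestamp_s=83.4, frame_number=2002)
```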
Continuing to refer to
As further shown by diagram 470b, where a scene including multiple moving images includes video frames that are partially static, i.e., some elements change significantly from video frame to video frame while other imagery remains mostly static from video frame to video frame, that static portion of any one of the video frames within that scene may serve as anchor image 472b. That is to say, anchor image 472b includes only the tree portion of video frame 471.
As yet further shown by diagram 470c, where a scene including multiple moving images includes video frames that are dynamic, i.e., video frames including imagery that changes substantially from video frame to video frame, a subset of multiple video frames, portions of video frames, or both, may serve as anchor set of images 474. With respect to the expression “imagery that changes substantially from video frame to video frame,” that expression refers to change in the composition of the imagery as a whole from frame to frame. In diagram 470c, for example, the boat changes location from frame right, to frame center, to frame left, while other features, such as a tree, umbrella, and chair, move and appear or disappear at different timestamps.
In some implementations, the one or more anchor images detected in action 363 may be manually predetermined. However, in other implementations, the one or more anchor images detected in action 363 may be detected algorithmically.
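As one hedged example of algorithmic detection, imagery that stays nearly constant across a scene's frames can be identified by its low temporal variation and used as the anchor, falling back to a set of frames when the scene is dynamic. The thresholds and the grayscale-frame input below are assumptions for illustration only, not the disclosed detection method.

```python
# Illustrative anchor-image detection: classify a scene as static, partially static, or
# dynamic by how much each pixel varies over time. Threshold values are assumptions.
import numpy as np

def detect_anchor(frames, pixel_thresh=10.0, static_fraction=0.9):
    """frames: array of shape (num_frames, height, width) holding grayscale values."""
    frames = np.asarray(frames, dtype=np.float64)
    per_pixel_std = frames.std(axis=0)             # temporal variation of each pixel
    static_mask = per_pixel_std < pixel_thresh     # True where imagery is mostly static
    fraction_static = static_mask.mean()
    if fraction_static >= static_fraction:
        return "whole_frame", frames[0]            # static scene: any full frame can anchor
    if fraction_static > 0.0:
        return "partial_frame", static_mask        # partially static: anchor on the static region
    return "frame_set", frames                     # dynamic scene: use an anchor set of images
```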
According to the process depicted in diagram 500A, a calibration step is first performed in which a known image is displayed on display screen 554 for calibration. The calibration image may be detected using AR, and its area may be determined. Then, with the help of AR, a flat vertical plane 555 can be located on the surface of display screen 554 (hereinafter “surface plane 555”). It is noted that although surface plane 555 is shown as a hexagon in
After the calibration image and surface plane 555 of display screen 554 have been detected, known points and positions from the calibration image can be mapped as anchor images 572b onto surface plane 555 of display screen 554. Once anchor images 572b are in place, the calibration image can be removed, and media content 552 can be displayed. Anchor images 572b on surface plane 555 can be used to place AR effects 590 around or over display screen 554.
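By way of illustration only, the mapping of known calibration points onto surface plane 555 could be expressed as a planar homography between the calibration image and its detected location on the plane, as in the sketch below. The corner correspondences and point coordinates are made-up values, and the use of OpenCV's findHomography and perspectiveTransform is an assumption rather than the disclosed technique.

```python
# Illustrative mapping of known calibration-image points onto the detected surface plane,
# so the mapped points can serve as anchors for placing AR effects. Coordinates are made up.
import numpy as np
import cv2

calib_pts = np.array([[0, 0], [1920, 0], [1920, 1080], [0, 1080]], dtype=np.float32)      # calibration image corners
plane_pts = np.array([[412, 225], [903, 233], [896, 515], [407, 500]], dtype=np.float32)  # same corners found on the plane

H, _ = cv2.findHomography(calib_pts, plane_pts)

# Map additional known calibration points (e.g., intended anchor locations) onto the plane.
known_points = np.array([[[960, 540]], [[480, 270]]], dtype=np.float32)
anchors_on_plane = cv2.perspectiveTransform(known_points, H)
print(anchors_on_plane.reshape(-1, 2))  # positions usable for placing AR effects
```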
It is noted that although diagram 500B depicts media content 552 being shifted to the top of display screen 554, and using double-wide area 582 at the bottom of display screen 554 for content matching designs 584, that representation is merely by way of example. In other implementations, media content 552 may be shifted to the bottom of display screen 554, and double-wide area 582 for content matching designs 584 may be at the top of display screen 554.
Regarding the implementation shown in
It is noted that although diagram 500C depicts media content 552 being shifted to the right on display screen 554, and using double-wide region 588 on the left of display screen 554 for content matching designs 584, that representation is merely by way of example. In other implementations, media content 552 may be shifted to the left of display screen 554, and double-wide region 588 for content matching designs 584 may be on the right of display screen 554. It is further noted that one advantage of the approaches shown by
Referring once again to
Flowchart 360 further includes obtaining, using the one or more anchor image(s) detected in action 363, one or more AR effect(s) 190 associated with the one or more anchor image(s) (action 364). Referring to
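As a hedged sketch of one way action 364 might be carried out, a detected anchor image can be fingerprinted and used as a key into a lookup table of AR effects. The average-hash fingerprint and the database layout below are assumptions introduced for illustration; they are not the structure of AR effects database 122.

```python
# Illustrative lookup of AR effects keyed by an anchor-image fingerprint. The hashing
# scheme, fingerprint value, and effect names are hypothetical placeholders.
import numpy as np

def average_hash(gray_image, hash_size=8):
    """Downsample a grayscale image and threshold against its mean to form a fingerprint."""
    img = np.asarray(gray_image, dtype=np.float64)
    h, w = img.shape
    img = img[:h - h % hash_size, :w - w % hash_size]
    blocks = img.reshape(hash_size, img.shape[0] // hash_size,
                         hash_size, img.shape[1] // hash_size).mean(axis=(1, 3))
    bits = (blocks > blocks.mean()).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

# Hypothetical database mapping anchor-image fingerprints to AR effect descriptors.
ar_effects_database = {
    0x8F3C00FF12AB45D6: ["waterfall_overlay", "plunge_pool_extension"],
}

def obtain_ar_effects(anchor_image):
    return ar_effects_database.get(average_hash(anchor_image), [])
```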
Referring to
Media content 652, display screen 654, and AR effect(s) 690 correspond respectively to media content 152, display screen 154, and AR effect(s) 190, in
It is noted that AR effect(s) 190/690 are spatially aligned with the sequence of moving images being displayed on display screen 154/654 such that river 653 appears to generate waterfall 692 and plunge pool 694. It is further noted that AR effect(s) 190/690 are temporally aligned with the sequence of moving images being displayed on display screen 154/654 such that the flow rate of river 653 appears to correspond to the volume of water falling into plunge pool 694. Furthermore, AR effect(s) 190/690 are temporally aligned with the sequence of moving images being displayed on display screen 154/654 in that AR effect(s) 190/690 appear and disappear contemporaneously with river 653 to which they correspond.
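As a minimal sketch of the spatial and temporal alignment described above, the routine below positions an AR effect using the estimated pose of the display screen and renders it only while the playhead lies within the interval during which its anchor imagery (here, river 653) is on screen, with intensity driven by the playhead time. The data structure and function names are assumptions introduced for illustration.

```python
# Illustrative spatial/temporal alignment of an AR effect with the moving images.
import numpy as np

def render_ar_effect(effect, screen_rotation, screen_translation, playhead_time_s):
    """effect: dict with 'points' (Nx3, screen frame), 'start_s', 'end_s', 'flow_rate_fn'."""
    if not (effect["start_s"] <= playhead_time_s <= effect["end_s"]):
        return None  # temporal alignment: the effect appears/disappears with its anchor imagery
    # Spatial alignment: transform effect geometry from screen coordinates to device coordinates.
    points_device = effect["points"] @ screen_rotation.T + screen_translation
    intensity = effect["flow_rate_fn"](playhead_time_s)  # e.g., waterfall volume tracks river flow
    return points_device, intensity
```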
In some implementations, the method outlined by flowchart 360 may conclude with action 365 described above. However, in other implementations, processing hardware 104 of AR device 102 may further execute software code 110 to generate one or more audio effects, one or more haptic effects, or both, corresponding to AR effect(s) 190/690. In those implementations, the method outlined by flowchart 360 may further include rendering, by software code 110 executed by processing hardware 104, while AR effect(s) 190/690 are rendered on display 242/642 of AR device 102, the one or more audio effects using audio speaker(s) 244, the one or more haptic effects using haptic actuator(s) 248, or both. Alternatively, or in addition, processing hardware 104 of AR device 102 may further execute software code 110 to detect one or more Internet of Things (IoT) connected devices in the environment in which display screen 154/654 is located, and may activate those one or more IoT connected devices to produce ambient effects, such as lighting, temperature, aromas, and the like, to further enhance media content 152/652 while AR effect(s) 190/690 are being rendered.
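The following sketch illustrates, without purporting to be the disclosed implementation, how companion audio effects, haptic effects, and IoT ambient effects might be triggered concurrently while an AR effect is rendered. The functions play_audio, pulse_haptics, and activate_iot are hypothetical placeholders standing in for audio speaker(s) 244, haptic actuator(s) 248, and an IoT device interface.

```python
# Illustrative concurrent rendering of companion audio, haptic, and IoT ambient effects.
# The three worker functions are placeholders, not APIs from the disclosure.
import threading

def play_audio(effect_id):          # stands in for audio speaker(s) 244
    print(f"audio effect for {effect_id}")

def pulse_haptics(effect_id):       # stands in for haptic actuator(s) 248
    print(f"haptic effect for {effect_id}")

def activate_iot(device, setting):  # stands in for an IoT ambient-effect command
    print(f"{device} -> {setting}")

def render_companion_effects(effect_id, iot_devices):
    workers = [threading.Thread(target=play_audio, args=(effect_id,)),
               threading.Thread(target=pulse_haptics, args=(effect_id,))]
    workers += [threading.Thread(target=activate_iot, args=(d, "ambient_waterfall"))
                for d in iot_devices]
    for t in workers:
        t.start()
    for t in workers:
        t.join()

render_companion_effects("waterfall", ["smart_lamp", "scent_diffuser"])
```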
With respect to the method outlined by flowchart 360, it is emphasized that actions 361, 362, 363, 364, and 365 may be performed as an automated method.
Thus, as described above, the present application discloses systems and methods for providing AR enhancement of moving images. From the above description it is manifest that various techniques can be used for implementing the concepts described in the present application without departing from the scope of those concepts. Moreover, while the concepts have been described with specific reference to certain implementations, a person of ordinary skill in the art would recognize that changes can be made in form and detail without departing from the scope of those concepts. As such, the described implementations are to be considered in all respects as illustrative and not restrictive. It should also be understood that the present application is not limited to the particular implementations described herein, but many rearrangements, modifications, and substitutions are possible without departing from the scope of the present disclosure.