Virtual reality (VR) 360° video content allows a user to turn his or her head or change eye gaze direction to view content from different directions, which yields an immersive experience. However, in such virtual environments, users are purely observers and have no impact on the linear story flow. This greatly limits the immersive experience.
Traditional video technologies offer some limited possibilities for non-linear storytelling. For example, such technologies may pause a video and show a menu at a predetermined transition/decision point. In such a scenario, the user may provide an input selection, which may determine the next video clip.
Systems and methods disclosed herein relate to structures for non-linear storytelling using 360° virtual reality (VR) video content. Such systems and methods may be additionally or alternatively applied to video content with an arbitrary field of view (e.g., 180° VR video content). Non-linear storytelling structures may incorporate various user interactions within a VR environment. As such, the systems and methods described herein may provide users with an ability to choose different virtual reality story paths dynamically and seamlessly.
In an aspect, a virtual reality system is provided. The virtual reality system includes a media server that hosts and serves media data via a network. The virtual reality system includes a sensor configured to detect a user input and a display. The virtual reality system also includes a media player configured to execute instructions stored in memory so as to carry out operations. The operations include loading a nonlinear video structure from the media server via the network. The nonlinear video structure includes a plurality of uniform resource identifiers. Each uniform resource identifier is associated with a respective video trunk. The nonlinear video structure includes an arrangement of respective video trunks coupled by at least one transition trunk. The operations also include determining an initial playlist based on the nonlinear video structure, streaming the initial playlist from the media server, and rendering video frames for display via the display. The operations further include, while loading the at least one transition trunk, receiving the user input and determining a next playlist based on the received user input. The operations also include streaming the next playlist from the media server.
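To make the arrangement concrete, the nonlinear video structure can be pictured as a set of trunks, each addressed by a uniform resource identifier, grouped into playlists that are coupled by transition trunks. Below is a minimal Python sketch of such a structure; the class and field names are illustrative assumptions rather than a prescribed schema.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class VideoTrunk:
    """A short segment of 360-degree video, addressed by a URI."""
    trunk_id: int
    uri: str                     # uniform resource identifier for this trunk
    is_transition: bool = False  # True if this trunk couples two playlists

@dataclass
class Playlist:
    """An ordered sequence of trunk IDs forming one story path."""
    playlist_id: str
    trunk_ids: List[int]

@dataclass
class NonlinearVideoStructure:
    """Video trunks arranged into playlists coupled by transition trunks."""
    trunks: Dict[int, VideoTrunk]
    playlists: Dict[str, Playlist]
    # transition trunk ID -> IDs of the playlists reachable from it
    transitions: Dict[int, List[str]]
    initial_playlist: str
```

A media player could load such a structure as metadata before fetching any video data, so that every possible branch is known up front.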
In an aspect, a method is provided. The method includes loading a nonlinear video structure. The nonlinear video structure includes a plurality of uniform resource identifiers. Each uniform resource identifier is associated with a respective video trunk. The nonlinear video structure includes an arrangement of respective video trunks coupled by at least one transition trunk. The method includes determining an initial playlist based on the nonlinear video structure, streaming the initial playlist from a media server via a network, and rendering video images associated with the initial playlist for display via a display. The method also includes, while loading the at least one transition trunk, receiving a user input via a user interface and determining a next playlist based on the received user input. The method yet further includes streaming the next playlist from the media server.
In an aspect, a method is provided. The method includes loading a nonlinear video structure. The nonlinear video structure includes a plurality of uniform resource identifiers. Each uniform resource identifier is associated with a respective video trunk. The nonlinear video structure includes an arrangement of respective video trunks coupled by at least one transition trunk. The method also includes determining an initial playlist based on the nonlinear video structure, streaming the initial playlist from a media server via a network, and rendering video images associated with the initial playlist for display via a display. The method yet further includes, when playback is within a predetermined amount of time from an end of a currently-playing stream, loading all video trunks corresponding with possible next playlists based on the nonlinear video structure. The method also includes receiving a user input via a user interface and selecting a proper next playlist based on the received user input. The method yet further includes streaming the proper next playlist from the media server.
In an aspect, a system is provided. The system includes various means for carrying out the operations of the other respective aspects described herein.
These as well as other embodiments, aspects, advantages, and alternatives will become apparent to those of ordinary skill in the art by reading the following detailed description, with reference where appropriate to the accompanying drawings. Further, it should be understood that this summary and other descriptions and figures provided herein are intended to illustrate embodiments by way of example only and, as such, that numerous variations are possible. For instance, structural elements and process steps can be rearranged, combined, distributed, eliminated, or otherwise changed, while remaining within the scope of the embodiments as claimed.
Example methods, devices, and systems are described herein. It should be understood that the words “example” and “exemplary” are used herein to mean “serving as an example, instance, or illustration.” Any embodiment or feature described herein as being an “example” or “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments or features. Other embodiments can be utilized, and other changes can be made, without departing from the scope of the subject matter presented herein.
Thus, the example embodiments described herein are not meant to be limiting. Aspects of the present disclosure, as generally described herein, and illustrated in the figures, can be arranged, substituted, combined, separated, and designed in a wide variety of different configurations, all of which are contemplated herein.
Further, unless context suggests otherwise, the features illustrated in each of the figures may be used in combination with one another. Thus, the figures should be generally viewed as component aspects of one or more overall embodiments, with the understanding that not all illustrated features are necessary for each embodiment.
I. Videos with Linear Storytelling
II. Videos with Non-Linear Storytelling
In an example embodiment, the transition trunks (#3, #11, #14, #15) may be shared by multiple playlists. This sharing may provide for smooth transitions, as switching between playlists may be performed while playing back the transition trunk. That is, a prior playlist need not play to completion before transitioning to a subsequent playlist. Rather, the prior and subsequent playlists may be synchronized via a global time clock, such as a video streaming presentation time stamp (PTS). Under such a scenario, the prior playlist may stop playing (even during playback of the transition trunk) once the subsequent playlist begins synchronized playback of the remaining portion of the transition trunk.
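As a rough illustration, a handoff synchronized by a shared presentation time stamp might look like the following Python sketch; the player objects and their methods are hypothetical stand-ins for a real streaming stack.

```python
def switch_on_shared_transition(prior_player, next_player, global_clock):
    """Hand off playback mid-transition-trunk using a shared time base.

    Both the prior and subsequent playlists contain the same transition
    trunk, so the subsequent playlist can resume at the presentation
    time stamp (PTS) the prior playlist has reached.
    """
    pts = global_clock.current_pts()   # global clock, e.g. the stream PTS
    next_player.seek_to_pts(pts)       # align within the shared trunk
    next_player.play()                 # subsequent playlist takes over
    prior_player.stop()                # prior playlist need not finish
```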
In another embodiment, the playlists need not include shared transition trunks. In such a scenario, a pause may be provided (or may be necessary) before switching to a new playlist. Additionally or alternatively, a device may pre-fetch trunks in all possible paths to provide a smoother transition.
In some embodiments, video trunks may be partially or completely deleted from memory when no longer needed (e.g., when a user interaction leads to a different video trunk being selected). As such, by causing the media player to handle only a small number of video trunks at any given time, computing resources may be conserved and utilized more efficiently.
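One way to realize this is to evict cached trunks that the selected story path no longer references, as in this sketch; the cache is assumed to be a simple dictionary keyed by trunk ID, reusing the structure classes sketched above.

```python
def prune_trunk_cache(cache, structure, selected_playlist_id):
    """Delete cached video trunks that the chosen path no longer needs."""
    needed = set(structure.playlists[selected_playlist_id].trunk_ids)
    for trunk_id in list(cache):      # copy keys; we mutate while iterating
        if trunk_id not in needed:
            del cache[trunk_id]       # free memory held by an unneeded trunk
```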
In another embodiment, the video need not be cut into small (short time segment) trunks. Instead, each playlist above may include an individual (discrete) piece of video. Furthermore, which playlist to play next may still be determined based on the user interactions described below.
A. Implicit User Input
In an embodiment, users may provide one or more implicit inputs before and/or during playback of a transition point/video trunk. A determination of which subsequent playlist to play may be based on the implicit user input(s). In an example embodiment, an implicit input may be determined based on tracking where a user is looking (e.g., via head- and/or eye-tracking methods and systems) and other known information about the user. While the user is immersed in a virtual reality environment, a virtual reality application may be configured to track movements and/or an orientation of the user's head. By tracking a user's gaze and/or head position, the VR application may determine which story path (e.g., which subsequent playlist) should be selected.
For example, in a non-linear VR video that simulates driving on New York City streets, a user may approach a 3-way intersection. A road to the left may lead to the Financial District (e.g., Wall Street) and a road to the right may lead to the Brooklyn Bridge. A decision can be made automatically based on the user's historical behavior and/or preferences. For example, if the user has viewed primarily financial-related buildings in the past few minutes, the video may continue with a tour of the Financial District; otherwise, it may continue with a drive over the Brooklyn Bridge.
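A simple version of this heuristic could count semantic labels attached to recent gaze targets and pick the majority theme, falling back to a default route when no clear preference emerges. The labels, playlist names, and majority threshold below are illustrative assumptions.

```python
from collections import Counter

def choose_path_from_gaze(gaze_history, label_to_playlist, default_playlist):
    """Choose the next playlist from what the user has recently looked at.

    gaze_history: semantic labels produced upstream by head/eye tracking,
    e.g. ["financial_building", "financial_building", "bridge"].
    label_to_playlist: e.g. {"financial_building": "financial_district_tour",
                             "bridge": "brooklyn_bridge_drive"}.
    """
    if not gaze_history:
        return default_playlist
    label, hits = Counter(gaze_history).most_common(1)[0]
    # Only override the default when a clear majority of recent gaze
    # targets points at one theme.
    if label in label_to_playlist and hits * 2 > len(gaze_history):
        return label_to_playlist[label]
    return default_playlist
```

With the intersection example above, four recent glances at financial buildings and one at the bridge would select the Financial District tour.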
These decisions may also be made according to user profiles, which may be associated with a preexisting user account, generated upon first use, and adjusted based on user interactions. Decisions could additionally or alternatively be made based on anonymous user statistics gathered from other similar or related users.
Implicit user input may provide a better user experience because an optimal path is automatically chosen on behalf of the user and direct action is not needed in some or all cases.
B. Explicit User Input
In another embodiment, a user's explicit input may be used to determine a subsequent playlist for playback. Many different types of explicit user inputs are contemplated, such as voice commands, hand or head gestures, gaze-based selections, touch inputs, and button presses on a controller or head-mounted display.
In an embodiment, a choice may be made and/or recognized while a video trunk corresponding with a transition point continues playing. In another embodiment, video playback may be paused until a choice is made.
While embodiments herein may utilize implicit or explicit user interactions to determine a next video trunk to play, some embodiments may utilize a hybrid system of user interactions. For example, a machine learning algorithm could first determine the next video trunk from an implicit user interaction. Subsequently, an explicit user interaction may be received, which may provide “training” to the system. Over time, and/or over a series of implicit and explicit user interactions, the system and method may become more attuned to a given user or decision-making scenario, which may provide a more intuitive, user-friendly user experience.
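A toy illustration of such a hybrid follows: an implicit signal drives the prediction, and any later explicit choice is used as a training label. The per-user score table merely stands in for whatever machine learning model an implementation might use.

```python
class HybridPathSelector:
    """Predict a path from implicit signals; learn from explicit choices."""

    def __init__(self):
        self.weights = {}   # (implicit_signal, playlist_id) -> score

    def predict(self, signal, candidate_playlists):
        """Pick the candidate scored highest for this implicit signal."""
        return max(candidate_playlists,
                   key=lambda p: self.weights.get((signal, p), 0.0))

    def train(self, signal, explicit_choice):
        """Treat an explicit user input as the training label."""
        key = (signal, explicit_choice)
        self.weights[key] = self.weights.get(key, 0.0) + 1.0
```

After a few explicit corrections, predict would begin returning the playlists the user actually tends to pick for a given implicit signal.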
C. Choice Hints to Users
Optionally, when a user is approaching a transition point, a “hint” or another type of indication may be provided to the user. In one embodiment, one or more visual indicators may be displayed on a user display. For example, in the VR driving scenario, directional arrows may be superimposed over the video images at the 3-way intersection to indicate possible directions of travel or choices. Alternatively or additionally, a menu may be displayed. Note that the video may be, but need not be, paused while such visual indications are being provided.
In another embodiment, such “hints” may take the form of voice prompts, text, haptic feedback, an audio chime, a dimmed or brightened display, a defocused or hazy display, etc.
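In code, hint display might be gated on the time remaining before the transition point, along the lines of this sketch; the renderer interface, choice fields, and lead time are assumptions, not part of any particular system.

```python
def show_transition_hints(renderer, choices, seconds_to_transition,
                          lead_time=5.0):
    """Overlay directional hints shortly before a transition point."""
    if seconds_to_transition <= lead_time:
        for choice in choices:
            # e.g., arrows superimposed over the video at the intersection
            renderer.draw_arrow(direction=choice.direction,
                                label=choice.label)
```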
D. Feedback to User Input
Furthermore, feedback may optionally be provided to the user when a subsequent video stream is selected based on user input. Possible user feedback may include visual cues, audio cues, text display, haptic feedback, or another form of feedback.
Note that the present disclosure relates to interactive, streaming nonlinear VR video content. Such content is distinct from traditional video games, where all content is pre-stored and/or rendered locally. In the present disclosure, video streams for each possible branch from a given transition point may be pre-fetched prior to the transition point. Furthermore, unneeded video content may be deleted from memory based on user interactions that select other video content for rendering/display.
In an example embodiment, a media server may be hosted on one or more cloud computing networks. In such a scenario, the media server may host all playlists, video trunks, and video structure metadata. The media server may also serve these data to a client media player via a network based on a client request. The client media player may exist as software, firmware, and/or hardware on mobile phones or other virtual reality client devices. The client media player may receive information indicative of a user input or user behavior. That is, the client media player may detect and respond to user behaviors. In an example embodiment, the client media player may request proper data based on the user input or user behavior. The methods described herein may be carried out fully, or in part, by the client media player.
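For instance, the client media player might fetch the video structure metadata from the media server before any streaming begins. The sketch below assumes the server exposes the structure as JSON at a hypothetical /structure endpoint; a real deployment might instead embed this metadata in a DASH or HLS manifest.

```python
import json
import urllib.request

def load_structure_metadata(server_url):
    """Fetch nonlinear video structure metadata from the media server."""
    with urllib.request.urlopen(server_url + "/structure") as response:
        return json.loads(response.read())
```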
In one embodiment, the non-linear video stream representation may include shared video trunks that correspond with the transition points, as described above. In such a scenario, the following process may be utilized (a sketch of this playback loop is provided after the list):
1. Pre-load the nonlinear video structure.
2. Start to stream the initial playlist.
3. Whenever a transition trunk starts to load, determine the next playlist based on user interactions.
4. Continue to stream the next playlist.
5. Go to Step 3.
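A compact Python sketch of Steps 1 through 5 follows. Every player method is a hypothetical placeholder for a real streaming client, and the decide callable could be any of the implicit or explicit decision mechanisms described above (e.g., choose_path_from_gaze).

```python
def stream_shared_transition_structure(player, server, structure, decide):
    """Playback loop for playlists that share transition trunks.

    decide maps (structure, current playlist ID, user input) to the next
    playlist ID, or None when the story ends.
    """
    player.load_structure(structure)             # Step 1: pre-load structure
    playlist_id = structure.initial_playlist
    player.start_stream(server, playlist_id)     # Step 2: initial playlist
    while playlist_id is not None:
        player.wait_for_transition_trunk()       # a transition trunk loads
        user_input = player.poll_user_input()    # implicit and/or explicit
        playlist_id = decide(structure, playlist_id, user_input)  # Step 3
        if playlist_id is not None:
            player.start_stream(server, playlist_id)  # Step 4; repeat = Step 5
```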
In another embodiment, there is no shared transition trunk between connecting playlists, or there are very few shared transition trunks. In such scenarios, the following process may be utilized (see the sketch after this list):
1. Pre-load the nonlinear video structure.
2. Start to stream the initial playlist.
3. When streaming is within a predetermined time to the end of stream (say within m seconds), start to pre-load trunks from each of the possible next playlists.
4. When user input is determined, select the proper next playlist, and discard trunk information for all other playlists.
5. Continue to stream the next playlist.
6. Go to Step 3.
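The corresponding sketch for Steps 1 through 6, with the same hypothetical player interface, adds the pre-load and discard phases near the end of each stream; it reuses the structure classes sketched earlier, with the final trunk of a playlist acting as the branch point.

```python
def stream_unshared_transition_structure(player, server, structure,
                                         select, m=5.0):
    """Playback loop when connecting playlists share no transition trunks.

    select maps (candidate playlist IDs, user input) to the chosen
    playlist ID, or None when the story ends.
    """
    player.load_structure(structure)                  # Step 1
    playlist_id = structure.initial_playlist
    player.start_stream(server, playlist_id)          # Step 2
    while playlist_id is not None:
        player.wait_until_remaining(seconds=m)        # Step 3: near stream end
        last_trunk = structure.playlists[playlist_id].trunk_ids[-1]
        candidates = structure.transitions.get(last_trunk, [])
        for candidate in candidates:
            player.preload_trunks(candidate)          # pre-load every path
        user_input = player.poll_user_input()
        playlist_id = select(candidates, user_input)  # Step 4: proper path
        for candidate in candidates:
            if candidate != playlist_id:
                player.discard_trunks(candidate)      # discard unused trunks
        if playlist_id is not None:
            player.start_stream(server, playlist_id)  # Step 5; repeat = Step 6
```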
It is understood that the systems and methods described herein may be applied to augmented reality (AR) scenarios as well as VR scenarios. That is, the video images presently described may be superimposed over a live direct or indirect view of a physical, real-world environment. Furthermore, although embodiments herein describe 360° virtual reality video content, it is understood that video content corresponding to smaller portions of a viewing sphere may be used within the context of the present disclosure.
The particular arrangements shown in the Figures should not be viewed as limiting. It should be understood that other embodiments may include more or fewer of each element shown in a given Figure. Further, some of the illustrated elements may be combined or omitted. Yet further, an illustrative embodiment may include elements that are not illustrated in the Figures.
A step or block that represents a processing of information can correspond to circuitry that can be configured to perform the specific logical functions of a herein-described method or technique. Alternatively or additionally, a step or block that represents a processing of information can correspond to a module, a segment, or a portion of program code (including related data). The program code can include one or more instructions executable by a processor for implementing specific logical functions or actions in the method or technique. The program code and/or related data can be stored on any type of computer readable medium such as a storage device including a disk, hard drive, or other storage medium.
The computer readable medium can also include non-transitory computer readable media such as computer-readable media that store data for short periods of time like register memory, processor cache, and random access memory (RAM). The computer readable media can also include non-transitory computer readable media that store program code and/or data for longer periods of time. Thus, the computer readable media may include secondary or persistent long-term storage, such as read-only memory (ROM), optical or magnetic disks, or compact-disc read-only memory (CD-ROM). The computer readable media can also be any other volatile or non-volatile storage systems. A computer readable medium can be considered a computer readable storage medium, for example, or a tangible storage device.
While various examples and embodiments have been disclosed, other examples and embodiments will be apparent to those skilled in the art. The various disclosed examples and embodiments are for purposes of illustration and are not intended to be limiting, with the true scope being indicated by the following claims.
This application claims priority to U.S. Provisional Patent Application No. 62/375,710 filed Aug. 16, 2016, the contents of which are hereby incorporated by reference.