The present disclosure generally relates to audio processing and more particularly, to systems and methods for adjusting directional audio according to a viewing angle during playback of a 360 video.
As smartphones and other mobile devices have become ubiquitous, people have the ability to capture video virtually anytime. Furthermore, 360 videos have gained increasing popularity.
In a computing device for adjusting audio output during playback of 360 video, a 360 video bitstream is received, and the 360 video bitstream is separated into video content and audio content. The audio content corresponding to a plurality of audio sources is decoded, wherein a number of audio sources is represented by N. The video content is displayed and the audio content is output through a plurality of output devices, wherein a number of output devices is represented by M. In response to detecting a change in a viewing angle for the video content, a determination is made, for each of the plurality of output devices, of a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and the audio content is output through each of the plurality of output devices based on the determined N×M distribution ratios.
Another embodiment is a system that comprises a memory storing instructions and a processor coupled to the memory. The processor is configured by the instructions to receive a 360 video bitstream, and separate the 360 video bitstream into video content and audio content. The processor is further configured to decode the audio content corresponding to a plurality of audio sources, wherein a number of audio sources is represented by N. The processor is further configured to display the video content and output the audio content through a plurality of output devices, wherein a number of output devices is represented by M. In response to detecting a change in a viewing angle for the video content, the processor is further configured to determine, for each of the plurality of output devices, a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and output the audio content through each of the plurality of output devices based on the determined N×M distribution ratios.
Another embodiment is a non-transitory computer-readable storage medium storing instructions to be implemented by a computing device having a processor. The instructions, when executed by the processor, cause the computing device to receive a 360 video bitstream, and separate the 360 video bitstream into video content and audio content. The computing device is further configured to decode the audio content corresponding to a plurality of audio sources, wherein a number of audio sources is represented by N. The computing device is further configured to display the video content and output the audio content through a plurality of output devices, wherein a number of output devices is represented by M. In response to detecting a change in a viewing angle for the video content, the computing device is further configured to determine, for each of the plurality of output devices, a distribution ratio for each of the plurality of audio sources based on the viewing angle such that N×M distribution ratios are determined; and output the audio content through each of the plurality of output devices based on the determined N×M distribution ratios.
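For purposes of illustration only, the N×M distribution recited in the embodiments above can be sketched as a simple mixing step: each of the N decoded source signals is scaled by its per-device distribution ratio and summed into each of the M outputs. The function name and data layout below are assumptions for the sketch and are not part of this disclosure:

```python
def mix(sources, ratios):
    """Mix N source sample values into M output channels.

    sources: list of N per-source sample values at one instant in time.
    ratios:  N-by-M list where ratios[i][j] is the distribution ratio
             of source i for output device j (N x M ratios in total).
    """
    n = len(sources)
    m = len(ratios[0])
    # Each output channel is the ratio-weighted sum of all N sources.
    return [sum(sources[i] * ratios[i][j] for i in range(n))
            for j in range(m)]
```

With N=2 sources routed straight through to M=2 devices, `mix([1.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])` yields `[1.0, 2.0]`.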
Various aspects of the disclosure can be better understood with reference to the following drawings. The components in the drawings are not necessarily to scale, emphasis instead being placed upon clearly illustrating the principles of the present disclosure. Moreover, in the drawings, like reference numerals designate corresponding parts throughout the several views.
An increasing number of digital capture devices are capable of recording 360 degree video (hereinafter “360 video”), which offers viewers a fully immersive experience. The creation of 360 video generally involves capturing a full 360 degree view using multiple cameras, stitching the captured views together, and encoding the video. An individual viewing a 360 video can experience audio from multiple directions due to placement of various audio capture devices during capturing of 360 video, as shown in FIG. 4. Various embodiments achieve an improved audio experience over conventional systems by adjusting the perceived direction of audio according to the user's viewing angle during playback of 360 video, thereby providing the user with a more realistic experience. In this regard, various embodiments provide an improvement over systems that output the same audio content regardless of whether the viewing angle changes.
As shown in
Each audio source (AS) provides a separate sound signal based on the audio content captured by a corresponding microphone. For example, AS1 produces a sound signal based on the audio captured by Mic1. It should be emphasized that the present disclosure does not limit how the microphones are connected to the camera; the microphone configuration utilized while capturing 360 video can be designed to accommodate different camera designs. For example, a microphone can be coupled to the camera via a cable or coupled wirelessly via Bluetooth®. In some configurations, a microphone array can be attached directly below or above the video camera to capture audio from different directions. The microphones can be evenly spaced around the camera or randomly placed.
A system for implementing the audio adjustment techniques disclosed herein is now described, followed by a discussion of the operation of the components within the system.
For some embodiments, the computing device 102 may be equipped with a plurality of cameras (not shown) where the cameras are utilized to directly capture digital media content comprising 360 degree views. In accordance with such embodiments, the computing device 102 further comprises a stitching module (not shown) configured to process the captured views and generate a 360 degree video. Alternatively, the computing device 102 can obtain 360 video from other digital recording devices coupled to the computing device 102 through a network interface 104. The network interface 104 in the computing device 102 may also access one or more content sharing websites 124 hosted on a server via the network 120 to retrieve digital media content.
As one of ordinary skill will appreciate, the digital media content may be encoded in any of a number of formats including, but not limited to, Moving Picture Experts Group (MPEG)-1, MPEG-2, MPEG-4, H.264, Third Generation Partnership Project (3GPP), 3GPP-2, Standard-Definition Video (SD-Video), High-Definition Video (HD-Video), Digital Versatile Disc (DVD) multimedia, Video Compact Disc (VCD) multimedia, High-Definition Digital Versatile Disc (HD-DVD) multimedia, Digital Television Video/High-definition Digital Television (DTV/HDTV) multimedia, Audio Video Interleave (AVI), Digital Video (DV), QuickTime (QT) file, Windows Media Video (WMV), Advanced Systems Format (ASF), Real Media (RM), Flash Media (FLV), or any number of other digital formats.
The computing device 102 includes a splitter 106 for receiving a 360 video file and separating the 360 video file into video and audio content. The splitter 106 routes the video content to a video decoder 108 and the audio content to an audio decoder 110 for decoding the video and audio data inside the file, respectively. The video decoder 108 is coupled to a display 116 and the audio decoder 110 is coupled to an audio output adjuster 112. As described in more detail below, the audio output adjuster 112 is configured to determine a ratio for distributing audio content from each of the audio sources (AS1, AS2, . . . ASN) (
For embodiments where the audio output device 118 in
The processing device 202 may include any custom made or commercially available processor, a central processing unit (CPU) or an auxiliary processor among several processors associated with the computing device 102, a semiconductor based microprocessor (in the form of a microchip), a macroprocessor, one or more application specific integrated circuits (ASICs), a plurality of suitably configured digital logic gates, and other well known electrical configurations comprising discrete elements both individually and in various combinations to coordinate the overall operation of the computing system.
The memory 214 can include any one of a combination of volatile memory elements (e.g., random-access memory (RAM, such as DRAM, and SRAM, etc.)) and nonvolatile memory elements (e.g., ROM, hard drive, tape, CDROM, etc.). The memory 214 typically comprises a native operating system 216, one or more native applications, emulation systems, or emulated applications for any of a variety of operating systems and/or emulated hardware platforms, emulated operating systems, etc. For example, the applications may include application specific software which may comprise some or all the components of the computing device 102 depicted in
Input/output interfaces 204 provide any number of interfaces for the input and output of data. For example, where the computing device 102 comprises a personal computer, these components may interface with one or more user input/output interfaces, which may comprise a keyboard or a mouse, as shown in
In the context of this disclosure, a non-transitory computer-readable medium stores programs for use by or in connection with an instruction execution system, apparatus, or device. More specific examples of a computer-readable medium may include by way of example and without limitation: a portable computer diskette, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM, EEPROM, or Flash memory), and a portable compact disc read-only memory (CDROM) (optical).
Reference is made to
Although the flowchart of
To begin, in block 310, the computing device 102 receives 360 video to be viewed by a user and splits the 360 video into video and audio content. In block 320, the audio decoder 110 decodes the encoded audio content and extracts the number of audio sources (AS1 to ASN) encoded in the audio portion of the 360 video, where N represents the total number of audio sources. As shown earlier in
Next, in block 330, the computing device 102 monitors for a change in viewing angle specified by the user as the user views the 360 video. A change in the viewing angle by the user triggers calculation of the ratio for distributing audio content from each of the N audio sources to the channels of the audio output device, and adjustment of the audio output is performed on the fly. For implementations where the audio output device comprises headphones, the headphones comprise a right channel and a left channel such that the number of audio output devices is two (M=2).
The computing device 102 determines the ratio for distributing audio content originating from each of the N audio sources between the M=2 audio output devices—specifically, between the left and right channels of the headphones (block 340). Thus, the ratio is calculated for each of the N audio sources, thereby yielding N ratio values for each of the M=2 audio output devices for a total of N×M ratio values. Based on the determined ratio, in block 350, the computing device 102 adjusts the corresponding magnitude or volume of the audio content for the left and right channels for each of the N audio sources and outputs the audio content accordingly to the left and right channels. Thereafter the process in
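For purposes of illustration only, blocks 340 and 350 can be sketched as follows for the headphone case (M=2). The callback `ratio_fn` is an assumed name standing in for whatever distribution-ratio function an implementation uses and does not limit the disclosure:

```python
def adjust_output(theta, source_angles, samples, ratio_fn):
    """Determine left/right ratios for each of the N sources at viewing
    angle theta (block 340) and mix the current samples into the two
    headphone channels (block 350).

    ratio_fn(theta, source_angle) -> (f_L, f_R) is a hypothetical
    callback supplying the distribution ratios for one source.
    """
    ch_l = 0.0
    ch_r = 0.0
    for angle, sample in zip(source_angles, samples):
        f_l, f_r = ratio_fn(theta, angle)
        ch_l += f_l * sample  # weighted contribution to left channel
        ch_r += f_r * sample  # weighted contribution to right channel
    return ch_l, ch_r
```

With N sources, this performs the N×2 ratio lookups and two weighted sums each time a viewing-angle change is detected.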
Additional details are now provided for calculation of distribution ratios by the audio output adjuster 112 (
With regard to the distribution ratios, assume that the viewing angle is θ. Based on this, the left channel angle is θL=270+θ, and the right channel angle is θR=90+θ, where the respective magnitudes of each audio source (AS1 to ASN) for the left and right channels are calculated according to the following equations:

CHl=fL1(θ)×AS1+fL2(θ)×AS2+ . . . +fLN(θ)×ASN

CHr=fR1(θ)×AS1+fR2(θ)×AS2+ . . . +fRN(θ)×ASN
In the equations above, fLi(θ) represents the ratio for distributing the audio content from the ith audio source (ASi) of the N audio sources to the left channel speaker for a viewing angle of θ degrees. Similarly, fRi(θ) represents the ratio for distributing the audio content from audio source ASi to the right channel speaker, where the two ratios for each audio source sum to one (fLi(θ)+fRi(θ)=1). Thus, CHl represents the magnitude/volume of all audio signals from the N audio sources (AS1 . . . ASN) output to the left channel, while CHr represents the magnitude/volume of all audio signals from the N audio sources output to the right channel, where each audio signal is weighted by its corresponding distribution ratios (fLi(θ), fRi(θ)). In this way, an improved audio experience is achieved by adjusting the perceived direction of audio according to the user's viewing angle during playback of the 360 video, thereby providing the user with a more realistic experience.
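The disclosure does not fix a particular form for fLi(θ) and fRi(θ). One candidate that reproduces the worked examples that follow is a linear panning law, in which each ratio falls from 1 to 0 as the angular distance between the source and the channel grows from 0 to 180 degrees; because θL and θR are diametrically opposite, the two ratios for each source then always sum to 1. The sketch below is this assumed reconstruction, not a formula quoted from the disclosure:

```python
def angular_distance(a, b):
    """Shortest angular distance between two angles, in degrees (0-180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def distribution_ratios(theta, source_angle):
    """Return (fL, fR) for one audio source at viewing angle theta.

    Assumed linear panning law: each ratio falls linearly from 1 to 0 as
    the source moves from the channel's angle (left at 270+theta, right
    at 90+theta) to the diametrically opposite point. Because the two
    channel angles are 180 degrees apart, fL + fR is always 1.
    """
    theta_l = (270 + theta) % 360
    theta_r = (90 + theta) % 360
    f_l = 1 - angular_distance(source_angle, theta_l) / 180
    f_r = 1 - angular_distance(source_angle, theta_r) / 180
    return f_l, f_r
```

For example, a source located at 90 degrees is routed entirely to the right channel at θ=0 and split evenly between the channels at θ=90.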
To further illustrate calculation of the distribution ratios disclosed above, reference is made to
Thus, if the viewing angle θ=0, then CHl=AS2 and CHr=AS1, whereas if the viewing angle θ=180, then CHl=AS1 and CHr=AS2. If the viewing angle θ=90, then CHl=½×AS1+½×AS2 and CHr=½×AS1+½×AS2. That is, for this particular example, the two audio sources (AS1, AS2) contribute equally when the viewing angle θ=90.
With reference to
Thus, if the viewing angle θ=0, then:
If the viewing angle θ=180, then:
If the viewing angle θ=90, then:
CHl=AS1+¼×AS2+¼×AS3
CHr=¾×AS2+¾×AS3
With reference to
Thus, if viewing angle θ=0, then:
CHl=½×AS1+½×AS3+AS4
CHr=½×AS1+AS2+½×AS3
If the viewing angle θ=180, then:
CHl=½×AS1+AS2+½×AS3
CHr=½×AS1+½×AS3+AS4
If the viewing angle θ=90, then:
CHl=AS1+½×AS2+½×AS4
CHr=½×AS2+AS3+½×AS4
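The four-source cases above can be checked numerically under two assumptions inferred from the results rather than quoted from the disclosure: the sources AS1 to AS4 sit at 0, 90, 180, and 270 degrees, and the ratios follow the linear panning law sketched earlier:

```python
def angular_distance(a, b):
    """Shortest angular distance between two angles, in degrees (0-180)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def ratios(theta, src):
    # Left channel at 270+theta, right channel at 90+theta (mod 360),
    # with the assumed linear panning law (fL + fR = 1).
    f_l = 1 - angular_distance(src, (270 + theta) % 360) / 180
    f_r = 1 - angular_distance(src, (90 + theta) % 360) / 180
    return f_l, f_r

SOURCES = [0, 90, 180, 270]  # assumed placements of AS1..AS4

def channel_weights(theta):
    """Per-source weights for the left and right channels."""
    pairs = [ratios(theta, s) for s in SOURCES]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# theta=0: CHl = 1/2*AS1 + 1/2*AS3 + AS4, CHr = 1/2*AS1 + AS2 + 1/2*AS3
assert channel_weights(0) == ([0.5, 0.0, 0.5, 1.0], [0.5, 1.0, 0.5, 0.0])
# theta=180: the two channel mixes swap
assert channel_weights(180) == ([0.5, 1.0, 0.5, 0.0], [0.5, 0.0, 0.5, 1.0])
# theta=90: CHl = AS1 + 1/2*AS2 + 1/2*AS4, CHr = 1/2*AS2 + AS3 + 1/2*AS4
assert channel_weights(90) == ([1.0, 0.5, 0.0, 0.5], [0.0, 0.5, 1.0, 0.5])
```

All three viewing angles reproduce the weights listed above, which lends support to the reconstruction while remaining only an illustrative assumption.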
Note that while the audio output device (
If θ=0, then CH1=AS1, CH2=AS2, CH3=AS3.
If θ=120, then CH1=AS2, CH2=AS3, CH3=AS1.
If θ=240, then CH1=AS3, CH2=AS1, CH3=AS2.
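For viewing angles that are multiples of the inter-channel spacing, the three cases above reduce to a pure rotation of the source-to-channel assignment. The indexing helper below illustrates this observation only for such angles and assumes channels and sources spaced 120 degrees apart; it is not a general formula from the disclosure:

```python
def rotated_assignment(theta, n=3):
    """For n evenly spaced channels and a viewing angle theta that is a
    multiple of 360/n degrees, return for each channel j (0-based) the
    0-based index of the audio source routed entirely to that channel."""
    step = theta // (360 // n)
    return [(j + step) % n for j in range(n)]

assert rotated_assignment(0) == [0, 1, 2]    # CH1=AS1, CH2=AS2, CH3=AS3
assert rotated_assignment(120) == [1, 2, 0]  # CH1=AS2, CH2=AS3, CH3=AS1
assert rotated_assignment(240) == [2, 0, 1]  # CH1=AS3, CH2=AS1, CH3=AS2
```

Intermediate angles such as θ=30 instead distribute each source fractionally across neighboring channels, as in the headphone case.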
If θ=30, then:
It should be emphasized that the above-described embodiments of the present disclosure are merely possible examples of implementations set forth for a clear understanding of the principles of the disclosure. Many variations and modifications may be made to the above-described embodiment(s) without departing substantially from the spirit and principles of the disclosure. All such modifications and variations are intended to be included herein within the scope of this disclosure and protected by the following claims.
This application claims priority to, and the benefit of, U.S. Provisional patent application entitled, “Systems and Methods for Adjusting Directional Audio in a 360 Video,” having Ser. No. 62/337,912, filed on May 18, 2016, which is incorporated by reference in its entirety.