This application claims priority under 35 U.S.C. §119 to Japanese Patent Application No. 2012-026242, filed on Feb. 9, 2012. The entire disclosure of Japanese Patent Application No. 2012-026242 is hereby incorporated herein by reference.
1. Technical Field
The technology disclosed herein relates to a 3D video reproduction device.
2. Background Information
Devices for displaying 3D video have been under development in recent years. Typical 3D video devices produce video having parallax and project it onto a screen. The images projected onto the screen (a right-eye image and a left-eye image) are combined by the viewer's brain and recognized as a three-dimensional image. Also, during projection of the video onto the screen, the user is given a surround effect by emitting acoustic signals from a plurality of speakers (see Japanese Laid-Open Patent Application 2000-122590).
Conventional 3D video devices, such as the one disclosed in Japanese Laid-Open Patent Application 2000-122590, provide the user with three-dimensional video and accompanying surround sound. However, a common shortcoming of these conventional 3D video devices is that the surround sound is oversimplified and cannot adequately express the full effect of a three-dimensional video.
The technology disclosed herein was conceived in an effort to solve the above problem. Accordingly, one object of the present technology is to provide a 3D video reproduction device in which the effects of a three-dimensional video can be fully expressed.
In accordance with one aspect of the technology disclosed herein, a 3D video reproduction device is provided that reproduces 3D streaming data that includes 3D video data and audio data. The 3D video reproduction device comprises an audio analyzing unit, a parallax setting unit, a video correction unit, and an output unit. The audio analyzing unit is configured to analyze the audio data to determine the scene indicated by the 3D video data. The parallax setting unit is configured to set the amount of parallax that corresponds to the scene based on the audio data analyzed by the audio analyzing unit. The video correction unit is configured to correct the 3D video data based on the amount of parallax set by the parallax setting unit. The output unit is configured to reproduce the corrected 3D video data and output the audio data.
With this 3D video reproduction device, the scene indicated by 3D video data can be determined by analyzing the audio data. Also, the 3D video data is corrected, and the corrected 3D video data is reproduced, by setting the amount of parallax that corresponds to the scene of the video based on this analysis result. Thus, this 3D video reproduction device expresses 3D video by using 3D video data and audio data in concert with each other. Consequently, this 3D video reproduction device can more fully express the effect of 3D video than in a conventional scenario in which 3D video data and audio data are simply used as independent data and outputted to an output unit.
The 3D video reproduction device of the present technology can fully express the effects of three-dimensional video.
These and other objects, features, aspects and advantages of the present invention will become apparent to those skilled in the art from the following detailed description, which, taken in conjunction with the annexed drawings, disclose embodiments of the present invention.
Referring now to the attached drawings, which form a part of this original disclosure.
Selected embodiments of the present invention will now be explained with reference to the drawings. It will be apparent to those skilled in the art from this disclosure that the following descriptions of the embodiments of the present invention are provided for illustration only and not for the purpose of limiting the invention as defined by the appended claims and their equivalents.
Configuration of 3D Video Reproduction Device
The 3D video reproduction device 100 comprises a control unit 150, an interface unit 160, a stream control unit 110, an audio decode unit 111, a video decode unit 112, an audio analyzing unit 113, a parallax setting unit 114, a video correction unit 115, and a video display unit 116 (one example of an output unit).
The control unit 150 provides overall control of the operation of the entire 3D video reproduction device 100. The control unit 150 is made up of a CPU (central processing unit), a ROM (read-only memory), and so forth. Programs related to basic control and the like are stored in the ROM.
The interface unit 160 handles command inputs from a user. When the interface unit 160 receives a command from the user, it sends the control unit 150 a signal corresponding to the content of the command. A volume adjuster 160a is included in the interface unit 160. When the volume is set with the volume adjuster 160a, a signal corresponding to that volume is sent to the control unit 150.
The stream control unit 110 separates 3D streaming data inputted to the 3D video reproduction device 100 into 3D video data and audio data, and then outputs the 3D video data and the audio data separately to the outside. The 3D video data has, for example, left-use video data and right-use video data. The audio data includes, for example, a plurality of sets of audio data, such as surround audio data for a plurality of channels.
The audio decode unit 111 decodes the audio data for each channel outputted from the stream control unit 110. The video decode unit 112 decodes the 3D video data outputted from the stream control unit 110, such as the left and right video data.
The audio analyzing unit 113 analyzes audio data in order to determine the state of each scene indicated by the 3D video data. More precisely, the audio analyzing unit 113 calculates sound pressure data (one example of first characteristic data) indicating the strength of audio and volume data (one example of second characteristic data) indicating the volume of the audio, by analyzing the audio data.
More specifically, audio data is audio time-series data corresponding to video data. The audio analyzing unit 113 calculates sound pressure data and volume data based on this audio time-series data. Sound pressure data is calculated, for example, by using this equation:
Lp(t) = 10 × log₁₀(P(t)/P₀)².
Here, P(t) is the audio data at point in time t in the time-series data, and P₀ is a reference sound pressure.
The audio data to be analyzed in the audio time-series data may be the audio data for each channel corresponding to the video at a certain point in time of output to the video display unit 116, or may be the average of the audio data for the various channels within a specific range prior to that point in time. This average can be, for example, the average of the audio data for each channel that has been decoded prior to being outputted to the video display unit 116 and stored in a buffer.
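As a concrete illustration, the calculation above might be implemented as follows (a minimal sketch in Python, assuming the per-channel audio data is available as a NumPy array of sample values; the reference pressure value, the silence guard, and the RMS-window definition of volume are assumptions not specified in the text):

```python
import numpy as np

P0 = 2e-5  # assumed reference sound pressure (a common convention)

def sound_pressure_data(p: np.ndarray) -> np.ndarray:
    """First characteristic data: Lp(t) = 10 * log10((P(t)/P0)^2)."""
    eps = 1e-12  # guard so silent samples do not produce log(0)
    return 10.0 * np.log10((p / P0) ** 2 + eps)

def volume_data(p: np.ndarray, window: int = 1024) -> np.ndarray:
    """Second characteristic data: here estimated as RMS amplitude
    over a sliding window (an assumed definition of 'volume')."""
    kernel = np.ones(window) / window
    return np.sqrt(np.convolve(p ** 2, kernel, mode="same"))
```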
The parallax setting unit 114 sets the amount of parallax that corresponds to each scene displayed in the video, based on the analyzed audio data. Here, the amount of parallax corresponds, for example, to the amount of horizontal displacement between a position, region, object, etc., in the left-eye image and the corresponding position, region, or object in the right-eye image.
The parallax setting unit 114 has an imaging direction setting unit 114a, a first adjust unit 114b (first parallax adjustment unit), and a second adjust unit 114c (second parallax adjustment unit).
The imaging direction setting unit 114a sets the imaging direction corresponding to each scene. The imaging direction corresponding to each scene is set based on the audio data. More specifically, the imaging direction setting unit 114a selects at least one set of audio data from among the plurality of sets based on the sound pressure data for each of the plurality of sets of audio data (the sound pressure data for each channel). The imaging direction setting unit 114a then sets the imaging direction to a single direction, with respect to a plane in which the amount of parallax is zero (the reference plane), based on the selected audio data. The imaging direction is either a receding direction or an approaching direction, and the imaging direction setting unit 114a sets it to one of the two based on the selected audio data.
The term “receding direction” here refers to the direction in which an object appears to be moving away from the user, using the reference plane as a reference. The “approaching direction” is the direction in which an object appears to be moving toward the user, using the reference plane as a reference.
The first adjust unit 114b adjusts the amount of parallax so that movement of the object is recognized as being toward the set imaging direction. More specifically, the first adjust unit 114b adjusts the amount of parallax so that movement of the object is recognized as being toward the set imaging direction based on volume data for the selected audio data. How the amount of parallax is adjusted will be discussed in detail below.
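For illustration, one way to represent the two imaging directions and combine them with a parallax magnitude is sketched below (the enum, the sign convention, and the pixel units are assumptions introduced here for clarity, not part of the disclosure):

```python
from enum import Enum

class ImagingDirection(Enum):
    RECEDING = +1     # object appears behind the reference plane K0
    APPROACHING = -1  # object appears in front of the reference plane K0

def signed_parallax(direction: ImagingDirection, magnitude: float) -> float:
    """Combine a direction and a non-negative parallax magnitude into a
    signed horizontal offset (in pixels) between the left and right images."""
    return direction.value * magnitude
```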
The second adjust unit 114c also adjusts the amount of parallax in the imaging direction based on volume data adjusted by the volume adjuster 160a. How the amount of parallax is further adjusted will be discussed in detail below.
The video correction unit 115 corrects 3D video data based on the amount of parallax that corresponds to each scene. The 3D video data (the left and right video data) inputted to the 3D video reproduction device 100 already has left and right parallax. However, when the amount of parallax is set as described above, the 3D video data is corrected so that its amount of parallax becomes the set amount.
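To make this correction concrete, a minimal sketch follows, assuming each frame's left and right images are NumPy arrays and that the set amount of parallax is expressed as a horizontal pixel shift split between the two images (the data layout and units are assumptions):

```python
import numpy as np

def correct_parallax(left: np.ndarray, right: np.ndarray, parallax_px: int):
    """Shift the left and right images in opposite horizontal directions
    so that their total disparity changes by parallax_px pixels."""
    half = parallax_px // 2
    # np.roll is used for brevity; a real implementation would crop or
    # inpaint the wrapped-around columns at the frame edges.
    left_out = np.roll(left, half, axis=1)
    right_out = np.roll(right, -(parallax_px - half), axis=1)
    return left_out, right_out
```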
The video display unit 116 reproduces corrected 3D video data and outputs audio data. The video display unit 116 is a liquid crystal monitor equipped with speakers, for example. In this case, the corrected 3D video data is displayed on the liquid crystal monitor, and audio data for the various channels is outputted from the speakers.
The 3D video reproduction device 100 further has a RAM (random access memory) (not shown). This RAM functions as a working memory (buffer memory) for the control unit 150. The RAM is a volatile storage medium, such as a DRAM (dynamic random access memory).
Operation of 3D Video Reproduction Device
The operation of the 3D video reproduction device 100 will now be described through reference to the drawings.
Also, the specific first volume level α1 and the specific second volume level α2 referred to in the drawings are preset values.
The processing performed by the 3D video reproduction device 100 will now be described through reference to the flowchart in the drawings.
In the following embodiment, audio data for human dialog and the like, and/or audio data for environmental sound (such as sound for the landscape within the screen), is used as an example to facilitate the description, but the audio data may be of any kind.
With this 3D video reproduction device 100, when the control unit 150 recognizes 3D streaming data that includes 3D video data and audio data (S1), the direction in which an object K1 (the viewing screen) is imaged (the imaging direction) is set, with respect to a plane K0 (the reference plane) at which the amount of parallax is zero, based on the sound pressure data for each channel (see the drawings).
More precisely, it is determined whether or not the sound pressure data for the channels other than the center channel (the surround channels) is greater than a specific first sound pressure data (S2). For example, first the surround channel having the greatest sound pressure data (the first correction-use channel) is selected from among the five surround channels. Next, if the sound pressure data for this first correction-use channel is greater than the specific first sound pressure data (Yes in S2), the imaging direction is set to the direction in which the viewing screen K1 moves away from the user (the receding direction), using the reference plane K0 as a reference (S3). In this case, the user recognizes the viewing screen K1, for example a landscape within the screen, at a position that is away from the user. Consequently, the landscape within the screen, etc., appears more dynamic to the user.
An example was given here of a case of setting the imaging direction to the receding direction when the greatest sound pressure data of the five surround channels was greater than the specific first sound pressure data, but the imaging direction may instead be set to the receding direction when the average sound pressure data of the five surround channels is greater than the specific first sound pressure data. In this case, the average of the sound pressure data for the five surround channels is used as the sound pressure data for the first correction-use channel.
Next, if the sound pressure data for the first correction-use channel is at or under the specific first sound pressure data (No in S2), it is determined whether or not the sound pressure data for the center channel (the second correction-use channel) is greater than a specific second sound pressure data (S4). Here, if the sound pressure data for the second correction-use channel is greater than the specific second sound pressure data (Yes in S4), the imaging direction is set to the direction in which the viewing screen K1 moves toward the user (the approaching direction), using the reference plane K0 as a reference (S5). In this case, the user recognizes that the viewing screen K1, for example a person within the screen, is located close to the user. Consequently, the user pays more attention to the person or the like within the screen.
If the sound pressure data for the second correction-use channel (center channel) is at or under the specific second sound pressure data (No in S4), the setting of the imaging direction is not executed, and the processing in step 8 (S8; discussed below) is executed.
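Summarizing steps S2 through S5, the imaging-direction determination can be sketched as follows (an illustrative sketch only: the channel names, the threshold parameters standing in for the specific first and second sound pressure data, and the use of a plain dictionary are all assumptions, not part of the disclosed device):

```python
def set_imaging_direction(sound_pressure: dict[str, float],
                          alpha1: float, alpha2: float):
    """Return 'receding', 'approaching', or None per steps S2-S5.

    sound_pressure maps channel names (e.g. 'center', 'front_left', ...)
    to their sound pressure data for the current scene.
    """
    surround = {ch: sp for ch, sp in sound_pressure.items() if ch != "center"}
    # S2: pick the surround channel with the greatest sound pressure data
    # (the first correction-use channel) and compare it to the threshold.
    first_channel = max(surround, key=surround.get)
    if surround[first_channel] > alpha1:
        return "receding"          # S3
    # S4: otherwise test the center channel (the second correction-use channel).
    if sound_pressure["center"] > alpha2:
        return "approaching"       # S5
    return None                    # no direction set; proceed to S8
```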
Next, when the imaging direction is set as above, the amount by which the viewing screen K1 jumps out or recedes is corrected based on the volume data for each channel.
More specifically, if the imaging direction is set to the receding direction (S3), the amount by which the viewing screen K1 recedes is corrected based on the volume data for the first correction-use channel. Even more specifically, in this case the amount of recession is corrected so that the greater the volume data for the first correction-use channel is, the more the viewing screen K1 recedes (S6).
Here, the amount of recession of the viewing screen K1 is corrected by multiplying the first amplification coefficient shown in the drawings, which corresponds to the volume data for the first correction-use channel, by the amount of parallax.
Meanwhile, if the imaging direction is set to the approaching direction, the amount by which the viewing screen K1 jumps out is corrected according to the volume data for the second correction-use channel (the center channel) (S7). More specifically, in this case the amount of jump-out is corrected so that the greater the volume data for the second correction-use channel is, the greater the amount by which the viewing screen K1 jumps out.
Here, the amount by which the viewing screen K1 jumps out is set by multiplying the second amplification coefficient shown in the drawings, which corresponds to the volume data for the second correction-use channel, by the amount of parallax.
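Both corrections (S6 and S7) thus multiply the amount of parallax by an amplification coefficient that grows with the volume data. A sketch follows, using a clamped linear mapping as an assumed stand-in for the correspondence defined in the drawings (the parameter values are illustrative):

```python
def amplification_coefficient(volume: float, vol_lo: float, vol_hi: float,
                              coef_min: float = 1.0,
                              coef_max: float = 2.0) -> float:
    """Map volume data to an amplification coefficient that grows with
    volume; clamped linear interpolation is an assumed stand-in for the
    correspondence defined in the drawings."""
    if volume <= vol_lo:
        return coef_min
    if volume >= vol_hi:
        return coef_max
    t = (volume - vol_lo) / (vol_hi - vol_lo)
    return coef_min + t * (coef_max - coef_min)

# S6 / S7: the amount of parallax is multiplied by the coefficient derived
# from the first or second correction-use channel's volume data.
corrected_parallax = amplification_coefficient(0.8, vol_lo=0.2, vol_hi=1.0) * 10.0
```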
Next, it is determined whether or not amplification by volume has been performed (S8). If volume amplification has been performed (Yes in S8), then the amount by which the viewing screen K1 recedes and/or the amount by which the viewing screen K1 jumps out is further corrected according to the amount of amplification by volume (S9). The amplification amount is the current volume with respect to a preset volume (a reference value).
In this case, the amount by which the viewing screen K1 recedes and/or the amount by which the viewing screen K1 jumps out is further set by multiplying a third amplification coefficient corresponding to the amount of volume amplification by the above-mentioned corrected amount of parallax. If the answer is "No" in S4, the amount by which the viewing screen K1 recedes and/or the amount by which the viewing screen K1 jumps out is set by multiplying the third amplification coefficient corresponding to the amount of volume amplification by the uncorrected amount of parallax. The corresponding relation between the third amplification coefficient and the amount of volume amplification is set as shown in the drawings.
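The further correction in S9 follows the same multiplicative pattern. A sketch is given below, assuming the amplification amount is the ratio of the current volume to the preset reference volume and that the third coefficient never reduces the parallax (both assumptions; the actual correspondence is defined in the drawings):

```python
def apply_volume_amplification(parallax: float, current_volume: float,
                               reference_volume: float) -> float:
    """S9: scale the (corrected or uncorrected) amount of parallax by a
    third amplification coefficient derived from the volume amplification
    amount. Using the ratio itself, floored at unity, is an assumption."""
    amplification = current_volume / reference_volume
    third_coefficient = max(1.0, amplification)  # assumed: no reduction below unity
    return parallax * third_coefficient
```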
When the left and right amounts of parallax corresponding to each scene are thus set based on the audio data for each channel, the 3D video data is corrected based on these parallax amounts (S10). As a result, the corrected 3D video data is displayed on the liquid crystal monitor, and audio data for the various channels is outputted from the speakers (S11).
Features of 3D Video Reproduction Device
The 3D video reproduction device 100 of the present technology reproduces 3D streaming data having 3D video data and audio data. This 3D video reproduction device 100 comprises the audio analyzing unit 113, the parallax setting unit 114, the video correction unit 115, and the video display unit 116. The audio analyzing unit 113 analyzes audio data in order to determine the scene indicated by 3D video data. The parallax setting unit 114 sets the amount of parallax that corresponds to the video scene based on the audio data analyzed by the audio analyzing unit 113. The video correction unit 115 corrects 3D video data based on the amount of parallax set by the parallax setting unit 114. The video display unit 116 reproduces corrected 3D video data and outputs audio data.
With this 3D video reproduction device 100, the scene indicated by the 3D video data can be determined by analyzing the audio data for each channel. The 3D video data is corrected, and the corrected 3D video data is reproduced, by setting a parallax amount corresponding to the video scene based on this analysis result. Thus, this 3D video reproduction device expresses 3D video by using 3D video data and audio data in concert with each other. Consequently, this 3D video reproduction device can more fully express the effects of 3D video than in a conventional scenario in which 3D video data and audio data are simply used as independent data and outputted to an output unit.
(a) In the above embodiment, an example was given in which the imaging direction was set to the approaching direction when the sound pressure data for the first correction-use channel was at or under a specific first sound pressure data, and the sound pressure data for the second correction-use channel was greater than a specific second sound pressure data. However, the conditions used to set the imaging direction are not limited to what was given in the embodiment above.
(b) In the above embodiment, an example was given in which the imaging direction was set to the receding direction when the sound pressure data for the first correction-use channel was greater than a specific first sound pressure data, and the sound pressure data for the second correction-use channel was at or under a specific second sound pressure data. This determination, too, may be modified without departing from the present technology.
(c) In the above embodiment, an example was given in which there were 5.1 surround channels, but the number of channels is not limited to what was given in the embodiment above; any number may be used so long as there is a plurality of channels.
In understanding the scope of the present disclosure, the term “comprising” and its derivatives, as used herein, are intended to be open ended terms that specify the presence of the stated features, elements, components, groups, integers, and/or steps, but do not exclude the presence of other unstated features, elements, components, groups, integers and/or steps. The foregoing also applies to words having similar meanings such as the terms, “including”, “having” and their derivatives. Also, the terms “part,” “section,” “portion,” “member” or “element” when used in the singular can have the dual meaning of a single part or a plurality of parts. Also as used herein to describe the above embodiment(s), the following directional terms “forward”, “rearward”, “above”, “downward”, “vertical”, “horizontal”, “below” and “transverse” as well as any other similar directional terms refer to those directions of the 3D video reproduction device. Accordingly, these terms, as utilized to describe the present invention should be interpreted relative to the 3D video reproduction device.
The term “configured” as used herein to describe a component, section, or part of a device includes hardware and/or software that is constructed and/or programmed to carry out the desired function.
The terms of degree such as “substantially”, “about” and “approximately” as used herein mean a reasonable amount of deviation of the modified term such that the end result is not significantly changed.
While only selected embodiments have been chosen to illustrate the present invention, it will be apparent to those skilled in the art from this disclosure that various changes and modifications can be made herein without departing from the scope of the invention as defined in the appended claims. For example, the size, shape, location or orientation of the various components can be changed as needed and/or desired. Components that are shown directly connected or contacting each other can have intermediate structures disposed between them. The functions of one element can be performed by two, and vice versa. The structures and functions of one embodiment can be adopted in another embodiment. It is not necessary for all advantages to be present in a particular embodiment at the same time. Every feature which is unique from the prior art, alone or in combination with other features, also should be considered a separate description of further inventions by the applicant, including the structural and/or functional concepts embodied by such feature(s). Thus, the foregoing descriptions of the embodiments according to the present invention are provided for illustration only, and not for the purpose of limiting the invention as defined by the appended claims and their equivalents.
The present technology can be broadly applied to 3D video reproduction devices.