The present invention relates to a video processing apparatus and a video display apparatus.
This application claims priority based on JP 2018-67287 filed on Mar. 30, 2018, the contents of which are incorporated herein by reference.
In recent years, the resolution of display apparatuses has increased, and display apparatuses capable of Ultra High Definition (UHD) display have been developed. 8K Super Hi-Vision broadcasting, a television broadcast with approximately eight thousand pixels in the horizontal direction, uses a display apparatus with especially high resolution among such UHD displays, and its implementation has been in progress. The signal for supplying video to a display apparatus (8K display apparatus) supporting such 8K Super Hi-Vision broadcasting occupies a very wide band, and needs to be supplied at a rate higher than 70 Gbps without compression, and at a rate of approximately 100 Mbps even with compression.
In order to distribute a video signal that requires such a wide band, the use of new types of broadcast satellites and optical fibers has been studied (NPL 1).
An ultra high definition display apparatus can present a large amount of information to a viewer, and thus makes services that provide a wide variety of information available. Such a display apparatus retains a sufficient number of pixels per unit area even in a case that the screen size is increased, and provides sufficient information even in a case that only a portion of the display is used to present video information, so the user experience of the viewer is greatly improved compared to a case that a similar service is provided on a display apparatus with an existing resolution.
In order to further enhance the sense of presence obtained by increasing the screen size, efforts have also been made on the acoustic side, and the use of an acoustic system that combines multiple speakers has been studied (NPL 2).
However, in a case that a viewer watches a large-screen ultra high definition display apparatus, most of the field of view is covered by the video and attention is concentrated on the center of the field of view, so the ability to recognize each piece of video information is reduced in a case that multiple pieces of video information are displayed.
An aspect of the present invention has been made in view of the above problems, and discloses a device, and a configuration thereof, for enhancing recognition of multiple pieces of video information by providing, from a network side device, multiple pieces of video information and acoustic information according to the display apparatus used by a viewer, and by reproducing the acoustic information using an audio object while the multiple pieces of video information are displayed on the display apparatus side.
(1) In order to achieve the object described above, according to an aspect of the present invention, provided is a video insertion apparatus that inserts one or more prescribed videos and one or more pieces of prescribed audio into a stream including a video and a piece of audio and transmits the stream resulting from the insertion to a video display terminal apparatus, the video insertion apparatus including: a scaling processing unit configured to align a size and position of a prescribed video of the one or more prescribed videos to be inserted with sizes and positions of one or more display regions that are part of a display range of the video included in the stream; and an audio object position adjustment unit configured to convert a piece of prescribed audio of the one or more pieces of prescribed audio corresponding to the prescribed video to be inserted into an audio object and configure a position at which the audio object is configured in each of the one or more display regions.
(2) In order to achieve the object described above, according to an aspect of the present invention, provided is the video insertion apparatus, further including: a terminal interface unit configured to acquire terminal information of the video display terminal apparatus, wherein the one or more display regions are configured based on the terminal information.
(3) In order to achieve the object described above, according to an aspect of the present invention, provided is the video insertion apparatus, wherein a plurality of the video display terminal apparatuses to which the stream resulting from the insertion is to be transmitted are grouped based on at least either information about an area or information about a user group, and the prescribed video and the piece of prescribed audio are inserted for each of the plurality of video display terminal apparatuses that are grouped.
(4) In order to achieve the object described above, according to an aspect of the present invention, provided is the video insertion apparatus, wherein in a case that change information is received from a video display terminal apparatus of the plurality of video display terminal apparatuses to which at least one of a plurality of the streams resulting from the insertion is transmitted, the change information being information for the prescribed video and the piece of prescribed audio inserted for each of the plurality of video display terminal apparatuses that are grouped, configurations of the one or more display regions and the audio object of the piece of prescribed audio are changed based on the change information for each of the plurality of video display terminal apparatuses that are grouped.
(5) In order to achieve the object described above, according to an aspect of the present invention, provided is a video display terminal apparatus that receives a stream including information of a video and a piece of audio and reproduces the video and the piece of audio, wherein the video display terminal apparatus transmits, to a video insertion apparatus, information related to a size of a video display unit included in the video display terminal apparatus, and terminal information including information related to a distance between the video display unit and a viewer.
(6) In order to achieve the object described above, according to an aspect of the present invention, provided is the video display terminal apparatus, wherein the information of the size of the video display unit included in the terminal information is normalized to prescribed types of information.
(7) In order to achieve the object described above, according to an aspect of the present invention, provided is the video display terminal apparatus, further including: a user input apparatus, wherein in a case that an operation on a video inserted by the video insertion apparatus is input from the user input apparatus, change information corresponding to the video is transmitted to the video insertion apparatus.
According to an aspect of the present invention, recognition of multiple pieces of video information can be enhanced, by providing multiple pieces of video information and acoustic information according to a display apparatus used by a viewer from a network side device, and reproducing the acoustic information using an audio object along with display of the multiple pieces of video information on the display apparatus side.
Hereinafter, a video processing technology according to an embodiment of the present invention will be described in detail with reference to the drawings.
The video server 101 includes a video combining unit 105 configured to supply a video stream, an audio combining unit 106 configured to generate an audio stream, and a multiplexing unit 107 configured to multiplex the video stream and the audio stream. The audio stream may include two or more pieces of audio data. The audio stream encoding method is not particularly specified, but MPEG AAC, MPEG SAOC, or the like may be used. The video stream encoding method is not particularly specified, but H.264 scheme, H.265 scheme, VP9, or the like may be used. The method for multiplexing the audio stream and the video stream is not particularly limited, but MPEG2 Systems, MPEG Media Transport (MMT), MP4, or the like may be used. A stream obtained by multiplexing the audio stream and the video stream is hereinafter referred to as a composite stream.
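Purely as an illustrative sketch and not as part of the embodiment, the composite stream handled by the multiplexing unit 107 can be pictured as an encoded video stream paired with one or more encoded audio streams; the Python classes below are hypothetical and merely stand in for the MPEG-2 Systems, MMT, or MP4 containers named above.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class VideoStream:
    codec: str      # e.g. "H.265"
    payload: bytes  # encoded video data

@dataclass
class AudioStream:
    codec: str      # e.g. "MPEG AAC"
    payload: bytes  # encoded audio data; may carry several pieces of audio

@dataclass
class CompositeStream:
    """Container pairing one video stream with one or more audio streams."""
    video: VideoStream
    audio: List[AudioStream] = field(default_factory=list)

def multiplex(video: VideoStream, audio: List[AudioStream]) -> CompositeStream:
    # Stand-in for the multiplexing unit 107; a real system would emit
    # MPEG-2 Systems, MMT, or MP4 packets rather than a Python object.
    return CompositeStream(video=video, audio=list(audio))
```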
The video insertion apparatus 102 is located between the video server 101 and the network 128, and inserts, into the composite stream output from the video server 101, another video stream in which the video size is controlled and another audio stream including an object audio in which the audio position is controlled. 108 is a demultiplexer unit configured to demultiplex the input composite stream to extract the video stream and the audio stream, and 109 is a video combining unit configured to compose video data of a video stream for insertion output from a stream cache unit 121 with the video data included in the video stream output from the demultiplexer unit 108. The method for composing video streams is not particularly specified. The video stream output from the demultiplexer unit 108 and the video stream output from the stream cache unit 121 may each be decoded into raw video data, and the two pieces of video data may be composed and then re-encoded to obtain a composed video stream, or the video stream output from the demultiplexer unit 108 and the video stream output from the stream cache unit 121 may be composed on a coding unit basis such that the re-encoding process is partially reduced. The method may also be a method that allows the video stream output from the stream cache unit 121 to be composed as another track. 110 is an audio combining unit configured to compose the audio stream output from the stream cache unit 121 with the audio stream output from the demultiplexer unit 108. Although the method for composing audio streams is not particularly specified, for example, in a case that the audio stream output from the demultiplexer unit 108 is a channel based audio source, the composed stream may be an object audio source obtained by using the channel based audio source as a bed and adding the audio object output from the stream cache unit 121. In a case that the audio stream output from the demultiplexer unit 108 is an object audio source, an audio object may be added to the object audio source. At this time, the audio objects may be downmixed in a case that the upper limit of the number of audio objects is exceeded. The audio stream to be composed may also be composed as another track. 111 is a multiplexer unit that multiplexes the composed video stream output from the video combining unit 109 and the composed audio stream output from the audio combining unit 110. The re-multiplexed composite stream is output to the network 128.
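Reusing the hypothetical classes from the previous sketch, the flow through the video insertion apparatus 102 can be summarized as demultiplexing, composing, and re-multiplexing; the placeholder functions below only indicate where the combining units act and do not implement actual decoding or re-encoding.

```python
from typing import List

def compose_video(main: VideoStream, insert: VideoStream) -> VideoStream:
    # Placeholder for the video combining unit 109: a real implementation
    # decodes, overlays the scaled insertion video, and re-encodes (or
    # composes on a coding unit basis, or carries the insertion as a track).
    return VideoStream(codec=main.codec, payload=main.payload + insert.payload)

def compose_audio(main: List[AudioStream], insert: AudioStream) -> List[AudioStream]:
    # Placeholder for the audio combining unit 110: the insertion audio is
    # carried as an audio object alongside the existing audio (bed).
    return list(main) + [insert]

def insert_into_composite(composite: CompositeStream,
                          insert_video: VideoStream,
                          insert_audio: AudioStream) -> CompositeStream:
    # The demultiplexer unit 108 and multiplexer unit 111 correspond to
    # pulling the streams apart and re-packing them here.
    return CompositeStream(
        video=compose_video(composite.video, insert_video),
        audio=compose_audio(composite.audio, insert_audio),
    )
```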
121 is the stream cache unit configured to send, according to the control of an insertion stream configuration unit 113, the video stream for insertion output from a scaler/position adjustment unit 114 and the audio stream for insertion output from an audio object position adjustment unit 117 to the video combining unit 109 and the audio combining unit 110, respectively. According to the control of the insertion stream configuration unit, the video stream and the audio stream are accumulated, and the accumulated video stream and audio stream are sent to the video combining unit 109 and the audio combining unit 110, respectively. 114 is the scaler/position adjustment unit which is a block configured to perform scaling processing on the video data output from a video selection unit 115 and generate a video stream in which display position has been adjusted according to the control of the insertion stream configuration unit 113. 115 is a block configured to transmit video data selected from a video library unit 116 to the scaler/position adjustment unit 114 according to the control of the insertion stream configuration unit 113. 116 is the video library unit configured to accumulate multiple pieces of video data for insertion. 117 is the audio object position adjustment unit configured to convert the audio data output from the audio selection unit 118 to an audio object by the control of the insertion stream configuration unit 113, and output an audio stream in which the position of the audio object is configured. 118 is the audio selection unit configured to output the audio data selected from an audio library 119 according to the control of the insertion stream configuration unit 113. 119 is the audio library configured to accumulate multiple pieces of audio data for insertion. 120 is a library update unit which is a block configured to update the contents of the video library 116 and the audio library 119 from outside the video insertion apparatus 102, and transmit the updated content to the insertion video stream configuration unit 113.
112 is a terminal interface unit configured to communicate with the video display terminal apparatus 103 to be connected via the network 128, obtain various pieces of information such as terminal capability information related to the hardware or the software of the video display terminal apparatus 103, and user operation information input via a user input apparatus 127 of the video display terminal apparatus 103, obtain terminal registration information, related to the video display terminal apparatus 103, that is registered in advance by communicating with the terminal information management apparatus 104, and transmit these pieces of information to the insertion video stream configuration unit 113. The insertion video stream configuration unit 113 is a block configured to configure the display size and display position of the video stream selected from the video library 116, and a parameter for converting the audio stream selected from the audio library 119 to an audio object, based on information of the video display terminal apparatus 103 obtained from the terminal interface 112, user operation information, information obtained from the library update unit 120, other information obtained from the video server 101, and the like.
Next, an example configuration of the video display terminal apparatus 103 will be described. 122 is a demultiplexer unit configured to perform demultiplexing processing on the input composite stream, and output the video stream and the audio stream, 123 is a video display unit configured to decode and display the video stream and display the picture for the user interface provided by a network service interface unit 125, 124 is an audio reproduction unit configured to decode the audio stream to perform multi-channel reproduction, and reproduce audio for the user interface provided by the network service interface unit 125, and 125 is a network service interface unit configured to communicate with the terminal interface unit 112 of the video insertion apparatus 102 via the network 128, and exchange various types of information such as information of a terminal information unit 126 and information of the user input apparatus 127. 126 is the terminal information unit which is a block configured to store information related to the video display terminal apparatus 103 such as information specific to the configuration of the video display terminal apparatus 103, unique information for individually identifying the video display terminal apparatus 103, information for identifying a contract for using the network 128, and the like, and transmit information stored to the terminal interface unit 112 of the video insertion apparatus 102 via the network service interface unit 125. 127 is the user input apparatus which is a block configured to receive user operations for the video display terminal apparatus 103, transfer the user operation information to the terminal interface unit 112 of the video insertion apparatus 102 via the network service interface unit 125, generate a video for the user interface to output the video to the video display unit 123, and generate an audio for the user interface to output the audio to the audio reproduction unit 124.
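As an assumed illustration only (the field names and units below are not specified by the embodiment), the terminal information stored in the terminal information unit 126 and transmitted to the terminal interface unit 112 might take a form such as the following:

```python
from dataclasses import dataclass

@dataclass
class TerminalInformation:
    terminal_id: str              # unique information identifying the terminal
    display_width_px: int         # resolution of the video display unit 123
    display_height_px: int
    display_diagonal_inch: float  # physical size of the video display unit
    viewing_distance_m: float     # distance between the display and the viewer
    network_contract_id: str      # identifies the contract for using the network

# Example payload sent via the network service interface unit 125.
info = TerminalInformation("terminal-0001", 7680, 4320, 85.0, 1.6, "contract-42")
```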
The terminal information management apparatus 104 is an apparatus configured to receive an inquiry from the terminal interface unit 112 of the video insertion apparatus 102, and transmit information related to services that can be used by the video insertion apparatus 102 as a response, based on information related to the video display terminal apparatus 103 included in the inquiry.
The audio reproduction unit 124 included in the video display terminal apparatus 103 is configured to be capable of reproducing object audio. Unlike existing channel based audio sources, object audio is a scheme that defines each of the multiple audio sources constituting the reproduction audio as an audio object (virtual audio source) and arranges and reproduces the audio sources at free positions in the reproduction space. Existing channel based audio sources are audio sources that are prepared with the assumption that speakers are arranged in multiple predetermined directions, for example, two directions, left and right, in a case of a two channel stereo audio source, or left front, front center, right front, right rear, and left rear in a case of a five channel surround audio source. In many cases, the speakers used for channel based audio sources are located on a horizontal plane, and in some implementations, multiple horizontal planes are provided to reproduce sound traveling from above in a predetermined direction. In such channel based audio sources, since the multiple audio sources are mixed for the assumed speaker arrangement at the time the audio source is generated, there is a problem that the intended sound may fail to be reproduced, due to differences in the positions at which the speakers are arranged in the reproduction environment or a difference in the position of the listener at the time of reproduction. This may be expressed as the audio source having a narrow sweet spot. In contrast, in a case that object audio is used, the selection of speakers to reproduce a virtual audio source and the mixing can be performed adaptively depending on the speaker arrangement positions or the listener position, thus allowing the sound field intended when the audio source was generated to be reproduced even in a case that the reproduction environment changes. The selection of speakers to reproduce an audio object and the mixing may be referred to as sound rendering.
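As a very simple stand-in for such sound rendering (practical renderers use techniques such as vector base amplitude panning over the actual speaker arrangement; the two-speaker constant-power gain law below is only an illustrative assumption), an audio object can be mixed onto a left/right speaker pair with gains chosen from its azimuth relative to the listener.

```python
import math

def stereo_pan_gains(object_azimuth_deg: float,
                     speaker_spread_deg: float = 30.0) -> tuple:
    """Constant-power panning of one audio object onto a left/right speaker
    pair placed at +/- speaker_spread_deg around the listener.

    Positive azimuth is taken to mean "to the listener's right". Returns
    (gain_left, gain_right). The azimuth is clamped to the speaker arc; a
    real renderer would instead pick the nearest speakers from the actual
    arrangement and re-mix when the listener or the speakers move.
    """
    a = max(-speaker_spread_deg, min(speaker_spread_deg, object_azimuth_deg))
    # Map [-spread, +spread] onto a pan angle in [0, pi/2].
    pan = (a + speaker_spread_deg) / (2 * speaker_spread_deg) * (math.pi / 2)
    return math.cos(pan), math.sin(pan)

# Example: an object 15 degrees to the right of center.
print(stereo_pan_gains(15.0))
```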
Although there are multiple methods for defining a virtual audio source, it is often the case that multiple audio sources located at positions relative to one reference point are used. In the present embodiment, the virtual audio source is defined as an audio source represented by the polar coordinates r, θ, and ϕ from the reference position (origin), as illustrated by 201 in the drawings.
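For reference, such a polar representation (r, θ, ϕ) maps to Cartesian coordinates in the usual way; the sketch below assumes θ is the azimuth in the horizontal plane and ϕ the elevation above it, which is one common convention and not necessarily the one used in the drawings.

```python
import math

def polar_to_cartesian(r: float, theta_deg: float, phi_deg: float):
    """Convert an audio object position given as (radius, azimuth, elevation)
    relative to the reference position (origin) into (x, y, z) coordinates."""
    theta = math.radians(theta_deg)  # azimuth in the horizontal plane
    phi = math.radians(phi_deg)      # elevation above the horizontal plane
    x = r * math.cos(phi) * math.cos(theta)
    y = r * math.cos(phi) * math.sin(theta)
    z = r * math.sin(phi)
    return x, y, z
```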
Next, the insertion of the video and the audio will be described with reference to the drawings.
The video insertion apparatus 102, which obtains the information related to the size of the video display unit 123 of the video display terminal apparatus 103 and the viewing distance, selects the video data and audio data to be inserted, under the control of the insertion video stream configuration unit 113, from the video library 116 and the audio library 119 via the video selection unit 115 and the audio selection unit 118, respectively. Scaling processing and display position adjustment are performed on the selected video data by the scaler/position adjustment unit 114 so as to allow overlapping display composition to be performed on the video stream included in the composite stream received from the video server 101. The scaler/position adjustment unit 114 converts the video data obtained by performing the scaling processing and the display position adjustment into a video stream and transmits the video stream to the stream cache unit 121. The selected audio data is converted into an audio object, and the position of the audio object is configured, by the audio object position adjustment unit 117. The position of the audio object is described with reference to the drawings.
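Purely as an illustrative sketch (the function name, the screen-centered coordinate convention, and the single on-axis viewer below are assumptions, not part of the embodiment), the direction of an audio object placed at the center of an inserted display region can be derived from the region's offset from the screen center and the viewing distance.

```python
import math

def audio_object_angles(region_center_x_m: float,
                        region_center_y_m: float,
                        viewing_distance_m: float) -> tuple:
    """Return (azimuth_deg, elevation_deg, radius_m) of an audio object placed
    at the center of an inserted display region.

    Offsets are measured in metres from the screen center (x to the right,
    y upward); the viewer is assumed to sit on the screen's central axis at
    viewing_distance_m.
    """
    azimuth = math.degrees(math.atan2(region_center_x_m, viewing_distance_m))
    elevation = math.degrees(math.atan2(
        region_center_y_m, math.hypot(region_center_x_m, viewing_distance_m)))
    radius = math.sqrt(region_center_x_m ** 2 +
                       region_center_y_m ** 2 +
                       viewing_distance_m ** 2)
    return azimuth, elevation, radius

# Example: an insertion video 0.8 m to the right of and 0.4 m above the
# screen center, viewed from 1.5 m away.
print(audio_object_angles(0.8, 0.4, 1.5))
```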
In a case that the size of the video display unit 123 of the video display terminal apparatus 103 is small and an audio object configured in the region for displaying the insertion video is not very effective in drawing attention to the insertion video, the audio object of the video to be inserted may be configured outside the range of the video display unit 123. An example of such a configuration is illustrated in the drawings.
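A minimal sketch of this fallback, under the same assumed geometry as the previous sketch (the threshold and margin values are illustrative assumptions): in a case that the screen subtends only a small horizontal visual angle, the azimuth of the insertion audio object is pushed just beyond the screen edge on the same side as the inserted video.

```python
import math

def place_outside_if_small(screen_width_m: float,
                           viewing_distance_m: float,
                           region_azimuth_deg: float,
                           min_screen_angle_deg: float = 30.0,
                           margin_deg: float = 10.0) -> float:
    """Return the azimuth to use for the insertion audio object.

    If the screen's horizontal visual angle is below min_screen_angle_deg,
    the object is moved outside the screen edge (plus a margin) on the same
    side as the inserted video; otherwise the on-screen azimuth is kept.
    """
    half_angle = math.degrees(math.atan2(screen_width_m / 2, viewing_distance_m))
    screen_angle = 2 * half_angle
    if screen_angle >= min_screen_angle_deg:
        return region_azimuth_deg
    side = 1.0 if region_azimuth_deg >= 0 else -1.0
    return side * (half_angle + margin_deg)
```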
An example of a configuration has been described above in which the insertion video and the insertion audio are composed by the video insertion apparatus separated from the video display terminal apparatus by the network, but a configuration may also be adopted in which the insertion video and the insertion audio are composed by the video display terminal apparatus itself. An example of such a configuration is illustrated in the drawings.
As described above, the audio object is configured at a position near the display position of the video inserted by the video insertion apparatus, or at a position from which it can be recognized that the insertion video is displayed and the audio is reproduced, so that the attention of the viewer is drawn and the viewer can be informed that a video has been inserted. Configuring the audio object such that the sound travels from the displayed insertion video improves the user experience of the insertion video.
In the present embodiment, a configuration will be described in which the network is divided into multiple sub-networks, for example networks provided in specific regions, and video insertion apparatuses are located in the divided networks, so as to allow insertion of a video that is effective only within a divided network, or only within a group defined based on information about the users connected to the network.
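A sketch of such group-based selection follows; the grouping keys and the mapping are illustrative assumptions, and a real system would hold this configuration in the insertion stream configuration unit of the video insertion apparatus serving each sub-network.

```python
from typing import Dict, Optional

# Hypothetical mapping from a group key to the identifier of the insertion
# content (a video and audio pair) held in the video and audio libraries.
GROUP_CONTENT: Dict[str, str] = {
    "area:region-a": "insert-local-news",
    "area:region-b": "insert-regional-ad",
    "usergroup:sports-fans": "insert-sports-ticker",
}

def select_insertion_content(area: Optional[str],
                             user_group: Optional[str]) -> Optional[str]:
    """Pick insertion content for a terminal, preferring its user group and
    falling back to its area; return None if neither is configured."""
    if user_group and f"usergroup:{user_group}" in GROUP_CONTENT:
        return GROUP_CONTENT[f"usergroup:{user_group}"]
    if area and f"area:{area}" in GROUP_CONTENT:
        return GROUP_CONTENT[f"area:{area}"]
    return None
```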
The configuration of the device used in the present embodiment is illustrated in the drawings.
An example of area control and group control will be described using the drawings.
For an insertion video that is grouped in this way, the user may access the video insertion apparatus 604 via the network service interface 603 by using the user input apparatus 127 to change the insertion method of the insertion video and the insertion audio. An example of this operation is described using the drawings.
A program running on an apparatus according to the present invention may serve as a program that controls a Central Processing Unit (CPU) and the like to cause a computer to operate in such a manner as to realize the functions of the above-described embodiment according to the present invention. Programs or the information handled by the programs are temporarily stored in a volatile memory such as a Random Access Memory (RAM), a non-volatile memory such as a flash memory, a Hard Disk Drive (HDD), or any other storage device system.
Note that a program for realizing the functions of the embodiment according to the present invention may be recorded in a computer-readable recording medium. This configuration may be realized by causing a computer system to read the program recorded on the recording medium for execution. It is assumed that the “computer system” refers to a computer system built into the apparatuses, and the computer system includes an operating system and hardware components such as a peripheral device. The “computer-readable recording medium” may be any of a semiconductor recording medium, an optical recording medium, a magnetic recording medium, a medium dynamically retaining the program for a short time, or any other computer readable recording medium.
Each functional block or various characteristics of the apparatuses used in the above-described embodiment may be implemented or performed on an electric circuit, for example, an integrated circuit or multiple integrated circuits. An electric circuit designed to perform the functions described in the present specification may include a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), or other programmable logic devices, discrete gates or transistor logic, discrete hardware components, or a combination thereof. The general-purpose processor may be a microprocessor, or may instead be a processor of a known type, a controller, a micro-controller, or a state machine. The above-mentioned electric circuit may include a digital circuit or an analog circuit. In a case that, with advances in semiconductor technology, a circuit integration technology that replaces the present integrated circuits appears, one or more aspects of the present invention can use a new integrated circuit based on that technology.
Note that the invention of the present patent application is not limited to the above-described embodiments. In the embodiments, apparatuses have been described as an example, but the invention of the present application is not limited to these apparatuses, and is applicable to terminal apparatuses or communication apparatuses of fixed-type or stationary-type electronic apparatuses installed indoors or outdoors, for example, AV apparatuses, office equipment, vending machines, and other household apparatuses.
The embodiments of the present invention have been described in detail above referring to the drawings, but the specific configuration is not limited to the embodiments and includes, for example, an amendment to a design that falls within the scope that does not depart from the gist of the present invention. Various modifications are possible within the scope of the present invention defined by claims, and embodiments that are made by suitably combining technical means disclosed according to the different embodiments are also included in the technical scope of the present invention. A configuration in which constituent elements, described in the respective embodiments and having mutually the same effects, are substituted for one another is also included in the technical scope of the present invention.
The present invention can be used in a video insertion apparatus and a video display terminal apparatus.
Number | Date | Country | Kind
---|---|---|---
2018-067287 | Mar. 30, 2018 | JP | national

Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2019/009107 | Mar. 7, 2019 | WO | 00