The present invention relates to a video display apparatus and a video processing apparatus. This application claims priority based on JP 2018-170471 filed on Sep. 12, 2018, the contents of which are incorporated herein by reference.
With recent improvement in the resolution of display apparatuses, display apparatuses capable of Ultra High Definition (UHD) display have been introduced. Among such UHD displays, display apparatuses capable of particularly high resolution display are used for 8K Super Hi-Vision broadcasting, which is television broadcasting with about 8000 pixels in the lateral direction, and practical utilization of this 8K Super Hi-Vision broadcasting has been advanced. To take full advantage of such ultra high resolution display, display apparatuses tend to increase in size.
While a wideband network is required for transmission of such ultra high resolution video signals, practical transmission of ultra high resolution video signals is becoming feasible with the use of optical fiber networks and advanced wireless networks.
Such ultra high resolution display apparatuses can provide viewers with an abundant amount of information and can thereby provide videos with a sense of presence. Video communication using such video with good immersive feeling is also under study.
NPL 1: Ministry of Internal Affairs and Communications, “Current State about Advancement of 4K and 8K”, website of the MIC
<https://www.soumu.go.jp/main_content/000276941.pdf>
In a case of performing communication using video, the sense of presence is increased in a case that the video of a communication partner displayed on a display apparatus directly faces the user performing the communication, establishing eye-to-eye contact between the user and the communication partner. However, a large display apparatus places significant restrictions on video camera apparatuses. Because the display apparatus does not allow light to pass through, images cannot be captured by a video camera apparatus from behind the display apparatus, and a video camera apparatus disposed on the front face side of the display apparatus comes to exist between the user and the video displayed on the display apparatus, so the sense of presence is decreased. This problem is described below with reference to the drawings.
One aspect of the present invention has been made in view of the above problems and discloses an apparatus and a configuration thereof that use multiple video camera apparatuses arranged outside a display area of a display apparatus, use a video processing apparatus in a network to generate a video of an arbitrary view point from videos captured by the multiple video camera apparatuses, and display the generated video on a display apparatus of a communication partner, to thereby enable video communication with good immersive feeling.
(1) In order to achieve the object described above, one aspect of the present invention provides a video display apparatus for communicating with one or more video processing apparatuses, the video display apparatus including: a video display unit; multiple video camera units; a synchronization controller; and a controller, wherein each of the multiple video camera units is installed outside the video display unit, the synchronization controller synchronizes shutters of the multiple video camera units, the controller transmits, to any one of the one or more video processing apparatuses, camera capability information indicating capability of each of the multiple video camera units, camera arrangement information indicating an arrangement condition of the multiple video camera units, display capability information indicating video display capability of the video display unit, and video information obtained through capturing by each of the multiple video camera units, and the controller receives video information transmitted from any one of the one or more video processing apparatuses and displays the received video information on the video display unit.
(2) In order to achieve the object described above, one aspect of the present invention provides the video display apparatus, wherein the camera arrangement information includes location information of each of the multiple video camera units relative to a prescribed point being used as a reference in the video display unit included in the video display apparatus and includes information on an optical axis of each of the multiple video camera units with respect to a display surface of the video display unit being used as a reference.
(3) In order to achieve the object described above, one aspect of the present invention provides the video display apparatus, wherein the camera capability information includes information on a focal length and a diaphragm of a lens configuration used by each of the multiple video camera units.
(4) In order to achieve the object described above, one aspect of the present invention provides the video display apparatus, wherein the display capability information includes at least one of information on a size of the video display unit included in the video display apparatus, information on a resolution displayable by the video display unit, information on a color depth displayable by the video display unit, and information on arrangement of the video display unit.
(5) In order to achieve the object described above, one aspect of the present invention provides the video display apparatus, wherein the controller receives configuration information of each of the video camera units from any one of the one or more video processing apparatuses and configures each of the multiple video camera units in accordance with the configuration information.
(6) In order to achieve the object described above, one aspect of the present invention provides the video display apparatus, wherein in a case that multiple values are configurable in each of at least two of the display capability information, the camera capability information, and the camera arrangement information, combinations of values of the display capability information, the camera capability information, and the camera arrangement information to be transmitted to the video processing apparatus are partially restricted.
(7) In order to achieve the object described above, one aspect of the present invention provides a video processing apparatus for communicating with multiple video display apparatuses including a first video display apparatus and a second video display apparatus, the video processing apparatus being configured to: receive, from the first video display apparatus, camera capability information indicating capability of multiple video camera units, camera arrangement information indicating an arrangement condition of the multiple video camera units, display capability information indicating video display capability of a video display unit, and video information obtained through capturing by each of the multiple video camera units; generate an arbitrary view point video from the video information thus received; and transmit the arbitrary view point video to the second video display apparatus.
(8) In order to achieve the object described above, one aspect of the present invention provides the video processing apparatus, wherein in a case that multiple values are configurable in each of at least two of the display capability information, the camera capability information, and the camera arrangement information, a combination of the display capability information, the camera capability information, and the camera arrangement information is restricted.
According to one aspect of the present invention, by transmitting video information obtained through capturing by each of multiple video camera units to a video processing apparatus, receiving video information of a video of an arbitrary view point transmitted from the video processing apparatus, and displaying the video information on a video display unit, video communication using video with good immersive feeling is enabled, and this enhances user experience.
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
The communication between the video display apparatus 101 and the video display apparatus 102 includes two data flows. In the first data flow, the display capability information, the camera capability information, and the camera arrangement information from the video display apparatus 101, together with video information obtained through capturing by the multiple cameras installed on the video display apparatus 101, are input to the video processing apparatus 1-104; the video processing apparatus 2-105 uses light field data generated by the video processing apparatus 1-104 to generate video data of an arbitrary view point; and the generated video data of the arbitrary view point is displayed on the video display apparatus 102. In the second data flow, equivalent processing is performed in the opposite direction, from the video display apparatus 102 toward the video display apparatus 101. The two data flows are constituted of equivalent processing. Hence, the following description describes the data flow from the video display apparatus 101 toward the video display apparatus 102, and description of the data flow from the video display apparatus 102 toward the video display apparatus 101 is omitted.
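Expressed as a minimal sketch in Python, the one-directional data flow described above can be summarized as follows. All class and method names here are hypothetical illustrations introduced for the sketch, not elements disclosed by the embodiment; the reverse flow is identical with the apparatuses 101 and 102 exchanged.

```python
# Hypothetical sketch of the data flow from video display apparatus 101
# toward video display apparatus 102; names are illustrative assumptions.

def forward_flow(display_101, video_proc_1, video_proc_2, display_102):
    # The transmitting side sends its capability and arrangement
    # information together with the synchronized camera videos.
    info = {
        "display_capability": display_101.display_capability_info(),
        "camera_capability": display_101.camera_capability_info(),
        "camera_arrangement": display_101.camera_arrangement_info(),
    }
    videos = display_101.capture_synchronized_frames()

    # Video processing apparatus 1 turns the camera videos into light field data.
    light_field = video_proc_1.generate_light_field(info, videos)

    # Video processing apparatus 2 renders an arbitrary-view-point video from it.
    view = video_proc_2.render_arbitrary_view(
        light_field,
        viewpoint=display_102.requested_viewpoint(),
        display_capability=display_102.display_capability_info(),
    )
    display_102.display(view)
```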
The camera arrangement information of each of the video display apparatuses 101 and 102 may include an arrangement condition of each of the multiple video camera units 303 to 310 included in the corresponding one of the video display apparatuses 101 and 102. As an example of an arrangement position of the video camera unit 304, which is one of the multiple video camera units 303 to 310, relative position information of the central position of a front principal point of a lens included in the video camera unit 304 with respect to the central position of the video display unit 302 may be included. Alternatively, a particular point other than the central position may be used as a reference. As this relative position information, a distance 314 in the vertical direction and a distance 315 in the horizontal direction from the central position of the video display unit 302 to the central position of the front principal point of the lens may be used. The relationship from the central position of the video display unit 302 to the central position of the front principal point of the lens may instead be expressed in a polar coordinate format. The camera arrangement information may also include information on the direction of the optical axis of the lens and on the specification and the configuration of the lens included in each of the video camera units 303 to 310. As an example, an angle (θ, φ) 317 representing the angle of the optical axis of the lens 316 with respect to the direction perpendicular to the surface of the video display unit 302, a focal length f 318 and a diaphragm configuration a 319 of the lens 316, and information F (F value) (not illustrated) on the brightness of the lens 316 may be included in the camera arrangement information. The focal length f 318 and the diaphragm configuration a 319 of the lens 316 and the information F (F value) on the brightness of the lens 316, which indicate the lens configuration, may instead be included in the camera capability information. In the present embodiment, it is assumed that the front principal point of the lens included in each of the video camera units 303 to 310 is arranged on the same plane as that of the video display unit 302. However, no limitation is intended, and the front principal point of the lens need not necessarily be arranged on the same plane as that of the video display unit 302. In a case that each of the video camera units 303 to 310 includes a zoom lens, the position of the front principal point of the lens 316 may change as the angle of view for capturing changes. In such a case, information on the position of the front principal point of the lens 316 may be included in the camera position information. The information on the position of the front principal point of the lens 316 may use the relative distance from the plane of the video display unit 302 or may be other location information. The positional relationship between the lens 316 and the video display unit 302 may be represented by a value using, as a reference, the position of a flange back or an image sensor, without being limited to the front principal point of the lens 316. The camera capability information may include capability about an imaging element included in each of the video camera units. Examples of such information include information on one or multiple resolutions of a video signal that each of the video camera units can output, color depths that can be output, a color filter array to be used, and an imaging element array.
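As a non-limiting illustration, the information elements described above could be grouped as in the following sketch; the field names and units (millimeters, degrees) are assumptions introduced for the example, not definitions given by the embodiment.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CameraArrangementInfo:
    # Relative position of the front principal point of the lens with
    # respect to the centre of the video display unit 302
    # (distances 314 and 315 in the description above).
    vertical_offset_mm: float
    horizontal_offset_mm: float
    # Optical-axis direction (theta, phi) 317 relative to the direction
    # perpendicular to the display surface.
    optical_axis_deg: Tuple[float, float]
    # Offset from the display plane, for lenses whose front principal
    # point does not lie on the plane of the video display unit 302.
    depth_offset_mm: float = 0.0

@dataclass
class CameraCapabilityInfo:
    focal_length_mm: float                      # focal length f 318
    aperture: float                             # diaphragm configuration a 319
    f_number: float                             # lens brightness F (F value)
    resolutions: Tuple[Tuple[int, int], ...]    # selectable output resolutions
    color_depths_bits: Tuple[int, ...]          # selectable color depths
    color_filter_array: str = "bayer"           # color filter array in use
```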
The arrangement positions of the video camera units 303 to 310 with respect to the video display unit 302 may be determined in advance. As an example, the arrangement positions may be determined based on the size of the video display unit 302 and the number of video camera units to be used. The size of the elements to be used as the video display unit 302 may be standardized, positions usable as arrangement positions for the video camera units may be defined based on the size of the elements of the video display unit, and the arrangement positions actually used may be indicated from among the usable positions. One or some of the video camera units 303 to 310 may be configured to be movable so that multiple usable optical axes can be configured, and information on the usable optical axes may be included in the camera capability information.
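As one conceivable example of deriving arrangement positions from the display size and the number of cameras, the following sketch places the cameras evenly along the bezel of the video display unit; this is an illustrative rule assumed for the example, not the rule used by the embodiment.

```python
# Toy placement rule: walk the display bezel and place cameras at equal
# arc-length intervals. Offsets are relative to the display centre.

def bezel_positions(width_mm: float, height_mm: float, n_cameras: int):
    """Return (x, y) offsets from the display centre along the bezel."""
    perimeter = 2 * (width_mm + height_mm)
    positions = []
    for i in range(n_cameras):
        d = perimeter * i / n_cameras            # arc length along the bezel
        if d < width_mm:                          # top edge
            positions.append((d - width_mm / 2, height_mm / 2))
        elif d < width_mm + height_mm:            # right edge
            positions.append((width_mm / 2, height_mm / 2 - (d - width_mm)))
        elif d < 2 * width_mm + height_mm:        # bottom edge
            positions.append((width_mm / 2 - (d - width_mm - height_mm), -height_mm / 2))
        else:                                     # left edge
            positions.append((-width_mm / 2, (d - 2 * width_mm - height_mm) - height_mm / 2))
    return positions
```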
A network interface unit 428 connects the video display apparatus 101 to the network 103 and has a configuration conforming to the scheme used by the network 103. In a case that the network 103 is a wireless network, a wireless modem may be used. In a case that the network 103 uses the Ethernet (trade name), an Ethernet (trade name) adapter may be used. The controller 421 is configured to control all the other blocks and to communicate with the video processing apparatus 1-104, the video processing apparatus 2-105, and the video display apparatus 102 via the communication controller 422 to exchange control data with each of the apparatuses. The control data includes the display capability information, the camera capability information, and the camera arrangement information.
Next, a method in which the video processing apparatus 1-104 and the video processing apparatus 2-105 use multiple pieces of data output from the video display apparatus 101 to generate video data to be used for display by the video display apparatus 102 will be described. In the present example, a light field is used to obtain a video of an arbitrary view point. The light field is a collective expression of rays in a certain space and is generally expressed as a set of four or more dimensional vectors. In the present embodiment, a set of four-dimensional vectors, also referred to as a Light Slab, is used as light field data. An overview of the light field data used in the present embodiment will be described using FIG. 5.
Calculations are also possible for a video of the light field data L′ captured by a video camera for which a virtual lens, diaphragm, and imaging element are configured in a similar manner. An example of generating a video of an arbitrary view point from the light field data will be described below.
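The following sketch illustrates this idea under simplifying assumptions: light slab data L′(x, y, u, v), sampled on two parallel planes, is resampled along the rays of a virtual pinhole camera, with nearest-neighbor lookup standing in for proper interpolation and lens modeling.

```python
import numpy as np

# Minimal sketch of synthesising a virtual pinhole-camera view from light
# slab data. Assumptions: the two parameterisation planes are z = 0 and
# z = z_uv, samples lie on regular grids xs, ys, us, vs, and every pixel
# ray has a nonzero z component.

def render_view(L, xs, ys, us, vs, z_uv, cam_pos, pixel_dirs):
    """L: radiance array of shape (len(xs), len(ys), len(us), len(vs), 3).
    cam_pos: virtual camera centre (3,). pixel_dirs: (H, W, 3) ray directions."""
    H, W, _ = pixel_dirs.shape
    img = np.zeros((H, W, 3))
    for i in range(H):
        for j in range(W):
            d = pixel_dirs[i, j]
            # Intersect the ray with the plane z = 0 to obtain (x, y).
            t0 = -cam_pos[2] / d[2]
            x, y = cam_pos[0] + t0 * d[0], cam_pos[1] + t0 * d[1]
            # Intersect with the plane z = z_uv to obtain (u, v).
            t1 = (z_uv - cam_pos[2]) / d[2]
            u, v = cam_pos[0] + t1 * d[0], cam_pos[1] + t1 * d[1]
            # Pick the nearest sampled ray in the slab.
            ix = np.abs(xs - x).argmin(); iy = np.abs(ys - y).argmin()
            iu = np.abs(us - u).argmin(); iv = np.abs(vs - v).argmin()
            img[i, j] = L[ix, iy, iu, iv]
    return img
```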
The light field data L′ is a set of data of rays coming to various locations from various directions, and an apparatus called a light field camera is typically used to obtain light field data through capturing. While various types of light field cameras have already been proposed, an overview of a type using a microlens array will be described below.
Rays 606 that pass through the primary lens 601 and then through a particular lens of the microlens array 602 reach particular positions of the imaging element 603. The positions are determined depending on the specification of the primary lens 601 and the positional relationship of the primary lens 601, the microlens array 602, and the imaging element 603. Assuming, for simplicity, a condition where a point 609 on a plane 604 brings rays to focus on the microlens array 602, a ray passing through a point 610 on another plane 605 and then through the point 609 on the plane 604 passes through the primary lens 601 and the microlens array 602 to reach a point 607 on the imaging element 603. A ray passing through a point 611 on the plane 605 and then through the point 609 on the plane 604 passes through the primary lens 601 and the microlens array 602 to reach a point 608 on the imaging element 603. This means that a ray reaching a point p1(x1, y1) on the imaging element 603 can be expressed by using the light field data L′ including the plane 604 and the plane 605, as follows.
p1(x1, y1) = F1 · L′(x, y, u, v)   (Equation 1)
F1 is a matrix determined by the specifications of the primary lens 601, the microlens array 602, and the imaging element 603 and by their positional relationship. This means that, using such a light field camera, it is possible to generate light field data within the capturing range of the imaging element 603.
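In code form, the consequence of Equation 1 is that each sensor pixel of such a camera samples L′ along exactly one ray, with the pixel-to-ray mapping fixed by the optics. The linear map below is a stand-in assumption for illustration, not the F1 of any concrete camera.

```python
import numpy as np

def pixel_to_ray(px: float, py: float, F1: np.ndarray):
    """Map a sensor pixel (px, py) to light-slab ray coordinates (x, y, u, v).

    F1 is an assumed 4x3 matrix acting on homogeneous pixel coordinates,
    determined by the lens, microlens array, and sensor specifications and
    their positional relationship."""
    x, y, u, v = F1 @ np.array([px, py, 1.0])
    return x, y, u, v
```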
The video camera units 303 to 310 included in the video display apparatuses 101 and 102 used in the present embodiment are not capable of capturing, from the front, videos of a user who directly faces the video display unit 302, and hence cannot directly capture light field data of such an angle of view. In the present embodiment, a neural network is therefore used to generate the light field data from the videos captured by the video camera units 303 to 310.
An example of a configuration of equipment used during learning of the neural network will be described below.
Because the size of the light field data, which is an output from the neural network, is large in comparison with the inputs to the neural network, in other words, the outputs from the video camera units 702 and 703, learning in the neural network may not progress. As a countermeasure, restriction may be imposed on the light field data output from the neural network. As a result, the size of the light field data can be reduced, and the learning efficiency of the neural network can be increased. Various methods are conceivable for this restriction, and any method may be used as long as restriction can be imposed on the positions and directions of rays included in the light field as a result. As an example, any of methods such as restricting the position, the optical axis, and the angle of view of a virtual video camera used in generating an arbitrary view point video from the light field, or restricting the resolution and color depth of an arbitrary view point video to be generated, may be used. Some conditions may also be configured for signals to be input to the neural network, in other words, the outputs from the video camera units 702 and 703, to increase the learning efficiency of the neural network. As an example, restriction may be imposed on arrangement conditions for the light field camera 701 and the video camera units 702 and 703 and on the configurations of the video camera units to be used for supervised data. In other words, restriction may be imposed on the number of video cameras used as the video camera units, the arrangement condition configured for each video camera (such as a relative position from the center of the video display unit of each of the video display apparatuses 101 and 102, a relative position from the arrangement position of each of the video display apparatuses 101 and 102, and the inclination of the optical axis with respect to a direction perpendicular to the video display unit), a lens configuration (such as a focal length and the amount of diaphragm) of each video camera, and the like. As a restriction method, possible values for each of the number of video cameras used as the video camera units, the position at which each video camera can be arranged, the direction in which the optical axis can be configured, the focal length that can be configured, and the diaphragm configuration that can be configured may be determined in advance, and only those values may be used. Combinations of possible values may be restricted for at least two parameters among the number of video cameras used as the video camera units, the position at which each video camera can be arranged, the direction in which the optical axis can be configured, the focal length that can be configured, and the diaphragm configuration that can be configured. At least one of these parameters may be associated with the size of the video display unit included in each of the video display apparatuses 101 and 102. In this case, possible values for the size of the video display unit may also be determined in advance.
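A minimal training sketch consistent with the above could look as follows, assuming a PyTorch-style setup in which synchronized frames from the video camera units 702 and 703 are the network input and the (restricted) light field data captured by the light field camera 701 is the supervised target. The network shape, sizes, loss, and optimizer are assumptions for the sketch.

```python
import torch
import torch.nn as nn

class LightFieldNet(nn.Module):
    """Assumed fully connected regressor from flattened camera frames to
    flattened (restricted) light field data."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 1024), nn.ReLU(),
            nn.Linear(1024, 1024), nn.ReLU(),
            nn.Linear(1024, out_dim),
        )

    def forward(self, camera_frames: torch.Tensor) -> torch.Tensor:
        return self.net(camera_frames)

def train_step(model, optimiser, camera_frames, light_field_target):
    # camera_frames: synchronised frames from the camera units (input).
    # light_field_target: restricted light field data from camera 701 (target).
    optimiser.zero_grad()
    loss = nn.functional.mse_loss(model(camera_frames), light_field_target)
    loss.backward()
    optimiser.step()
    return loss.item()
```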
Note that, in a case that these parameters are handled by the video processing apparatus 1-104 and either the camera capability information or the camera arrangement information obtained from the video display apparatus 101 corresponds to multiple configurations, information indicating the configuration to be used may be transmitted to the video display apparatus 101 to indicate the configuration to be used by the video display apparatus 101. In a case that each of the camera capability information, the camera arrangement information, and the display capability information can take multiple values, the combinations of values that the neural network can process may be restricted in advance, and information indicating that combinations other than the processable combinations are not possible may be transmitted to the video display apparatus 101. In a case that there is a combination usable for approximation, that combination may be used instead of an indicated combination, and the use of the combination for approximation may be notified.
After the learning in the neural network has progressed, the learning unit 705 transmits the weights of the neural network to an accumulation unit 706 to accumulate a learning result. At this time, a learning result may be accumulated for each value, or each combination of values, of the parameters such as the number of video cameras used as the video camera units, the position at which each video camera can be arranged, the direction in which the optical axis can be configured, the focal length that can be configured, and the diaphragm configuration that can be configured. The learned weights thus accumulated are transmitted to the video processing apparatus 1-104. The means for transmitting the weights to the video processing apparatus 1-104 is not particularly limited, and the weights may be transmitted using some kind of network or may be transmitted using a physical portable recording medium.
The video processing apparatus 1-104 includes a neural network similar to the neural network used by the learning unit 705 and uses the weights obtained from the accumulation unit 706 to generate light field data from at least one of the display capability information, the camera capability information, and the camera arrangement information transmitted from the video display apparatus 101 and from the video information obtained through capturing and transmitted from the video display apparatus 101. In a case that the weights obtained from the accumulation unit 706 differ based on at least one of the display capability information, the camera capability information, and the camera arrangement information transmitted from the video display apparatus 101, light field data is generated by using the weights corresponding to the parameter concerned. In a case that the video information obtained through capturing and transmitted from the video display apparatus 101 is multiplexed videos captured by multiple video camera units, demultiplexing processing is performed, and signals output from video camera units having an arrangement similar to the video camera arrangement used during learning in the neural network are input to the neural network. In a case that voice data is multiplexed on the signal transmitted from the video display apparatus 101, the voice data may be demultiplexed at the time of the demultiplexing, and signals other than the video data, including the voice data, may be transmitted to the video processing apparatus 2-105. Control information other than the video data and the voice data, for example, control information such as the display capability information, the camera capability information, and the camera arrangement information, may also be transmitted to the video processing apparatus 2-105. In a case that the video information obtained through capturing and transmitted from the video display apparatus 101 is video-coded, decoding processing is performed, and the signal obtained as a result of the decoding is input to the neural network.
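The per-configuration selection of weights described above can be sketched as a simple lookup; the key format below is a hypothetical illustration of how a learning result might be indexed by camera count, arrangement, focal length, and diaphragm configuration.

```python
import torch

# Filled from the accumulation unit 706: one learned state per
# configuration combination. The key layout is an assumption.
accumulated_weights = {}

def generate_light_field(model, camera_capability, camera_arrangement, frames):
    key = (
        camera_capability["num_cameras"],
        camera_arrangement["layout_id"],
        camera_capability["focal_length_mm"],
        camera_capability["aperture"],
    )
    # Load the weights matching the reported configuration, then infer.
    model.load_state_dict(accumulated_weights[key])
    with torch.no_grad():
        return model(frames)
```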
The light field data generated by the video processing apparatus 1-104 is input to the video processing apparatus 2-105. The video processing apparatus 2-105 generates video data of an arbitrary view point from the light field data in the manner described above.
The video processing apparatus 2-105 generates video data of the arbitrary view point by using the configured arbitrary view point and also using, in a case that the virtual video camera is configured, the configuration of the virtual video camera. The resolution of the video data of the arbitrary view point generated at this time may be configured based on the display capability information of the video display apparatus 102. The resolution of the video data of the arbitrary view point may be configured by configuring sampling intervals of the light field data. The generated video data of the arbitrary view point is video-coded, and in a case that voice data is input from the video processing apparatus 1-104, the coded video data and the voice data are multiplexed and transmitted to the video display apparatus 102.
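As a toy illustration of configuring the resolution via sampling intervals, the interval could be derived from the displayable resolution carried in the display capability information of the video display apparatus 102; the field names are assumptions for the sketch.

```python
def sampling_intervals(display_capability, view_width_mm, view_height_mm):
    """Choose light field sampling intervals so the rendered arbitrary-view
    image matches the receiver's displayable resolution."""
    w_px, h_px = display_capability["max_resolution"]    # e.g. (7680, 4320)
    return view_width_mm / w_px, view_height_mm / h_px   # mm per rendered pixel
```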
The video display apparatus 102 receives the multiplexed video data of the arbitrary view point and voice data; the received data passes through the network interface unit 428 and the communication controller 422, and the demultiplexing unit 423 separates the coded video data and the coded voice data. The coded video data is decoded by the video decoder 424, and the resultant data is displayed by the video display unit 425. The coded voice data is decoded by the voice decoder 426, and the resultant data is output by the voice output unit 427 as voice.
With the above-described operation, by generating video data of an arbitrary view point by using the video data obtained through capturing by each of the multiple video camera units 303 to 310 arranged outside the video display unit 302 of each of the video display apparatuses 101 and 102, it is possible to generate video data of an arbitrary view point in which the users directly face each other across the video display apparatuses 101 and 102, and hence to perform video communication with good immersive feeling.
Note that equivalent configurations may be made for the multiple video camera units 303 to 310 for capturing, but different configurations may instead be made for the multiple video camera units 303 to 310 to generate light field data. This is because, in a case that the performance of each of the multiple video camera units 303 to 310 included in each of the video display apparatuses 101 and 102 is lower than the performance of the light field camera 701 used during learning, capturing videos with different configurations for the multiple video camera units 303 to 310 allows generation of light field data close to the performance of the light field camera 701 in some cases. As an example, in a case that the color depth of the data obtained through capturing by each of the multiple video camera units 303 to 310 included in each of the video display apparatuses 101 and 102 is lower than that of the light field camera 701, the multiple video camera units 303 to 310 may be divided into multiple groups with different diaphragm configurations, that is, a group having a diaphragm configuration suitable for a scene with high illuminance and a group having a diaphragm configuration suitable for a scene with low illuminance. For example, video capturing may be performed with the video camera units 303, 305, 307, and 309 having a narrow diaphragm configuration suitable for a scene with high illuminance and the video camera units 304, 306, 308, and 310 having an open diaphragm configuration suitable for a scene with low illuminance. In a case of employing such configurations, learning by the learning unit 705 using the light field camera 701 is performed with the diaphragm configurations and arrangement of the video camera units used for learning (702, 703, and the camera units omitted from illustration) made similar to those of the video camera units 303 to 310 described above. With learning advanced in this state, the light field data output by the neural network comes close to the performance of the light field camera 701. The configurations of the video camera units 303 to 310 of the video display apparatus 101 may be made by the video processing apparatus 1-104, and the video processing apparatus 1-104 may use the camera capability information and the camera arrangement information received from the video display apparatus 101 to make the configurations of the video camera units 303 to 310 of the video display apparatus 101.
By making different configurations for the respective video camera units 303 to 310 as described above, it is possible to increase the quality of the light field data generated by the video processing apparatus 1-104 and to improve the quality of the video data of an arbitrary view point generated by the video processing apparatus 2-105, to thereby be able to perform video communication with good immersive feeling. Different configurations for the respective video camera units 303 to 310 may also be made for parameters other than the diaphragm configuration, such as the focal length and the color depth and resolution of the video data to be output.
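The grouped diaphragm configuration described above can be sketched as follows; the F-numbers are illustrative assumptions, chosen only to contrast a stopped-down group with an opened-up group.

```python
# Split the eight camera units into two exposure groups, mirroring the
# example above: 303, 305, 307, 309 stopped down for high illuminance;
# 304, 306, 308, 310 opened up for low illuminance.

camera_units = [303, 304, 305, 306, 307, 308, 309, 310]

def diaphragm_groups():
    config = {}
    for unit in camera_units:
        if unit % 2 == 1:    # 303, 305, 307, 309: narrow diaphragm
            config[unit] = {"f_number": 11.0, "scene": "high illuminance"}
        else:                # 304, 306, 308, 310: open diaphragm
            config[unit] = {"f_number": 2.0, "scene": "low illuminance"}
    return config
```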
The present embodiment generates video data of an arbitrary view point by using surface data, instead of by using light field data as in the first embodiment.
Each of the video display apparatuses 101 and 102 has a configuration equivalent to that in the first embodiment. The processing of the video processing apparatus 1 is changed: a parallax map is created using the video data obtained through capturing by the multiple video camera units 303 to 310 of the video display apparatus 101, and a 3D surface model is generated based on the parallax map. Texture data for the 3D surface model is generated based on the video data obtained through the capturing by each of the multiple video camera units 303 to 310, and the 3D surface model, the texture data, and the voice data transmitted from the video display apparatus 101 are transmitted to the video processing apparatus 2. The processing of the video processing apparatus 2 is also changed: video data of an arbitrary view point is generated as 3DCG video from the 3D surface model and the texture data received from the video processing apparatus 1 and from information of configured virtual cameras, and is coded; the voice data transmitted from the video display apparatus 101 is multiplexed with the 3DCG video, and the multiplexed data is transmitted to the video display apparatus 102.
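As a hedged sketch of the first step of this embodiment, a parallax map could be computed from one pair of the video camera units with a standard stereo matcher; OpenCV's semi-global block matching is used here as a stand-in for whatever matcher the embodiment actually employs.

```python
import cv2
import numpy as np

def parallax_map(left_gray: np.ndarray, right_gray: np.ndarray) -> np.ndarray:
    """Compute a per-pixel parallax (disparity) map from a rectified
    grayscale stereo pair captured by two of the video camera units."""
    matcher = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=128,   # must be a multiple of 16
        blockSize=5,
    )
    # StereoSGBM returns fixed-point disparity scaled by 16.
    disparity = matcher.compute(left_gray, right_gray).astype(np.float32) / 16.0
    return disparity          # depth then follows from the camera baseline
```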
With the above-described operation, by generating video data of an arbitrary view point by using the video data obtained through capturing by each of the multiple video camera units 303 to 310 arranged outside the video display unit 302 of each of the video display apparatuses 101 and 102, it is possible to generate video data of an arbitrary view point in which the users directly face each other across the video display apparatuses 101 and 102, and hence to perform video communication with good immersive feeling.
A program running on an apparatus according to the present invention may serve as a program that controls a Central Processing Unit (CPU) and the like to cause a computer to operate in such a manner as to realize the functions of the above-described embodiments according to the present invention. Programs or the information handled by the programs are temporarily stored in a volatile memory such as a Random Access Memory (RAM), a non-volatile memory such as a flash memory, a Hard Disk Drive (HDD), or any other storage device system.
Note that a program for realizing the functions of the embodiments according to the present invention may be recorded in a computer-readable recording medium. This configuration may be realized by causing a computer system to read the program recorded on the recording medium for execution. It is assumed that the “computer system” refers to a computer system built into the apparatuses, and the computer system includes an operating system and hardware components such as a peripheral device. Furthermore, the “computer-readable recording medium” may be any of a semiconductor recording medium, an optical recording medium, a magnetic recording medium, a medium dynamically retaining the program for a short time, or any other computer readable recording medium.
Furthermore, each functional block or various characteristics of the apparatuses used in the above-described embodiments may be implemented or performed on an electric circuit, for example, an integrated circuit or multiple integrated circuits. An electric circuit designed to perform the functions described in the present specification may include a general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic devices, discrete gates or transistor logic, discrete hardware components, or a combination thereof. The general-purpose processor may be a microprocessor, or may instead be a processor of a known type, a controller, a micro-controller, or a state machine. The above-mentioned electric circuit may include a digital circuit or may include an analog circuit. Furthermore, in a case that, with advances in semiconductor technology, a circuit integration technology that replaces present integrated circuits appears, one or more aspects of the present invention can use a new integrated circuit based on that technology.
Note that the invention of the present application is not limited to the above-described embodiments. In the embodiments, apparatuses have been described as an example, but the invention of the present application is not limited to these apparatuses and is applicable to a terminal apparatus or a communication apparatus, and to fixed-type or stationary-type electronic apparatuses installed indoors or outdoors, for example, AV apparatuses, office equipment, vending machines, and other household apparatuses.
The embodiments of the present invention have been described in detail above referring to the drawings, but the specific configuration is not limited to the embodiments and includes, for example, an amendment to a design that falls within the scope that does not depart from the gist of the present invention. Various modifications are possible within the scope of the present invention defined by claims, and embodiments that are made by suitably combining technical means disclosed according to the different embodiments are also included in the technical scope of the present invention. Furthermore, a configuration in which constituent elements, described in the respective embodiments and having mutually the same effects, are substituted for one another is also included in the technical scope of the present invention.
The present invention is applicable to a video display apparatus and a video processing apparatus.
Priority application: JP 2018-170471, filed Sep. 12, 2018, Japan (national).
International filing: PCT/JP2019/035160, filed Sep. 6, 2019 (WO).