The present invention relates to an image data output device that outputs an image used for display, a content creation device that creates content using the image, a content reproduction device that displays the image or content based on it, and the image data output method, content creation method, and content reproduction method carried out by the respective devices.
Cameras that can photograph an omnidirectional (360°) image, or an extremely wide-angle image close to it, through a fisheye lens or the like have become commonplace. When an omnidirectional image photographed by such a camera is employed as the display target and can be viewed with a freely chosen point of view and line of sight through a head-mounted display or cursor operation, a highly immersive image experience and the presentation of the state of various places become possible.
The wider the angle of view of the image used for display, the more dynamic the image representation becomes, but the size of the data to be handled also increases. Particularly for moving images, the required resources increase in every phase: transmission of the photographed image, recording of data, creation of content, and reproduction. Consequently, in environments where resources are insufficient, the image quality at display time may degrade, or the display may fail to follow changes in the point of view or line of sight.
Further, to acquire a photographed image whose angle of view exceeds what a single camera can cover, images photographed by plural cameras with different angles of view need to be connected. Techniques for automatically connecting photographed images based on the positional relation among the cameras, their individual angles of view, and so forth are known. However, when images connected in this manner are viewed with a freely changing line of sight, image distortion or discontinuity at a joint may become visually recognizable upon zooming in.
The present invention has been made in view of such problems, and an object thereof is to provide a technique for displaying a high-quality image by using an omnidirectional (360°) panoramic photographed image.
A certain aspect of the present invention relates to an image data output device. The image data output device outputs data of an image used for display and includes a partial image acquiring unit that acquires a plurality of partial images that constitute the image, an output image generating unit that decides a connection position of the partial images and generates data of an image to be output from them, a map generating unit that generates map data that indicates the connection position, and a data output unit that associates the data of the image to be output with the map data and outputs both.
Another aspect of the present invention relates to a content creation device. The content creation device includes a data acquiring unit that acquires data of an image obtained by connecting a plurality of partial images and map data that indicates a joint of the partial images, a content generating unit that refers to the map data, corrects the image at the joint, and employs the corrected image as data of content, and a data output unit that outputs the data of the content.
Still another aspect of the present invention relates to a content reproduction device. The content reproduction device includes a data acquiring unit that acquires data of a plurality of partial images that constitute an image used for display and map data that indicates a connection position of the partial images, a display image generating unit that refers to the map data and connects the partial images in a region corresponding to a line of sight to generate a display image, and a data output unit that outputs the display image to a display device.
Yet another aspect of the present invention relates to an image data output method. The image data output method includes the following steps, carried out by an image data output device that outputs data of an image used for display: acquiring a plurality of partial images that constitute the image; deciding a connection position of the partial images and generating data of an image to be output from them; generating map data that indicates the connection position; and associating the data of the image to be output with the map data and outputting both.
A further aspect of the present invention relates to a content creation method. The content creation method includes the following steps, carried out by a content creation device: acquiring data of an image obtained by connecting a plurality of partial images and map data that indicates a joint of the partial images; referring to the map data, correcting the image at the joint, and employing the corrected image as data of content; and outputting the data of the content.
A still further aspect of the present invention relates to a content reproduction method. The content reproduction method includes the following steps, carried out by a content reproduction device: acquiring data of a plurality of partial images that constitute an image used for display and map data that indicates a connection position of the partial images; referring to the map data and connecting the partial images in a region corresponding to a line of sight to generate a display image; and outputting the display image to a display device.
Any combination of the above constituent elements, and any translation of the expressions of the present invention among a method, a device, a system, a computer program, a recording medium in which a computer program is recorded, and so forth, are also effective as aspects of the present invention.
According to the present invention, a high-quality image can be displayed by using a wide-angle photographed image.
To the content creation device 18, a display device 16a and an input device 14a used by a content creator to create content may be connected. To the content reproduction device 20, in addition to a display device 16b with which a content viewer views an image, an input device 14b for operating the content or the displayed contents may be connected.
The image data output device 10, the content creation device 18, and the content reproduction device 20 communicate through a wide area network such as the Internet or a local network such as a LAN (Local Area Network). Alternatively, at least one of the data provision from the image data output device 10 to the content creation device 18 and the content reproduction device 20, and the data provision from the content creation device 18 to the content reproduction device 20, may be carried out through a recording medium.
The image data output device 10 and the imaging device 12 may be connected by a cable or wirelessly through a wireless LAN or the like. Likewise, the content creation device 18 may be connected to the display device 16a and the input device 14a, and the content reproduction device 20 to the display device 16b and the input device 14b, in either a wired or wireless manner. Alternatively, two or more of these devices may be integrally formed. For example, the imaging device 12 and the image data output device 10 may be combined into a single imaging device or piece of electronic equipment.
The display device 16b that displays an image reproduced by the content reproduction device 20 is not limited to a flat display and may be a wearable display such as a head-mounted display, a projector, or the like. The content reproduction device 20, the display device 16b, and the input device 14b may also be combined into a single display device or information processing device. Thus, the outward shapes and connection forms of the various devices illustrated in the diagram are not limited. Further, in the case in which the content reproduction device 20 directly processes an original image from the image data output device 10 to generate a display image, the content creation device 18 does not have to be included in the system.
The imaging device 12 has plural cameras including plural lenses 13a, 13b, 13c, 13d, 13e . . . and imaging sensors, such as CMOS (Complementary Metal Oxide Semiconductor) sensors, corresponding to the respective lenses. Each camera photographs an image with an assigned angle of view. The mechanism that outputs the image obtained by light collection through each lens as a two-dimensional luminance distribution is the same as in a general camera. The photographed image may be either a still image or a moving image.
The image data output device 10 acquires the data of the photographed images output by the respective cameras and connects them to generate data of one original image. Here, the "original image" is the base image, of which a part is displayed, or which is displayed after being processed in some cases. For example, in the case in which an omnidirectional image is prepared and part of it is displayed on the screen of a head-mounted display with a field of view corresponding to the line of sight of the viewer, this omnidirectional image is the original image.
In this case, for example, by introducing the imaging device 12 having four cameras whose optical axes are set at intervals of 90° in the horizontal direction and two cameras whose optical axes are oriented vertically upward and downward, images with angles of view obtained by dividing all orientations into six are photographed. Then, as in image data 22 in the diagram, the original image is generated by disposing each photographed image in the region corresponding to the angle of view of its camera and connecting the photographed images in an image plane whose horizontal direction represents orientations of 360° and whose vertical direction represents orientations of 180°. In the diagram, the images photographed by the six cameras are labeled "cam1" to "cam6," individually.
The format of the image data 22 illustrated in the diagram is called equirectangular projection and is a general format for representing an omnidirectional image on a two-dimensional plane. However, the above description is not intended to limit the number of cameras or the data format, and the angle of view of the connected image is also not particularly limited. Further, in practice the joint between images is generally decided in consideration of the shape of any image appearing in its vicinity, and so forth, so the joint does not necessarily become a straight line as illustrated in the diagram. The image data 22 is subjected to compression encoding in a general format and thereafter provided to the content creation device 18 through a network or recording medium.
In the case in which the imaging device 12 photographs a moving image, the image data output device 10 sequentially generates and outputs the image data 22 as an image frame at each time step. The content creation device 18 generates content using the image data 22. The creation of content executed here may be carried out entirely by the content creation device 18 on the basis of a program prepared in advance or the like, or at least part of the processing may be carried out manually by a content creator.
For example, the content creator causes the display device 16a to display at least part of the image represented by the image data 22 and, by using the input device 14a, decides a region used for the content and associates the image with a reproduction program or electronic game. Methods for displaying image data represented by the equirectangular projection in various forms are well known. Alternatively, a moving image of the image data 22 may be edited with a general moving image editing application. The content creation device 18 itself may execute similar processing in accordance with a program created in advance or the like.
That is, the contents and purpose of the content created by the content creation device 18 are not limited as long as the image data 22 is used. Data of the content created in this manner is provided to the content reproduction device 20 through a network or recording medium. The image data included in the content may have a configuration similar to that of the image data 22 or may differ in data format or angle of view, and may be the result of executing some kind of processing on an image.
The content reproduction device 20 causes the display device 16b to display an image of the content by processing the information provided as the content according to operation of the input device 14b by the content viewer, and so forth. Depending on the content, the point of view and the line of sight with respect to the display image may be changed according to the viewer's operation of the input device 14b, or may be prescribed on the content side.
As one example, the content reproduction device 20 maps the image data 22 onto the inner surface of a celestial sphere centered at the content viewer wearing a head-mounted display and causes the image of the region toward which the viewer's face is oriented to be displayed on the screen of the head-mounted display. In this way, whichever direction the content viewer faces, the viewer sees the image world with the corresponding field of view and obtains a feeling of having entered that world.
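The following is a minimal sketch, with assumed image dimensions and angle conventions, of how a line-of-sight direction selects a pixel of the equirectangular original image mapped onto the inner surface of the celestial sphere; the function name and the yaw/pitch convention are illustrative assumptions, not part of the embodiment as prescribed.

```python
# Convert a gaze direction to a pixel position in an equirectangular image.
def gaze_to_pixel(yaw_deg, pitch_deg, width=3840, height=1920):
    """yaw: 0 to 360 degrees around the vertical axis; pitch: -90 (down) to +90 (up)."""
    u = (yaw_deg % 360.0) / 360.0   # horizontal direction covers 360 degrees
    v = (90.0 - pitch_deg) / 180.0  # vertical direction covers 180 degrees
    return int(u * (width - 1)), int(v * (height - 1))

print(gaze_to_pixel(90.0, 0.0))  # a quarter turn along the horizon: (959, 959)
```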
Alternatively, a flat display may be employed as the display device 16b, and the content viewer may move a cursor displayed on it to see the landscape or the like in the direction of the movement destination. When an image does not need to be edited or associated with other information, the content reproduction device 20 may acquire the image data 22 directly from the image data output device 10 and cause the display device 16b to display the whole or part of it.
As above, the present embodiment is based on generating content and a display image by using the image data 22 obtained by connecting plural independently acquired images. Hereinafter, a photographed image before connection will be referred to as a "partial image." As described later, the partial image is not limited to a photographed image. Further, the region represented by a certain partial image may contain the region represented by another partial image; in this case, strictly speaking, the latter partial image is combined with or superimposed on the former. Hereinafter, such a case will also be referred to as "connection" in some cases.
Further, it suffices that the image data output device 10 at least decides at which position the partial images are connected. That is, depending on what kind of partial image is used, the actual connection processing may be executed by the image data output device 10 itself or by the content creation device 18 or the content reproduction device 20. In either case, the image data output device 10 generates map data that indicates the connection position of the partial images in the plane of the connected image, associates the map data with data of at least either the connected image or the partial images before connection, and outputs them. Here, the "connection position" may be the position of the connection boundary (joint) or the position of the region occupied by the partial image.
In the case of connecting partial images photographed with different angles of view as in the form described above, image distortion or discontinuity can occur at the joint.
For this reason, when content that permits enlargement of an image is created, the content creator desires to improve the quality of the content by correcting the image at the joint more rigorously. However, when the whole of a wide-angle image is viewed, such defects at the joint are difficult for a device to detect or for the creator to notice. Conversely, when the image is enlarged to the degree at which defects at the joint can be detected or visually recognized, the field of view narrows, the joint is likely to leave the field of view, and finding the place that should be corrected remains difficult.
Thus, if the image data output device 10 outputs map data that represents the connection position of the partial images as described above, the content creation device 18 side can enlarge the image with a focus on the joint, and the device or the content creator can carry out processing and correction efficiently and without omission. The map data can also be used for purposes other than such processing and correction of an image; specific examples will be described later.
The CPU 23 controls the whole of the image data output device 10 by executing an operating system stored in the storing unit 34. Further, the CPU 23 executes various programs that are read out from a removable recording medium and loaded into the main memory 26 or are downloaded through the communication unit 32. The GPU 24 has the functions of a geometry engine and a rendering processor, executes rendering processing in accordance with rendering commands from the CPU 23, and outputs the result to the output unit 36. The main memory 26 is configured by a RAM (Random Access Memory) and stores programs and data necessary for processing. The internal circuit configurations of the content creation device 18 and the content reproduction device 20 may be similar.
The image data output device 10 includes a partial image acquiring unit 50 that acquires data of partial images from the imaging device 12, an output image generating unit 52 that generates data of the image that should be output, such as an image obtained by connecting partial images or a partial image itself, a map generating unit 56 that generates map data relating to the connection, and a data output unit 54 that outputs the image data and the map data. The partial image acquiring unit 50 is implemented by the input unit 38, the CPU 23, the main memory 26, and so forth.
In the form in which the field of view of a camera constituting the imaging device 12 is changed as described later, the partial image acquiring unit 50 acquires data that indicates the angle of the optical axis of the camera together with the data of the photographed image. Further, in the case in which an image other than a photographed image, such as character information or a figure, is used as part of a partial image, the partial image acquiring unit 50 may internally generate this image in accordance with instruction input by a user, or the like.
The output image generating unit 52 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and generates the data of the image to be output, such as an image obtained by connecting the partial images.
Alternatively, the output image generating unit 52 may only decide the partial images that should be connected and their connection position, on the premise that the content creation device 18 or the content reproduction device 20 executes the connection processing of the partial images. The map generating unit 56 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and generates the map data that indicates the connection position of the partial images.
The data output unit 54 is implemented by the CPU 23, the main memory 26, the communication unit 32, and so forth, and outputs the data of the image to be output in association with the map data to the content creation device 18, the content reproduction device 20, or a recording medium.
The content creation device 18 includes a data acquiring unit 60 that acquires image data and map data, a content generating unit 62 that generates data of content by using the acquired data, and a data output unit 64 that outputs the data of the content. The data acquiring unit 60 is implemented by the communication unit 32, the CPU 23, the main memory 26, and so forth.
The content generating unit 62 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and generates data of content by using the acquired image data and map data. As described above, the content may be generated automatically in accordance with a program prepared in advance or the like, or at least part of the work may be carried out manually by the content creator.
In the latter case, the content generating unit 62 causes the display device 16a to display the image and accepts correction and editing of the image input by the content creator through the input device 14a. In either case, the content generating unit 62 refers to the map data and generates the image to be included in the content. For example, regarding a predetermined region including a joint in image data connected by the image data output device 10, the content generating unit 62 executes detection processing of image distortion or discontinuous parts and either instructs the content creator to correct them or executes predetermined correction processing by itself.
Alternatively, the content generating unit 62 connects partial images provided from the image data output device 10 based on the map data. Further, the content generating unit 62 may newly generate a partial image. For example, an image representing additional information (hereinafter referred to as an "additional image"), such as character information explaining a subject, subtitles, or a figure desired to be added to the image, may be generated based on instruction input by the content creator, or the like. In this case, similarly to the map generating unit 56 of the image data output device 10, the content generating unit 62 generates map data that represents the position at which the additional image is combined (connected) in the plane of the image used for display and employs the map data as part of the content data together with the data of the additional image.
The content generating unit 62 may also cause at least part of the partial images provided from the image data output device 10 to be included in the content data together with the map data, without connection. Further, in the case in which the image data output device 10 provides omnidirectional images photographed from plural points of view, the content generating unit 62 may acquire a three-dimensional model of the photographed place from these photographed images and the positional relation among the respective points of view and include the three-dimensional model in the content data. This technique is generally known as SfM (Structure from Motion). However, in the case in which processing of correcting discontinuity at a boundary, such as blending, has been executed for the connecting part of the partial images, it is difficult to estimate, as-is, the distance of a subject whose image appears at the connecting part. Thus, the content generating unit 62 may clip the partial images before the correction on the basis of the map data and carry out three-dimensional modeling of the subject for each of these partial images.
The data output unit 64 is implemented by the CPU 23, the main memory 26, the communication unit 32, and so forth, and outputs the generated content data to the content reproduction device 20 or a recording medium.
The content reproduction device 20 includes a data acquiring unit 70 that acquires image data and map data or data of content, a display image generating unit 72 that generates a display image by using the acquired data, and a data output unit 74 that outputs data of the display image. The data acquiring unit 70 is implemented by the communication unit 32, the CPU 23, the main memory 26, and so forth.
The display image generating unit 72 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and generates a display image by using the acquired data.
A general technique can be applied to displaying an image with the field of view corresponding to the point of view and the line of sight within a wide-angle image. The display image generating unit 72 may also complete the image serving as the basis of display by referring to the map data and connecting and updating the partial images in the region corresponding to the line of sight. Further, as described later, the display image generating unit 72 may execute processing such as noise addition for part of the display image or switch the partial image of the connection target in accordance with an instruction by the content viewer. The data output unit 74 is implemented by the CPU 23, the main memory 26, the output unit 36, and so forth, and outputs the display image to the display device.
The map data 80 is image data that represents the joints of such partial images by differences in pixel value. In this example, the pixel values of the regions of the partial images "cam1," "cam2," "cam3," "cam4," "cam5," and "cam6" are set to the 2-bit values "00," "01," "00," "01," "10," and "10," respectively. When the pixel values of adjacent partial images are made different in this way, it can be found that a joint exists wherever the pixel value changes. The number of bits of the pixel value and the assignment vary depending on the number and arrangement of the connected partial images.
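The following is a minimal sketch of such region-label map data, assuming a small equirectangular plane and the six-camera layout described above (four side cameras in a middle band, upward and downward cameras as caps); all sizes and the exact layout are illustrative assumptions. A joint is recovered wherever adjacent pixel values differ.

```python
import numpy as np

# Build a label map like the map data 80 on an assumed 90 x 180 plane.
H, W = 90, 180
label_map = np.zeros((H, W), dtype=np.uint8)
labels = {"cam1": 0b00, "cam2": 0b01, "cam3": 0b00, "cam4": 0b01,
          "cam5": 0b10, "cam6": 0b10}
band = slice(H // 4, 3 * H // 4)
for i, cam in enumerate(["cam1", "cam2", "cam3", "cam4"]):
    label_map[band, i * W // 4:(i + 1) * W // 4] = labels[cam]
label_map[: H // 4, :] = labels["cam5"]     # upward-facing camera
label_map[3 * H // 4:, :] = labels["cam6"]  # downward-facing camera

# A joint exists wherever horizontally or vertically adjacent labels differ.
joint = np.zeros((H, W), dtype=bool)
joint[:, 1:] |= label_map[:, 1:] != label_map[:, :-1]
joint[1:, :] |= label_map[1:, :] != label_map[:-1, :]
print("joint pixels:", int(joint.sum()))
```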
(b) illustrates, as an output target, the image data 22 and map data 82 that represents the lines of its joints. The image data 22 has the same configuration as in (a). The map data 82 is image data that represents the joint lines themselves; for example, it is created as a 1-bit black-and-white image in which the pixels representing these lines have the value 1 and all other pixels have the value 0, or the like. The line that represents a joint may have a width corresponding to a predetermined number of pixels including the actual joint, or a width of one pixel in contact with the inside or outside of the joint.
Alternatively, the part corresponding to these lines may be highlighted in the image data 22 itself, on the premise that the image is corrected by the content creation device 18 or the like. For example, the pixel values may be increased by a predetermined ratio, or the pixel colors may be replaced by another color. Alternatively, to allow discrimination of each region of the map data, a semitransparent filled region may be superimposed and the resulting image output. Further, in the case in which the joints are straight lines as illustrated in the diagram, the coordinates of the intersections of the straight lines may be output instead of the map data.
The content creation device 18 that has acquired such data enlarges, in the image data 22, the part across which the pixel value differs in the map data 80 or the part whose pixel value differs from the surroundings in the map data 82. It then detects image distortion and discontinuity and carries out processing and correction by an existing filtering technique such as smoothing. Alternatively, the display device 16a is caused to display the enlarged image so that the content creator can carry out the processing and correction. This enlargement and correction is repeated for all regions that may possibly be displayed as content, so a high-quality image can be generated efficiently and without omission. The content reproduction device 20 may execute similar processing for the display region.
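One possible concretization of this correction step is sketched below, assuming the 1-bit map data 82 arrives as a boolean mask of joint pixels and substituting a simple 3x3 box filter for the "existing filtering technique such as smoothing" mentioned above; only a band of pixels around the joints is smoothed.

```python
import numpy as np

def correct_joints(image, joint_mask, radius=2):
    """image: (H, W, 3) array; joint_mask: (H, W) boolean joint-line mask."""
    h, w = joint_mask.shape
    out = image.astype(np.float32)
    band = joint_mask.copy()
    for _ in range(radius):                 # dilate the mask into a band
        grown = band.copy()
        grown[1:, :] |= band[:-1, :]
        grown[:-1, :] |= band[1:, :]
        grown[:, 1:] |= band[:, :-1]
        grown[:, :-1] |= band[:, 1:]
        band = grown
    padded = np.pad(out, ((1, 1), (1, 1), (0, 0)), mode="edge")
    blurred = sum(padded[dy:dy + h, dx:dx + w]      # 3x3 box filter
                  for dy in range(3) for dx in range(3)) / 9.0
    out[band] = blurred[band]               # correct only near the joints
    return out.astype(image.dtype)
```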
In the example illustrated in the diagram, in image data 84 obtained by connecting images photographed by six cameras similarly to the image data 22 described above, some regions are treated as moving images and the others as still images.
In the map data 88 in the example illustrated in the diagram, the pixel value of the regions representing moving images in the image plane is set to "0," and the pixel value of the regions representing still images is set to "1." However, the information representing the joints, explained above, may also be combined into the same map data.
The output image generating unit 52 of the image data output device 10 identifies partial images that may be treated as still images by taking the inter-frame difference of each moving image photographed by each camera of the imaging device 12. For example, a region corresponding to a moving image in which the total inter-frame difference in pixel value over the whole moving image is equal to or smaller than a predetermined value is treated as a still image. In the case in which the composition is fixed to a certain degree, such as when a subject moves only in part of an otherwise motionless indoor space or a vast space, the regions treated as moving images and the regions treated as still images may be set in advance. Then, part of the partial images acquired as moving images is replaced by still images.
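A minimal sketch of this discrimination follows, assuming each camera's clip arrives as an array of frames; a clip is treated as a still image when the total inter-frame pixel difference over the whole clip is equal to or smaller than a threshold, and the threshold value is an assumption for illustration.

```python
import numpy as np

def is_still(frames, threshold=1000.0):
    """frames: array of shape (num_frames, H, W) for one camera's moving image."""
    total_diff = 0.0
    for prev, cur in zip(frames[:-1], frames[1:]):
        total_diff += np.abs(cur.astype(np.int32) - prev.astype(np.int32)).sum()
    return total_diff <= threshold

rng = np.random.default_rng(0)
still_clip = np.repeat(rng.integers(0, 256, (1, 8, 8), dtype=np.uint8), 5, axis=0)
moving_clip = rng.integers(0, 256, (5, 8, 8), dtype=np.uint8)
print(is_still(still_clip), is_still(moving_clip))  # True False
```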
The content creation device 18 or the content reproduction device 20 refers to the map data 88 and sequentially replaces the images in the moving-image regions of the image data 86 of clock time t0 with the moving-image frames of the subsequent clock times t1, t2, t3, . . . . This generates data of a moving image that combines the still images and the moving images. The content creation device 18 employs the whole or part of such a moving image as image data of content, and the content reproduction device 20 causes the display device 16b to display the whole or part of it.
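The following is a minimal sketch of this reproduction-side combining, assuming the map data 88 marks moving regions with the pixel value 0 as described above; shapes and the generator interface are assumptions for illustration.

```python
import numpy as np

def play(base_t0, map_data, moving_frames):
    """base_t0: (H, W) whole image at t0; map_data: (H, W), 0 = moving region;
    moving_frames: iterable of (H, W) frames carrying new data for t1, t2, ..."""
    current = base_t0.copy()
    mask = map_data == 0
    for frame in moving_frames:
        current[mask] = frame[mask]   # update only the moving regions
        yield current
```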
In the case in which the display device 16b doubles as the image data output device 10, such as a head-mounted display including the imaging device 12 and the image data output device 10, the partial images that may be treated as still images may be saved in a memory inside the image data output device 10. In this case, only the data of the moving-image regions is transmitted from the image data output device 10 to the content creation device 18 or the content reproduction device 20, where the necessary processing is executed, and the image data output device 10 combines the result with the still images immediately before display. This can suppress the amount of data that needs to be transmitted.
In the case in which information on the joints is included in the map data as described above, the content creation device 18 may correct the joints of the partial images to make them less likely to be visually recognized, as described earlier.
The regions of moving images are clearly indicated by the map data 88, and the other regions are replaced by still images. This suppresses the data size even for a wide-angle image such as an omnidirectional image and saves transmission band and storage area. Further, since only a partial region needs to be updated in the content creation device 18 or the content reproduction device 20, the processing load is reduced, and it also becomes possible to enhance the resolution of the output image to a certain degree. As a result, a high-resolution moving image can be viewed without delay even when it is a wide-angle image.
Here, suppose that the regions involving motion move from the heavy-line frames in the image data 92a to the heavy-line frames in image data 92b. The regions involving motion can be detected based on the inter-frame difference of each moving image constituting the partial images, as described above. In this case, the image data output device 10 outputs the image data 92b of the whole image plane, which includes the latest frames of the partial images of the movement-destination regions, i.e., the frames of clock time t4, new map data 94b for discrimination between the moving-image regions and the still-image regions, and the image data 96d, 96e, 96f, . . . of the moving-image regions at the subsequent clock times t5, t6, t7, . . . .
However, the regions that change in the period from clock time t3 to clock time t4 are precisely the moving-image regions in the plane of the image data 92b. Therefore, depending on the case, only the frames of the partial images of clock time t4 may be output without outputting the image data 92b. Further, the size of the moving-image region may change. Operation of the content creation device 18 and the content reproduction device 20 is basically the same as described above.
In this example, to a partial region "cam2" of an overall region "cam1" that is photographed by a wide-angle camera and used for display in image data 100, an image photographed with a narrower angle of view and a higher resolution is connected. In this case, the pieces of data output by the image data output device 10 are image data 102 photographed by the wide-angle camera, image data 104 photographed by a camera with a narrow angle of view and a high resolution, and map data 106 for discrimination between the regions of both. The wide-angle image and the narrow-angle image may both be moving images or still images, or either one may be a still image and the other a moving image.
In the example illustrated in the diagram, in the map data 106, the pixel value of the wide-angle image's region in the image plane is set to "0," and the pixel value of the narrow-angle image's region is set to "1." The region represented with the high resolution may be a single region as illustrated, or plural regions photographed by plural cameras; in the latter case, information identifying the image associated with each region may be incorporated as the pixel value of the map data 106. Further, the region represented with the high resolution may be fixed or variable.
Further, in the case of connecting partial images as described above, information on the joints thereof may also be included in the map data.
The content creation device 18 or the content reproduction device 20 refers to the map data 106 and connects the image data 104 to the region that should be represented with the high resolution in the wide-angle image data 102. In this case, processing of replacing the low-resolution image in the relevant region of the image data 102 with the high-resolution image of the image data 104 is executed. In this way, while image display with a wide field of view is permitted, a region with a high possibility of attracting the viewer's gaze can be represented in detail at the high resolution.
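A minimal sketch of this replacement follows, assuming the marked region is rectangular, that the resolution ratio is an integer scale factor, and that the low-resolution wide-angle image is first upsampled by nearest-neighbor repetition; all names and parameters are illustrative assumptions.

```python
import numpy as np

def combine_resolutions(wide_low, narrow_high, region_map, scale=4):
    """wide_low: (h, w) low-resolution wide-angle image; region_map: (h, w) map
    data with 1 where the narrow-angle image belongs; narrow_high: pixels for
    that region at scale times the resolution."""
    # Upsample the wide image to the high-resolution grid (nearest neighbor).
    canvas = np.repeat(np.repeat(wide_low, scale, axis=0), scale, axis=1)
    ys, xs = np.nonzero(region_map == 1)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs.min(), xs.max() + 1
    # Replace the marked region with the high-resolution narrow-angle pixels.
    canvas[y0 * scale:y1 * scale, x0 * scale:x1 * scale] = narrow_high
    return canvas
```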
The content creation device 18 employs the whole or part of such an image as image data of content, and the content reproduction device 20 causes the display device 16b to display the whole or part of it. In the case in which information on the joints is included in the map data as described above, the content creation device 18 may correct the joints of the partial images to make them less likely to be visually recognized, as described earlier.
In the case in which the high-resolution region is made variable, in response to pan operation of the camera 112 for the high-resolution image, the angle-of-view measurement part 114 measures its angle and supplies the angle to the image data output device 10 together with the data of the photographed image. The orientation of the camera 110 for the wide-angle image is fixed. For example, in the case in which the camera 112 for the high-resolution image photographs with its optical axis set to 180° in the horizontal direction and 90° in the vertical direction of the omnidirectional image photographed by the camera 110 for the wide-angle image, the narrow-angle image data 104 is associated with the very center of the wide-angle image data 102.
Taking this state as the reference, the image data output device 10 identifies the region to which the narrow-angle image data 104 should be connected in the plane of the wide-angle image data 102 on the basis of the angle change in the pan direction of the camera 112 for the high-resolution image, and generates the map data 106. That is, when the camera 112 for the high-resolution image is made to pan, the map data 106 also becomes a moving image together with the narrow-angle image data 104. Further, in the case in which the image data 102 is also a moving image, the image data output device 10 outputs the three pieces of data illustrated in the diagram at each time step of the moving image. The pan operation itself may be carried out by the photographer according to the situation.
In such a form, it is desirable to form the imaging device 12 in such a manner that, as illustrated in the bird's-eye view of (b), the rotation center o of the pan operation of the camera 112 for the high-resolution image, i.e., the fixed point of the variable optical axes l, l′, and l″, coincides with the optical center of the camera 110 for the wide-angle image. In this way, when the wide-angle image data 102 is represented by the equirectangular projection, the angle of the pan direction directly represents the horizontal position at which the narrow-angle image should be connected.
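This correspondence can be sketched as follows: with the pan rotation center coinciding with the wide-angle camera's optical center, the pan angle maps linearly to a horizontal pixel position of the equirectangular wide-angle image. The image width and the function name are assumptions for illustration.

```python
def pan_angle_to_column(pan_deg, image_width):
    """Returns the column at which the narrow-angle image should be centered."""
    return int((pan_deg % 360.0) / 360.0 * image_width)

print(pan_angle_to_column(180.0, 3840))  # optical axis at 180 degrees: column 1920
```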
For example, in the case of providing a moving image of a concert, the viewer can enjoy a feeling of presence when shown the state of the whole venue including the audience. However, the data size of the content becomes enormous if high-resolution data is employed for the whole moving image; the transmission band and the storage area become tight, and the load of decoding and other processing increases, possibly causing latency. If the whole is set to a low resolution instead, the image quality appears lower than that of a general moving image. Even if the high-resolution region is changed on the content reproduction device 20 side according to the viewer's line of sight, following the change in the line of sight may be difficult because of the processing load.
Thus, as described above, the whole is photographed with a low resolution, whereas a region to which the viewer is likely to pay attention, such as a main performer, is photographed with a narrow angle and a high resolution, and the map data 106 is generated and output on the premise that the two are combined later. In this way, the overall data size is suppressed, and content that allows an image with a feeling of presence to be viewed without delay can be implemented while the influence on the appearance is kept to a minimum.
An additional image may be connected instead of the narrow-angle, high-resolution image in the form described above.
In this example, as the additional image data 124, plural images in which explanatory sentences for the respective subjects are represented in different languages, such as English and Japanese, are prepared so as to be switchable. The contents represented by the additional information are not limited to explanatory sentences and may be any necessary character information, such as subtitles for the voices of persons appearing in a moving image. Further, the additional information is not limited to characters and may be a figure or image. The wide-angle image data 122 serving as the base may be either a still image or a moving image. In the map data 126 illustrated in the diagram, the region of the wide-angle image in the image plane is set to white, whereas the regions of the additional images are set to black; in actuality, the latter regions are given pixel values that indicate the identification information of the corresponding additional images in the additional image data 124. In the case of switching among plural languages, plural additional images are associated with one region.
Further, also in this form, in the case of connecting partial images as described above, information on the joints thereof may be included in the map data.
The content reproduction device 20 further displays a cursor 132 for specifying an additional image in the screen 128a. When the content viewer sets the cursor 132 on the additional image and makes a choice, for example by pressing an Enter button of the input device 14b, the content reproduction device 20 refers to the map data 126 again and replaces the additional image displayed there with the additional image of another language. In the example illustrated in the diagram, a Japanese sentence 130b is displayed. In the case in which three or more languages are prepared, a list from which the viewer can make a choice may additionally be displayed, or the language may be switched in order every time the Enter button is pressed. Further, the operation means for specifying an additional image and switching its language is not limited to the above; for example, switching by touching a touch panel disposed over the display screen, or the like, may be employed.
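A minimal sketch of this switching follows, assuming the pixel value of the map data 126 identifies an entry of the additional image data 124 and that each entry holds one variant per language; all names and the data layout are assumptions for illustration.

```python
# Hypothetical catalog: map value -> per-language additional images.
additional_images = {
    1: {"en": "explanation_en_1.png", "ja": "explanation_ja_1.png"},
    2: {"en": "subtitle_en_2.png", "ja": "subtitle_ja_2.png"},
}

def select_additional_image(map_value, language):
    """Looks up the additional image to combine for a region of the map data."""
    return additional_images[map_value][language]

# When the viewer places the cursor on a region and presses Enter, the
# reproduction device reads the region's map value and swaps the language.
print(select_additional_image(1, "en"))  # explanation_en_1.png
print(select_additional_image(1, "ja"))  # explanation_ja_1.png
```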
In a general display form in which the angle of view of the display corresponds with that of the original image given as the display target, the main subject exists around the center of the screen in many cases, so an explanatory sentence or subtitle rarely becomes an obstacle even when fixedly displayed at the lower part of the screen or the like. On the other hand, in a form in which a wide-angle image is viewed while the line of sight is freely changed, the position of the main subject relative to the screen varies widely. For this reason, when the display position of an explanatory sentence or subtitle is fixed, it may overlap the main subject and become difficult to see.
Further, in the case in which an explanatory sentence or the like is included in the original image data 122, further adding an image representing it in another language could make the character strings overlap and become unreadable. Thus, as described above, the additional image is combined at the position indicated by the map data, so that it can continue to be displayed at an appropriate position.
For example, as illustrated in a screen 128c in the diagram, even when the line of sight is moved from the screen 128b, an explanatory sentence 130c follows the motion and therefore does not obstruct other subjects; nor does it become unclear which subject the explanatory sentence 130c corresponds to. Meanwhile, switching to another language can easily be carried out by the viewer's operation. Since the additional information can take various forms as described above, the attribute to be switched is not limited to the language and may be the sentence itself or the color, shape, or the like of a figure. Displaying/non-displaying of the additional information may also be switched.
In such a form, the wide-angle images are the two images 140a and 140b, so the data size is double that of a single image. The data size could be suppressed by decimating the data and halving the size in the vertical or horizontal direction, but the quality of the display would lower due to the reduced resolution. Thus, the increase in data size is instead suppressed by generating one image in a pseudo manner by using distance information of the subject 140 or parallax information.
Specifically, between an image 142a photographed by the camera 12a of the left point of view and an image 142b photographed by the camera 12b of the right point of view, deviation attributable to the parallax arises in the position of the image of the same subject 140, as illustrated in the diagram. Thus, for example, only the image 142a is employed as the output target, and information representing the positional deviation of the image is output as additional data. At the time of display, the image in the output image 142a is displaced by the deviation amount to generate the image 142b in a pseudo manner, whereby images involving the same parallax can be displayed with a small data size.
The amount of positional deviation of the image of the same subject between the two images depends on the distance from the imaging surface to the subject. It is therefore conceivable to generate what is generally called a depth image, which has this distance as the pixel value, and output it together with the image 142a. A method is widely known in which the distance to a subject is acquired, based on the principle of triangulation, from the amount of deviation of a corresponding point in images photographed from points of view separated by a known interval, and a depth image is generated. The distance values obtained as the depth image may be associated with the RGB (Red, Green, Blue) channels of the image 142a to make 4-channel image data. Further, the amount of deviation itself may be output instead of the distance value.
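The triangulation relation underlying this conversion can be sketched as follows: with focal length f (in pixels) and baseline B (in meters) between the two points of view, the deviation amount (disparity) d of a corresponding point and the distance Z satisfy Z = f * B / d, so the depth image and the deviation amount are mutually convertible. The parameter values below are illustrative assumptions.

```python
def disparity_to_depth(d_pixels, f_pixels=1000.0, baseline_m=0.065):
    # Z = f * B / d (triangulation with parallel cameras).
    return f_pixels * baseline_m / d_pixels

def depth_to_disparity(z_meters, f_pixels=1000.0, baseline_m=0.065):
    # d = f * B / Z (the inverse relation).
    return f_pixels * baseline_m / z_meters

print(disparity_to_depth(13.0))  # 5.0 (meters)
print(depth_to_disparity(5.0))   # 13.0 (pixels)
```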
On the other hand, the image obtained by shifting the image of the subject in the image 142a often does not sufficiently reproduce the image 142b actually photographed by the other camera 12b. For example, as illustrated in the diagram, in the case in which light from a light source 144 is reflected and a specularly reflected component with high angle dependence is observed only at the point of view of the camera 12b, the luminance of a region 146 in the image 142b becomes high compared with the image 142a. Further, depending on the shape of the subject 140, a part visible only from the point of view of the camera 12b may exist, causing occlusion in the image 142a. In stereoscopic viewing, not only the parallax but also such differences in appearance between the left and right images greatly affect the feeling of presence.
Thus, similarly to the narrow-angle, high-resolution image described above, a region in which such a difference arises is clipped from the actually photographed image as a partial image and output together with map data that represents its position.
The image data output device 10a includes a stereo image acquiring unit 150 that acquires data of stereo images from the imaging device 12, a depth image generating unit 152 that generates a depth image from the stereo images, and a partial image acquiring unit 154 that acquires, as a partial image, the difference between an image obtained by shifting one of the stereo images by the parallax and the actually photographed image. The image data output device 10a also includes a map generating unit 156 that generates map data representing the region with which the partial image is combined and a data output unit 158 that outputs the image data of the one stereo image, the data of the depth image, the data of the partial image, and the map data.
The stereo image acquiring unit 150 is implemented by the input unit 38, the CPU 23, the main memory 26, and so forth, and acquires the data of the stereo images from the imaging device 12.
The depth image generating unit 152 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and generates the depth image from the stereo images.
Alternatively, only a camera of one point of view may be employed as the imaging device 12, and the depth image generating unit 152 may estimate the distance of the subject by deep learning based on the photographed image and generate the depth image. The partial image acquiring unit 154 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and acquires, as a partial image, the difference between the image obtained by shifting one of the stereo images by the parallax and the actually photographed image.
When the images 142a and 142b are defined as the first and second images, respectively, in the example described above, the partial image corresponds to a region, such as the region 146, that appears only in the second image.
The map generating unit 156 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and generates the map data that represents the region with which the partial image is combined.
The data output unit 158 is implemented by the CPU 23, the main memory 26, the communication unit 32, and so forth, and outputs the data of the first image, the data of the depth image, the data of the partial image, and the map data.
The content reproduction device 20a includes a data acquiring unit 162 that acquires data of the first image, data of the depth image, data of the partial image, and map data, a pseudo image generating unit 164 that generates a pseudo image of the second image on the basis of the depth image, a partial image combining unit 166 that combines the partial image with the pseudo image, and a data output unit 168 that outputs data of a display image. The data acquiring unit 162 is implemented by the communication unit 32, the CPU 23, the main memory 26, and so forth.
Further, in the case in which a joint exists in the first image, the data acquiring unit 162 may identify the joint with reference to the acquired map data and correct it as appropriate, similarly to the display image generating unit 72 described above.
The pseudo image generating unit 164 is implemented by the CPU 23, the GPU 24, the main memory 26, the input unit 38, and so forth, and generates the pseudo image of the second image from the first image on the basis of the depth image.
The partial image combining unit 166 is implemented by the CPU 23, the GPU 24, the main memory 26, and so forth, and refers to the map data to combine the partial image with the pseudo image.
The data output unit 168 is implemented by the CPU 23, the GPU 24, the main memory 26, the output unit 36, and so forth, and outputs the data of the display image to the display device.
The depth image generating unit 152 generates a depth image 172 by using the first image 170a and the second image 170b (S10). The example in the diagram schematically illustrates the depth image 172 in a format in which a pixel is represented with higher luminance as the distance from the imaging surface is shorter. The amount of deviation of an image between the stereo images and the distance of the subject are basically in an inversely proportional relation, so the two can be mutually converted. Subsequently, the partial image acquiring unit 154 shifts the images in the first image 170a on the basis of the depth image 172, or on the basis of the amount of image deviation due to the parallax identified when the depth image is acquired, and generates a pseudo image 174 of the second image (S12a, S12b).
Then, the partial image acquiring unit 154 generates a differential image 176 between the pseudo image 174 and the original second image 170b (S14a, S14b). Hardly any difference arises unless reflected light, occlusion, or the like peculiar to the point of view of the second image exists. In the case in which an image peculiar to a single point of view, like the region 146 described above, exists, a difference arises in a corresponding region 178 of the differential image 176.
The partial image acquiring unit 154 clips a region of a predetermined range including the region 178 in the second image 170b as a partial image 180 (S16a, S16b). Meanwhile, the map generating unit 156 generates map data in which a region 182 of the partial image, like the one illustrated by a dotted line in the differential image 176, is given a pixel value different from that of the other region.
As the region clipped as the partial image by the partial image acquiring unit 154, a region in which the amount of image deviation (parallax value) between the stereo images falls within a predetermined range from a subject desired to be highlighted in the stereoscopic video to be displayed, or a rectangular region including it, or the like, may be employed. This region may also be decided by using a known deep-learning technique such as semantic segmentation.
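A minimal sketch of steps S12 to S16 for grayscale images follows, assuming a per-pixel horizontal disparity map and a fixed difference threshold; hole filling and the exact region-selection rule described above are omitted for brevity.

```python
import numpy as np

def make_partial_image(first, second, disparity, threshold=16):
    """first, second: (h, w) images; disparity: (h, w) horizontal deviation."""
    h, w = first.shape
    pseudo = np.zeros_like(first)
    xs = np.arange(w)
    for y in range(h):
        # S12: displace each pixel of the first image to the position it
        # occupies at the second point of view (later writes win at
        # collisions; unwritten pixels remain 0 in this sketch).
        tx = np.clip(xs - disparity[y], 0, w - 1).astype(int)
        pseudo[y, tx] = first[y, xs]
    # S14: differential image between the pseudo image and the real second image.
    diff = np.abs(second.astype(np.int32) - pseudo.astype(np.int32))
    mask = diff > threshold   # region peculiar to the second point of view
    if not mask.any():
        return pseudo, None, mask
    # S16: clip the bounding rectangle of the differing region as the partial image.
    ys, xs2 = np.nonzero(mask)
    y0, y1, x0, x1 = ys.min(), ys.max() + 1, xs2.min(), xs2.max() + 1
    return pseudo, second[y0:y1, x0:x1], mask
```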
The data output unit 158 outputs the first image 170a of the stereo images, the depth image 172, the map data, and the data of the partial image 180 to the content reproduction device 20 or a recording medium. In the content reproduction device 20, the pseudo image generating unit 164 generates the pseudo image 174 by the processing of S12a and S12b illustrated in the diagram, and the partial image combining unit 166 refers to the map data and combines the partial image 180 with the relevant place to restore the second image 170b.
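The reproduction side can be sketched as follows, assuming the map data marks the partial image's rectangular region with nonzero pixels: the partial image 180 is pasted back onto the regenerated pseudo image at the marked position to restore the second image.

```python
import numpy as np

def restore_second_image(pseudo, partial, region_map):
    """pseudo: regenerated pseudo image 174; partial: partial image 180;
    region_map: map data whose nonzero pixels mark the partial image's region."""
    restored = pseudo.copy()
    ys, xs = np.nonzero(region_map)
    y0, x0 = ys.min(), xs.min()           # top-left corner of the marked region
    ph, pw = partial.shape[:2]
    restored[y0:y0 + ph, x0:x0 + pw] = partial
    return restored
```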
Even in total, the depth image 172, the map data, and the partial image 180 are remarkably small in size compared with the color data of the second image 170b, so the transmission band and the storage area can be saved. If the saved data capacity is allotted to the first image 170a so that it is output with its high resolution kept, high-quality stereoscopic video can be viewed with a free line of sight by using stereo images with a vast angle of view.
According to the present embodiment described above, in a technique in which images photographed by plural cameras with different angles of view are connected for display, as in an omnidirectional image, the provider of the image data outputs, together with the image data, map data that represents the positions at which the images are connected. For example, when the map data representing the connection places is output together with the connected image, image distortion and discontinuity that may occur due to the connection can be efficiently detected and corrected in the content creation device or the content reproduction device that has acquired the map data. Necessary correction can thereby be carried out with a light load and without omission, and high-quality content can easily be implemented.
Further, in the case of photographing and displaying a moving image, the regions other than a partial region involving motion are set to still images, and map data that discriminates between the moving-image and still-image regions is output together with an image of the whole region serving as the first frame. Thereafter, only the data of the partial moving image needs to be transmitted and processed, so the same moving image can be displayed more efficiently than when the moving image of the whole region is treated as the processing target. At this time, by intentionally executing noise processing for the still-image regions, the possibility of giving a sense of discomfort to the viewer is lowered.
Alternatively, an image with a wide angle and a low resolution and an image with a narrow angle and a high resolution are photographed, and map data indicating the region represented by the high-resolution image within the low-resolution whole image is output together with the image data of both, allowing the two images to be combined at the time of content creation or display. This suppresses the data size compared with outputting an image whose whole region has the high resolution, and displays an image of higher quality than one whose whole region has the low resolution. Alternatively, an additional image, such as an explanation of a subject or subtitles, is combined with a wide-angle image. By representing a position suitable for the combining as map data, the additional information can continue to be displayed at an appropriate position that does not obstruct the original image even when the line of sight is freely changed. Further, the additional information can be freely switched or set to a non-displayed state.
Further, in a technique in which stereoscopic viewing is implemented by causing the left and right eyes to see stereo images photographed from left and right points of view, the second image is made restorable by displacing the image of a subject in the first image of the stereo images by the parallax, which reduces the data size. At this time, data of a region of the second image in which occlusion or reflection not expressed by mere displacement of the image occurs is output in association with map data that represents the position of this region. Thereby, although the data of the second image is excluded from the output target, an image close to it can be restored, so stereoscopic video can be displayed without a sense of discomfort.
These forms solve the problems that become bottlenecks when an omnidirectional image is viewed while the line of sight is freely changed: defects at the joints, increase in data size, the position at which additional information is displayed, and so forth. As a result, dynamic image expression can be implemented without delay or quality deterioration, regardless of how many resources are available. Further, by associating the image data with the map data on the side that provides the image, adaptive processing becomes possible at any subsequent processing stage, and the flexibility of the display form is enhanced even for a photographed image.
The description above is based on embodiments of the present invention. The above-described embodiments are exemplifications, and it will be understood by those skilled in the art that various modifications are possible regarding combinations of the respective constituent elements and processing processes, and that such modifications also fall within the scope of the present invention.
1 Content processing system, 10 Image data output device, 12 Imaging device, 14a Input device, 16a Display device, 18 Content creation device, 20 Content reproduction device, 23 CPU, 24 GPU, 26 Main memory, 32 Communication unit, 34 Storing unit, 36 Output unit, 38 Input unit, 40 Recording medium drive unit, 50 Partial image acquiring unit, 52 Output image generating unit, 54 Data output unit, 56 Map generating unit, 60 Data acquiring unit, 62 Content generating unit, 64 Data output unit, 70 Data acquiring unit, 72 Display image generating unit, 74 Data output unit, 150 Stereo image acquiring unit, 152 Depth image generating unit, 154 Partial image acquiring unit, 156 Map generating unit, 158 Data output unit, 162 Data acquiring unit, 164 Pseudo image generating unit, 166 Partial image combining unit, 168 Data output unit.
As described above, the present invention can be used for various devices such as a game machine, an image processing device, an image data output device, a content creation device, a content reproduction device, an imaging device, and a head-mounted display, as well as a system including any of them.
Filing Document | Filing Date | Country | Kind
---|---|---|---
PCT/JP2018/036542 | 9/28/2018 | WO | 00