This application is based upon and claims the benefit of priority from Japanese Patent Application No. 2010-252372, filed on Nov. 10, 2010; the entire contents of which are incorporated herein by reference.
Embodiments described herein relate generally to an image processing apparatus, an image processing method, and a computer program product.
Conventionally, there are technologies for adding depth information to a two-dimensional image so that the two-dimensional image can be displayed three-dimensionally. According to one of these conventional technologies, for example, a composition ratio with respect to a depth model prepared in advance is calculated from the distribution of high-frequency components in the upper and lower parts of a two-dimensional image, and a rough depth of the whole image is obtained from the result of the calculation. Moreover, it has been proposed to correct the depth by superimposing the red color signal (R signal) of the two-dimensional image onto the rough depth.
According to one embodiment, an image processing apparatus includes a detecting unit configured to detect at least one object included in an image; a selecting unit configured to select a depth model to be a base of depth information about the object in accordance with a property of the object; a segment unit configured to segment an area of the object detected from the image; and a depth map creating unit configured to create a depth map representing a depth of the image. The depth map creating unit arranges the depth model at a position on the depth map corresponding to the position of the segmented area of the object in the image, compares the area of the depth model with the area of the object, and gives a corrected depth value to a position at which the two areas are not superimposed on each other.
Various embodiments will be described hereinafter with reference to the accompanying drawings.
First of all, an image processing apparatus, an image processing method, and a computer program product thereof according to a first embodiment are explained below in detail with reference to the drawings. Explanations described below assume the following items (1) to (4). However, the present disclosure is not limited by these items.
(1) It is assumed that the upper left corner of the image is the origin, the traverse direction (horizontal direction) is the x axis, and the longitudinal direction (vertical direction) is the y axis. However, the coordinate system to be set for an image is not limited to this. A pixel value at coordinates (x, y) in the image is expressed as P(x, y). The pixel value P may indicate the brightness or a color component of a pixel; for example, brightness, lightness, or a specific color channel is applicable as the pixel value P.
(2) The depth map is data that represents the depth of each pixel in an image. The depth map has its origin at the upper left corner of the map, the x axis in the traverse direction (horizontal direction), and the y axis in the longitudinal direction (vertical direction). However, the coordinate system to be set for the depth map is not limited to this. A pixel value at coordinates (X, Y) on the depth map is expressed as Z(X, Y). The pixel value Z is information indicating the depth of each pixel (depth information). For example, the larger the pixel value Z, the farther away the pixel is.
(3) Coordinates on an image correspond one to one to coordinates on a depth map. According to the present disclosure, unless otherwise specifically described, the size of an image is equal to the size of a depth map, and coordinates (x, y) on an image correspond to coordinates (X, Y) on a depth map. In other words, x=X and y=Y hold.
(4) Unless otherwise specifically described in the present disclosure, the pixel value P on an image is referred to as the “pixel value”, and its range is [0, 255] (between 0 and 255). Furthermore, the pixel value Z on a depth map is referred to as the “depth value”, and its range is [0, 255] (between 0 and 255).
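The conventions in items (1) to (4) can be illustrated by the following minimal sketch, assuming the image and the depth map are held as NumPy arrays of the same size; the array names and the helper function are illustrative assumptions and not part of the embodiments.

```python
# Minimal sketch of the conventions in items (1) to (4); illustrative only.
import numpy as np

height, width = 480, 640  # example size; the image and the depth map share it

# P(x, y): pixel value of the input image, range [0, 255].
# Arrays are indexed as [y, x], so P(x, y) corresponds to image[y, x].
image = np.zeros((height, width), dtype=np.uint8)

# Z(X, Y): depth value of the depth map, range [0, 255],
# with x == X and y == Y (one-to-one correspondence).
depth_map = np.zeros((height, width), dtype=np.uint8)

def depth_at(depth_map, x, y):
    """Return Z(X, Y); a larger value indicates a farther position."""
    return int(depth_map[y, x])
```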
Next, an image processing apparatus 1 according to the first embodiment is explained below in detail with reference to the drawings.
A two-dimensional image (hereinafter, “input image”) is input into the image processing apparatus 1. Any device or medium can serve as the input source of the input image. For example, the input data may be read from a recording medium, such as a Hard Disk Drive (HDD), a Digital Versatile Disk Read-Only Memory (DVD-ROM), or a flash memory, or may be input from an external device connected via a network, such as a video recorder, a digital camera, or a digital video camera. Furthermore, image data may be input into the image processing apparatus 1 from a receiver that receives television broadcasting via wireless or wired communication.
Furthermore, the input image 100 is not necessarily limited to a two-dimensional (2D) image. For example, the input image 100 can be a stereoscopic image in a side-by-side or line-by-line format, or an image in a multiple-viewpoint format. In such a case, an image of one of the viewpoints is treated as the image to be processed.
The base depth input unit 11 receives a base depth, which is a map of the same size as the input image in which a depth value Z is set for each pixel. The base depth is, for example, data of a three-dimensional spatial structure having depth, and the depth information included in the base depth is expressed, for example, as a numeric value per pixel (depth value Z). Such a base depth can be used as basic depth data when creating the depth map for the input image. The base depth can be stored in the base depth storage unit 16. Preferably, the base depth storage unit 16 stores one or more patterns of base depths in advance as templates. The base depth input unit 11 specifies a template of the base depth appropriate to the input image by analyzing the input image, and acquires the template from the base depth storage unit 16.
Specification of the base depth can be performed based on a spatial structure that is specified or estimated from the input image. According to this specifying method, the spatial structure of the input image is specified or estimated from, for example, an area of the ground or the floor, or an area of the sky or the ceiling, in the input image, and the base depth appropriate to that spatial structure is then specified from the base depth storage unit 16. However, the base depth is not limited to this specifying method and can be acquired by using various other methods.
The base depth is not limited to this; a base depth whose depth value Z is uniform over the whole map can also be used. In such a case, the depth value Z to be set can be variously modified, for example, to a depth value Z indicating the farthest position, or to a depth value Z created at random provided that it is larger than the maximum of the depth values Z of the pixels in a corrected depth map described later (see the section (g) in the corresponding figure).
The detecting unit 12 detects at least one object in the input image. Through the detection of the object, the type of the object can be detected in addition to the position and the area (shape, size, and the like) of the object. A generally known method can be used for the detection of the object. Among existing detection methods, there is, for example, a method of classifying the object in the input image by using a classifier for object detection. However, the detection is not limited to this, and various detection methods are applicable. The detecting unit 12 can also detect segmented object areas in which the object is divided into a plurality of areas; for example, a method of segmenting the object into a plurality of partial areas is conceivable.
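As a non-limiting illustration of the detecting unit 12, the following sketch shows the kind of object information (position, area, type) the later steps rely on; the `DetectedObject` fields and the `detect_objects` wrapper are hypothetical placeholders, since the embodiment only assumes that some generally known classifier-based detector is available.

```python
# Hypothetical interface for the detecting unit 12 (sketch only).
from dataclasses import dataclass
from typing import List

@dataclass
class DetectedObject:
    kind: str      # type of the object, e.g. "person"
    x: int         # reference coordinates (upper-left corner of the detected area)
    y: int
    width: int     # size of the detected area
    height: int

def detect_objects(image) -> List[DetectedObject]:
    """Run any generally known object detector (e.g. a face or person classifier)
    and return the object information used by the selecting and segment units."""
    results: List[DetectedObject] = []
    # ... invoke a classifier for object detection here ...
    return results
```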
The selecting unit 13 selects at least one depth model corresponding to the object detected by the detecting unit 12 (hereinafter, “detected object”) from a group of depth models that is an aggregation of a plurality of depth models. A depth model is a model made in advance from depth information about an object. In a depth model, for example, the stereoscopic shape of an object, such as a person, an animal, a conveyance, a building, or a plant, viewed from one direction is expressed as depth information. Moreover, the group of depth models includes depth models of various shapes for an individual object, in addition to depth models of various kinds of objects. The group of depth models is stored, for example, in the depth model storage unit 17.
The segment unit 14 segments the area of the detected object (hereinafter, “object area”) from the input image. For example, the segment unit 14 can segment the object area from the input image by setting a flag on the pixels belonging to the object area. The depth map creating unit 15 creates a depth map indicating depth information about the input image from the base depth, the depth model, and the object area. An example of the depth map creating unit 15 is shown in the corresponding figure.
As shown in the figure, the depth map creating unit 15 includes a depth model correcting unit 151 and a depth map compositing unit 152.
A flow of an image processing method to be executed by the image processing apparatus 1 is explained below in detail with reference to the drawings.
As shown in the corresponding flowchart, first, the input image 100 is input into the image processing apparatus 1 (Step S101).
When the input image 100 is input, the base depth input unit 11 receives the base depth to be added to the input image 100 (Step S102). As the base depth to be received, the base depth whose depth structure is closest to the spatial structure estimated from, for example, a sky area or a ground area in the input image 100 can be selected from the plurality of templates stored in the base depth storage unit 16. An example of a base depth 140 received in this way is shown in a section (h) of the corresponding figure.
The detecting unit 12 detects object information indicating the property of the object 101 shown in the input image 100 by analyzing the input image 100 (Step S103). The object information is, for example, the position (for example, reference coordinates), the area (shape, size, and the like), the type, and the like, of the object 101. When the input image 100 includes a plurality of objects, object information is detected for each of the objects. A general method can be used for the detection of the object; for example, when the object 101 is a person, a method of face detection or person detection can be used. An example of the detection result is shown in a section (b) of the corresponding figure.
Object information about the object 101 is input into the selecting unit 13. The selecting unit 13 selects a depth model appropriate to the object 101 from a group of depth models in the depth model storage unit 17, based on the object information (e.g. the shape and the type) (Step S104).
A section (c) of the corresponding figure shows an example of a depth model 120 selected for the object 101 in this way.
The object information is also input into the segment unit 14, as described above. The segment unit 14 segments the area of the object 101 (object area) from the input image 100 based on the object information (Step S105). A general segmentation technology can be used for the segmentation of the object area. An example of a segmented object area 110 is shown in a section (d) of the corresponding figure.
The selected depth model 120 and the segmented object area 110 are input into the depth map creating unit 15. In the depth map creating unit 15, to begin with, the depth model correcting unit 151 superimposes the depth model 120 and the object area 110 that are input (Step S106). An example of the superimposed depth model 120 and object area 110 is shown in a section (e) of the corresponding figure.
Subsequently, as shown in a section (f) of the corresponding figure, the depth model correcting unit 151 gives a corrected depth value to each pixel at a position in the object area 110 that is not superimposed on the depth model 120. As the corrected depth value, any one of the following values (1) to (4) can be used, for example.
(1) A depth value Z of a pixel in the depth model 120 at a position nearest to the position of a pixel to be added (correction position)
(2) An average of depth values Z of n pixels (n is a natural number) in the depth model 120 and in the vicinity of the position nearest to the position of the pixel that is to be added (correction position)
(3) The maximum value of depth values Z of n pixels in the depth model 120 and in the vicinity of the position nearest to the position of the pixel that is to be added
(4) The maximum value of depth values Z of pixels in the depth model 120
By performing the correction of the depth model as described above, a corrected depth model 130 as shown in a section (g) of the corresponding figure can be created.
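The correction described above can be illustrated by the following Python/NumPy sketch, assuming the depth model and the object area are given as a depth array and binary masks on a common coordinate grid. The function name and the `rule` parameter are illustrative, and the neighborhood used for candidates (2) and (3) is approximated here by the n model pixels nearest to the correction position; this is a sketch under those assumptions, not the definitive implementation of the embodiment.

```python
import numpy as np

def correct_depth_model(model_depth, model_mask, object_mask, rule="nearest", n=8):
    """Give a corrected depth value to object-area pixels not covered by the model.

    model_depth : depth values Z of the selected depth model (2D array, [0, 255])
    model_mask  : True where the depth model has a depth value
    object_mask : True inside the segmented object area 110
    rule        : which of the candidate values (1) to (4) to use
    """
    corrected = model_depth.copy()
    ys, xs = np.nonzero(model_mask)
    model_points = np.stack([ys, xs], axis=1)

    # Positions of the object area that are not superimposed on the depth model.
    miss_ys, miss_xs = np.nonzero(object_mask & ~model_mask)

    for y, x in zip(miss_ys, miss_xs):
        dist = np.hypot(model_points[:, 0] - y, model_points[:, 1] - x)
        order = np.argsort(dist)
        if rule == "nearest":      # (1) Z of the nearest model pixel
            value = model_depth[tuple(model_points[order[0]])]
        elif rule == "mean_n":     # (2) average Z of n pixels near the nearest position
            idx = model_points[order[:n]]
            value = model_depth[idx[:, 0], idx[:, 1]].mean()
        elif rule == "max_n":      # (3) maximum Z of n pixels near the nearest position
            idx = model_points[order[:n]]
            value = model_depth[idx[:, 0], idx[:, 1]].max()
        else:                      # (4) maximum Z over the whole depth model
            value = model_depth[model_mask].max()
        corrected[y, x] = np.uint8(value)

    return corrected  # corrected depth model 130
```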
The depth map compositing unit 152 then creates a depth map 150 for the input image 100, as shown in a section (i) of the corresponding figure, by compositing the corrected depth model 130 onto the base depth 140 at the position corresponding to the object area 110.
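A corresponding sketch of the depth map compositing unit 152 follows; the compositing rule used here (overwriting the base depth with the corrected depth model at the object position) is an assumption for illustration.

```python
import numpy as np

def composite_depth_map(base_depth, corrected_model, corrected_mask, top_left):
    """Arrange the corrected depth model 130 on the base depth 140 (sketch).

    top_left : (y, x) position on the depth map corresponding to the object area.
    Assumes the model fits entirely inside the map.
    """
    depth_map = base_depth.copy()
    y0, x0 = top_left
    h, w = corrected_model.shape
    region = depth_map[y0:y0 + h, x0:x0 + w]          # view into the depth map
    region[corrected_mask] = corrected_model[corrected_mask]
    return depth_map                                   # depth map 150
```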
The depth map 150 created as described above is output from the depth map creating unit 15 to a given external device, such as a display device (Step S110). The image processing method of creating the depth map 150 for the input image 100 is thus finished.
As described above, according to the first embodiment, even when the shape of an object, such as a person, in a two-dimensional image differs from a depth model prepared in advance, a depth model that fits the object more precisely can be created. As a result, a structure (depth map) with more accurate depth can be created from a two-dimensional image.
Moreover, based on a depth created according to the first embodiment, an image observed from a viewpoint different from that of the input image 100 can be created by obtaining the disparity of each pixel in the input image 100. Therefore, multiple-viewpoint images observed from two or more viewpoints can be created from the input image 100 and displayed on a stereoscopic image display, thereby enabling stereoscopic vision. The image observed from another viewpoint can be created, for example, by a rendering technology.
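The view synthesis mentioned above can be sketched as a simple depth-image-based rendering that shifts pixels horizontally according to a disparity derived from the depth value Z; the linear depth-to-disparity mapping, the `max_disparity` parameter, and the lack of hole filling are illustrative simplifications, not the rendering technology of the embodiment.

```python
import numpy as np

def render_new_view(image, depth_map, max_disparity=16):
    """Create an image observed from a horizontally shifted viewpoint (sketch)."""
    height, width = depth_map.shape
    new_view = np.zeros_like(image)
    # Map depth [0, 255] to a pixel shift: nearer pixels (smaller Z) move more.
    disparity = (255 - depth_map.astype(np.float32)) / 255.0 * max_disparity
    disparity = disparity.astype(np.int32)
    for y in range(height):
        for x in range(width):
            new_x = x + disparity[y, x]
            if 0 <= new_x < width:
                new_view[y, new_x] = image[y, x]
    return new_view  # disocclusion holes would still need to be filled
```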
Modification 1 of Depth Map Creating Unit
In the first embodiment, the corrected depth model 130, which fits the object more precisely, is created by correcting the depth model 120 selected for the object 101 based on the object area 110. However, the embodiment is not limited to this; a similar effect can be obtained, for example, by correcting the depth of the object 101 based on the object area 110 after adding the depth model 120 to the object 101 in the input image 100. In such a case, the depth map creating unit 15 is configured as shown in the corresponding figure.
As shown in the figure, the depth map creating unit 15 according to the modification 1 includes a depth model compositing unit 153 and a depth map correcting unit 154.
A flow of an image processing method according to the modification 1 is then explained below in detail with reference to the drawings.
As shown in the corresponding flowchart, the processes up to the segmentation of the object area 110 (steps S101 to S105) are performed in the same manner as in the first embodiment.
According to the modification 1, the depth model compositing unit 153 of the depth map creating unit 15 then combines the base depth 140 and the depth model 120, based on the coordinate system of the base depth 140 and the reference coordinates (XF, YF) of the depth model 120 (Step S111). Accordingly, a pre-depth map 141 in which the depth model 120 is arranged on the base depth 140 is created, as shown in a section (f) of the corresponding figure.
The pre-depth map 141 is input together with the object area 110 into the depth map correcting unit 154. As shown in a section (g) of the corresponding figure, the depth map correcting unit 154 superimposes the object area 110 on the pre-depth map 141 and gives a corrected depth value to each pixel at a position in the object area 110 that is not superimposed on the depth model 120, in the same manner as in the first embodiment.
Subsequently, as shown in a section (h) of the corresponding figure, a depth map 150 for the input image 100 is obtained and output in the same manner as in the first embodiment.
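Under the same assumptions as the sketches for the first embodiment, the reordered pipeline of the modification 1 (composite first, then correct) could look as follows; `composite_depth_map` and `correct_depth_model` refer to the illustrative helpers above, and the masks are assumed to be expressed in depth-map coordinates.

```python
import numpy as np

def create_depth_map_modification1(base_depth, model_depth, model_mask,
                                   object_mask_on_map, top_left):
    """Modification 1: arrange the depth model first, then correct (sketch)."""
    # Step S111: combine the base depth 140 and the depth model 120
    # to obtain the pre-depth map 141.
    pre_depth_map = composite_depth_map(base_depth, model_depth, model_mask, top_left)

    # Express the model's coverage in depth-map coordinates.
    map_mask = np.zeros(base_depth.shape, dtype=bool)
    y0, x0 = top_left
    h, w = model_mask.shape
    map_mask[y0:y0 + h, x0:x0 + w] = model_mask

    # Correct the pre-depth map based on the object area 110, as in the
    # depth model correction of the first embodiment.
    return correct_depth_model(pre_depth_map, map_mask, object_mask_on_map)
```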
As described above, by correcting the depth model 120 based on the object area 110 after giving the depth model 120 to the object 101 in the input image 100, an effect similar to that of the first embodiment can be obtained. The other configurations, operations, and effects are similar to those according to the first embodiment; therefore, detailed explanations are omitted here.
Next, an image processing apparatus, an image processing method, and a computer program product thereof according to a second embodiment are explained below in detail with reference to the drawings. In the following description, configurations similar to the first embodiment and its modification are assigned with the same reference numerals, and repeated explanations of them are omitted.
An input image is input into the base depth creating unit 21. The base depth creating unit 21 creates a base depth from the input image. A known technology can be used to create the base depth; in particular, a technology that creates the base depth for the whole image by analyzing the composition of the image and the motions of objects in the image is preferable. Therefore, the base depth creating unit 21 estimates or specifies the spatial structure of the input image from, for example, an area of the ground or the floor in the input image (hereinafter, “ground area”), or an area of the sky or the ceiling (hereinafter, “sky area”), and creates the base depth based on the estimated spatial structure. A generally known method can be used to detect the ground area and the sky area; for example, there is a method of using a classifier for each kind of area. Moreover, another conceivable method is to detect two of the three kinds of areas, namely a three-dimensional object, the sky, and the ground, in the input image, and to determine the remaining area as the remaining kind. In such a case, when categorizing areas into four or more kinds, one kind is left and the other kinds are detected. The base depth created by the base depth creating unit 21 is input into the depth map creating unit 15 and used for the creation of the depth map, similarly to the first embodiment and its modification.
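As a rough illustration of the base depth creating unit 21, the sketch below builds a base depth from a detected sky area; how the sky and ground areas are detected is left outside the sketch, and the simple vertical-gradient spatial structure (sky at the farthest depth, ground receding toward the bottom of the image) is an assumption rather than the embodiment's definitive method.

```python
import numpy as np

def create_base_depth(sky_mask):
    """Create a base depth from a detected sky area (illustrative sketch).

    sky_mask : boolean array, True where the sky (or ceiling) area was detected.
    A detected ground area could be used to refine the ramp below.
    """
    height, width = sky_mask.shape
    base_depth = np.zeros((height, width), dtype=np.uint8)

    # Estimate the horizon as the first row no longer dominated by sky.
    sky_rows = sky_mask.mean(axis=1) > 0.5
    non_sky = ~sky_rows
    horizon = int(np.argmax(non_sky)) if non_sky.any() else height

    base_depth[:horizon, :] = 255                 # sky area: farthest depth
    ramp = np.linspace(255, 0, height - horizon)  # ground: far at the horizon, near at the bottom
    base_depth[horizon:, :] = ramp[:, None].astype(np.uint8)
    return base_depth
```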
Then, a flow of an image processing method according to the second embodiment is explained below in detail with reference to the drawings.
As shown in the corresponding flowchart, according to the second embodiment, the base depth creating unit 21 creates a base depth 240 from the input image 200 based on the estimated spatial structure, instead of receiving a base depth template; the other steps are performed in the same manner as in the first embodiment.
By configuring and operating in this way, according to the second embodiment, the base depth 240 appropriate to the spatial structure of the input image 200 is created. Accordingly, a depth structure closer to the actual depth structure of the input image 200 can be used. As a result, a structure (depth map) with more accurate depth can be created from a two-dimensional image. The other configurations, operations, and effects are similar to those according to the first embodiment and its modification; therefore, detailed explanations are omitted here.
Next, an image processing apparatus, an image processing method, and a computer program product thereof according to a third embodiment are explained below in detail with reference to the drawings. In the following description, configurations similar to the first embodiment and its modification are assigned with the same reference numerals, and repeated explanations of them are omitted.
The depth model creating unit 33 creates a depth model for the object 101 from the position and the area (shape and size) of the object 101 detected by the detecting unit 12. The depth model to be created can take various forms, such as a hemisphere (including one with an oval cross-section), a semicircular column, a half cone, a rectangular parallelepiped, or a polygonal pyramid. The shape of the depth model is preferably a shape that can be easily obtained by a function. The depth model creating unit 33 selects a function to be used for creating the depth model, for example, based on the shape of the object 101, and adjusts the shape and the size to be obtained by the function based on the size of the object 101. The depth model created in this way is input into the depth map creating unit 15 and used for the creation of the depth map, similarly to the first and second embodiments and the modification.
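For example, a hemispherical depth model (with an oval cross-section when the object's bounding area is not square) can be obtained from a simple function of the detected size, as in the following sketch; the mapping of the bulge to a [near, far] range of depth values is an illustrative assumption.

```python
import numpy as np

def hemisphere_depth_model(width, height, near=64, far=255):
    """Create a hemispherical depth model sized to the detected object (sketch).

    Returns the depth values and a mask of the model's valid area; the center
    of the model is nearest to the viewer (smallest Z) and the rim is farthest.
    """
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    # Normalized coordinates in [-1, 1] over the object's bounding area.
    u = (xs - (width - 1) / 2.0) / (width / 2.0)
    v = (ys - (height - 1) / 2.0) / (height / 2.0)
    r2 = u * u + v * v

    mask = r2 <= 1.0                                # footprint of the hemisphere
    bulge = np.sqrt(np.clip(1.0 - r2, 0.0, 1.0))    # 1 at the center, 0 at the rim
    depth = np.full((height, width), far, dtype=np.uint8)
    depth[mask] = (far - (far - near) * bulge[mask]).astype(np.uint8)
    return depth, mask
```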
Next, a flow of an image processing method according to the third embodiment is explained below in detail with reference to the drawings.
As shown in the corresponding flowchart, the processes up to the detection of the object 101 are performed in the same manner as in the first embodiment.
Subsequently, according to the third embodiment, a function to be used for creating a depth model is selected based on the shape of the object 101 detected by the detecting unit 12 (Step S301); then a value appropriate to the size of the object 101 is set into the selected function and a model calculation is performed, so that a depth model 320 as shown in a section (b) of the corresponding figure is created.
By configuring and operating in this way, according to the third embodiment, the need to prepare depth models corresponding to various objects in advance is eliminated, so that the storage area to be provided in the image processing apparatus 3 can be reduced. The other configurations, operations, and effects are similar to those according to the first or second embodiment or its modification; therefore, detailed explanations are omitted here.
The image processing apparatus and the image processing method according to the embodiments described above can be implemented by software or hardware. When implemented by software, the image processing apparatus and the image processing method are realized by an information processor, such as a Central Processing Unit (CPU), reading and executing a predetermined computer program. The predetermined computer program can be recorded on a recording medium, such as a Compact Disk Read Only Memory (CD-ROM), a Digital Versatile Disk-ROM (DVD-ROM), or a flash memory, or can be recorded on a recording device connected to a network. The information processor reads or downloads the predetermined computer program and executes it.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.