The present invention relates to an image processing apparatus, an image processing method, and an image communication system.
With the ongoing sophistication of consumer televisions in recent years, three-dimensional (3D) television capable of offering stereoscopic vision is gaining in popularity. Although there are a variety of methods for realizing 3D television, some of them require a user to wear dedicated eyeglasses to observe the stereoscopic images.
In a scheme where dedicated glasses are required to observe stereoscopic images, the user naturally must wear those glasses. The inventor of the present invention directed his attention to this fact and reached the realization that the eyeglasses can be used not only to observe stereoscopic images but also to open up new fields of application.
The present invention has been made in view of these circumstances, and a purpose thereof is to provide a new field of use for eyeglasses that are used to observe stereoscopic images.
In order to resolve the above-described problems, one embodiment of the present invention provides an image processing apparatus. The image processing apparatus includes: an image pickup unit configured to take an image of an object, which includes a face of a person wearing glasses by which to observe a stereoscopic image that contains a first parallax image and a second parallax image obtained when the object in a three-dimensional (3D) space is viewed from different viewpoints; a glasses identifying unit configured to identify the glasses included in the image of the object taken by the image pickup unit; a face detector configured to detect a facial region of the face of the person included in the image of the object taken by the image pickup unit, based on the glasses identified by the glasses identifying unit; and an augmented-reality special rendering unit configured to add a virtual feature to the facial region detected by the face detector.
Another embodiment of the present invention relates to an image communication system. The system includes at least two of the above-described image processing apparatuses, and the at least two image processing apparatuses are connected in a manner that permits mutual communication via a communication line.
Still another embodiment of the present invention relates to an image processing method executed by a processor. The method includes: capturing an image of an object, which includes a face of a person wearing glasses by which to observe a stereoscopic image that contains a first parallax image and a second parallax image obtained when the object in a three-dimensional (3D) space is viewed from different viewpoints; identifying the glasses from the captured image of the object; detecting a facial region of the face of the person from the captured image of the object, based on the identified glasses; and adding a virtual feature to the detected facial region.
Optional combinations of the aforementioned constituting elements, and implementations of the invention in the form of methods, apparatuses, systems, recording media, computer programs, and so forth may also be effective as additional modes of the present invention.
Embodiments will now be described by way of examples only, with reference to the accompanying drawings, which are meant to be exemplary, not limiting, and wherein like elements are numbered alike in the several Figures.
The invention will now be described by reference to the preferred embodiments. This is not intended to limit the scope of the present invention, but to exemplify the invention.
A description will be given of an outline of preferred embodiments. In the preferred embodiments, images including the face of a person wearing eyeglasses for observing stereoscopic images are acquired, and a facial region of the person's face is detected using the glasses as a landmark. The acquired images are then subjected to a special rendering of augmented reality, by which a virtual feature is added in and around the detected facial region.
The stereo camera 200 includes a first camera 202 and a second camera 204 for taking images of a user, who is an object to be captured, from different points of view. Here, the images of an object as seen from different points of view in a three-dimensional (3D) space are called “parallax images”. Since the left eye and the right eye of a human are situated about 6 cm apart from each other, a parallax (disparity) occurs between the image seen by the left eye and the image seen by the right eye, and it is considered that the human brain recognizes the depth of objects using the parallax images sensed through the left and right eyes. Accordingly, if the parallax images sensed through the left eye and the right eye are projected onto the respective eyes, the brain will recognize them as an image having depth, that is, a perspective image. In the following, the images of an object, including left-eye parallax images and right-eye parallax images, as seen from different points of view in the 3D space will be simply referred to as “stereoscopic image(s)”. The stereo camera 200 may be realized by use of solid-state image pickup devices such as CCD (Charge-Coupled Device) sensors and CMOS (Complementary Metal-Oxide-Semiconductor) sensors.
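As a supplementary note, not part of the original description: for two parallel viewpoints with focal length f (in pixels) and baseline B, the standard stereo triangulation relation ties the disparity d between the left and right images of a point to its depth Z:

```latex
d = x_L - x_R = \frac{f\,B}{Z}
\qquad\Longleftrightarrow\qquad
Z = \frac{f\,B}{d}
```

Nearer objects thus produce larger disparities, which is the cue described above by which the brain recovers depth.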
The image processing apparatus 300 processes the images (video images) of an object taken by the stereo camera 200. Details of the image processing apparatus 300 will be discussed later. The 3D television 400 displays three-dimensional images generated by the image processing apparatus 300. Through the 3D glasses 500, the user can recognize the images displayed by the 3D television 400 as stereoscopic images having depth.
There are a variety of 3D television systems for presenting perspective images to human viewers by use of parallax images. In the present embodiment, however, a description is given, as an example, of a 3D television using a system in which left-eye parallax images and right-eye parallax images are displayed alternately in time division, namely in a time-sharing manner.
The 3D television 400 presents the left-eye parallax images and the right-eye parallax images, generated by the image processing apparatus 300, alternately in time division. The image processing apparatus 300 transmits the display timing of parallax images on the 3D television 400 to the 3D glasses 500 as a synchronization signal. The 3D glasses 500 operates the shutter on the left lens or the right lens according to the synchronization signal received. The shutter may be implemented by use of known liquid crystal shutter technology, for instance.
More specifically, when the 3D television 400 displays a parallax image for the left eye, the 3D glasses 500 shields the images entering the right eye by closing the shutter for the right-eye lens. Thus, when the 3D television 400 displays a parallax image for the left eye, the parallax image for the left eye is projected onto the left eye of the user only. On the other hand, when the 3D television 400 displays a parallax image for the right eye, the 3D glasses 500 closes the shutter for the left-eye lens with the result that the parallax image for the right eye is projected onto the right eye of the user only.
At time 2t the 3D television 400 displays right-eye parallax images to present the right-eye parallax images to the right eye of the user. And at time 4t the 3D television 400 displays left-eye parallax images to present the left-eye parallax images to the left eye of the user. This can present perspective 3D images having a sense of depth to the user.
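By way of illustration, a minimal sketch of this time-division synchronization follows; all names (SyncSignal, shutter_state) are hypothetical stand-ins, not components of the embodiment, and the phase (which eye is shown on even frames) is arbitrary.

```python
from dataclasses import dataclass
from enum import Enum

class Eye(Enum):
    LEFT = "left"
    RIGHT = "right"

@dataclass
class SyncSignal:
    frame_index: int
    eye: Eye  # which parallax image the television is about to display

def shutter_state(signal: SyncSignal) -> dict:
    """Open only the lens matching the displayed parallax image;
    the other shutter closes so that eye sees nothing (True = open)."""
    return {Eye.LEFT: signal.eye is Eye.LEFT,
            Eye.RIGHT: signal.eye is Eye.RIGHT}

# Time-division display: alternate eyes every period t.
for frame_index in range(4):
    eye = Eye.RIGHT if frame_index % 2 == 0 else Eye.LEFT
    state = shutter_state(SyncSignal(frame_index, eye))
    open_lenses = [e.value for e, is_open in state.items() if is_open]
    print(f"t={frame_index}: showing {eye.value}-eye image, open lens: {open_lenses}")
```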
The left-eye image generator 302 visualizes the information acquired from the first camera 202 so as to generate left-eye parallax images. The right-eye image generator 304 visualizes the information acquired from the second camera 204 so as to generate right-eye parallax images.
The glasses identifying unit 306 identifies the 3D glasses 500 from the images of an object that are captured by the stereo camera 200 and then visualized by the left-eye image generator 302 and the right-eye image generator 304. As described earlier, the present embodiment employs shutter glasses, in which the shutter on the left or right lens is operated according to the synchronization signal received from the image processing apparatus 300. Accordingly, the glasses identifying unit 306 includes a shutter region identifying unit 320 and a frame identifying unit 322.
The 3D glasses 500 alternately closes the left lens and the right lens in time division, thereby alternately blocking the images projected onto the left eye and the right eye. This means that, in the captured images of the face of the user wearing the 3D glasses 500, the user's left and right eyes, seen through the lenses of the 3D glasses 500, are alternately hidden; the eye behind a closed lens does not appear in the captured image. Thus, the shutter region identifying unit 320 identifies the 3D glasses 500 by detecting, as a lens region, a region where the image of the object is blocked in the images including the face of the user wearing the 3D glasses 500.
With the lens region identified by the shutter region identifying unit 320 as a starting point, the frame identifying unit 322 tracks the glasses frame of the 3D glasses 500 so as to identify the 3D glasses 500. The face detector 308 detects the face of the user based on the glasses region identified by the glasses identifying unit 306.
Thus, a user watching a 3D television 400 of the type that requires dedicated glasses is guaranteed to be wearing the 3D glasses 500, so identification of the glasses region can always be started. Where, in particular, shutter-type 3D glasses are used, the lens region of the 3D glasses 500 can be identified and used as a landmark. The lens region occupies a fairly large area of the face and can therefore be detected stably and quickly. For example, compared with the glasses frame, a lens has a two-dimensional extent and can therefore be detected more stably and quickly.
The shutter region identifying unit 320 calculates a difference between an image captured when the shutter of a lens of the 3D glasses 500 is closed and an image captured when that shutter is open, and identifies a region where the difference is large as the lens region.
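A minimal sketch of such a difference computation follows, using OpenCV and NumPy (an assumption; the description names no library). The threshold and minimum-area values are illustrative only.

```python
import cv2
import numpy as np

def find_lens_regions(frame_shutter_closed: np.ndarray,
                      frame_shutter_open: np.ndarray,
                      min_area: int = 500):
    """Return bounding boxes of regions that change strongly between a
    frame captured while a lens shutter is closed (the lens appears dark)
    and one captured while it is open."""
    g1 = cv2.cvtColor(frame_shutter_closed, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame_shutter_open, cv2.COLOR_BGR2GRAY)
    diff = cv2.absdiff(g1, g2)
    _, mask = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)
    # Close small holes so each lens forms one connected blob.
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```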
Once the lens region of the 3D glasses 500 is identified, the frame identifying unit 322 can identify the frame of the 3D glasses 500 by tracking an edge connected to the lens region. Also, once the lens region of the 3D glasses 500 is identified, both eyes of the user can be located, and therefore the approximate size of the face of the user can be estimated based on the distance between the eyes. The face detector 308 detects a flesh-color region and edges, with the lens region of the 3D glasses as the starting point, and can thereby identify the facial region of the user's face.
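As a hedged sketch of estimating the facial region from the two identified lens regions: the facial proportions and the flesh-color range below are illustrative assumptions, not values taken from the description.

```python
import cv2
import numpy as np

def estimate_face_region(lens_boxes, frame):
    """Estimate a facial bounding box from the two lens regions, then
    compute a flesh-color mask inside it as a refinement cue."""
    (x1, y1, w1, h1), (x2, y2, w2, h2) = sorted(lens_boxes)[:2]
    c1 = np.array([x1 + w1 / 2.0, y1 + h1 / 2.0])
    c2 = np.array([x2 + w2 / 2.0, y2 + h2 / 2.0])
    eye_dist = np.linalg.norm(c2 - c1)        # scales with face size
    center = (c1 + c2) / 2.0
    w, h = 2.2 * eye_dist, 3.0 * eye_dist     # assumed face proportions
    x = max(0, int(center[0] - w / 2))
    y = max(0, int(center[1] - h / 3))        # eyes sit above face center
    x_end = min(frame.shape[1], int(x + w))
    y_end = min(frame.shape[0], int(y + h))

    roi = frame[y:y_end, x:x_end]
    hsv = cv2.cvtColor(roi, cv2.COLOR_BGR2HSV)
    skin_mask = cv2.inRange(hsv, (0, 30, 60), (25, 180, 255))  # rough skin range
    return (x, y, x_end - x, y_end - y), skin_mask
```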
The augmented-reality special rendering unit 314 adds virtual features to the facial region detected by the face detector 308 and to its surrounding regions. Here, “augmented reality” is a collective term for the concept of first projecting a 3D model into the real space that is displayed on the 3D television 400 and observed by the user wearing the 3D glasses 500, and then adding various virtual features to this real space, as well as for the techniques by which to achieve this concept.
More specifically, the augmented-reality special rendering unit 314 adds various augmented-reality features based on the 3D model of the user's face generated by the 3D model generator 312. For that purpose, the augmented-reality special rendering unit 314 includes a background special rendering unit 326, a mirror image generator 328, an image pickup position correcting unit 330, a face special rendering unit 332, and a special rendering control unit 324 for controlling these components and their operations.
The background special rendering unit 326 renders a special effect of augmented reality to a background region. Here, the background region is the region other than the facial region that has been detected by the face detector 308 and then modeled by the 3D model generator 312. As will be discussed later, the image processing system 100 may be used as a television telephone, for instance, when it is connected to other image processing systems 100 via a network. In such a case, the stereo camera 200 will generally be installed within the home of the user, and there are cases where it is not preferable to transmit what is actually seen in the home as it is. To cope with this, the background special rendering unit 326 replaces the background region with a different image, blurs the background region, or the like. Thus, the present embodiment is advantageous in that an unaltered view of the inside of the home is prevented from being transmitted.
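For illustration, a minimal sketch of such background replacement or blurring, assuming a binary mask of the facial (foreground) region is already available; OpenCV and the parameter values are assumptions, not part of the original description.

```python
import cv2
import numpy as np

def render_background_effect(frame, face_mask, replacement=None):
    """Keep the detected facial region as-is; blur everything else, or
    substitute a replacement image, so the real room is never shown."""
    if replacement is None:
        background = cv2.GaussianBlur(frame, (31, 31), 0)
    else:
        background = cv2.resize(replacement,
                                (frame.shape[1], frame.shape[0]))
    keep = cv2.merge([face_mask] * 3).astype(bool)  # 1-channel mask -> 3
    return np.where(keep, frame, background)
```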
Since the 3D model generator 312 generates a 3D model of the face of the user, the image pickup position correcting unit 330 can produce images as if the user were captured from an arbitrary direction. In particular, the image pickup position correcting unit 330 produces, based on the 3D model generated by the 3D model generator 312, images that would be obtained if the user were captured from the frontal direction. Thereby, the user can observe images of himself/herself as seen from directly in front. When the system is used as a television telephone, the user and the conversation partner can make eye contact with each other. This can reduce the sense of discomfort in conversation, compared with the case where images taken from directions other than directly in front are used.
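A toy sketch of this viewpoint correction: the face model is rotated back by the estimated head pose and reprojected with a pinhole camera. The focal length, the yaw-only pose, and all names are assumptions for illustration, not the actual method of the embodiment.

```python
import numpy as np

def yaw_matrix(theta):
    """Rotation about the vertical axis by theta radians."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0.0, s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0, c]])

def frontalize_and_project(points, estimated_yaw, f=800.0, depth_offset=1000.0):
    """Undo the estimated head rotation, then project the 3D face model
    with a pinhole camera to synthesize the 2D positions of a frontal view."""
    frontal = points @ yaw_matrix(-estimated_yaw).T  # rotate back to frontal
    z = frontal[:, 2:3] + depth_offset               # place in front of camera
    return f * frontal[:, :2] / z                    # perspective projection

# Hypothetical usage: a face turned 20 degrees is re-rendered frontally.
face_points = np.random.rand(100, 3) * 100           # stand-in vertices
uv = frontalize_and_project(face_points, np.deg2rad(20.0))
```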
The mirror image generator 328 generates a 3D model in which the image of the user appears as if reflected in a mirror, based on the 3D model of the user's face detected by the face detector 308 and generated by the 3D model generator 312. The user can thus observe his/her own face, to which a special effect of augmented reality has been rendered, as an image reflected in a mirror before television-telephone signals are transmitted. Using the image processing system with the 3D glasses 500 worn, the user can check his/her own figure, to which a special effect of augmented reality has been rendered, before entering a cyber-world, and can thereby feel the switch from an ordinary scene to the extraordinary.
The user is required to wear the 3D glasses 500 in order to observe stereoscopic images. However, the user does not necessarily wish to display on the 3D television 400 direct images showing that he/she wears the 3D glasses 500, or to transmit those images to the conversation partner. Rather, there may be cases where the user does not wish to display and transmit the images as they are, but instead wishes to render an extraordinary special effect to the images.
In the present embodiment, the 3D model generator 312 generates the 3D model of the user's face, so that extraordinary special effects can be rendered to the images through various augmented realities. Then the 3D glasses 500 can be used in the face detection processing as a preprocessing for generating the 3D model. This is because it is guaranteed that the user wears the 3D glasses 500.
The special rendering control unit 324 receives instructions from the user via a user interface such as a remote controller (not shown) and controls the special rendering performed by each component of the augmented-reality special rendering unit 314. Though not shown in the Figures, the augmented-reality special rendering unit 314 may also be provided with a function of adding other augmented-reality features, for example one in which characters are displayed near the user's face using a “speech balloon” technique.
The stereoscopic image generator 316 generates stereoscopic images including the left-eye parallax images and the right-eye parallax images obtained when a 3D model of the user in a virtual 3D space is seen from different points of view, based on the 3D model of the user generated by the 3D model generator 312 or based on the 3D model of the user to which the augmented-reality special rendering unit 314 has rendered a special effect. The output unit 318 outputs the stereoscopic images generated by the stereoscopic image generator 316 to the 3D television 400 or transmits the stereoscopic images to other image processing system(s) 100 via a network such as the Internet.
The left-eye image generator 302 and the right-eye image generator 304 visualize an object, including the face of the user wearing the 3D glasses 500, outputted from the stereo camera 200 (S10). The glasses identifying unit 306 identifies the 3D glasses 500 from the images of the object visualized by the left-eye image generator 302 and the right-eye image generator 304 (S12).
Based on the 3D glasses identified by the glasses identifying unit 306, the face detector 308 detects a facial region of the face of the user from the object, including the face of the user, visualized by the left-eye image generator 302 and the right-eye image generator 304 (S14). The feature point detector 310 detects feature points from the facial region of the user's face detected by the face detector 308 (S16).
The 3D model generator 312 generates the 3D model of the user's face based on both the facial region of the user's face detected by the face detector 308 and the feature points detected by the feature point detector 310 (S18). The augmented-reality special rendering unit 314 renders a special effect of augmented reality, based on the 3D model of the user's face generated by the 3D model generator 312 (S20).
The stereoscopic image generator 316 generates stereoscopic images including the left-eye parallax images and the right-eye parallax images obtained when the 3D model of the user in a virtual 3D space is seen from different points of view, based on the 3D model of the user generated by the 3D model generator 312 or based on the 3D model of the user to which the augmented-reality special rendering unit 314 has rendered a special effect (S22). The output unit 318 outputs the stereoscopic images generated by the stereoscopic image generator 316 to an external device (S24). When the output unit 318 has output the stereoscopic images, the processing in this flowchart terminates.
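Gathered into one place, the flow of steps S10 to S24 can be read as the sketch below; the pipeline object and its method names are hypothetical stand-ins for the numbered units, not an interface defined in the description.

```python
def process_frame(stereo_frames, p):
    """One pass through the S10-S24 flow; each call maps to one step."""
    left, right = p.visualize(stereo_frames)      # S10: image generators 302/304
    glasses = p.identify_glasses(left, right)     # S12: glasses identifying unit 306
    face = p.detect_face(left, right, glasses)    # S14: face detector 308
    points = p.detect_feature_points(face)        # S16: feature point detector 310
    model = p.build_face_model(face, points)      # S18: 3D model generator 312
    model = p.apply_ar_effects(model)             # S20: AR special rendering unit 314
    stereo = p.generate_parallax_views(model)     # S22: stereoscopic image generator 316
    p.output(stereo)                              # S24: output unit 318
```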
In the 3D television telephone system 700, the images of the first user 800 wearing the 3D glasses 500a are subjected to the special effects of augmented realities and then are sent to the second 3D television 400b that the second user 900 watches.
Similarly, the images of the second user 900 wearing the 3D glasses 500b are subjected to the special effects of augmented realities and then are sent to the first 3D television 400a that the first user 800 watches. In this manner, by employing the 3D television telephone system 700, the users can video chat using the images that have been subjected to the special effects of augmented realities.
As described earlier, the 3D glasses serve as a landmark in the present embodiment; therefore the 3D model of the face can be generated stably and with a high degree of accuracy, and furthermore the rendering elements, such as the background image and the expression images, can be separated from one another. Thus, in the present embodiment, the information that requires real-time delivery, such as the position and orientation of the user's face, expression images, and expression data, is gathered and combined in units of frames and transmitted in real time. On the other hand, the 3D model of the face, the special effects using augmented reality, and the like are transmitted beforehand, prior to communication over the 3D television telephone, and therefore need not be transmitted in units of frames.
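A hedged sketch of this split between pre-transmitted and per-frame data follows; the structures and field names are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class SessionSetup:
    """Sent once, before the 3D television-telephone call starts."""
    face_model: bytes       # 3D model of the user's face
    ar_effects: bytes       # augmented-reality special-effect definitions

@dataclass
class FramePacket:
    """Sent every frame; only the data that requires real-time delivery."""
    frame_index: int
    head_position: tuple    # position of the user's face
    head_orientation: tuple # orientation of the user's face
    expression_data: bytes  # expression images / expression parameters
```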
An operation implementing the above-described structure is as follows. The user wears the 3D glasses and uses the image processing system 100. The stereo camera 200 captures the images of an object including the user wearing the 3D glasses. The facial region of the user's face is detected using the 3D glasses 500 as the landmark, and various special effects of augmented realities are rendered. The images to which a special effect of augmented reality has been rendered are displayed on the 3D television 400 and are transmitted to another image processing system 100.
As described above, the embodiments provide a new usage field where the 3D glasses 500 are not only used to watch the stereoscopic images but also used as the landmark in rendering special effects of augmented realities.
The present invention has been described based upon illustrative embodiments. The above-described embodiments are intended to be illustrative only and it will be obvious to those skilled in the art that various modifications to the combination of constituting elements and processes could be developed and that such modifications are also within the scope of the present invention.
The description has been given of a case where shutter glasses are employed, but the 3D glasses 500 are not limited thereto; for example, polarized glasses may be employed instead. In such a case, a lenticular marker, a light-emitting diode or the like is added to the glasses frame, so that the polarized glasses can be used as the landmark. In particular, a lenticular marker has the property that its design or pattern varies when viewed from different angles. It is therefore advantageous in that the orientation and angle of the face can be measured by converting the relative angle between the glasses and the camera into a change in the observed pattern. Also, in order to facilitate the observation of the expression areas 334, an under-rim glasses frame that covers the lower half of the lens may be employed.
The description has been given of a case where the stereo camera 200, including the first camera 202 and the second camera 204 that capture images of the user from different viewpoints, is used. However, the image pickup device is not limited to a stereo camera; a monocular camera may be used instead. In this case, the feature points detected by the feature point detector 310 are directly mapped onto a general-purpose 3D face model. Compared with the case where a stereo camera is used, the mapping accuracy may drop; however, considering that high accuracy is not an important factor in augmented reality using a 3D model, this modification is advantageous in terms of cost because only a single camera is used.
Foreign Application Priority Data

Number | Date | Country | Kind
2010-118665 | May 2010 | JP | national
This is a continuation application of U.S. patent application Ser. No. 14/944,478, allowed and accorded a filing date of Nov. 18, 2015, which is a continuation application of U.S. Pat. No. 9,225,973 (U.S. patent application Ser. No. 13/605,571), accorded a filing date of Sep. 6, 2012, which is a continuation application of International Application No. PCT/JP2010/007616, filed Dec. 28, 2010, which claims priority to JP 2010-118665, filed May 24, 2010, the entire disclosures of which are hereby incorporated by reference.
Related U.S. Application Data

Relation | Number | Date | Country
Parent | 14944478 | Nov 2015 | US
Child | 15712343 | | US
Parent | 13605571 | Sep 2012 | US
Child | 14944478 | | US
Parent | PCT/JP2010/007616 | Dec 2010 | US
Child | 13605571 | | US