Embodiments described herein relate generally to an information processing apparatus, an information processing method, and a program product.
There is a known information processing apparatus that detects, from a video based on video data captured by an image capturing apparatus, an operator movement giving an operation instruction, and outputs operation data indicating the operation instruction given by the detected movement to a target apparatus.
However, with this conventional technology, the operator cannot recognize the area of the video, based on the video data captured by the image capturing apparatus, in which a movement giving an operation instruction is detected. An operator movement other than an operation instruction might therefore be detected as a movement giving an operation instruction, and the accuracy with which the target apparatus is caused to operate via a gesture is low. In addition, it has been desired to increase the number of operation instructions that can be given via movements detected in the area where an operator movement giving an operation instruction is detected.
A general architecture that implements the various features of the invention will now be described with reference to the drawings. The drawings and the associated descriptions are provided to illustrate embodiments of the invention and not to limit the scope of the invention.
In general, according to one embodiment, an information processing apparatus comprises: a detector configured to set a plurality of detection areas to a single face image included in a video image that is based on input video data, with reference to a position of the face image, to detect movements of an operator giving an operation instruction in the detection areas; and an output module configured to output operation data indicating the operation instruction based on a combination of the movements detected in the detection areas.
The main unit 11 comprises a housing in a shape of a thin box. On the top surface of the main unit 11, a keyboard 13, an input operation panel 15, a touch pad 16, speakers 18A and 18B, and a power button 19 for powering on and off the computer 10, and the like are provided. On the input operation panel 15, various operation buttons are provided.
On the rear surface of the main unit 11, a terminal for connecting an external display (not illustrated), such as a terminal based on the High-Definition Multimedia Interface (HDMI) standard, is provided. The terminal for connecting an external display is used to output a digital video signal to the external display.
The CPU 111 is a processor for controlling operations of the computer 10. The CPU 111 executes an operating system (OS) and various types of application programs loaded onto the main memory 112 from the HDD 117. The CPU 111 also executes a basic input/output system (BIOS) stored in the BIOS-ROM 119. The BIOS is a computer program for controlling peripheral devices. The BIOS is executed to begin with when the computer 10 is powered on.
The north bridge 113 is a bridge device for connecting a local bus of the CPU 111 and the south bridge 116. The north bridge 113 has a function of communicating with the graphics controller 114 via an accelerated graphics port (AGP) bus or the like.
The graphics controller 114 is a display controller for controlling the display unit 12 of the computer 10. The graphics controller 114 generates video signals to be output to the display unit 12 from image data written by the OS or an application program to a video random access memory (VRAM) (not illustrated).
The HDD 117, the sub-processor 118, the BIOS-ROM 119, the camera module 20, and the EC/KBC 120 are connected to the south bridge 116. The south bridge 116 comprises an integrated drive electronics (IDE) controller for controlling the HDD 117 and the sub-processor 118.
The EC/KBC 120 is a single-chip microcomputer in which an embedded controller (EC) for managing power and a keyboard controller (KBC) for controlling the touch pad 16 and the keyboard 13 are integrated. The EC/KBC 120 works with the power circuit 121 to power on the computer 10 when the power button 19 is operated, for example. When external power is supplied via the AC adapter 123, the computer 10 is powered by the external power. When no external power is supplied, the computer 10 is powered by the battery 122.
The camera module 20 is a universal serial bus (USB) camera, for example. The USB connector on the camera module 20 is connected to a USB port (not illustrated) provided on the main unit 11 of the computer 10. Video data (image data) captured by the camera module 20 is stored in the main memory 112 or the like as frame data, and can be displayed on the display unit 12. The frame rate of frame images included in the video data captured by the camera module 20 is 15 frames/second, for example. The camera module 20 may be an external camera, or may be a built-in camera in the computer 10.
The sub-processor 118 processes video data acquired from the camera module 20, for example.
The image acquiring module 301 acquires video data captured by the camera module 20, and stores the video data in the HDD 117, for example.
The detector 302 sets a plurality of detection areas to a single face image included in a video that is based on the input video data (video data acquired by the image acquiring module 301), with reference to the position of the face image. The detector 302 then detects movements of an operator of the computer 10 giving an operation instruction from the respective detection areas. In the embodiment, the detector 302 comprises a face detecting/tracking module 311, a detection area setting module 312, a prohibition determining module 313, a movement detecting module 314, and a history acquiring module 315.
The operation determining module 303 functions as an output module that outputs operation data indicating an operation instruction given by a combination of the movements detected by the detector 302 in the detection areas. The operation executing module 304 controls a target apparatus (e.g., the display unit 12, the speakers 18A and 18B, or the external display) based on the operation data output from the operation determining module 303.
A process of outputting the operation data in the computer 10 according to the embodiment will now be explained with reference to
While the computer 10 is on after the power button 19 is operated, the image acquiring module 301 acquires video data captured by the camera module 20 (S401). In the embodiment, the image acquiring module 301 acquires video data by sampling a frame image at a preset sampling rate from frame images captured at a given frame rate by the camera module 20. In other words, the image acquiring module 301 keeps sampling frame images to acquire video data. The video data thus acquired may include a face image of an operator of the computer 10 (hereinafter referred to as a face image).
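The sampling described above can be sketched as follows. This is a minimal, hypothetical illustration of S401 — the function name and rates are assumptions, not the embodiment's actual implementation — showing how frames captured at a given frame rate (15 frames/second in the embodiment) are thinned to a preset sampling rate.

```python
# Hypothetical sketch: the image acquiring module keeps every n-th frame
# so that a 15 frames/second capture stream matches a preset sampling rate.
# All names (sample_frames, camera_fps, sampling_rate) are illustrative.

def sample_frames(frames, camera_fps=15, sampling_rate=5):
    """Yield frames at roughly `sampling_rate` frames/second."""
    step = max(1, camera_fps // sampling_rate)
    for i, frame in enumerate(frames):
        if i % step == 0:
            yield frame

# Example: 15 captured frames sampled at 5 frames/second keeps every 3rd one.
kept = list(sample_frames(range(15), camera_fps=15, sampling_rate=5))
# kept is [0, 3, 6, 9, 12]
```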
Once the image acquiring module 301 acquires the video data, the face detecting/tracking module 311 detects a face image from the video that is based on the video data thus acquired, and keeps track of the face image (S402). Keeping track of a face image herein means to keep detecting a face image of the same operator across the frame images included in the acquired video data.
Specifically, the face detecting/tracking module 311 distinguishes a face image 502 from a non-face image 503 in a frame image 501 included in the video that is based on the acquired video data, using Scale-Invariant Feature Transform (SIFT), Speeded Up Robust Features (SURF), or the like, as illustrated in
The face detecting/tracking module 311 then detects a plurality of characterizing points (e.g., three points of the nose, the left eye, and the right eye) from the face image 502 in the frame image 501 included in the video that is based on the acquired video data, using simultaneous localization and mapping (SLAM) (an example of parallel tracking and mapping (PTAM)) or the like that uses a tracking technique for keeping track of characterizing points, such as the Kanade Lucas Tomasi (KLT). At this time, the face detecting/tracking module 311 detects characterizing points that are the same as those in the face image 502 included in a frame image captured prior to the frame image 501, among the characterizing points in the face image 502 included in the frame image 501. In this manner, the face detecting/tracking module 311 keeps track of the detected face image 502.
The face detecting/tracking module 311 detects the face image 502 of a face directly facing the camera module 20, from the face images included in the frame image 501 included in the video that is based on the acquired video data. In the embodiment, the face detecting/tracking module 311 detects, as a face image 502 of a face directly facing the front, a face image including both eyes, or a face image not including ears, among the face images included in the frame image 501 included in the video that is based on the acquired video data. In other words, it can be assumed that, when an operator intends to make operations on the computer 10, the operator directly faces the display unit 12. Therefore, by detecting a face image 502 of a face directly facing the camera module 20, the face detecting/tracking module 311 can detect only the face image 502 of an operator intending to make operations on the computer 10. Because the subsequent process is triggered when an operator faces the display unit 12 directly, extra operations required for making an operation instruction via a gesture can be omitted.
Referring back to
If the face detecting/tracking module 311 succeeds in keeping track of the face image (Yes at S403), the detection area setting module 312 detects the position of the face image included in the video that is based on the acquired video data (S404). In the embodiment, as the position of the face image 502, the detection area setting module 312 detects position coordinates (X1, Y1) of the center of the face image 502 detected by the face detecting/tracking module 311 (the position of the nose, in the embodiment) in a preset coordinate system having a point of origin (0, 0) at the upper left corner of the frame image 501 included in the video data (hereinafter referred to as an XY coordinate system), as illustrated in
The detection area setting module 312 detects an inclination of the axis that extends in the vertical direction of the face image (hereinafter referred to as a face image axis) (an example of a first axis) in the video that is based on the acquired video data. In the embodiment, the face image axis passes through the center (position coordinates (X1, Y1)) of the face image. The detection area setting module 312 then detects an inclination of the face image axis (angle θ) in the XY coordinate system as an inclination of the face image. Alternatively, the detection area setting module 312 may consider an axis extending in the vertical direction of the face image and passing through the axis of symmetry that makes the face image symmetric as the face image axis, and detect the inclination of the face image axis in the XY coordinate system as an inclination of the face image. As another alternative, in a triangle connecting the nose, the left eye, and the right eye detected as the characterizing points of the face image, the detection area setting module 312 may consider a perpendicular drawn from the characterizing point at the nose to a line segment connecting the characterizing points at the left eye and at the right eye as a face image axis, and detect the inclination of the face image axis in the XY coordinate system as an inclination of the face image.
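The third alternative above can be sketched as follows. This is an illustrative computation only — the function name is an assumption — but the geometry is as described: the face image axis is the perpendicular from the nose to the segment joining the two eyes, so its tilt θ from the vertical equals the eye line's tilt from the horizontal.

```python
import math

# Illustrative sketch: inclination θ of the face image axis in the XY
# coordinate system, derived from the characterizing points at the eyes.
# The axis is perpendicular to the eye-to-eye segment, so its angle from
# vertical equals the eye line's angle from horizontal.

def face_axis_inclination(left_eye, right_eye):
    """Angle θ (radians) of the face image axis relative to the vertical."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)

# An upright face: the eyes are level, so the face image axis is vertical.
theta_upright = face_axis_inclination((100, 200), (140, 200))  # θ = 0
# A face tilted 45 degrees (e.g., the operator reclining):
theta_tilted = face_axis_inclination((0, 0), (40, 40))         # θ = π/4
```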
Referring back to
In the embodiment, if the image data displayed on the display unit 12 allows an operator to make an operation instruction more easily when the display unit 12 is used as a reference, the detection area setting module 312 is switched to the first mode. Examples of such image data include a window displaying scrollable content (e.g., a text, a picture, or an image), a window displaying various types of information requiring a confirmation (e.g., a menu), and a window displaying rotatable content (e.g., a picture or an image). If the image data displayed on the display unit 12 allows an operator to make an operation instruction more easily when the operator himself/herself is used as a reference, e.g., in a case of a screen related to replaying content, selection of a channel number, or the volume of sound output from the speakers 18A and 18B, the detection area setting module 312 is switched to the second mode.
The detection area setting module 312 then sets a plurality of detection areas to a single face image included in the video with reference to the position of the detected face image (S406). The detection areas herein mean areas from which an operator movement (a movement of an operator's hand giving an operation instruction, or a movement of an object caused by an operation instruction) for giving an operation instruction (e.g., to scroll the content displayed in the window, to confirm the various types of information displayed in the window, to rotate the content displayed in the window, to replay the content, to select a channel number, or to adjust the volume) is detected. When a plurality of face images are included in the video that is based on the acquired video data, the detection area setting module 312 sets a plurality of detection areas to each of the face images, with reference to the position of each of the face images.
In the embodiment, as illustrated in
More specifically, in the xy coordinate system having a point of origin at the position coordinates (X1, Y1) of the face image 502, the detection area setting module 312 acquires position coordinates (x1, y1) shifted downwardly from the position coordinates (X1, Y1) of the face image 502 (along the y-axis direction), as illustrated in
When the axis of the face image 502 (y axis) is inclined by an angle θ in the XY coordinate system as well, e.g., when the operator of the computer 10 is lying, the detection area setting module 312 sets the detection areas 504A, 504B in the same manner. As illustrated in
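A minimal sketch of this placement, under assumed geometry, is given below. The offsets, names, and area shapes are hypothetical; what it illustrates is that the detection areas 504A, 504B are laid out relative to the face position (X1, Y1) and rotated by the face inclination θ, so that the layout stays aligned with the operator even when the operator is lying down.

```python
import math

# Hypothetical sketch of S406: place the centres of detection areas 504A
# (left) and 504B (right) at fixed xy-coordinate offsets from the face
# centre, then rotate those offsets by the inclination θ into the XY
# coordinate system. `drop` and `spread` are assumed pixel offsets.

def set_detection_areas(face_x, face_y, theta, drop=120, spread=150):
    """Return the centres of the 504A/504B detection areas in XY coordinates."""
    def rotate(dx, dy):
        # Rotate the (dx, dy) offset by θ and translate to the face centre.
        rx = dx * math.cos(theta) - dy * math.sin(theta)
        ry = dx * math.sin(theta) + dy * math.cos(theta)
        return face_x + rx, face_y + ry

    return rotate(-spread, drop), rotate(spread, drop)  # (504A, 504B)

# Upright face (θ = 0): areas sit symmetrically below-left and below-right.
area_a, area_b = set_detection_areas(320, 240, 0.0)
# area_a is (170.0, 360.0); area_b is (470.0, 360.0)
```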
In the embodiment, the detection area setting module 312 sets a rectangular area to each of the detection areas 504A, 504B, but the shape is not limited thereto, provided that such an area is set with reference to the position of the face image 502. For example, the detection area setting module 312 may set an area curved in an arc shape as a detection area.
Furthermore, in the embodiment, the detection area setting module 312 sets the detection areas 504A, 504B arranged along the x axis on both sides of the y axis that passes through the center of the face image 502, but the embodiment is not limited thereto. For example, the detection area setting module 312 may set a plurality of detection areas 504C to 504G that are arranged in a line along the x axis, and enabled to detect an operator movement 506 for giving an operation instruction, as illustrated in
Referring back to
Specifically, the movement detecting module 314 extracts frame images 501 between time t at which the last frame image is captured and time t−1 preceding the time t by given time (e.g., time corresponding to 10 frames), from frame images 501 included in the video that is based on the acquired video data.
The movement detecting module 314 then detects the movements 506A, 506B of the hands 505 from the respective detection areas 504A, 504B in each of the extracted frame images 501. In the example illustrated in
In the embodiment, the movement detecting module 314 detects the movement 506A, 506B of the hand 505 in the example illustrated in
The movement detecting module 314 may also detect the movement 506A, 506B of the hand 505 near the detection area 504A, 504B, in addition to a movement 506A, 506B of the hand 505 in the detection area 504A, 504B, as illustrated in
Among the movements 506A, 506B in the respective detection areas 504A, 504B, the movement detecting module 314 may detect only movements 506A, 506B that can be detected reliably, without detecting a movement at a speed higher than a predetermined speed or a movement not intended to be an operation instruction (in the embodiment, a movement of the hand 505 other than one along the X axis or the Y axis, or other than one along the x axis or the y axis). In this manner, a movement of an operation instruction can be detected reliably.
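The filtering above can be sketched as follows, assuming a hand's start and end positions over the extracted frames. The thresholds and function name are illustrative only: a movement is kept only if it is slow enough and roughly aligned with an axis; anything else is treated as not being an operation instruction.

```python
# Hypothetical sketch: classify a hand movement in a detection area,
# rejecting movements faster than a predetermined per-frame speed and
# movements that are not roughly axis-aligned. Thresholds are assumptions.

def classify_movement(start, end, frames=10, max_speed=30.0, axis_ratio=2.0):
    """Return 'up', 'down', 'left', 'right', or None if rejected."""
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    speed = (dx * dx + dy * dy) ** 0.5 / frames
    if speed > max_speed:
        return None  # too fast to detect reliably
    if abs(dx) >= axis_ratio * abs(dy):
        return 'right' if dx > 0 else 'left'
    if abs(dy) >= axis_ratio * abs(dx):
        return 'down' if dy > 0 else 'up'  # image y grows downward
    return None  # diagonal movement: not an axis-aligned instruction

move = classify_movement((100, 100), (100, 40))  # hand raised over 10 frames
# move is 'up'
```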
Referring back to
The prohibition determining module 313 then determines whether a prohibition period during which an operation instruction is prohibited has elapsed from when operation data was last output from the operation determining module 303 (S409). The prohibition period herein is a period during which an operator is prohibited from making any operation instruction, and may be set at the discretion of an operator of the computer 10. If the prohibition period has not elapsed (No at S409), the prohibition determining module 313 waits until the prohibition period elapses. In this manner, when an operator makes an operation instruction and another operator makes an operation instruction immediately after the first operator, the operation instruction made by the first operator is prevented from being cancelled by the operation instruction made by the second operator. Furthermore, when an operator makes an operation instruction using the same movement repeatedly (for example, when the operator repeatedly makes a movement of moving down the hand 505), as the hand 505 is brought back to the original position after moving down the hand 505, the movement of bringing back the hand 505 to the original position might be detected. In such a case, the prohibition period can prevent the movement of moving down the hand 505 from being cancelled by the movement of bringing back the hand 505 to the original position.
After the prohibition period has elapsed, the prohibition determining module 313 informs the operator that an operation instruction can now be made. In the embodiment, the prohibition determining module 313 gives this notification by changing the display mode of the display unit 12, such as by displaying a message indicating that an operation instruction can now be made on the display unit 12. The embodiment is not limited thereto, however, and the prohibition determining module 313 may also give the notification using a light-emitting diode (LED) indicator (not illustrated) or the speakers 18A and 18B, for example.
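The prohibition-period check at S409 can be sketched as follows. The class and its interface are illustrative assumptions, not the embodiment's actual implementation; the logic is simply that further operation instructions are ignored until the configurable prohibition period has elapsed since operation data was last output.

```python
# Hypothetical sketch of S409: operation instructions are accepted only
# after the prohibition period has elapsed since the last output of
# operation data. The period (here 1.5 s) is an assumed, operator-set value.

class ProhibitionTimer:
    def __init__(self, prohibition_period=1.5):
        self.prohibition_period = prohibition_period
        self.last_output_time = None

    def can_operate(self, now):
        """True if an operation instruction may be accepted at time `now`."""
        if self.last_output_time is None:
            return True
        return now - self.last_output_time >= self.prohibition_period

    def mark_output(self, now):
        # Called when the operation determining module outputs operation data.
        self.last_output_time = now

timer = ProhibitionTimer(prohibition_period=1.5)
timer.mark_output(now=10.0)
blocked = timer.can_operate(now=10.5)   # within the prohibition period
allowed = timer.can_operate(now=12.0)   # period has elapsed
```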
When the prohibition determining module 313 determines that the prohibition period has elapsed (Yes at S409), the operation determining module 303 outputs operation data indicating an operation instruction that is based on a combination of the movements detected in the respective detection areas, from the history of movements acquired by the history acquiring module 315 (S410). Specifically, the operation determining module 303 outputs operation data indicating an operation instruction that is based on the directions of the movements detected in the respective detection areas set to a single face image. The operation determining module 303 also outputs operation data indicating an operation instruction that is based on the number of detection areas from which the movements are detected, among the detection areas set to a single face image. In the embodiment, when the movements 506A, 506B detected in the detection areas 504A, 504B and acquired by the history acquiring module 315 are movements in a vertical direction or a horizontal direction in the XY coordinate system (or in the xy coordinate system), the operation determining module 303 outputs operation data indicating an operation instruction that is based on a combination of the movements 506A, 506B detected in the respective detection areas 504A, 504B and acquired by the history acquiring module 315.
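The determination at S410 can be sketched as a lookup on the combination of directions detected in the two areas. The table below mirrors the scroll, confirm, and rotate examples described in this section, but the specific mapping and names are an illustration, not the embodiment's definitive table.

```python
# Hypothetical sketch of S410: operation data chosen from the combination
# of movement directions detected in detection areas 504A and 504B.
# `None` means no movement was detected in that area.

OPERATION_TABLE = {
    ('up', None): 'scroll_up',        # one hand moves up in 504A only
    (None, 'up'): 'scroll_up',
    ('down', None): 'scroll_down',
    (None, 'down'): 'scroll_down',
    ('right', 'left'): 'confirm',     # both hands brought together
    ('up', 'down'): 'rotate_clockwise',
    ('down', 'up'): 'rotate_counterclockwise',
}

def determine_operation(move_a, move_b):
    """Return operation data for the movement combination, or None."""
    return OPERATION_TABLE.get((move_a, move_b))

op = determine_operation('up', 'down')
# op is 'rotate_clockwise'
```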
For example, when a window displaying scrollable content is displayed on the display unit 12 and the movement detecting module 314 detects a movement 506A (or a movement 506B) of the hand 505 along the Y axis in the detection area 504A (or in the detection area 504B) as illustrated in
When a window displaying various types of information requiring a confirmation is displayed on the display unit 12 and the movement detecting module 314 detects movements 506A, 506B of bringing the hands 505 together along the X axis in the respective detection areas 504A, 504B as illustrated in
When a window displaying rotatable content is displayed on the display unit 12 and the movement detecting module 314 detects a movement 506A of bringing up the hand 505 along the Y axis in the detection area 504A and detects a movement 506B of bringing down the hand 505 in the detection area 504B as illustrated in
When a screen displaying content being replayed is displayed on the display unit 12 and the movement detecting module 314 detects movements 506A, 506B of bringing the hands 505 together along the x axis as illustrated in
When a screen related to a channel number selection is displayed on the display unit 12 and the movement detecting module 314 detects a movement 506A (or a movement 506B) of the hand 505 along the x axis in the detection area 504A (or in the detection area 504B) as illustrated in
When a screen related to the volume of sound output from the speakers 18A and 18B is displayed on the display unit 12 and the movement detecting module 314 detects a movement 506A (or a movement 506B) of the hand 505 along the Y axis in the detection area 504A (or in the detection area 504B) as illustrated in
When a screen related to the volume of sound output from the speakers 18A and 18B is displayed on the display unit 12 while the second mode is selected and the detection areas 504C to 504G are set as illustrated in
In the manner described above, the computer 10 according to the embodiment sets a plurality of detection areas to a single face image with reference to the position of the face image included in a video that is based on input video data, detects operator movements giving an operation instruction in the respective detection areas, and outputs operation data indicating an operation instruction that is based on a combination of the movements detected in the respective detection areas. Therefore, an operation instruction can be given by a combination of a plurality of gestures, so that an increased number of operation instructions become possible.
The computer program executed on the computer 10 according to the embodiment may be provided in a manner recorded in a computer-readable recording medium such as a compact disk read-only memory (CD-ROM), a flexible disk (FD), a compact disk recordable (CD-R), or a digital versatile disk (DVD) as a file in an installable or executable format.
Furthermore, the computer program executed on the computer 10 according to the embodiment may be stored in a computer connected to a network such as the Internet, and made available for download over the network. Furthermore, the computer program executed on the computer 10 according to the embodiment may be provided or distributed over a network such as the Internet.
Furthermore, the computer program according to the embodiment may be provided in a manner incorporated in a ROM or the like in advance.
Moreover, the various modules of the systems described herein can be implemented as software applications, hardware and/or software modules, or components on one or more computers, such as servers. While the various modules are illustrated separately, they may share some or all of the same underlying logic or code.
While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.
This application is a continuation of PCT international application Ser. No. PCT/JP2013/058195, filed on Mar. 14, 2013, which designates the United States, incorporated herein by reference, and which is based upon and claims the benefit of priority from Japanese Patent Application No. 2012-117942, filed on May 23, 2012, the entire contents of which are incorporated herein by reference.
Related Application Data: Parent — PCT/JP2013/058195 (filed Mar. 2013, US); Child — application Ser. No. 13970359 (US).