Embodiments of the present invention relate to computing technology, and more specifically, to a method and apparatus for live user recognition.
With the development of image/video processing and pattern recognition technology, face recognition has become a stable, accurate and efficient biometric recognition technology. Face recognition takes an image and/or video containing a face as input and determines a user's identity by recognizing and analyzing facial features. Compared with iris recognition and other biometrics-based technologies, face recognition can complete identity authentication efficiently without requiring the user's focus or awareness, thereby causing little disturbance to users. Therefore, face recognition has been widely applied to identity authentication in finance, justice, public safety, the military and various other aspects of daily life. Moreover, face recognition can be implemented by means of various user terminals, such as a personal computer (PC), a mobile phone or a personal digital assistant (PDA), without precise and expensive specialized instruments.
However, face recognition-based identity authentication also has some drawbacks. For example, an image/video containing a legal user's face might be obtained by an illegal user through various means, such as public web albums, personal resumes, pinhole cameras, etc. The illegal user might then place such an image/video (for example, the legal user's facial photo) in front of an image acquisition device so as to input it into a face recognition system, thereby breaking into the legal user's accounts. Conventional face recognition systems are unable to cope with such a situation, because they cannot detect whether the inputted facial image is obtained from a live user.
To alleviate this problem, pre-processing of a facial image, such as three-dimensional depth analysis, blink detection and/or spectrum sensing, has been proposed for determining whether the recognized facial image is obtained from a live user or from a two-dimensional image such as a user's photo. However, this method imposes strict requirements on the operating environment. In addition, the method cannot differentiate between live users and video containing faces, because faces in video may also exhibit three-dimensional depth information and actions like blinks. Other known methods require that, before face recognition, specific parts of the user (such as hands or eyes) perform predetermined actions, for example moving along a predetermined path. However, since these predetermined actions are relatively fixed, illegal users might record actions performed by legal users during identity authentication and use the recorded video clips to simulate live users. Moreover, these methods require users to remember the predetermined actions, which increases the interaction burden on users. Solutions that use infrared detection and other means to measure human body temperature and thus recognize live users are also known. However, these solutions have to be implemented by means of specialized devices, thereby increasing the complexity and/or cost of face recognition systems.
In view of the foregoing discussion, there is a need in the art for a technical solution capable of recognizing live users more effectively, accurately and conveniently.
To overcome the above problems, the present invention proposes a method and apparatus for live user recognition.
According to one aspect of the present invention, there is provided a method for live user recognition. The method comprises: obtaining an image containing a face; while recognizing the face based on the image, detecting whether gaze of the face moves into a proximity of a random position on a display screen every time an object is displayed at the random position; and determining whether the image is obtained from the live user based on the detection.
According to another aspect of the present invention, there is provided an apparatus for live user recognition. The apparatus comprises: an image obtaining unit configured to obtain an image containing a face; a gaze detecting unit configured to detect, while recognizing the face based on the image, whether gaze of the face moves into a proximity of a random position on a display screen every time an object is displayed at the random position; and a live user recognizing unit configured to determine whether the image is obtained from the live user based on the detection.
It would be appreciated from the following description that, according to embodiments of the present invention, while face recognition is performed on a user, whether a facial image is obtained from a live user can be recognized rapidly and effectively through the movement of the gaze to a random position on the screen. Moreover, according to embodiments of the present invention, illegal users can hardly pretend to be legal users by using facial images and/or video acquired in advance. In addition, since the solution's working principle is based on inherent physiological properties of the human body (such as the stress response), the user burden can be kept at an acceptably low level. The method and apparatus according to embodiments of the present invention can be conveniently implemented using a common computing device, without a need for any dedicated device or instrument, thereby helping to reduce cost.
Through the detailed description with reference to the accompanying drawings, the above and other objects, features and advantages of the present invention will become easier to be understood. In the accompanying drawings, there are shown several embodiments of the present invention in an illustrative manner rather than limiting.
The same or corresponding numerals generally refer to the same or corresponding parts throughout the figures.
Description is presented below of the principles and spirit of the present invention with reference to several embodiments shown in the accompanying drawings. It should be understood that these embodiments are described only for enabling those skilled in the art to better understand and thereby implement the present invention, rather than limiting the scope of the present invention in any manner.
Reference is first made to
System 100 further comprises a display screen (hereinafter referred to as a “screen” for short) 102 for presenting information to the user. According to embodiments of the present invention, screen 102 may be any device capable of presenting visualized information to the user, including without limitation one or more of: a cathode ray tube (CRT) display, a liquid crystal display (LCD), a light-emitting diode (LED) display, a plasma display panel (PDP), a three-dimensional display, a touch display, etc.
Note that although image capture device 101 and display screen 102 are shown as separate devices in
Optionally, system 100 may further comprise one or more sensors 103 for capturing one or more parameters indicative of the state of the environment where the user is located. In some embodiments, sensor 103 may include, for example, one or more suitable sensor types. The parameters captured by sensor 103 are only used to support optional functions in some embodiments; live user recognition does not rely on these parameters. Concrete operations and functions of sensor 103 will be described in detail below. As in the foregoing description, sensor 103 may also be located on the same physical equipment as image capture device 101 and/or display screen 102. For example, in some embodiments, image capture device 101, display screen 102 and sensor 103 may be components of the same user equipment (such as a mobile phone), and they may be coupled to a central processing unit of the user equipment.
Next, method 200 proceeds to step S202. At step S202, while the face is recognized based on the image obtained at step S201, every time an object is displayed at a random position on the screen, it is detected whether or not the face's gaze moves into the proximity of that random position.
During operation, after an image is obtained at step S201, the image may be processed to recognize facial features and information contained in the image. Any face recognition and/or analysis method, whether currently known or developed in the future, may be used in conjunction with embodiments of the present invention, and the scope of the present invention is not limited in this regard. In parallel with face recognition, one or more objects may be displayed to the user on display screen 102, so as to detect whether or not the currently processed image is obtained from a live user. According to embodiments of the present invention, live user detection is implemented concurrently with face recognition. This is because, if the two were not executed concurrently, an illegal user might use a facial photo/video for face recognition and use the face of another (illegal) live user to pass live user recognition. Embodiments of the present invention can effectively discover and eliminate such a phenomenon.
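A minimal sketch of this concurrent execution is given below (in Python, purely for illustration; recognize_face and detect_live_user are hypothetical stand-ins for the face recognition and gaze-based liveness processes, not functions defined by the present disclosure):

```python
import threading

def recognize_face(image_stream):
    """Hypothetical placeholder for any face recognition method."""
    return "user-id"  # stand-in identity result

def detect_live_user(image_stream):
    """Hypothetical placeholder for the gaze-based liveness check."""
    return True  # stand-in liveness result

def authenticate(image_stream):
    """Run face recognition and liveness detection concurrently on the
    same image stream, so that a photo cannot pass one check while a
    different live face passes the other."""
    results = {}
    recognizer = threading.Thread(
        target=lambda: results.update(identity=recognize_face(image_stream)))
    liveness = threading.Thread(
        target=lambda: results.update(live=detect_live_user(image_stream)))
    recognizer.start()
    liveness.start()
    recognizer.join()
    liveness.join()
    return results["live"] and results["identity"] is not None
```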
Still with reference to
According to some embodiments of the present invention, the displayed object may be a bright spot. Alternatively, the displayed object may be text, an icon, a pattern, or any appropriate content that may draw the user's attention. Compared with the background presented by display screen 102, the object may be highlighted in order to draw sufficient attention from the user. For example, the displayed object may differ from the screen background in one or more of the following respects: color, brightness, shape, action (for example, the object may rotate, jitter, zoom, etc.), and the like.
According to embodiments of the present invention, while face detection is implemented, image capture device 101 is configured to continuously capture images containing the user's face. Thus, every time an object is displayed at a random position on the display screen, a gaze tracking process is applied to the series of captured images, so as to detect whether or not the face's gaze moves to the random position where the object is displayed on the screen. A variety of gaze tracking techniques are known, including without limitation: shape-based tracking, feature-based tracking, appearance-based tracking, tracking based on mixed geometric and optical features, etc. For example, solutions have been proposed for human eye recognition and gaze tracking through an active shape model (ASM) or an active appearance model (AAM). In fact, any gaze detection and tracking method that is currently known or to be developed in the future may be used in conjunction with embodiments of the present invention, and the scope of the present invention is not limited in this regard.
In particular, considering possible errors in the gaze detection process, in implementation the user's gaze does not need to match exactly the screen position where an object is displayed. Rather, a predetermined proximity may be set, such as a circular area with a predetermined radius or a polygonal area with predetermined side lengths. In gaze detection, as long as the gaze falls within the predetermined proximity of the object position, it may be determined that the gaze has moved to the screen position of the object.
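A minimal sketch of such a proximity test, assuming a circular proximity and pixel coordinates (the function name and the default radius are illustrative assumptions, not part of the present disclosure):

```python
import math

def gaze_in_proximity(gaze_x, gaze_y, object_x, object_y, radius=50.0):
    """Return True if the estimated gaze point falls within a circular
    proximity of the displayed object's position.

    All coordinates are in screen pixels; the radius is an assumed
    tolerance that absorbs gaze-estimation error.
    """
    return math.hypot(gaze_x - object_x, gaze_y - object_y) <= radius
```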
Returning to
Otherwise, if it is detected at step S202 that the gaze does not move with an object displayed on the screen, then at step S203 it may be determined that the image containing the face is possibly not obtained from a live user. At this point, any appropriate subsequent processing may be performed, e.g., further estimating the risk that the image is obtained from a non-live user, or directly causing the identity authentication process to fail, etc.
Method 200 ends after step S203.
Now reference is made to
It would be appreciated that the gaze in a static image like a photo cannot change, while the probability that the gaze in a video moves exactly to the random position of an object on the screen right after the object is displayed is quite low. Therefore, according to embodiments of the present invention, illegal users can be effectively prevented from passing face recognition-based identity authentication by using facial photos and/or video.
As described above, while performing face recognition, one object may be displayed on the screen 102. Alternatively, a plurality of objects may be sequentially displayed.
As shown in
Next, at step S402, an object is displayed on the display screen while face recognition is performed based on the obtained image. As described above, the displayed object may be, for example, a bright spot, and may differ from the background of display screen 102 in various respects such as color, brightness, shape, action and so on. In particular, the object's display position on the screen is determined at random.
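One possible way to pick the random display position is sketched below (the margin keeping the object fully on screen is an assumed parameter):

```python
import random

def random_object_position(screen_width, screen_height, margin=40):
    """Draw a uniformly random on-screen position for the object.

    The margin is an assumed safety border so that the displayed object
    does not extend past the screen edge.
    """
    x = random.uniform(margin, screen_width - margin)
    y = random.uniform(margin, screen_height - margin)
    return x, y
```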
Then method 400 proceeds to step S403, where it is detected whether or not the gaze of the face under recognition moves into the proximity of the random position on the screen within a predetermined time period in response to the object being displayed at that random position. It would be appreciated that, according to the embodiment discussed herein, in addition to detecting whether the gaze moves into the proximity of the object's position, it is detected whether such movement is completed within a predetermined time period. In other words, a time window may be set for the gaze detection. Only a movement of the gaze to the object's position detected within this time window is considered valid. Otherwise, if the movement occurs beyond the time window, it is considered that there is a risk that the image is obtained from a non-live user, even if the gaze moves into the proximity of the random position of the displayed object.
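A sketch of such a time-windowed detection follows (get_gaze_point is a hypothetical callable returning the current gaze estimate from the captured image stream; the window length, radius and polling rate are assumed values):

```python
import math
import time

def gaze_reaches_object_in_time(get_gaze_point, object_pos,
                                time_window=1.5, radius=50.0):
    """Poll the gaze estimate until the gaze enters the proximity of
    object_pos or the predetermined time window (in seconds) expires."""
    deadline = time.monotonic() + time_window
    while time.monotonic() < deadline:
        gaze_x, gaze_y = get_gaze_point()
        if math.hypot(gaze_x - object_pos[0], gaze_y - object_pos[1]) <= radius:
            return True   # movement completed inside the time window
        time.sleep(0.01)  # poll at roughly 100 Hz
    return False          # movement absent or beyond the window
```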
Owing to the physiological stress response of organisms, when a salient object appears on the screen, a live user will usually “gaze” at the object immediately. Moreover, such a physiological characteristic of organisms can hardly be simulated by manually manipulating facial images or video. Therefore, by detecting whether the gaze moves to the object's position within a sufficiently short time period, the accuracy of live user recognition can be further enhanced.
In order to further reduce the risk of misjudging a non-live user as a live user, the duration for which the object is displayed on the screen may optionally be recorded. When this duration reaches a threshold, at step S404 the object is removed from the screen. With reference to
As shown in
Returning to
At optional step S405, the stay time of the gaze within the proximity of the displayed object's random position is detected. The starting point of the stay time is the instant when the gaze moves into the proximity, while the ending point is the instant when the gaze moves outside the proximity. The detected gaze stay time may be recorded for later live user recognition, which will be described in detail below.
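The stay-time measurement may be sketched under the same assumptions as above (hypothetical get_gaze_point callable; the radius and the overall timeout are assumed values):

```python
import math
import time

def measure_stay_time(get_gaze_point, object_pos, radius=50.0, timeout=3.0):
    """Measure how long the gaze stays inside the proximity of object_pos.

    The stay starts when the gaze first enters the proximity and ends
    when it leaves (or when the assumed overall timeout expires).
    Returns the stay time in seconds, or 0.0 if the proximity was never
    entered.
    """
    deadline = time.monotonic() + timeout
    entered_at = None
    while time.monotonic() < deadline:
        gaze_x, gaze_y = get_gaze_point()
        inside = math.hypot(gaze_x - object_pos[0],
                            gaze_y - object_pos[1]) <= radius
        if inside and entered_at is None:
            entered_at = time.monotonic()         # gaze moved into proximity
        elif not inside and entered_at is not None:
            return time.monotonic() - entered_at  # gaze moved outside again
        time.sleep(0.01)
    return (time.monotonic() - entered_at) if entered_at is not None else 0.0
```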
Method 400 then proceeds to step S406, where it is detected whether the number of displayed objects reaches a predetermined threshold. According to embodiments of the present invention, the threshold may be a preset fixed number. Alternatively, the threshold may be randomly generated every time live user recognition is executed. If it is determined at step S406 that the threshold is not reached (branch “No”), then method 400 proceeds to step S407.
At step S407, at least one parameter indicative of the environmental status (referred to as “environmental parameters” for short) is obtained, and the appearance of a to-be-displayed object is adjusted based on the environmental parameters. The environmental parameters may be obtained by means of one or more sensors 103 shown in
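Purely for illustration, assuming a normalized ambient-light reading from sensor 103 (both the parameter and the mapping are assumptions, since the concrete sensor types are not enumerated here), the adjustment might look like:

```python
def adjust_object_appearance(ambient_light):
    """Adjust the to-be-displayed object's appearance from a hypothetical
    ambient-light reading normalized to [0, 1].

    In a dark environment a dimmer object suffices; in bright light the
    object's brightness is raised to keep it salient against the screen
    background.
    """
    brightness = 0.4 + 0.6 * ambient_light  # assumed linear mapping
    color = (255, 255, 255) if ambient_light > 0.5 else (255, 220, 100)
    return {"brightness": brightness, "color": color}
```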
After step S407, method 400 returns to step S402, where another object (referred to as “a second object”, for example) is displayed with the appearance adjusted at step S407. In particular, according to some embodiments of the present invention, the second object's display position may be set such that it is sufficiently far away from the display position of the previously displayed first object. Specifically, suppose that at a first instant the first object is displayed at a first random position on the screen, and at a subsequent second instant the second object is displayed at a second random position on the screen. The distance from the second random position to the first random position may be made greater than a threshold distance. In implementation, after a candidate display position of the second object is randomly generated, the distance from the candidate display position to the first random position may be calculated. If this distance is greater than the predetermined threshold distance, then the candidate display position is used as the second random position for displaying the second object. Otherwise, if the distance is less than the predetermined threshold distance, another candidate display position of the second object is generated, and the comparison is repeated until the distance from a candidate display position to the first random position is greater than the predetermined threshold distance. Advantageously, ensuring that the distance between the display positions of the two objects is large enough makes the gaze movement more recognizable and further increases the accuracy of live user recognition.
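The rejection loop described above may be sketched as follows (the minimum distance and margin are assumed tuning values):

```python
import math
import random

def next_random_position(previous_pos, screen_width, screen_height,
                         min_distance=200.0, margin=40):
    """Generate a random display position for the next object at least
    min_distance pixels away from the previously used position.

    Candidates are drawn uniformly and re-drawn until the distance
    constraint is satisfied, as described above.
    """
    while True:
        x = random.uniform(margin, screen_width - margin)
        y = random.uniform(margin, screen_height - margin)
        if math.hypot(x - previous_pos[0], y - previous_pos[1]) > min_distance:
            return x, y
```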
At steps S403 to S405, the second object is processed in a way similar to the processing of the first object described above. In particular, still with reference to
At step S406, if it is determined that the predetermined display number is reached (branch “Yes”), then method 400 proceeds to step S408, where it is recognized, based on the detection at step S403 and/or step S405, whether the obtained image comes from a live user. Specifically, regarding any one object displayed on the screen, if it is detected at step S403 that the gaze does not move into the proximity of the random position where the object is located within the predetermined time period, then it is determined that the image might be obtained from a non-live user.
Alternatively or additionally, at step S408 the actual stay time within which the gaze stays inside the proximity of the random position, as obtained at step S405, may be compared with a predetermined threshold stay time. If the actual stay time is greater than the threshold stay time, then the gaze's stay is considered valid. Otherwise, if the actual stay time is less than the threshold stay time, then it is determined that there is a risk that the image is obtained from a non-live user. For convenience of discussion, a non-live user risk (probability) value determined according to the detection on the i-th object is recorded as Pi (i=1, 2, . . . , N, where N is the number of displayed objects). Thereby, a sequence {P1, P2, . . . , PN} consisting of the risk values may be obtained at step S408. Then, according to some embodiments, an accumulated risk value (Σi Pi) that the image is obtained from a non-live user may be calculated. If the accumulated risk value is greater than a threshold accumulated risk value, then it may be determined that the image currently being processed is not obtained from a live user. Alternatively, in other embodiments, each separate risk value Pi may be compared with an individual risk threshold. In that case, as an example, if the number of risk values Pi that exceed the individual risk threshold exceeds a predetermined threshold, then it may be decided that the image currently being processed is not obtained from a live user. Various other processing approaches are also applicable, and the scope of the present invention is not limited in this regard.
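The two decision rules may be sketched as below; the text presents them as alternative embodiments, and all threshold values here are assumed examples:

```python
def non_live_by_accumulated_risk(risk_values, accumulated_threshold=1.5):
    """First rule: the image is judged non-live if the accumulated risk
    sum(Pi) exceeds an assumed threshold."""
    return sum(risk_values) > accumulated_threshold

def non_live_by_individual_risks(risk_values, individual_threshold=0.6,
                                 max_exceedances=1):
    """Alternative rule: the image is judged non-live if too many
    individual values Pi exceed their own threshold."""
    exceedances = sum(1 for p in risk_values if p > individual_threshold)
    return exceedances > max_exceedances
```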
If it is determined at step S408 that the image currently being processed is from a non-live user, various appropriate subsequent processing may be performed. For example, in some embodiments, the user's identity authentication may be rejected directly. Alternatively, further live user recognition may be executed. At this point, for example, the criterion for live user recognition may be tightened accordingly, such as by displaying more objects, shortening the display interval between multiple objects, etc. On the contrary, if it is determined at step S408 that the image currently being processed is from a live user, then the identity authentication is continued based on the result of face recognition. The scope of the present invention is not limited by any subsequent operation performed based on the result of the live user recognition.
Method 400 ends after step S408.
By sequentially displaying a plurality of objects at a plurality of random positions on the screen, the accuracy and reliability of the live user recognition may be further increased. A concrete example is now considered with reference to
With reference to
According to some embodiments, the gaze detecting unit 702 may comprise: a unit configured to detect whether the gaze of the face moves into the proximity of the random position in a predetermined time period after the object is displayed.
According to some embodiments, a first object is displayed at a first random position on the display screen at a first instant; a second object is displayed at a second random position on the display screen at a subsequent second instant, wherein a distance between the first random position and the second random position is greater than a predetermined threshold distance. Moreover, according to some embodiments, before the second instant, the first object is removed from the display screen.
According to some embodiments, the duration for which an object is displayed on the display screen is less than a predetermined threshold time. Alternatively or additionally, according to some embodiments, apparatus 700 may further comprise: a stay time detecting unit (not shown) configured to detect a stay time within which the gaze stays inside the proximity of the random position, so as to determine whether the image is obtained from the live user.
According to some embodiments, apparatus 700 may further comprise: an environmental parameter obtaining unit (not shown) configured to obtain at least one parameter indicative of the environmental state; and an object appearance adjusting unit (not shown) configured to dynamically adjust the object's appearance based on the at least one parameter. Alternatively or additionally, the object differs from the background of the display screen in at least one of the following respects: color, brightness, shape, action.
It should be understood that, for the purpose of clarity,
With reference to
One or more units may further be coupled to bus 804: an input unit 806, including a keyboard, a mouse, a trackball, etc.; an output unit 807, including a display screen, a loudspeaker, etc.; a storage unit 808, including a hard disk, etc.; and a communication unit 809, including a network adapter such as a local area network (LAN) card, a modem, etc. Communication unit 809 performs communication processing via a network such as the Internet. Alternatively or additionally, communication unit 809 may include one or more antennas for wireless data and/or voice communication. Optionally, a drive 810 may be coupled to I/O unit 805, on which a removable medium 811 may be mounted, such as an optical disk, a magneto-optical disk, a semiconductor storage medium, etc.
In particular, when the methods and processes according to embodiments of the present invention are implemented in software, a computer program constituting the software may be downloaded and installed from a network via communication unit 809 and/or installed from removable medium 811.
For the purpose of illustration only, several exemplary embodiments of the present invention have been described above. Embodiments of the present invention can be implemented in software, hardware or a combination of software and hardware. The hardware portion can be implemented using dedicated logic; the software portion can be stored in a memory and executed by an appropriate instruction executing system, such as a microprocessor or specially designed hardware. Those of ordinary skill in the art will appreciate that the above system and method can be implemented using computer-executable instructions and/or by being embodied in processor-controlled code, which is provided on carrier media such as a magnetic disk, CD or DVD-ROM, programmable memories such as a read-only memory (firmware), or data carriers such as an optical or electronic signal carrier. The system of the present invention can be embodied in semiconductors such as very large scale integrated circuits or gate arrays, logic chips and transistors, in hardware circuitry of programmable hardware devices such as field programmable gate arrays and programmable logic devices, in software executable by various types of processors, or in a combination of the above hardware circuits and software, such as firmware.
Note that although several means or sub-means of the system have been mentioned in the above detailed description, such division is merely exemplary and not mandatory. In fact, according to embodiments of the present invention, the features and functions of two or more means described above may be embodied in one means. Conversely, the features and functions of one means described above may be embodied by a plurality of means. In addition, although operations of the method of the present invention are depicted in the accompanying drawings in a specific order, this does not require or suggest that these operations must be executed in that specific order, or that the desired result can only be achieved by executing all illustrated operations. On the contrary, the steps depicted in the flowcharts may change their execution order. Additionally or alternatively, some steps may be omitted, a plurality of steps may be combined into one step for execution, and/or one step may be decomposed into a plurality of steps for execution.
Although the present invention has been described with reference to several embodiments, it is to be understood that the present invention is not limited to the embodiments disclosed herein. The present invention is intended to embrace various modifications and equivalent arrangements within the spirit and scope of the appended claims. The scope of the appended claims accords with the broadest interpretation, thereby embracing all such modifications and equivalent structures and functions.
Priority application:
Number | Date | Country | Kind
201310193848.6 | May 2013 | CN | national

PCT filing:
Filing Document | Filing Date | Country | Kind
PCT/FI2014/050352 | 5/13/2014 | WO | 00