Field
Various devices and systems may benefit from convenient authentication. For example, certain mobile devices may benefit from attribute-based continuous user authentication.
Description of the Related Art
Advances in communication and sensing technologies have led to an exponential growth in the use of mobile devices such as smartphones and tablets. Mobile devices are becoming increasingly popular due to their flexibility and convenience in managing personal information. Indeed, mobile devices, such as cellphones, tablets, and smart watches have become inseparable parts of people's lives.
Traditional methods for authenticating users on mobile devices are based on passwords, personal identification numbers (PINs), secret patterns, or fingerprints. As long as the mobile phone remains active, typical devices incorporate no mechanism to verify that the user originally authenticated is still the user in control of the device. Thus, unauthorized individuals may improperly obtain access to personal information of the user if a password is compromised or if the user does not exercise adequate vigilance after initial authentication on a device.
Users often store important information, such as bank account details or credentials for accessing sensitive accounts, on their mobile phones. Moreover, nearly half of users do not use any form of authentication mechanism on their phones because of the frustration caused by these methods. Even when they do, as mentioned above, the initial password-based authentication can be compromised and thus cannot continuously protect the personal information of the users.
According to certain embodiments, a method can include determining attributes of an authorized user of a mobile device. The method can also include obtaining an unconstrained image of a current user of the mobile device. The method can further include processing the unconstrained image to determine at least one characteristic of the current user. The method can additionally include making an authorization determination based on a comparison between the attributes and the determined characteristic.
In certain embodiments, an apparatus can include at least one processor and at least one memory including computer program code. The at least one memory and the computer program code can be configured to, with the at least one processor, cause the apparatus at least to determine attributes of an authorized user of a mobile device. The at least one memory and the computer program code can also be configured to, with the at least one processor, cause the apparatus at least to obtain an unconstrained image of a current user of the mobile device. The at least one memory and the computer program code can further be configured to, with the at least one processor, cause the apparatus at least to process the unconstrained image to determine at least one characteristic of the current user. The at least one memory and the computer program code can additionally be configured to, with the at least one processor, cause the apparatus at least to make an authorization determination based on a comparison between the attributes and the determined characteristic.
An apparatus, according to certain embodiments, can include means for determining attributes of an authorized user of a mobile device. The apparatus can also include means for obtaining an unconstrained image of a current user of the mobile device. The apparatus can further include means for processing the unconstrained image to determine at least one characteristic of the current user. The apparatus can additionally include means for making an authorization determination based on a comparison between the attributes and the determined characteristic.
A non-transitory computer readable medium can be encoded with instructions that, when executed in hardware, perform a process. The process can include determining attributes of an authorized user of a mobile device. The process can also include obtaining an unconstrained image of a current user of the mobile device. The process can further include processing the unconstrained image to determine at least one characteristic of the current user. The process can additionally include making an authorization determination based on a comparison between the attributes and the determined characteristic.
A computer program product can encode instructions for performing a process. The process can include determining attributes of an authorized user of a mobile device. The process can also include obtaining an unconstrained image of a current user of the mobile device. The process can further include processing the unconstrained image to determine at least one characteristic of the current user. The process can additionally include making an authorization determination based on a comparison between the attributes and the determined characteristic.
For proper understanding of the invention, reference should be made to the accompanying drawings, wherein:
Certain embodiments provide a method of using facial attributes for continuous authentication of smartphone users. The binary attribute classifiers can be trained using, for example, a PubFig dataset, and can provide compact visual descriptions of faces. The learned classifiers can be applied to the image of the current user of a mobile device to extract the attributes. Authentication can be performed by comparing the acquired attributes with the enrolled attributes of the original user. Certain embodiments applied to unconstrained mobile face video datasets can capture meaningful attributes of faces and perform better than previously proposed local binary pattern (LBP)-based authentication methods.
For example, a deep convolutional neural network (DCNN) architecture can be provided for the task of continuous authentication on mobile devices. To deal with the limited resources of these devices or for other reasons such as speed, the complexity of the networks can be reduced by learning intermediate features such as gender and hair color instead of identities.
A multi-task, part-based DCNN architecture can be used for attribute detection and can perform better than the conventional methods, in terms of accuracy.
Each attribute classifier Cli ∈ {Cl1, . . . , ClN} can be trained by an automatic procedure of model selection for each attribute Ai ∈ {A1, . . . , AN}, where N is the total number of attributes. Automatic selection can be used as each attribute may need a different model. Models can be indexed in various ways.
For each attribute, a set of different facial parts or components can be more discriminative. The face components considered for training can include eyes, nose, mouth, hair, eyes&nose, mouth&nose, eyes&nose&mouth, eyes&eyebrows, and the full face. In total, nine different face components can be considered in certain embodiments.
For different attributes, different types of features may be needed. For example, for the attribute “blond hair,” features related to color can be more discriminative than features related to texture. In certain embodiments, four types of features may be used, including local binary patterns (LBP), color LBP, histogram of oriented gradients (HOG), and color HOG.
In order to capture local information regarding the locality of the features, different cell sizes of the HOG and the LBP features can be considered. In total, six different cell sizes, 6, 8, 12, 16, 24, 32, can be used in certain embodiments.
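The candidate model space implied by the component, feature, and cell-size choices above can be enumerated as follows. This is a minimal illustrative sketch; the string labels are descriptive names only, not identifiers from any particular implementation.

```python
from itertools import product

# Nine face components, four feature types, and six cell sizes, as
# described above, yield the full space of candidate models per attribute.
components = ["eyes", "nose", "mouth", "hair", "eyes&nose", "mouth&nose",
              "eyes&nose&mouth", "eyes&eyebrows", "full face"]
feature_types = ["LBP", "color LBP", "HOG", "color HOG"]
cell_sizes = [6, 8, 12, 16, 24, 32]

candidate_models = list(product(components, feature_types, cell_sizes))
print(len(candidate_models))  # 216 candidate models per attribute
```

One classifier would then be trained for each of these 216 combinations, with the best-performing models retained during model selection.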
Any available fiducial point detection method can be used to extract the different facial components. Furthermore, the detected landmarks can also be used to align the faces to a canonical coordinate system. After extracting each set of features, principal component analysis (PCA) can be used with 99% of the energy to project each feature onto a low-dimensional subspace. A support vector machine (SVM) with the radial basis function (RBF) kernel can then be learned on these features. This process can be run exhaustively to train all possible models. For each attribute classifier, most of the available data can be used for training the SVMs and the remaining data can be used for model selection. The face images in the test set do not need to overlap with those in the training set. The total numbers of negative and positive examples can be the same for both training and testing. Finally, among all 216 SVMs (nine face components, four feature types, and six cell sizes), the five with the best accuracies can be selected.
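The PCA step described above can be sketched as follows, assuming numpy. The 99% energy criterion determines how many principal components to keep; the projected features would then be fed to the RBF-kernel SVM. This is an illustrative sketch, not the original implementation.

```python
import numpy as np

def pca_project(features, energy=0.99):
    """Project feature vectors onto the subspace that captures `energy`
    (e.g. 99%) of the variance, as in the PCA step described above."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Singular values of the centered data give per-component variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2
    # Smallest k such that the top-k components retain `energy` of the variance.
    k = int(np.searchsorted(np.cumsum(var) / var.sum(), energy)) + 1
    basis = vt[:k]                      # top-k principal directions
    return centered @ basis.T, basis, mean
```

The returned basis and mean would be stored so that features extracted at authentication time can be projected into the same subspace.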
Continuous authentication can be treated as a verification problem: determining whether a given pair of videos or images corresponds to the same person or not. The receiver operating characteristic (ROC) curve, which describes the relation between false acceptance rates (FARs) and true acceptance rates (TARs), can be used to evaluate the performance of verification algorithms. As the TAR increases, so does the FAR.
Therefore, one would expect an ideal verification framework to have TARs all equal to 1 for any FAR. The ROC curves can be computed given a similarity matrix.
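The computation of (FAR, TAR) pairs from a similarity matrix can be sketched as follows, assuming numpy. The function and parameter names are illustrative.

```python
import numpy as np

def roc_points(similarity, same_person, thresholds):
    """Compute (FAR, TAR) pairs from a similarity matrix.

    similarity: score for each probe/gallery pair (higher = more similar)
    same_person: boolean array, True where a pair is a genuine match
    """
    similarity = np.asarray(similarity, dtype=float)
    same_person = np.asarray(same_person, dtype=bool)
    fars, tars = [], []
    for t in thresholds:
        accepted = similarity >= t
        tars.append(accepted[same_person].mean())    # true acceptance rate
        fars.append(accepted[~same_person].mean())   # false acceptance rate
    return np.array(fars), np.array(tars)
```

Sweeping the threshold over the range of observed scores traces out the full ROC curve.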
Certain embodiments can extract an attribute vector from each image in a given video. The vectors can then be averaged to obtain a single attribute vector that represents the entire video.
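The frame-averaging step can be written as a one-line reduction, assuming numpy; a sketch for illustration only.

```python
import numpy as np

def video_attribute_vector(frame_vectors):
    """Collapse per-frame attribute vectors into a single descriptor for
    the whole video by simple averaging, as described above."""
    return np.mean(np.asarray(frame_vectors, dtype=float), axis=0)
```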
As shown in
The enrolled attributes can be a specific set, such as chubby, beard, mustache, blond, eyeglasses, and male, as shown at 250. This may be a subset of all possible attributes, such as attributes that are particularly easy to detect or otherwise good discriminators between the authenticated user and other users.
During use, the images taken at 210 can be provided to an efficient deep part-based attribute detection network. These can extract a set of attributes at 230, which can be the same set or a different set from the enrolled set of attributes.
At 260, a comparison between enrolled and more recently extracted attributes can be performed. If the attributes match, access can be continued. Otherwise, the phone or other mobile device can be locked. The match does not have to be a precise match. For example, as shown in
An individual or collective threshold can be applied to the attributes. Authentication, therefore, can depend on the threshold or thresholds being met to a predetermined degree. In the particular example of
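The thresholded comparison between enrolled and extracted attributes can be sketched as follows. The tolerance and the required number of matching attributes are illustrative parameters, not values from the original work.

```python
import numpy as np

def authorize(enrolled, current, tolerance=0.3, min_matches=None):
    """Accept the current user if enough attribute scores fall within a
    per-attribute tolerance of the enrolled scores; otherwise the device
    can be locked."""
    enrolled = np.asarray(enrolled, dtype=float)
    current = np.asarray(current, dtype=float)
    matches = np.abs(enrolled - current) <= tolerance
    required = len(enrolled) if min_matches is None else min_matches
    return int(matches.sum()) >= required
```

Setting `min_matches` below the attribute count implements a collective threshold that tolerates occasional misdetections of individual attributes.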
Four sets of models based on these two architectures can include BinaryDeep-CNNAA and BinaryWide-CNNAA, which are single task networks, as well as MultiDeep-CNNAA and MultiWide-CNNAA, which are multi-task networks. While these are example networks and models that can be used, other networks and models are permitted.
Care can be taken when training these networks to ensure that classes with more available training data do not unduly influence the results. Thus, the training of the networks can be manipulated by adding in distorted versions of rarer cases, so that the number of images from each class is approximately equal.
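The balancing strategy above can be sketched as follows, using a horizontal flip plus small noise as a stand-in for the distortions mentioned; the function name and distortion choice are illustrative assumptions.

```python
import numpy as np

def balance_with_distortions(images, labels, rng=None):
    """Oversample rarer classes with lightly distorted copies until every
    class is represented approximately equally in the training set."""
    if rng is None:
        rng = np.random.default_rng(0)
    images, labels = list(images), list(labels)
    classes = sorted(set(labels))
    counts = {c: labels.count(c) for c in classes}
    target = max(counts.values())
    for c in classes:
        idx = [i for i, lab in enumerate(labels) if lab == c]
        for j in range(target - counts[c]):
            src = images[idx[j % len(idx)]]
            # Horizontal flip plus small Gaussian noise as a simple distortion.
            distorted = np.fliplr(src) + rng.normal(0.0, 0.01, src.shape)
            images.append(distorted)
            labels.append(c)
    return images, labels
```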
Attributes, or semantic features, can be used in a variety of ways, including activity recognition in video and face verification. Improving the accuracy of attribute classifiers can be an important first step in any application which uses these attributes. Attributes are typically considered to be independent in conventional usage of attributes.
However, many attributes are very strongly positively related, such as heavy makeup and wearing lipstick or very strongly negatively related such as being female and having a full beard. Attribute relationships can be exploited in, for example, three ways: by using a multi-task deep convolutional neural network (MCNN) sharing the lowest layers amongst all attributes, sharing the higher layers for related attributes, and by building an auxiliary network on top of the MCNN which utilizes the scores from all attributes to improve the final classification of each attribute.
Attributes are mid-level representations that can be used for the recognition of activities, objects, and people. Attributes can provide an abstraction between the low-level features and the high-level labels. Attributes can be used in face recognition and verification. In the face recognition domain, attributes can include gender, race, age, hair color, facial hair, and so on. These semantic features can be very intuitive, and can allow for much more understandable descriptions of objects, people, and activities.
Reliable estimation of facial attributes can be useful for many different tasks. Human computer interaction (HCI) applications may require information about gender in order to properly greet a user, such as Mr. or Ms., and other attributes such as expression in order to determine the mood of the user. Facial attributes can be used for identity verification in low quality imagery, where other verification methods may fail. Suspects are often described in terms of attributes, so attributes can be used to automatically search for suspects in surveillance video. Attributes can also be used to search a database of images very quickly, in both image search and retrieval.
Convolutional neural networks (CNNs) have replaced most traditional methods for feature extraction in many computer vision problems. They can be effective in attribute classification as well. However, as mentioned above, attributes have generally been treated as independent from each other. A simple example shows that they are not: if a subject is wearing lipstick and earrings, the probability that the subject is a woman is much higher than if the subject did not exhibit those attributes, and the reverse is also true. Treating each attribute as independent may fail to use the valuable information provided by the other attributes. Attributes can fit nicely into a multi-task learning framework, where multiple problems can be solved jointly using shared information.
In certain embodiments, therefore, a multi-task deep CNN (MCNN) with an auxiliary network (MCNN-AUX) on top can be applied in order to utilize information provided by all attributes in three ways: by sharing the lower layers of the MCNN for all attributes, by sharing the higher layers for similar attributes, and by utilizing all attribute scores from the MCNN in an auxiliary network in order to improve the recognition of individual attributes.
Multi-task learning (MTL) can be a way of solving several problems at the same time utilizing shared information. MTL has found success in the domains of facial landmark localization, pose estimation, action recognition, face detection, as well as other areas.
The groups can be chosen, as in this example, according to attribute location. Some groupings can be separated from others and some can be absorbed into others depending on the desired results. For example, if male is kept separate from all other attributes the gender results may not be as good as with sharing, but the performance of the other attributes may be improved. A compromise may be, for example, to include male in the shared Conv1 and Conv2 layers and then to have separate Conv3, FC1, and FC2 layers.
If an independent CNN were used for each attribute following the architecture of one path in the MCNN, 3 convolutional layers and 3 fully connected layers, each CNN would have over 1.6 million parameters. So, for all 40 attributes, there would be over 64 million parameters. Using MCNN, this can be reduced to less than 15 million parameters, over four times fewer.
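The parameter-count comparison above can be checked with simple arithmetic; the figures are the approximate counts stated in the text.

```python
# Back-of-the-envelope check of the parameter savings from sharing layers.
params_per_single_cnn = 1_600_000              # one 3-conv + 3-FC path
independent_total = 40 * params_per_single_cnn  # 40 separate CNNs: 64 million
mcnn_total = 15_000_000                         # shared lower layers: under 15M
print(independent_total, independent_total / mcnn_total)
```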
After training the MCNN, a fully connected layer, AUX, can be connected after the output of the trained MCNN. Starting with the weights from the trained MCNN, the weights for the AUX portion of the network can be learned, keeping the weights from the MCNN constant. The AUX layer can allow for interactions amongst attributes at the score level. The MCNN-AUX network can learn the relationship amongst attribute scores in order to improve overall classification accuracy for each attribute.
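The AUX layer acts as a fully connected layer over the 40 attribute scores, letting each final score depend on all the others. A minimal sketch, assuming numpy, with the learned weights simply passed in (in training they would be fit while the MCNN weights are held constant):

```python
import numpy as np

def aux_refine(attribute_scores, weights, bias):
    """Auxiliary fully connected layer: refine each attribute score using
    the scores of all attributes jointly, at the score level."""
    return np.asarray(attribute_scores, dtype=float) @ weights + bias
```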
The method can also include, at 620, obtaining an unconstrained image of a current user of the mobile device. This unconstrained still or video image can be obtained automatically, for example, using a camera of the mobile device that points toward the user of the device. The obtained image can be obtained periodically or when triggered by an event such as opening a new application or an application with a particular security setting or classification. For example, accessing an application that has access to personal data of the user may trigger obtaining the image, even if the user had previously and recently authenticated.
The method can further include, at 630, processing the unconstrained image to determine at least one characteristic of the current user. The at least one characteristic can include a plurality of characteristics, and the authorization determination can be based on a correlation between or among the plurality of characteristics, as described above, for example, with reference to
In the case where the image is a video image, the method can further include, at 632, extracting an attribute vector from each image in the video. The method can also include, at 634, averaging the attribute vectors to obtain a single attribute vector representative of the video. In the case of a still image, an attribute vector can be similarly obtained and can simply be used without averaging.
The method can additionally include, at 640, making an authorization determination based on a comparison between the attributes and the determined characteristic. The authorization determination can include determining whether a level of confidence exceeds a threshold. The authorization determination can be made without determining the identity of the current user. For example, in certain cases confirming that the gender, the chubbiness, and the eyeglasses condition of the current user match those of the authorized user may be enough to indicate that the current user is authorized, even though such details do not uniquely identify the user.
The method can also include, at 650, taking some further action based on the authorization determination, such as locking the device, logging the apparent lack of authorization, and/or reporting the apparent lack of authorization. The obtained image can be stored or forwarded. In certain cases, the image can be used to update the enrolled attributes of the current user.
Each of these devices may include at least one processor or control unit or module, respectively indicated as 714 and 724. At least one memory may be provided in each device, and indicated as 715 and 725, respectively. The memory may include computer program instructions or computer code contained therein, for example for carrying out the embodiments described above. One or more transceivers 716 and 726 may be provided, and each device may also include an antenna, respectively illustrated as 717 and 727. Other configurations of these devices, for example, may be provided. For example, server 710 and UE 720 may be additionally configured for wired communication, in addition to wireless communication, and in such a case antennas 717 and 727 may illustrate any form of communication hardware, without being limited to merely an antenna.
Transceivers 716 and 726 may each, independently, be a transmitter, a receiver, or both a transmitter and a receiver, or a unit or device that may be configured both for transmission and reception. The transmitter and/or receiver (as far as radio parts are concerned) may also be implemented as a remote radio head which is not located in the device itself, but in a mast, for example. One or more functionalities may also be implemented as a virtual application that is provided as software that can run on a server.
A user device or user equipment 720 may be a mobile station (MS) such as a mobile phone or smart phone or multimedia device, a vehicle, a computer, such as a tablet, provided with wireless communication capabilities, a personal digital assistant (PDA) provided with wireless communication capabilities, a portable media player, a digital camera, a pocket video camera, a navigation unit provided with wireless communication capabilities, or any combination thereof.
In an exemplifying embodiment, an apparatus, such as a node or user device, may include means for carrying out embodiments described above in relation to
Processors 714 and 724 may be embodied by any computational or data processing device, such as a central processing unit (CPU), digital signal processor (DSP), application specific integrated circuit (ASIC), programmable logic devices (PLDs), field programmable gate arrays (FPGAs), digitally enhanced circuits, or comparable device or a combination thereof. The processors may be implemented as a single controller, or a plurality of controllers or processors. Additionally, the processors may be implemented as a pool of processors in a local configuration, in a cloud configuration, or in a combination thereof.
For firmware or software, the implementation may include modules or units of at least one chip set (e.g., procedures, functions, and so on). Memories 715 and 725 may independently be any suitable storage device, such as a non-transitory computer-readable medium. A hard disk drive (HDD), random access memory (RAM), flash memory, or other suitable memory may be used. The memories may be combined on a single integrated circuit with the processor, or may be separate therefrom. Furthermore, the computer program instructions stored in the memory, which may be processed by the processors, can be in any suitable form of computer program code, for example, a compiled or interpreted computer program written in any suitable programming language. The memory or data storage entity is typically internal but may also be external or a combination thereof, such as in the case when additional memory capacity is obtained from a service provider. The memory may be fixed or removable.
The memory and the computer program instructions may be configured, with the processor for the particular device, to cause a hardware apparatus such as server 710 and/or UE 720, to perform any of the processes described above (see, for example,
Furthermore, although
One having ordinary skill in the art will readily understand that the invention as discussed above may be practiced with steps in a different order, and/or with hardware elements in configurations which are different than those which are disclosed. Therefore, although the invention has been described based upon these preferred embodiments, it would be apparent to those of skill in the art that certain modifications, variations, and alternative constructions would be apparent, while remaining within the spirit and scope of the invention.
This application is a non-provisional of, and claims the benefit and priority of, U.S. Provisional Patent Application No. 62/194,603 filed Jul. 20, 2015, “Attribute-based Continuous User Authentication on Mobile Devices,” the entirety of which is hereby incorporated herein by reference.
This invention was made with government support under FA87501320279 awarded by AFRL. The government has certain rights in the invention.