Labeling large collections of images can be a daunting and time-consuming task. While some tools exist that provide a user with an interface for applying labels to a repository of images, these existing tools lack an intuitive and interactive interface that recognizes and makes use of the uncertainty in image recognition software and overcomes user interface limitations inherent in mobile or touch-screen devices. Particularly on mobile devices, screen space tends to be limited and the use of menus is often cumbersome and inefficient. Further, with respect to face labeling, conventional methods do not utilize the full knowledge computed by a facial recognition engine. Instead, prior art methods may base decisions on a strict threshold of confidence for determining whether unlabeled faces are presented to a user with a suggested label. For example, a conventional face labeling system may be somewhat confident in a face match, but not confident enough to display the faces as a suggested match. In such a case, the conventional face labeling system provides no indication of partial confidence in a face match. Accordingly, a conventional face labeling system is typically either too conservative or too liberal in providing face label suggestions. With a conservative setting, a conventional face labeling system will only suggest labels and faces which are highly likely to be a match. This approach results in substantial work for the user, as the system will not display many suggested faces and the user will need to manually label many faces. With a liberal setting, a conventional face labeling system will display labels and faces that have a lower likelihood of being a match. This approach may result in frustration for the user, as the user will be required to correct any mistakes made by the conventional face labeling system.
An image labeling system is disclosed, which provides a method for labeling a collection of images, and which makes use of gesture recognition and different levels of confidence within image recognition engines in order to intuitively present labeling results. For example, the image labeling system may provide a mechanism for a user to label images that appear in a collection of digital images. The image labeling system may display one or more labeled images within a user interface, where each one of the one or more labeled images is displayed within a different region of a user interface of the image labeling system. Also within the user interface, the image labeling system displays one or more unlabeled images, where an unlabeled image is displayed in a location within the user interface that is dependent on similarities between the unlabeled image and one or more of the labeled images. The location in which an unlabeled image is displayed may further depend on the levels of uncertainty resulting from an image recognition computation. The image labeling system may also receive user input defining a gesture, and determine a labeling task from among a plurality of labeling tasks based on one or more characteristics of the gesture.
While the invention is described herein by way of example for several embodiments and illustrative drawings, those skilled in the art will recognize that the invention is not limited to the embodiments or drawings described. It should be understood that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents and alternatives falling within the spirit and scope of the present invention. The headings used herein are for organizational purposes only and are not meant to be used to limit the scope of the description. As used throughout this application, the word “may” is used in a permissive sense (meaning having the potential to), rather than the mandatory sense (meaning must). Similarly, the words “include”, “including”, and “includes” mean including, but not limited to.
Various embodiments of an image labeling system allow a user to label collections of images using gestures. In some embodiments, the image labeling system may also provide the user with visual feedback on which images are being selected and indications of how the selected images are going to be labeled. As used throughout this application, a gesture may be any input received from a user indicative of motion in two-dimensional or three-dimensional space and the gesture may be defined or further modified based on elapsed time during the gesture.
Without the use of menus or traditional input devices such as keyboards or a mouse, it can be difficult to label a large quantity of images using traditional image labeling tools. However, in some embodiments, the image labeling system overcomes this problem and operates without using menus or traditional input devices. The image labeling system allows a user to label images on a touch-screen device by interpreting gestures to determine both which images in a collection of images are being selected and which label to apply to the selected images.
In some embodiments, the image labeling system may be implemented on a camera or a mobile device with a camera. In such an embodiment, during the camera preview period or while reviewing previously taken images, a user may see a given image within the user interface of the image labeling system, and in response to a gesture on the touch-sensitive screen, the image may be labeled according to any of the gesture recognition commands given below.
In some embodiments, a social media website may incorporate an implementation of the image labeling system to allow a user to quickly label, or apply tags to, collections of images. In other embodiments, the image labeling system may be a stand-alone application that may be downloaded and installed onto a touch-sensitive device. In other embodiments, an implementation of the image labeling system may be built into the operating system of the touch-sensitive device and may be usable to label collections of images stored on the touch-sensitive device.
Various embodiments of a system and methods for labeling a collection of images are described below. For simplicity, embodiments of the system and methods for labeling a collection of images described will be referred to collectively as an image labeling system. For example purposes, embodiments of the image labeling system will be described as a system for labeling faces in digital images. Note that the example of labeling faces in digital images is not meant to be limiting, as other embodiments of the image labeling system may assign labels to images based on image content other than faces.
In some embodiments, the image labeling system may provide a semi-automated mechanism through which a user may assign labels to all of the images in a collection of digital images. For example, the image labeling system may enable the user to label all of the image elements (e.g., faces, animals, beaches) that appear in each image of the digital image collection. A label, or a “tag,” that is assigned to a face may be a person's name or may otherwise identify the person. Labels that are assigned to faces in a digital image may be associated with the image. For example, the face labels may be included in metadata for the image. A digital image may include several faces, or various combinations of image elements, where each image element may have a different label. Accordingly, each face label may include information which identifies a particular face in the image that corresponds to the face label. For example, the face label may include coordinate information which may specify the location of the corresponding face in the digital image. Other labels that may be assigned to images via the image labeling system may identify content other than faces that is contained within the images. For example, the image labels may identify a particular event, location or scene that is contained in the content of the image. In such a case, for example, for an image of a beach, a snowy or rainy backdrop, or an activity such as skiing, the generated metadata may be associated with the entirety of the image, rather than with any particular coordinate or component element of the image.
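By way of illustration, a per-face label carrying coordinate information, alongside a whole-image scene label, might be structured as in the following sketch; the field names and layout are hypothetical and are not prescribed by the embodiments described herein.

```python
# Hypothetical metadata layout; the field names are illustrative and are
# not prescribed by the embodiments described herein.

def face_label(name, x, y, w, h):
    """A label tied to a specific region of the image via coordinates."""
    return {"label": name, "region": {"x": x, "y": y, "w": w, "h": h}}

def scene_label(description):
    """A label (e.g., 'beach', 'skiing') applying to the whole image."""
    return {"label": description, "region": None}

image_metadata = {
    "filename": "party_012.jpg",
    "labels": [
        face_label("Ellen", 120, 40, 64, 64),  # identifies one face
        scene_label("beach"),                  # describes the entire image
    ],
}
```

A face label thus retains a coordinate region, while a scene label leaves the region empty to signal that it applies to the image as a whole.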
In some embodiments, a user may apply labels to images in a digital image collection for a variety of reasons. For example, the image labels may enable efficient search and retrieval of images with particular content from a large image collection. As another example, image labels may enable efficient and accurate sorting of the images according to image content. Examples of digital images that may be included in a digital image collection may include, but are not limited to, images captured with a digital camera, photographs scanned into a computer system, and video frames extracted from a digital video sequence. A collection of digital images may be a set of digital photographs organized as a digital photo album, a set of video frames which represent a digital video sequence, or any set of digital images which are organized or grouped together. Digital images may also be visual representations of other types of electronic items. For example, the digital images may be visual representations of electronic documents, such as word processing documents, spreadsheets, and/or portable document format (PDF) files. The image labeling system may be operable to enable a user to label a collection of any sort of electronic items which may be visually represented.
In some embodiments, the image labeling system may provide a semi-automatic mechanism through which a user may efficiently assign labels to all of the images in a set of images. The system may automatically display, in a display area, unlabeled image elements that are likely to be similar to displayed, labeled image elements. The image labeling system may indicate a likelihood of similarity between an unlabeled image element and a labeled image element via the spatial proximity of the unlabeled image element to the labeled image element. For example, an unlabeled image element that is more likely to be similar to a labeled image element may be displayed closer, in spatial proximity, to the labeled image element. A user may provide a manual input which may indicate labels for the unlabeled image elements.
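The proximity-based placement described above can be sketched as follows; placing an unlabeled element at the similarity-weighted average of the positions of the displayed, labeled anchors is one illustrative realization, not the only one.

```python
def display_position(anchors, similarities):
    """anchors: (x, y) positions of the displayed, labeled elements.
    similarities: similarity metric of one unlabeled element to each anchor.
    The returned position is pulled toward the anchors the element most
    resembles, so higher similarity means closer spatial proximity."""
    total = sum(similarities)
    if total == 0:
        # No resemblance to any anchor: fall back to the display centroid.
        n = len(anchors)
        return (sum(x for x, _ in anchors) / n, sum(y for _, y in anchors) / n)
    x = sum(s * ax for s, (ax, _) in zip(similarities, anchors)) / total
    y = sum(s * ay for s, (_, ay) in zip(similarities, anchors)) / total
    return (x, y)
```

For example, an element similar only to the anchor in one corner lands on that corner, while an element equally similar to all anchors lands in the center of the display area.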
In some embodiments, the image labeling system may maintain the same context (e.g., a same view in a same display area) as image elements are labeled. The display of unlabeled image elements in the display area may be continuously updated as a user labels image elements, but the context of the display area may not be changed throughout the image element labeling process. Accordingly, the user does not have to lose context or navigate through multiple windows while labeling a set of images.
In some embodiments, the image labeling system may analyze image content to determine image elements that are likely to be similar. The system may occasionally make mistakes when determining similar image elements, due to obscured content, poor quality images, or for other reasons. However, any mistakes that the image labeling system makes may be unobtrusive to the user. For example, a user may simply ignore unlabeled image elements in a display area that do not match any of the labeled image elements in the display area. The non-matched unlabeled image elements may eventually be removed from the display area as the image labeling system continuously updates the display of unlabeled image elements in response to labels that are received from the user. For example, as a user labels images, the image recognition engine may recalculate the similarity of an unlabeled image with a higher degree of confidence, and if the unlabeled image has become increasingly dissimilar, it may eventually fall beneath the threshold at which images are chosen for display within the user interface labeling area.
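A minimal sketch of this continuous update, assuming a recomputable similarity function and a fixed display threshold (both illustrative), might look like the following.

```python
DISPLAY_THRESHOLD = 0.5  # illustrative cutoff for remaining on screen

def refresh_display(unlabeled, similarity, labeled):
    """After each new user label, re-score every unlabeled element against
    the labeled set; elements whose best similarity falls below the
    threshold are dropped from the display area."""
    kept = []
    for element in unlabeled:
        best = max(similarity(element, lab) for lab in labeled)
        if best >= DISPLAY_THRESHOLD:
            kept.append(element)
    return kept
```

In this way a non-matching element is never flagged as an error; it simply falls out of the display as the scores are recomputed.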
In some embodiments, the user input which indicates labels for unlabeled images may serve two purposes for the image labeling system. The user input may enable the image labeling system to assign labels to unlabeled images in order to meet the goal of the system to label all of the images in a set of images. Furthermore, the user input may serve as training information which may assist the image labeling system in making more accurate estimations of similar image elements. The image labeling system may use the user-assigned labels as additional criteria when comparing image elements to determine whether the image elements are similar. The image labeling system may continuously receive training feedback as a user applies labels to images and may use the training feedback to increase the accuracy of determining similar image elements. Accordingly, the accuracy of the display of the unlabeled image elements may steadily improve as the image element labeling process progresses, which may, in turn, increase efficiencies for the user of the system.
Various embodiments of a system and method for labeling a collection of images are described. In the following detailed description, numerous specific details are set forth to provide a thorough understanding of claimed subject matter. However, it will be understood by those skilled in the art that claimed subject matter may be practiced without these specific details. In other instances, methods, apparatus or systems that would be known by one of ordinary skill have not been described in detail so as not to obscure claimed subject matter.
Some portions of the detailed description may be presented in terms of algorithms or symbolic representations of operations on binary digital signals stored within a memory of a specific apparatus or special purpose computing device or platform. In the context of this particular specification, the term specific apparatus or the like includes a general purpose computer once it is programmed to perform particular functions pursuant to instructions from program software. Algorithmic descriptions or symbolic representations are examples of techniques used by those of ordinary skill in the signal processing or related arts to convey the substance of their work to others skilled in the art. An algorithm is here, and is generally, considered to be a self-consistent sequence of operations or similar signal processing leading to a desired result. In this context, operations or processing involve physical manipulation of physical quantities. Typically, although not necessarily, such quantities may take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared or otherwise manipulated. It has proven convenient at times, principally for reasons of common usage, to refer to such signals as bits, data, values, elements, symbols, characters, terms, numbers, numerals or the like. It should be understood, however, that all of these or similar terms are to be associated with appropriate physical quantities and are merely convenient labels. Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout this specification discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining” or the like refer to actions or processes of a specific apparatus, such as a special purpose computer or a similar special purpose electronic computing device. 
In the context of this specification, therefore, a special purpose computer or a similar special purpose electronic computing device is capable of manipulating or transforming signals, typically represented as physical electronic or magnetic quantities within memories, registers, or other information storage devices, transmission devices, or display devices of the special purpose computer or similar special purpose electronic computing device.
In some embodiments, the image labeling system may analyze a collection of digital images to detect all image elements that appear in each image of the collection and may provide a mechanism for a user to assign labels to each one of the detected image elements. For example, a single image may include multiple people, and in such a case, a single image element may correspond to a single face within the image.
Embodiments of the method for labeling detected image elements in a collection of digital images may be implemented, for example, in an image labeling module 100, as depicted within
Image labeling module 100 may receive as input a collection of digital images, such as digital image collection 130 illustrated in
Image element detector 112 of module 100 may analyze digital image collection 130 to detect all of the image elements that appear in the images of digital image collection 130. As an example, image element detector 112 may be a face detector. The face detector may, in various embodiments, use various algorithms to detect the faces which appear in digital image collection 130. Such algorithms may include, for example, facial pattern recognition as implemented in algorithms such as Eigenfaces, Adaboost classifier training algorithms, and neural network-based face detection algorithms. In other embodiments, the image labeling system may be operable to detect image content other than, or in addition to, faces. For example, the image labeling system may be operable to detect content such as a particular scene or location in a collection of digital images.
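As a toy illustration of the detection step, the following sketch scans an image with a fixed-size sliding window and records regions accepted by a pluggable classifier; a production detector would substitute an Eigenfaces, Adaboost-cascade, or neural-network classifier for the simple callable assumed here.

```python
def detect_elements(image, classifier, window=24, stride=12):
    """Toy sliding-window detector: scan the image (a 2-D list of pixel
    intensities) with a fixed-size window and keep every region the
    classifier accepts, as (x, y, w, h) tuples. The classifier callable
    stands in for a real face detector."""
    height, width = len(image), len(image[0])
    hits = []
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            patch = [row[x:x + window] for row in image[y:y + window]]
            if classifier(patch):
                hits.append((x, y, window, window))
    return hits
```

The returned (x, y, w, h) regions correspond to the per-face coordinate information that image element detector 112 would pass downstream.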
Similarity engine 114 of module 100 may analyze the set of image elements detected with image element detector 112 to locate image elements that are likely to be the same image content. As an example, similarity engine 114 may be a face recognition engine that may analyze a set of faces detected in digital image collection 130. The face recognition engine may determine faces that are likely to belong to the same person. The face recognition engine may compare the facial characteristics for each pair of faces in the set of detected faces.
In addition to facial characteristics, the face recognition engine may compare visual and non-visual, contextual characteristics that are associated with faces in the set of detected faces. Examples of such visual and non-visual, contextual characteristics may be clothing features, hair features, image labels, and/or image time stamps. A label that is assigned to a face may indicate particular traits that may be useful in determining whether two faces belong to the same person. For example, a label that is assigned to a face may indicate a gender, race and/or age of the person representative of the face. Dependent on the facial characteristics and/or the contextual characteristics, the face recognition engine may compute a similarity metric for each pair of faces. The similarity metric may be a value which indicates a probability that the pair of faces belong to the same person. Similarity engine 114 may be configured to analyze other types of detected image elements (e.g., landscapes) to determine similar characteristics between the detected image elements. Dependent on the analysis, similarity engine 114 may calculate a similarity metric for each pair of detected image elements.
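The combination of facial and contextual characteristics into a single similarity metric might be sketched as a weighted sum, as below; the cue names, the normalization of cue values to [0, 1], and the weights are all illustrative assumptions rather than part of the disclosed system.

```python
def similarity_metric(element_a, element_b, weights=None):
    """Fold facial and contextual cues into one probability-like score in
    [0, 1]. Each element is a dict of cue values normalized to [0, 1];
    the cue names and weights are illustrative only."""
    weights = weights or {"facial": 0.6, "clothing": 0.2, "hair": 0.1, "time": 0.1}
    score = 0.0
    for cue, weight in weights.items():
        a, b = element_a.get(cue), element_b.get(cue)
        if a is None or b is None:
            continue  # a missing contextual cue simply contributes nothing
        score += weight * (1.0 - abs(a - b))  # closer cue values score higher
    return score
```

Because missing cues contribute nothing, a pair of faces lacking, say, clothing information can still be compared on the remaining cues, at a correspondingly lower maximum score.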
Dependent on the similarity metrics calculated by similarity engine 114, image labeling module 100 may display a subset of the image elements to a user. For example, display module 116 may select, dependent on the similarity metrics calculated by similarity engine 114, a subset of the detected image elements to display for a user. The image elements may be displayed, for example, in user interface 110. Display module 116 may display, in user interface 110, for example, a combination of image elements which have labels (e.g., labeled image elements) and image elements which do not have labels (e.g., unlabeled image elements).
Display module 116 may determine a display location, within user interface 110, for each unlabeled image element dependent on similarity metrics between the displayed, unlabeled image elements and the displayed, labeled image elements. For example, as described in further detail below, the spatial proximity of each displayed, unlabeled image element to a displayed, labeled image element may indicate the probability that the two image elements contain the same content. As an example, image labeling module 100 may display labeled and unlabeled faces in user interface 110. The spatial proximity of the unlabeled faces to the labeled faces in the display area of user interface 110 may indicate the probability that the unlabeled and labeled faces belong to the same person.
User interface 110 may provide a mechanism which a user may use to indicate image elements which contain the same content. User interface 110 may provide one or more textual and/or graphical user interface elements, modes or techniques via which a user may interact with module 100, for example to specify, select, or change the value for one or more labels identifying one or more image elements in digital image collection 130. For example, using a selection mechanism provided from user interface 110, a user, via user input 120, may indicate unlabeled faces that belong to the same person as a labeled face.
The image labeling system may be used with any type of computing input device via which a user may select displayed image elements and assign and/or change labels for displayed image elements. For example, the image labeling system may include a conventional input pointing device, such as a mouse. As another example, the image labeling system may include a stylus input applied to a tablet PC. As yet another example, the image labeling system may include a touch-sensitive device configured to interpret touch gestures that are applied to the surface of the touch-sensitive device. As an alternative, the image labeling system may include an input device that is configured to sense gestural motions in two-dimensional or three-dimensional space. An example of such an input device may be a surface that is configured to sense non-contact gestures that are performed while hovering over the surface, rather than directly contacting the surface. User interface 110 may provide various selection tools, for example, a rectangular selection box, a brush tool, and/or a lasso tool, via which a user may use any of the input mechanisms described above to select one or more images displayed in user interface 110.
Dependent on the user input, image labeling module 100 may assign labels to the unlabeled image elements selected, or indicated, by the user via user input 120. For example, image labeler 118 may assign labels to unlabeled image elements, dependent on the user input. As an example, image labeler 118 may assign labels to unlabeled faces, dependent on the user input. In some embodiments, the labels may be tags assigned to the images in which the labeled image elements are depicted. The labels may be stored in association with the images, for example, as part of the image metadata. Module 100 may generate as output a labeled digital image collection 140, with each face, or other image content, in the collection associated with a label. Labeled digital image collection 140 may, for example, be stored to a storage medium 150, such as system memory, a disk drive, DVD, CD, etc., and/or displayed on a display 160.
The images of digital image collection 130 may include various image elements which a user may wish to identify with labels. For example, digital image collection 130 may include images of various people which a user may wish to identify with a label assignment for each person. Labeling each person that appears in digital image collection 130 may allow a user to perform future searches to locate a particular person or persons within the digital image collection. For example, a user may wish to perform a search of the digital image collection in order to locate all images which contain a person labeled as “Ellen.” Since facial characteristics may be a convenient mechanism for recognizing a person in an image, people in digital images may be identified according to their faces. Similarly, a label which identifies a person in a digital image may be associated with the person's face in the image. Accordingly, the labels referred to herein may be labels associated with faces in a collection of digital images. A label associated with a face in a digital image may typically be the name of the person in the digital image, although other types of labels are possible. For example, a label may be a description that identifies a person as part of a particular group (e.g., “family” or “classmate”).
As described above, image labeling module 100 may receive a digital image collection 130. Image element detector 112 may perform an analysis of the images in digital image collection 130 to detect all of the faces that appear in digital image collection 130. To detect faces that appear in a digital image, image element detector 112 may identify regions or portions of the digital image that may correspond to a face depicted in the digital image. In various embodiments, various techniques may be used by image element detector 112 to identify such regions or portions of a digital image that may correspond to a face. Some example techniques that may be employed by image element detector 112 may include, but are not limited to, facial patterns defined according to Eigenfaces, Adaboost classifier training algorithms, and neural network-based face detection algorithms.
Image labeling module 100 may implement the method illustrated in
The labeled faces that are displayed by face display module 116 may be a subset of the detected faces in digital image collection 130. A user, via user interface 110 of module 100, may assign labels to the subset of the faces in digital image collection 130. The initial user input which assigns labels to a subset of faces in the digital image collection may provide an initial set of labeled faces which the image labeling system may use to begin the image labeling process. In some embodiments, the user may select a desired number of the detected faces and may provide user input which assigns a label to each selected face. In other embodiments, image labeling module 100 may provide guidance, and/or instructions, to the user for labeling a subset of the detected faces. For example, image labeling module 100, via user interface 110, may instruct the user to select and label a certain number of the detected faces in digital image collection 130. In such an example, image labeling module 100 may request that the user assign labels to a particular number, or a particular percentage, of the detected faces in digital image collection 130.
In other embodiments, image labeling module 100 may select a subset of the faces detected in digital image collection 130 and may request that the user assign a label to each face in the selected subset of faces. Similarity engine 114 may calculate a similarity metric for each pair of detected faces in digital image collection 130. The similarity metric for a pair of faces may correspond to a measure of similarity between the faces. In some embodiments, image labeling module 100 may select the initial subset of faces to be labeled by the user dependent on the similarity metrics calculated by similarity engine 114. For example, dependent on the similarity metrics, similarity engine 114 may form groups of similar faces. From each group of similar faces, similarity engine 114 may select a representative face. Image labeling module 100 may display some, or all, of the representative faces to the user and may request that the user assign a label to each one of the representative faces. Some or all of the faces which have been labeled according to user input may be displayed by image labeling module 100 in user interface 110.
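One simple, illustrative way to form groups of similar faces and pick a representative from each is a greedy pass over the detected faces, sketched below; the threshold is an assumption, and a real system may use more sophisticated clustering.

```python
def representatives(faces, similarity, threshold=0.7):
    """Greedy grouping: each face joins the first group whose representative
    it resembles above `threshold`, otherwise it starts a new group; the
    first member of each group serves as its representative."""
    groups = []
    for face in faces:
        for group in groups:
            if similarity(face, group[0]) >= threshold:
                group.append(face)
                break
        else:
            # No existing group was similar enough: start a new one.
            groups.append([face])
    return [group[0] for group in groups]
```

The returned representatives are the faces the module would present for initial labeling, one per apparent person.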
An example of labeled faces that may be displayed in user interface 110 is illustrated in
Other embodiments may display a different number of labeled image elements, may use different portions of a display region, and/or may use a display region of a different shape. For example, instead of displaying four different labeled faces in the four corners of a rectangular display area, as illustrated in region 310 of
In some embodiments, image labeling module 100 may automatically select the labeled faces that are displayed in a display region. As an example, image labeling module 100 may arbitrarily select four faces from the subset of labeled faces for display in region 310. As another example, image labeling module 100 may display a number of representative images from groups of similar images that have been formed dependent on the similarity metrics, as described above. In other embodiments, a user may select the labeled faces which may be displayed in region 310 of user interface 110. As an example,
The image labeling system, in various embodiments, may use a variety of different methods to detect image elements in digital images. The image labeling system, in various embodiments, may also use a variety of different methods to determine similarities between image elements and calculate similarity metrics for pairs of image elements. As an example, image element detector 112 and similarity engine 114 may detect faces and calculate similarity metrics for pairs of the detected faces using a method similar to that described in U.S. patent application Ser. No. 12/857,351 entitled “System and Method for Using Contextual Features to Improve Face Recognition in Digital Images,” filed Aug. 16, 2010, the content of which is incorporated by reference herein in its entirety.
As indicated at 210, the method illustrated in
Face display module 116 may display up to a maximum number of unlabeled faces, M, in display region 310. The maximum number of faces, M, may be determined such that the display area is not cluttered with too many unlabeled faces. For example, display module 116 may calculate M, based on the size of display region 310, such that a certain amount of open space remains in display region 310 when M unlabeled faces are displayed in display region 310. In other embodiments, a number of maximum faces, M, may be selected according to user input, for example, via an options or preferences menu within user interface 110.
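The calculation of the maximum number M, based on display-region size and a desired amount of open space, could be sketched as follows; the thumbnail size and open-space fraction are illustrative parameters, not values prescribed by the system.

```python
def max_displayed_faces(region_width, region_height, face_px=64, open_fraction=0.4):
    """Cap the number of displayed unlabeled faces so that roughly
    `open_fraction` of the display region stays empty; the thumbnail
    edge length and open-space fraction are illustrative defaults."""
    usable_area = region_width * region_height * (1.0 - open_fraction)
    # At least one face is always shown, even in a tiny region.
    return max(1, int(usable_area // (face_px * face_px)))
```

Either default could equally be exposed through the options or preferences menu mentioned above.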
Face display module 116 may select up to M unlabeled faces from the set of unlabeled faces for display in display region 310. The selection of up to M unlabeled faces may be dependent on the displayed, labeled faces and may also be dependent on the similarity metrics calculated by similarity engine 114. Face display module 116 may use the similarity metrics to determine the M unlabeled faces that are most similar to the displayed, labeled faces. For example, in reference to
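The top-M selection described above might be sketched as follows, ranking each unlabeled face by its best similarity metric against any displayed, labeled face; the ranking rule is one illustrative choice.

```python
def select_for_display(unlabeled, labeled, similarity, m):
    """Rank each unlabeled face by its best similarity to any displayed,
    labeled face, and keep the top m for display."""
    scored = [(max(similarity(u, lab) for lab in labeled), u) for u in unlabeled]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [face for _, face in scored[:m]]
```

Faces outside the top m remain in the unlabeled pool and may surface later as labels accumulate and the metrics are recomputed.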
The display position of an unlabeled face may be dependent on the similarities (e.g., the similarity metrics) between the unlabeled face and the displayed, labeled faces. More specifically, as described in further detail below, the spatial proximity of an unlabeled face in display region 310 to a labeled face in display region 310 may indicate the likelihood that the two faces belong to the same person. For example, an unlabeled face and a labeled face that are displayed in close spatial proximity are very likely to be faces that belong to the same person.
In some embodiments, the display size of an unlabeled face may also be dependent on the similarities (e.g., the similarity metrics) between the unlabeled face and the displayed, labeled faces. For example, the display size of an unlabeled face may indicate the likelihood that the unlabeled face belongs to the same person as a labeled face. For instance, an unlabeled face that is more likely to be the same person as a labeled face may be displayed in a larger size than an unlabeled face that is less likely to be the same person as a labeled face.
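A straightforward sketch of similarity-dependent sizing is a linear interpolation between a minimum and maximum thumbnail size; the pixel bounds here are illustrative assumptions.

```python
def display_size(similarity, min_px=32, max_px=96):
    """Linearly scale a thumbnail's edge length with the face's best
    similarity metric (in [0, 1]); the pixel bounds are illustrative."""
    return int(min_px + similarity * (max_px - min_px))
```

A likely match thus renders near the maximum size while a marginal one renders near the minimum, giving the user a visual cue of the engine's partial confidence.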
In other embodiments, the image labeling system may use other criteria to select and display unlabeled images in the display area. As an example, the image labeling system may place male faces on one side of the display area and female faces on the other side of the display area. As another example, the image labeling system may place faces in the display area based on criteria such as race or age. In yet another example, the image labeling system may place faces in the display area based on time and/or location (e.g., geo-tag) information for the images which depict the faces. The criteria for placing unlabeled images in a display area may be determined according to user input via user interface 110. For example, the user may wish to label all of the faces of people who attended a particular party or event. The image labeling system may use an image labeling method similar to that described above for a particular set of images which have timestamps within a specified time period, for example, a range of time over which the particular party or event took place.
As indicated at 220, the method illustrated in
As a specific example, a user may select a labeled face in the display region. Subsequent to the selection of the labeled face, the user may further select one or more unlabeled faces in the display region. The user selection of the one or more unlabeled faces may indicate that the label for the selected face should be applied to the selected one or more unlabeled faces. In some embodiments, a user may select a labeled face in a corner and may select one or more unlabeled faces which should receive the same label through the process of “painting over” the unlabeled faces. The image labeling system may provide various mechanisms and/or tools via which a user may select a group of unlabeled faces. For example, a user may use a rectangle selection tool, a brush tool, and/or a lasso tool to select the one or more unlabeled faces.
Note that the examples of selecting one or more unlabeled faces via a rectangle selection tool, a brush selection tool, and a lasso selection tool are provided merely as examples and are not meant to be limiting. User interface 110 may provide a variety of mechanisms through which a user may select unlabeled faces. For example, a user may simply select the unlabeled faces by clicking on the display of each unlabeled face. Further note that, in some embodiments, the user may directly select the labeled face before selecting the one or more unlabeled faces, as described above. As an alternative embodiment, the labeled face may be automatically selected in response to the user selection of one or more unlabeled faces. For example, a labeled face which corresponds to (e.g., is most similar to) one or more selected, unlabeled faces may be automatically selected in response to the selection of the one or more unlabeled faces.
Receiving user input which indicates labels to be assigned to image elements in a collection of images may enable the image labeling system to 1) apply labels to the image elements and 2) receive training input which may allow the image labeling system to more accurately calculate similarity metrics between pairs of faces within the collection of images. The labels that are assigned to image elements may indicate additional characteristics for the image elements. For example, a label that is assigned to a face may indicate a gender, race, and/or age for the face. Accordingly, the image labeling system may use the assigned labels to more accurately determine similar faces in a set of detected faces. As described in further detail below, the image labeling system may use the user-assigned labels to recalculate similarity metrics for pairs of image elements in the collection of images. Since the recalculated similarity metrics may have the benefit of additional data (e.g., the newly applied labels), the recalculated similarity metrics may more accurately represent the similarities between pairs of faces.
As indicated at 230, the method illustrated in
As indicated at 240, the method illustrated in
The image labeling system may, as described in further detail below, recalculate similarity metrics for each pair of unlabeled image elements. The recalculated similarity metrics may be dependent on the newly assigned labels, and, thus, may be more accurate than the previously calculated similarity metrics. The image labeling system may select the new set of unlabeled image elements for display dependent on the recalculated similarity metrics. Accordingly, the updated set of unlabeled image elements that is displayed may more accurately match the displayed, labeled faces than a previously displayed set of unlabeled image elements.
The image labeling system may repeat blocks 200 through 240 of
In other embodiments, the image labeling system may identify the labeled image elements which have the highest number of similar, unlabeled image elements remaining in the set of detected image elements. The image labeling system may automatically add one or more of these identified labeled faces to the display area. As an alternative, the image labeling system may request that the user select one of the identified labeled image elements. As yet another example, at any point during the execution of the image labeling process, a user may manually replace a labeled image element in the display area with another labeled image element. The user may replace a labeled image element by selecting a new image element, for example from column 320, and dragging the new image element to the area occupied by the existing labeled image element. In response to the user input, the image labeling system may replace the existing labeled image element with the new labeled image element. The system may also update the display of unlabeled image elements to include
As noted above, the image labeling system may operate to label documents, including documents that have not been rendered into images. In the case that the document includes an image, the image labeling system may operate similarly to the examples described above for images.
However, the image labeling system may also operate on documents in any native, non-image based format, such as a Microsoft™ Word™ document, any text-based document, or any document that may include text, where the documents may exist in a local or network file system. In such a case, a user may initiate the image labeling system and enter a search term, for example, “January presentation.” Given such a search term, many documents may be found, including documents that may not be relevant to the January presentation the user had in mind. Within the image labeling system user interface, each of the found documents may have a corresponding preview image displayed in a location on the user interface such that the proximity of the preview image to a label area is based on the relevance determined by the search engine. In this example, a corner of the display region may be assigned a label that has yet to be associated with any particular document or image.
In some cases where there are a large number of search results, only a selection of the best matches may be displayed at a time, and more may be displayed as documents are labeled and removed from the preview area. For example, if the user has created a label area in a top left corner of the display region, the documents that the search engine has determined are most relevant are displayed in a location very near the top left corner. Similarly, documents that the search engine has determined are less relevant, yet still match the search term to some degree, are displayed in a location farther away. For example, if a document only partially matches the “January presentation” search term on the basis of including the word “jan”, the document may be placed farther away. In this way, documents with a high degree of relevance based on the search engine results may be placed closer to a given label area, and/or displayed with a larger sized preview image.
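The mapping from search-engine relevance to placement distance described above might be sketched as follows. The function name and the pixel range are hypothetical; the disclosure only specifies that more relevant documents appear closer to the label area:

```python
def preview_distance(relevance, max_distance=500.0):
    """Map a search-engine relevance score in [0, 1] to a display distance
    (in pixels) from the label corner: higher relevance yields closer
    placement (illustrative sketch)."""
    relevance = min(max(relevance, 0.0), 1.0)  # clamp to a valid range
    return (1.0 - relevance) * max_distance
```

Under this sketch, a perfect match is placed at the label corner itself, while a document matching only on the word “jan” (low relevance) would be placed near the far edge of the allowed range.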
Further, each of the gestures or user inputs supported by the image labeling system may apply similarly to labeling documents. In other words, a document may be flicked toward the label, lassoed, selected according to a drawn path, or selected with a rectangle tool using mouse input. Similarly, multiple labels may be used at a time, such as “January Presentation” in the top left corner and “America Invents Act” in another corner of the user interface.
In one embodiment, to assign multiple labels to a document without the document disappearing from the display region after a first label assignment, a labeled image may be dragged onto the unlabeled image in the display region. In response, the unlabeled image is assigned the label corresponding to the labeled image while remaining in the display region. This process may be repeated to assign any number of labels to a given image element. To finally remove the image element from the display region, the image element may be dragged onto a labeled image, and after this assignment the image element may be removed from the display region. In addition to text-based documents, this process for assigning multiple labels to an image element works similarly with the graphical, or non-text based, images described throughout this application.
In the case that the image labeling system has been embedded within a network application connected to the Internet, such as a browser, an Internet search engine may be used to perform a search to retrieve documents for labeling. In such a case, the results may be presented as preview images of the web site or content site returned by the search engine. Once a user has labeled one or more of the search results, the user may then save the labeling results, for example, with the creation of a bookmark category for a set of labeled results. In the case that the search engine is directed to return image results, the user may also save a labeled set of images at a specified location or storage device, for example within a folder with a name corresponding to the label.
As described above, in reference to block 210 of
As illustrated in
Display module 116 may retrieve a similarity metric for each possible pair of an unlabeled face and a displayed, labeled face. Display module 116 may sort the unlabeled faces dependent on the retrieved similarity metrics. More specifically, display module 116 may sort the unlabeled faces such that unlabeled faces at the top of the sorted list have the highest probability of matching the displayed, labeled faces. Display module 116 may select the top M (e.g., the maximum number of unlabeled faces) number of unlabeled faces from the sorted list for display in the display area. Accordingly, display module 116 may have a higher probability of displaying unlabeled faces that are likely to match the displayed, labeled faces.
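The sorting and top-M selection performed by display module 116 might be sketched as follows. The `similarity` callable stands in for the metrics retrieved from similarity engine 114, and the face identifiers are hypothetical:

```python
def select_top_m(unlabeled, labeled, similarity, m):
    """Sort unlabeled faces by their best similarity to any displayed,
    labeled face, and return the top M (illustrative sketch).

    `similarity(u, l)` is assumed to return a score in [0, 1], where a
    higher score indicates a higher probability of a match.
    """
    # score each unlabeled face by its strongest match among labeled faces
    scored = [(max(similarity(u, l) for l in labeled), u) for u in unlabeled]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [u for _, u in scored[:m]]
```

For example, with labeled faces A and B and three unlabeled faces whose best scores are 0.9, 0.5, and 0.8, selecting the top two would return the 0.9 and 0.8 faces, in that order.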
As indicated at 910, the method illustrated in
Display module 116 may calculate a distance between the unlabeled face and one or more of the displayed, labeled faces, dependent on the similarity metrics for the unlabeled face and the one or more displayed, labeled faces. The calculated distance may be a spatial proximity, in the display region, between the unlabeled face and the one or more displayed, labeled faces. The spatial proximity, in the display region, of an unlabeled face to a labeled face may indicate a probability that the faces belong to the same person. For example, a closer spatial proximity between an unlabeled face and a labeled face indicates a higher probability that the faces belong to the same person. Locating unlabeled faces, which are a likely match to a labeled face, in close spatial proximity to the labeled face may enable a user to easily select the unlabeled faces. For example, as illustrated in
In some embodiments, display module 116 may determine a spatial proximity between an unlabeled face and a displayed, labeled face that is most similar to the unlabeled face. The spatial proximity may be a distance value that may be determined dependent on the similarity metric between the unlabeled face and the displayed, labeled face that is most similar to the unlabeled face. Display module 116 may use various methods to convert the similarity metric to a distance value. For example, display module 116 may linearly interpolate the similarity metric between the labeled face and the unlabeled face. The distance value may be inversely proportional to the similarity metric. For example, a higher probability similarity metric may result in a smaller distance value. From the distance value, display module 116 may determine a coordinate position, within the display region, for the unlabeled face. The determined coordinate position may specify a display position for the unlabeled face that is equivalent to the determined distance value between the unlabeled face and the labeled face. Accordingly, the spatial proximity of the unlabeled face to the labeled face may indicate the probability that the faces belong to the same person.
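The linear interpolation from a similarity metric to a distance value might be sketched as follows. The minimum and maximum pixel distances are hypothetical parameters, not values from the disclosure:

```python
def similarity_to_distance(similarity, min_dist=40.0, max_dist=400.0):
    """Linearly interpolate a similarity metric in [0, 1] to a display
    distance: the distance is inversely related to the similarity, so a
    higher-probability metric yields a smaller distance (illustrative)."""
    similarity = min(max(similarity, 0.0), 1.0)  # clamp to a valid range
    return max_dist - similarity * (max_dist - min_dist)
```

Under this sketch, a near-certain match (similarity 1.0) is placed at the minimum distance from the labeled face, and a very unlikely match (similarity 0.0) at the maximum distance.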
In other embodiments, display module 116 may determine spatial proximities between an unlabeled face and all of the displayed, labeled faces. The spatial proximities may be distance values that may be determined dependent on the similarity metrics between the unlabeled face and all of the displayed, labeled faces. For example, display module 116 may convert each similarity metric to a distance value based on a linear interpolation of the similarity metrics. Similarly as described above, each distance value may be inversely proportional to a respective similarity metric. For example, a higher probability similarity metric may result in a smaller distance value. From the distance values, display module 116 may determine a coordinate position, within the display region, for the unlabeled face. The determined coordinate position may be the coordinate position that best satisfies each of the distance values between the unlabeled face and each of the displayed, labeled faces. Accordingly, the spatial proximity of the unlabeled face to each one of the displayed, labeled faces may indicate the probability that the faces belong to the same person.
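Finding the coordinate position that best satisfies the distance values to all of the displayed, labeled faces is a least-squares problem. The disclosure does not specify a solver; the gradient-descent sketch below is one hypothetical way to compute such a position:

```python
import math

def best_position(anchors, target_dists, steps=2000, lr=0.05):
    """Find a coordinate (x, y) whose distances to the labeled-face anchor
    positions best match the target distance values, by minimizing the sum
    of squared distance errors with gradient descent (illustrative sketch)."""
    # start at the centroid of the anchor positions
    x = sum(a[0] for a in anchors) / len(anchors)
    y = sum(a[1] for a in anchors) / len(anchors)
    for _ in range(steps):
        gx = gy = 0.0
        for (ax, ay), td in zip(anchors, target_dists):
            dx, dy = x - ax, y - ay
            d = math.hypot(dx, dy) or 1e-9  # avoid division by zero
            err = d - td                    # signed distance error
            gx += err * dx / d
            gy += err * dy / d
        x -= lr * gx
        y -= lr * gy
    return x, y
```

For example, with labeled faces at (0, 0) and (10, 0) and a target distance of 5 to each, the sketch places the unlabeled face at the midpoint (5, 0), which satisfies both constraints exactly.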
As indicated at 920, the method illustrated in
Display module 116 may calculate a display size for an unlabeled face dependent on the similarity metric between the unlabeled face and the most similar displayed, labeled face. The display size of an unlabeled face may indicate a probability that the unlabeled face and a closest displayed, labeled face belong to the same person. Display module 116 may convert the similarity metric for the unlabeled face and labeled face pair to a size scale. Display module 116 may determine the size of the display of the unlabeled face dependent on the size scale. As an example, for a similarity metric that indicates a probability above a threshold value (e.g., 70% probability that two faces belong to a same person), display module 116 may enlarge the display of the unlabeled face. As another example, for a similarity metric that indicates a probability below a threshold value (e.g., 30% probability that two faces belong to a same person), display module 116 may reduce the display of the unlabeled face. Accordingly, larger unlabeled face displays indicate higher probabilities that the unlabeled faces are a match to a corresponding displayed, labeled face.
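The conversion from a similarity metric to a size scale might be sketched as follows, using the 70% and 30% example thresholds from the text. The scale factors themselves are hypothetical:

```python
def display_scale(similarity, high=0.7, low=0.3):
    """Map a match probability to a display size scale: enlarge likely
    matches and reduce unlikely ones (thresholds and scale factors are
    example values, not specified by the disclosure)."""
    if similarity >= high:
        return 1.5   # enlarge the display of the unlabeled face
    if similarity <= low:
        return 0.5   # reduce the display of the unlabeled face
    return 1.0       # display at the default size
```

Accordingly, an unlabeled face with an 80% match probability would be drawn at 1.5 times the default thumbnail size, while a 20% match would be drawn at half size.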
As described above, the image labeling system may receive additional training information each time a user labels a face and, thus, may be able to provide more accurate displays of unlabeled faces. Accordingly, it may be beneficial for the image labeling system to receive user feedback (e.g., labels) on high probability faces as early as possible in the face labeling process in order to gain additional data for lower probability faces. Based on the user feedback, the system may be able to improve the probabilities of the lower probability faces, and, therefore, may be able to provide more accurate displays of unlabeled faces. Accordingly, it may be beneficial to the efficiency of the image labeling system to call a user's attention to high probability faces in order to encourage the user to provide labels for such faces early in the face labeling process.
In other embodiments, the image labeling system may use different characteristics to visually indicate similarities between labeled and unlabeled image elements. As an example, the image labeling system may use just spatial proximity to indicate similarities between labeled and unlabeled image elements. As another example, the image labeling system may use spatial proximity in addition to other characteristics that may direct a user's attention to high probability image elements. For example, the image labeling system may display high probability image elements in highlighted colors or as distinctive shapes.
As indicated at 930, the method illustrated in
The image labeling system may specify a maximum amount of overlap that may be acceptable for the unlabeled image elements in the display region. For example, the image labeling system may specify that a maximum of 15% of an unlabeled image element may be covered with another, overlapping unlabeled image element. The maximum amount of acceptable overlap for unlabeled image elements may also be a parameter that is configurable with user input via user options or preferences in user interface 110. Display module 116 may adjust the display positions of the unlabeled image elements such that any overlap between unlabeled image elements is below the maximum specified amount of overlap. For example, display module 116 may adjust the display positions of a set of unlabeled faces to minimize overlap between the displays of the unlabeled faces.
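The overlap check against the 15% threshold mentioned above might be sketched as follows, assuming each thumbnail is represented by an axis-aligned rectangle. The rectangle representation and function name are hypothetical:

```python
def overlap_fraction(a, b):
    """Fraction of rectangle `a` covered by rectangle `b`, where each
    rectangle is given as (x, y, width, height) (illustrative sketch)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    # width and height of the intersection, clamped at zero
    ox = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    oy = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    return (ox * oy) / (aw * ah)

def overlap_acceptable(a, b, max_fraction=0.15):
    """True if the overlap between two thumbnails is within the maximum
    specified amount of overlap (15% in the example from the text)."""
    return overlap_fraction(a, b) <= max_fraction
```

Display module 116 could apply such a check to every pair of displayed unlabeled faces and nudge positions until all pairs pass.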
Display module 116 may use the particle system to determine a display position for each unlabeled face such that the display of the unlabeled image elements satisfies the criteria for maximum allowable overlap between unlabeled image elements. The particle system may determine the display locations dependent on the determined display size for each of the unlabeled faces and dependent on the desired spatial proximity between the unlabeled faces and the displayed, labeled faces. As described above, distance values (e.g., spatial proximities) between each unlabeled face and each labeled face may be determined based on a linear interpolation of the similarity metrics between the unlabeled face and the displayed labeled faces. Display module 116 may use the desired distance values between unlabeled and labeled faces and the display size of each unlabeled face as inputs to the particle system. The particle system may determine a display position for each unlabeled image element that best satisfies the criteria for distance values, display size and maximum amount of overlap.
Dependent on the distance values described above, each unlabeled face may have an optimal display location in the display area. The optimal display location may position the unlabeled face in the display area such that desired spatial proximities between the unlabeled face and one or more of the displayed, labeled faces are optimally satisfied. The particle system may assign, to each unlabeled face, an attractive force which may act to pull the unlabeled face toward the optimal display location for the unlabeled face. The particle system may assign, to each pair of unlabeled faces, a repulsive force which may act to minimize overlap between the displays of the unlabeled faces. For example, a repulsive force between a pair of unlabeled faces may be zero if the unlabeled faces do not overlap. However, if the unlabeled faces are moved such that they begin to overlap, the strength of the repulsive force may rapidly increase. The display location of an unlabeled face may be determined by computing a display location that results in an equilibrium status between the attractive forces and repulsive forces for the unlabeled face. One example of such a particle system is described in U.S. Pat. No. 7,123,269 entitled “Creating and Manipulating Related Vector Objects in an Image,” filed Jun. 21, 2002, the content of which is incorporated by reference herein in its entirety.
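The attractive and repulsive forces described above can be illustrated with a minimal force-directed relaxation. This sketch is not the particle system of the incorporated patent; the force constants, circular-thumbnail approximation, and iteration count are all hypothetical:

```python
import math

def relax_layout(positions, targets, sizes, steps=200, k_attract=0.1, k_repel=0.5):
    """Iteratively move each unlabeled face toward its optimal display
    location (attractive force) while pushing apart overlapping pairs
    (repulsive force), approximating faces as circles of diameter `sizes[i]`.
    Returns the relaxed (x, y) positions (illustrative sketch)."""
    pos = [list(p) for p in positions]
    for _ in range(steps):
        # attractive force: pull each face toward its optimal location
        for p, t in zip(pos, targets):
            p[0] += k_attract * (t[0] - p[0])
            p[1] += k_attract * (t[1] - p[1])
        # repulsive force: nonzero only when a pair of faces overlaps
        for i in range(len(pos)):
            for j in range(i + 1, len(pos)):
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = math.hypot(dx, dy) or 1e-9
                min_dist = (sizes[i] + sizes[j]) / 2.0
                if dist < min_dist:
                    push = k_repel * (min_dist - dist) / dist
                    pos[i][0] -= push * dx; pos[i][1] -= push * dy
                    pos[j][0] += push * dx; pos[j][1] += push * dy
    return [tuple(p) for p in pos]
```

For two faces attracted to the same optimal location, the relaxation settles at an equilibrium in which the faces sit side by side, just touching rather than overlapping.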
As indicated at 940, the method illustrated in
As described above, in reference to block 240 of
As indicated at 1010, the method illustrated in
As indicated at 1020, the method illustrated in
As indicated at 1030, the method illustrated in
The image labeling system, in various embodiments, may provide a number of user interface and/or control elements for a user. As an example, a user may be unsure of the identity of a particular unlabeled face that is displayed in the display area. Image labeling module 100 may provide, via user interface 110, a mechanism via which the user may view the source image that corresponds to the unlabeled face. As an example, the user may right-click on the display of the unlabeled face and the system may display the source image for the unlabeled face. In some embodiments, the system may overlay the source image on the display of unlabeled faces.
The image labeling system may also provide a mechanism through which a user may assign a label directly to a particular unlabeled image. The label that a user may assign directly to the particular unlabeled image may be a new label that the user has not previously defined. As an alternative, the label may be an existing label that the user would like to assign directly to an unlabeled image without having to first display an image which corresponds to the label. As an example, the image labeling system may enable a user to provide text input that may specify a label for an unlabeled image element.
The image labeling system may also provide a mechanism through which a user may remove one or more of the unlabeled image elements from the display.
The image labeling system may not be restricted to labeling faces in digital images. As an example, the image labeling system may be applied to labeling any content in images. As another example, the image labeling system may be applied to labeling content in video scenes. As yet another example, the image labeling system may be applied to labeling web page designs to indicate design styles for the web pages. The methods of the image labeling system described herein may be applied to any type of labeling system that is based on a similarity comparison. The image labeling system described herein may provide a system for propagating labels to a large collection of items from a small number of initial items which are given labels. The system may be applied to a collection of items for which visual representations of the items may be generated. As an example, the image labeling system may be used for classifying and/or labeling a large set of PDF files, based on similarities between visual representations of the PDF files.
In other embodiments, the image labeling system may be used to explore a large collection of images to locate a desired image. An exploration of the collection of images may be necessary, rather than a direct search, when a search query item is not available. For example, a user may want to find a particular type of beach scene with a particular palm tree in a collection of images, but may not have a source search query entry to provide to a search system. The user may just have a general idea of the desired scene. The user may use the image labeling system to narrow down the collection of images to locate the desired scene. For example, the user may select and label one beach scene in the images and place the image in one corner of the display area. The user may label other images, for example an image of a tree scene, and place those images in other corners of the display area. In this way, a user may initiate a search for similar images using an image as the basis for the search, without the use of any text or keywords.
The user may execute the image labeling system and the system may locate images with similar beach scenes and similar tree scenes from the collection of images. The user may select some of the located images as images which are closer to the desired image and place these images in the corners of the display area and may again execute the image labeling system to locate more similar images. The user may repeat this process, continually replacing the corner images with images that are closer to the desired image. In this manner, the image labeling system may help the user converge the collection of images into a set of images that closely resemble the user's desired image. The user may continue the process until the desired image is located.
The underlying aspects of the image labeling system described above operate equally well when used in conjunction with any device capable of receiving user input defining a gesture. However, in such cases, the user interface elements through which a user interacts with the features of the image labeling system may not be the traditional mouse and keyboard input devices. Instead, devices capable of receiving gestures as user input may provide a user with a user interface that may interpret various hand gestures to determine a corresponding image labeling task. In some embodiments, the image labeling system may be implemented without any text-based or pull-down menus. In other embodiments, a labeling operation performed is not based on any text-based or pull-down menu selections, and instead, the labeling operation is based exclusively on a gesture or gestures.
The image labeling system provides interpretation of several gestures that may be associated with one or more labeling tasks. The image labeling system may operate on a device without the use of menus. Instead, the image labeling system provides other feedback cues to a user to provide interactive visual feedback indicative of a labeling task.
In some embodiments, gesture information may be received within the image labeling system from the operating system of the device as a result of a user touching a touch-sensitive screen. In other embodiments, gesture information may be received within the image labeling system from the operating system of the device as a result of a user making hand motions that may be received through a camera or other motion detection device. For example, the image labeling system may be used with an input device that is configured to sense gesture motions in multi-dimensional space. Other forms of gesture recognition using different hardware may be used by any device capable of executing the image labeling system. For each type of gesture input and for each type of gesture, a mapping may be defined to one or more labeling tasks of the image labeling system. As another example, the image labeling system may be used with an input device configured to sense a combination of touch gestures and non-contact gestures.
In the below descriptions of labeling tasks, certain gestures have been mapped to certain labeling tasks. However, any gesture may be defined to be mapped to any labeling task, and a user may also be allowed to define additional gestures or to modify gesture mappings.
In some embodiments, when a user first runs the image labeling system, the user may choose to populate the labeled images of display region 310 with an initial set of labeled image elements. For example, the image labeling system may present a user with a set of unlabeled image elements and allow the user to label one or more of the image elements. In response to the image labeling, the image labeling system may display the labeled image elements in display region 310, and display one or more unlabeled image elements based on similarity metrics of the unlabeled image elements to the labeled image elements, as depicted in
In some embodiments, given a display region 310 with both labeled and unlabeled image elements, the image labeling system activates an interactive labeling mode upon receiving information from the device operating system upon the device operating system detecting the beginning of a gesture while the image labeling system is active. In such a case, the interactive labeling mode is active for the duration of the input gesture. In some embodiments, the interactive labeling mode is active until a corresponding labeling task is complete. Upon each additional input information update regarding the continuing gesture, the image labeling system may determine the intended labeling task based on one or more characteristics of the gesture.
In some embodiments, recognition of the gesture input provided from the device operating system may correspond to a traversal of a finite state automaton. For example, given an initial indication of a touch on the screen of the device, the image labeling system may determine over which area of the user interface the touch is received. Based on the location of the touch, certain labeling tasks may be eliminated from consideration. For example, if the user touches a labeled image, a possible interpretation of the gesture may be a relabeling of the labeled image, but a gesture such as a flick may be eliminated from consideration as a possible interpretation. Given additional input defining the characteristics of the gesture, the image labeling system may proceed to recognize the gesture in conjunction with the location touched. In this example, each labeling task would have a unique path through the finite state automaton, where the identification of the labeling task is arrived at through continuous interpretation of the gesture input. Examples of gesture input received from the device operating system include spatial coordinates such as the location on the device touched, pressure values, velocity and distance associated with each of the one or more touches of the gesture, or elapsed time for the entire gesture or for a portion of the gesture.
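The finite state automaton traversal described above might be sketched as follows. The event representation, state names, and the particular gesture-to-task mapping are all hypothetical; the disclosure states only that each labeling task has a unique path through the automaton:

```python
def interpret_gesture(events):
    """Walk a minimal finite state automaton over gesture events.

    Each event is a (kind, target) tuple, e.g. ("touch", "labeled_image").
    Returns the recognized labeling task, or None if no task is recognized
    (illustrative sketch)."""
    state = "idle"
    for kind, target in events:
        if state == "idle" and kind == "touch":
            # the touched region eliminates some labeling tasks from
            # consideration, as described in the text
            state = "on_labeled" if target == "labeled_image" else "on_unlabeled"
        elif state == "on_unlabeled" and kind == "flick":
            return "assign_label"   # flick an unlabeled image toward a label
        elif state == "on_labeled" and kind == "touch":
            return "relabel"        # touching a labeled image again
        elif kind == "release":
            state = "idle"          # gesture ended without a recognized task
    return None
```

Each additional event narrows the set of candidate tasks until a unique accepting path is reached, mirroring the continuous interpretation described above.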
In some embodiments, visual feedback provided from the image labeling system may be displayed to a user while the interactive labeling mode is active. This interactive feedback may provide a user with visual cues through a modification or display of user interface elements indicating a labeling task that may be performed according to the current gesture. For example, the path of a finger as a gesture is made may be drawn across the screen.
Upon receiving gesture input data, as depicted within element 2206, the image labeling system may determine a labeling task gesture from among a plurality of labeling task gestures, where the determining is based on the gesture input data, as depicted within element 2208. In addition to determining the gesture, the gesture input data may be used as the basis for determining which of the displayed one or more unlabeled images correspond to the gesture, as depicted within element 2210.
Once the labeling task gesture has been determined, a labeling task mapped to the labeling task gesture may be referenced to determine which labeling operation to perform on one or more of the unlabeled images determined to correspond to the gesture, as depicted within element 2212.
The image labeling system user interface may also provide a region of the user interface that includes other labeled images, such as window region 320 in
In some embodiments, a replacement labeled image is automatically selected based on having the most probable matches of the current set of labeled images not displayed within display region 310. In other embodiments, a user may be prompted to select a labeled image to replace the labeled image being removed. In some embodiments, such as incorporation of the image labeling system within a social media application, the next image element suggested to the user for labeling may be based on one or more social metrics of the social media application. For example, in some cases, a next image element for labeling may be submitted to a user based on the number of matches the unlabeled image element has to existing photos in the user's image library or collection. In other examples, an unlabeled image element such as a face may be suggested based on contextual information from the image from which the unlabeled image element was drawn, such as a baseball if there are a significant number of images in the user's current library that are related to baseball or, more broadly, an encompassing category such as sports.
Touch-sensitive devices are often mobile devices with limited computational power. To overcome the computational limitations of most mobile devices, the image labeling system may access remote computer systems to perform portions or all of the computational workload. The remote computing system or systems accessed from the image labeling system may be a cloud computing environment.
Further in regard to
As an example, a user may touch the screen on or near an unlabeled image 1602 and, without lifting a finger, move in short, quick strokes over the unlabeled image. In different embodiments, the gesture interpreted as a delete operation may be defined in various ways. For example, given a touch near or on top of an unlabeled image, a user may move a finger in one stroke in an initial direction, and given subsequent back-and-forth strokes along the same axis as the initial stroke, a delete may be detected. In this way, consecutive vertical strokes may be used to delete an unlabeled image from the labeling region 310 of the user interface.
In some embodiments, the image labeling system may provide visual feedback such as drawing the trace of the user's finger as the finger moves back and forth executing a delete gesture, as depicted by element 1604. In other embodiments, after a delete gesture, a graphic may be drawn over the unlabeled image, such as an “X”, and the user may be prompted with a “YES” or “NO” pop up menu asking to confirm a delete operation.
In some embodiments, velocity may be used as a factor in distinguishing between a delete operation and other labeling tasks. For example, unless the velocity of the user's strokes is above a certain threshold, the strokes may not be interpreted as a delete operation. In other embodiments, any number of strokes above a certain threshold may be used to determine a delete operation. For example, two back-and-forth strokes may be defined as the minimum number of strokes to be interpreted as a delete operation, or the minimum may be set higher, such as three or more strokes.
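The stroke-count and velocity criteria described above can be combined into a single recognition heuristic. The following is a minimal sketch under assumed thresholds and an assumed stroke representation; it is not the patented implementation.

```python
# Hypothetical delete-gesture (scribble) recognizer. A delete requires a
# minimum number of strokes, each fast enough, with consecutive strokes
# reversing direction (back and forth). Thresholds are assumptions.
import math

MIN_STROKES = 2        # assumed minimum back-and-forth strokes
MIN_VELOCITY = 500.0   # assumed pixels/second; slower motion is not a delete

def is_delete_gesture(strokes):
    """strokes: list of (dx, dy, duration_s) finger-motion segments."""
    if len(strokes) < MIN_STROKES:
        return False
    # Each stroke must exceed the velocity threshold.
    for dx, dy, dt in strokes:
        if math.hypot(dx, dy) / dt < MIN_VELOCITY:
            return False
    # Consecutive strokes must reverse direction (negative dot product).
    for (dx1, dy1, _), (dx2, dy2, _) in zip(strokes, strokes[1:]):
        if dx1 * dx2 + dy1 * dy2 >= 0:
            return False
    return True
```
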
In other embodiments, the image labeling system may specify that the strokes in a delete operation have a minimum or maximum length. In other embodiments, the image labeling system may specify that at least one stroke in a delete operation be over the unlabeled image intended to be deleted. In other embodiments, the delete operation may apply to the nearest unlabeled image, even if the delete strokes did not touch the unlabeled image. In some embodiments, a trash can icon may be displayed in any part of labeling region 310 or region 320, and a drag of the unlabeled image onto the trash can accomplishes the delete operation.
In some embodiments, the delete gesture may apply to more than one unlabeled image. For example, since images with similar characteristics may be located within the same region of the screen, similar images may be equally inapplicable to the current labeled images. In this example, if two unlabeled images are adjacent or near each other, and multiple delete swipes encompass both unlabeled images, then both unlabeled images may be removed from labeling region 310.
In one embodiment, multiple unlabeled images may be deleted through a combination of gestures. For example, a user may press a finger down on an unlabeled image, or near an unlabeled image, as with unlabeled image 1904 of
A flick gesture may be interpreted as a gesture that begins with a screen touch, where the touch becomes a quick movement across a short distance of the screen.
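The two defining properties of a flick — a short distance covered at high speed — can be sketched as a small classifier. The distance and speed thresholds below are assumptions for illustration, not values from the disclosure.

```python
# Hypothetical flick detector: a touch that travels a short distance
# quickly. Returns the flick direction as a unit vector, or None if the
# motion does not qualify. Thresholds are assumptions.
import math

MAX_FLICK_DISTANCE = 150.0   # assumed pixels: flicks are short
MIN_FLICK_SPEED = 800.0      # assumed pixels/second: flicks are quick

def detect_flick(start, end, duration_s):
    """start, end: (x, y) touch positions; duration_s: elapsed seconds."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    dist = math.hypot(dx, dy)
    if dist == 0 or dist > MAX_FLICK_DISTANCE:
        return None
    if dist / duration_s < MIN_FLICK_SPEED:
        return None
    return (dx / dist, dy / dist)
```

The returned direction vector could then be used to find which labeled image lies along the flick trajectory.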
As an example, in
In some embodiments, a user may drag a finger across multiple unlabeled images and, as a last gesture before lifting the finger from the screen, flick in the direction of a labeled image. In this case, each of the unlabeled images along the path may be labeled with the label of the labeled image nearest the end of the flick gesture's trajectory.
As described above, given each newly labeled image or images, the image recognition software may be provided the newly labeled images as training images, and the image recognition software may recalculate the probabilities of similarity of the remaining unlabeled images.
It may often be the case that several unlabeled images may be displayed within display region 310 of the image labeling system in close proximity because each of the unlabeled images has a similar labeling confidence based on calculated similarity metrics. As depicted in
In this example, as the user's finger moves across unlabeled images 1804, 1806, and 1808, the image labeling system draws the path according to line 1802. When the user first touches the screen, the interactive labeling mode is activated, and as the user traces a finger across unlabeled images 1804, 1806, and 1808, the image labeling system provides continuous visual feedback, such as updating the path drawn within the user interface.
In some embodiments, additional visual feedback may be provided to the user while in the interactive labeling mode such as reshaping the bubble of the unlabeled image to include a tapered point that points in the direction of the labeled image corresponding to the current best match of the image recognition software. In this example, upon a finger being lifted from the screen, unlabeled images 1804, 1806, and 1808 may be applied with the label corresponding to labeled image 1810, which would be the labeled image being pointed at by the tapered points of the images along path 1802. At this point images 1804, 1806, and 1808 may be labeled and removed from display region 310 and replaced with additional unlabeled images.
In some embodiments, additional visual feedback may be provided to the user during the interactive labeling mode such as highlighting the labeled image corresponding to the current best match of the image recognition software. For example, the border surrounding labeled image 1810 may be highlighted with a coloration of the border that is different from the other labeled images in the user interface, represented in
In some embodiments, while the interactive labeling mode is active and the unlabeled images along the path are indicated as selected, a user may, prior to lifting the path tracing finger, flick the finger toward any labeled image. In this case, instead of labeling the unlabeled images with the label of the nearest labeled image, the unlabeled images may be labeled with the label of a labeled image that is along the trajectory of the flick motion. Further in this embodiment, the flick may be visually represented within the user interface with a line on the screen drawn to correspond to the flick direction.
As with the path select labeling operation, it may often be the case that several unlabeled images may be displayed within the user interface of the image labeling system in close proximity because each of the unlabeled images has a similar labeling confidence of a match. However, in the lasso selection operation, a user may select multiple unlabeled images without tracing over each image. Instead, a user may trace a lasso through and/or around the unlabeled images the user wishes to select, and upon completing the selection and lifting a finger from the screen, the selected unlabeled images may be labeled with the label of the nearest labeled image on the screen.
In this example, the interactive labeling mode is activated upon detecting a finger touch on the screen. At the instant the touch is detected, it is not yet possible to disambiguate among the possible labeling tasks. However, as the user moves a finger and traces a path, it becomes possible to determine the intended labeling operation. For example, if a user traces a path that ends within a pre-defined proximity to the beginning of the path, a lasso operation is determined and any unlabeled images within or touching the path are selected for labeling. If the end of the path is outside the pre-defined proximity to the beginning of the path, then a path selection operation may be determined, as described above.
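The disambiguation rule above — lasso if the path closes on itself, path selection otherwise — can be sketched directly. The proximity threshold is an assumption for illustration.

```python
# Hypothetical lasso/path-select disambiguation: a traced path that ends
# within a pre-defined proximity of its starting point is a lasso;
# otherwise it is a path selection. The threshold is an assumption.
import math

CLOSE_PROXIMITY = 40.0  # assumed pixels

def classify_trace(path):
    """path: list of (x, y) points; returns 'lasso' or 'path_select'."""
    if len(path) < 2:
        return "path_select"
    (x0, y0), (x1, y1) = path[0], path[-1]
    if math.hypot(x1 - x0, y1 - y0) <= CLOSE_PROXIMITY:
        return "lasso"
    return "path_select"
```
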
In some embodiments, interactive feedback is provided to the user in the form of visual cues drawn within the user interface of the image labeling system. As depicted in
In some embodiments, while the interactive labeling mode is active and the unlabeled images within the lasso selection region displayed with a gray background, a user may, prior to lifting the tracing finger, flick the finger toward any labeled image. In this case, instead of labeling the unlabeled images with the label of the nearest labeled image, the unlabeled images may be labeled with the label of a labeled image that is along the trajectory of the flick motion. Further in this embodiment, the flick may be visually represented within the user interface with a drawing on the screen of a line corresponding to the flick direction.
Given that in some embodiments the proximity of an unlabeled image to a labeled image on the screen is proportional to the confidence of the image recognition software in a match, this multiple-image flick allows a user to label multiple images that are placed near a labeled image which does not correspond to the proper labeling of the unlabeled images.
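The confidence-proportional placement referenced above can be sketched as a function mapping match confidence to screen distance: the higher the recognition engine's confidence, the closer the unlabeled image is placed to the candidate labeled image. The placement function and its scale are hypothetical assumptions.

```python
# Hypothetical confidence-to-distance placement: an unlabeled image is
# positioned at a distance from its best-match labeled image that is
# inversely related to match confidence. The scale is an assumption, and
# for simplicity the image is placed directly to the right of the anchor.

def place_unlabeled(anchor, confidence, max_radius=300.0):
    """anchor: (x, y) of the labeled image; confidence in [0, 1].
    Returns the (x, y) position for the unlabeled image."""
    distance = (1.0 - confidence) * max_radius
    return (anchor[0] + distance, anchor[1])
```
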
The image labeling system allows a user to label an image when the first labeled images are selected, as described above. However, the image labeling system also allows a user to rename or relabel an already labeled image. For example, as depicted in
In one embodiment, the interactive labeling mode is activated upon a finger touch, and if a user holds down a finger for a period of time, the image labeling system determines that a relabeling task is intended. Upon determining that a relabeling task is intended, a keyboard is displayed to allow the user to enter a new label name. Once the new label name has been entered, the new label name is applied to the image and all previously labeled images with the same label name are also relabeled. In some embodiments, the image over which the user pressed a finger is displayed alongside an entry box to present a visual association of the new label with the image to which the label may be applied, as depicted within labeling window 2002.
In some embodiments, the same touch and hold gesture may be used to label unlabeled images. This feature may be useful in cases where an unlabeled image does not match a currently displayed labeled image and the unlabeled image is either the only image of its kind or only one of a few images of its kind. In this case, it may be efficient to simply apply a label to the unlabeled image directly.
In some embodiments, in response to displaying a keyboard, the display area is reconfigured to a smaller space. For example, the labeled and unlabeled images are redrawn in the smaller space such that their positions relative to the original layout are preserved.
Another gesture that may be interpreted into a labeling task is a tap gesture. A tap may be interpreted as a quick touch and finger lift. At the initial touch of the screen, the interactive labeling mode is activated, and if the duration of the touch is shorter than a pre-defined amount of time, then the labeling task is determined to be the labeling of the tapped unlabeled image with the label of the nearest labeled image within display region 310.
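The tap behavior above can be sketched as a duration check followed by a nearest-labeled-image search. The duration threshold and the data structures are assumptions for illustration.

```python
# Hypothetical tap handler: a touch shorter than a pre-defined duration is
# a tap, and the tapped unlabeled image receives the label of the nearest
# labeled image in display region 310. Threshold is an assumption.
import math

TAP_MAX_DURATION = 0.25  # assumed seconds

def handle_tap(touch_duration_s, tap_pos, labeled_images):
    """labeled_images: dict mapping label -> (x, y) screen position.
    Returns the label to apply, or None if the touch was not a tap."""
    if touch_duration_s > TAP_MAX_DURATION or not labeled_images:
        return None
    return min(
        labeled_images,
        key=lambda lbl: math.hypot(labeled_images[lbl][0] - tap_pos[0],
                                   labeled_images[lbl][1] - tap_pos[1]))
```
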
In some embodiments, the border of the unlabeled image may be momentarily highlighted. In other embodiments, the unlabeled image may be redrawn as it moves across the screen until it reaches the nearest labeled image, at which point the unlabeled image may disappear. This visual feedback may provide assurance to the user regarding the labeled image whose label is applied to the unlabeled image.
Another gesture that may be interpreted into a labeling task is a two-finger expansion over an unlabeled image. As an example layout of unlabeled images within user interface region 310 of
The image labeling system may increase the context from the original photo until the entire original photo is displayed. Further, the interactive labeling mode may remain activated while the user spreads apart and draws together the two fingers, and the image labeling system may continue to increase or decrease the displayed context in response. The interactive labeling mode may be deactivated in response to the user lifting the two fingers from the screen.
In addition to the visual contextual information, metadata corresponding to the original image may also be displayed. For example, the metadata may include the time and date; the location, if available; the name of the folder or location of the image; or the corresponding photographic settings, such as the device that took the image, the shutter speed, ISO, or exposure information.
After the temporary increase in context has been displayed to the user, the user may proceed to label the unlabeled image using the previously described gestures.
To swap out a currently displayed labeled image in display region 310, a user may touch down on the screen in the region displaying labeled image 2002 in panel 320. While maintaining the touch on top of the labeled image 2002, the user may drag labeled image 2002 on top of one of the currently displayed labeled images in display region 310, such as labeled image 2004. Once the labeled image has been dragged on top of currently displayed labeled image 2004, the user may lift their finger and in response, currently displayed labeled image 2004 may be replaced with labeled image 2002.
In other embodiments, instead of replacing a currently displayed labeled image, a user may drag a labeled image from panel 320 into labeling region 310 of the display. Upon the user lifting their finger and completing the drag gesture, the new labeled image may be dropped and drawn into labeling region 310 at the position where the drag gesture ended, without replacing any currently displayed labeled images.
Once the newly labeled image has been introduced into labeling region 310, new unlabeled images determined to match the newly labeled image to some degree may now be introduced into labeling region 310. Further in response to introducing the new labeled image, the previously displayed unlabeled images are rearranged to accommodate the new unlabeled images.
In other embodiments, a user may perform a delete gesture over a currently displayed labeled image, and in response, the image labeling system may replace the deleted labeled image with a labeled image from user interface panel 320. For example, the labeled image at the top of the panel may be determined to be the next labeled image to display; in other cases, the labeled image with the greatest quantity of potential matches is selected to be displayed next.
In other embodiments, a user may flick an image element from user interface panel 320 into display region 310 to replace a currently displayed labeled image. For example, if a user flicks image element 2002 toward image element 2004, image element 2004 in display region 310 may be replaced with image element 2002.
Some devices implementing the image labeling system may include accelerometers, which may provide applications installed on the device with information regarding a direction and amount of motion of the device. The image labeling system may receive and interpret accelerometer information as a gesture or as a modification to a gesture.
The image labeling system may receive information from the device accelerometer in order to perform a subset of the labeling tasks described above. In one embodiment, given unlabeled and labeled images within a display area 310 as in
In some embodiments, the image labeling system may either define a threshold of movement or allow a user to disable accelerometer responses. The pre-defined threshold may be useful to distinguish between small, constant movements of the device that are part of normal user handling and sharper, jarring motions that may more confidently be identified as movement intended to serve as a labeling operation.
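The threshold-and-disable behavior above can be sketched as a small filter over raw accelerometer samples. The threshold value and the axis-based gesture interpretation are assumptions for illustration.

```python
# Hypothetical accelerometer filter: only motion whose magnitude exceeds a
# pre-defined threshold is treated as a labeling gesture, so normal
# handling jitter is ignored; the user may also disable the feature.
import math

SHAKE_THRESHOLD = 2.5  # assumed acceleration magnitude (in g)

def accelerometer_gesture(ax, ay, az, enabled=True):
    """Returns the dominant shake axis ('x', 'y', or 'z'), or None when
    the motion is below threshold or accelerometer responses are off."""
    if not enabled:
        return None
    magnitude = math.sqrt(ax * ax + ay * ay + az * az)
    if magnitude < SHAKE_THRESHOLD:
        return None
    return max((("x", abs(ax)), ("y", abs(ay)), ("z", abs(az))),
               key=lambda axis: axis[1])[0]
```
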
Some devices implementing the image labeling system may include gyroscopes, which may provide applications installed on the device with information regarding roll, pitch, and yaw of the device. The image labeling system may receive and interpret gyroscope information as a gesture or as a modification to a gesture.
In one embodiment, given unlabeled and labeled images within a display area 310 as in
In this example, if the user decides to change the tilt direction and tilt toward a different labeled image, the movement of the unlabeled images may be updated to reflect the new tilt direction.
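The tilt-driven movement described in this example can be sketched as mapping the device's roll and pitch to a two-dimensional drift applied to each unlabeled image, recomputed as the tilt changes. The drift scaling is a hypothetical assumption.

```python
# Hypothetical tilt-to-motion mapping: gyroscope roll/pitch (radians) are
# converted to a per-second (dx, dy) screen drift, and each unlabeled
# image's position is advanced by that drift. Scaling is an assumption.

def tilt_to_drift(roll_rad, pitch_rad, speed=120.0):
    """Map roll/pitch to a per-second (dx, dy) screen drift."""
    return (roll_rad * speed, pitch_rad * speed)

def step_positions(positions, roll_rad, pitch_rad, dt):
    """Advance each unlabeled image's (x, y) by the current tilt drift;
    changing the tilt on the next call changes the drift direction."""
    dx, dy = tilt_to_drift(roll_rad, pitch_rad)
    return [(x + dx * dt, y + dy * dt) for x, y in positions]
```
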
Various components of embodiments of methods as illustrated and described in the accompanying description may be executed on one or more computer systems, which may interact with various other devices. One such computer system is illustrated within
In the illustrated embodiment, computer system 1400 includes one or more processors 1410 coupled to a system memory 1420 via an input/output (I/O) interface 1430. Computer system 1400 further includes a network interface 1440 coupled to I/O interface 1430, and one or more input/output devices 1450, such as cursor control device 1460, keyboard 1470, multitouch device 1490, and display(s) 1480. It is contemplated that some embodiments may be implemented using a single instance of computer system 1400, while in other embodiments multiple such systems, or multiple nodes making up computer system 1400, may be configured to host different portions or instances of embodiments. For example, in one embodiment some elements may be implemented via one or more nodes of computer system 1400 that are distinct from those nodes implementing other elements.
In various embodiments, computer system 1400 may be a uniprocessor system including one processor 1410, or a multiprocessor system including several processors 1410 (e.g., two, four, eight, or another suitable number). Processors 1410 may be any suitable processor capable of executing instructions. For example, in various embodiments, processors 1410 may be general-purpose or embedded processors implementing any of a variety of instruction set architectures (ISAs), such as the x86, PowerPC, SPARC, or MIPS ISAs, or any other suitable ISA. In multiprocessor systems, each of processors 1410 may commonly, but not necessarily, implement the same ISA.
In some embodiments, at least one processor 1410 may be a graphics processing unit. A graphics processing unit or GPU may be considered a dedicated graphics-rendering device for a personal computer, workstation, game console or other computing or electronic device. Modern GPUs may be very efficient at manipulating and displaying computer graphics, and their highly parallel structure may make them more effective than typical CPUs for a range of complex graphical algorithms. For example, a graphics processor may implement a number of graphics primitive operations in a way that makes executing them much faster than drawing directly to the screen with a host central processing unit (CPU). In various embodiments, the methods as illustrated and described in the accompanying description may be implemented by program instructions configured for execution on one of, or parallel execution on two or more of, such GPUs. The GPU(s) may implement one or more application programmer interfaces (APIs) that permit programmers to invoke the functionality of the GPU(s). Suitable GPUs may be commercially available from vendors such as NVIDIA Corporation, ATI Technologies, and others.
System memory 1420 may be configured to store program instructions and/or data accessible by processor 1410. In various embodiments, system memory 1420 may be implemented using any suitable memory technology, such as static random access memory (SRAM), synchronous dynamic RAM (SDRAM), nonvolatile/Flash-type memory, or any other type of memory. In the illustrated embodiment, program instructions and data implementing desired functions, such as those for methods as illustrated and described in the accompanying description, are shown stored within system memory 1420 as program instructions 1425 and data storage 1435, respectively. In other embodiments, program instructions and/or data may be received, sent or stored upon different types of computer-accessible media or on similar media separate from system memory 1420 or computer system 1400. Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or CD/DVD-ROM coupled to computer system 1400 via I/O interface 1430. Program instructions and data stored via a computer-accessible medium may be transmitted by transmission media or signals such as electrical, electromagnetic, or digital signals, which may be conveyed via a communication medium such as a network and/or a wireless link, such as may be implemented via network interface 1440.
In one embodiment, I/O interface 1430 may be configured to coordinate I/O traffic between processor 1410, system memory 1420, and any peripheral devices in the device, including network interface 1440 or other peripheral interfaces, such as input/output devices 1450. In some embodiments, I/O interface 1430 may perform any necessary protocol, timing or other data transformations to convert data signals from one component (e.g., system memory 1420) into a format suitable for use by another component (e.g., processor 1410). In some embodiments, I/O interface 1430 may include support for devices attached through various types of peripheral buses, such as a variant of the Peripheral Component Interconnect (PCI) bus standard or the Universal Serial Bus (USB) standard, for example. In some embodiments, the function of I/O interface 1430 may be split into two or more separate components, such as a north bridge and a south bridge, for example. In addition, in some embodiments some or all of the functionality of I/O interface 1430, such as an interface to system memory 1420, may be incorporated directly into processor 1410.
Network interface 1440 may be configured to allow data to be exchanged between computer system 1400 and other devices attached to a network, such as other computer systems, or between nodes of computer system 1400. In various embodiments, network interface 1440 may support communication via wired or wireless general data networks, such as any suitable type of Ethernet network, for example; via telecommunications/telephony networks such as analog voice networks or digital fiber communications networks; via storage area networks such as Fibre Channel SANs, or via any other suitable type of network and/or protocol.
Input/output devices 1450 may, in some embodiments, include one or more display terminals, keyboards, keypads, touchpads, scanning devices, voice or optical recognition devices, or any other devices suitable for entering or retrieving data by one or more computer systems 1400. Multiple input/output devices 1450 may be present in computer system 1400 or may be distributed on various nodes of computer system 1400. In some embodiments, similar input/output devices may be separate from computer system 1400 and may interact with one or more nodes of computer system 1400 through a wired or wireless connection, such as over network interface 1440.
As shown in
Those skilled in the art will appreciate that computer system 1400 is merely illustrative and is not intended to limit the scope of methods as illustrated and described in the accompanying description. In particular, the computer system and devices may include any combination of hardware or software that can perform the indicated functions, including computers, network devices, internet appliances, PDAs, wireless phones, pagers, etc. Computer system 1400 may also be connected to other devices that are not illustrated, or instead may operate as a stand-alone system. In addition, the functionality provided by the illustrated components may in some embodiments be combined in fewer components or distributed in additional components. Similarly, in some embodiments, the functionality of some of the illustrated components may not be provided and/or other additional functionality may be available.
Those skilled in the art will also appreciate that, while various items are illustrated as being stored in memory or on storage while being used, these items or portions of them may be transferred between memory and other storage devices for purposes of memory management and data integrity. Alternatively, in other embodiments some or all of the software components may execute in memory on another device and communicate with the illustrated computer system via inter-computer communication. Some or all of the system components or data structures may also be stored (e.g., as instructions or structured data) on a computer-accessible medium or a portable article to be read by an appropriate drive, various examples of which are described above. In some embodiments, instructions stored on a computer-accessible medium separate from computer system 1400 may be transmitted to computer system 1400 via transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link. Various embodiments may further include receiving, sending or storing instructions and/or data implemented in accordance with the foregoing description upon a computer-accessible medium. Accordingly, the present invention may be practiced with other computer system configurations.
Generally speaking, a computer-accessible medium may include storage media or memory media such as magnetic or optical media, e.g., disk or DVD/CD-ROM, volatile or non-volatile media such as RAM (e.g., SDRAM, DDR, RDRAM, SRAM, etc.), ROM, etc., as well as transmission media or signals such as electrical, electromagnetic, or digital signals, conveyed via a communication medium such as a network and/or a wireless link.
The various methods as illustrated in the Figures and described herein represent examples of embodiments of methods. The methods may be implemented in software, hardware, or a combination thereof. The order of the methods may be changed, and various elements may be added, reordered, combined, omitted, modified, etc. Various modifications and changes may be made as would be obvious to a person skilled in the art having the benefit of this disclosure. It is intended that the invention embrace all such modifications and changes and, accordingly, the above description is to be regarded in an illustrative rather than a restrictive sense.
Number | Date | Country
---|---|---
61531566 | Sep 2011 | US