The present application relates generally to methods and systems for managing a collection of images, which may include static and/or video images, and, more specifically, to managing the collection of images based on linkages among identified subjects in an image.
Images that have been captured or otherwise generated by a user may be stored and grouped as collections of images (which may also be referred to as “albums”). A collection of images may be a conceptual or virtual grouping of images in one or more image repositories (e.g., image databases or cloud-based storage). That is, images that belong to a given collection are not necessarily grouped together in actual memory storage. In some examples, images from different image repositories may belong to the same image collection.
Various software applications and/or services have been provided for managing images stored in such collections. For example, existing photo/video album applications or services, such as Google™ Photos, are capable of generating an album that includes photographs and videos. The albums are typically organized in a table and cell style view and displayed in a graphical user interface (GUI) on a display device of a computing device (desktop, notebook, tablet, handheld, smartphone, etc.). The photographs and videos may be automatically organized, by the album application, into different groups/subgroups based on location, time, names of people tagged as being in the photograph or video, or some other label associated with each photograph or video. For simplicity, reference to a “captured image” or simply “image” may be understood to be a reference to a photograph (which may also be referred to as a static image) or to a video (which comprises a sequence of images or frames, in which a video frame may also be referred to as an image). Each group/subgroup may be displayed in the GUI in a similar table and cell style view.
Through management of human-centric linkages (hereinafter referred to as linkages) based on analysis of captured images (i.e., photos and videos), an album application may be configured to take advantage of the linkages when rendering an album in a GUI on a display device. The album application may be shown to facilitate interaction with a collection of captured images to, in one case, allow for efficient searching among the captured images. Conveniently, the linkages generated from analysis of the collection of captured images allow for a display of the linkages in a human-centric graphical view.
In the present disclosure, the term “human-centric” means that the analysis of captured images is centered on identifying humans in the images and the linkages (e.g., co-occurrence, visual relationship, or common location) between identified humans. Although the term “human-centric” is used, it should be understood that the approach disclosed herein may also be used for analysis of non-human subjects (e.g., an animal) in captured images.
In some aspects, the present disclosure describes a system including a memory and a processor. The memory includes an image collection database, the image collection database storing a plurality of images. The processor is coupled to the memory, and the processor is configured to execute instructions to cause the system to: receive a set of metadata associated with a captured image, the set of metadata including data identifying each human in the captured image; generate a linkage score associating a first identified human with a second identified human in the captured image, the linkage score representing a relationship between the first and second identified humans; update respective records in the database associated with the first and second identified humans to include the generated linkage score; and store the captured image, in association with the metadata, in the image collection database.
In some aspects, the present disclosure describes a method of managing an image collection database storing a plurality of images. The method includes: receiving a set of metadata associated with a captured image, the set of metadata including data identifying each human in the captured image; generating a linkage score associating a first identified human with a second identified human in the captured image, the linkage score representing a relationship between the first and second identified humans; updating respective records in the database associated with the first and second identified humans to include the generated linkage score; and storing the captured image, in association with the metadata, in the image collection database.
In some aspects, the present disclosure describes a computer readable medium storing instructions that, when executed by a processor of a system, cause the system to: receive a set of metadata associated with a captured image, the set of metadata including data identifying each human in the captured image; generate a linkage score associating a first identified human with a second identified human in the captured image, the linkage score representing a relationship between the first and second identified humans; update, in an image collection database storing a plurality of images, respective records associated with the first and second identified humans to include the generated linkage score; and store the captured image, in association with the metadata, in the image collection database.
In any of the above aspects, the instructions may further cause the system to (or the method may further include): identify each human in the captured image; determine an identifier for each identified human; and generate metadata for inclusion in the set of metadata associated with the captured image, the generated metadata including the identifier for each identified human.
In any of the above aspects, the set of metadata may include metadata identifying a location in the captured image, and the instructions may further cause the system to (or the method may further include): generate an entry describing the first and second identified humans in the identified location; and store the entry in association with the captured image in the image collection database.
In any of the above aspects, the captured image may be a captured video comprising a plurality of video images, and there may be multiple sets of metadata associated with the captured video, each set of metadata being associated with a respective video segment of the captured video. The instructions may further cause the system to (or the method may further include): perform the generating and the updating for each respective video segment.
In any of the above aspects, the captured video may be stored in the image collection database in association with the multiple sets of metadata.
In any of the above aspects, the instructions may further cause the system to (or the method may further include): provide commands to render a graphical user interface (GUI) for accessing the image collection database, the GUI being rendered to provide a visual representation of the relationship between the first and second identified humans.
In any of the above aspects, the instructions may further cause the system to (or the method may further include): in response to input, received via the GUI, indicating a selection of a plurality of humans for filtering the image collection database, identify, from the image collection database, one or more captured images associated with metadata that includes identifiers for each of the plurality of humans; and provide commands to render the GUI to limit access to only the identified one or more captured images.
In any of the above aspects, the input received via the GUI may be a touch input that traverses representations, rendered by the GUI, of the plurality of humans.
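By way of illustration of the above aspects, the following Python sketch shows one possible shape of the receive/generate/update/store flow. It is a minimal sketch assuming a simple in-memory record store; the names ImageCollectionDB, process_captured_image and score_fn are illustrative and do not appear in the disclosure.

```python
from itertools import combinations


class ImageCollectionDB:
    """Illustrative in-memory stand-in for the image collection database."""

    def __init__(self):
        self.images = []   # (image, metadata) pairs
        self.records = {}  # human_id -> {"linkages": {other_id: score}}

    def record_for(self, human_id):
        return self.records.setdefault(human_id, {"linkages": {}})


def process_captured_image(db, image, metadata, score_fn):
    """Receive metadata, generate a linkage score for each pair of
    identified humans, update their records, and store the image in
    association with the metadata."""
    for i, j in combinations(metadata["human_ids"], 2):
        score = score_fn(i, j)                   # linkage score for the pair
        db.record_for(i)["linkages"][j] = score  # update both humans' records
        db.record_for(j)["linkages"][i] = score
    db.images.append((image, metadata))          # store image with metadata
```

In practice, score_fn would encapsulate a linkage-score computation such as the one described later in this disclosure.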
Other aspects and features of the present disclosure will become apparent to those of ordinary skill in the art upon review of the following description of specific implementations of the disclosure in conjunction with the accompanying figures.
Reference will now be made, by way of example, to the accompanying drawings which show example implementations; and in which:
Labels for captured images are generally created independently for each captured image. In one instance, one or more labels for a captured image, which may also be called “tags,” can be manually selected by a user and each selected label can be associated with the captured image. In another instance, one or more labels for a captured image may be automatically created and associated with an image by one or more image analysis techniques. Some of these image analysis techniques may use a model, learned using machine learning, to detect objects (including humans and non-humans) in a captured image and classify the detected objects. Existing applications or services for managing image collections (e.g., album applications) may be considered to be appropriate for users to manage a small number (say, in the hundreds) of captured images with a limited number of labels.
Electronic devices, such as smartphones, laptops, tablets, and the like, are becoming popular for capturing images (e.g., capturing static images such as photographs, and recording video images). As the storage capacity of such electronic devices has increased significantly over the years, the number of images captured by, and stored on, the average electronic device has increased correspondingly. Indeed, the number of captured images stored on a typical device may now be on the order of thousands.
To keep the captured images organized, the captured images are generally organized into an image collection using an album application. Notably, as the number of captured images included in an image collection increases, the time spent by users searching for particular captured images in the image collection also increases. Similarly, as the number of captured images included in an image collection increases, the time spent by users organizing the captured images in the image collection can increase significantly.
In addition, as machine learning techniques advance, the number of labels that can be automatically generated and associated, by an image collection application or service, with a captured image has significantly increased. However, such automatically associated labels are generally used independently by the image collection application or service that performs the automatic association. It may be seen as difficult for users to organize and browse their captured images based on such a large number of automatically generated and associated labels.
In overview, aspects of the present application relate to methods and systems for managing an image collection based on human-centric linkages. An example image collection management system (which may implement machine learning techniques) may be configured to analyze the linkages and use the linkages as a basis for presenting images in a GUI on a display. Such an image collection management system may be shown to facilitate interaction with a collection of captured images to, in one case, allow for more efficient searching among the captured images. Conveniently, the linkages generated from analysis of the collection of captured images may allow for a display of the linkages in a human-centric graphical view.
In contrast to traditional table and cell style views generated by existing album applications or services, the image collection management system according to aspects of the present application may provide a graphical view of humans detected in the collection of captured images. Images of humans that have been detected in the captured images may be rendered in a GUI by the image collection management system. In some aspects, the images may be linked based on human-centric linkages between humans detected in the images. For example, images may be linked based on a co-occurrence of detected humans in the captured images or in a particular common location. A user of the image collection management system can, for example, perform a selection of an image associated, in the graphical view, with a human. When the image collection management system detects a selection of an image associated, in the graphical view, with a human, the image collection management system may rearrange the graphical view to indicate the most related human(s) (e.g., the human(s) having the highest number of linkages, or the most highly scored linkages) to the human associated with the selected image. In some examples, the graphical view may present the most related human(s) limited to a specific time period (e.g., the image collection management system may automatically lessen the scores of the linkages over time, or may prune linkages that are older than a threshold time).
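The time-based weakening mentioned above might be realized as in the following minimal sketch, in which the exponential decay form, the half-life and the pruning threshold are all assumptions for illustration.

```python
import math
import time

HALF_LIFE_DAYS = 180.0  # assumed half-life for linkage strength
MAX_AGE_DAYS = 730.0    # assumed pruning threshold


def decayed_score(base_score, last_cooccurrence_ts, now=None):
    """Lessen a linkage score exponentially with the age of the most
    recent co-occurrence; return None to signal a pruned linkage."""
    now = time.time() if now is None else now
    age_days = (now - last_cooccurrence_ts) / 86400.0
    if age_days > MAX_AGE_DAYS:
        return None  # linkage is older than the threshold: prune it
    return base_score * math.exp(-math.log(2.0) * age_days / HALF_LIFE_DAYS)
```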
Additionally, a user may select, in the graphical view, multiple individual images associated with related individual humans. When the selection of multiple individual images associated with related individual humans is detected by the image collection management system, the image collection management system may rearrange the graphical view to provide indications of captured images in which all of the humans associated with the selected images appear.
Moreover, selection of multiple humans, in the graphical view, can be done with a single gesture. A user may further be provided with an option to specify whether to find all the images that contain all the selected humans or any of the selected humans.
Each linkage between two humans may be described by a sentence template of natural language, e.g., [human 1] and [human 2] are attending [event] in [where] in [when]. The natural language sentence may be formulated based on analysis of, for example, recent associated captured images, as discussed further below. In this way, the image collection management system may enable users to more quickly browse a large collection of captured images, discover relationships between humans, learn the activities of the humans in the captured images, and/or more effectively search captured images featuring particular humans of interest.
Reference is now made to
The electronic device 102 includes multiple components, including a processor 202 that controls the overall operation of the electronic device 102. The processor 202 is coupled to and interacts with various other components of the electronic device 102, including a memory 204 and a display screen 104, shown in
The processor 202 may execute software instructions stored in the memory 204, to implement the image collection management system described herein. The image collection management system may be executed as part of another software application for managing image collections (e.g., part of another album application). Although the present application describes examples in which the image collection management system is executed by the electronic device 102 using instructions stored in the memory 204, the image collection management system may be implemented in other ways. For example, the image collection management system may run on a virtual machine (e.g., in a distributed computing system, or in a cloud-based computing system). The image collection management system may also be executed on a server and provided as a service to the electronic device 102 (e.g., the server analyzes the images for human-centric linkages and provides the rearranged images to the electronic device 102). Other such implementations may be possible within the scope of the present application.
The captured image analysis module 304 analyzes the input image and generates data representing detected linkages between humans in the input image(s) and the overall image collection. The linkage data may be used by the HCI module 302 to provide a user interface that enables human-centric management and navigation of the image collection. For example, a user of the electronic device 102 may interact with the captured images in an image collection when the image collection management system renders the captured images and linkages, in a graphical user interface on the display screen 104, according to operations performed by the HCI module 302.
The segmentor 600 receives the set of video images (that together form the input video) and performs video segmentation to output two or more video segments. Each of the video segments is provided as input to each of the human detection, tracking and recognition submodule 602, the audio analysis submodule 604, the human action recognition submodule 606, and the scene recognition submodule 608. The human detection, tracking and recognition submodule 602 analyzes the video segment to detect, track and recognize human(s) in the video segment, and outputs a set of metadata including identifier(s) of the human(s). The audio analysis submodule 604 analyzes the audio data of the video segment to generate metadata including one or more labels representing a scene and/or activity in the video segment. The human action recognition submodule 606 analyzes the video segment to generate metadata including one or more labels representing a human action detected in the video segment. The scene recognition submodule 608 performs scene analysis to detect and recognize one or more scenes in the video segment, and outputs metadata representing the scene(s).
The human detection, tracking and recognition submodule 602, the audio analysis submodule 604, the human action recognition submodule 606 and the scene recognition submodule 608 all provide their respective metadata to a video image analysis metadata aggregator 610. In turn, the video image analysis metadata aggregator 610 aggregates the received metadata into a single set of metadata that is outputted to the linkage discovery submodule 406. The video image analysis metadata aggregator 610 may also format the metadata into a format that is useable by the linkage discovery submodule 406. Further details about the operation of the video image analysis submodule 404 and its submodules 600, 602, 604, 606, 608, 610 are discussed below. It should be understood that the functions of two or more of the submodules 600, 602, 604, 606, 608, 610 may be combined into one submodule.
Subsequent to receiving (step 802) the input static image, the human detection and recognition submodule 502 may analyze (step 804) the input static image to recognize all the people, and respective attributes of the people, in the input image. The analyzing (step 804) may involve the human detection and recognition submodule 502 using any suitable human detection and recognition methods (e.g., using machine-learning techniques). For example, a suitable method for face detection is described by Liu, Wei, et al., “SSD: Single Shot MultiBox Detector,” European Conference on Computer Vision, Springer, Cham, 2016. In another example, a suitable method for face recognition is described by Schroff, Florian, Dmitry Kalenichenko, and James Philbin, “FaceNet: A Unified Embedding for Face Recognition and Clustering,” Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015.
After completing the analyzing (step 804) of the input static image, the human detection and recognition submodule 502 may output (step 806) a set of metadata associated with the input static image, to the static image analysis metadata aggregator 510. In some examples, the human detection and recognition submodule 502 may output the static image together with the generated set of metadata to the static image analysis metadata aggregator 510. If the static image is not outputted by the human detection and recognition submodule 502, the human detection and recognition submodule 502 may instead modify the static image (e.g., by inserting the metadata or adding a tag to reference the metadata) to associate the static image with the outputted metadata. In addition to providing, to the static image analysis metadata aggregator 510, the set of metadata associated with the input static image, the human detection and recognition submodule 502 may also output a subset of the set of metadata to the scene graph recognition submodule 504. For example, the human detection and recognition submodule 502 may output, to the scene graph recognition submodule 504, data defining a bounding box for each detected human in association with identification information for each detected human.
The set of metadata may, for example, include data in the form of a record for each human detected in the input static image. The data may include an identifier for the recognized human and an associated list of attributes of the recognized human. An example record 900 is illustrated in
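One possible shape for such a record is sketched below; the field names are illustrative, since the exact contents of the example record 900 are shown only in the figures.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class HumanRecord:
    """Illustrative per-human record produced by the analysis."""
    human_id: str  # identifier for the recognized human
    attributes: List[str] = field(default_factory=list)  # e.g. ["adult", "smiling"]
    bounding_box: Tuple[int, int, int, int] = (0, 0, 0, 0)  # (x, y, w, h)
```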
Referring to
Unlike traditional scene graph recognition submodules, which analyze all the objects detected in an input image, the scene graph recognition submodule 504 may be configured to implement an approach to the analyzing (step 1006) wherein only human objects are considered and other objects are ignored, as described in further detail below. This human-centric approach may be considered to significantly simplify scene graph recognition and make the analyzing (step 1006), by the scene graph recognition submodule 504, more tractable. In some examples, some types of non-human objects (e.g., animals) may be considered in addition to human objects.
In computer vision, a saliency map is an image that shows a unique quality for each pixel. The goal of a saliency map is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. As part of the analyzing (step 1006), the scene graph recognition submodule 504 may analyze (step 1006A) the input static image to generate a saliency map. For information on analyzing an input static image to generate a saliency map, see R. Margolin, A. Tal and L. Zelnik-Manor, “What Makes a Patch Distinct?” 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, Oreg., 2013, pp. 1139-1146. The scene graph recognition submodule 504 then creates (step 1006B), based on the saliency map, an attention mask. The scene graph recognition submodule 504 then applies (step 1006C) the attention mask to the input static image to generate a masked image that may be understood to help the scene graph recognition submodule 504 to focus on a region of the input static image that contains a human. The scene graph recognition submodule 504 may then analyze (step 1006D) the masked image.
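Steps 1006B and 1006C might look as follows. This is a minimal sketch that assumes the saliency map has already been computed (e.g., by the method of Margolin et al.) as a floating-point array in [0, 1]; the threshold value is an assumption.

```python
import numpy as np


def apply_attention_mask(image, saliency_map, threshold=0.5):
    """Create an attention mask from the saliency map (step 1006B) and
    apply it to the input static image (step 1006C)."""
    mask = (saliency_map >= threshold).astype(image.dtype)
    return image * mask[..., np.newaxis]  # zero out non-salient regions


image = np.random.rand(480, 640, 3)  # placeholder input static image
saliency = np.random.rand(480, 640)  # placeholder saliency map
masked_image = apply_attention_mask(image, saliency)
```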
After completion of the analyzing (step 1006D) of the masked image, the scene graph recognition submodule 504 outputs (step 1008) a set of metadata associated with the input static image, to the image analysis metadata aggregator 510. In some examples, the scene graph recognition submodule 504 may output the static image together with the generated set of metadata to the static image analysis metadata aggregator 510. If the static image is not outputted by the scene graph recognition submodule 504, the scene graph recognition submodule 504 may instead modify the static image (e.g., by inserting the metadata or adding a tag to reference the metadata) to associate the static image with the outputted metadata.
The set of metadata output (step 1008) by the scene graph recognition submodule 504 includes data for each recognized person, which may be in the form of a record. The data includes an identifier for the recognized person; one or more attributes associated with the recognized person; optionally an activity associated with the recognized person; and one or more labels for the scene. The metadata outputted by the scene graph recognition submodule 504 may be in the form of records for each recognized person, or may be in the form of a single record for the scene. Other formats may be suitable.
Referring to
Referring to
The segmentor 600 splits or partitions (step 1204) the input video images into two or more continuous segments. The segmentor 600 may, for example, split or partition the input video images according to detected scene changes. The video segments may be considered to represent basic processing units. The segmentor 600 then outputs (step 1206) each of the video segments to the human detection, tracking and recognition submodule 602, the audio analysis submodule 604, the human action recognition submodule 606 and the scene recognition submodule 608.
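As an illustration of scene-change-based splitting, the following sketch partitions frames wherever the mean absolute difference between consecutive frames exceeds a threshold; this simple differencing scheme and its threshold are assumptions, not the disclosed segmentation method.

```python
import numpy as np


def segment_video(frames, threshold=30.0):
    """Split a list of frames (H x W x 3 uint8 arrays) into two or more
    continuous segments, cutting at detected scene changes."""
    if not frames:
        return []
    segments, current = [], [frames[0]]
    for prev, curr in zip(frames, frames[1:]):
        diff = np.mean(np.abs(curr.astype(np.int16) - prev.astype(np.int16)))
        if diff > threshold:  # large frame-to-frame change: new scene
            segments.append(current)
            current = []
        current.append(curr)
    segments.append(current)
    return segments
```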
Referring to
Referring to
Referring to
Referring to
After completing analyzing (step 1604) the video segment, the scene recognition submodule 608 outputs (step 1606) a set of metadata associated with the video segment, to the video image analysis metadata aggregator 610. In some examples, the scene recognition submodule 608 may output the video segment together with the generated set of metadata to the video image analysis metadata aggregator 610. If the video segment is not outputted by the scene recognition submodule 608, the scene recognition submodule 608 may instead modify the video segment (e.g., by inserting the metadata or adding a tag to reference the metadata) to associate the video segment with the outputted metadata. The metadata output of the scene recognition submodule 608 may include one or more labels to describe the scene. A label may be generated from a database of different descriptive labels, for example. Multiple labels may be used to describe a scene, for example with different levels of specificity. The label may, for example, be selected from among the following example labels:
Referring to
The video image analysis metadata aggregator 610 also receives (step 1704), from the audio analysis submodule 604, a second set of metadata associated, by the audio analysis submodule 604, with the video segment.
The video image analysis metadata aggregator 610 further receives (step 1706), from the human action recognition submodule 606, a third set of metadata associated, by the human action recognition submodule 606, with the video segment.
The video image analysis metadata aggregator 610 still further receives (step 1708), from the scene recognition submodule 608, a fourth set of metadata associated, by the scene recognition submodule 608, with the video segment.
The video image analysis metadata aggregator 610 then aggregates (step 1710) the received sets of metadata into a single set of aggregated metadata. Aggregating the metadata may involve simply combining the data from each of the first, second, third and fourth sets of metadata into a single larger set of metadata. In some examples, aggregating the metadata may involve removing any redundant data. The video image analysis metadata aggregator 610 then outputs (step 1712) the aggregated single set of metadata to the linkage discovery submodule 406. The aggregated single set of metadata may replace the first, second, third and fourth sets of metadata, or the first, second, third and fourth sets of metadata may be kept with the addition of the aggregated single set of metadata. In some examples, the video image analysis metadata aggregator 610 may also output the video segment that is associated with the aggregated single set of metadata. If the video segment is not outputted by the video image analysis metadata aggregator 610, the video image analysis metadata aggregator 610 may instead modify the video segment (e.g., by inserting the metadata or adding a tag to reference the metadata) to associate the video segment with the single set of aggregated metadata.
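Step 1710 might be realized as follows; this minimal sketch assumes each set of metadata is a dictionary mapping keys to lists of labels, which is an illustrative representation.

```python
def aggregate_metadata(*metadata_sets):
    """Combine the per-submodule sets of metadata into a single set,
    removing redundant (duplicate) values under each key."""
    aggregated = {}
    for metadata in metadata_sets:
        for key, values in metadata.items():
            merged = aggregated.setdefault(key, [])
            for value in values:
                if value not in merged:  # drop redundant data
                    merged.append(value)
    return aggregated


single_set = aggregate_metadata(
    {"human_ids": ["alice", "bob"]},        # human detection/tracking/recognition
    {"scene_labels": ["party", "indoor"]},  # audio analysis
    {"actions": ["dancing"]},               # human action recognition
    {"scene_labels": ["indoor"]},           # scene recognition ("indoor" deduplicated)
)
```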
The example methods of
Referring to
The image collection human knowledge base 704 stores captured images and data about humans that have been recognized in the captured images. In some examples, data about the recognized humans may be stored in the form of records. A single record may include information about a single human (who may be uniquely identified in the image collection human knowledge base 704 by a human ID), including one or more attributes about the human, and one or more linkage scores representing the strength of a linkage between the identified human and another human. Further details are discussed below.
The linkage analysis submodule 702 accesses (step 1806) the records in the image collection human knowledge base 704 for a given pair of recognized humans in the captured image. The linkage analysis submodule 702 analyzes (step 1808) the metadata associated with the captured image to determine an extent to which the given pair of recognized humans are linked. As part of the analyzing (step 1808), the linkage analysis submodule 702 may assign a linkage score representative of a strength of a linkage between the two recognized humans. The linkage analysis submodule 702 then edits (step 1810) the records in the image collection human knowledge base 704 associated with the two recognized humans to add (or update) the linkage score. The linkage analysis submodule 702 then stores (step 1812) the edited records in the image collection human knowledge base 704.
One factor that may be used when establishing a linkage score for a linkage between two humans is the total number of times the two humans have co-occurred in captured images. The linkage between two humans may be considered to be stronger if the two humans co-occur in captured images more often than two other humans do.
Another factor that may be used when establishing a linkage score for a linkage between two humans is the total number of times the two humans co-occur in a given location. The linkage between two humans may be considered to be stronger if the two humans co-occur in various locations more often than two other humans do.
In some examples, a linkage score may also be calculated between a human and a location. For example, a linkage score between a given human and a given location can be defined by counting the number of captured images where the given human appears in the given location.
The linkage analysis submodule 702 may determine a linkage score, l_ij, between human i and human j, using, for example, an equation of the following form:

$$ l_{ij} = \alpha \frac{N_{ij}^{p}}{N_{i}^{p} + N_{j}^{p} - N_{ij}^{p}} + \beta \frac{N_{ij}^{v}}{N_{i}^{v} + N_{j}^{v} - N_{ij}^{v}} + \gamma \frac{N_{ij}^{L}}{N_{i}^{L} + N_{j}^{L} - N_{ij}^{L}} $$

where N_i^p is the number of photos having human i; N_j^p is the number of photos having human j; N_ij^p is the number of photos having both human i and human j; N_i^v is the number of videos having human i; N_j^v is the number of videos having human j; N_ij^v is the number of videos having both human i and human j; N_i^L is the number of locations where human i appears; N_j^L is the number of locations where human j appears; and N_ij^L is the number of locations where both human i and human j appear.
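A direct computation of the score under the weighted form shown above might look as follows; the function name and tuple layout are illustrative.

```python
def linkage_score(counts_i, counts_j, counts_ij,
                  alpha=1.0, beta=1.0, gamma=1.0):
    """counts_i, counts_j and counts_ij are (photos, videos, locations)
    tuples for human i, human j, and their co-occurrences."""
    score = 0.0
    for w, n_i, n_j, n_ij in zip((alpha, beta, gamma),
                                 counts_i, counts_j, counts_ij):
        union = n_i + n_j - n_ij  # items featuring human i or human j
        if union > 0:
            score += w * n_ij / union
    return score


# Human i: 40 photos, 10 videos, 5 locations; human j: 30, 8, 6;
# co-occurrences: 12 photos, 4 videos, 3 common locations.
l_ij = linkage_score((40, 10, 5), (30, 8, 6), (12, 4, 3))
```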
The terms α, β and γ are weights that are configurable to balance the relative impact, on the linkage score l_ij, of photos, videos and locations. The weights may be manually configured. Alternatively, the linkage analysis submodule 702 may learn the weights using a linear regression model on a labeled (e.g., manually labeled) training data set. An example of such a model is a support-vector machine (SVM). In machine learning, support-vector machines are supervised learning models with associated learning algorithms that analyze data for classification and regression analysis. During training, the associated learning algorithm of the SVM is given a set of training samples, each marked as belonging to one of two categories, and learns a model that assigns new samples to one category or the other during inference.
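Learning the weights from a labeled training set might be sketched as follows, here using scikit-learn's support-vector regression variant; the feature rows and target scores shown are placeholder values for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVR

# Each row holds the three normalized co-occurrence terms (photos,
# videos, locations) for one pair of humans; each target is a manually
# labeled linkage strength. All values are placeholders.
X = np.array([
    [0.21, 0.29, 0.38],
    [0.02, 0.00, 0.10],
    [0.55, 0.40, 0.60],
])
y = np.array([0.5, 0.1, 0.9])

model = LinearSVR(fit_intercept=False)  # SVM-based linear regression
model.fit(X, y)
alpha, beta, gamma = model.coef_        # learned weights
```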
A linkage score is one manner of describing a linkage between human i and human j. Another manner of describing such a linkage is a one-sentence diary entry. The diary entry may be generated, by the linkage analysis submodule 702, on the basis of captured images in which both human i and human j have been detected. The diary entry can be generated, by the linkage analysis submodule 702, by filling in the missing information in a predefined human-to-human linkage template. A predefined human-to-human linkage template may have a format such as the following:
The linkage analysis submodule 702 may be configured to fill in the missing information in a predefined template based on the metadata received from the static image analysis metadata aggregator 510 and the video image analysis metadata aggregator 610 (depending on whether the captured image is a static image or a set of video images).
The linkage analysis submodule 702 may also be configured to generate an individual diary entry by filling in the missing information in a predefined human-to-location linkage template. A predefined human-to-location linkage template may have a format such as the following:
The information generated by the linkage analysis submodule 702 (e.g., the linkage score and/or the diary entry) may also be added to the metadata associated with the captured image.
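Template filling of this kind might be sketched as follows, reusing the human-to-human template quoted earlier; the human-to-location template shown is an assumed form, and the field names are illustrative.

```python
HUMAN_TO_HUMAN_TEMPLATE = (
    "{human_1} and {human_2} are attending {event} in {where} in {when}"
)
HUMAN_TO_LOCATION_TEMPLATE = "{human} is visiting {where} in {when}"  # assumed form


def diary_entry(template, metadata):
    """Fill in the missing information in a predefined template using
    the metadata associated with a captured image."""
    return template.format(**metadata)


entry = diary_entry(HUMAN_TO_HUMAN_TEMPLATE, {
    "human_1": "Alice", "human_2": "Bob",
    "event": "a birthday party", "where": "Toronto", "when": "July 2021",
})
```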
Over time, the captured image analysis module 304 may process a plurality of captured images such that the image collection human knowledge base 704 is well populated with records of humans that have been detected in the captured images. Additionally, through the operation of the linkage analysis submodule 702, a human for whom there exists a record in the image collection human knowledge base 704 may be associated, by a linkage, with another human for whom there exists a record in the image collection human knowledge base 704. Subsequent to processing, by the linkage analysis submodule 702, both records in the image collection human knowledge base 704 will indicate that there is a linkage between the two humans and will include a linkage score indicative of a strength of the linkage.
The HCI module 302 may process the records in the image collection human knowledge base 704 to form a fluidly reconfigurable graphical representation of the contents of the image collection human knowledge base 704. The HCI module 302 may then control the display screen 104 of the electronic device 102 to render the graphical view.
The HCI module 302 then accesses a record for a second human in the image collection human knowledge base 704 and controls the display screen 104 to render (step 2004) a GUI comprising the graphical view 1900 of the album including a representation (e.g., a photograph) of the second human, e.g., the related representation 1904B. The HCI module 302 may select the second human on the basis of a linkage score contained in the record for the first human. For example, the HCI module 302 may select, for the second human, among those humans with whom the first human has a positive linkage score.
The HCI module 302 controls the display screen 104 to render (step 2006), in the example view 1900, a connection between the representation of the first human and the representation of the second human. That is, the HCI module 302 controls the display screen 104 to render (step 2006) a connection between the central representation 1902 and the related representation 1904B.
The HCI module 302 may control the display screen 104 to render (step 2006) the connection in a manner that provides a general representation of the linkage score that has been determined between the humans represented by the two representations. For example, the HCI module 302 may control the display screen 104 to render (step 2006) a relatively thick line connecting representations of two humans associated, in their respective records, with a relatively high linkage score between each other. Furthermore, the HCI module 302 may control the display screen 104 to render a relatively thin line connecting the representations of two humans associated, in their respective records, with a relatively low linkage score between each other.
Notably, the central representation 1902, the related representations 1904 and the peripheral representations 1906 may be rendered in a variety of sizes. The size of a representation may be representative of a prevalence, within the image collection human knowledge base 704, of the human associated with the representation. That is, the HCI module 302 may render in the GUI a relatively large representation associated with a human detected in a relatively high number of captured images represented in the image collection human knowledge base 704. It follows that the HCI module 302 may render in the GUI a relatively small representation associated with a human detected in a relatively low number of captured images represented in the image collection human knowledge base 704.
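The score-to-thickness and prevalence-to-size mappings might be as simple as the following sketch; the pixel ranges are illustrative assumptions.

```python
def line_width(linkage_score, min_px=1, max_px=8, max_score=1.0):
    """Thicker connecting lines for higher linkage scores."""
    return min_px + (max_px - min_px) * min(linkage_score / max_score, 1.0)


def representation_size(image_count, base_px=48, step_px=4, cap_px=160):
    """Larger representations for humans detected in more captured images."""
    return min(base_px + step_px * image_count, cap_px)
```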
The display screen 104 of the electronic device 102 may be a touch-sensitive display screen, and a user may interact with the electronic device 102 using the display screen 104.
In one aspect of the present application, the user may interact with the example view 1900 to change the focus of the example view 1900. For example, responsive to the user tapping on the related representation 1904B, the HCI module 302 may modify the example view 1900 so that the related representation 1904B becomes the central representation 1902 of an altered example view (not shown). The HCI module 302 may further modify the example view to adjust the relationship of the representations to the newly altered central representation. In the altered example view, the formerly central representation 1902 and the formerly peripheral representations 1906M, 1906N, 1906P will become related representations. Additionally, in the altered example view, the related representations 1904C, 1904D and 1904F will become peripheral representations.
In another aspect of the present application, the user may interact with the example view 1900 to filter the captured images in the image collection human knowledge base 704. For example, the user may wish to review captured images in which the humans associated with the central representation 1902 and two of the related representations 1904C, 1904D have been detected.
The HCI module 302 may determine the human IDs corresponding to the selected representations, and may filter the image collection human knowledge base 704 to generate (step 2110) a filtered image collection that includes only the captured images having metadata that includes all three human IDs (that is, only captured images in which all three selected humans have been recognized). For example, the HCI module 302 may query the image collection human knowledge base 704 to identify all captured images associated with metadata that includes all three human IDs, and generate the filtered image collection using those identified captured images. The HCI module 302 may render (step 2112) the table and cell style view such that only representations of captured images in the filtered image collection are shown. That is, the table and cell style view only provides access to a filtered set of captured images.
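Step 2110 might be realized by a simple containment test over the metadata, as in the following sketch; the require_all flag also covers the "all or any" option mentioned earlier, and the data layout is illustrative.

```python
def filter_collection(images, selected_ids, require_all=True):
    """images: iterable of (image, metadata) pairs, where metadata
    includes a 'human_ids' list. With require_all=True, keep only
    captured images whose metadata contains every selected ID."""
    selected = set(selected_ids)
    if require_all:
        keep = lambda ids: selected.issubset(ids)  # all selected humans
    else:
        keep = lambda ids: bool(selected & ids)    # any selected human
    return [(img, meta) for img, meta in images
            if keep(set(meta["human_ids"]))]
```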
The user may then provide input to select a particular captured image, among the filtered set of captured images. Responsive to the input selecting the particular captured image, the HCl module 302 may display the captured image in a manner that takes up a majority of the display screen 104.
In the case wherein the particular captured image is a video, the selected three humans may be detected in only a particular video segment. That is, the metadata for only a particular video segment within the video includes all three human IDs. The HCI module 302 may, rather than presenting the entirety of the video from the first video image, instead present only that particular video segment where the three selected humans have been detected. Alternatively, the HCI module 302 may present the entire video, but automatically play the video starting from the first frame of the particular video segment (instead of the first frame of the entire video).
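Selecting the playback start point might look as follows; the segment fields start_frame and human_ids are illustrative.

```python
def playback_start_frame(segments, selected_ids):
    """segments: list of dicts with 'start_frame' and 'human_ids'.
    Return the first frame of the first segment containing all of the
    selected humans, falling back to the start of the video."""
    wanted = set(selected_ids)
    for segment in segments:
        if wanted.issubset(segment["human_ids"]):
            return segment["start_frame"]
    return 0
```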
As presented in
In this way, the present application provides a way to enable a user to more quickly browse a large collection of captured images, discover relationships between humans in the captured images, learn the activities of the humans and/or more effectively search captured images featuring particular humans of interest.
Unique to the example view 2200 of
The HCI module 302 may filter the image collection human knowledge base 704 to generate a filtered image collection that includes only the captured images in which all four people have been detected, for example as discussed above in detail. The HCI module 302 may render the table and cell style view such that only representations of captured images in the filtered image collection are shown. That is, the table and cell style view only provides access to a filtered set of captured images.
The user may then select a particular captured image, among the filtered set of captured images. Responsive to the selecting of a particular captured image, the captured image may be displayed in a manner that takes up a majority of the display screen 104.
The present application has described example methods and systems to enable management of images in an image collection on a human-centric basis. The examples described herein enable automatic identification of linkages between humans in captured images, and generate data (e.g., linkage scores) to enable management of the captured images on the basis of the strength of human-centric linkages.
In some examples, the present application provides improvements for managing and searching a large number of images, on the basis of human-centric linkages. A more effective way is provided for navigating through the large number of images in the image collection.
In some examples, the present application describes methods for generating diary entries that provide information about human activities in captured images, including human-to-human activities as well as human-to-location activities.
Although the present disclosure describes functions performed by certain components and physical entities, it should be understood that, in a distributed system, some or all of the processes may be distributed among multiple components and entities, and multiple instances of the processes may be carried out over the distributed system.
Although the present disclosure describes methods and processes with steps in a certain order, one or more steps of the methods and processes may be omitted or altered as appropriate. One or more steps may take place in an order other than that in which they are described, as appropriate.
Although the present disclosure is described, at least in part, in terms of methods, a person of ordinary skill in the art will understand that the present disclosure is also directed to the various components for performing at least some of the aspects and features of the described methods, be it by way of hardware components, software or any combination of the two. Accordingly, the technical solution of the present disclosure may be embodied in the form of a software product. A suitable software product may be stored in a pre-recorded storage device or other similar non-volatile or non-transitory computer readable medium, including DVDs, CD-ROMs, USB flash disk, a removable hard disk, or other storage media, for example. The software product includes instructions tangibly stored thereon that enable a processing device (e.g., a personal computer, a server, or a network device) to execute examples of the methods disclosed herein.
The present disclosure may be embodied in other specific forms without departing from the subject matter of the claims. The described example embodiments are to be considered in all respects as being only illustrative and not restrictive. Selected features from one or more of the above-described embodiments may be combined to create alternative embodiments not explicitly described, features suitable for such combinations being understood within the scope of this disclosure.
All values and sub-ranges within disclosed ranges are also disclosed. Also, although the systems, devices and processes disclosed and shown herein may comprise a specific number of elements/components, the systems, devices and assemblies could be modified to include additional or fewer of such elements/components. For example, although any of the elements/components disclosed may be referenced as being singular, the embodiments disclosed herein could be modified to include a plurality of such elements/components. The subject matter described herein intends to cover and embrace all suitable changes in technology.