Embodiments of systems and methods for video analysis are provided herein. In a first embodiment, a method for providing an analysis includes four steps. The first step is identifying, by a computing device, a target that is displayed from a video through a display of the computing device. The second step is receiving a query related to the identified target via a user input to the computing device. The third step is generating a search result based on the video, the search result comprising information relating to the identified target. The fourth step is displaying the search result through the display of the computing device.
In a second embodiment, a system for video analysis is provided. The system includes a target identification module, an interface module, a search result module, and a display module. The target identification module is configured for identifying a target from the video supplied to a computing device. The interface module is in communication with the target identification module. The interface module is configured for receiving a query related to the identified target via a user input to the computing device. The search result module is in communication with the interface module. The search result module is configured to generate a search result based on the video. The search result comprises information related to the identified target. The display module is in communication with the search result module. The display module is configured to display the search result through the display of the computing device.
According to a third embodiment, a system for generating a search result based on an analysis is supplied. The system includes a processor and a computer readable storage medium. The computer readable storage medium includes instructions for execution by the processor, which cause the processor to provide a response. The processor is coupled to the computer readable storage medium. The processor executes the instructions on the computer readable storage medium to identify a target from a video supplied to a computing device, receive a query related to the identified target, and generate the search result based on the video. The search result comprises information related to the identified target.
There are inherent difficulties associated with searching and analyzing data using existing technologies. Existing technologies are time-consuming, inconvenient, and unreliable, and they produce false positives. Furthermore, existing technologies are often unhelpful insofar as they cannot reduce or filter a large set of data to a meaningful subset for presentation to a user.
In contrast, the technology presented herein provides embodiments of systems and methods for providing analysis in a convenient and meaningful presentation that is beneficial to the user. Specifically, systems and methods for providing data analysis and generating reliable search results are provided herein. Such systems and methods may be based on queries. Queries may include rules that may be configurable by the user. In other words, the user may be given the flexibility to define the rules. Such user-defined rules may be created, saved, edited, and re-applied to data of any type, including but not limited to data streams, data archives, and data presentations. The technology provided herein may be user-extensible. For instance, the user is provided with the means to define rules, searches, and user selections (such as user selections regarding data sources, cameras, targets, triggers, responses, time frames, and the like).
Moreover, the technology described herein provides systems and methods for providing the user with a selection of existing rules and/or time frames to execute searches. Also, data may be pre-processed to generate metadata, which may then be searched with one or more rules. For instance, metadata in video may be searched using user-configurable rules for both real-time and archive searches. As will be described in greater detail herein, metadata in video may be associated with camera, target and/or trigger attributes of a target that is logged for processing, analyzing, reporting and/or data mining methodologies. Metadata may be extracted, filtered, presented, and used as keywords for searches. Metadata in video may also be accessible to external applications.
The technology herein may also utilize, manipulate, or display metadata for searching data archives. In some embodiments, the metadata may be associated with a video. For instance, metadata in a video may be useful to define and/or recognize triggered events according to rules that are established by a user. Metadata may also be useful to provide only those videos or video clips that conform to the parameters set by a user through rules. In this way, only those videos or video clips that include triggered events as identified by the user are provided to the user. Thus, the user is not presented with a search result having hundreds or thousands of videos, but rather a much smaller set of videos that meet the user's requirements as set forth in the rules. Further discussion regarding the use of metadata in video will be provided herein.
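The rule-based filtering described above may be sketched as follows. This is an illustrative sketch only, not the disclosed implementation; the metadata fields ("target", "region") and clip records are hypothetical examples.

```python
# Illustrative sketch: return only the video clips whose metadata
# satisfies a user-defined rule, so the user sees a small, relevant
# subset rather than the full archive. Field names are assumptions.

def filter_clips(clips, rule):
    """Return only the clips whose metadata matches every rule field."""
    return [c for c in clips if all(c.get(k) == v for k, v in rule.items())]

clips = [
    {"id": 1, "target": "person", "region": "garden"},
    {"id": 2, "target": "pet", "region": "garden"},
    {"id": 3, "target": "person", "region": "sidewalk"},
]

rule = {"target": "person", "region": "garden"}
matches = filter_clips(clips, rule)
print([c["id"] for c in matches])  # -> [1]
```

In this sketch, a rule is simply a set of metadata constraints; only clips satisfying every constraint survive the filter.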
The technology may be implemented through a variety of means, such as object recognition, artificial intelligence, hierarchical temporal memory (HTM), any technology that recognizes patterns found in objects, and any technology that can establish categories of objects. However, one skilled in the art will recognize that this list is simply an exemplary one and the technology is not limited to a single type of implementation.
One skilled in the art will recognize that although some embodiments are provided herein for video analysis, any type of analysis from any data source may be utilized with this technology. For instance, instead of a video source, an external data source (such as a web-based data source in the form of a news feed) may be provided instead. The technology is flexible to utilize any data source, and is not restricted to only video sources or video streams.
The computing device 120 may be a computer, a laptop computer, a desktop computer, a mobile communications device, a personal digital assistant, a video player, an entertainment device, a game console, a GPS device, a networked sensor, a card key reader, a credit card reader, a digital device, a digital computing device and any combination thereof. The computing device 120 preferably includes a display (not shown). One skilled in the art will recognize that a display may include one or more browsers, one or more user interfaces, and any combination thereof. The display of the computing device 120 may be configured to show one or more videos. A video may be a video feed, a video scene, a captured video, a video clip, a video recording, or any combination thereof.
The network 110 may also be configured to couple to one or more video sources 130. The video may be provided by one or more video sources 130, such as a camera, a fixed security camera, a video camera, a video recording device, a mobile video recorder, a webcam, an IP camera, pre-recorded data (e.g., pre-recorded data on a DVD or a CD), previously stored data (including, but not limited to, previously stored data on a database or server), archived data (including but not limited to, video archives or historical data), and any combination thereof. The computing device 120 may be a mobile communications device that is configured to receive and transmit signals via one or more optional towers 140.
Still referring to
Notably, one skilled in the art can recognize that all the figures herein are exemplary. For all the figures, the layout, arrangement and the number of elements depicted are exemplary only. Any number of elements may be used to implement the technology of the embodiments herein. For instance, in
Turning to
Any aspect of the method 200 may be user-extensible. For example, the target, the query, the search result, and any combination thereof may be user-extensible. The user may therefore define any aspect of the method 200 to suit his requirements for analysis. The feature of user-extensibility allows for this technology to be more robust and more flexible than the existing technology. Users may combine targets, queries, and search results in various combinations to achieve customized results.
Still referring to
Also, at step 202, identifying the target from a video may include receiving a selection of a predefined object. For instance, preprogrammed icons depicting certain objects (such as a person, a pet or a vehicle) that have already been learned and/or otherwise identified by the software program may be shown to the user through a display of the computing device 120. Thus, the user may then select a predefined object (such as a person, a pet or a vehicle) by selecting the icon that best matches the target. Once a user selects an icon of the target, the user may drag and drop the icon onto another portion of the display of the computing device, such that the icon (sometimes referred to as a block) may be rendered on the display. Thus, the icon may become part of a rule (such as the rule 405 shown in
The technology allows for user-extensibility for defining targets. For instance, a user may “teach” the technology how to recognize new objects by assigning information (such as labels or tags) to clips of video that include the new objects. Thus, a software program may “learn” the differences between categories of pets, such as cats and dogs, or even categories of persons, such as adults, infants, men, and women. Alternatively, at step 202, identifying the target from a video may include recognizing an object based on a pattern. For instance, facial patterns (frowns, smiles, grimaces, smirks, and the like) of a person or a pet may be recognized.
Through such recognition based on a pattern, a category may be established. For instance, a category of various human smiles may be established through the learning process of the software. Likewise, a category of various human frowns may be established by the software. Further, a behavior of a target may be recognized. Thus, the software may establish any type of behavior of a target, such as the behavior of a target when the target is resting or fidgeting. The software may be trained to recognize new or previously unknown objects. The software may be programmed to recognize new actions, new behaviors, new states, and/or any changes in actions, behaviors or states. The software may also be programmed to recognize metadata from video and provide the metadata to the user through the display of a computing device 120.
In the case where the target is a motion sequence, the motion sequence may be a series of actions that are being targeted for identification. One example of a motion sequence is the sequence of lifting a rock and tossing the rock through a window. Such a motion sequence may be preprogrammed as a target. However, as described earlier, targets may be user-extensible. Thus, the technology allows for users to extend the set of targets to include targets that were not previously recognized by the program. For instance, in some embodiments, targets may include previously unrecognized motion sequences, such as the motion sequence of kicking a door down. Also, targets may include visual targets, audio targets, and combined audio-visual targets. Thus, the software program may be taught to recognize a baby's face versus an adult female's face. The program may be taught to recognize a baby's voice versus an adult female's voice.
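The user-extensible "teaching" described above may be sketched as a registry that associates user-supplied labels with example clips. The class name, method names, and clip filenames here are illustrative assumptions, not part of the disclosed embodiments.

```python
# Hypothetical sketch of user-extensible target definitions: a user
# "teaches" the system a new target label by associating it with
# example clips, which a learning component could later train on.

class TargetRegistry:
    def __init__(self):
        self.examples = {}  # label -> list of labeled example clips

    def teach(self, label, clip):
        """Associate a labeled example clip with a (possibly new) target."""
        self.examples.setdefault(label, []).append(clip)

    def known_targets(self):
        """Return every target label the user has defined so far."""
        return sorted(self.examples)

registry = TargetRegistry()
registry.teach("baby_face", "clip_017.mp4")
registry.teach("adult_female_voice", "clip_021.mp4")
print(registry.known_targets())  # -> ['adult_female_voice', 'baby_face']
```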
At step 204, a query related to the identified target is received via a user input to the computing device 120. The query may be stored on a computer readable storage medium (not shown). The query may include one or more user-defined rules. Rules may include source selection (such as video source selection), triggers, and responses. Rules are described in further detail in the U.S. patent application Ser. No. ______ filed on Feb. 9, 2009, titled “Systems and Methods for Video Monitoring,” which is hereby incorporated by reference.
The query may include an instruction to provide one or more clips of one or more videos based on a specific time period or time frame. One skilled in the art will recognize that the time period can be of any measurement, including but not limited to days, weeks, hours, minutes, seconds, and the like. For instance, the query may include an instruction to provide all video clips within the last 24 hours. As another example, the query may include an instruction to provide all video clips for the last two Thursdays. Alternatively, the query may include an instruction to provide all video clips regardless of a video timestamp. This is exemplified by a time duration field 760 showing “When: Anytime” in
The query may include an instruction to provide one or more videos from one or more video sources. A user may define which video source(s) should be included in the query. An example is found in
The query may comprise an instruction to provide a video clip regarding the identified target. The identified target may include one or more persons, vehicles or pets. The identified target may be a user-defined target. User-defined targets are discussed at length in the U.S. patent application Ser. No. ______ filed on Feb. 9, 2009, titled “Systems and Methods for Video Monitoring,” which is hereby incorporated by reference. The query may include an instruction to provide a video clip showing an identified target within a region. For instance, a query may include an instruction to provide video clips that show people within a region designated by the user. The user may designate a region by drawing a box (such as a bounding box), circle or other shape around a region that can be viewed by a video source.
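The region-based query described above may be sketched as a simple containment test against the box the user draws. The coordinate scheme and the sighting records are illustrative assumptions, not the claimed implementation.

```python
# Illustrative sketch of a query that asks for clips showing an
# identified target inside a user-designated region. The region is
# modeled as the user-drawn bounding box; coordinates are assumed.

def in_region(position, region):
    """True if point (x, y) lies inside the user-drawn box (x1, y1, x2, y2)."""
    x, y = position
    x1, y1, x2, y2 = region
    return x1 <= x <= x2 and y1 <= y <= y2

garden = (100, 50, 300, 200)  # hypothetical user-drawn bounding box
sightings = [
    {"clip": "c1", "target": "person", "position": (150, 120)},
    {"clip": "c2", "target": "person", "position": (400, 90)},
]
hits = [s["clip"] for s in sightings if in_region(s["position"], garden)]
print(hits)  # -> ['c1']
```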
At step 206, a search result is generated. As mentioned previously, the search result may be based on any type of data. The search result may be based on one or more videos captured by one or more video sources. The search result may include information related to the identified target. Generating the search result may include filtering the video based on the query. One skilled in the art will recognize that there is a multitude of ways to filter videos. For instance, filtering videos based on a query can be accomplished by using metadata that is associated with the videos being analyzed. As discussed previously, this technology may extract, identify, utilize and determine the metadata that is associated with videos. Due to the object recognition aspects and the sophisticated higher level learning of this technology, the metadata may include metadata relating to identified targets, attributes regarding identified targets, timestamps of videos or clips of videos, source settings (such as video source location or camera location), recognized behaviors, patterns, states, motion sequences, user-defined regions as captured by videos, and any further information that may be garnered to execute a query. One skilled in the art will recognize that this list of metadata that can be determined by this technology is non-exhaustive and is exemplary.
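The use of metadata as search keywords, mentioned above, may be sketched as follows. The metadata schema and clip identifiers are assumptions made for illustration only.

```python
# Illustrative sketch of step 206: metadata extracted from video is
# used as keywords, and a search returns only the clips whose metadata
# contains every keyword in the query. The schema is hypothetical.

def search(index, keywords):
    """Return the clip ids whose metadata contains every keyword."""
    results = []
    for clip_id, meta in index.items():
        words = {w for v in meta.values() for w in str(v).lower().split()}
        if all(k.lower() in words for k in keywords):
            results.append(clip_id)
    return sorted(results)

index = {
    "clip_a": {"target": "pet", "behavior": "moving", "camera": "living room"},
    "clip_b": {"target": "person", "behavior": "walking", "camera": "front door"},
}
print(search(index, ["pet", "moving"]))  # -> ['clip_a']
```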
Still referring to step 206, generating the search result may include providing one or more video clips with a text description of the information related to the identified target. The text description of a given video clip may be all or part of a query, a rule, and/or metadata associated with the video clip. For instance, based on the object recognition aspects of this technology, the technology may recognize a user's pet dog. If the user's pet dog is seen moving in a designated region based on a video, then the generation of the search result may include providing the video clip of the dog in the region with the location of the video source. In
The text description may include further information about the identified target, based on a query, a rule and/or metadata associated with the video clip. For instance, the thumbnail 860 of the video clips of “Pet—Living Room Camera” 850 (as shown in
Generating the search result may include providing a thumbnail of the video or video clip which may include a bounding box of the identified target that matched an executed search query. In the previous example, the bounding box 870 of the identified target (a pet named Apollo) is shown to the user on the display of a computing device. Alternatively, generating the search result may show a frame where the identified target matched an executed search query (such as the frame 860 of the pet Apollo in
At step 208, the search result is displayed to the user. The search result may be displayed to the user on a display of a computing device 120. The search result may be presented in any format or presentation. One type of format is displaying the search results in a list with thumbnails for each of the video clips that match the search query or criteria, as described earlier herein. Both
The method 200 may include steps that are not shown in
According to one exemplary embodiment, the target identification module 310 is configured for identifying a target from the video supplied to a computing device 120 (
The search result module 330 is configured to filter the video based on the query. The search result module 330 may be configured to provide the video with a text description of the information related to the identified target. The information related to the identified target may include metadata associated with the clip of the video, or it may include all or part of the query. The search result module 330 is also configured to provide a thumbnail of the video clip, as described earlier herein.
The system 300 may comprise a processor (not shown) and a computer readable storage medium (not shown). The processor and/or the computer readable storage medium may act as one or more of the four modules (i.e., the target identification module 310, the interface module 320, the search result module 330, and the display module 340) of the system 300. It will be appreciated by one of ordinary skill that examples of a computer readable storage medium may include discs, memory cards, and/or servers. Instructions may be retrieved and executed by the processor. Some examples of instructions include software, program code, and firmware. Instructions are generally operational when executed by the processor to direct the processor to operate in accord with embodiments of the invention. Although various modules may be configured to perform some or all of the various steps described herein, fewer or more modules may be provided and still fall within the scope of various embodiments.
Turning to
Still referring to
Once a video source 440 is selected and displayed as part of the rule 405 (such as the selected side camera video source icon 445), the user may define the target that is to be identified by a computing device. Preferably, the user may select the “Look for” icon 450 on a left portion of the display of the computing device. Then, a selection of preprogrammed targets is provided to the user. The user may select one target (such as “Look for: People” icon 455 as shown in the exemplary rule 405 of
The user may select one or more triggers. The user may select a trigger via a user input to the computing device 120. A plurality of trigger icons 460, 465 may be provided to the user for selection. Trigger icons depicted in
The bounding box may track an identified target. Preferably, the bounding box may track an identified target that has been identified as a result of an application of a rule. The bounding box may resize based on the dimensions of the identified target. The bounding box may move such that it tracks the identified target as the identified target moves in a video. For instance, a clip of a video may be played back, and during playback, the bounding box may surround and/or resize to the dimensions of the identified target. If the identified target moves or otherwise makes an action that causes the dimensions of the identified target to change, the bounding box may resize such that it may surround the identified target while the identified target is shown in the video, regardless of the changing dimensions of the identified target.
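The bounding box behavior described above may be sketched as recomputing the smallest box around the target in each frame, so that the box follows and resizes with the target. The per-frame target coordinates are assumed inputs from an upstream detector, not part of the disclosure.

```python
# Illustrative sketch: a bounding box that tracks an identified target
# by being recomputed each frame from the target's detected points, so
# it moves and resizes as the target's dimensions change.

def bounding_box(points):
    """Smallest axis-aligned box (x, y, width, height) around the target."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys))

# The target moves and grows between two frames; the box follows it.
frame1 = [(10, 10), (20, 30)]
frame2 = [(15, 12), (40, 50)]
print(bounding_box(frame1))  # -> (10, 10, 10, 20)
print(bounding_box(frame2))  # -> (15, 12, 25, 38)
```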
Also, the “Look Where” pane 430 may allow the user to select a radio button that defines the location attribute of the identified target as a trigger. The user may select the option that movement “Anywhere” is a trigger. The user may select the option that movement “inside” a designated region (such as “the garden”) is a trigger. Similarly, the user may select “outside” a designated region. The user may select an option that movement “Coming in through a door” is a trigger. The user may select an option that movement “Coming out through a door” is a trigger. The user may select an option that movement “Walking on part of the ground” (not shown) is a trigger. In other words, the technology may recognize when an object is walking on part of the ground. The technology may recognize movement and/or an object in three-dimensional space, even when the movement and/or object is shown on the video in two dimensions. Further, the user may select an option that “crossing a boundary” is a trigger.
If the “When” icon 465 is selected, then the “Look When” pane (not shown) on the right side of the display is provided to the user. The “Look When” pane may allow the user to define the boundaries of a time period during which the user wants movements to be monitored. Movement may be monitored when motion is visible for more than a given number of seconds. Alternatively, movement may be monitored when motion is visible for less than a given number of seconds. Alternatively, movement may be monitored within a given range of seconds. In other words, a specific time duration may be selected by a user. One skilled in the art will recognize that any measurement of time (including, but not limited to, weeks, days, hours, minutes, or seconds) can be utilized. Also, one skilled in the art will appreciate that the user selection can be through any means (including, but not limited to, dropping and dragging icons, checkmarks, selection highlights, radio buttons, text input, and the like).
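The duration condition described above (e.g., "motion visible for more than a given number of seconds") may be sketched as follows. Representing motion events as start/end timestamps is an assumption for the example.

```python
# Illustrative sketch of the "motion visible for more than N seconds"
# condition from the "Look When" pane. Each motion event is an assumed
# (start, end) pair of timestamps in seconds.

def motion_events_longer_than(events, min_seconds):
    """Keep only events whose visible duration exceeds the threshold."""
    return [e for e in events if e[1] - e[0] > min_seconds]

events = [(0, 2), (10, 45), (60, 61)]  # (start, end) in seconds
print(motion_events_longer_than(events, 5))  # -> [(10, 45)]
```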
Still referring to
If the Notify icon 472 is selected, then a notification may be sent to the computing device 120 of the user. A user may select the response of “If seen: Send email” (not shown) as part of the notification. The user may drag and drop a copy of the Notify icon 472 and then connect the Notify icon 472 to the rule 405.
As described earlier, a notification may also be sending a text message to a cell phone, sending a multimedia message to a cell phone, or placing an automated phone call. If the Report icon 474 is selected, then the generation of a report may be the response. If the Advanced icon 476 is selected, the computer may play a sound to alert the user. Alternatively, the computer may store the video onto a database or other storage means associated with the computing device 120, or upload the video directly to a user-designated URL. The computer may interact with external application interfaces, or it may display custom text and/or graphics.
In another embodiment, Boolean language is used to apply multiple triggers to a particular target. For instance, Boolean language may be applied such that the user has instructed the technology to locate a person “in the garden OR (on the sidewalk AND moving left to right).” With this type of instruction, the technology will locate either persons in the garden or persons on the sidewalk who are also moving left to right. As mentioned above, one skilled in the art will recognize that the user may include Boolean language that applies to one or more target(s) as well as one or more trigger(s).
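The Boolean trigger example above may be sketched directly as predicate functions combined with Boolean operators. The predicate names and the observation fields are hypothetical, chosen only to mirror the example in the text.

```python
# Illustrative sketch of the Boolean trigger expression
# "in the garden OR (on the sidewalk AND moving left to right)".
# The predicates and observation fields are assumed for the example.

def in_garden(obs):
    return obs["location"] == "garden"

def on_sidewalk(obs):
    return obs["location"] == "sidewalk"

def moving_left_to_right(obs):
    return obs["direction"] == "left_to_right"

def boolean_rule(obs):
    """Evaluate the combined trigger against one observation."""
    return in_garden(obs) or (on_sidewalk(obs) and moving_left_to_right(obs))

print(boolean_rule({"location": "garden", "direction": "still"}))            # -> True
print(boolean_rule({"location": "sidewalk", "direction": "left_to_right"}))  # -> True
print(boolean_rule({"location": "sidewalk", "direction": "right_to_left"}))  # -> False
```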
A further embodiment is a rule 505 that includes Boolean language providing a sequence (such as “AND THEN”). For instance, a user may select two or more triggers to occur in a sequence (e.g., “Trigger A” happens AND THEN “Trigger B” happens). Further, one skilled in the art will understand that a rule 505 may include one or more nested rules, as well as one or more rules in a sequence, in a series, or in parallel. Rules may be ordered in a tree structure with multiple branches, with one or more responses coupled to the rules.
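The "AND THEN" sequence above may be sketched as an ordering check over timestamped trigger events. The event representation as (name, timestamp) pairs is an assumption for illustration.

```python
# Illustrative sketch of an "AND THEN" sequence rule: Trigger A must
# fire, and Trigger B must fire at some later time. Events are assumed
# (name, timestamp) pairs emitted by the trigger subsystem.

def fired_in_sequence(events, first, then):
    """True if `first` occurs and `then` occurs after it."""
    times_a = [t for name, t in events if name == first]
    times_b = [t for name, t in events if name == then]
    return any(b > a for a in times_a for b in times_b)

events = [("door_opened", 5.0), ("person_in_garden", 9.5)]
print(fired_in_sequence(events, "door_opened", "person_in_garden"))  # -> True
print(fired_in_sequence(events, "person_in_garden", "door_opened"))  # -> False
```

Because the check only compares timestamps, the same sketch extends naturally to chains of three or more triggers by applying it pairwise.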
As shown in
Now referring to
In the example provided in
The first timeline 665 is from 8 am to 4 pm. The first timeline 665 shows five vertical lines. Each vertical line may represent the amount of time in which movement was detected according to the parameters of the rule application “People—Walking on the lawn” 660. In other words, there were five times during the time period of 8 am to 4 pm in which movement was detected that is likely to be people walking on the lawn. The second timeline 675 is also from 8 am to 4 pm. The second timeline 675 shows only one vertical line, which means that in one time period (around 10:30 am), movement was detected according to the parameters of the rule application “Pets—In the Pool” 670. According to
In
Search results may filter existing video to display to the user only the relevant content. In the case of quick searches, the relevant content may be that content which matches or fits the criteria selected by the user. In the case of rule searches (which will be discussed at length in conjunction with
In
In
Controls for videos 780 may be provided to the user. The user may be able to playback, rewind, fast forward, or skip throughout a video using the appropriate video controls 780. The user may also select the speed in which the user wishes to view the video using a playback speed control 785. Also, a timeline control 790 that shows all the instances of a current search over a given time period may be displayed to the user. In
Turning to
A rule may be saved by a user. In
As earlier described, rules may be modified or edited by a user. A user may edit a rule by selecting a rule and hitting the “Edit” button 820. Thus, a user may change any portion of a rule using the “Edit” button. For instance, a user may select a rule and then the user may be presented with the rule as it currently stands in the rule editor 400 (
Rules may be uploaded and downloaded by a user to the Internet, such that rules can be shared amongst users of this technology. For example, a first user may create a sprinkler rule to turn on the sprinkler system when a person jumps a fence and enters a region. The first user may then upload his sprinkler rule onto the Internet, such that a second user can download the first user's sprinkler rule. The second user may then use the first user's sprinkler rule in its entirety, or the second user may modify the first user's sprinkler rule to add that if a pet jumps the fence and enters the region, then the sprinkler will also activate. The second user may then upload the modified sprinkler rule onto the Internet, such that the first user and any third party may download the modified sprinkler rule.
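Sharing rules in this way implies that a rule can be serialized to a portable form. The following sketch uses JSON for that purpose; the rule schema (target/trigger/response fields) is a hypothetical example, not the disclosed format.

```python
import json

# Illustrative sketch of sharing rules between users by serializing
# them to JSON. The rule schema shown here is an assumption.

def export_rule(rule):
    """Serialize a rule for upload to a shared location."""
    return json.dumps(rule, sort_keys=True)

def import_rule(text):
    """Reconstruct a rule that another user downloaded."""
    return json.loads(text)

sprinkler_rule = {
    "target": "person",
    "trigger": {"event": "crossing_boundary", "region": "fence"},
    "response": "activate_sprinkler",
}

shared = export_rule(sprinkler_rule)      # uploaded by the first user
downloaded = import_rule(shared)          # downloaded by a second user
downloaded["target"] = ["person", "pet"]  # second user extends the rule
print(downloaded["target"])  # -> ['person', 'pet']
```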
Also, rules may be defined for archival searches. In other words, videos may be archived using a database or an optional video storage module (not shown) in the system 300 (
Turning now to
The technology mentioned herein is not limited to video. External data sources, such as web-based data sources, can be utilized in the system 100 of
While the invention is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the invention to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the invention.
This application is related to the U.S. patent application Ser. No. ______ filed on Feb. 9, 2009, titled “Systems and Methods for Video Monitoring,” which is hereby incorporated by reference.