The present disclosure relates, generally, to a system and method for detecting and identifying people or objects within crowded environments, and more particularly to an image capturing system for determining the location of subjects within a crowded environment captured in a video sequence and presenting motion data extracted from the video.
In some embodiments, a method may be provided that includes calibrating an image capturing system, capturing a video sequence of images with the image capturing system, detecting a subject of interest in the video, tracking the subject over a period of time, and extracting data associated with a motion of the subject based on the tracking.
In some embodiments of the present disclosure, a method may be provided that includes calibrating an image capturing system, capturing a video sequence of images with the image capturing system, applying a crowd segmentation process to the video sequence to isolate a subject of interest, tracking the subject over a period of time, and extracting data associated with a motion of the subject based on the tracking.
In some embodiments herein, the calibrating may include an internal calibration process and an external calibration process for the image capturing system. In some embodiments, the calibrating of the image capturing system may be accomplished relative to a location of the image capturing system and includes determining geometrical information associated with the location.
In some embodiments, a system is provided. The system may include an image capturing system and a computing system connected to the image capturing system. Further, the computing system may be adapted to calibrate the image capturing system, detect a subject of interest in a video sequence captured by the image capturing system, track the subject over a period of time, and extract data associated with a motion of the subject based on the tracking.
In some embodiments, methods and systems in accordance with the present disclosure may visually and, in some instances automatically, extract information from a live or a recorded broadcast sequence of video images (i.e., a video). The extracted information may be associated with one or more subjects of interest captured in the video. In some instances, the extracted information may pertain to motion parameters for the subject. The extracted data may further be presented to a viewer or user of the data in a format and manner that is easily understood by the viewer.
Since the information is extracted or derived from the video image, the viewer is presented with more information than may be available in the original video sequence. The extracted information may provide the basis for a wide variety of generated statistics and visualizations. Such produced statistics and visualizations may be presented to a viewer to enhance a viewing experience of the video sequence.
In some embodiments, a method for remote visual estimation of at least one parameter associated with a subject of interest is provided herein. In particular instances, the at least one parameter may be a speed, a direction, an acceleration, or another motion parameter associated with the subject. The method may include capturing the subject on video and using, for example, computer vision techniques and processes to extract data for estimating motion parameters associated with the subject.
In an effort to accurately correlate an image captured by the image capturing system with the real world in which the image capturing system and the images captured thereby exist, the image capturing system is calibrated. The calibration of the image capturing system may include an internal calibration wherein a camera device and other components of the image capturing system are calibrated relative to parameters and characteristics of the image capturing system. Further, the image capturing system may be externally calibrated to provide an estimation or determination of a relative location and pose of the camera device(s) of the image capturing system with regard to a world-centric coordinate framework. A desired result of the calibration process of operation 105 is an accurate estimation of a correlation between real-world, 3-dimensional (3D) coordinates and an image coordinate view of the camera device(s) of the image capturing system.
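By way of non-limiting illustration, the following sketch shows one way the internal (intrinsic) calibration might be realized using the OpenCV library and a planar checkerboard target. The checkerboard dimensions, square size, and image file names are assumptions made only for the example.

```python
# Illustrative sketch of an internal (intrinsic) calibration step using OpenCV.
# The checkerboard dimensions, square size, and image file names are assumptions.
import glob

import cv2
import numpy as np

BOARD_COLS, BOARD_ROWS = 9, 6        # inner corners of the assumed checkerboard
SQUARE_SIZE = 0.025                  # assumed square size, in meters

# 3D coordinates of the board corners in the board's own plane (Z = 0)
objp = np.zeros((BOARD_ROWS * BOARD_COLS, 3), np.float32)
objp[:, :2] = np.mgrid[0:BOARD_COLS, 0:BOARD_ROWS].T.reshape(-1, 2) * SQUARE_SIZE

obj_points, img_points = [], []
for path in glob.glob("calibration_images/*.png"):   # hypothetical image set
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, (BOARD_COLS, BOARD_ROWS))
    if found:
        obj_points.append(objp)
        img_points.append(corners)

# Recover the camera matrix and lens distortion coefficients
err, camera_matrix, dist_coeffs, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("Reprojection error:", err)
```

The recovered camera matrix and distortion coefficients characterize the camera device itself; the external calibration then relates that device to the world-centric coordinate framework.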
The calibration process of operation 105 may include the acquisition and determination of certain knowledge information regarding the location of the image capturing system. The information regarding the location of the image capturing system may be referred to herein as geometrical information. For example, in an instance where the image capturing system is deployed at a sporting event, the calibration process may include learning and/or determining the boundaries of the arena, field, field of play, or parts thereof. In this manner, knowledge of the extent of a field of play, arena, boundaries, goals, ramps, and other fixtures of the sporting event may be used in other processing operations.
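As a non-limiting illustration of how such geometrical information might be applied, the sketch below uses four assumed corner landmarks of a playing field to estimate a ground-plane homography relating image pixels to field coordinates. All coordinate values, including the 105 m by 68 m field size, are hypothetical.

```python
# Illustrative sketch: relating image coordinates to field coordinates through a
# ground-plane homography estimated from known field landmarks.  All values are
# hypothetical.
import cv2
import numpy as np

# Field-plane coordinates of the landmarks, in meters (assumed 105 m x 68 m field)
world_pts = np.array([[0, 0], [105, 0], [105, 68], [0, 68]], dtype=np.float32)

# The same landmarks located (manually or automatically) in one camera image
image_pts = np.array([[112, 534], [1210, 498], [1035, 132], [240, 150]],
                     dtype=np.float32)

H, _ = cv2.findHomography(image_pts, world_pts)

def image_to_field(u, v):
    """Map an image pixel on the ground plane to field coordinates, in meters."""
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]

print(image_to_field(640, 360))
```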
In some embodiments, the geometrical information and other data relating to the calibration process of operation 105 may be used in coordinating and reconciling images captured by more than one camera device belonging to the image capturing system.
At operation 110, a sequence of video images, or a video, is captured by the image capturing system. The video may be captured from multiple angles in instances where multiple camera devices located at more than one location are used to capture the video simultaneously.
At operation 115, a process to detect a subject of interest in the captured video is performed. The process of detecting the subject may be based, in part, on the knowledge or geometrical information obtained in the calibration operation 105. In some embodiments, such as the context of a sporting event, known characteristics of the field, such as the location of the playing surface relative to the camera, the boundaries of the field, and an expected range of motion for the players in the arena (as compared to non-players), may be used in the detection and determination of the subject of interest.
In some embodiments, the subject(s) of interest may be detected by determining objects in the foreground of the captured video by a process such as, for example, foreground-background subtraction. Detection processes that involve determining objects in the foreground may be used in some embodiments herein, particularly where the subject of interest has a tendency to move relative to a background environment. The subject detection process may further include processing using a detection algorithm. The detection algorithm may use geometrical information, including that information obtained during calibration process 105, and image information associated with the foreground processing to detect the subject of interest.
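By way of non-limiting illustration, the following sketch shows a foreground-background subtraction and detection pass using OpenCV's MOG2 background model. The thresholds, minimum subject size, and video file name are assumptions made only for the example.

```python
# Illustrative sketch of foreground-background subtraction followed by a simple
# detection pass.  Thresholds and the video file name are assumptions.
import cv2

cap = cv2.VideoCapture("game_footage.mp4")          # hypothetical input video
bg_model = cv2.createBackgroundSubtractorMOG2(history=500, detectShadows=True)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    fg_mask = bg_model.apply(frame)
    # Suppress shadow pixels (MOG2 marks them as 127) and remove small noise
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    fg_mask = cv2.morphologyEx(
        fg_mask, cv2.MORPH_OPEN,
        cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 400:                 # assumed minimum subject size
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
```

The resulting foreground regions may then be passed to the detection algorithm, which can combine them with the geometrical information obtained during calibration.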
It should be appreciated that other techniques and processes to detect the subject(s) of interest in the captured video and compatible with other aspects of the present disclosure may be used in operation 115.
In some embodiments, a further complexity may be encountered in that the subject of interest may be in close proximity with other subjects and objects. In some embodiments, the particular subject of interest may be in close proximity with other subjects of similar size, shape, and/or orientation. In these and other instances, operation 120 provides a mechanism for isolating the subject of interest from the other objects and subjects. In particular, operation 120 provides a crowd segmentation process to separate and isolate the subject of interest from a “crowd” of other objects and subjects.
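The disclosure does not limit operation 120 to any particular algorithm; as one non-limiting illustration, a merged foreground blob may be split into individual subjects using a distance transform and the watershed transform, as sketched below. The 0.6 threshold on the distance transform is an assumption for the example.

```python
# Illustrative sketch of one possible crowd-segmentation step: splitting a merged
# foreground blob into individual subjects with a distance transform and watershed.
import cv2
import numpy as np

def split_crowd(fg_mask):
    """Return a label image in which each separated subject carries its own label."""
    dist = cv2.distanceTransform(fg_mask, cv2.DIST_L2, 5)
    # Peaks of the distance transform approximate the "cores" of individual subjects
    _, cores = cv2.threshold(dist, 0.6 * dist.max(), 255, cv2.THRESH_BINARY)
    cores = cores.astype(np.uint8)
    _, markers = cv2.connectedComponents(cores)
    markers = markers + 1                      # background of the core image -> 1
    unknown = cv2.subtract(fg_mask, cores)     # foreground pixels not yet assigned
    markers[unknown == 255] = 0                # 0 = "unknown", to be flooded
    color = cv2.cvtColor(fg_mask, cv2.COLOR_GRAY2BGR)
    return cv2.watershed(color, markers)       # boundary pixels are marked -1
```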
In accordance herewith, either operation 115 or 120 may be applied or used in processing a video sequence. In some embodiments, the use of either operation 115 or operation 120 may be based on the images captured or processed by the methods and systems herein.
At operation 125, the subject of interest, having been visually detected in the captured video and separated from the background and other objects and subjects, is tracked over a period of time. That is, location information is determined for the subject of interest over a successive number of images of the captured video. The location data associated with the subject of interest over a period of time is also referred to herein as motion data.
The motion data provides an indication of the motion of the subject of interest. In some embodiments, the motion data associated with the subject of interest may be estimated or determined using geometrical knowledge of the image capturing system and the captured video that is obtained or learned by the image capturing system or available to the image capturing system.
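As a non-limiting illustration, tracking may be realized by associating the detection in each new image with an existing track, for example by nearest centroid as sketched below. The distance gate is an assumption; more robust data-association schemes may equally be used.

```python
# Illustrative sketch of a minimal tracking step: greedily associating detected
# centroids with existing tracks by nearest distance.  The gate is an assumption.
import numpy as np

class Track:
    def __init__(self, centroid, frame_idx):
        self.positions = [(frame_idx, centroid)]   # motion data: (frame, (x, y))

    def last(self):
        return np.asarray(self.positions[-1][1])

def update_tracks(tracks, detections, frame_idx, max_dist=50.0):
    """Assign each detected centroid to the nearest existing track, or start a new one."""
    unmatched = list(detections)
    for track in tracks:
        if not unmatched:
            break
        dists = [np.linalg.norm(track.last() - np.asarray(d)) for d in unmatched]
        j = int(np.argmin(dists))
        if dists[j] < max_dist:                    # gate on plausible motion
            track.positions.append((frame_idx, unmatched.pop(j)))
    for d in unmatched:                            # unmatched detections seed new tracks
        tracks.append(Track(d, frame_idx))
    return tracks
```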
In some embodiments, motion data associated with the subject of interest over a period of time may be determined using fewer than every successive image of the captured video. For example, the tracking aspects herein may use a subset of "key" images of the captured video (e.g., 50% of the captured images).
Tracking operation 125 may include a process of conditioning or filtering the motion data associated with the subject of interest to provide, for example, a smooth, stable, or normalized version of the motion data.
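By way of non-limiting illustration, one simple conditioning step is an exponential moving average applied to the per-frame positions of a track, as sketched below. The smoothing factor is an assumption; a Kalman filter or spline fit could serve equally well.

```python
# Illustrative sketch of conditioning motion data with an exponential moving average.
import numpy as np

def smooth_track(positions, alpha=0.3):
    """positions: list of (x, y) image or field coordinates, one per tracked frame."""
    smoothed = [np.asarray(positions[0], dtype=float)]
    for p in positions[1:]:
        smoothed.append(alpha * np.asarray(p, dtype=float) + (1 - alpha) * smoothed[-1])
    return np.vstack(smoothed)
```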
At operation 130, a data extracting process extracts data associated with the motion data. The extracted data may include determining or deriving a speed, a maximum speed, a direction of motion, an acceleration, an average acceleration, a total distance traveled, a height jumped, a hang time calculation, and other parameters related to the subject of interest. For example, in the context of a sporting event, the extracted data may provide, based on the visual detection and tracking of the subject of interest, the speed, acceleration, average speed, average acceleration, and total distance run by the player on a specific play or, for example, in a period, quarter, or the entirety of the game up to a particular instance in time.
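As a non-limiting illustration, the sketch below derives several such parameters from a smoothed track. The positions are assumed to be in field coordinates (meters) sampled at a known frame rate; the frame rate value is an assumption for the example.

```python
# Illustrative sketch of deriving motion parameters from a smoothed track.
import numpy as np

def motion_statistics(positions, fps=30.0):
    """positions: (N, 2) array of field coordinates in meters, one row per frame."""
    positions = np.asarray(positions, dtype=float)
    dt = 1.0 / fps
    step_lengths = np.linalg.norm(np.diff(positions, axis=0), axis=1)   # meters per step
    speeds = step_lengths / dt                                          # m/s
    accelerations = np.diff(speeds) / dt                                # m/s^2
    return {
        "total_distance_m": float(step_lengths.sum()),
        "max_speed_mps": float(speeds.max()) if speeds.size else 0.0,
        "avg_speed_mps": float(speeds.mean()) if speeds.size else 0.0,
        "max_acceleration_mps2": float(accelerations.max()) if accelerations.size else 0.0,
    }
```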
Some aspects of process 100 may be invoked and performed in an automatic manner. For example, calibration operation 105 may comprise an auto-calibration process for the image capturing system.
At operation 210, the extracted data associated with a motion of a subject (i.e., motion data) is presented to a viewer or user. As illustrated at 215, 220, and 225, the extracted data may be provided to a number of destinations including, for example, a broadcast of the video. The processes disclosed herein are preferably sufficiently efficient and sophisticated to permit the extraction and presentation of motion data substantially in real time during a live broadcast of the captured video to either one or all of the destinations 215, 220, and 225.
In some embodiments, data extracted from a video sequence of a subject may be communicated or delivered to a viewer in one or more ways. For example, the extracted data may be generated and presented to a viewer during a live video broadcast or during a subsequent broadcast (215). In some instances, the extracted data may be provided concurrently with the broadcast of the video, on a separate communications channel, in a format that is the same as or different from the video broadcast. In some embodiments, the broadcast embodiments of the extracted motion data presentation may include graphics overlays. In some embodiments, a path of motion for a subject of interest may be presented in one or more video graphics overlays. The graphics overlay may include a location, a line, a pointer, or other indicia to indicate an association with the subject of interest. Text including one or more items of extracted data (e.g., a statistic) related to the motion of the subject may be displayed alone or in combination with the subject and/or the path of motion indicator. In some embodiments, the graphics overlay may be repeatedly updated over time as the video sequence changes to provide an indication of a past and a current path of motion (i.e., a track). In some embodiments, the graphics overlay is repeatedly updated and re-rendered so as not to obfuscate other objects in the video such as, for example, other objects in a foreground of the video.
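By way of non-limiting illustration, the following sketch draws a tracked path and a text statistic onto a broadcast frame. The track, label text, and colors are assumptions made only for the example.

```python
# Illustrative sketch of a graphics overlay: drawing the tracked path and a text
# statistic onto a broadcast frame.  Track, label, and colors are assumptions.
import cv2
import numpy as np

def draw_overlay(frame, track_points, label):
    """track_points: list of (x, y) image coordinates along the subject's path."""
    pts = np.asarray(track_points, dtype=np.int32).reshape(-1, 1, 2)
    cv2.polylines(frame, [pts], isClosed=False, color=(0, 255, 255), thickness=2)
    x, y = track_points[-1]
    cv2.circle(frame, (int(x), int(y)), 6, (0, 0, 255), -1)      # current position
    cv2.putText(frame, label, (int(x) + 10, int(y) - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (255, 255, 255), 2)
    return frame

# e.g., draw_overlay(frame, [(320, 400), (360, 390), (410, 385)], "Player 12: 8.4 m/s")
```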
In some embodiments, at least a portion of the extracted data may be used to re-visualize the event(s) captured by the video (225). For example, in a sporting event environment, the players/competitors captured in the video may be represented as models based on the real world players/competitors and re-cast in a view, perspective, or effect that is the same as or different from the original video. One example may include presenting a video sequence of a sporting event from a view or angle not specifically captured in the video. This re-visualization may be accomplished using computer vision techniques and processes, including those described herein, to represent the sporting event by computer generated model representations of the players/competitors and the field of play using, for example, the geometrical information of the image capturing system and knowledge of the playing field environment to re-visualize the video sequence of action from a different angle (e.g., a virtual “blimp” view) or different perspective (e.g., a viewing perspective of another player, a coach, or fan in a particular section of the arena).
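As a non-limiting illustration, one simple form of re-visualization renders tracked field positions onto a schematic top-down view of the playing field, as sketched below. The field dimensions, scale, and colors are assumptions; a fuller re-visualization could instead drive animated three-dimensional models of the players.

```python
# Illustrative sketch of a simple re-visualization: a schematic top-down view of
# tracked player positions.  Field size, scale, and colors are assumptions.
import cv2
import numpy as np

FIELD_W, FIELD_H = 105.0, 68.0           # assumed field dimensions, in meters
SCALE = 8                                # pixels per meter in the virtual view

def render_top_down(player_positions):
    """player_positions: dict mapping player_id -> (x, y) field coordinates in meters."""
    canvas = np.full((int(FIELD_H * SCALE), int(FIELD_W * SCALE), 3),
                     (40, 120, 40), dtype=np.uint8)
    for pid, (x, y) in player_positions.items():
        u, v = int(x * SCALE), int(y * SCALE)
        cv2.circle(canvas, (u, v), 8, (255, 255, 255), -1)
        cv2.putText(canvas, str(pid), (u + 10, v), cv2.FONT_HERSHEY_SIMPLEX,
                    0.5, (0, 0, 0), 1)
    return canvas
```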
In some embodiments, data extracted from a video sequence may be supplied or otherwise presented to a system, device, service, service provider, or network so that the system, device, service, service provider, or network may use the extracted data to update an aspect of the service, system, device, service provider, network, or resource. For example, the extracted data may be provided to an online gaming network, service, service provider, or users of such online gaming networks, services, or service providers to update aspects of an online gaming environment. An example may include updating player statistics for a football, baseball, or other type of sporting event or other activity so that the gaming experience may more closely reflect real-world conditions. In yet another example, the extracted data may be used to establish, update, and supplement a fantasy league related to real-world sports, competitions, or activities.
In some embodiments, at least a portion of the extracted data may be presented for viewing or reception by a viewer or other user of the information via a network such as the Web or a wireless communication link interfaced with a computer, handheld computing device, mobile telecommunications device (e.g., mobile phone, personal digital assistant, smart phone, and other dedicated and multifunctional devices) including functionality for presenting one or more of video, graphics, text, and audio (220).
The image capturing system including multiple cameras may thus provide a mechanism for a variety of visualizations in accordance with the present disclosure due, at least in part, to the number of perspectives captured by the plurality of cameras.
It should be noted that telemetry data for at least some of the other players shown in image 400 may be determined in addition to the data displayed in the graphics overlay for players 405 and 420. In some embodiments, the telemetry information for all of the players in an image is determined, whether or not such information is presented in combination with a broadcast of the video. The determined and processed telemetry data may be presented in other forms, at other times, and to other destinations.
In some embodiments, a visualization in accordance herewith may include a presentation of a rotation exhibited by a subject. For example, such a visualization may indicate the extent of rotation exhibited by the subject over the tracked period of time.
In some embodiments, a visualization in accordance herewith may include a presentation of an articulation exhibited by a subject. The articulation of a subject may be determined and tracked by, for example, marking or keying on the location of the limbs of the subject.
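By way of non-limiting illustration, one measure of articulation is the angle formed at a joint by three tracked limb locations, as sketched below. The keypoints are assumed to be supplied by whatever limb-marking or keying step the system employs.

```python
# Illustrative sketch of quantifying articulation: the joint angle formed at a
# knee by tracked hip, knee, and ankle locations.  Keypoints are assumed inputs.
import numpy as np

def joint_angle(hip, knee, ankle):
    """Return the angle (degrees) at the knee formed by the hip-knee-ankle points."""
    a, b, c = (np.asarray(p, dtype=float) for p in (hip, knee, ankle))
    v1, v2 = a - b, c - b
    cos_theta = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))

# e.g., joint_angle((410, 220), (415, 300), (420, 380)) -> approximately 180 degrees
```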
In some embodiments, a re-visualization of a captured image may provide a rendering of the image from a perspective or angle different than that depicted in the captured image. Such an alternate perspective presentation may be facilitated, in part, by the use of more than one image capture device in an image capturing system. Some of the example views that may be derived or generated and presented based on the captured image and the operations herein include, for example, an overhead or top-down view of the scene.
In some embodiments of the methods, processes, and systems herein, a plurality of efficient and sophisticated visual detection, tracking, and analysis techniques and processes may be used to effectuate the visual estimations herein. The visual detection, tracking, and analysis techniques and processes may provide results based on the use of a number of computational algorithms related to or adapted to vision-based video technologies.
While the disclosure has been described in detail in connection with only a limited number of embodiments, it should be readily understood that the disclosure is not limited to such disclosed embodiments. Rather, the disclosed embodiments may be modified to incorporate any number of variations, alterations, substitutions, or equivalent arrangements not heretofore described, but which are commensurate with the spirit and scope of the invention. Accordingly, the disclosure is not to be seen as limited by the foregoing description.