Capturing video images of an object that moves from one location to another requires changing the orientation of the video imager as the object changes locations. While this is not difficult to accomplish when a person manually changes the imager orientation, it is not such a simple task when automated tracking is required. Moreover, manually tagging a video data stream after the video has been captured to indicate where in the video particular events are depicted is well known. However, automated tagging of a video stream in real time to indicate the portions of the video corresponding to events is not such a simple task. There has been a need for improvement in the automatic tracking of objects to be captured on video as the object moves about from one place to another. In addition, there has been a need for automatic real-time tagging of video streams with event information.
The following description is presented to enable any person skilled in the art to make and use a system, method and article of manufacture to track the position of a target object, to detect the occurrence of events associated with the tracked object and to create a video record of the tracked object that is associated with indicia of the detected events and indicia of portions of the video record that correspond to the detected events. Various modifications to the preferred embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the invention. Moreover, in the following description, numerous details are set forth for the purpose of explanation. However, one of ordinary skill in the art will realize that the invention might be practiced without the use of these specific details. In other instances, well-known structures and processes are shown in block diagram form in order not to obscure the description of the invention with unnecessary detail. Where the same item is shown in different drawings that item is marked with the same reference numeral in each drawing in which it appears. Thus, the present invention is not intended to be limited to the embodiments shown, but is to be accorded the widest scope consistent with the principles and features disclosed herein.
Referring again to
The remote device 102 microphone 110 acts as an audio sensor to sense sound information imparted to the remote device 102. For example, the sound may be produced when a baseball bat hits a baseball or when a person speaks. Moreover, the base device microphone 124 also acts as an audio sensor to sense sound information. As explained below with reference to audio analysis block 402, a difference in the arrival time of sound at the remote device 102 and the arrival time of the same sound at the base device 104 provides a measure of distance between the two devices. In alternate embodiments, audio analysis block 402 can be part of the imager system 116.
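By way of a minimal sketch of that distance measurement, and assuming a synchronized time basis between the two devices as described below, the extra time the same sound takes to reach the base device, multiplied by the speed of sound, yields the separation; the function and variable names are illustrative assumptions.

```python
# Hypothetical sketch: estimating remote-to-base distance from the difference in
# arrival time of the same sound at the two microphones (names are illustrative).
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate speed of sound in air at 20 C

def estimate_distance_m(remote_arrival_s: float, base_arrival_s: float) -> float:
    """Distance implied by the extra time the sound takes to reach the base device."""
    delay_s = base_arrival_s - remote_arrival_s
    return max(0.0, delay_s * SPEED_OF_SOUND_M_PER_S)

# Example: a sound heard 0.02 s later at the base implies roughly 6.9 m of separation.
print(estimate_distance_m(remote_arrival_s=1.000, base_arrival_s=1.020))
```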
The remote device accelerometer 112 acts as a motion sensor to detect motion imparted to the remote device 102. The accelerometer detects motion of the remote device 102. When the remote device is disposed upon an object 103 (e.g., a person's head, arm, torso, etc.), the motion of the remote device 102 indicates motion of that object. In some embodiments, the accelerometer outputs multi-dimensional acceleration data that is filtered (e.g., with noise removal filters) and integrated to produce tracking data that is indicative of a change of position since the last measurement, using algorithms known in the art (e.g., dead reckoning). In some embodiments, a three-axis accelerometer is used that provides an acceleration value for each of the three dimensions. The remote device 102 transmits motion data generated by the accelerometer 112 over the RF communication channel to the base 104, where computation of position based upon the motion data occurs. Moreover, in alternative embodiments, the accelerometer may be employed as part of a more robust inertial navigation system (INS) that uses computer processing, linear motion sensors (accelerometers) and rotation motion sensors (gyroscopes) to continuously calculate the position, orientation, and velocity (direction and speed of movement) of the tracked object without the need for external references. Gyroscopes measure the angular velocity of the object in an inertial reference frame. By using the original orientation of the system in an inertial reference frame as the initial condition and integrating the angular velocity, the current orientation of the tracked object can be determined at all times.
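A minimal sketch of the dead-reckoning step just described follows: filtered three-axis acceleration samples are integrated twice to yield a change of position since the last measurement. The function and variable names are illustrative assumptions rather than elements of the disclosed system.

```python
import numpy as np

# Illustrative dead reckoning: double integration of filtered acceleration samples.
def integrate_motion(accel_samples: np.ndarray, dt: float,
                     velocity: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (displacement, updated_velocity) for a block of (N, 3) samples."""
    displacement = np.zeros(3)
    for a in accel_samples:
        velocity = velocity + a * dt                  # first integration: velocity
        displacement = displacement + velocity * dt   # second integration: position
    return displacement, velocity
```

In practice such an estimate drifts over time, which is one reason the position is later fused with IR and imager observations.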
User input is received via the user interface (UI) 114.
Audio sensor information, motion sensor information and UI input information are communicated to the base device 104 through RF transmission. During initialization, when the remote device 102 and the base device 104 first begin a communication session, an RF communication channel is established between the remote device 102 and the base device 104. Establishing the RF channel involves synchronization of communications signals between the devices. In some embodiments, the synchronization involves establishing a unique time basis, such as an agreement between the remote device and the base device on specific time slots dedicated to prescribed categories of communication between them. Alternatively, for example, the synchronization involves setting unique RF frequencies for communication.
The remote IR transmitter 106 produces an IR signal to act as a beacon to indicate the remote device position. The IR signal produced by the remote IR transmitter 106, which is a first IR signal, proceeds in a first direction that follows a path. In the example illustrated in
The base device IR transmitter 120 emits IR signals similar to those emitted by the remote device IR transmitters 106 but at a different unique time basis (e.g., during a different time slot). The base device IR signal, which is a second IR signal, proceeds in a second direction toward the tracked object 103. The second direction is represented by the arrow from the base IR transmitter 120 to the tracked object 103. The base device IR signal reflects off the tracked object 103 in a third direction represented by the arrow from the tracked object to the base device IR sensor 118, and the base device IR sensor 118 detects the reflected IR signal. The base IR transmitter is aligned to point in the same direction as the quad cell sensor, so the reflection from the tracked subject is expected to come directly back into the quad cell sensor. The base device IR signals also can act as a backup to the remote IR signals. For example, reflections of the base device IR signals from the tracked object 103 provide for more robust tracking by allowing the system to continue to track the object even when the remote device IR signals become temporarily blocked or move out of line of sight.
Additionally, data such as the remote device's unique identifier, accelerometer and other sensor information, and indications of UI control buttons actuated by a user may be transmitted over the remote device IR channel as a backup or supplement to the RF communications channel.
The imager 116 implements one or more object recognition algorithms. Once an object to be tracked has been identified within a field of view of the imager 116, the imager follows the object within the imager field of view using known image recognition techniques. In some embodiments, video captured by the imager is evaluated frame by frame to track object movement. For example, in some embodiments, a known face detection algorithm is used to recognize a face within a video image and to track movement of the face within the imager field of view. An initial position of the tracked object is obtained by the imager 116 at the start of a tracking operation based upon IR signals detected by the IR sensor 118. Alternatively, a user initially may point the imager 116 at the tracked object at the start of the tracking operation. Subsequently, the imaging system employs object recognition techniques to independently track the object within the imager field of view based upon the object recognition algorithms.
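As one concrete but non-limiting illustration of the face-tracking variant, the stock Haar-cascade detector shipped with OpenCV could stand in for the "known face detection algorithm"; the library choice and function names below are assumptions, since the description does not specify a particular implementation.

```python
import cv2

# Illustrative only: OpenCV's bundled Haar cascade stands in for the unspecified
# face detection algorithm. Each captured frame would be evaluated in turn.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def locate_face(frame):
    """Return the (x, y, w, h) box of the largest detected face, or None."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    return max(faces, key=lambda box: box[2] * box[3])  # largest face by area
```

Frame-by-frame calls to such a detector would supply the imager-derived displacement observations used by the fusion process described below.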
A sensor fusion process 404 determines the position of the tracked object 103 as it moves. The base device servo system 117 adjusts the orientation of the imager 116 so that it continues to track the object 103 as the object changes position. In some embodiments, the fusion algorithm employs a Kalman filter process to track target object position. A Kalman filter process produces estimates of the true values of measurements and their associated calculated values by predicting a value, estimating the uncertainty of the predicted value, and computing a weighted average of the predicted value and the measured value. In general, in a Kalman filter process, the most weight is given to the value with the least uncertainty. Thus, the sensor fusion process 404 receives as input potentially noisy input data from multiple sensors (e.g., accelerometer, audio, IR, UI and imager) and fuses the data to determine an estimate of the instantaneous position of the tracked object 103. It will be appreciated that the noise is generated by uncertainty of measurement and not by inherent sensor signal noise. The sensor fusion process 404 runs periodically to update a determined position of the tracked object 103 at prescribed time increments. In some embodiments, the time increments correspond to time intervals in which a time stamp is (or may be) associated with captured video images, so as to more easily tag the captured video with target object positions associated with time stamps that indicate the portions of the video corresponding to the computed positions. The position information computed by the fusion process 404 is stored in the storage device 310 for use by the servo control system 117 for tracking, and for validity checks and history, for example.
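A minimal, per-axis sketch of such a predict/adjust cycle appears below; the state model, noise constants and class name are assumptions made only for illustration, with motion-sensor data driving the prediction and IR or imager observations driving the adjustment, as discussed later in connection with the first and second data.

```python
# Illustrative one-dimensional Kalman-style filter for one axis of the tracked position.
class AxisFilter:
    def __init__(self, x0: float, p0: float = 1.0):
        self.x = x0      # estimated position along one axis
        self.p = p0      # uncertainty (variance) of the estimate

    def predict(self, dx: float, q: float = 0.05):
        """Shift the estimate by a displacement derived from motion-sensor data."""
        self.x += dx
        self.p += q      # prediction grows the uncertainty

    def adjust(self, z: float, r: float = 0.2):
        """Blend in an observed position (e.g., from IR or image recognition)."""
        k = self.p / (self.p + r)      # Kalman gain: weight given to the measurement
        self.x += k * (z - self.x)
        self.p *= (1.0 - k)
```

One such filter could be run for each axis at every time increment, first applying predict() with the accelerometer-derived displacement and then adjust() with each available observation.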
Validity process 406 checks the validity of a target object position computed according to the fusion process 404. In some embodiments, a set of rules for target position data are built in a table to assess tracking validity. If a determined position does not satisfy one of the rules, then the determined position is invalid and is discarded. If a determined position satisfies the one or more validity rules, then the determined position is valid, and the determined position is passed to the servo 117 and is stored together with a time stamp to indicate its time of occurrence relative to portions of the video image stream.
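Rules of the following kind might populate such a table; the particular limits, rule choices and names below are assumptions for illustration only.

```python
# Illustrative validity rules: stay within an operating area and below a speed limit.
MAX_SPEED_M_PER_S = 12.0      # a tracked person is unlikely to move faster than this
FIELD_LIMITS = (-50.0, 50.0)  # plausible operating area along each axis, in meters

def position_is_valid(prev_pos, new_pos, dt: float) -> bool:
    """Apply simple rules to decide whether a newly determined position is plausible."""
    if not all(FIELD_LIMITS[0] <= c <= FIELD_LIMITS[1] for c in new_pos):
        return False
    speed = sum((a - b) ** 2 for a, b in zip(new_pos, prev_pos)) ** 0.5 / dt
    return speed <= MAX_SPEED_M_PER_S
```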
Varying states of the magnitude of the IR signal (either generated by the remote device 102 or reflected from the tracked object) represent digital data in the form of zeros (no IR light) or ones (IR light present). The magnitude of the IR signal is the sum of four cells:
Mag=A+B+C+D.
The horizontal target position (or azimuth) is defined by the difference of the horizontally aligned cells:
Az=((B+C)−(A+D))/Mag
The vertical target position (or elevation) is defined by the difference of the vertically aligned cells:
El=((A+B)−(C+D))/Mag
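The quad-cell relations above translate directly into code; in the following sketch the cell readings A, B, C and D follow the text, while the function name and the guard against a zero magnitude are illustrative.

```python
# Direct transcription of the quad-cell relations: magnitude, azimuth, elevation.
def quad_cell_position(a: float, b: float, c: float, d: float):
    """Return (magnitude, azimuth, elevation) from the four cell readings."""
    mag = a + b + c + d
    if mag == 0.0:
        return 0.0, 0.0, 0.0           # no IR light detected
    az = ((b + c) - (a + d)) / mag     # horizontal offset of the IR spot
    el = ((a + b) - (c + d)) / mag     # vertical offset of the IR spot
    return mag, az, el
```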
In some embodiments, distance information as well as the received magnitude of the IR signal (in the base device) is used to communicate back to the remote device the amount of gain to use for its IR LEDs. The base device measures IR signal strength using the Magnitude value from the quad cell IR sensor. If the Magnitude signal is greater or smaller than specified parameters (pre-programmed in the base device), then the base device instructs the remote device via RF communications to decrease or increase the gain of the IR signal accordingly.
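A sketch of that feedback decision follows; the acceptable window and the meaning of the returned step are assumptions, since the description states only that the limits are pre-programmed in the base device.

```python
# Illustrative gain feedback based on the quad cell Magnitude reading.
MAG_LOW, MAG_HIGH = 0.2, 0.8   # acceptable Magnitude window (normalized units)

def gain_adjustment(magnitude: float) -> int:
    """Return +1 to ask the remote to raise IR LED gain, -1 to lower it, 0 to hold."""
    if magnitude < MAG_LOW:
        return +1
    if magnitude > MAG_HIGH:
        return -1
    return 0
```

The returned step would then be sent to the remote device over the RF channel.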
Referring again to
It will be noted that although tracking and event detection described herein involve physical objects in the physical world, position and event information collected through these tracking and tagging processes can be used to guide the motion of virtual objects in a virtual world. For example, position and event information gathered about movements and actions of a real person can be translated to virtual movements and virtual actions of a virtual animated person in a virtual world, such as in a video game scenario. In other words, an animated object (e.g., an animated character) is caused to mirror or mimic the movements and actions of the real object (e.g., a person) that is tracked based upon position and event information gathered about that real object.
As an alternative embodiment, the servo system 117 does not provide mechanical tilt in the base device. Rather, a tilt effect is achieved digitally in the imager 116 by cropping the captured image. Commands are issued from the sensor fusion algorithm 404 for the imager 116 to perform cropping operations. The desired aspect ratio of the image is maintained, and the image is cropped around the tracked object 103.
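By way of illustration, such a crop might be computed as follows; the zoom factor, clamping behavior and names are assumptions rather than details taken from the description.

```python
# Illustrative digital-tilt crop: a window with the frame's aspect ratio is
# centered on the tracked object and clamped to the frame boundaries.
def crop_around(frame_w: int, frame_h: int, cx: int, cy: int, zoom: float = 1.5):
    """Return (x, y, w, h) of a crop window centered on (cx, cy)."""
    w = int(frame_w / zoom)
    h = int(frame_h / zoom)                    # same aspect ratio as the full frame
    x = min(max(cx - w // 2, 0), frame_w - w)  # clamp horizontally
    y = min(max(cy - h // 2, 0), frame_h - h)  # clamp vertically
    return x, y, w, h
```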
Moreover, in some embodiments, base device memory 310 is preloaded with cinematic rules used to configure the base device processor 302 to dictate how the servo control system 117 should move the imager 116 relative to the tracked object. The base device servo control system 117 uses determined position data in combination with the cinematic rules in such a way that the tracked object is positioned correctly within the imager field of view. In some embodiments, the servo control system utilizes its own loop tracking algorithms known in the art, such as PID (Proportional-Integral-Derivative) control loops, to analyze changes in position information and react to them.
Some example cinematic rules are as follows; a brief sketch applying two of these rules appears after the list.
1. Let the tracked object move without moving the orientation of the imager until the tracked object moves more than a prescribed distance away from the center of the field of view of the imager 116.
2. Use the accelerometer data to control the speed of the imager movement. For example, if the motion data indicates movement of the target object, but the IR signal is lost, then the servo 117 re-orients the position of the imager 116 in reaction to the motion data. On the other hand, if motion data indicates an acceleration of the tracked object, but the IR signal indicates that the object has not moved, then the servo/base system 117 does not re-orient the imager.
3. Do not move the imager at a speed past a threshold that results in unappealing video quality.
4. Avoid repetitive, opposing motions of a similar nature by storing determined position information indicative of past movements and comparing new motions against that history, limiting them with some threshold.
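The following sketch applies rule 1 together with a speed cap of the kind suggested above: small offsets of the tracked object from the frame center are ignored, and the per-cycle imager rotation is limited. The numeric limits and names are illustrative assumptions.

```python
# Illustrative cinematic-rule control step for horizontal pan.
DEAD_BAND = 0.1          # normalized offset from frame center that is ignored
MAX_STEP_DEG = 3.0       # maximum imager rotation per control cycle

def pan_command(offset: float, gain_deg: float = 10.0) -> float:
    """Return the pan step (degrees) for a normalized horizontal offset in [-1, 1]."""
    if abs(offset) <= DEAD_BAND:
        return 0.0                                       # let the object roam near center
    step = gain_deg * offset
    return max(-MAX_STEP_DEG, min(MAX_STEP_DEG, step))   # cap the slew rate
```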
In addition, in some embodiments, imager focus control (i.e., setting the focal point of the imager lens (not shown) to match the distance to the tracked object) is adjusted based upon the position of the target determined according to the fusion process 404 to improve the quality of images captured by the imager 116. This also can be done using known focus algorithms for the imager in combination with the determined position information. Also, in some embodiments, the determined position information is used to determine where auto-focus will be applied within the image frame.
In general, the first data comprises sensor data that is more reliable, and therefore, better suited for use in the prediction phase. The second data comprises sensor data that is less reliable, and therefore, better suited for use in the adjustment phase. More specifically, in some embodiments, the first data comprises motion sensor position data such as the accelerometer and other physical sensor (e.g., gyroscope) position data. The second data includes observed azimuth and elevation displacement information from the remote device IR (dX1,dY1), base device reflective IR (dX2,dY2), and imager (PIS) object recognition (dX3,dY3) to refine the new predicted position into a more accurate adjusted position estimate, which is the determined position (Xf, Yf). Consider, for example, that during ordinary motion such as walking or sitting, the accelerometer sensor 112 provides information that is quite accurate as to changes in position. However, accelerometer based determinations are subject to drift over time. On the other hand, while IR signals (transmitted or reflected) can provide accurate position information, these signals can be unreliable since they are subject to being temporarily blocked or out of view. Likewise, while image recognition can provide refined position information about recognized features of a tracked object, those features sometimes cannot be reliably discerned by the imager 116.
It will be understood that data from the same ‘time’ is used during both the predict phase and the adjust phase. More specifically, at each timestamp both phases are performed, first predict and then adjust. However, this is not necessary. If for some reason observed displacement information is not available, adjust may be skipped. Also if at a timestamp, one of the observed displacements is not available, the adjust phase can be performed with only the available data. Also, if accelerometer and other physical sensor data is not available at a timestamp, then adjust can be performed without the predict phase.
Moreover in alternative embodiments, alternative predict and adjust phases may be employed. For example, as one alternative, only remote device IR data are employed during the predict phase, and the other remote device data (motion and audio) are employed during the adjust phase. In yet another alternative, for example, only position information provided by the imager (e.g. position computed based upon captured video image data) is employed during the predict phase, and remote device IR data, acceleration data and audio data are used during the adjust phase.
Referring to
For example, for the acceleration sensor 112, the event identification criteria may include a library of acceleration profiles or prescribed thresholds that correspond to events involving motion such as throwing a ball or jumping or a deliberate control ‘gesture’. A gesture comprises a physical action such as moving one's hand back and forth while holding the remote device 102 or shaking the remote device or moving the device in a circular motion that indicates some event according to the acceleration profile. Continuing with the accelerometer example, the decision module 904 would compare a profile of received acceleration data with stored criteria profiles to determine whether an acceleration (or motion) event has occurred. Alternatively, for example, for the audio sensors 110, 124, the event identification criteria may include a library of sound profiles or prescribed thresholds that correspond to events involving sound such as the sound of laughter or the sound of a ball impact with a baseball bat. The decision module 904 would compare a profile of the audio data with stored criteria profiles to determine whether an audio event has occurred.
If decision module 904 determines that the selected sensor data does not correspond to a prescribed event according to the event identification criteria, then control flow returns to module 902. If decision module 904 determines that the selected portion of the sensor data does correspond to a prescribed event according to the event identification criteria, then module 906 creates an event tag to identify the detected event. Module 908 stores the event tag in the storage device in association with a time stamp of the time at which the selected portion of the sensor data was received. More particularly, individual streams of sensor data, which may take the form of sequences of sensor sample data, are received by the base device 104 from the remote device 102 for each of multiple sensors, and sensor data from each of those streams is stored with time stamp information to indicate when each of the respective data were received by the base device 104. As explained below, tags in conjunction with the time stamps are used to align events detected using sensors with recorded video information and with other streams of data.
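By way of illustration, such detection and tagging might look like the following; the threshold, event name and storage structure are assumptions rather than particulars of the disclosed flow.

```python
import time

# Illustrative event detection and tagging: a sensor sample is compared against a
# stored threshold criterion, and a tag with a time stamp is recorded on a match.
IMPACT_THRESHOLD_G = 8.0      # acceleration magnitude suggesting a bat/ball impact

event_tags: list[dict] = []   # stands in for tags kept in the storage device

def process_accel_sample(magnitude_g: float, timestamp_s: float) -> None:
    """Create and store an event tag when the sample exceeds the stored criterion."""
    if magnitude_g >= IMPACT_THRESHOLD_G:
        event_tags.append({"event": "impact", "time": timestamp_s})

process_accel_sample(9.3, time.time())   # would record an 'impact' tag
```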
With regard to the acceleration data, it will be appreciated that acceleration data is used both for tracking and for event detection. Ordinary motion such as walking or running can be tracked based upon acceleration data. Conversely, prescribed motions that correspond to an event such as a gesture or the swing of a baseball bat or a golf club can be identified based upon prescribed acceleration profiles or prescribed thresholds. Moreover, in some embodiments, one or more motion sensors can be located physically separate from the remote device control electronics. For example, a first smaller sized accelerometer (not shown) could be mounted on a person's hand to more accurately follow hand movements during a golf swing. The first accelerometer could be electrically coupled to the remote device with a wire or through wireless communications, for example. In addition, a second accelerometer (not shown) could be located on a person's wrist in order to track larger body movements. The acceleration data for the two different accelerometers could be communicated to the base device 104 during different time slots so as to distinguish their data.
In some embodiments, UI control signals can be transmitted from a second device (not shown) different from the device mounted on the tracked target. Thus tagging may result from operation of such a second remote device that transmits a UI signal to the base device 104. In that case, the flow described with reference to
The distance measurement computed according to the process 1200 of
Two alternate methods for determining distance between the remote device 102 and the base device 104 involve RF signal strength measurement and IR signal strength measurement, respectively. In the RF signal strength alternative, baseline RF strength is measured during initial remote to base synchronization and connection. As RF strength increases or decreases, an algorithm is applied to the signal that calculates estimated distance changes. Distance changes are stored in memory 310 for tracking, tagging and editing. In the IR signal strength alternative, baseline IR strength is similarly measured during initial optical acquisition. As IR strength increases or decreases, an algorithm is applied to the signal that calculates estimated distance changes. Distance changes are stored in memory 310 for tracking, tagging and editing.
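The description leaves the strength-to-distance algorithm unspecified; purely as an illustrative stand-in, a log-distance path-loss model could convert a change in received signal strength into a relative distance change, as sketched below (the path-loss exponent and function name are assumptions).

```python
# Illustrative log-distance path-loss conversion from signal-strength change to
# relative distance change; n is roughly 2 in free space and larger indoors.
def distance_change_ratio(baseline_rssi_db: float, current_rssi_db: float,
                          n: float = 2.0) -> float:
    """Return current_distance / baseline_distance implied by an RSSI change."""
    return 10.0 ** ((baseline_rssi_db - current_rssi_db) / (10.0 * n))

# Example: a 6 dB drop from baseline roughly doubles the estimated distance.
print(distance_change_ratio(baseline_rssi_db=-40.0, current_rssi_db=-46.0))
```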
Providing multiple streams of sensor data and position data augmented by time stamps and event tags provides a rich collection of information for use in selecting and editing the video data stream. For example, if a user wishes to search for a portion of the video data stream that corresponds to the sound of a bat hitting a ball, then the user could look at video clips around event tags that indicate the occurrence of that sound. Alternatively, if the user wants to look at portions of video that correspond to the swinging of a bat, whether or not the bat connects with the ball, then the user could look at video clips around event tags that indicate the occurrence of a motion like the swinging of a bat. Other kinds of data also could be included in the feed. For example, the remote device 102 could be equipped with a GPS unit, and could report GPS coordinates to the base device 104 over the RF channel. In that case, one of the sensor streams could provide GPS coordinates that are time aligned with the video stream. Tags could be generated based upon the occurrence of select GPS coordinates, and the video stream could be searched based upon the GPS tags.
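As a simple illustration of such a search, tags aligned to the video by time stamps could be scanned for a requested event type and turned into clip boundaries; the tag format and padding below are assumptions consistent with the earlier tagging sketch.

```python
# Illustrative tag-based search: return clip boundaries around matching event tags.
def clips_for_event(tags: list[dict], event: str, pad_s: float = 3.0):
    """Yield (start, end) times of video clips surrounding matching event tags."""
    for tag in tags:
        if tag["event"] == event:
            yield max(0.0, tag["time"] - pad_s), tag["time"] + pad_s

# e.g., list(clips_for_event(event_tags, "impact")) -> clips around bat/ball sounds
```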
In an alternate embodiment in which a gyroscope is used, or in the case in which each IR LED on the remote runs at a different time basis, the orientation of the remote with respect to the base can be known. In this case, feedback is sent over the RF communications channel to turn off the remote device IR LEDs facing the wrong way, in order to save power.
In an alternative embodiment in which multiple remotes are employed, each remote device (‘remote’) is assigned a unique identifier code, and the base device 104 distinguishes between the multiple remotes on the basis of those unique identifier codes. Once each remote is identified by the base, independent time bases (each remote having a specific time slice) for communications are established so they do not conflict. The different remotes can be distinguished by the quad cell IR sensor by selecting the time basis used for reading the IR signals. Alternatively, via RF communication, a remote not being tracked can be shut off until a command to be tracked is observed, which is advantageous for saving battery power. Each remote can send independent audio and accelerometer data over the RF communications link that can be used by the sensor fusion algorithm as specified above. The remainder of video and data capture proceeds similarly to the single remote case.
There are a variety of approaches to informing the base device which remote device to track. One approach is to use the UI on the remotes to signal the base device 104. For example, a UI switch on a remote is turned on to indicate that it is the remote to track. Another approach is a gesture measured by the remote (“throwing” control back and forth in a simulated or real fashion, as in the example of throwing a ball), detected, for example, as a peak acceleration value measured by the accelerometer that exceeds a stored threshold, to indicate which remote to follow.
Voice activation can be used to determine which remote should be tracked by the imager 116. The remote microphone records the user's voice and sends it over RF communications to the base. An envelope detector (amplitude peak detector) is used to determine the presence of sound. When a sound is detected by a remote, the base device 104 selects that remote to track. When the speaker stops, the corresponding remote continues to be tracked until a second user/speaker uses his or her voice. At that point, the base device 104 switches to the new remote to track. In this way, the imager shuttles back and forth between speakers in conversations.
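A simple amplitude-envelope detector of the kind mentioned could be sketched as follows; the smoothing constant, threshold and names are assumptions.

```python
import numpy as np

# Illustrative envelope (amplitude peak) detector applied to a remote's audio stream.
def speech_present(samples: np.ndarray, threshold: float = 0.05,
                   alpha: float = 0.01) -> bool:
    """Track a smoothed peak of |samples| and compare it against a threshold."""
    envelope = 0.0
    for s in np.abs(samples):
        envelope = max(s, (1.0 - alpha) * envelope)   # fast attack, slow decay
    return envelope > threshold
```

The base device would then select for tracking the remote whose audio stream currently reports speech.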
An alternate method to select the remote to track is to use a microphone-driven data packet that turns on the corresponding remote's IR LEDs for a specified period of time, at the end of which the signal stops and the system holds. Tracking resumes when a new IR signal is received. An additional alternative method is to compare the time of flight difference between the different remotes' audio streams. The remote which has the least delay in its audio stream is tracked by the base.
More complex algorithms can take into account the 3D position data of multiple remotes. Examples are an averaging algorithm (find the average position of all available remotes and point the imager at the average position) or a time division algorithm (point the imager at each available remote for a certain period of time).
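A minimal sketch of the averaging approach follows; the tuple representation of remote positions is an assumption.

```python
# Illustrative averaging algorithm: aim the imager at the mean of all remote positions.
def average_aim_point(remote_positions: list[tuple[float, float, float]]):
    """Return the mean 3D position of all available remotes as the aim point."""
    n = len(remote_positions)
    return tuple(sum(p[i] for p in remote_positions) / n for i in range(3))

# e.g., average_aim_point([(0, 0, 2), (4, 0, 2)]) -> (2.0, 0.0, 2.0)
```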
Another approach to the use of two remotes in the system is that of having different roles (for example, distinguishing “target” versus “director”). A target remote is defined as the remote to be tracked by the imaging system as described before. A director remote is identified as such manually by the users, as described above, through a provided remote interface, or the second user can simply be outside of the usable range of the quad cell IR sensor. The director remote is not used for object tracking. A remote can be designated as a dedicated director remote by selecting unique RF identifiers or optical frequencies. The base device receives commands from the director remote through RF communications and uses them for imaging control and other data input needs for follow-up editing.
The foregoing description and drawings of embodiments in accordance with the present invention are merely illustrative of the principles of the invention. Therefore, it will be understood that various modifications can be made to the embodiments by those skilled in the art without departing from the spirit and scope of the invention, which is defined in the appended claims.
This application claims priority to commonly owned co-pending provisional patent application Ser. No. 61/337,843 filed Feb. 10, 2010, and to commonly owned co-pending provisional patent application Ser. No. 61/343,421 filed Apr. 29, 2010, and to commonly owned co-pending provisional patent application Ser. No. 61/402,521 filed Aug. 31, 2010, which are expressly incorporated herein by this reference in their entirety.
Number | Date | Country
---|---|---
61/337,843 | Feb. 2010 | US
61/343,421 | Apr. 2010 | US
61/402,521 | Aug. 2010 | US