The present invention relates to the field of interactive image display, and more specifically to apparatus and methods relating to the real-time tagging, positioning, and tracking of objects for interactive image display applications such as interactive television.
Object identification and hyperlink tagging in video media allows a viewer to learn more about displayed objects by selecting an object and being linked to a website with additional information about the object. This provides sponsors of a television program or a movie production with a means to effectively embed advertising in a program or to display advertisements that will allow interested viewers to learn more about products or services displayed therein.
Currently, object tagging and tracking procedures are not considered at the time of filming. Object identification and tagging in the video medium is instead done at the post-editing stage. This task is typically done by a human manually entering the object information in a database. A more automated approach has been to use image recognition technology to track the object of interest in the captured video stream. This approach, however, remains error-prone even with current state-of-the-art image processing algorithms.
The present invention is directed to apparatus and methods that track the location of an object within a video image at the time of capture of the video image. The location of the object within each frame can be recorded as meta-data for the video image so that when the video image is played back, a viewer can select the object using suitable interaction means and be linked through to a source of additional information about the object, such as a product website or the like. Preferably, the present invention allows multiple objects in an image to be individually tracked and identified.
In accordance with an exemplary embodiment of the present invention, a device emitting radio frequency (RF) signals is attached to an object that is to be identified and tracked within a video image. Using an RF receiver with multiple antennas and applying trilateration techniques, the object's location within the video image is determined in real time and recorded as the video image is recorded. Where multiple objects are to be tracked, each object is provided with a radio device having a unique ID and the location of each device within the video image is recorded.
Using a projection algorithm, positions of the objects in the 3-D field can be mapped to a set of pixels on the 2-D screen on which the image is displayed. The coordinate information, the frame number of the filmed video, the ID of the radio device, and other relevant or useful information can be stored in a database, as meta-data, or in any appropriate form, at the time of recording.
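By way of illustration only, such a record might take the following form; the schema and field names here are hypothetical, not prescribed by the invention (Python is used for all sketches in this description):

    from dataclasses import dataclass

    @dataclass
    class ObjectTagRecord:
        """Hypothetical per-frame meta-data for one tagged object."""
        frame_number: int   # frame of the captured video
        tag_id: str         # unique ID emitted by the radio device
        pixel_x: int        # horizontal screen position of the object
        pixel_y: int        # vertical screen position of the object
        hyperlink: str      # source of additional information about the object

    record = ObjectTagRecord(frame_number=1042, tag_id="tag-3f",
                             pixel_x=512, pixel_y=288,
                             hyperlink="https://example.com/product")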
In a further exemplary embodiment, a camera capturing an image containing the tagged object is also provided with RF emitting devices which allow the camera position and orientation to be determined using trilateration techniques. Using additional camera information such as focal length and field of view, the 2-D virtual screen representing the captured image can be derived.
The aforementioned and other features and aspects of the present invention are described in greater detail below.
As contemplated in the exemplary system 100, each object 150 is provided with a radio device or tag 155 that allows the positioning block 110 to locate the object and track its position in real time using trilateration techniques, described below in greater detail. Any of a variety of suitable radio technologies, including, for example, RFID, Bluetooth, or UWB, can be exploited for this purpose. The tag 155 may be an active device which emits a signal under its own power, or it may be a passive device which emits a signal derived from a signal with which it is illuminated. Where multiple objects 150 are to be tagged, each tag 155 preferably emits a unique ID to allow individual tracking of the multiple objects.
As the camera 140 captures images of a scene including the tagged object 150, the object's location in three dimensions is determined by the positioning block 110. For determining the location of the object 150 with trilateration, the positioning block 110 uses multiple antennas for receiving signals from the tag 155. (An additional, emitting antenna may be included for implementations using passive tags.) In addition, the location, shooting angle, focal length, and/or field-of-view of the camera 140 is provided to the positioning block 110. The camera information can be provided to the positioning block 110 over a dedicated interface (wireless or hard-wired) or, like the object 150, the camera 140 may have one or more tags attached thereto, with the tags providing the camera information. An exemplary trilateration arrangement in which the camera is provided with multiple tags is described below. In a further exemplary embodiment, the relevant camera information can be determined by the camera itself or by data collection apparatus associated with the camera and sent therefrom to the positioning block.
The camera information and object location information are provided in real time to the computing block 120. Using a projection algorithm described in greater detail below, the computing block maps the three-dimensional object location information onto a two-dimensional field representing the viewing screen of the captured video image. The location of the tagged object 150 within a scene can be represented in terms of pixel locations in the captured image.
The 2D location information of the tagged object 150 within each frame of a captured video stream is provided to and recorded in the media storage 130. For multiple tagged objects, the location information for each object is associated with the object's ID. Each tagged object is associated with a hyperlink so that when the viewer of the video stream points to and selects the object (with a suitable interaction device such as, for example, a mouse or a television remote control), the viewer can navigate to a website with additional information about the object.
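Continuing the hypothetical schema sketched earlier, the playback-side selection could reduce to a nearest-object search within a click tolerance; the tolerance radius below is an arbitrary illustrative value:

    import math

    def resolve_click(records, frame_number, click_x, click_y, radius=40):
        """Return the hyperlink of the tagged object nearest to the viewer's
        click in the given frame, or None if nothing is within `radius`."""
        best, best_dist = None, radius
        for r in records:
            if r.frame_number != frame_number:
                continue
            dist = math.hypot(r.pixel_x - click_x, r.pixel_y - click_y)
            if dist <= best_dist:
                best, best_dist = r, dist
        return best.hyperlink if best else None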
Exemplary techniques for carrying out the steps illustrated in
An exemplary arrangement for determining the coordinates in three-dimensional space of an object will now be described with reference to
R0 is treated as the origin of the Cartesian coordinate system, the line through R0 and R2 is treated as the z-axis, and R1 is placed in the y-z plane.
For an arbitrary transmission point P=(x,y,z), r0, r1, r2, and r3 are the distances between point P and points R0, R1, R2, and R3, respectively, and are determined using the aforementioned TDOA technique. The RF signal receiving points and the transmission points can be arranged so as to have non-negative coordinates by proper placement of R0, R1, R2, and R3.
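The timing measurement itself is not detailed here. As a minimal sketch, assuming the one-way propagation time from the tag to each antenna can be recovered (for example, through synchronized clocks or a round-trip exchange), the ranges follow directly from the speed of light; pure TDOA would instead yield range differences such as r1 − r0, leading to a similar, hyperbolic system of equations:

    C = 299_792_458.0  # speed of light in m/s

    def ranges_from_times(times_s):
        """Convert per-antenna propagation times (seconds) to ranges (meters).
        Assumes one-way times are recoverable; see the caveat above."""
        return [C * t for t in times_s]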
The coordinates of the reference points can be expressed in terms of d1, d2, d3, d4, d5, and d6, the distances between the reference points. These distances are fixed and known. Taking d1, d2, and d4 as the distances |R0R1|, |R0R2|, and |R1R2|, respectively (the remaining assignments, d3=|R0R3|, d5=|R2R3|, and d6=|R1R3|, are fixed by equation (3) below), the angles among the line segments connecting the reference points can be obtained from basic trigonometric relationships; in particular, the angle α between R0R1 and R0R2 follows from the law of cosines:

cos α = (d1² + d2² − d4²)/(2d1d2). (1)
Then, the coordinates R1(0,y1,z1) and R2(0,0,z2) are given by:

z2 = d2, z1 = d1 cos α, y1 = d1 sin α. (2)
The coordinates of R3(x3,y3,z3) can be obtained by solving the following equations:
d3² = x3² + y3² + z3²
d5² = x3² + y3² + (z3 − z2)²
d6² = x3² + (y3 − y1)² + (z3 − z1)². (3)
These equations yield the following solutions:

z3 = (d3² − d5² + z2²)/(2z2)
y3 = (d3² − d6² + y1² + z1² − 2z1z3)/(2y1)
x3 = √(d3² − y3² − z3²). (4)
Once the coordinates of the reference points R1, R2 and R3 are determined, the coordinates of point P=(x,y,z) can be obtained by solving the following system of equations:
r0² = x² + y² + z²
r1² = x² + (y − y1)² + (z − z1)²
r2² = x² + y² + (z − z2)²
r3² = (x − x3)² + (y − y3)² + (z − z3)². (5)
This system of equations yields the following solution:

z = (r0² − r2² + z2²)/(2z2)
y = (r0² − r1² + y1² + z1² − 2z1z)/(2y1)
x = √(r0² − y² − z²). (6)
The sign of x should be positive due to the assumptions made above.
As such, using the exemplary trilateration technique described, the 3D coordinates of the tagged object (at point P) can be determined from the distances between the receiving antennas (d1, d2, d3, d4, d5, and d6) and the distances between the receiving antennas and the tagged object (r0, r1, r2, and r3).
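For concreteness, this computation can be summarized in a short sketch (Python, purely for illustration). The assignment of d1, d2, and d4 to the antenna pairs R0R1, R0R2, and R1R2 is an assumption consistent with equation (3); the solutions implement equations (4) and (6):

    import math

    def reference_coordinates(d1, d2, d3, d4, d5, d6):
        """Coordinates of R1, R2, R3, with R0 at the origin, R2 on the
        z-axis, and R1 in the y-z plane (equations (1)-(4))."""
        z2 = d2                                  # R2 = (0, 0, d2)
        z1 = (d1**2 - d4**2 + z2**2) / (2 * z2)  # from |R0R1| and |R1R2|
        y1 = math.sqrt(d1**2 - z1**2)
        z3 = (d3**2 - d5**2 + z2**2) / (2 * z2)  # equation (4)
        y3 = (d3**2 - d6**2 + y1**2 + z1**2 - 2 * z1 * z3) / (2 * y1)
        x3 = math.sqrt(d3**2 - y3**2 - z3**2)    # positive by assumption
        return (0.0, y1, z1), (0.0, 0.0, z2), (x3, y3, z3)

    def locate_tag(r0, r1, r2, r3, refs):
        """Tag position P from the ranges r0..r3 (equations (5)-(6));
        r3 is redundant here but can serve as a consistency check."""
        (_, y1, z1), (_, _, z2), _ = refs
        z = (r0**2 - r2**2 + z2**2) / (2 * z2)
        y = (r0**2 - r1**2 + y1**2 + z1**2 - 2 * z1 * z) / (2 * y1)
        x = math.sqrt(r0**2 - y**2 - z**2)       # positive root, as assumed
        return (x, y, z)

The same locate_tag routine applies unchanged to the camera-mounted emitters Ca, Cb, and Cc described below.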
Ultimately, the object appears on a two-dimensional screen; the object's coordinates in three-dimensional space must therefore be mapped onto a virtual planar surface representing the screen to be viewed. An exemplary procedure for performing such a mapping will now be described with reference to
Three points are shown on the camera 310, Ca, Cb, and Cc, at which emitters, such as the tag used for the object 320, are located, in accordance with an exemplary embodiment of the invention. The coordinates of each of these points, Ca=(xa,ya,za), Cb=(xb,yb,zb), Cc=(xc,yc,zc), can be determined from the distances between these points and the reference points R0, R1, R2, and R3, using a similar procedure and arrangement as described above for the coordinates of the object 320, P=(xp,yp,zp). With reference to
Ideally, the point Ca would be at the center of the lens of the camera, but because of the physical limitations of placing an emitting device there, it is preferably located as close to the lens center as possible, such as centered directly above the lens. In this embodiment, the points Cb and Cc are equidistant from the center of the camera lens, in which case the line Lc includes the midpoint between the points Cb and Cc, namely, Cm=(xm,ym,zm), where xm=(xb+xc)/2, ym=(yb+yc)/2, zm=(zb+zc)/2. The line Lc, through Ca and the midpoint Cm=(xm,ym,zm) of Cb and Cc, can be expressed as follows:

(x − xa)/(xm − xa) = (y − ya)/(ym − ya) = (z − za)/(zm − za). (7)
Let l, m, and n be the directional cosines of the line Lc; they are then:

l = (xm − xa)/Dc, m = (ym − ya)/Dc, n = (zm − za)/Dc, where Dc = √((xm − xa)² + (ym − ya)² + (zm − za)²). (8)
The image of the object point P on the screen 350 is designated as point Pi=(xi,yi,zi). A line Lp from the point Ca to the object image point Pi=(xi,yi,zi) is:

(x − xa)/(xi − xa) = (y − ya)/(yi − ya) = (z − za)/(zi − za). (9)
Because the line Lc is perpendicular to the plane 350 and the point Co=(xo,yo,zo) is in the plane 350, the equation of the plane 350 becomes
l(x−xo)+m(y−yo)+n(z−zo)=0. (10)
The center point of the screen plane 350 can be used as the origin of a two-dimensional coordinate system for the screen plane 350. Since the center point Co=(xo,yo,zo) is on the line Lc, it satisfies the following:

(xo − xa)/(xm − xa) = (yo − ya)/(ym − ya) = (zo − za)/(zm − za). (11)
Another equation is needed to close the system and to determine the coordinates of the point Co. The focal length f of the camera is the distance from the lens of the camera Ca to the focal point of the camera, which corresponds to the center point Co. As such:
f = √((xa − xo)² + (ya − yo)² + (za − zo)²). (12)
Let ko be a constant which satisfies:

(xo − xa)/(xm − xa) = (yo − ya)/(ym − ya) = (zo − za)/(zm − za) = ko, (13)

in which case the focal length f and ko have the following relationship:

ko = f/√((xm − xa)² + (ym − ya)² + (zm − za)²). (14)
The coordinates of point Co are:
xo = xa + ko(xm − xa)
yo = ya + ko(ym − ya)
zo = za + ko(zm − za) (15)
The coordinates of the object image point Pi can be obtained from the following system of equations:
Eq. 17 follows from the fact that the point Pi is on the screen 350. The second part of the above equations is valid since the point Pi is on a line connecting the point Ca and the object point P=(xp,yp,zp); kp is a constant which satisfies the line equation. The coordinates of the point Pi become:
Now, we have all the coordinate information for the center point Co and the object image point Pi. A line through these two points is:
The line equations for Lx and Ly will give the values of the angles θ and φ shown in
The directional cosine of line Lx should be proportional to the directional cosine of a line passing through points Cb and Cc, since the two lines are parallel. More precisely, the directional cosine, (lbc,mbc,nbc), of a line through points Cb and Cc becomes
We then have l1 = klbc, m1 = kmbc, and n1 = knbc for a certain constant k. The equation of line Lx can be rewritten as:
To obtain the directional cosine of Ly we have two equations:
l2lbc + m2mbc + n2nbc = 0, (26)
since Lx⊥Ly, and
l2l + m2m + n2n = 0, (27)
since Ly is on the plane 350. This system of equations yields the following solution for the directional cosine of Ly:
for a constant h. The equation of line Ly becomes
The directional cosine of Ly can be rewritten as:
Let line LIO be defined by the two points Co and Pi. Then, the angle φ between Lx and LIO becomes
φ = arccos(l1lio + m1mio + n1nio). (31)
The angle θ between Ly and LIO is

θ = arccos(l2lio + m2mio + n2nio). (32)
Since f, h, and v are readily available, the angles δh and δv can be derived as:
The ratios θ/δv and φ/δh are sufficient to determine, respectively, the relative vertical and horizontal positions of the object image point Pi on the screen 350. This is shown in
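As a companion to the trilateration sketch above, the following sketch (again Python, for illustration only) carries out the projection just described. Rather than evaluating the angle ratios of equations (31) and (32) directly, it recovers the equivalent planar coordinates of Pi relative to the center point Co by projecting onto unit vectors along Lx and Ly; normalizing these coordinates by the extents of the image plane yields the same relative screen position as the θ/δv and φ/δh ratios:

    import math

    def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
    def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
    def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
    def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])
    def unit(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

    def project_to_screen(ca, cb, cc, p, f):
        """2-D coordinates of the object image point Pi about the screen
        center Co, given the camera tag positions ca, cb, cc, the object
        position p, and the focal length f."""
        cm = scale(add(cb, cc), 0.5)      # midpoint Cm of Cb and Cc
        u = unit(sub(cm, ca))             # directional cosines (l, m, n) of Lc
        co = add(ca, scale(u, f))         # center point Co: on Lc, |CaCo| = f
        # Pi is where the line through Ca and P meets the screen plane.
        t = f / dot(u, sub(p, ca))
        pi_pt = add(ca, scale(sub(p, ca), t))
        # Lx is parallel to the line through Cb and Cc; orthogonalize against
        # u in case the emitters are not placed perfectly symmetrically.
        ex = sub(cc, cb)
        ex = unit(sub(ex, scale(u, dot(ex, u))))
        ey = cross(u, ex)                 # Ly: in the plane, perpendicular to Lx
        d = sub(pi_pt, co)
        return dot(d, ex), dot(d, ey)     # horizontal, vertical offsets from Co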
Once the coordinates of the object within the camera image have been determined, as described above, this information, along with any other relevant information that may be desired, is recorded, as discussed above with reference to
The present invention can be used in a variety of applications. Consider an illustrative application in which a movie studio is filming a scene in Central Park in which the main actor and actress are sitting on a bench. A sponsor of the movie is a well-known fashion company that wants to advertise a new handbag held by the actress on her lap. The fashion company wants to provide a direct link to its online shop if a viewer moves the pointer, available with an interactive TV set, to the proximity of the handbag. At the time of filming, a Bluetooth radio device, or the like, is placed inside the handbag. Four radio antennas placed around the bench receive the radio signals from the Bluetooth device and send them to a laptop computer. Simultaneously, the video camera sends frame numbers to the laptop computer, where the concurrently generated object positions and frame numbers are associated and stored in a database. The present invention thus allows the producer to build a database of all the necessary information regarding the location of the object (i.e., the handbag) in the video screen, its identity, and the frame number. Advantageously, this can be done without human intervention or error-prone image recognition technologies. The trilateration positioning device, video camera, and computer can communicate over wired or wireless connections.
The present invention provides an accurate means of object tracking and tagging in real time for interactive TV applications, streaming video, or the like. This eliminates time-consuming and/or error-prone post-processing steps involved in locating objects in the video. It is a useful tool for a variety of applications such as advertising and marketing in interactive video. Additionally, the present invention can help advertisers track the amount of time that their products are seen on the screen, and can provide other useful information.
Note that while the apparatus and methods of the present invention are most advantageously used in conjunction with video or moving images, the present invention can be applied just as readily to still imaging, where individual images are captured.
It is understood that the above-described embodiments are illustrative of only a few of the possible specific embodiments which can represent applications of the invention. Numerous and varied other arrangements can be made by those skilled in the art without departing from the spirit and scope of the invention.