The present invention relates generally to image-based movement tracking in complex scenes. More particularly, the invention relates to an arrangement according to the preamble of claim 1 and a corresponding method according to the preamble of claim 13. The invention also relates to a computer program according to claim 24 and a computer readable medium according to claim 26.
Modern image processing technology has enabled sophisticated real-time extraction of data from complex scenes. For instance, different types of vehicles may be automatically discriminated and tracked based on images recorded by video cameras. The international patent application WO96/13023 describes a device for identification of vehicles at a control station, such as a road toll facility. Here, at least one video camera registers a vehicle profile in the light of a particular illumination device.
Individuals may also be automatically tracked by means of cameras associated with adequate image processing equipment. The document U.S. Pat. No. 6,359,647 discloses an automated camera handoff system, which is capable of tracking a target object between the fields of view of different cameras within a multi-camera surveillance system. Consequently, a second camera takes over the responsibility for generating images when the target object leaves the field of view of a first camera and is estimated to enter the field of view of the second camera.
In other applications, it may instead be relevant to register images of one and the same scene by means of more than one camera. Sports events constitute one such example, because here different view angles may be interesting depending upon the events of the game. Therefore, switching from a first to a second camera may be desired even though the first camera also registers a particular event. The international patent application WO03/056809 describes a system for real-time monitoring of athlete movements, such as soccer players running in a soccer field. This system automatically determines which camera of a plurality of cameras is best positioned to carry out the filming of a certain event on the field. Thus, the system assists a producer responsible for a television recording/transmission of a soccer match, or similar game.
Nevertheless, in many sports, it may also be interesting to generate various types of statistics and systematic quantitative data compilations in respect of the activities undertaken by the individual sports participants. For example, in games like soccer, football, basketball, volleyball, hockey and tennis, it may be desirable to determine the amount of court covered, the total distance run, the peak running speed, the average running speed, the time in possession of the ball, and a spatial distribution of specific players relative to the playing field and/or relative to other players. The document U.S. Pat. No. 5,363,297 discloses an automated camera-based tracking system for producing such data records. Here, two cameras are preferably used, which are positioned roughly orthogonal to one another (for example, a first camera filming an overhead view of the field and a second camera filming a side view of the field). This arrangement minimizes the risk of shadowing and of situations with occlusions or overlapping silhouettes.
However, in many cases, it is simply not possible or practical to use an overhead camera, such as in outdoor sports like soccer. Furthermore, an outdoor playing field may be relatively large and/or the light conditions may here be problematic. As a result, two orthogonally arranged side-view cameras would normally not be capable of providing the image resolution, or quality, necessary to track the individual players with a satisfying degree of accuracy. The resolution requirements can be relaxed substantially if each player and the ball are assigned specific identities via a manual operator interaction. Still, due to the large distances and possibly complicated light conditions, a system of this type is likely to lose track of one or more of the players and/or the ball relatively quickly in an outdoor application. Of course, various kinds of independent telemetric systems involving transmitters/receivers placed on the players may be employed to aid the cameras. Nevertheless, such solutions are, in turn, associated with other problems, including inconveniences for the players, and are therefore not particularly desirable.
It is therefore an object of the present invention to alleviate the problems above, and thus provide a reliable and efficient image-based solution for tracking the movements made by each of a number of objects in a given area.
According to one aspect of the invention, this object is achieved by the arrangement as described initially, wherein at least one of the image registration means includes a stereo-pair of cameras in which a first camera is separated by a base distance from a second camera. The first and second cameras are essentially parallel and directed towards the area, such that a first image plane of the first camera registers a portion of the area substantially overlapping a portion of the area registered by a second image plane of the second camera.
An important advantage attained by this design strategy is that a comparatively high resolution can be attained by means of a reasonable amount of resources. Moreover, if multiple stereo-pairs of cameras are employed, a reliable tracking may be accomplished even if the objects obscure one another during shorter time periods.
According to a preferred embodiment of this aspect of the invention, the data processing unit includes a stereo module, which is adapted to produce a stereo image based on data from a first image recorded by the first camera and data from a second image recorded by the second camera. It is presumed that both the first and second images are recorded at a particular point in time. The stereo module thereby combines information from the two images into a representation suitable for higher-level conclusions, such as the position of a particular object. Preferably, the stereo image represents estimates of time varying elevations over a stationary surface of the area.
According to another preferred embodiment of this aspect of the invention, the data processing unit includes a scene initialization module, which is adapted to generate an initial background model of the area based on data from the image registration means recorded in the absence of objects in the area. Hence, by means of the background model, moving objects, e.g., individuals and balls, may be discriminated from essentially stationary objects, e.g. the play field and various stands and platforms for spectators.
According to yet another preferred embodiment of this aspect of the invention, the stereo module is adapted to produce the stereo image by means of a procedure, which involves transforming one of the first and second images to match a representation of the other of the first and second images. Specifically, this transformation results in that, in the stereo image, each image point that is based on an image point which in the first image is estimated to represent a particular segment of the surface is projected onto the same image point as an image point in the second image which is estimated to represent the particular segment. Moreover, the transformation results in that, in the stereo image, image points in the first and second images which are estimated to represent objects above the surface are at least laterally translated with respect to one another. The degree of translation here depends on the objects' altitude relative to the surface. This representation is advantageous because it further facilitates the discrimination of the moving objects, e.g. the players and the ball.
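By way of illustration only, the effect of such a transformation can be sketched numerically. The sketch below assumes ideal pinhole cameras and models the transformation as a homography induced by the surface (taken here as the plane Z = 0); the specification does not prescribe this particular model. The function names, the focal length, the camera height and tilt, and the base distance are all hypothetical values chosen for the example. Points on the surface coincide after the warp, whereas points above the surface retain a lateral offset that grows with their height.

```python
import numpy as np

def camera_matrix(f, cx, cy, center, tilt):
    """3x4 projection matrix of a pinhole camera (assumed model).

    World axes: X to the right, Y forward, Z up. The optical axis points
    forward and downward by `tilt` radians; the image x-axis is world X.
    """
    K = np.array([[f, 0.0, cx], [0.0, f, cy], [0.0, 0.0, 1.0]])
    c, s = np.cos(tilt), np.sin(tilt)
    # Rows are the camera x, y and z axes expressed in world coordinates.
    R = np.array([[1.0, 0.0, 0.0], [0.0, -s, -c], [0.0, c, -s]])
    t = -R @ np.asarray(center, dtype=float)
    return K @ np.hstack([R, t[:, None]])

def project(P, point):
    """Project a 3D world point to pixel coordinates (u, v)."""
    x = P @ np.append(np.asarray(point, dtype=float), 1.0)
    return x[:2] / x[2]

def ground_homography(P1, P2):
    """Homography mapping image-1 pixels of surface points (Z = 0) to image 2."""
    M1 = P1[:, [0, 1, 3]]  # projection restricted to the plane Z = 0
    M2 = P2[:, [0, 1, 3]]
    return M2 @ np.linalg.inv(M1)

def residual_shift(P1, P2, H, point):
    """Offset between the point in image 2 and its warped image-1 position."""
    x1 = np.append(project(P1, point), 1.0)
    w = H @ x1
    return project(P2, point) - w[:2] / w[2]

if __name__ == "__main__":
    # Hypothetical stereo pair: 0.5 m base distance, 10 m above the surface.
    P1 = camera_matrix(1000.0, 640.0, 360.0, (0.0, 0.0, 10.0), 0.3)
    P2 = camera_matrix(1000.0, 640.0, 360.0, (0.5, 0.0, 10.0), 0.3)
    H = ground_homography(P1, P2)
    for height in (0.0, 1.0, 2.0):
        print(height, residual_shift(P1, P2, H, (3.0, 30.0, height)))
```

In this idealized setting the offset is purely horizontal in the image, which is one way to read the phrase "at least laterally translated"; real cameras with lens distortion or imperfect alignment would show additional components.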
According to still another preferred embodiment of this aspect of the invention, the data processing unit includes a first information extraction module adapted to determine an estimate of which image points represent the surface. The initial background model serves as input data for this determination.
According to another preferred embodiment of this aspect of the invention, the data processing unit includes a density module, which is adapted to produce a density map based on the stereo image. The density map represents respective probability functions over candidate positions for the objects in the area. Hence, by studying the probability functions, a position for each object may be estimated. Preferably, a second information extraction module in the data processing unit performs this operation. The second module is adapted to discriminate positions of the objects based on the probability functions.
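One conceivable way of discriminating positions from a density map is sketched below. It assumes that the density map is available as a two-dimensional array and uses greedy peak picking with a minimum separation between objects; this is a stand-in technique chosen for the example, not a procedure stated in the specification. The function name `pick_positions` and its parameters are hypothetical.

```python
import numpy as np

def pick_positions(density, n_objects, min_separation, threshold=0.0):
    """Greedily pick up to n_objects peaks from a 2D density map.

    Returns (x, y) grid positions, strongest peak first. After each pick,
    all cells within min_separation of the peak are suppressed so that
    the same object is not reported twice.
    """
    d = np.array(density, dtype=float)  # work on a copy
    yy, xx = np.mgrid[0:d.shape[0], 0:d.shape[1]]
    positions = []
    for _ in range(n_objects):
        y, x = np.unravel_index(np.argmax(d), d.shape)
        if d[y, x] <= threshold:
            break  # nothing left that is likely to be an object
        positions.append((int(x), int(y)))
        d[(xx - x) ** 2 + (yy - y) ** 2 <= min_separation ** 2] = -np.inf
    return positions
```

The minimum separation would in practice be chosen from the physical size of a player and the grid spacing of the map, both of which are assumptions outside the text above.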
According to yet another preferred embodiment of this aspect of the invention, the first information extraction module is adapted to repeatedly (e.g. after each second video frame) determine an updated background model based on a previous background model and the discriminated positions of the objects. The first information extraction module is also adapted to repeatedly (e.g. after each second video frame) determine an updated estimate of which image points represent the surface based on the updated background model. Such an updating is desirable because a high tracking reliability is thereby maintained. Normally, the light conditions and other environmental parameters which are essentially uncorrelated to the events occurring in the area change over time. Therefore, the background model must be updated in order to enable a continued correct discrimination of the moving objects.
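A minimal sketch of such an update is given below. The specification does not state the update rule, so the sketch assumes an exponential running average over a grayscale image in which pixels near the discriminated object positions are left unchanged; the function name, the blending factor `alpha` and the exclusion `radius` are hypothetical, and the object positions are assumed to be already expressed in image coordinates.

```python
import numpy as np

def update_background(prev_model, frame, object_positions, radius, alpha=0.05):
    """Blend the current frame into the background model.

    Pixels within `radius` of any object position keep their previous
    value, so that moving objects are not absorbed into the background.
    """
    prev = np.asarray(prev_model, dtype=float)
    cur = np.asarray(frame, dtype=float)
    yy, xx = np.mgrid[0:prev.shape[0], 0:prev.shape[1]]
    occupied = np.zeros(prev.shape, dtype=bool)
    for x, y in object_positions:
        occupied |= (xx - x) ** 2 + (yy - y) ** 2 <= radius ** 2
    blended = (1.0 - alpha) * prev + alpha * cur
    return np.where(occupied, prev, blended)
```

A small `alpha` lets the model follow slow changes, such as drifting light conditions, while remaining insensitive to brief disturbances.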
According to other preferred embodiments of this aspect of the invention, the area is a sports field and the objects include players participating in a sports event, e.g. a ball game, which is conducted in the sports field. The objects may therefore also include at least one ball. Consequently, the players and the ball may be tracked during a game, and as a result, various types of statistics and systematic quantitative data compilations can be generated. For instance, the following parameters may be determined in respect of each player: an amount of court covered, total distance run, peak running speed, average running speed, time in possession of the ball, spatial distribution relative to the playing field and/or relative to other players.
According to still another preferred embodiment of this aspect of the invention, the data processing unit is adapted to generate, in real time, at least one data signal, which describes at least one type of statistics and/or systematic quantitative information pertaining to the number of objects. The at least one data signal is based on positions for the objects that have been determined during a time interval foregoing a present point in time. Thus, for instance, current statistics over the accomplishments of individual players in a ball game may be presented to a TV audience in live broadcasting, i.e. during an ongoing game.
According to another aspect of the invention, this object is achieved by the method described initially, wherein at least a part of the data is registered by means of a stereo-pair of images of the area. The image planes of these images are essentially parallel, such that a first image plane registers a portion of the area, which substantially overlaps a portion of the area registered by a second image plane. Moreover, the first and second image planes are separated by a base distance from one another.
The advantages of this method, as well as the preferred embodiments thereof, are apparent from the discussion hereinabove with reference to the proposed arrangement.
According to another aspect of the invention this object is achieved by a computer program directly loadable into the internal memory of a digital computer, comprising software for controlling the method described above when said program is run on a computer.
According to yet another aspect of the invention this object is achieved by a computer readable medium, having a program recorded thereon, where the program is to make a computer perform the method described above.
Further advantages, advantageous features and applications of the present invention will be apparent from the following description.
The present invention is now to be explained more closely by means of preferred embodiments, which are disclosed as examples, and with reference to the attached drawings.
a-b illustrate, by means of an example, first and second images recorded by a stereo-pair of cameras according to one embodiment of the invention,
The first and second cameras 101a and 101b are essentially parallel and directed in an angle towards the area 100, such that they register substantially overlapping portions of the area 100. As can be seen in the
The image registration means 101, 102, 103 and 104 repeatedly and synchronously record image data D1, D2, D3 and D4 respectively of events that occur within the area 100. Thus, the data D1, D2, D3 and D4 constitute multiple simultaneous representations of these events. The data D1, D2, D3 and D4 is sent to a data processing unit 110, which is adapted to repeatedly determine a respective position for each of the objects pi, pj, and B. For example, a first set of objects pi may encompass the players of a home team, a second set of objects pj may encompass the players of a visiting team and a third object B may be a ball. In any case, according to the invention, each object pi, pj, and B can be automatically tracked based on the data D1, D2, D3 and D4.
In order to relax the image resolution requirements on the output data D1, D2, D3 and D4 (i.e. essentially the amount of data) from the image registration means 101, 102, 103 and 104, a unique identity is preferably manually assigned to each object pi, pj and B, for instance before starting the match. This means that it is not necessary that the system is capable of discriminating the numbers (or other characterizing features) of the players' clothing (or appearance). Instead, it is sufficient if the system can maintain a consistent tracking of each object. Theoretically, an image resolution in the order of one pixel per object is enough to accomplish such a tracking. However, in practice, a higher resolution is often desired for robustness reasons. Of course, the assignment of identities may also need to be updated occasionally during the course of the game, e.g. after so-called pile ups, in connection with breaks and when one or more players are substituted by one or more other players.
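How a manually assigned identity might be carried from one frame to the next can be sketched as follows. The sketch assumes a greedy nearest-neighbour association with a limit on the distance an object may move between frames; this association rule, the function name `propagate_identities` and the parameter `max_jump` are assumptions made for the example and are not taken from the specification.

```python
import math

def propagate_identities(previous, detections, max_jump):
    """Carry identities forward by greedy nearest-neighbour matching.

    previous:   dict mapping identity -> (x, y) position in the last frame
    detections: list of (x, y) positions found in the current frame
    Returns a dict identity -> (x, y) for the identities that were matched.
    """
    candidates = sorted(
        (math.dist(pos, det), ident, j)
        for ident, pos in previous.items()
        for j, det in enumerate(detections)
    )
    assigned, used = {}, set()
    for distance, ident, j in candidates:
        if distance > max_jump:
            break  # all remaining pairs are even farther apart
        if ident in assigned or j in used:
            continue
        assigned[ident] = detections[j]
        used.add(j)
    return assigned
```

Identities missing from the returned dictionary are those for which no plausible detection was found; in the terms used above, these are the cases in which the assignment of identities may need to be updated by the operator.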
a shows an example of a first image I1 which may be registered by a first image plane of a stereo-pair of images, such as 310 in the
b shows an example of a second image I2, which for illustrative purposes here represents a slightly transformed image of an image registered by a second image plane of a stereo-pair of images, such as 320 in the
As can be seen in the second image I2, which is recorded at the same point in time as the first image I1, the fifth individual p5 is much less obscured by the fourth individual p4 than in the first image I1. This vouches for good tracking possibilities. Provided that the stereo-pairs of cameras are arranged properly in relation to one another and in relation to the area 100, this kind of separation normally arises, i.e. that one or more occlusions in the first image I1 are resolved in the second image I2, and vice versa.
Preferably, according to one embodiment of the invention, the transformation described with reference to the
The stereo image IS is produced based on data from a first image, e.g. I1, recorded by a first camera and data from a second image, e.g. I2, recorded by a second camera in a stereo-pair. The first and second images I1 and I2 are recorded at a particular point in time.
The stereo image IS represents estimates e1, e2, e3 and e4 of image elements that describe objects which are not part of the stationary surface S and whose positions in the area 100 vary over time (i.e. moving objects). Specifically, the production of the stereo image IS may involve transforming one of the first and second images I1 and I2 to match a representation of the other of the first and second images I1 and I2, such that in the stereo image IS, each image point that is based on an image point which in the first image I1 is estimated to represent a particular segment of the surface S is projected onto the same image point as an image point in the second image I2 which is estimated to represent the particular segment. As a further result of this transformation, image points in the first and second images I1 and I2 which are estimated to represent objects above the surface S are at least laterally translated with respect to one another. Here, the degree of translation depends on the objects' altitude relative to the surface S. Consequently, vertically oriented objects having an essentially longitudinal extension (such as upright standing soccer players) will approximately be represented by inverted-cone shapes in the stereo image IS.
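The dependence of the translation on altitude can be made explicit under idealized assumptions that go beyond the text: two identical pinhole cameras with focal length f (in pixels), parallel optical axes, a base distance b along the image x-axis, and the first camera at height h_c above the surface S. For an object point at height h above S and at depth Z along the optical axis, the ray from the first camera through the point meets S at depth Z_G. All of these symbols are introduced here for illustration only.

```latex
\begin{align}
  u_2 &= u_1 - \frac{f\,b}{Z}
      && \text{(horizontal disparity of a point at depth } Z\text{)} \\
  Z_G &= \frac{h_c}{h_c - h}\,Z
      && \text{(depth at which the viewing ray meets } S\text{)} \\
  \Delta u &= \frac{f\,b}{Z_G} - \frac{f\,b}{Z}
            = -\,\frac{f\,b}{Z}\cdot\frac{h}{h_c}
      && \text{(offset remaining after aligning } S\text{)}
\end{align}
```

The offset vanishes for h = 0 and grows in magnitude with h, so the feet of an upright player coincide in the two images while the head is displaced the most. This is consistent with the inverted-cone appearance described above.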
In order to make it possible to accurately estimate which image points represent the surface S, a background model is generated according to a preferred embodiment of the invention. An initial background model of the area 100 is based on data D1, D2, D3 and D4 from the image registration means 101, 102, 103 and 104, which is recorded in the absence of objects pi, pj and B in the area 100. Hence, a first estimate of which image points represent the surface S is based on the initial background model. After that, an updated background model is repeatedly determined based on a previous background model in combination with discriminated positions of the objects pi, pj and B. Based on the updated background model, in turn, an updated estimate of which image points represent the surface S is determined.
Of course, in the general case, only a fraction of the total number of objects may be visible from a particular stereo-pair of cameras. Therefore, information gained from two or more image registering means may have to be aggregated in the data processing unit 110 (see
A first camera 101a in the image registration means repeatedly records data D1′ pertaining to representations of events occurring within the area, and a second camera 101b in the image registration means simultaneously therewith also records data D1″ pertaining to these events, however from a slightly different angle (which is given by the base distance to the first camera 101a and the distances to any registered objects). In any case, the first and second cameras 101a and 101b are essentially parallel and directed towards the area, such that a first image plane of the first camera 101a registers a portion of the area which substantially overlaps a portion of the area registered by a second image plane of the second camera 101b.
The data processing unit 110 receives the data D1′ and D1″ from the cameras 101a and 101b respectively. Specifically, according to a preferred embodiment of the invention, a scene initialization module 730 in the data processing unit 110 receives data D1′ and D1″ recorded in the absence of objects in the area. Based on this data D1′ and D1″, the scene initialization module 730 generates an initial background model M′B of the area, which is sent to a storage means 745, either directly or via a first information extraction module 740.
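As a rough sketch of what such a scene initialization could look like, the example below assumes that several grayscale frames of the empty area are available and takes their per-pixel median as the initial background model. Image points that later deviate from the model by more than a threshold are flagged as candidates for moving objects. The choice of the median, the threshold test and both function names are assumptions made for this example.

```python
import numpy as np

def initial_background(empty_frames):
    """Per-pixel median over frames recorded with no objects in the area."""
    stack = np.stack([np.asarray(f, dtype=float) for f in empty_frames])
    return np.median(stack, axis=0)

def foreground_mask(frame, background, threshold):
    """True where the frame deviates from the background model."""
    difference = np.abs(np.asarray(frame, dtype=float) - background)
    return difference > threshold
```

The median is used here because it tolerates occasional outliers in individual frames, for example a bird crossing the field during the recording of the empty scene.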
In steady-state operation of the arrangement, the data D1′ and D1″ from the cameras 101a and 101b are also sent to a stereo module 710 in the data processing unit 110. The stereo module 710 produces a stereo image IS based on the data D1′ and D1″. As mentioned previously, the stereo image IS represents estimates of time varying elevations over a stationary surface S of the area.
The stereo module 710 produces the stereo image IS by means of a procedure, which involves transforming one of the first and second images to match a representation of the other of the first and second images, such that in the stereo image IS each image point that is based on an image point which in the first image is estimated to represent a particular segment of the surface S is projected onto the same image point as an image point in the second image which is estimated to represent the particular segment. Moreover, due to the transformation, image points in the first and second images which are estimated to represent objects above the surface S are at least laterally translated with respect to one another, where the degree of translation depends on the objects' altitude relative to the surface S.
In steady-state operation of the arrangement, the first information extraction module 740 repeatedly determines an estimate of which image points represent the surface S based on a previous background model stored in the storage means 745. Hence, a first updated background model M″B is based on the initial background model M′B. Preferably, the first information extraction module 740 repeatedly determines an updated background model M″B based on a previous background model M′B (stored in the storage means 745) and the discriminated positions pi,j(x, y) of the moving objects. The module 740 also repeatedly determines an updated estimate of which image points represent the surface S based on the updated background model M″B.
A density module 720 in the data processing unit 110 produces a density map A based on the stereo image IS. As described above with reference to the
Preferably, the data processing unit 110 is also adapted to accumulate the discriminated positions pi,j(x, y) of the moving objects over time. The unit 110 can thereby generate various data signals, which describe different types of statistics and/or systematic quantitative information pertaining to the movement of the objects. It is further preferable if the data processing unit 110 has such processing capacity that these data signals can be generated in real time. Each data signal is based on positions for the moving objects that have been determined during a time interval foregoing a present point in time. Thereby, for instance, current (and continuously updated) statistics over the accomplishments of individual players in a ball game may be presented to a TV audience in live broadcasting, i.e. during an ongoing game.
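The kind of statistics mentioned here can be derived from an accumulated track with very little computation. The sketch below assumes that the positions of one object are given in metres, one position per frame, at a known and constant frame rate; the function name `movement_statistics` and the keys of the returned dictionary are hypothetical.

```python
import math

def movement_statistics(track, frame_rate):
    """Summarize a track given as a list of (x, y) positions in metres.

    Returns the total distance in metres and the peak and average speed
    in metres per second, based on frame-to-frame displacements.
    """
    steps = [math.dist(a, b) for a, b in zip(track, track[1:])]
    total = sum(steps)
    duration = len(steps) / frame_rate  # seconds covered by the track
    return {
        "total_distance": total,
        "peak_speed": max(steps, default=0.0) * frame_rate,
        "average_speed": total / duration if duration > 0 else 0.0,
    }
```

Because the summary depends only on positions already determined, it can be recomputed or incrementally extended after every frame. A peak speed taken from raw frame-to-frame displacements is sensitive to position noise, so a practical system would probably smooth the track first.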
In order to sum up, the general method for tracking the movements of a number of objects in a particular area according to the invention will now be described with reference to
A first step 810 registers stereo image data pertaining to multiple simultaneous representations of events occurring within the area. Preferably, this data is registered from more than one location, for instance two or four as illustrated in the
In order to obtain a reliable tracking of sports events, and thus a high data quality, the stereo image data should be updated relatively often, say in the order of 25-30 times per second.
The process steps described with reference to the
The term “comprises/comprising” when used in this specification is taken to specify the presence of stated features, integers, steps or components. However, the term does not preclude the presence or addition of one or more additional features, integers, steps or components or groups thereof.
The invention is not restricted to the described embodiments in the figures, but may be varied freely within the scope of the claims.
Number | Date | Country | Kind |
---|---|---|---|
04026960.7 | Nov 2004 | EP | regional |

Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/EP05/55659 | 10/31/2005 | WO | | 5/14/2007 |