This invention relates to a method and a device for tracking the position of a user's head. In particular, embodiments of the invention relate to altering a three dimensional display according to the position of a user.
A number of different methods of displaying three dimensional images to a user are known. In a common implementation, used in public cinemas, the left and right eyes of the user are presented with different information at successive time periods. In such an implementation, the user is presented with a movie where alternate frames are intended for alternate eyes. The disadvantage of such implementations is that some way of distinguishing the information intended for the right eye from the information intended for the left eye is needed. Often this is done by means of a set of glasses worn by the user which distinguish the different information sets through the use of polarisation or alternate occlusion.
An alternate implementation of 3D display simultaneously transmits different information to the left and right eyes (autostereoscopy). An example of such a system is the use of a lenticular screen overlaid on a display. The display and lenticular screen are arranged so each pixel is either presented to the left or the right eye and this allows the simultaneous projection of different information to the two eyes, resulting in the user experiencing stereoscopic vision.
The advantage of such systems, which are capable of simultaneously projecting stereoscopic information, is that the user does not need to wear glasses, which are unwieldy and can become uncomfortable, particularly over long periods of time.
A growing field for the use of 3D display technology is the operating theatre, in particular where a surgeon is engaged in laparoscopy or other surgical techniques in which the surgeon is not directly able to view the interaction between the surgical instruments and the patient being operated on. In such applications, depth perception is important for the surgeon as it may assist in evaluating distances in the area being operated on.
Furthermore, in surgery, significant disadvantages exist in the use of glasses and, in particular, glasses used for 3D displays. Firstly, the surgeon is unable to touch his or her own glasses due to concerns relating to contact infection (sterility is mandatory). In particular, once the glasses become fogged, the surgeon must ask an assistant to clear them as he or she is unable to touch the glasses. Secondly, due to the polarisation employed in many glasses used for 3D display, such glasses cut out a significant portion of the ambient light and therefore the surgeon will require the operating theatre lights to be turned on when viewing anything other than the display (instruments, compresses, etc.). Thirdly, as noted, prolonged use of these glasses can become uncomfortable, particularly where the surgeon also requires corrective eye glasses.
For these reasons a 3D display which does not require glasses is to be preferred in the environment of the operating theatre. However, the problem with a glasses-free implementation such as one using a lenticular overlay is that as the user's head moves relative to the display, the 3D effect is disturbed or lost. In order to solve this problem it is known to switch the left- and right-eye information for the lenticular display to compensate for left and right movement of the user's head. This may be based on a tracked movement of the user's head.
However, all such head-tracking technologies have been designed to operate at normal working distances between the user and the display (i.e. a distance of about 700 mm away from the display when the user sits in front of the display at a desk). Furthermore, known implementations assume that the ambient light is at normal working levels, whereas in an operating theatre, the ambient light is significantly lower than in other working environments.
It should also be noted that in the operating theatre environment it is important that the position of the head be tracked reliably. Many prior applications tolerate relatively large discrepancies between the actual and calculated positions of the user's head. For a surgeon, however, such discrepancies are unacceptable; any perceived lag could have very serious consequences.
A first aspect of the invention relates to a tracking device for tracking a position of a user's head, the device comprising a camera, a radiation source radiating electro-magnetic radiation, and a processor for calculating parameters indicative of the position of the head relative to the camera, wherein the camera is adapted to capture images using illumination provided by the radiation source, wherein the radiation source comprises a source of infrared radiation and the camera comprises a monocular image input, characterised in that
the tracking device further comprises a display adapter for controlling a three dimensional display, the display adapter being connected to the processor, wherein the display adapter is adapted to control a three dimensional display in dependence on the calculated parameters indicative of the position of the head.
The processor may be adapted to designate an area of a captured image as the head on the basis of recognising one or more eyes of the head.
The processor may be adapted to designate an area of a captured image as the head on the basis of recognising one or more tracking markers attached to the head.
The processor may be adapted to recognise a user according to the presence of a recognition marker.
The processor may be adapted to control the display adapter to display three dimensional information when a user is recognised and display two dimensional information when a user is not recognised.
The user may be recognised by the recognition marker.
The tracking markers or the recognition markers may comprise one or more markers adhered to clothing. The markers may be comprised of a material which reflects infra-red light.
The camera may capture successive images and each image may correspond to an illumination of the head by the radiation source.
The radiation source may radiate electromagnetic radiation predominantly as infrared radiation.
The radiation source may comprise two sets of infrared light sources arranged so that a first set is closer to the camera than a second set.
The radiation source may be adapted to alternate the activation of the first set and the second set. Alternatively, both sets may be activated at the same time.
Recognition of a user's head may be based on images captured when the first set is illuminated. Tracking of a user's head may be based on images captured when the second set is activated. Each set may comprise two LEDs. Each of the LEDs of the first set may be closer to the camera than each of the LEDs of the second set.
The processor may be adapted to compare an image captured when the first set is activated, and the second set is not activated, to an image captured when the second set is activated and the first set is not activated. This may be the case when a three-dimensional model of the head is used. Alternatively, if the sets are activated simultaneously, the processor may compare two images captured at different times.
The processor may be adapted to process images captured when the first set of infrared light sources is activated for information relating to the recognition and/or tracking markers.
The radiation source may radiate radiation with wavelengths between 750 nm and 950 nm.
The processor may be adapted to generate a model corresponding to the object and evaluate a likelihood that the model represents the object and the processor may be further adapted to perform the evaluation of the likelihood using a threshold conversion of one or more regions of the image.
The processor may be adapted to designate regions of one or more images captured by the camera as regions corresponding to the eyes and the at least one other characteristic of the head, and perform a threshold conversion on said portions of said images.
The threshold conversion may comprise identifying a colour value of a central part of a designated region and converting image information of said part on the basis of said identified colour value.
The threshold conversion may comprise converting to black and white image information.
The model may comprise a three dimensional model of the head.
The three dimensional model of the head may comprise three dimensional locations for two eyes and one or more markers. Preferably, the model comprises three markers arranged in a triangular pattern. The markers may be tracking markers or recognition markers.
The processor may be adapted to produce a plurality of models arranged in a first list, each model being representative of a change in position of the object, and select one or more models from said plurality of models to correspond to a change in position of the object, wherein the processor may be further adapted to select the one or more models on the basis of:
The indexed list may be created by setting the index of a model equal to a sum of weights of the index and all preceding indices in the first list.
The processor of the tracking device may be further adapted to predict a change in position of the object in dependence on the calculated parameters.
The prediction may be based on the selected models.
The camera may capture a single image of the object at a time.
The camera may have a maximum resolution of 2500 by 1800 pixels with a frame rate of 100 frames per second.
The radiation source may comprise two sets of infrared light sources arranged so that a first set is closer to the camera than a second set. The radiation source may be adapted to alternate the activation of the first set and the second set, the processor being adapted to compare an image captured when the first set is activated, and the second set is not activated, to an image captured when the second set is activated and the first set is not activated. This may be the case where a three dimensional model is used. Alternatively, both sets are illuminated simultaneously. This may be the case when a two dimensional model is used.
The model may be a two dimensional model.
The processor may comprise a central processing unit connected to a memory storing a computer program, the central processing unit being adapted to process the computer program to carry out any of the method claims contained herein.
A further aspect of the invention extends to a system for displaying three dimensional information comprising a tracking device as described and a three dimensional display wherein the three dimensional display is connected to the display adapter.
The three dimensional display may be an autostereoscopic display for simultaneously displaying a left-eye image and a right-eye image, wherein the processor may be adapted to swap the left-eye image and the right-eye image in dependence on the location of the user's head relative to the three dimensional display.
The tracking device may be for detecting the position of a user's head in an operating theatre. In this application, the camera may be a video camera having a frame rate of 100 frames per second where alternate frames are used as on-axis and off-axis images, and the radiation source may comprise IR LEDs which do not emit substantial radiation in the visible spectrum.
In an embodiment, the tracking device may be adapted to track the position of the heads of two or more users. In this embodiment, the processor may be adapted to recognise a shape of a marker and wherein the users are distinguished by a shape of the corresponding marker worn by each user.
A further aspect of the invention extends to a method of tracking a position of a user's head comprising:
The method may further comprise designating an area of a captured image as the head on the basis of recognising one or more eyes of the head.
The head may be recognised on the basis of recognising one or more tracking markers attached to the head.
The method may further comprise recognising a user according to the presence of a recognition marker.
The method may further comprise displaying three dimensional information when a user is recognised and displaying two dimensional information when a user is not recognised. The user may be recognised by the recognition marker.
Further, or alternatively, the display may be switched from displaying three dimensional information to displaying two dimensional information when tracking of the head is lost.
The tracking markers and/or the recognition markers may comprise one or more markers adhered to clothing.
The method may further comprise capturing successive images wherein each image corresponds to an illumination of the head by the radiation source.
The radiation source may radiate electromagnetic radiation predominantly as infrared radiation.
The radiation source may comprise two sets of infrared light sources arranged so that a first set is closer to the camera than a second set, the method comprising alternating the activation of the first set and the second set.
The method may further comprise comparing an image captured when the first set is activated, and the second set is not activated, to an image captured when the second set is activated and the first set is not activated.
The method may further comprise processing images captured when the first set of infrared light sources is activated for information relating to the recognition and/or tracking markers.
The radiation source may radiate radiation with wavelengths between 750 nm and 1 mm.
The method may further comprise generating a model corresponding to the object and evaluating a likelihood that the model represents the object, wherein the evaluation of the likelihood may involve using a threshold conversion of one or more regions of the image.
The method may further comprise designating regions of one or more images captured by the camera as regions corresponding to the eyes and the at least one other characteristic of the head, and performing a threshold conversion on said portions of said images.
The threshold conversion may comprise identifying a colour value of a central part of a designated region and converting image information of said part on the basis of said identified colour value.
The threshold conversion may comprise converting to black and white image information.
The model may comprise a three dimensional model of the head.
The three dimensional model of the head may comprise three dimensional locations for two eyes and one or more markers.
The method may further comprise producing a plurality of models arranged in a first list, each model being representative of a change in position of the object, and selecting one or more models from said plurality of models to correspond to a change in position of the object, wherein the processor is adapted to select the one or more models on the basis of:
The indexed list may be created by setting the index of a model equal to a sum of weights of the index and all preceding indices in the first list.
The method may further comprise predicting a change in position of the object in dependence on the calculated parameters.
The prediction may be based on the selected models.
The method may comprise capturing a single image of the object at a time.
The radiation source may comprise two sets of infrared light sources arranged so that a first set is closer to the camera than a second set, the radiation source being adapted to alternate the activation of the first set and the second set, the method comprising comparing an image captured when the first set is activated, and the second set is not activated, to an image captured when the second set is activated and the first set is not activated.
A further aspect of the invention comprises determining a region corresponding to a marker by performing a threshold conversion on a pixel representation of that region. The pixel representation may be coded in a greyscale colour scale. In this case, the method may comprise determining a greyscale colour value of a central pixel of the region and designating this as c. The method may further comprise converting all pixels with a colour value less than c−1 to a first colour and all pixels with a colour value more than c−1 to a second colour. The first colour may be white and the second colour may be black. Alternatively, the first colour may be black and the second colour may be white.
A further aspect of the invention extends to evaluating a plurality of models which involves calculating a weighting for each model, generating a list of all of the models designated by their respective weightings, generating an indexed list wherein each index of the indexed list corresponds to a sum of all preceding weights, and wherein the indexed list is sorted by a binary sort.
The model may be a two dimensional model.
The three dimensional display may be an autostereoscopic display for simultaneously displaying a left-eye image and a right-eye image, and wherein controlling the three dimensional display in dependence on the calculated parameters may comprise swapping the left-eye image and the right-eye image in dependence on the location of the user's head relative to the three dimensional display.
Figures 8a and 8b are illustrations of the results of threshold conversion on regions of an image;
The system 10 further comprises a radiation controller 22 connected to the radiation source 16 to control the manner in which the radiation source illuminates the user's head 14. A capture device 24 captures digitised images from the camera. A central processor 28 receives the captured images from the image capture device 24 and processes this information as described below. The 3D display 20 is controlled by a display adapter 26. The 3D display 20 used in this embodiment is a display with a lenticular overlay, as known in the art. This display 20 displays 3D information from a 3D source 38. The 3D source 38 may be any source of 3D information (left and right-eye information). For example in an operating theatre, the 3D source 38 may be a stereoscopic camera used for laparoscopy. The 3D source 38 is connected to the display adapter so that the 3D information from the source may be displayed on the 3D display in a known manner.
The 3D display is a lenticular display and as a user moves their head from left to right or from right to left, the 3D effect is blurred. Therefore, in embodiments of this invention, the processor tracks the position of the user's head and sends this information to the display adapter 26. The display adapter, once informed of the position of the user's head relative to the display 20 is then able to determine whether the user's perception of the 3D effect would be improved by switching the left and right-eye information.
As stated, the 3D display 20 is a lenticular display, but it is to be realised that any display employing optical technologies and elements (so-called parallax barrier or lenticular lens panels) which ensure that each eye of the viewer sees a slightly different perspective may be used. The human brain then processes these perspectives into a spatial picture.
The central processor 28 in the embodiment illustrated is a computer comprising a CPU 160 connected to a graphics processing unit 164 and a memory 162.
It is to be realised that although various portions of the system 10 have been illustrated and described as separate devices, the actual hardware may not correspond to the blocks of
The arrangement of the radiation source 16 relative to the camera 18 is illustrated in
However, it is desirable for embodiments of this invention that the head detection and tracking system be capable of operating at distances exceeding the standard working distance of about 700 mm. Since one of the primary uses of embodiments of the invention relates to use in an operating theatre, the distance between a surgeon and the display will be between 1 m and 3 m. In an embodiment, lateral movement of up to 1 m is compensated for, preferably with reference to a horizontal axis of symmetry.
The use of stereoscopic input for head tracking and detection suffers from the disadvantage that such systems provide too much information to perform the necessary calculations on, particularly where a three dimensional model of the user's head is utilised (or other factors relying on significant calculations are involved) and it is necessary to process the images at a frame rate of between 20 and 30 frames per second. In practice, using the types of radiation sources considered here, it has been found that it is necessary to process the information for a particular head position in about 20 ms, which is difficult where stereoscopic images are involved. This is particularly the case where a significant resolution is needed.
It has been found that if, instead of a stereoscopic image input, a monocular image input is used and the imaging sensor has sufficient resolution, the required calculations can be performed, as described below. Therefore, in an embodiment, the video camera has a frame rate of between 80 and 120 frames per second. Preferably, the frame rate is about 100 frames per second. In these embodiments, the frame rate may also, or instead, refer to the number of images which the processor 28 is capable of processing (in other words, redundant frames could be discarded). Furthermore, it has been found that the resolution of the image produced by the camera can have a significant impact on the accuracy of the determination of the position of the user's head. This is all the more so where, as in this case, a monocular camera is used. Preferably, the horizontal pixel resolution of the camera is such that a single pixel corresponds to 1 mm in the lateral plane of the user (although it is to be realised that some variation in this amount is inevitable as the user is able to move towards and away from the camera). In this embodiment, the resolution corresponds to between 0.5 mm and 1.5 mm. In the embodiment illustrated, the camera has a resolution of 2500 (horizontal) by 1800 (vertical) pixels.
In these embodiments, for use in surgery, a minimum frame rate of 25 frames per second is needed since the update of the 3D display used by the surgeon needs to be in 'real time'. Furthermore, it is a constraint that the position of the user's head be tracked in the time available between captured images (in other words, at one half of the frame rate, since the procedure of embodiments of the invention relies on two frames; see below).
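The timing constraint above can be sketched as a short calculation; the function name is ours, not part of the invention, and it simply restates the arithmetic in the description (two frames per tracked position at the camera frame rate).

```python
def time_budget_ms(frame_rate_hz: float, frames_per_position: int = 2) -> float:
    """Time available to process one head position, in milliseconds.

    The procedure relies on two captured frames (on-axis and off-axis)
    per position, so the effective position-update rate is one half of
    the camera frame rate.
    """
    return 1000.0 / frame_rate_hz * frames_per_position


# At 100 frames per second, about 20 ms is available per head position,
# and the effective update rate of 50 positions per second exceeds the
# 25 frames per second minimum required for 'real time' surgical use.
budget = time_budget_ms(100)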
The display adapter 26 may be a conventional display adapter such as a graphics card (whether separate or integrated). However, for embodiments of this invention it is important that the display adapter is able to control the three dimensional display 20. To do so, it is important that the display adapter is able to swap the left-eye and right-eye images, or at least generate the instructions according to which this can be done. Similarly, for further embodiments, it is important that the display adapter is able to generate the instructions for the display 20 to switch between two dimensional and three dimensional modes. It is to be realised then that in an embodiment, the display adapter may be the same as the processor 28, in which case the device would include a graphics card or other means for processing the image information necessary for its display.
As illustrated in
In the embodiment illustrated, the radiation controller 22 is connected to the processor 28 which is also connected to the capture device. In this manner the processor is able to co-ordinate the operation of the camera 18 and the radiation source 16 to ensure that the on- and off-axis images are captured at the correct times.
In general the process of embodiments of the invention is outlined in
As previously mentioned, the processing of the image data according to certain embodiments relies on a three dimensional model of the user's head 14 (
In a further embodiment, a two-dimensional model of the user's head is used. This is illustrated in
Advantageously, embodiments of the invention are able to utilise the fact that a user may be wearing a mask and a cap by incorporating markers in these articles of clothing. In further embodiments, the markers may be incorporated in other clothing or clothing accessories to be worn by a user (such as a hat, glasses). Alternatively, the markers may be incorporated into a support frame worn by the user.
In a further embodiment, the system comprises two 3D displays where each display is intended for a corresponding user. In such a system, the difficulty lies in being able to distinguish the head of the first user from the head of the second user. In such an embodiment, different shaped markers are used to distinguish between different users. In particular, circles may be used as markers for a first user and triangles as markers for a second user. In a further multi-user embodiment, a single display viewable by multiple users may be used. In all of these embodiments, the users' heads are tracked and the output of the display or displays altered in accordance with the tracked position.
For certain embodiments a difference between the on-axis image and the off-axis image is required. In the following step, step 86, a difference image is calculated by subtracting pixel values for the on-image from those of the off-image. This difference image is used later in the process. However, the difference image is only required for certain models of the user's head and therefore is not always necessary. Therefore, this step has been illustrated with a dashed outline in
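The difference-image calculation of step 86 can be sketched as follows; this is a minimal illustration under our own assumptions (8-bit greyscale pixels represented as nested lists, with negative differences clamped to zero), not the invention's implementation.

```python
def difference_image(on_image, off_image):
    """Subtract pixel values of the on-axis image from those of the
    off-axis image, clamping negative results to zero.

    Both images are assumed to be equally sized 2D lists of 8-bit
    greyscale values (0-255).
    """
    return [
        [max(0, off_px - on_px) for on_px, off_px in zip(on_row, off_row)]
        for on_row, off_row in zip(on_image, off_image)
    ]


# Regions brighter in the off-axis image survive; the rest are zeroed.
diff = difference_image([[10, 20]], [[30, 15]])
```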
Once the difference image has been calculated, the process moves to step 88 where the head is detected in the image. At the following step, step 90, the position of the head is calculated and the changes in the position are determined. Therefore, the step 90 has a loop representing the continuous tracking of the user's head. As part of the tracking of the head at step 90, the position of the head is determined (step 92) and this information is used to control the 3D display at step 94.
The step of recognising the head at step 88 (head detection) uses known algorithms for recognising whether a head is present in a particular image. In the embodiment shown, Haar Cascades are used to recognise a face. Other known facial-recognition algorithms may be used instead. The output from the face recognition is used to build the model corresponding to the head model at the co-ordinate position determined by the face recognition algorithm.
Each of the N models is created by performing a minor transformation to the input model. In this embodiment, the transformations correspond to a small change in position (translation or rotation in one of the six degrees of freedom) of the head. In this embodiment, the changes are based on an assumed Gaussian distribution with a mean position estimated assuming a speed of movement of 1 m·s−1. Many changes to this constraint on the randomised model generation are possible. For example, a head is less likely to rotate in the plane parallel to the plane of the body and such rotation could be constrained more than transverse movement.
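The randomised generation of candidate models can be sketched as below. The pose representation (a six-tuple of three translations and three rotations) and the standard deviations are our own illustrative assumptions; the description specifies only that the perturbations follow a Gaussian distribution over the six degrees of freedom.

```python
import random

def generate_candidates(input_model, n=100, sigma_trans=5.0, sigma_rot=0.05):
    """Create n candidate models by applying small Gaussian-distributed
    perturbations to the input model pose.

    input_model: (x, y, z, roll, pitch, yaw) - translations in mm,
    rotations in radians (illustrative units only).
    """
    x, y, z, roll, pitch, yaw = input_model
    return [
        (x + random.gauss(0.0, sigma_trans),
         y + random.gauss(0.0, sigma_trans),
         z + random.gauss(0.0, sigma_trans),
         # Rotations could be constrained more tightly than translations,
         # e.g. in-plane rotation of the head is less likely.
         roll + random.gauss(0.0, sigma_rot),
         pitch + random.gauss(0.0, sigma_rot),
         yaw + random.gauss(0.0, sigma_rot))
        for _ in range(n)
    ]
```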
In the embodiment illustrated, parallel processing using a GPU is used to evaluate each of the models in the manner described as follows. In the following step 106 (for n=1), the processing branches depending on whether a region corresponding to an eye or to a marker is being dealt with. For each of the eyes 54 and 56 (
In the following step, a weighting is applied to the calculations for that region. Since the region here corresponds to an eye, the weighting applied is 0.4 so that the scores for both eyes together have a maximum value of 0.8.
A similar process is then carried out for regions corresponding to the three markers 59, 60 and 62 (
It is to be realised that the weighting applied can vary. In an embodiment, it has been found that the weighting of 0.8 for the eye regions and 0.2 for the marker regions provides particularly favourable results.
In the final step for n=1 an overall score between 0 and 1 is calculated for that model at step 120 by combining all of the calculations for each of the regions of that model.
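The weighted combination of per-region scores can be sketched as follows; the function name is ours, and we assume per-region scores already normalised to the range 0 to 1, as implied by the overall score lying between 0 and 1.

```python
EYE_WEIGHT = 0.4          # two eye regions, 0.8 in total
MARKER_WEIGHT = 0.2 / 3   # three marker regions, 0.2 in total

def overall_score(eye_scores, marker_scores):
    """Combine per-region scores (each assumed in [0, 1]) into a single
    model score between 0 and 1, using the weightings from the
    description (0.8 for the eye regions, 0.2 for the marker regions)."""
    return (sum(EYE_WEIGHT * s for s in eye_scores)
            + sum(MARKER_WEIGHT * s for s in marker_scores))
```

With perfect matches in all five regions the score reaches 1.0; a model matching only the eyes at half strength scores 0.4.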
It is to be realised that the steps detailed above for n=1 are carried out for all models up to n=N. Once this has been done, N scores have been produced and, at step 122 the scores are compared and the best score is used for further processing. It is to be realised however that it is not necessary that the model returned for further processing represent the best of all the models generated. In an alternate embodiment discussed below it is also possible to return one of the better models instead of the best.
At the following step 124 a prediction of the movement of the head is made based on the difference between the best model selected at step 122 and the input model. In this embodiment, this information is used to generate a vector representing the estimated movement of the user's head and on this basis a new model is generated. The new model is then used as an input model for a further iteration of the process 100 (i.e. used as an input model to step 104).
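A minimal sketch of this prediction step is given below, assuming a simple constant-velocity extrapolation (the description specifies only that a movement vector is estimated and a new input model generated from it; the extrapolation rule is our assumption).

```python
def predict_next(input_model, best_model):
    """Estimate a movement vector as the difference between the best
    model and the input model, then extrapolate one step ahead to form
    the input model for the next iteration.

    Models are represented here as flat lists of pose parameters.
    """
    velocity = [b - a for a, b in zip(input_model, best_model)]
    return [b + v for b, v in zip(best_model, velocity)]


# A head that moved from position 0 to 1 is predicted to reach 2 next.
next_model = predict_next([0.0, 0.0, 0.0], [1.0, 2.0, 0.0])
```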
In this manner a likely position of the head in the captured images is generated. Referring back to
In further embodiments, other adjustments may be made on the basis of the determined information, depending on the type of 3D display used.
As mentioned above, the step 122 of the process of
In an alternative embodiment illustrated in
σ1, σ2, σ3, …, σN
In the following step, step 154 an indexed list is created by adding the weight of a model to the sum of the weight of each preceding model:
In the following step, step 156, a binary search is performed on the indexed list created in step 154. To implement the binary search, a random number between 0 and the sum of all weights (Σn=1…N σn) is generated and the index of the model to be selected is found by binary search for the random number in the indexed list. In this embodiment this is repeated as many times as there are indexed pairs (i.e. N times), although this is not essential to the invention; in a further embodiment, the binary search is conducted for fewer than N random numbers between 0 and the sum of all weights.
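Steps 154 and 156 can be sketched with the standard library's binary search; the function names are ours. The indexed list is a cumulative sum of the weights, and each draw of a random number selects a model with probability proportional to its weight.

```python
import bisect
import random

def build_index(weights):
    """Indexed list: entry n is the weight of model n plus the weights
    of all preceding models (a cumulative sum)."""
    index, total = [], 0.0
    for w in weights:
        total += w
        index.append(total)
    return index

def select_model(index):
    """Draw a random number between 0 and the sum of all weights and
    binary-search the indexed list for the model whose cumulative span
    contains it. Returns the model's position in the original list."""
    r = random.uniform(0.0, index[-1])
    # Guard against r landing exactly on the final cumulative value.
    return min(bisect.bisect_right(index, r), len(index) - 1)
```

Heavily weighted models occupy wider spans of the cumulative range and are therefore selected more often, while each lookup costs only O(log N).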
Binary search has the advantage of being quick, but the disadvantage that it may not return the best model. However, the search will return a favourable model and it has been found that the gains in speed are significant when compared to using a traditional sorting algorithm which involves comparing each score to all the others. In this embodiment then a favourable model is returned in step 158 instead of returning the best model of step 122 of
In a further refinement to the processing of embodiments of the invention, a threshold conversion is performed for each of the regions corresponding to eyes and markers (see steps 110 and 116 of process 100 of
In this embodiment therefore, the colour value of the central pixel is read (using the 256 greyscale range with which the colour information is stored in this embodiment). If this integer value is c then a value of c−1 is taken and all pixels in the region with a colour value less than c−1 are set equal to white and all pixels in the region with a colour value more than c−1 are set equal to black. In this manner the image information for the region is converted to black and white using a threshold colour value.
Two results of such threshold conversions are illustrated in
In the threshold conversion described above, the threshold used for the conversion was c−1. It is to be realised that other threshold values could be used instead. For example, c−2, c−3 or the subtraction of some other suitable integer value from c may be used instead. In a system with excess processing capacity, it may be possible to use more sophisticated algorithms for the threshold conversion too. However, the advantages of this threshold conversion lie primarily in its simplicity; it is not significantly expensive in processing resources to implement, and it yields reliable results.
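The threshold conversion can be sketched as below. We assume a region stored as a 2D list of 256-level greyscale values, white encoded as 255 and black as 0, and pixels exactly equal to the threshold treated as black; the description leaves these encoding details open.

```python
def threshold_convert(region):
    """Convert a greyscale region (values 0-255) to black and white.

    The threshold is the colour value of the central pixel minus one:
    pixels below the threshold become white (255), all others black (0).
    """
    c = region[len(region) // 2][len(region[0]) // 2]
    threshold = c - 1
    return [[255 if v < threshold else 0 for v in row] for row in region]


# A bright central pixel against a darker surround yields a black centre
# on a white background.
converted = threshold_convert([[10, 10, 10],
                               [10, 50, 10],
                               [10, 10, 10]])
```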
In an alternative embodiment, a two dimensional model of the head is used. Such an embodiment has the advantage that the calculations involved are less complex, but the distances between the head and the display which such a model can successfully implement are more restrictive. In this embodiment, instead of the three dimensional model illustrated in
Only a single image is required for this, and therefore in this embodiment, steps 82 and 84 of
The image illuminated by both the on-axis and off-axis LEDs in this embodiment is used to determine whether a recognition marker is present. However, as described above, where the on-axis and off-axis LEDs are activated in sequence, the image corresponding to illumination by the on-axis LEDs is used to recognise the recognition marker. The use of the on-axis image for this purpose has a number of advantages. For example, more of the reflections of the on-axis LEDs 32 (
The display operates most effectively when the user's right eye 172 is located in the right eye zone 162 and the left eye 174 is located in the left eye zone 164. The user's perception of the display becomes confused if the eyes are located in the incorrect zones and the three dimensional effect is lost. By tracking the position of the user's head and therefore of the eyes relative to the left and right eye zones of the display, the tracking device of embodiments of the invention is able to determine when the left eye enters a right eye zone (and the right eye enters a left eye zone) and then switch the images projected onto the two zones, thereby restoring the three dimensional effect.
If the determination in step 186 is that the eyes of the user are in the correct zone, the process will return to step 182 to redetermine the position of the head.
If the determination in step 186 is that the eyes of the user have moved into the opposite zones, the left-eye image and the right-eye image are swapped in step 188, thereby restoring the three dimensional effect. The process will then return to step 182.
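One pass of the loop of steps 182 to 188 can be sketched as follows; the function name and the zone_of helper are hypothetical stand-ins for the tracking logic described above:

```python
def correct_stereo(left_eye_pos, right_eye_pos, zone_of, swap_images):
    """One pass of the tracking loop (cf. steps 182-188).

    zone_of(pos) reports whether a tracked position falls in a 'left' or
    'right' eye zone. When the eyes have crossed into the opposite zones,
    the left-eye and right-eye images are swapped, restoring the three
    dimensional effect; otherwise nothing is done and the head position
    is simply redetermined on the next pass.
    """
    if zone_of(left_eye_pos) == 'right' and zone_of(right_eye_pos) == 'left':
        swap_images()
        return True   # images were swapped (step 188)
    return False      # eyes in the correct zones (return to step 182)
```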
In this embodiment, the display 20 is able to operate in both two dimensional and three dimensional modes. As mentioned, if the user's eyes are not located in the correct zones, the three dimensional effect is lost, and the user becomes confused by the images being displayed. In applications such as surgery, it is important that the user's perception of the information being displayed is interfered with as little as possible. Therefore, it is preferable to have the display show a two dimensional image rather than a confused three dimensional image.
Therefore, in the embodiment illustrated, if the processor 28 determines at step 88 (
Alternatively, or in addition, the mode may be switched if there is more than one user detected.
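The fallback between the two modes can be summarised in a small decision function (a sketch; the names are illustrative and not taken from the embodiment):

```python
def choose_display_mode(num_users, eyes_in_valid_zones):
    """Decide between two dimensional and three dimensional display modes.

    The display falls back to two dimensional mode whenever a coherent
    stereoscopic image cannot be guaranteed: either the eyes are not
    reliably located in valid zones, or more than one user is detected.
    """
    if num_users != 1 or not eyes_in_valid_zones:
        return '2D'
    return '3D'
```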
It is to be realised that this step of switching display modes is not dependent on the type of model used for the user's head. With reference to
The locations of the left and right eye zones of a display are determined by the camera during a calibration step. In this embodiment, the display displays different colours (for example red and green) for all left and right eye zones in a dark room with a wall or other screen located at the user distance. The wall or screen then reflects the zones back to the camera and the processor is able to designate those areas of the captured images as the left and right eye zones.
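This calibration step might be sketched as follows, assuming the left-eye zones are shown in red and the right-eye zones in green, and that the captured frame is available as rows of (r, g, b) tuples (all names here are hypothetical):

```python
def designate_zones(frame):
    """Designate left and right eye zones from a calibration frame.

    In a dark room the display shows red in the left-eye zones and green
    in the right-eye zones; the wall or screen reflects these back to the
    camera. Each captured pixel is assigned to a zone according to its
    dominant colour channel; pixels with no dominant channel belong to
    neither zone.
    """
    left, right = set(), set()
    for y, row in enumerate(frame):
        for x, (r, g, b) in enumerate(row):
            if r > g and r > b:
                left.add((x, y))
            elif g > r and g > b:
                right.add((x, y))
    return left, right
```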
The terms ‘two dimensional’ and ‘three dimensional’ have been used herein, specifically when referring to displays and information. It is to be realised that these are references to a user's perception and are not necessarily references to characteristics of the information and display, or other corresponding noun.
Number | Date | Country | Kind |
---|---|---|---|
92138 | Jan 2013 | LU | national |