The present invention relates to a system and method for displaying a video image to a user having a visual impairment.
Age-related Macular Degeneration (AMD) is a leading cause of vision loss among older people. In particular, AMD is a common degenerative condition of aging that damages the macula and affects central vision, resulting in low vision. Patients with AMD usually have symptoms including blurred vision or distortion (for example, straight lines appearing wavy and objects appearing to be of an unusual size or shape). Many patients may also develop a scotoma at the fovea. Therefore, patients with AMD tend to face difficulties with simple daily activities such as reading, facial recognition, etc. In addition, objects may not appear as bright to the patients as they used to.
To assist a patient with a scotoma to read, the location of the scotoma may be defined and the patient may be trained to use a Preferred Retinal Locus (PRL) instead of his/her fovea for fixation on an object of interest. Typically, this involves proper training of the patient and the use of an Amsler grid (a diagnostic tool generally used by optometrists to locate and characterize scotomas, and to determine suitable PRLs for patients).
In particular, after determining a suitable PRL for a patient, the patient may be trained to use this PRL for fixation on the object of interest. This involves training the patient to move his/her scotoma away from the object of interest. Over time and with proper training, the patient can adapt and develop his/her PRL (which may differ from the determined PRL) at an eccentric (offset) area away from the fovea. This PRL may be described as a “pseudo-fovea” and such a technique is known as eccentric viewing. In the UK, there is even a community-based training program run by the Macular Society, in which skilled volunteers are trained to teach patients eccentric viewing and steady eye strategies.
The teaching and training of the eccentric viewing technique are often challenging. A substantial period of time is required to train AMD patients with scotomas to use their PRLs to look at objects, and optometrists have to be present during the training. Furthermore, at present, rehabilitation is not well established and in some cases the teaching approach has to be individualized.
As an alternative or in addition to the above-mentioned training, patients having AMD can utilize an appropriate low vision aid to enhance their vision. In fact, many learners of eccentric viewing tend to further require low vision aids due to the impairment of their foveal areas.
Although there are a number of spectacle-mounted low vision aids available in the market, the magnification factors of their magnifying lenses are usually fixed. In such cases, a patient can only change the magnification factor by buying a new hyperocular low vision aid to attach to the spectacles. Similarly, the magnification levels of intraocular lenses are usually fixed and cannot be changed without a surgical operation to replace the originally implanted lens with one of higher or lower magnification. Moreover, the use of low vision aids or intraocular lenses having higher magnification usually results in a smaller field of view and increased distortion for the patient. The shorter working distance associated with higher magnification can also reduce the amount of light reaching the reading material, making reading difficult and uncomfortable for the patient.
The present invention aims to provide a new and useful system and method for displaying a video image to a user having a visual impairment.
In general terms, the present invention proposes a system or method having at least one of the following features: (i) including a marker in a display region so as to guide a user to look at the marker in order to see the image in focus; and (ii) deforming a portion of the image to correct a deformation in a corresponding portion of the visual field of the user due to the visual impairment.
Specifically, a first aspect of the present invention is a system for displaying a video image to a user having a visual impairment, the system comprising:
With the above-mentioned operation (i), the system is able to use the marker to guide the user to use his/her PRL to view the captured image. This can allow the user to view the captured image with his/her PRL more easily, since doing so requires less training and can be done in the absence of an optometrist. With the above-mentioned operation (ii), the system is able to allow the user to see the captured image with less distortion, therefore allowing the user to feel that his/her vision has been enhanced.
A second aspect of the present invention is a method for displaying a video image to a user having a visual impairment, the method comprising:
A third aspect of the present invention is an image processing module for processing images captured with a video camera in real time to generate processed images for display in real time, in a display region, to a user having a visual impairment, wherein the processing is in dependence on data characterizing the visual impairment and wherein the image processing module comprises:
The processor of the first aspect may comprise the image processing module of the third aspect.
An embodiment of the invention will now be illustrated for the sake of example only with reference to the following drawings, in which:
The system 400 may be adapted for wearing by a user, for example, by further comprising elements operative to attach the system 400 to the user. For example, the system 400 may be in the form of a wearable device such as goggles which can be worn by the user daily.
Alternatively, the system 400 may be in the form of a small handheld device such as a smart phone, a tablet or a phablet that a user can hold and place in front of his/her eyes.
As shown in
In this document, the term video image is used to refer to an image frame captured by the video camera 402 (either the most recently captured frame or one that has been processed by one or more image enhancement operations). This video image is 2-dimensional and comprises a plurality of pixels having respective x and y coordinates and respective intensity values.
In step 502, data characterizing the visual impairment is generated by measuring characteristics of the visual impairment and is stored in the data storage device 406. In step 504, images are captured with the video camera 402. In step 506, voice commands spoken by the user are captured using the microphone 404 and are recognized by the processor 408. In step 508, the captured images are processed in real time by the processor 408 to generate processed images. This processing is dependent on the data characterizing the visual impairment generated in step 502, and may be modified based on the recognized voice commands captured in step 506. In step 510, using the image display device 410, the processed images are displayed to the user in real time in a display region for viewing by the user. The display region is configured to display a plurality of pixels in two dimensions.
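By way of example only, the overall real-time loop of steps 504-510 may be sketched in Python (using OpenCV) as follows. The helper functions load_impairment_data, recognise_command and process_frame are hypothetical placeholders standing in for steps 502, 506 and 508 respectively, and do not form part of the above description:

```python
import cv2

def run_pipeline(process_frame, load_impairment_data, recognise_command=None):
    """Minimal real-time loop: capture -> process -> display (steps 504-510)."""
    impairment_data = load_impairment_data()   # step 502: previously measured and stored
    camera = cv2.VideoCapture(0)               # step 504: video camera 402
    try:
        while True:
            ok, frame = camera.read()
            if not ok:
                break
            command = recognise_command() if recognise_command else None  # step 506
            processed = process_frame(frame, impairment_data, command)    # step 508
            cv2.imshow("display region", processed)                       # step 510
            if cv2.waitKey(1) & 0xFF == ord('q'):
                break
    finally:
        camera.release()
        cv2.destroyAllWindows()
```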
The processor 408 of system 400 is configured such that it is able to perform multiple image enhancement operations including eccentric fixation, distortion correction, vision enhancement, magnification and image stabilization. These are shown in
A patient suffering from eccentric fixation tends to suffer from blurred vision at the center of his/her visual field, causing him/her to see the center of an image as a blurred area. The eccentric fixation operation allows the patient to use his/her PRL to view an object of interest, so that the object of interest can appear clear to him/her.
In sub-step 702, a fixation point for the user is selected. The fixation point has coordinates [xf, yf], where xf is less than the total number of pixels that the display region of the image display device 410 can display along the x-axis and yf is less than the total number of pixels the display region can display along the y-axis. In particular, the fixation point is selected by presenting a screen to the user on the display region for the user to select a point on the screen. The point selected by the user is then set as the user's fixation point. The screen used in sub-step 702 may comprise markers to guide the user. In this case, the user may be asked to fix his/her gaze on each marker for a certain period of time and then select the marker he/she can see clearly. The selected marker is then set as the user's fixation point.
In sub-step 704, a video image is captured using the video camera 402.
In sub-step 706, the video image is relocated to a location based on the selected fixation point.
In particular, sub-step 706 comprises translating the video image to a location such that the center of the image is at the selected fixation point of the display region.
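The translation of sub-steps 706 and 902, together with the marker of sub-step 906, may be illustrated by the following Python sketch (using OpenCV). The function relocate_to_fixation and the green cross marker are illustrative choices only and do not limit the embodiment:

```python
import numpy as np
import cv2

def relocate_to_fixation(image, xf, yf, display_w, display_h):
    """Translate the image so that its centre lands at the fixation point [xf, yf]
    of the display region (sub-steps 706 and 902); areas of the display region not
    covered by the translated image are left black."""
    h, w = image.shape[:2]
    dx = xf - w / 2.0                      # horizontal shift of the image centre
    dy = yf - h / 2.0                      # vertical shift of the image centre
    T = np.float32([[1, 0, dx],
                    [0, 1, dy]])
    relocated = cv2.warpAffine(image, T, (display_w, display_h))
    # Sub-step 906: include a marker at the fixation point to guide the user's PRL.
    cv2.drawMarker(relocated, (int(xf), int(yf)), color=(0, 255, 0),
                   markerType=cv2.MARKER_CROSS, markerSize=20, thickness=2)
    return relocated
```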
In sub-step 708, other image enhancement operations (as shown in
In sub-step 710, it is determined whether there is a need to change the fixation point. In particular, a marker is included in the display region. The location of this marker corresponds to the location of the fixation point which is offset from a centre of the display region. The user is then asked to look at the marker. At this marker, the user is able to see the processed image. The user is then asked to indicate whether the processed image he/she sees is in focus. If the user indicates that the processed image is not clear, it is determined that there is a need to change the fixation point and sub-steps 702-710 are repeated with a new fixation point selected in sub-step 702. This new fixation point is selected by shifting the current fixation point by a predetermined number of pixels in the x and/or y direction. Otherwise, if the user indicates that the processed image is clear, it is determined that there is no need to change the fixation point and sub-step 712 is performed in which the [xf, yf] coordinates of the current fixation point are stored in the data storage device 406.
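The iterative selection of the fixation point in sub-steps 702-712 may, for example, be sketched as follows. The callback user_sees_clearly, the shift of 10 pixels and the alternation between the x and y directions are assumptions made for illustration; the description above only requires that the fixation point be shifted by a predetermined number of pixels in the x and/or y direction:

```python
def refine_fixation_point(xf, yf, user_sees_clearly, step_px=10, max_iterations=20):
    """Sub-steps 702-712: shift the candidate fixation point by a predetermined
    number of pixels until the user indicates the displayed image is in focus.
    The shift direction (alternating x/y) is an illustrative choice only."""
    for i in range(max_iterations):
        if user_sees_clearly(xf, yf):      # user looks at the marker and confirms focus
            return xf, yf                  # sub-step 712: these coordinates are stored
        if i % 2 == 0:
            xf += step_px                  # shift in the x direction
        else:
            yf += step_px                  # shift in the y direction
    return xf, yf
```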
Besides the [xf, yf] coordinates from the data storage device 406, the input to sub-steps 902-906 further comprises a video image which may be one that is most recently captured using the video camera 402 or one that has been processed by one or more operations of the system 400 as shown in
In sub-step 902, the input video image is relocated to a location based on the [xf, yf] coordinates of the fixation point. Specifically, the video image is translated to a location such that the center of the video image is at the [xf, yf] coordinates of the display region.
In sub-step 904, other image enhancement operations (as shown in
In sub-step 906, a marker is included at the [xf,yf] coordinates of the display region.
The processed image is then displayed to the user at the [xf, yf] coordinates of the display region in step 510 of method 500. The [xf, yf] coordinates are offset from a centre of the display region as the user is one who suffers from blurred vision at the centre of his/her visual field. These coordinates are indicative of the user's PRL since they are obtained using sub-steps 702-712. Accordingly, by displaying the processed image with its centre at the [xf, yf] coordinates of the display region, if the user looks at the marker, the user sees the captured image in focus.
A patient suffering from visual field distortion tends to see a portion of his/her visual field as deformed. As a result, when looking at an image, the patient sees the portion of the image corresponding to the deformed portion of his/her visual field as distorted or deformed. This portion of the image may be termed as the user's zone of distortion in the image.
In sub-step 1102, a video image is captured using the video camera 402.
In sub-step 1104, a user distortion matrix characterizing an image deformation caused to at least a portion of the visual field of the user by the visual impairment is defined. This user distortion matrix is defined based on the user's zone of distortion in the captured image as input by the user. Specifically, the user is shown the captured video image with a grid overlaid on the image and is requested to input to system 400 the coordinates of two points defining his/her zone of distortion on the grid, namely point_top_left (the point corresponding to the top left hand corner of his/her zone of distortion) and point_bottom_right (the point corresponding to the bottom right hand corner of his/her zone of distortion). This may be done by having the user input the coordinates of these two points using verbal commands via the microphone, or by displaying the video image on a touch screen and requesting the user to touch the above-mentioned points. Using the coordinates of these two points, the system 400 automatically marks out the user's zone of distortion by first setting, as the boundary of the zone of distortion, (i) all the pixels of the image having the same x coordinate as point_top_left, (ii) all the pixels having the same y coordinate as point_top_left, (iii) all the pixels having the same x coordinate as point_bottom_right and (iv) all the pixels having the same y coordinate as point_bottom_right. Next, the zone of distortion is marked by setting this zone as comprising all the pixels within the boundary together with all the pixels of the boundary itself. The user distortion matrix M is then defined as a 2-dimensional matrix having entries respectively corresponding to the pixels of the marked zone of distortion, with the values of these entries being the intensity values of the respective pixels.
In sub-step 1106, a correction matrix M′ is generated by inverting the user distortion matrix M. The purpose of this correction matrix M′ is to correct the deformation characterized by the user distortion matrix M.
In sub-step 1108, the correction matrix M′ is applied to the video image captured in sub-step 1102 to obtain a deformed image. More specifically, applying the correction matrix M′ to the video image deforms a portion of the image (this portion corresponds to the zone of distortion input by the user).
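The marking of the zone of distortion and the generation of the correction matrix M′ in sub-steps 1104-1106 (and equally in sub-step 1202 below) may be sketched as follows. Because the zone of distortion is generally not square, the Moore-Penrose pseudo-inverse is used here as the "inverse" of M; this, and the use of a single-channel (grayscale) intensity image, are assumptions of the sketch rather than requirements of the embodiment, and the manner in which M′ is applied to the zone in sub-step 1108 is left schematic:

```python
import numpy as np

def build_correction_matrix(image_gray, point_top_left, point_bottom_right):
    """Sub-steps 1104-1106: mark out the rectangular zone of distortion from the two
    corner points supplied by the user, define the user distortion matrix M from the
    intensity values of the pixels in that zone (boundary plus interior), and invert
    M to obtain the correction matrix M'."""
    x1, y1 = point_top_left
    x2, y2 = point_bottom_right
    # Zone of distortion: all pixels on and within the rectangle bounded by the
    # rows/columns passing through point_top_left and point_bottom_right.
    M = image_gray[y1:y2 + 1, x1:x2 + 1].astype(np.float64)  # user distortion matrix M
    M_prime = np.linalg.pinv(M)                              # correction matrix M'
    return M, M_prime
```

In use, M′ would be applied only to the marked zone (sub-step 1108), so that only that portion of the displayed image is deformed while the remainder of the image is left unchanged.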
In sub-step 1110, other image enhancement operations (as shown in
In sub-step 1112, it is determined whether there is a need to redefine the correction matrix M′. In particular, the user is asked if the processed image is now clear and, if not, it is determined that there is a need to redefine the correction matrix M′ and sub-steps 1102-1112 are repeated. Otherwise, if the user finds the processed image clear, it is determined that there is no need to redefine the correction matrix M′ and sub-step 1114 is performed, in which the coordinates of the current points defining the user's zone of distortion (i.e. point_top_left and point_bottom_right) are stored as the zone of distortion points' coordinates in the data storage device 406.
Besides the zone of distortion points' coordinates from the data storage device 406, the input to sub-steps 1202-1204 further comprises a video image which may be one that is most recently captured using the video camera 402 or one that has been processed by one or more operations of the system 400 as shown in
In sub-step 1202, a correction matrix M′ is calculated and then applied to the input video image to obtain a deformed image. In particular, using the zone of distortion points' coordinates, the system 400 automatically marks out the user's zone of distortion by first setting, as the boundary of the zone of distortion, (i) all the pixels of the image having the same x coordinate as point_top_left, (ii) all the pixels having the same y coordinate as point_top_left, (iii) all the pixels having the same x coordinate as point_bottom_right and (iv) all the pixels having the same y coordinate as point_bottom_right. Next, the zone of distortion is marked by setting this zone as comprising all the pixels within the boundary together with all the pixels of the boundary itself. A user distortion matrix M is then defined as a 2-dimensional matrix having entries respectively corresponding to the pixels of the marked zone of distortion, with the values of these entries being the intensity values of the respective pixels. The correction matrix M′ is then calculated as the inverse of the user distortion matrix M and applied to the image. This causes a portion of the image to be deformed (or, in other words, augmented) by the correction matrix M′. This portion of the image depends on the above-mentioned zone of distortion marked out by the system 400 using the zone of distortion points' coordinates obtained from sub-steps 1102-1114.
Other image enhancement operations (as shown in
Specifically, the portion of the image to be deformed in sub-step 1202 corresponds to the zone of distortion input by the user (in sub-steps 1102-1114), and the deformation corrects the portion so that the image appears undistorted to the user (whereas the deformed image is likely to appear distorted to a person with normal vision). In other words, upon displaying the processed images by the display device 410, the correction matrix corrects the deformation in the corresponding portion of the visual field of the user due to the visual impairment. For example, a user with visual impairment may see a straight line as a curve. With the application of the correction matrix to the image comprising the straight line, the straight line can appear closer to its original form (i.e. straighter) to the user.
The input to sub-steps 1402-1412 comprises a video image which may be one that is most recently captured using the video camera 402 or one that has been processed by one or more operations of the system 400 as shown in
In sub-step 1402, the user is asked if the magnification of the image is acceptable and if so, sub-step 1412 is performed in which other image enhancement operations (as shown in
In sub-step 1404, the user is asked to input a command (which may be a verbal command via the microphone 404) indicating whether the user wishes to increase or decrease the magnification of the image.
The current zoom factor of the image is then determined in sub-step 1406. In particular, the zoom factor is stored in the data storage device 406. The zoom factor has a default value (e.g. 1) when the system 400 starts operation and each time the user indicates that he/she wishes to increase/decrease the magnification of the image, the zoom factor is increased/decreased by a certain amount (e.g. 1) and stored in the data storage device 406. For example, if the zoom factor is at the default value 1 and the user indicates his/her wish to increase the magnification of the image, the zoom factor is increased to 2. In sub-step 1406, the current zoom factor is determined by retrieving the zoom factor from the data storage device 406.
In sub-step 1408, a transformation matrix T is computed based on the user's command and the current zoom factor. In sub-step 1410, the transformation matrix T is applied to the image to obtain a transformed image. In particular, the transformation matrix is computed in sub-step 1408 to magnify the image by a particular zoom factor. This particular zoom factor is calculated based on the user's command and the current zoom factor. For example, if the current zoom factor is at its default value 1 and the user indicates his/her wish to magnify the image, the transformation matrix is computed so that after applying the transformation matrix to the image in sub-step 1410, the image is magnified by 2×. Take for example a display region in the form of a bitmap of dimension w×h. In this case, if the current zoom factor is 1 and the user wishes to magnify the image, the bitmap is scaled up to the size of 2w×2h and then cropped to the size of w×h, the scaling and cropping being done while keeping the centroids of both the original bitmap and the scaled up bitmap invariant.
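The worked example above (scaling a w×h bitmap to 2w×2h and cropping it back to w×h about its centroid) may be sketched as follows. The use of OpenCV's bilinear resize is an illustrative choice, and the sketch assumes a zoom factor of at least 1:

```python
import cv2

def magnify(image, zoom_factor):
    """Sub-steps 1408-1410 for the worked example in the text: scale the w x h bitmap
    up by the zoom factor and crop it back to w x h about its centroid, so that the
    centre of the original and scaled bitmaps stays invariant.
    Assumes zoom_factor >= 1."""
    h, w = image.shape[:2]
    scaled = cv2.resize(image, (int(w * zoom_factor), int(h * zoom_factor)),
                        interpolation=cv2.INTER_LINEAR)
    sh, sw = scaled.shape[:2]
    y0 = (sh - h) // 2           # crop offsets that keep the centroid fixed
    x0 = (sw - w) // 2
    return scaled[y0:y0 + h, x0:x0 + w]
```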
The user is then asked again in sub-step 1402 if the magnification of the image is now acceptable and if not, sub-steps 1404-1410 are repeated. If so, sub-step 1412 is performed.
In sub-step 1412, other image enhancement operations (as shown in
Therefore, with system 400, the magnification of the real time video images can be adjusted according to the users' instructions. More specifically, by inputting commands such as verbal commands into system 400, users are able to increase or decrease the level of magnification until they find the transformed image acceptable.
Although magnifying an image helps to enlarge the image so that the user can see its details more clearly, it narrows the user's field of view, causing the user to see less of the image. As a result, the user has to physically move his/her head (if the system 400 is in the form of goggles) or move the handheld apparatus (if the system 400 is in the form of such an apparatus) so as to scan the surroundings until he/she can see the objects of interest. To reduce the amount of scanning the user has to do, sub-step 1410 may further comprise automated object tracking. In particular, automated object tracking may be performed to identify one or more objects of interest in the image and adjust a location of the image in the display region (after magnifying the image), so that the objects the user is interested in are made visible to the user. For example, an object of interest may be placed with its centre at the centre of the display region. In another example, the object of interest may be placed with its centre at a fixation point (corresponding to the user's PRL) selected in a manner similar to that described in sub-step 702 above (in this case, the object may be initially placed at the centre of the display region and then translated to the fixation point in sub-step 702 of an eccentric fixation operation performed in sub-step 1412). The automated object tracking may be implemented using object and face detection techniques known in the art; a sketch of one such approach is given below.
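By way of illustration, one possible sketch of such automated object tracking follows, using OpenCV's stock Haar-cascade frontal-face detector as the detector of the object of interest. The choice of detector, and the recentring of the first detected face onto the fixation point, are assumptions of the sketch rather than requirements of the embodiment:

```python
import cv2
import numpy as np

# Illustrative detector choice: OpenCV's bundled Haar-cascade frontal-face model.
_face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def recentre_on_face(image, xf, yf, display_w, display_h):
    """After magnification, translate the image so that the centre of the first
    detected face lands at the fixation point (xf, yf); if no face is found the
    image is returned unchanged."""
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _face_detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return image
    x, y, w, h = faces[0]
    dx = xf - (x + w / 2.0)                 # shift needed to bring the face centre
    dy = yf - (y + h / 2.0)                 # onto the fixation point
    T = np.float32([[1, 0, dx], [0, 1, dy]])
    return cv2.warpAffine(image, T, (display_w, display_h))
```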
Prior to performing sub-steps 1702-1712, the user is asked to take one or more colour and/or contrast sensitivity tests, for example a test based on the colour confusion axis. The results from these tests are then input into sub-steps 1702-1712.
In sub-step 1702, foreground and background colours are selected. The initial foreground and background colours are selected based on the test results of the colour and/or contrast sensitivity test(s) of the user.
In sub-step 1704, a video image comprising text is captured using the video camera 402.
In sub-step 1706, it is determined which pixels of the captured image belong to the foreground (i.e. which are the foreground pixels) and which pixels of the image belong to the background (i.e. which are the background pixels). In particular, the pixels forming the text are determined to belong to the foreground whereas the rest of the pixels are determined to belong to the background.
In sub-step 1708, the colours of the foreground pixels are changed to the selected foreground colour and the colours of the background pixels are changed to the selected background colour to obtain a contrast-enhanced image.
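Sub-steps 1706-1708 may be sketched as follows. The description above does not prescribe how the text (foreground) pixels are identified; Otsu thresholding is used here purely as an illustrative segmentation method:

```python
import cv2
import numpy as np

def enhance_text_contrast(image_bgr, fg_colour, bg_colour):
    """Sub-steps 1706-1708: classify pixels into foreground (text) and background,
    then repaint them with the user's selected colours."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    # Dark strokes on a light page become the foreground (text) mask.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    out = np.empty_like(image_bgr)
    out[mask > 0] = fg_colour     # e.g. (0, 255, 255) for yellow text (BGR)
    out[mask == 0] = bg_colour    # e.g. (0, 0, 0) for a black background
    return out
```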
In sub-step 1709, other image enhancement operations (as shown in
In sub-step 1710, it is determined if the foreground and background colours have to be re-selected. In particular, the processed image is displayed to the user and the user is asked to input a command to indicate if he/she finds the processed image acceptable. If the user finds the processed image acceptable, the selected foreground and background colours are stored in the data storage device 406 in sub-step 1712. If not, sub-steps 1702-1710 are repeated with new foreground and background colours selected in sub-step 1702.
The foreground and background colours (both initial and new) can be selected in many ways. For example, a colour palette comprising commonly used colours may be presented to the user for the user to choose the foreground and background colours. In another example, the entire rainbow spectrum of colours may be presented to the user for the user to choose the colours. In yet another example, if the user is particular about the exact shade of his/her desired colours, the user can enter the exact Red, Green and Blue components for the colours he/she wants.
The input to sub-steps 1802-1810 comprises a video image which may be one that is most recently captured using the video camera 402 or one that has been processed by one or more operations of the system 400 as shown in
In sub-step 1802, it is determined if the input image comprises text. If not, the operation ends, the image is displayed to the user and the next video image frame is input to sub-step 1802. If the input image comprises text, sub-steps 1804-1810 are performed.
In sub-step 1804, the foreground and background colours stored in the data storage device are retrieved.
In sub-step 1806, it is determined which pixels of the image belong to the foreground and which belong to the background. In particular, the pixels forming the text of the image are determined to belong to the foreground and the rest of the pixels are determined to belong to the background.
In sub-step 1808, the colours of the foreground pixels are changed to the foreground colour retrieved from the data storage device and the colours of the background pixels are changed to the background colour retrieved from the data storage device to obtain a contrast-enhanced image.
In sub-step 1810, other image enhancement operations (shown in
By changing the colour of both the text and background of an image containing text, the contrast between the text and background can be enhanced and users can read the text of the image more easily.
Besides enhancing the contrast between the text and background of an image, other adjustments of the contrast, sharpness and/or colours of images can be performed. For example, sub-steps similar to those in
Furthermore, in another embodiment, the text in the input image is enhanced using the least colour confusion axis and an iterative method. A calibration process is first performed for a particular user to obtain characteristics of the enhancement for the user. These characteristics are saved as part of the user's profile. The saved characteristics are then used to enhance images with detected text when processing the images in real time. In one example, the calibration process comprises having the user configure multiple reading profiles and loading these profiles when necessary. For example, a red/green colour blind user tends to confuse the colours blue and purple. During the calibration process, such a user can configure his/her settings to indicate that blue and purple colours in images shall be replaced with other colours that appear less confusing to him/her. The user can start with a particular colour setting and, after using this setting for a period of time, change it to a new colour setting and save the new colour setting in his/her profile (either replacing the existing profile or adding to the collection of the user's profiles). The user can repeat this as many times as he/she wishes. In other words, the user can set his/her preferred colour settings in an iterative manner.
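One possible reading of such a reading profile is sketched below. The representation of the "confusing" colours as a hue range, and the specific hue values in the example profile, are assumptions made for illustration only and are not the disclosed format of the profile:

```python
import cv2

def apply_colour_profile(image_bgr, hue_lo, hue_hi, replacement_bgr):
    """Repaint every pixel whose hue falls inside a user-configured 'confusing'
    range (e.g. blues/purples for a red/green colour blind user) with a
    replacement colour taken from the user's saved profile."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (hue_lo, 50, 50), (hue_hi, 255, 255))
    out = image_bgr.copy()
    out[mask > 0] = replacement_bgr
    return out

# Example (hypothetical) profile entry: replace blue-to-purple hues with orange.
# profile = {"hue_lo": 100, "hue_hi": 150, "replacement_bgr": (0, 128, 255)}
```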
As shown in
In sub-step 2102, global estimation is performed. In particular, an offset value indicating a positional offset between the plurality of input images is first calculated. The offset value may be calculated over all of the input images or only a subset of the input images.
In sub-step 2104, global motion classification is performed. More specifically, in sub-step 2104, the offset value is compared against a threshold. If the offset value is more than the threshold, it is determined that the user is not stationary (i.e. moving). In this case, the scene that the user is looking at is constantly changing, so it is not necessary to perform image stabilization. Therefore, if the offset value is more than the threshold, the image stabilization operation ends and the images are displayed to the user. Otherwise, it is determined that the user is stationary and sub-steps 2106-2108 are performed.
In sub-step 2106, global motion compensation is performed by using the portion of the scene that is similar across the images as an anchor and stabilizing the user's view accordingly. In particular, one or more of the input images are modified based on the positional offset (as indicated by the offset value) to obtain a modified set of images.
In one example, the global estimation may be performed by first extracting images having a high degree of similarity from the plurality of input images, seeking respective anchor regions (for instance, regions of a predetermined size) in these extracted images and calculating the offset value as the positional offset between the anchor regions (for instance, the distance from the centre of one anchor region in one of the extracted images to the centre of another anchor region in another of the extracted images). In this example, the global motion compensation may be performed by using the offset value to modify one or more of the images to bring the anchor regions into alignment. This may be done by modifying successive ones of the input images to reduce the offset between the images.
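The following sketch illustrates sub-steps 2102-2106 for a pair of consecutive frames. Phase correlation is used here as an illustrative way of estimating the global offset between the frames, and the threshold of 15 pixels is an arbitrary example value; neither is prescribed by the description above:

```python
import cv2
import numpy as np

def stabilise_pair(prev_gray, curr_gray, threshold_px=15.0):
    """Sub-steps 2102-2106 in miniature: estimate the global positional offset
    between two frames, classify the motion against a threshold, and, if the user
    is deemed stationary, shift the current frame back into alignment."""
    (dx, dy), _ = cv2.phaseCorrelate(np.float32(prev_gray), np.float32(curr_gray))
    offset = np.hypot(dx, dy)                    # sub-step 2102: global estimation
    if offset > threshold_px:                    # sub-step 2104: user is moving,
        return curr_gray, False                  # so no stabilization is applied
    h, w = curr_gray.shape[:2]
    T = np.float32([[1, 0, -dx], [0, 1, -dy]])   # sub-step 2106: compensation
    return cv2.warpAffine(curr_gray, T, (w, h)), True
```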
In step 2108, other image enhancement operations (shown in
Various modifications will be apparent to those skilled in the art.
For example, although the distortion correction and the eccentric fixation operations are two important operations of system 400, it is not essential that the system 400 includes both of these operations. Rather, the system 400 can include any one or more of the image enhancement operations shown in
Different ones of the image enhancement operations may be useful for different users. Distortion correction is usually needed before the visual field is permanently damaged. Once there is permanent visual field damage, eccentric fixation is probably the only means by which the user can see properly. Although not essential as mentioned above, it is preferable for the system 400 to address both early and late visual field damage cases so that the system 400 can be useful for a larger group of low vision users. Therefore, it is preferable to include both the distortion correction and eccentric fixation operations in system 400.
Further, system 400 need not contain a microphone for capturing voice commands. The user's commands may be in a different format and system 400 may instead or further comprise an alternative user input device. For example, system 400 may contain a keypad for the user to type his/her commands for processing by the system 400.
Method 500 need not comprise a step of generating the data characterizing the visual impairment. Instead, such data may be input by the user to system 400 for storage in the data storage device 406. The user also need not provide voice commands and in this case, the processing of step 508 may be performed based on the data characterizing the visual impairment alone.
After capturing an image using the video camera 402, the image may be processed using only one of the operations of system 400 as shown in
In addition, although
The embodiments of the present invention have several advantages, some of which are described below.
The system 400 performs real time video image processing and can provide an augmented reality environment to assist and guide patients with low vision. In particular, the system 400 is able to capture real time video images (real time visual targets) as seen from a user's perspective and can automatically enhance these images based on the user's needs, which are determined via calibration processes involving objective and subjective ocular and visual tests and the patient's feedback.
Further, the system 400 can display augmented-reality images on transparent LCDs. More specifically, the image display device 410 can be configured to simultaneously display a plurality of overlying layers, one of the layers comprising the processed images and another comprising the captured images. In this manner, the user can see the actual object of interest and the augmented layers (comprising the processed images) simultaneously. The system 400 can appear like a normal pair of spectacles, but with graphics including, for example, augmented corrected zones, magnified and enhanced real time images, an Amsler grid and markers corresponding to PRLs appearing within the user's visual field. This allows patients having visual problems to view objects in the same manner as people with normal vision. The system 400 hence allows automated/semi-automated visual target tracking.
Moreover, the system 400 can provide patients with automatic assistance in using their PRLs and a customized field of view based on their needs, by automatically characterizing each individual's visual abnormality using objective and subjective assessment techniques. Using the characterization outcome, the system 400 enhances images captured by the video camera to meet the patient's needs. In particular, the system 400 can determine a suitable PRL for the patient and guide the patient to use the PRL to view the images. Therefore, the system 400 can help enhance the vision of patients having eccentric fixation issues.
The system 400 may be used with the assistance of a visual therapist for fine-tuning purposes depending on the visual tasks required. However, the system 400 may also be used in the absence of an optometrist, hence helping to reduce the workload of optometrists. The system 400 is thus a useful rehabilitation tool for AMD patients with low vision.
The system 400 can also be integrated into a wearable product or a handheld tool, facilitating its use by the patient. The patient can also simply input voice commands to operate the system 400. Thus, the system 400 can be easily integrated into a patient's daily life.
Unlike prior art low vision aids, the system 400 allows adjustment of the magnification of the images as per the patient's needs (based on commands input by the patient). The system 400 is also able to automatically correct distortions and enhance a patient's vision.
Table 1 below shows a comparison between the system 400 and prior art low vision aid devices. From Table 1, it can be seen that system 400 provides several advantages over prior art low vision aids, thereby helping to overcome the limitations of the prior art low vision aid devices.