1. Field of Invention
The present invention relates to biometric analysis of a retinal image, and more particularly, to biometric analysis of a blood vessel pattern in an area of the retinal image with high structural content.
2. Description of the Related Art
Due to the unique character of each individual's retina, various systems attempt to use the retina for biometric identification. Previous approaches have focused on identifying the boundaries of the optic disk in a retinal image and using features of the optic disk as a basis of comparison with other retinal images.
In particular methodologies, the optic disk boundary is used as a fiduciary, and blood vessels are segmented, encoded, and matched with respect to this specific fiduciary. There are generally five aspects to a system implementing a retinal algorithm in these methodologies: optic disk image auto-capture; blood vessel segmentation; encoding; matching; and data retrieval and caching strategy. Optic disk image auto-capture involves consistent detection of the optic disk boundary in multiple captured frames. Optic disk image auto-capture is combined with blood vessel segmentation. Specifically, blood vessel segmentation involves locating blood vessel cross-sections using a model-fit method applied along concentric ellipses based on the detected optic disk boundary. Encoding entails recording three parameters for vectors representing the width of the blood vessel cross-sections, as well as their positions with respect to the optic disk. Meanwhile, in matching, the vectors recorded during encoding are compared to a database of stored vectors from reference images, and a matching score is produced reflecting the percentage of matching vectors in the comparison. Data retrieval and caching strategy employs flat files stored in disk directories, where the video frame, encoded data, and header information for stored images are retrieved as one monolithic file.
Unlike the methodologies described above, which depend on the information available in the proximity of the detected optic disk boundary, embodiments of the present invention detect and use data from any area of the retinal image that contains high structural content. The embodiments enable image auto-capture, blood vessel segmentation, encoding, matching, and data retrieval and caching according to areas of spatial variation in the image, i.e., spatial variations of pixel intensity values, rather than a fiduciary such as the optic disk.
In particular, an embodiment of the present invention identifies retinal blood vessels for biometric identification by: receiving at least one image with retinal data; detecting an area in the image corresponding to a spatial variation in the image; and determining a blood vessel pattern in the area. The image may be an image bitmap. The spatial variation may be determined according to a spatial intensity gradient. Furthermore, the area corresponding to the spatial variation can be defined by a fitted shape.
A specific embodiment determines a structural measurement in the area corresponding to the spatial variation. For instance, the structural measurement can be a structural center of mass. In such an embodiment, the blood vessel pattern is determined relative to the structural measurement.
A further embodiment determines the blood vessel pattern by determining blood vessel cross sections within the area corresponding to the spatial variation and linking the blood vessel cross sections to determine blood vessels. The blood vessel pattern can include blood vessel bifurcations and locations of entry and exit from the retina. Each of the blood vessel cross sections can be represented by an N-vector determined by an N-parameter non-linear fitting function or a linear function combination. For instance, a non-linear five-parameter model can be fit to intensity profiles within the boundary according to a Levenberg-Marquardt method.
Once the blood vessel pattern is determined, it can be saved for future biometric identification. Alternatively, it can be compared with a reference blood vessel pattern for immediate identification. Regions of interest around the detected blood vessels can be used to normalize or align the blood vessel patterns before they are compared for identification.
Still other aspects, features, and advantages of the present invention are readily apparent from the following detailed description, by illustrating a number of exemplary embodiments and implementations, including the best mode contemplated for carrying out the present invention. The present invention is also capable of other and different embodiments, and its several details can be modified in various respects, all without departing from the spirit and scope of the present invention. Accordingly, the drawings and descriptions are to be regarded as illustrative in nature, and not as restrictive.
FIG. 3a illustrates the auto-capture process in another exemplary embodiment of the present invention.
FIG. 3b illustrates the auto-capture process in yet another exemplary embodiment of the present invention.
FIG. 6a illustrates a retinal video frame showing blood vessel structure.
FIG. 6b illustrates an exemplary application of a polar-coordinate method in the segmentation of the blood vessel structure.
FIG. 6c illustrates an exemplary application of a Cartesian-coordinate method in the segmentation of the blood vessel structure.
In order to provide a system and method for biometric retinal identification, exemplary embodiments of the present invention employ areas of a retinal image that contain high structural content. Structural content is a measure of the spatial variation in an image, i.e., spatial variations of pixel intensity values. The area of highest structural content does not have to be bounded by a simple geometric shape, and its center usually occurs near the densest area of large vessel concentration and bifurcation. While exemplary embodiments may employ geometric shapes for calculation purposes, they do not merely fit a geometric shape to a generalized retinal image. In particular, the calculations performed within the boundary are dependent on data conditions in small areas within the boundary. For instance, the calculations may be performed in a locally adaptive manner based on the ratio of very bright pixels to total pixels in a small neighborhood.
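By way of illustration only, such a measure might be sketched in Python as below; the window size, brightness threshold, and the particular gradient-sum measure are assumptions of the sketch rather than the disclosed method, and a grayscale image array is assumed.

```python
import numpy as np
from scipy import ndimage

def structural_content_map(image, win=31, bright_thresh=240):
    """Per-pixel structural content: summed gradient magnitude in a
    win x win neighborhood, down-weighted where the neighborhood is
    dominated by very bright (near-saturated) pixels."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)
    grad_sum = ndimage.uniform_filter(np.hypot(gx, gy), size=win) * win * win
    # Locally adaptive correction based on the ratio of very bright pixels.
    bright_ratio = ndimage.uniform_filter(
        (img > bright_thresh).astype(np.float64), size=win)
    return grad_sum * (1.0 - bright_ratio)
```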
In order to identify areas of greatest structural content and derive biometric data from these areas, embodiments of the present invention may use a set of candidate fiduciary points within the retinal image.
These candidate fiduciary points are semi-invariant, clear and ubiquitous. Thus, this fiduciary set can be reliably found under the expected imaging conditions.
The process of capturing a retinal image is described with reference to FIG. 1.
Steps 101 through 106 of FIG. 1 constitute the auto-capture process.
The details of the assessment, or image quality test, are further illustrated in FIG. 2.
In step 203, sharpness and focus, as well as saturation, are measured. As the image sequence peaks in focus, vessel detail becomes clearer and sharper. Increasing detail is measured as an increase in gradient strength near structural edges. An overall increase in the intensity gradients within an image indicates an increase in focus. Alternatively, a high-pass filter can be used to quantify high-frequency components within an image. These high-frequency components increase with focus.
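Both focus measures reduce to a few lines; in the following sketch (a Python illustration assuming a grayscale array, with the Gaussian sigma as an arbitrary choice), the first uses mean gradient magnitude and the second uses the residual of a low-pass filter as a simple high-pass measure.

```python
import numpy as np
from scipy import ndimage

def focus_score(image):
    """Overall gradient strength: increases as structural edges sharpen."""
    gy, gx = np.gradient(image.astype(np.float64))
    return float(np.mean(np.hypot(gx, gy)))

def highpass_energy(image, sigma=2.0):
    """Alternative measure: energy of high-frequency components, taken
    as the residual of a Gaussian low-pass filter."""
    img = image.astype(np.float64)
    return float(np.mean((img - ndimage.gaussian_filter(img, sigma)) ** 2))
```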
Additionally, as a user aligns with a retinal image capture device, the overall signal increases as the user becomes better aligned in accordance with the operation of the device. Therefore, in addition to assessing focus, overall signal intensity can be used to assess when a user is optimally aligned. This manifests as an overall increase in pixel intensities (and contrast) within an image.
However, as intensity increases, areas of the image may become saturated. In other words, groups of pixels, or picture elements, achieve their maximum attainable intensity values, creating “white areas” where detail becomes washed out. If enough detail is washed out, the vessels cannot be reliably located. Accordingly, saturation is identified and measured by searching for clusters of very high intensity pixels in regions containing structure and recording the number of pixels with values above a predetermined or adaptive threshold.
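A minimal sketch of such a saturation measure, with illustrative threshold and cluster-size values (both assumptions), counts high-intensity pixels only where they occur in clusters:

```python
import numpy as np
from scipy import ndimage

def saturated_pixel_count(image, thresh=250, min_cluster=25):
    """Count saturated pixels occurring in clusters, ignoring isolated
    bright pixels; thresh may be predetermined or adaptive."""
    mask = image >= thresh
    labels, n_clusters = ndimage.label(mask)
    if n_clusters == 0:
        return 0
    sizes = np.asarray(ndimage.sum(mask, labels, range(1, n_clusters + 1)))
    return int(sizes[sizes >= min_cluster].sum())
```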
Finally, in step 204, the results of any of the steps 201 through 203, or any combination or weighted combination thereof, are compared to predetermined or adaptive thresholds. If the thresholds are not met or exceeded, the captured image fails the image quality test, the video frame is discarded, and the process returns to step 101 in FIG. 1.
If, however, the video frame passes the image quality tests, a retina code is generated for the image as shown in step 103 in FIG. 1.
If the image passes the retina code quality test of step 104, the retina result, including the video frame and the data from image analysis, is placed in a cache in step 105. The cache holds a ranked queue of retina results from the plurality of video images in memory. The retina results are ranked according to criteria, which may include, but are not limited to, the maximum number of vessels, maximum number of vessel cross sections (VCSs), intensity contrast of the vessels and/or VCSs, focus measure, or combinations thereof. A maximum of M (M≧1) retina results are held in the cache for comparison with available data for biometric identification/verification, or K (K≧M) retina results for enrollment of new biometric data. If, at step 105, the cache already contains the maximum permitted number of retina results, M or K, the current retina result replaces the lowest-ranking retina result in the cache if it ranks higher. In step 106, a counter keeping track of how many video frames have reached this point is incremented. If this counter exceeds a threshold T (T≧M for identification/verification or T≧K for enrollment), the auto-capture is halted and the process continues with step 107, where the best N results (1≦N≦M or 1≦N≦K) are extracted from the cache and passed to step 109. Otherwise, if the counter has not reached the threshold, the process returns to step 101 to process more images. At any point during the auto-capture process, a timeout signal can be sent by the controlling software, in which case the auto-capture process is halted and the process continues to the final encoding step 107. If, however, at this point, fewer than N results are contained in the cache, the auto-capture has failed to extract the required information and a “fail to acquire” signal is returned. Otherwise, the N retina results are passed to the next processing step detailed below.
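The cache behavior described above can be sketched as a bounded min-heap; in this Python illustration, the scalar score (e.g., a weighted combination of vessel count, VCS count, contrast, and focus) is assumed to be supplied by the caller, and the class name is hypothetical.

```python
import heapq

class RetinaResultCache:
    """Bounded, ranked cache of retina results; max_size corresponds to
    M (identification/verification) or K (enrollment) in the text."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._heap = []  # min-heap: lowest-ranked result sits at the root
        self._seq = 0    # tie-breaker so results themselves are never compared

    def offer(self, score, result):
        """Insert a result; if full, replace the lowest-ranked entry only
        when the new result ranks higher."""
        item = (score, self._seq, result)
        self._seq += 1
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, item)
        elif score > self._heap[0][0]:
            heapq.heapreplace(self._heap, item)

    def best(self, n):
        """Extract the best N results, as in step 107."""
        return [r for _, _, r in heapq.nlargest(n, self._heap)]
```

Calling offer(score, result) for each passing frame and best(N) when the counter exceeds T reproduces the queue management and extraction described above.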
At step 109, a blood vessel segmentation and encoding process generates a final retina encoding. This final encoding can take exactly the same form as the retina code generated in step 103, in which case no further processing takes place at step 109. In the preferred embodiment, the final segmentation and encoding step is similar to that of step 103, except that it constitutes a more thorough process. Namely, the blood vessel segmentation method, as described below, is applied more exhaustively within the SSR. Other embodiments may apply alternative segmentation and encoding methods not related to the segmentation and encoding steps applied in step 103. If the retina results are being used for biometric identification or verification, the results proceed to alignment and matching modules. If the retina results are being used for biometric enrollment, the results can be compressed and/or encrypted for future use or may be passed on to alignment and matching modules to test for repeat enrollments.
An alternative embodiment of the auto-capture process is shown in FIG. 3a.
FIG. 3b illustrates yet another embodiment of the auto-capture process, which is similar to the embodiment shown in FIG. 3a.
If the image passes the retina code quality test of step 316, the retina result, including the video frame and the data from image analysis, is placed in a retina code cache in step 317. The retina code cache holds a ranked queue of retina results from the plurality of video images in memory, similar to the cache of step 105 in the embodiment of FIG. 1.
In a variation of the embodiment illustrated in
As described previously, encoding and segmentation occur either during the process of auto-capture or immediately following the auto-capture, at steps 109, 307, and 321 of FIGS. 1, 3a, and 3b, respectively.
In a polar-unwrapped or Cartesian SSR, the N-parameter non-linear fitting function and/or various individual linear function combinations are applied to points within the SSR. One embodiment utilizes the Levenberg-Marquardt algorithm to fit a Gaussian model along intensity profiles in the SSR. In the case of a polar-unwrapped method, the SSR is sampled into J intensity profiles of length I along concentric ellipses at J different radii. In the case of a Cartesian method, the SSR is sampled into J intensity profiles of length I along the x-axis and K intensity profiles of length L along the y-axis.
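For the polar-unwrapped case, the sampling might be sketched as follows; the center point (e.g., a structural center of mass), the radii, and the axis ratio are assumed inputs, and bilinear interpolation stands in for whatever sampling scheme an implementation actually uses.

```python
import numpy as np
from scipy import ndimage

def polar_unwrap_profiles(image, center, radii, n_samples, axis_ratio=1.0):
    """Sample J intensity profiles of length I along concentric ellipses
    centered on `center` (row, col); axis_ratio is semi-minor/semi-major."""
    theta = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    profiles = []
    for r in radii:
        rows = center[0] + r * axis_ratio * np.sin(theta)
        cols = center[1] + r * np.cos(theta)
        profiles.append(ndimage.map_coordinates(
            image.astype(np.float64), [rows, cols], order=1))  # bilinear
    return np.vstack(profiles)  # shape (J, I): one row per ellipse
```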
As shown in FIG. 4, the blood vessel segmentation and encoding process proceeds as follows.
Initially, intensity profiles are sampled from within the SSR, shown as step 400. In step 401, for each point, i, along each intensity profile, a window of values centered on i is recorded. These intensity values become the local data for the application of the model-fitting method in step 402. The Levenberg-Marquardt method can be used to fit a non-linear five-parameter model to the data in the window. The model is constructed from the addition of a one-dimensional Gaussian curve, used to approximate the profile of a blood vessel, and a straight line, used to approximate the local gradient of the intensity within the image. The model function is:
y = p1*exp[−(x − p2)²/(p3)²] + p4*x + p5,
where the five parameters are: p1, the amplitude of the Gaussian, approximating the intensity contrast of the vessel; p2, the center of the Gaussian, approximating the position of the vessel; p3, the width of the Gaussian, approximating the width of the vessel; p4, the slope of the straight line, approximating the local intensity gradient; and p5, the offset of the straight line, approximating the local background intensity.
The parameters are set to initial default values, with p2 set to i, and the Levenberg-Marquardt method is used to fit this function to the data; the five parameters are recorded for each point, i, in each intensity profile. An example result is shown in FIG. 5.
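Using SciPy's Levenberg-Marquardt implementation, the fit at one point of a profile could be sketched as below. Only the choice p2 = i comes from the description above; the window half-width and the remaining initial values are illustrative assumptions (vessels appear dark in these images, so p1 starts negative).

```python
import numpy as np
from scipy.optimize import curve_fit

def vessel_model(x, p1, p2, p3, p4, p5):
    """Gaussian vessel profile plus a straight line for the local gradient."""
    return p1 * np.exp(-((x - p2) ** 2) / (p3 ** 2)) + p4 * x + p5

def fit_point(profile, i, half_width=7):
    """Fit the five-parameter model to a window centered on point i."""
    lo, hi = max(0, i - half_width), min(len(profile), i + half_width + 1)
    x = np.arange(lo, hi, dtype=np.float64)
    y = np.asarray(profile[lo:hi], dtype=np.float64)
    p0 = [y.min() - y.mean(), float(i), 2.0, 0.0, float(y.mean())]
    params, _ = curve_fit(vessel_model, x, y, p0=p0, method="lm", maxfev=2000)
    return params  # (p1, p2, p3, p4, p5)
```

A point is then retained as a candidate VCS in step 403 only if the fitted parameters fall within the defined tolerances.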
In step 403, parameter sets from step 402 resembling blood vessels are identified. A function is used to record sets of parameters that could represent a VCS (candidate VCSs), where the parameters fall within defined tolerances. In step 404, the candidate VCSs from step 403 are consolidated. If the parameters for a candidate VCS match the parameters for neighboring points, a VCS is recorded, represented by the five parameters. Repeat detections of a single vessel are consolidated into a single record, where a set of five parameters represents a particular combination of those recorded at a particular point and those recorded at neighboring points. All detected VCSs are recorded for all the intensity profiles for each image.
Search algorithms for the steps above include, but are not limited to, exhaustive search, hierarchical search, and directed search. In an exhaustive search, all points detected in the retinal image are fit. In a hierarchical search, every location (i, j), where i≧1 and j≧1, is fit, followed by local neighborhood searches around the resulting initial vessel cross sections. The hierarchical search may be performed at multiple resolutions. In a directed search, initial points to be fitted are chosen by local parameters, which include, but are not limited to, gradient and/or intensity strength relative to surroundings and the presence of line segments. This first step in the directed search is followed by local neighborhood searches around the resulting initial vessel cross sections. The directed search may also be performed at multiple resolutions.
The initial VCSs may be merged into final VCSs along preferred directions. Merging allows the refinement of the fit parameters while reducing the many vessel representations along the preferred direction to more reliable representations. Merging techniques can include comparison of differences of neighboring VCS parameters to predetermined and/or adaptive thresholds and combining parameters into a new, single VCS if the thresholds indicate enough similarity. One example is the merging of initial VCSs into final VCSs angularly (horizontally) for each radial step (column) on a polar unwrapped grid.
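A simple merging pass along one preferred direction might compare neighboring candidate parameter sets to thresholds and combine each run of similar neighbors into a single VCS; in the sketch below, the tolerance values and the use of a plain mean to combine parameters are illustrative assumptions.

```python
import numpy as np

def merge_vcs(candidates, tol_pos=2.0, tol_width=1.5, tol_amp=1.0):
    """Merge repeat detections of one vessel along a preferred direction.
    candidates: non-empty list of 5-parameter arrays (p1..p5), ordered
    along that direction; runs of similar neighbors are combined."""
    merged, group = [], [candidates[0]]
    for prev, cur in zip(candidates, candidates[1:]):
        similar = (abs(cur[1] - prev[1]) < tol_pos and    # p2: position
                   abs(cur[2] - prev[2]) < tol_width and  # p3: width
                   abs(cur[0] - prev[0]) < tol_amp)       # p1: amplitude
        if similar:
            group.append(cur)
        else:
            merged.append(np.mean(group, axis=0))  # one final VCS per run
            group = [cur]
    merged.append(np.mean(group, axis=0))
    return merged
```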
In step 406 of FIG. 4, the consolidated VCSs are linked to determine the blood vessels.
Once the blood vessels are identified, regions of interest (ROIs) are located about the blood vessels, as further illustrated in step 407 of FIG. 4.
In step 408, bitmaps representing image intensity gradients corresponding to each ROI are recorded. These bitmaps are the size of each corresponding ROI and contain pixel values representing local intensity gradients in the original video frame. These bitmaps are the template bitmaps used in the normalization and registration steps below.
Although the template matching approach described herein aligns images according to intensity gradients, an alternative embodiment of the present invention can align the blood vessel segmentations directly.
FIGS. 6a-6c illustrate examples of some segmentation results. In particular, FIG. 6a illustrates a retinal video frame showing blood vessel structure, FIG. 6b illustrates an exemplary application of a polar-coordinate method in the segmentation of the blood vessel structure, and FIG. 6c illustrates an exemplary application of a Cartesian-coordinate method.
Initially, the LRR and DRR are loaded into memory. As described above, each encoding result contains an image bitmap corresponding to the recorded video frame. Thus, the matching process operates with a series of bitmaps, each indicating intensity gradients in a recorded video frame and information characterizing the blood vessels contained in the recorded video frame, known as the retina code.
As shown in FIG. 7, the matching process begins with a pre-compare step 722.
In one embodiment, the pre-compare step first filters out twenty-five to fifty percent of the reference candidates very swiftly with an intra-retinal code correlation comparison. This comparison is made between vessel pairs in the object and reference images, for which coincidence of the structural centers of mass (SCMs) between the object and reference images is not as important. The information that is used to compare the vessel pairs includes, but is not limited to, the distance between vessel centers, vessel pair angles, and the difference in vessel pair lengths and widths. These comparisons are performed on a subset of the vessel pairs available and are compared to thresholds.
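One way such a pair-wise comparison could be sketched is below; representing each vessel as a dictionary with center, length, and width entries, and assuming corresponding pairs have already been selected, are both assumptions made purely for illustration.

```python
import numpy as np

def pair_features(v_a, v_b):
    """Features of one vessel pair: center-to-center distance, pair angle,
    and differences in length and width."""
    d = np.subtract(v_b["center"], v_a["center"])
    return np.array([np.hypot(d[0], d[1]),
                     np.arctan2(d[0], d[1]),
                     abs(v_a["length"] - v_b["length"]),
                     abs(v_a["width"] - v_b["width"])])

def precompare(obj_pairs, ref_pairs, tol):
    """Fraction of corresponding vessel pairs whose features agree within
    the tolerances; a low fraction rules the reference candidate out."""
    hits = sum(bool(np.all(np.abs(pair_features(*op) - pair_features(*rp)) < tol))
               for op, rp in zip(obj_pairs, ref_pairs))
    return hits / max(len(obj_pairs), 1)
```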
If the previous comparison does not rule out a match between the object and reference images, the pre-compare step 722 in one embodiment may proceed to direct comparison, or matching, between the images. The search algorithms for this matching include, but are not limited to, exhaustive search, hierarchical search, and directed search. In an exhaustive search, all vessels are matched to all other vessels, cross section by cross section, at all increments in a search region. In a hierarchical search, all vessels are matched to all other vessels, cross section by cross section, at locations (i, j), where i≧1 and j≧1, followed by local neighborhood searches around the resulting smallest-difference points. The hierarchical search and directed search can be performed at multiple resolutions. In this embodiment, this comparison technique in step 722 may serve as the final matching stage. A similarity score between the two encodings is compared to a threshold, and a “true” or “false” signal is generated depending on whether the threshold is exceeded. Therefore, in this particular embodiment, the matching process terminates at step 722.
However, a plurality of images of a retina result may represent the retina of the same individual, but differences between images may result from differing recording conditions. For instance, human alignment with the capture device and pupillary differences can result in variations in the intensities of images from the same individual, where some images may be generally darker than others. Accordingly, the images may preferably be normalized, as shown in step 723. In particular, template bitmap normalization may be employed to correct for average (per data point) gradient intensity differences between the SSR of an object retina code and the SSR of a reference (enrolled) retina code. As described previously, the template bitmaps are derived from the detection of ROIs or a single centered ROI within the image. In one embodiment, a scale factor is applied to each data point in the reference template bitmaps. The scale factor is the ratio of the average (per data point) gradient intensity in the object retina SSR to the average (per data point) gradient intensity in the reference retina SSR. In another embodiment, every object retina area to be matched to a reference retina template bitmap is scaled by the ratio of the average gradient strength in the reference retina template bitmap to the average gradient strength in the object retina area. The normalization enables template matching based on structural rather than intensity differences between the object area and the reference template. Other implementations include removing background low-level gradients from the template bitmaps using adaptive global or local thresholds.
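The first normalization variant reduces to a single scale factor; a minimal sketch, assuming the gradient bitmaps are numpy arrays:

```python
import numpy as np

def normalize_templates(ref_templates, obj_ssr_grad, ref_ssr_grad):
    """Scale each reference template bitmap by the ratio of average
    (per data point) gradient intensity in the object SSR to that in the
    reference SSR, so matching reflects structure rather than intensity."""
    scale = np.mean(obj_ssr_grad) / np.mean(ref_ssr_grad)
    return [t * scale for t in ref_templates]
```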
In addition to variations in the intensities, alignment variations may also result in displacements in the position of the blood vessel pattern as recorded in the fixed field of view of the camera. Accordingly, the image/retina code registration, which occurs in step 724, accounts for variations in the positions of the blood vessel patterns between images as introduced by the image capture process. In particular, step 724 determines the displacement between the blood vessel structures as recorded in the two images by aligning the blood vessels between the images.
Due to the two-dimensional projection in the image plane of the three-dimensional retinal surface and the degrees of freedom in the movement of the eye, this displacement can occur as a non-linear deformation. This deformation can be modeled by an elastic transformation defined using tie-points centered on blood vessel features.
Measurement of the displacement, in either the polar-unwrapped or Cartesian system, may begin with an initial value of the difference between the SCMs of the object and reference retinas. The difference is applied to the VCSs of the object retina. If no fiduciary point is used, the coordinates of the VCSs may be centered on an arbitrary coordinate system. Vessel-by-vessel comparisons of distance are performed between the object and reference retinas, refining the center-to-center distance between the retinas. At this stage, a directed search is performed to find the optimal displacement between the two retinas. In one embodiment, a difference between SCMs is calculated. In an alternative embodiment, a displacement between the blood vessel patterns is calculated. In one embodiment, the optimal difference minimizes the distances between vessels in the reference and object images. In another embodiment, the optimal difference maximizes the number of points with final matching differences between VCSs in the reference and object images.
Alternatively, the displacement can be modeled using a sequence of rigid-body transformations encompassing translations, rotations, and shears. In step 724 of an alternative embodiment, the deformation is modeled using a single rigid-body translation, (tx, ty). Once calculated, this translation can be used to align retina codes directly or to register the two images such that the blood vessel structures align. In this particular embodiment, a hierarchical multi-resolution template matching is used to register the two images. The method uses the template bitmaps recorded with the DRR and matches them to an intensity gradient image derived from the video frame contained in the LRR. The matching takes place through a range of translations at various scales, with the result set as the highest scoring translation at the largest scale. A search sequence is predefined that details how many different scales are to be used, and in what sequence. Each scale refers to a reduction factor in the size of the live image and the template bitmaps. In addition, at each scale a sequence of step sizes is defined. In one implementation, a match for a given translation can be calculated using a binary AND of overlapping pixels. Other metrics may be used, including the absolute differences between pixels or normalized cross-correlations.

An initial search space is defined for the starting scale and step size. This search space is defined to include all expected actual translations between images. For all possible translations in the search space, at a step resolution defined by the step size, a score is calculated. The highest scoring translation is remembered, and the search space at the next step size is defined by the differences in the last and current step sizes. If the search sequence defines no further step size resolutions at the current scale, then the template matching moves onto the next scale in the search sequence. The current best translation and the new search space are both scaled according to the difference between the last and current scale factors. This procedure continues until there are no further scales and step size resolutions defined in the search sequence. Another embodiment allows for more than one optimal translation to be kept at a given scale or step size, and a number of best scores corresponding to different translations are compared at the conclusion of the search sequence. In another embodiment, once the optimal translation is estimated, a series of rotations are applied and scored as above. The highest scoring displacement between the images is then set to the highest scoring translation followed by the highest scoring rotation.
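The coarse-to-fine search can be sketched as follows. The sketch uses mean absolute pixel differences (one of the metrics mentioned above), simple decimation for scaling, and a fixed initial search radius; these simplifications, along with the omission of the final rotation pass, are assumptions of the sketch rather than the full procedure.

```python
import numpy as np

def match_score(template, image, tx, ty):
    """Score one translation by mean absolute pixel difference
    (negated, so higher is better); out-of-bounds scores -inf."""
    h, w = template.shape
    if tx < 0 or ty < 0 or ty + h > image.shape[0] or tx + w > image.shape[1]:
        return -np.inf
    return -float(np.mean(np.abs(image[ty:ty + h, tx:tx + w] - template)))

def hierarchical_match(template, image, scales=(4, 2, 1), steps=(4, 1), radius=32):
    """Coarse-to-fine search: the best translation at each scale and step
    size seeds a smaller search space at the next."""
    best = (0, 0)
    for k, scale in enumerate(scales):
        t = template[::scale, ::scale].astype(np.float64)
        img = image[::scale, ::scale].astype(np.float64)
        cx, cy = best[0] // scale, best[1] // scale
        r = radius // scale if k == 0 else steps[0]
        for step in steps:
            cands = [(tx, ty)
                     for ty in range(cy - r, cy + r + 1, step)
                     for tx in range(cx - r, cx + r + 1, step)]
            cx, cy = max(cands, key=lambda p: match_score(t, img, p[0], p[1]))
            r = step  # the next pass searches around the current best hit
        best = (cx * scale, cy * scale)
    return best  # (tx, ty) at full resolution
```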
In step 725, an image similarity score based on the measurement of the optimal difference or translation may be compared to an adaptive or predefined threshold. If this test fails, the match returns a “fail.” Otherwise, the process proceeds onto the generation of matching scores in steps 726, 727 and 728.
In an alternative embodiment, the rigid-body translation, (tx, ty) is estimated using a hierarchical multi-resolution search as described previously, except that the translations are scored using the rate of matching VCSs within the DRR and the LRR and/or the distance between corresponding VCSs within the DRR and the LRR. VCSs are deemed to correspond if they are nearest-neighbors and the distance between them is less than a threshold. The highest scoring translation corresponds to the highest proportion of matching VCSs and/or the smallest measured average distance between VCSs. This score may be used as the final matching score and compared with a threshold. If this test fails, the match returns a “fail.” Otherwise, the process returns a “pass” match result. Alternatively, if the test is passed, the process proceeds onto the generation of matching scores in steps 726, 727 and 728. Note that this technique may be used to replace or complement step 722 described previously.
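This scoring might be sketched as below, assuming VCS positions are given as N×2 coordinate arrays (with the candidate translation already applied to the object points) and using a one-way nearest-neighbor query, which is a simplification of the correspondence test described above.

```python
import numpy as np
from scipy.spatial import cKDTree

def vcs_translation_score(obj_points, ref_points, dist_thresh=5.0):
    """Rate of matching VCSs and mean distance between matched VCSs
    for one candidate translation."""
    dists, _ = cKDTree(ref_points).query(obj_points, k=1)
    matched = dists < dist_thresh
    rate = float(matched.mean())
    mean_dist = float(dists[matched].mean()) if matched.any() else float("inf")
    return rate, mean_dist
```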
In one embodiment, the video frame in the LRR is encoded as a retina code for a second time, except that the ellipse used to define the polar coordinates is set to be the one within the SSR in the DRR, translated by the calculations of step 724. This new retina code encoding is compared to the encoding within the DRR. A match score is generated by the proportion of VCSs in the two retina codes whose parameters match within predetermined ranges.
Quickly and efficiently retrieving reference data for the matching process is critical to achieving high match rates. The speed at which reference data can be retrieved is dependent upon several factors, such as basic storage I/O speed, intelligent binning (data grouped according to demographic and feature-based classification), and intelligent caching. Demographic grouping can include age, sex, and eye color, but is not limited to these attributes. Feature-based grouping can include enrollment-dependent features such as total number of vessels, total vessel cross sections, vessel lengths, vessel widths, vessel heights, and vessel to vessel distances.
Feature-based information, retina codes, and the video frames they are derived from can be stored independently. For the embodiment in which the matching process necessarily terminates at step 722, video frames, template bitmaps, and data required for steps 723 to 729 are not stored. For alternative embodiments that utilize step 722 as a pre-filter, this data need only be retrieved from data storage when the matching process meets the tests at step 722.
The actual retrieval searching methods (dependent on database schema) can include, but are not limited to, simple comparison, hashing, neural network-based, genetic algorithm-based, and hidden Markov-based methods.
The binning comparisons, data retrieval, and matching can be performed by independent or dependent processes, running on the same or multiple physical processors. Indeed, each of the three operations can execute on N (N≧1) physical processors and/or machines, with data retrieved from M (M≧1) physical storage systems. In one embodiment, all the processes run on a single physical processor and retrieve data from a single physical storage system, all part of a single physical machine, such as a laptop computer or workstation. In another embodiment, each bin is allocated a single physical machine. In yet another embodiment, each bin is allocated a single physical machine for binning and matching, while an enterprise-wide data management and storage system is shared by all bins.
In one embodiment of a Cartesian-based retina encoding and matching system, the retrieved data for matching is on the order of two to four kilobytes per reference code, implying that roughly one gigabyte of memory is necessary for two hundred fifty thousand to five hundred thousand reference codes. For small enrollee populations, the entire reference data set can be placed in high-speed system memory (such as DDR) on a currently available laptop computer, or a mobile smart camera system with one to two gigabytes of system memory. Data discrimination, data retrieval, and data matching are fully scalable in storage capability and matching speed.
In general, caching and other data storage by exemplary embodiments of the present invention may be achieved with networked or non-networked systems that employ physical storage media of various forms, including, but not limited to, hard disk, optical disk, magneto-optical disk, RAM, and the like.
Furthermore, physical processors and/or machines employed by exemplary embodiments may include one or more networked or non-networked general purpose computer systems, microprocessors, digital signal processors, micro-controllers, and the like, programmed according to the teachings of the exemplary embodiments of the present invention, as is appreciated by those skilled in the computer and software arts. Appropriate software can be readily prepared by programmers of ordinary skill based on the teachings of the exemplary embodiments, as is appreciated by those skilled in the software art. In addition, the devices and subsystems of the exemplary embodiments can be implemented by the preparation of application-specific integrated circuits or by interconnecting an appropriate network of conventional component circuits, as is appreciated by those skilled in the electrical art(s). Thus, the exemplary embodiments are not limited to any specific combination of hardware circuitry and/or software.
Stored on any one or on a combination of computer readable media, the exemplary embodiments of the present invention may include software for controlling the devices and subsystems of the exemplary embodiments, for driving the devices and subsystems of the exemplary embodiments, for enabling the devices and subsystems of the exemplary embodiments to interact with a human user, and the like. Such software can include, but is not limited to, device drivers, firmware, operating systems, development tools, applications software, and the like. Such computer readable media further can include the computer program product of an embodiment of the present inventions for performing all or a portion (if processing is distributed) of the processing performed in implementing the inventions. Computer code devices of the exemplary embodiments of the present inventions can include any suitable interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes and applets, complete executable programs, and the like. Moreover, parts of the processing of the exemplary embodiments of the present inventions can be distributed for better performance, reliability, cost, and the like.
Common forms of computer-readable media may include, for example, a floppy disk, a flexible disk, hard disk, magnetic tape, any other suitable magnetic medium, a CD-ROM, CDRW, DVD, any other suitable optical medium, punch cards, paper tape, optical mark sheets, any other suitable physical medium with patterns of holes or other optically recognizable indicia, a RAM, a PROM, an EPROM, a FLASH-EPROM, any other suitable memory chip or cartridge, a carrier wave, or any other suitable medium from which a computer can read.
While the present invention has been described in connection with a number of exemplary embodiments and implementations, the present invention is not so limited, but rather covers various modifications and equivalent arrangements, which fall within the purview of prospective claims. For instance, although the embodiments of the present invention described herein employ various forms of spatial (two-dimensional) gradients, the present invention is not limited to these specific ways of determining spatial variation.
This application claims priority to U.S. Provisional Application No. 60/795,645 filed Apr. 28, 2006, the contents of which are incorporated entirely herein by reference.