This invention relates to an image analysis process for measuring the signal on biochips.
A biochip is composed of a small solid support, usually a microscope slide, on which specific DNA probes of cellular gene sequences are grafted or synthesized in situ in clearly defined positions. The probes are grouped in a spot occupying a few square micrometers containing about 109 identical molecules. The set of spots thus forms a two-dimensional matrix with a complexity of up to 106 spots/cm2 as described in document reference [1] at the end of the description.
The probe addressing type on the support as described in document reference [2] can vary and therefore impose a particular geometry:
The support may or may not be structured. For example, a biochip is known composed of a matrix of wells with gold electrodes. An electropolymerization is made on each electrode between a normal pyrrole and a pyrrole on which a probe has been fixed covalently.
All these methods according to known art lead to a two-dimensional matrix that can be used to screen and quantify a mix of several thousand DNAc molecules previously marked by a fluorochrome or any other molecules as described in document reference [3], in a single step. The hybridizing rate in a given position is determined starting from the intensity of the transmitted signal, which simultaneously indicates the nature and the quantity of molecules that were present in the mix of marked and studied molecules. It is also possible to study several conditions simultaneously. In this case, each condition needs to be marked by a different fluorochrome and they need to be mixed before they are hybridized on the biochip. There are thus one image for each fluorochrome during the slide reading step.
For example, in one step of making a biochip, the probes are deposited by a robot that has a print head with several pins, or rings. This type of robot with several rings, for example 4 rings, is used to take a small volume of samples, the deposit itself being made of a small dot that passes through the ring before coming into contact with the surface of the slide. A robot system with several hollow or solid pins, for example 48, is used to take the samples and to deposit them on the surface of the support.
The geometry of the deposit depends on the geometry of the robot head, the type of plates used to store the probes, the size of the pins, the pitch between each spot, etc. In general, a set of spots grouped in several distinct blocks is obtained, each block corresponding to a pin.
Image acquisition then consists of making a digital image of the biochip after excitation in order to measure the intensities of the different fluorescence signals. The biochip is read by a scanner equipped with excitation lasers corresponding to the absorption wavelengths of the fluorochrome(s) used. The fluorescence intensity is measured by a photomultiplier. The quality and resolution of the images obtained depend on the scanner used. Different types of instruments can be used, such as confocal and non-confocal scanners, CCD camera scanners, etc.:
This read step is very important. A good compromise between the contrast, the background noise intensity and the resolution is essential to overcome difficulties in analyzing images (location of spots, location of dust, segmentation of signals, etc.), and to optimize the reliability of the results. The precision and resolution of the scanner are controlling factors for the analysis of DNA biochips.
A filter method can be used to optimize this step, and statistically improve the image resolution.
A digital image, for example an image in the 16-bit Tiff (Tag-based Image File Format) format is obtained at the output from the scanner. This format is a good compromise between the size of the files to be stored and the information quality, in other words the number of intensity levels that can be coded. Consequently, it is very frequently used to record images originating from many different scanners. Furthermore, this type of Tiff format is not specific to any system, can easily be manipulated and enables the addition of complementary private information.
The intensity of each pixel is discretized on a scale of 65536 gray shades, from the lowest intensity (0) to the highest (65535). A pixel is the smallest measurable area on the image. Its intensity represents the average fluorescence emitted by the corresponding resolution surface area on the biochip. It is the smallest hybridizing unit that can be measured on this type of image. The intensity is proportional to the number of DNAc molecules fixed to the corresponding region of the slide. The density of the molecules on a spot is not necessarily homogeneous. Spots that are composed of a variable number of pixels with different intensities can have a very heterogeneous signal. As shown in
The problem to be solved by the invention is to:
The difficulty in automating a procedure for precisely positioning spots on the biochip image (Spot Finding Problem) is due to three main constraints:
Each spot is composed of a variable number of pixels with heterogeneous intensity. This is due firstly to the probe deposition method which generates more or less circular spots depending on the robots used, and secondly to the quantity of DNAc that is hybridized on the probes. As illustrated in
This heterogeneity in the geometry of the spots and the presence of dust or artefacts are problematic considering the size of the spots and explain why conventional morphological detection methods are inefficient. Thus, methods based on contour detection such as that defined in document reference [4] used for positioning of radio-marked DNA on nylon membranes, or methods developed for positioning of proteic spots on 2D PAGE gels, cannot be applied to small spots (about 10 times smaller than spots on nylon membranes). The existence of these three problems, namely the very small size of the spots, their imprecise morphology and the anisotropic background noise make these methods unsuitable.
Fluorescence signal extraction software according to prior art is usually based on four steps:
All this software requires precise parameter settings for the geometry of hybridizing areas.
As described in document reference [5], in some software the positioning of hybridizing areas begins with semi-automatic placement of a grid. Each spot block has to be positioned manually on the image. For each block, the next step is to supply precise parameters of the two-dimensional matrix that makes it up (number of rows, number of columns, pitch between spots, spot size) in order to place a grid. In combining these geometric parameters and a protected analysis, the image is segmented into frames inside which a more or less circular variable sized mask delimits the hybridizing area. This mask is centered on the maximum intensity.
Document reference [6] describes software that uses a slightly more automated and more reliable spot positioning method. However, it is necessary to supply all information about the probe deposition geometry, in other words the number of rows and columns in each block of the image. The blocks are isolated after supplying the position of the first spot in the image. The pre-grid is then placed automatically around the different spots as a function of the previously supplied parameters. Each frame is analyzed individually to optimize centering of the hybridizing area. Since the spots have not yet been properly identified, the center of each spot is determined approximately by the center of intensity of its own frame, in the same way as for the center of mass. For example, for an image I with pixel intensities I(x,y), the center of intensity along x is given by the estimate:
None of these software programs according to known art puts forward a completely automatic solution for the four steps mentioned above. The delimitation step is particularly penalizing. All knowledge about the geometry of the deposit has to be input, and usually the block has to be positioned on the image. The semi-automatic grid positioning step leads to a serious lack of reliability on areas of the image with a low contrast. However, quantification must be exact and precise in order to perform analytic processing. Positioning imperfections are inevitable sources of errors. A painstaking check of the entire biochip is necessary at the end of the step to calculate a spot fluorescence intensity estimator. Many problems have been observed in areas with low contrast.
All these initialization, verification and reworking steps may take several hours and are indispensable before starting to quantify the spots. In the laboratory, a single experiment may be based on the use of several hundred biochips. Therefore, it is essential to develop a fast and efficient software requiring a minimum number of interactions with the operator and minimizing the verification steps.
Furthermore, the measurement of the spot fluorescence intensity is imposed by the software used and is not necessarily suitable for the deposition type used.
Therefore, the purpose of the invention is to solve the problems mentioned above by putting forward an image analysis process for measuring the probe reaction rate on a biochip.
The invention relates to an image analysis process for measuring the signal on biochips organized into one or several blocks each including a large number of spots each comprising at least one probe, the said process being characterized in that it comprises the following steps:
Advantageously, averaged orthogonal projections are used with two x and y axes in the plane of the biochip so that the periodic signal can be amplified and thus two histograms {overscore (H)}j and {overscore (H)}i are obtained. The discrete single dimensional Fourier transform (FFTD) is then used to identify the main frequencies of the image in x and y, the maximum power of the transform P(ω) giving the values ω0, from which the repetition periodicities λ0 of the spots in x and y are deduced. A variable threshold can be applied to the two histograms {overscore (H)}j and {overscore (H)}i the optimum threshold being given by the value that gives the maximum of the equation
the distribution of the periodicity λ then being examined along the two histograms {overscore (H)}j and {overscore (H)}i, a window with a size 2 λ0 sliding along the two histograms. The variations of the power of the Fourier transform of these histograms are explored locally, regions in which there is a loss of power corresponding to regions between blocks, and this method supplies a global cutout of the image.
A window with a width of λ0 and a length of 2λ0 centered on the global cut line is used to produce a fine delimitation of the boundaries of each block, and the presence of excessively intense and therefore suspicious signals can then be explored locally, and the limit of the blocks around the signal may be redefined if necessary. This is done firstly by extending the window by λ0/2 on one side of the limit and then on the other side and if the signal increases only for one of the two extensions of the window, the limit of the block is shifted by λ0 into the adjacent block for which there was no increase in the signal, if there was no increase in the signal or if the signal was increased for the two shifts, then either the signal was a parasite signal or the two blocks are very close and in both cases the probability of the signal belonging to the right and to the left of the cut line is calculated, and P(ω) is obtained for each half projection with and without a suspicious signal, the gain obtained then being equal to:
if the gain is greater than 1 for one of the two half projections only, the signal considered is assigned to it, by shifting the cut line by λ0/2 into the adjacent block; if the gain is not greater than 1 for either of the two blocks, then it is a parasite signal and the cut line is not shifted; if the gain is greater than 1 for both half projections, the signal is in phase with the two half projections and therefore the blocks are very close and in phase, and the signal is assigned to the block for which the gain is highest, by shifting the cut line by λ0/2 into the adjacent block.
Advantageously, unilluminated spots are isolated as follows:
The next step is to apply a digital reconstruction algorithm to these histograms, which is used to start from the initial projections and recreate a signal with almost identical amplitudes, the use of a threshold then making it possible to produce a binarized representation of the created histogram and to determine the precise positions of the spots present.
The binarized projection can be convoluted using a sinusoidal function of λ0 such as:
and by searching for the minimum and maximum values of the convolution function, it is then possible to automatically reconstruct a grid surrounding all detectable spots.
Advantageously, several images with low excitation and low detection can be created and accumulated into a single image.
Advantageously, a high resolution image is created and the resolution of the image is then reduced artificially by replacing n adjacent pixels by the average of the n pixels which thus reduces the size of the image by a factor of n, and a homogeneity filter is applied to eliminate the pixel with the greatest heterogeneity with regard to the adjacent pixels.
Advantageously, pixels within each spot can be sorted to identify the different types of pixels present: background noise, hybridizing signal, parasite signal.
Advantageously, a modified EM algorithm is applied in which the classes are made smaller and the concept of a pixel that does not belong to any class is introduced, by applying a threshold to the calculated probability that it belongs to one of the two or three main classes.
The following steps could be used:
A new convergence estimator can be introduced based on similarity or the calculated distance between the real distribution of pixels and the recalculated distribution using the mix parameters.
In one advantageous embodiment, the invention relates to an image analysis process that comprises the following steps:
Advantageously, during the positioning step, a method based on identification of the local loss of the periodic signal is used, such as the power of the Fourier transform of the spot projection signal onto the axes by rows or by columns.
Advantageously, during the filter step, a first filter is used to rectify the image, if necessary, by alignment of the spots in the same block. During the filter step, a second filter may also amplify the signals specifically originating from the spots and level other types of signals.
Advantageously, the position of each block in the image is refined.
a illustrates a histogram after digital reconstruction.
b illustrates a binarized histogram.
a illustrates a projection and a sinusoidal function before adjustment.
b illustrates a projection and sinusoidal function after adjustment.
a to 18d illustrate different images of the same spot acquired in four different conditions,
a illustrates an application of the modified EM method to define pixel classes with convergence evaluated with log-probability.
b illustrates an application of the modified EM method to find the pixel classes with the convergence evaluated with the RMS-log-probability.
a illustrates an image before filtering.
b illustrates this image after degrading filtering.
a,
23
b and 23c illustrate flow charts that show complete processing for a single image, complete processing for several images with different hybridizing conditions of the same slide with superposition of images, and minimum processing for an image to be used to implement the process according to the invention, respectively.
The process according to the invention is an image analysis process for measuring the reaction rate of probes arranged on a solid substrate forming a biochip organized into one or several blocks each comprising a large number of spots each composed of at least one probe.
The images obtained during the read phase are composed of several blocks of relatively well aligned spots arranged at a constant pitch.
In the process according to the invention, the four main steps in the image analysis mentioned above, taking account of difficulties that exist with devices according to prior art, namely:
The fixed geometry of the head of the robots used leads to the formation of blocks that can be approximated as homogeneous x by y blocks such as 4 by 4 or 48 by 12 blocks depending on the number of pins fitted on this head. If the block is not homogeneous (blocks are missing at the end of the spot), the processing will be done in the same way as for a homogeneous block. The empty spot frames will be eliminated later. Since the deposition is done with a constant pitch, spots appear in the form of a periodic and regular signal. The signal periodicity is lost between each block of spots.
The process according to the invention uses this periodicity property to identify the positions of blocks and spots. This periodicity is searched for on row and column projections of the image, in x and y respectively. However, due to possible variations in the image, these projections cannot be used directly for all types of images.
Delimitation of Blocks of Spots
The blocks are more or less well aligned along x and y. Averaged projections orthogonal to the axes are used to amplify the main periodic signal, and singular artefact type signals are minimized. The result is two histograms {overscore (H)}j and {overscore (H)}i like that illustrated in
where:
The single dimensional discrete Fourier transform (FFTD) can then be used to identify the main frequency of the image. This mathematical tool is very frequently used in image analysis, its result being the frequency representation of the original space.
The choice of working on these two histograms {overscore (H)}j and {overscore (H)}i is made for reasons of speed, since it is much more difficult to use the 2D FFTD transform on the entire image.
The power of the transform P(ω) is calculated for values of ω such that ω=2π/λ, λ may for example be included in the interval [10 pixels, 50 pixels] corresponding to the minimum and maximum amplitudes of the distance between the spots for the robots used, respectively:
The maximum value of P(ω) gives the value ω0, from which the spot repetition periodicity λ0 is deduced.
The interval may possibly be adjusted to the deposition frequency if it is significantly different.
The background noise can introduce parasites to the periodicity λ0. Therefore a variable threshold is applied to the two histograms {overscore (H)}j and {overscore (H)}i. The value of this threshold is determined as follows. The following ratio is calculated for a first value of the threshold S0, arbitrarily fixed very low:
where P(ωi) corresponds to the calculated value of P(ω) for the values {overscore (H)}i>S0 and where P(ωmax)=maxP(ωi).
The value of the threshold is then increased by a fixed step, and this calculation is repeated. The optimum threshold is given by the value that gives the maximum value for equation (3) above.
The distribution of the periodicity λ is then examined along the two histograms {overscore (H)}j and {overscore (H)}i. A window with size 2 λ0 slides along these histograms. As shown in
In regions between blocks, P(ω0) passes through a minimum so that the limits of the blocks can be identified.
For an image containing n blocks along one of its axes, there are n+1 spaces between blocks. But due to missing spots, the P(ω0) signal may pass through more than n+1 minima. The regions between blocks corresponding to a total lack of spots generate the lowest minima. Therefore, knowing the number of blocks, it is easy to extract the n+1 relevant values on the rows and on the columns. It is then possible to start from an original image such as that shown in
However, in a large number of cases the blocks are not well aligned. The positions of the limits of each block are refined using a window that slides along the cut line of each block. For example, the length of this window may be 2 λ0, parallel to the cut line between two blocks, and its width may be λ0 perpendicular to the cut line. This window is centered on the cut line. While this window is sliding, if the average signal calculated inside it becomes significantly greater than the average signal along the cut line (for example the average +2σ), this increase may be due to the presence of one or several spots. Therefore, it is checked if the signal is in phase with one of the blocks on each side of the cut line.
In a first step, the window is extended by λ0/2 on one side of the limit and then on the other side:
if the gain is greater than 1 for only one of the two half projections, the signal considered is assigned to it by shifting the cut line by λ0/2 in the adjacent block. If the gain is not greater than 1 for either of the two blocks, then it is a parasite signal. The cut line is not shifted. If the gain is greater than 1 for the two half projections, the signal is in phase with the two half projections. Therefore, the blocks are close and in phase and the signal is assigned to the block for which the gain is highest, by shifting the cut line by λ0/2 in the adjacent block. The final decision about where a block belongs is determined by the cardinal or parity rules described later. Thus, the limits of each block can be defined. To avoid any overlap in the adjacent block when the cut limit is being shifted (it is possible that the limits have to be shifted by a value less than or greater than the proposed values), the shift may be made by searching for the local minima in the shift direction.
Position of a Grid Around Each Spot
Positioning based on a mathematical morphology study of the entire image is difficult, due to heterogeneity in the geometry of spots, a possible lack of hybridizing at some points and the presence of spots with low contrast.
In the same way as for the positioning of blocks, the histograms {overscore (H)}j and {overscore (H)}i are made for rows and columns of a block respectively, by applying equations (1) to it.
The periodicity measurement λ0 of the spots signal is refined by applying an FFTD transform on the two histograms of each block. Once again, the maximum value of P(ω) in equation (2) gives the spot repetition periodicity λ0.
The theoretical positions of the hybridizing zones (illuminated or unilluminated spots) are identified using the least squares method to align projections with a sine curve with equation:
A sin(ω0xi+Φ) (5)
The average amplitude A of the projections is calculated as follows:
The phase of the sine curve is calculated by sliding a sinusoidal pattern with amplitude A along the profile of the histogram {overscore (H)}i calculated on a period λ0. For each position, the RMS (Root Minimum Square) value between the pattern and the projection is calculated.
The smallest RMSi value obtained gives the position at which the pattern is best aligned with the signal from a spot line. The phase shift obtained from this position j is such that:
Φ=−ω0×j (8)
The average positions of the spots are identified precisely by identifying maximum values of histograms {overscore (H)}j and {overscore (H)}i, where a histogram {overscore (H)}i is illustrated in
Therefore the values of the histograms are centered. A digital reconstruction algorithm is then applied to these histograms, so that a signal with almost identical amplitudes can be reconstituted from the initial projections.
A threshold is then used to binarize the reconstructed histogram and to determine the exact positions of the spots present.
During the deposition, a deviance of the spot can be observed that leads to local separation between the optimum sine curve and the projection itself or the binarized projection.
In order to compensate for this divergence, the applied sinusoidal function is optimized such that:
A sin(ω0x(I−∝x)+Φ) (9)
where the parameter α corrects the sine curve on the deviance of the deposits. The quantity α is chosen such that:
is a minimum.
The reconstructed histogram is used to precisely define the position of the spots with a high contrast. By adding information from the sine curve that gives the theoretical positions, all spots with a contrast that is too low to be detected can be found. The overlap between the theoretical positions (sine curves) and real positions (binarized projection) is optimized by convoluting the binarized projection using a sinusoidal function of λ0 such that:
a illustrates a projection and a sinusoidal function before the adjustment.
b illustrates the projection and the sinusoidal function after the adjustment.
By searching for the minimum and maximum values of the convolution function, it is then possible to automatically reconstruct a grid surrounding all detectable spots. The position of missing spots is then deduced by cross checking between the convolution function (equation 11) and the sinusoidal function (equation 9). Each of the maximum values of the convolution function is assigned to the maximum value of the sinusoidal function (9) minimizing the Euclidian distance such that:
RMSmax=√{square root over (({tilde over (H)}max−A sin)}(ω0x(maxi−αx)+Φ))2 (12)
where Nmax: the number of identified maximum values.
The missing spots correspond to the maximum values of the sinusoidal function (9) that were not assigned to the convolution function. The position of the missing spots can therefore be deduced. The spot phase and size rules defined below are applied to avoid the assignment of rows or columns of additional spots at the ends of blocks.
A convolution function signal cannot be defined as a row or column of spots unless it is in phase with the maximum values of the sinusoidal function (9),
Initially, a maximum of the convolution function (11) cannot have a neighboring maximum between i+λ0/4 and i+λ0/2 (these thresholds can obviously be defined by parameters as a function of the type of slide used).
Finally, for extreme maxima of the convolution function (for example delimiting a block of spots on the x axis), it is imposed that the sum of the quantity centered on the maximum defined as follows:
where nmax=the number of maximum values found, and the quantities centered on the two adjacent minima of the sinusoidal function defined as:
where nmin=number of minimum values found should be such that RM{overscore (S)}max+RM{overscore (S)}min<Hmax
When these rules are applied, the ends of each block can be automatically delimited.
The next step is to check the first and the last, row and column of each block. Blocks for which the spots at the beginning or the row or column are not illuminated are not detected.
The positions of these unilluminated spots are found by applying cardinal rules between the different blocks. The most probable number of rows and columns in each block are calculated. In blocks for which a row or a column is missing, this row or this column is assigned to one of the ends of the block using the sinusoidal function (9), such that this additional row or column minimizes the Euclidian distance between the sinusoidal functions restricted to the number of spots of blocks located above and below, and on each side of the block considered. This operation may be repeated as many times as necessary if more than one row or more than one column is missing at the ends.
The blocks of a biochip are not necessarily all the same size. The number of rows and columns depends on the selected deposition strategy, the chosen rules being adaptable to the particular strategy.
Identification of Pixels in Each Spot
Once the position of the spots has been found, the last two steps in the image analysis must be performed. These steps may or may not be carried out independently.
The homogeneity of the image obtained by the DNA biochips depends on the type of deposition used: either mechanical deposition or deposition by ink jet.
The problem to be solved is unchanged, in other words to determine the significant pixels containing the hybridizing signal, the background noise level and the parasite pixels, for each spot.
As illustrated in
There are two possible solutions to this problem:
It is important to arrange a homogeneous image, in other words an image in which pixels of the same type are only slightly dispersed (background noise, hybridizing signal) in order to obtain the best position of the hybridizing signal. This type of dispersion is largely due to acquisition conditions, which should be sufficient to optimize detection of weak signals and strong signals.
a) a single image with a 10 μm resolution, a fast pass, gain=100 PMT 100 (arbitrary scale),
b) a single image with 10 μm resolution, a slow pass, gain=80 PMT 80 (arbitrary scale),
c) ten independent images with 10 μm resolution, a fast pass, gain=80 PMT 80 (arbitrary scale) then superposition and an average of the image in a single image,
d) a single image with 5 μm resolution, a fast pass, gain=80 PMT 80 (arbitrary scale) then reduction to 10 μm eliminating the furthest pixel.
On this figure, the spot is the same spot that was acquired under four different conditions. It is found that it is difficult to apply a strong excitation and a strong detection and to obtain a homogeneous image with a low signal/background noise ratio or with a low signal/spot noise ratio. The acquisition conditions are illustrated in
One variant consists of making several images with low excitation and low detection, which are accumulated into a single image. As shown in
Since the variance of the background noise is reduced, the acceptance threshold for low intensity spots is lowered.
Therefore, it is not always desirable or possible to make several acquisitions from the same slide. One variant consists of making a high resolution image and then artificially reducing the resolution of the image by replacing n adjacent pixels by the average of n pixels. This thus reduces the size of the image by n.
d thus illustrates an acquired image at 5 μm adjusted to 10 μm using this variant, which corresponds to n=4.
A homogeneity filter can then be applied by eliminating the pixel with the greatest heterogeneity compared with adjacent pixels. Heterogeneity is estimated at more than 2σ (standard deviation) of the selected homogeneity estimator. In this case, the chosen homogeneity estimator is the average of a window of 16 pixels centered on the four pixels to be averaged, where σ is calculated on the same window of 16 pixels.
The four measurements made are independent. The probability that the same error would be made on the four pixels is very low. The result obtained is an image with a noise intensity reduced by a factor of 1.8. In this case, the intensity of the background noise is reduced by a factor 2.3 and the signal/background noise ratio or the signal/hybridizing signal increases by a factor of 1.27. On the other hand, the average intensity and the variance of the hybridizing signals are only very slightly modified, which results in a lower gain in the signal/noise ratio by a factor of 1.12. As for accumulation, holding the hybridizing signal and reducing the average and the variance of the background noise by the proposed filter, are a means of reducing the spot acceptance threshold intensity. The problem of fluorescence being extinguished is thus eliminated.
Once electronic noise has been minimized, sorting pixels within each spot is a means of identifying the different types of pixels present (background noise, hybridizing signal and parasite signal).
The fact that only signal type pixels are retained is a means of keeping or restoring measures that should have been eliminated on normal quality criteria (see document references [7] and [8]). An approximate breakdown into segments by calculating pixel by pixel ratios is not sufficient to differentiate the hybridizing signal from the background noise signal when they are too close, as illustrated in
By applying a conventional EM algorithm with 2 or 3 classes based on a Gaussian distribution, the distributions illustrated in
This example clearly illustrates that the applied Gaussian assumptions are too wide compared with the real distribution of each class. This is due to the fact that this algorithm assigns transition pixels between the signal and the noise, either at the background noise or at the signal, or at parasite signals.
A modification to this method could make classes smaller and introduce the concept of a pixel that does not belong to any class, by applying a threshold on the calculated probability that it belongs to one of the two or three main classes.
It is then possible to determine the class with the hybridizing signal that is logically the class with the greatest population and usually corresponds to the intermediate class.
a illustrates an application of the EM method thus modified, to find the pixel classes; convergence being estimated by log probability with bg: estimated distribution of the background noise, Real distribution: distribution of pixels in the hybridizing signal, and offset: saturating pixel.
b illustrates an application of the modified EM method to find pixel classes; convergence being estimated using the RMS-log-probability with bg: estimated distribution of the background noise, Real distribution: distribution of pixels in the spot and offset: saturating pixel.
Thus by using the modified EM method, the background noise is measured within the spot, and therefore it is no longer under or overestimated so that the problem of negative spots can be eliminated.
Since this type of analysis is carried out independently, the problem of perfect superposition of the images is eliminated. Furthermore, the signals closest to the background noise can still be measured reliably.
EM Method
After the automatic pause of a cache delimiting each spot, the method consists of classifying the pixels into three classes iteratively. The parameters of a mix are maximized in order to make this classification.
As a first approximation, the pixels are classified into two and then into three classes if necessary: background noise, hybridizing signal and possibly saturating pixels.
Initializing Step 1
The first step consists of building up a blurred pixel distribution table Ci0(k), where i is the intensity and k is the class. This table illustrated in table 2 at the end of the description is built up such that the value 1 is assigned for the background noise class and the value 0 is assigned for the hybridizing signal class for pixels for which the intensity is less than the value of the background noise+2 sigma (taken in reference outside the spot). The blurred distribution table Ci0(k) is set equal to value 0 for the background noise class and 1 for the signal class, for pixels for which the intensity is greater than the value of the background noise+2 sigma (used as a reference outside the spot). Initially, signals larger than 65500 are considered as being saturating and are not included.
This first blurred distribution table with two classes (K=2) is built up for a first segmentation with two classes.
The next step is then to define the probability p(k) of each class, such that:
where:—ni is the number of pixels with intensity level i,
The mix parameters:
Θ(Po(1), σo(1), μo(1), . . . , Po(k), σo(k),μo(k)) (16)
are defined as follows:
where:—ni is the number of pixels with intensity level i,
Estimating Step 2
The new blurred distribution table is calculated from parameters θm−1 such that:
Maximizing Step 3
The mix parameters θm are recalculated starting from this new distribution table Cmi(k).
Steps 2 and 3 are reiterated until convergence of the classification of the log probability defined as:
L(Θ,C)=l(Θ,C)+E(C) (21)
This approach imposes:
But the quantity θ(K) used to calculate whether the pixel belongs to one or the other in each class, can be extremely low for intermediate pixels very far from the considered classes.
The definition of the EM algorithm makes it necessary to assign these pixels to one of the classes, which widens the class distributions and therefore creates an artificial mix of the different classes. The consequence is a bad evaluation of the chosen observable hybridizing intensity (average, median etc.). This problem can be overcome by using a method to relax distances by modifying the quantity φK so that it converges towards 0 for these remote pixels, such as θ (K):
Under these conditions, the probabilities Ci(0) and Ci(1) converge towards 0. Thus, a pixel that is too far away for the mix parameters considered is not assigned to either of the two classes in this step. Nor is the pixel used in the calculations in this step.
Similarly, convergence of the log-probability can widen classes simply by stopping the classification early. Therefore, a new convergence estimator is introduced based on the similarity or the calculated distance between the real distribution of the pixels and the recalculated distribution using mix parameters. RMS log-probability is such that:
For example, we could have RMS=1/R, where R is the correlation factor between the theoretical distribution calculated from the mix and the real distribution curve.
Estimating is stopped when this estimator converges. The result is classes much closer to the real distribution.
When the spot contains saturating pixels (more than 65500) a new blurred partition table is created with an additional class K=3 (column) and new levels filling the table up to 65535 pixels (row). The new column is initialized equal to 0 up to level 65500, and then to 1 up to the final level. The additional levels in the existing columns are initialized to 0. The entire procedure is restarted using this new matrix. The upper limit in equations (15), (17), (18) and (21) increases from 65500 to 65535.
Improvement of the Image in Order to Improve Detection of the Position of Spots
The position of spots is determined more easily when there is a contrast between the background noise and the spots signal. This contrast can be increased by using a degrading filter that does not necessarily keep the proportionality of the signals. The filtered image is only used to find the position of the spots. Quantification is done starting from the raw image, or from an image filtered using a non-degrading filter.
The proposed degrading filter is based on the periodicity of the required signal. A threshold is applied to the projection {overscore (H)}i (or {overscore (H)}j) such that if {overscore (H)}i<{overscore (H)}i threshold=0 or if {overscore (H)}i≧threshold, starting from projections (equation 1) the result is:
The threshold is chosen to maximize equation (3):
calculated from the modified projection {overscore (H)}i. Once the threshold has been determined, the pixels in image Aij<threshold are corrected to 0 (Aij=0) in the image and the pixels in image Alij>threshold are corrected to
It would also be possible to choose to replace the pixels Alij>threshold by a constant value for example such as Aij=500.
A second “low pass” type filter is then applied that assigns the average of the signals in a window with (a+1) pixels along the side centered on the pixel (i, j), to each pixel (i, j) in the image such that:
A large number of parasite signals are thus eliminated, application of these filters gives the results shown in
Realignment of Spots (Cutting the Image)
In some cases the spots are not well aligned in x and/or in y which can make it more difficult to determine the cut lines between blocks and to position the grids. This problem can be overcome by calculating the histogram of the first m rows in the image as follows:
Row m+1 is then added to the histogram with all possible offsets or α translations between +λ0/2 and −λ0/2 such that:
{overscore (H)}(mα)j={overscore (H)}(m)j+A(m+1)j+α
The value of the power of the Fourier transform of the histogram {overscore (H)}(mα)j is calculated for each translation α. The value of the shift αmax maximizing the score is kept. Row m+1 is shifted by the quantity αmax in the image. This procedure is performed for all rows in the image starting from row position m=λ0 until the position x=n−λ0 where n is the last row. The same procedure may be applied to the image in y.
As a complement to the EM method, a morphomathematical approach can be carried out on each frame using connectivity (see document reference [10]) to define all objects present in the frame. These objects are used to define and initialize additional classes in the matrix Ci(k). Each object found defines a class, in addition to the saturating pixels class and background noise pixel classes. The EM method may or may not merge classes of the different objects and is used to define the class of pixels in the hybridizing signal while excluding parasite pixels, for example caused by dust.
a,
23
b and 23c illustrate all steps to be followed to quantify the spot, in three example embodiments, as follows respectively:
Number | Name | Date | Kind |
---|---|---|---|
6349144 | Shams | Feb 2002 | B1 |
6980677 | Niles et al. | Dec 2005 | B1 |
20020022226 | Nakao et al. | Feb 2002 | A1 |
20020055102 | Stern | May 2002 | A1 |
20030048933 | Brown et al. | Mar 2003 | A1 |
20030129648 | Shams | Jul 2003 | A1 |
20030219150 | Niles et al. | Nov 2003 | A1 |
20040001623 | Ugolin et al. | Jan 2004 | A1 |
20040047499 | Shams | Mar 2004 | A1 |
Number | Date | Country | |
---|---|---|---|
20040001623 A1 | Jan 2004 | US |