The present invention relates to a method for correcting data by removing a part of a noise component from the data related to electrophoresis, a method for determining whether a peak in data related to electrophoresis is a sample-derived peak or a spike, an apparatus, and a program.
With the development of genome analysis technology, correlations between various human diseases and gene mutations have been clarified. An acquired gene mutation derived from a disease such as cancer is characterized in that it is difficult to predict both the position on the genome at which the mutation occurs and the abundance ratio of the mutation in an individual or a tissue. For example, a cancer tissue sample excised from a cancer patient contains both cancer cells and normal cells, and the cancer cells themselves contain a variety of gene mutations; thus, the abundance ratio of cells having a gene mutation at a specific position of a specific gene in the sample is sometimes extremely low. Therefore, a highly sensitive detection method is required in order to detect an acquired gene mutation derived from a disease. Further, in some cases not only the presence or absence of a gene mutation at a specific position of a target gene but also its abundance ratio is taken as an index when selecting a therapeutic method or a therapeutic drug. Therefore, not only highly sensitive detection of a gene mutation but also quantification of its abundance ratio is important.
A conventional DNA sequencer using the Sanger method is intended for determination of a base sequence, and thus has two problems: its power to detect a gene mutation that exists in a trace amount, that is, its sensitivity, is insufficient, and the range in which the abundance ratio can be quantified, that is, its dynamic range, is narrow. Various optical systems have been proposed to increase sensitivity and dynamic range, and studies have also been conducted in terms of data processing. In particular, increasing sensitivity and dynamic range by data processing does not involve a change of the optical system, and thus can be introduced at relatively low cost.
For example, WO 2015/015585 A presents a method for detecting a gene mutation with high sensitivity and quantifying the gene mutation with high precision by comparing a measured and calculated relative signal intensity of a nucleic acid sample with a relative signal intensity of a known nucleic acid sample stored in advance.
Further, WO 2016/132422 A discloses a method for estimating the magnitude of a noise component with high accuracy by performing time-frequency analysis on measurement data to acquire waveform data representing temporal changes of a plurality of frequency components, and analyzing the acquired waveform data.
However, the conventional technology based on the data processing has a problem that it is necessary to construct a database in advance.
Although the method in WO 2015/015585 A is an effective and excellent method, it is necessary to construct a known information database in advance in order to perform such a comparison with known information. Since there is a variety of gene mutations, a relatively large database is required, and further, periodic data expansion is required to cope with new target genes.
Note that the method in WO 2016/132422 A is an excellent method for grasping a noise level, and leads to an increase in sensitivity and an increase in dynamic range if an application to removal of a noise component is possible, but there is no mention regarding a guideline, a method, and an effect of the noise component removal.
The present invention has been made to solve the above problems, and an object thereof is to provide a technology for achieving an increase in sensitivity or an increase in dynamic range by data processing while eliminating the need for constructing a database in advance.
An example of a method according to the invention is a method for correcting data related to electrophoresis by removing a part of a noise component from the data, and includes: acquiring first data by performing electrophoresis of a labeled nucleic acid sample to be analyzed and simultaneously detecting label signals at a plurality of measurement wavelengths, the first data being detection intensity waveform data containing a sample-derived component and a noise component; selecting, from the first data, specific wavelength data corresponding to one or more measurement wavelengths which is a target of time-frequency analysis; performing filtering processing to cut some or all of components on a high frequency side on the specific wavelength data for one or more cutoff frequencies; comparing peak intensities of the specific wavelength data before and after the filtering processing for each of the cutoff frequencies; calculating, as a first cutoff frequency, a minimum cutoff frequency at which a decrease in peak intensity of the specific wavelength data falls within a predetermined allowable range among the cutoff frequencies; and correcting the first data or post-color-call data of the first data by performing filtering processing with the first cutoff frequency.
Further, an example of a method according to the present invention is a method for determining whether each of peaks in data related to electrophoresis is a sample-derived peak or a spike, and includes: performing correction using the above-described method; calculating a peak intensity change rate for each of the peaks based on a peak intensity before the correction and a peak intensity after the correction; and determining that the peak at which an absolute value of the peak intensity change rate is greater than a predetermined threshold at one or more measurement wavelengths is the spike.
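The spike determination described above can be illustrated with a minimal sketch. The function name, the threshold value of 0.2, and the definition of the change rate as (after - before) / before are assumptions made for illustration; the specification only states that the rate is calculated from the peak intensities before and after the correction.

```python
def is_spike(before, after, threshold=0.2):
    """Return True if the peak is judged to be a spike.

    before, after: peak intensities of one peak at each measurement
    wavelength, before and after the correction.
    threshold: hypothetical limit on the absolute change rate.
    """
    for b, a in zip(before, after):
        if b == 0:
            continue  # change rate is undefined for a zero intensity
        change_rate = (a - b) / b  # assumed definition of the change rate
        if abs(change_rate) > threshold:
            return True  # exceeds the threshold at one or more wavelengths
    return False
```

A sharp spike loses much of its height when high-frequency components are cut, whereas a broader sample-derived peak changes little, which is why the change rate can serve as a discriminator.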
According to the technology of the present invention, it is possible to achieve the increase in sensitivity or the increase in dynamic range by the data processing while eliminating the need for constructing the database in advance.
For example, a large-scale database is not required and an optical system is not changed, and thus, introduction at low cost can be achieved.
Another characteristic relating to the present invention will become apparent from the description of the present specification and the accompanying drawings. Further, other objects, configurations, and effects will be apparent from the following description of embodiments.
Hereinafter, embodiments of the present invention will be described with reference to the drawings. Note that modes for carrying out the present invention are not limited to the embodiments to be described later, and various modifications can be made within the scope of the technical idea.
The electrophoresis data correction device 1 is connected to a capillary electrophoresis sequencer (not illustrated) through the communication interface 7.
The storage unit 6 stores an operating system (OS) and an electrophoresis data correction program 8. When the CPU 2 executes the electrophoresis data correction program 8, the electrophoresis data correction device 1 functions as a data selection unit 8A, a time-frequency analysis unit 8B, a filtering processing unit 8C, a peak intensity comparison unit 8D, a cutoff frequency adjustment unit 8E, a smoothing processing unit 8F, and a frequency acquisition unit 8G which will be described later.
The electrophoresis data correction device 1 is configured to execute a method according to the present embodiment. Further, the electrophoresis data correction program 8 causes a computer to execute such a method, thereby causing the computer to function as the electrophoresis data correction device 1. In the present embodiment, a method for correcting data related to electrophoresis by removing a part of a noise component from the data is executed.
The method according to the present embodiment includes acquiring electrophoresis data (first data) by performing electrophoresis of a labeled nucleic acid sample to be analyzed and simultaneously detecting label signals at a plurality of measurement wavelengths. This data is detection intensity waveform data containing a sample-derived component and a noise component, and includes data at the plurality of wavelengths. In the present embodiment, this electrophoresis data is set as a correction target.
Hereinafter, an electrophoresis data correction method using the electrophoresis data correction device 1 of the present embodiment will be described with reference to a flowchart of
First, data (specific wavelength data) corresponding to one or more measurement wavelengths which is a target of time-frequency analysis is selected from the electrophoresis data (step S1). The selection can be made, for example, based on a user's instruction. Further, the selection may be automatically performed by the electrophoresis data correction device 1 based on a predetermined criterion.
If there is no corresponding specific wavelength data (NO in step S2), the process in
If the corresponding specific wavelength data exists (YES in step S2), the maximum frequency at which the power of the sample-derived component is higher than the power at a white noise level in a power spectrum of a specific wavelength is acquired (step S3). The frequency acquired here is used as an initial value of a cutoff frequency to be described later, and is referred to as an initial cutoff frequency, hereinafter. For example, the time-frequency analysis unit 8B acquires the power spectrum from the electrophoresis data, and the frequency acquisition unit 8G acquires the initial cutoff frequency.
Although the initial cutoff frequency can be arbitrarily set, the calculation amount of data correction processing can be reduced as will be described later if the initial cutoff frequency is set to the maximum frequency at which the power of the sample-derived component is higher than the power of the white noise level as described above.
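As a minimal sketch of how the initial cutoff frequency might be read from a power spectrum, the following assumes that the white noise level can be estimated as the mean power of the highest-frequency part of the spectrum, and that power exceeding that level by a fixed factor counts as sample-derived. The function name and both parameters (noise_band_fraction, margin) are hypothetical and do not appear in the specification.

```python
def initial_cutoff_frequency(freqs, power, noise_band_fraction=0.2, margin=3.0):
    """Return the highest frequency whose power exceeds the white noise level.

    freqs, power: equal-length sequences, with freqs in ascending order.
    noise_band_fraction: assumed fraction of the spectrum (top end) that
        contains only white noise.
    margin: assumed factor above the noise level treated as sample-derived.
    """
    if len(freqs) != len(power) or len(freqs) == 0:
        raise ValueError("freqs and power must be non-empty and equal in length")
    n_noise = max(1, int(len(power) * noise_band_fraction))
    noise_level = sum(power[-n_noise:]) / n_noise
    threshold = margin * noise_level
    # Scan from the high-frequency side and stop at the first bin above threshold.
    for f, p in zip(reversed(freqs), reversed(power)):
        if p > threshold:
            return f
    return None  # no component rises above the noise level
```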
For convenience, step S3 is described in detail first, before the details of steps S1 and S2 are described.
On the other hand, it can be seen that the power spectrum with a sample has white noise on a high frequency side, but has high power on a low frequency side of a certain frequency. This means that the power of the sample-derived component is distributed on the low frequency side of the certain frequency.
Since the horizontal axis of a graph of
Although
In step S3, the smoothing processing unit 8F may perform smoothing on the power spectrum. Examples of specific smoothing methods include a moving average method, an adjacent averaging method, a Savitzky-Golay method, an FFT filter, a percentile filter, LOWESS/LOESS smoothing, and the like. That is, the method according to the first embodiment may include performing smoothing processing on the power spectrum upon acquiring the initial cutoff frequency.
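Of the smoothing methods listed above, the moving average is the simplest to illustrate. The sketch below is one possible form, assuming a centered window that shrinks at the edges of the spectrum; the function name, the default window length, and the edge handling are illustrative choices.

```python
def moving_average(values, window=5):
    """Centered moving average; the window is truncated at both ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed
```

Smoothing reduces the bin-to-bin scatter of the power spectrum, which makes the boundary between the sample-derived band and the white noise floor easier to locate.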
Note that it is unnecessary to automatically calculate the initial cutoff frequency in step S3. For example, the user may read the maximum frequency at which the power of the sample-derived component is higher than the white noise level from the power spectrum or the smoothed power spectrum, and input the read maximum frequency to the frequency acquisition unit 8G.
In the power spectrum with a sample, there is a case where the maximum frequency at which the power of the sample-derived component is higher than the white noise level depends on the electrophoresis speed, and thus, depends on, for example, an electrophoresis voltage, the viscosity of the polymer, the temperature of the capillary, and the like among measurement conditions. Meanwhile, there is a case where the maximum frequency does not depend on a wavelength or a color of light to be observed.
However, there is a case where the magnitude of a sample-derived peak extremely differs depending on the wavelength or color of light to be observed. If the sample-derived peak is small, the sample-derived component is buried in white noise in the power spectrum so that it is difficult to acquire an appropriate initial cutoff frequency. Therefore, in step S1 described above, it is desirable to select electrophoresis data in which the sample-derived peak is sufficiently large.
The maximum frequency at which the power of the sample-derived component is higher than the white noise level in the power spectrum depends on measurement conditions. Therefore, prior to the start of the process in
In a case where the initial cutoff frequency for the representative measurement condition has been acquired in advance and the data selected in step S1 is data measured under the representative measurement condition, the process may proceed to filtering processing (step S4) to be described later without performing the determination in step S2.
Further, the user may set a predicted value of the maximum frequency at which the power of the sample-derived component is higher than the white noise level. In a case where the user sets the predicted value, the predicted value may be set as the initial cutoff frequency, and the process may proceed to the filtering processing (step S4) to be described later without performing the determination in step S2.
Step S3 has been described above. Next, steps S1 and S2 will be described. In a DNA sequencer using the Sanger method, a sharp peak called a spike, which appears at a plurality of wavelengths or colors at the same time and is caused by mixed-in bubbles or foreign matter, sometimes appears in a waveform of electrophoresis data even if no sample has been migrated.
The spike is steeper than a sample-derived peak waveform and has a small number of data points forming the peak. The height of the spike is often extremely large, but in some cases is comparable to the height of a sample-derived peak. Since it is necessary to distinguish between the spike and the sample-derived peak waveform during analysis such as sequence analysis or fragment analysis, various methods are used for this purpose.
Specific examples of a method for determining a spike include determination methods respectively using a peak height, a half-value width, and a range of overlapping wavelengths or colors, and a method using a combination thereof.
There is a case where it is difficult to acquire an appropriate initial cutoff frequency if electrophoresis data contains a large spike.
The spike exists near time 824 in the electrophoresis data of
On the other hand, in the electrophoresis data of
A spike has a sharp waveform, and thus has power in a wide frequency band. Since a spike having a large peak height has high power, the power spectrum of the sample-derived component is buried by even a small number of such spikes. On the other hand, in the case of a spike having the same magnitude as a sample-derived peak, the power spectrum of the sample-derived component is not buried by the power spectrum of the spike component, since the number of spikes is usually sufficiently smaller than the number of sample-derived peaks.
As described above, the electrophoresis data includes the simultaneously measured data of the plurality of measurement wavelengths. Upon selecting the specific wavelength data from the electrophoresis data in step S1, the possibility that an appropriate initial cutoff frequency can be calculated increases by not selecting data of a measurement wavelength including a large spike (several times to several tens of times or more of a sample-derived peak) as illustrated in
Such a criterion can be appropriately determined by those skilled in the art based on known techniques and the like, and can be defined based on, for example, the peak height, the half-value width, whether or not peaks appear to overlap each other at a plurality of measurement wavelengths, a range of colors (measurement wavelengths) at which peaks appear, and the like as described above. Further, data may be automatically selected based on the defined criterion.
The present inventors have experimentally confirmed that the maximum frequency at which power of a sample-derived component is higher than a white noise level in a power spectrum does not change even in pieces of electrophoresis data measured at different wavelengths as long as the same sample is simultaneously measured under the same electrophoresis condition.
Therefore, in step S1, the data selection unit 8A can select data of a measurement wavelength at which it is determined that there is no spike based on a predetermined criterion from the electrophoresis data measured at the plurality of wavelengths, or can select data of a measurement wavelength at which it is determined that a peak value of a spike falls within the same range as a peak value of a sample-derived component based on a predetermined criterion.
In step S4, the filtering processing unit 8C performs the filtering processing using the initial cutoff frequency acquired as described above. The filtering processing is to cut off some or all of components on the high frequency side of the initial cutoff frequency, and can be performed using, for example, a low-pass filter, a band-pass filter, or a combination thereof.
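One way to cut all components on the high frequency side of a cutoff frequency is an ideal (brick-wall) low-pass filter in the frequency domain. The sketch below is an assumption made for illustration, not necessarily the filter used in the embodiment; it uses a direct discrete Fourier transform in pure Python so that it is self-contained, whereas a practical implementation would normally rely on an FFT library. The function and parameter names are hypothetical.

```python
import cmath
import math


def lowpass_dft(signal, sample_rate_hz, cutoff_hz):
    """Zero every DFT bin above cutoff_hz and transform back."""
    n = len(signal)
    # Forward DFT (O(n^2), adequate for a short illustrative trace).
    spectrum = [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    for k in range(n):
        # Bins above n/2 hold negative frequencies; fold them back.
        freq = min(k, n - k) * sample_rate_hz / n
        if freq > cutoff_hz:
            spectrum[k] = 0
    # Inverse DFT; the result is real because the bins are cut symmetrically.
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]
```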
Next, the peak intensity comparison unit 8D compares peak intensities before and after the filtering processing (step S5).
The cutoff frequency adjustment unit 8E changes a cutoff frequency from the initial cutoff frequency, and calculates a cutoff frequency (first cutoff frequency) that is the minimum frequency among cutoff frequencies at which a decrease in peak intensity due to the filtering processing falls within a predetermined allowable range (step S6).
In step S6, the filtering processing to cut some or all of components on the high frequency side of the specific wavelength data is performed for one or more cutoff frequencies. Then, the peak intensities of the specific wavelength data before and after the filtering processing are compared for each cutoff frequency. Furthermore, among these cutoff frequencies, the minimum cutoff frequency at which the decrease in peak intensity of the specific wavelength data falls within the predetermined allowable range is calculated as the first cutoff frequency.
In the present embodiment, an increase in peak intensity is determined to fall within the allowable range. However, as modifications, the increase in peak intensity may be determined to be out of the allowable range, or it may be determined whether the increase in peak intensity falls within the allowable range based on an increase rate (for example, by a comparison with a predetermined threshold).
In this manner, the cutoff frequency adjustment unit 8E sets the initial cutoff frequency as an initial value of the cutoff frequency, and calculates the first cutoff frequency by repeating the filtering processing while lowering the cutoff frequency. Therefore, if the initial cutoff frequency is set to the maximum frequency at which the power of the sample-derived component is higher than the power of the white noise level, it is possible to omit the operation in a high frequency band in which calculation is unnecessary, so that the calculation amount can be reduced.
A case where electrophoresis data illustrated in
The filtering processing unit 8C applies a low-pass filter with the cutoff frequency of 1.1 Hz to the electrophoresis data illustrated in
In step S6, the filtering processing unit 8C may acquire the background value in the specific wavelength data. The background value can be appropriately acquired based on a known technique or the like. For example, the background value can be calculated as an average value of portions having no peak in the specific wavelength data.
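The background calculation mentioned above, an average over portions having no peak, can be sketched as follows. The function assumes that peak regions have already been identified by some other means and are supplied as half-open index ranges; that input format and the function name are illustrative assumptions.

```python
def background_value(trace, peak_regions):
    """Mean of the samples lying outside every (start, stop) peak region."""
    in_peak = [False] * len(trace)
    for start, stop in peak_regions:
        for i in range(max(0, start), min(len(trace), stop)):
            in_peak[i] = True
    baseline = [v for v, masked in zip(trace, in_peak) if not masked]
    if not baseline:
        raise ValueError("no peak-free samples available")
    return sum(baseline) / len(baseline)
```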
The peak intensity may be represented using heights of some peaks instead of the heights of all the peaks. Further, the peak intensity may be represented not by the height of the peak but by an area of a peak. The area of the peak can be appropriately calculated based on a known technique or the like. For example, integration may be performed between times at which local minima or background values are given on both sides of a peak time, or a predetermined constant may be subtracted from a result of the integration. If the peak intensity is represented using the area of the peak, the intensity can be calculated in consideration of not only the value of the peak top but also the width.
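A peak area of the kind described above could be computed, for example, by trapezoidal integration between the two bounding indices. In this sketch the background is subtracted from each sample before integration, which is one possible variant of subtracting a constant; the function name and argument layout are assumptions.

```python
def peak_area(times, trace, left, right, background=0.0):
    """Trapezoidal area of trace[left..right] above the background level.

    left, right: indices of the bounding points (for example, the local
    minima on both sides of the peak).
    """
    area = 0.0
    for i in range(left, right):
        dt = times[i + 1] - times[i]
        h0 = trace[i] - background
        h1 = trace[i + 1] - background
        area += 0.5 * (h0 + h1) * dt
    return area
```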
A change in noise component according to cutoff frequencies will be described in order to describe effects of the present embodiment. An index of noise is a standard deviation of a portion having no sample-derived peak in electrophoresis data, and is compared before and after the filtering processing. In the example of
In the case where the low-pass filter with the cutoff frequency of 1.1 Hz was applied, the change in peak intensity was 0.998, the change in noise was 0.625, and the change in dynamic range was 1.599. This means that the peak intensity decreases by 0.2%, the noise decreases by 37.5%, and the dynamic range increases by 59.9%.
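The relation among these three figures can be checked with a short sketch, assuming that the change in dynamic range is the change in peak intensity divided by the change in noise, and that the noise index is a population standard deviation; both are assumptions, since the specification does not give explicit formulas. With the rounded values 0.998 and 0.625 the ratio is about 1.597, which is consistent with the reported 1.599 if the reported inputs were rounded.

```python
import statistics


def noise_index(trace, peak_free_region):
    """Standard deviation of a portion of the trace having no sample-derived peak."""
    start, stop = peak_free_region
    return statistics.pstdev(trace[start:stop])


def dynamic_range_change(peak_change, noise_change):
    """Assumed relation: dynamic range scales as peak intensity over noise."""
    return peak_change / noise_change
```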
In the case where the allowable range of the decrease in peak intensity was set to 1% or less, it was calculated that the cutoff frequency could be lowered to 0.84 Hz based on interpolation. Note that
Note that an interpolation operation can be appropriately designed based on a known technique or the like. For example, a linear or non-linear interpolation operation can be performed according to the number of cutoff frequencies.
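A linear interpolation between the last cutoff frequency within the allowable range and the first one outside it might look like the following sketch. The function name and the numeric values in the usage note are illustrative and are not taken from the measured data.

```python
def interpolate_cutoff(c_in, d_in, c_out, d_out, allowed):
    """Cutoff at which the peak-intensity decrease equals the allowed value.

    (c_in, d_in): cutoff and decrease still within the allowable range.
    (c_out, d_out): neighboring cutoff and decrease outside the range.
    """
    if not (d_in <= allowed < d_out):
        raise ValueError("the two points must bracket the allowed decrease")
    fraction = (allowed - d_in) / (d_out - d_in)
    return c_in + fraction * (c_out - c_in)
```

For instance, with a hypothetical decrease of 0.5% at 0.9 Hz and 1.5% at 0.8 Hz, an allowed decrease of 1% gives an interpolated cutoff of 0.85 Hz.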
After step S6, the filtering processing unit 8C performs the filtering processing with the first cutoff frequency calculated as described above on the electrophoresis data (including a plurality of pieces of measurement wavelength data) to be corrected (step S7), thereby correcting the electrophoresis data.
In step S7, the user may be notified of the calculated first cutoff frequency through the display unit 4 such that the user can set a cutoff frequency to be used for correction. That is, the filtering processing unit 8C may perform the filtering processing based on the cutoff frequency set by the user, thereby correcting the electrophoresis data.
In this manner, it is possible to achieve an increase in sensitivity or an increase in dynamic range by data processing while eliminating the need for constructing a database in advance according to the first embodiment.
In steps S4 to S6 of
If the decrease in peak intensity falls within the allowable range (YES in step S6-1), filtering processing with a lowered cutoff frequency is performed, and peak intensities before and after the filtering processing are compared (step S6-2-1). Here, whether a decrease in peak intensity falls within the allowable range is determined again (step S6-3-1).
If the decrease in peak intensity falls within the allowable range (YES), the process returns to step S6-2-1. If the decrease in peak intensity is out of the allowable range (NO), the minimum cutoff frequency (first cutoff frequency) at which the decrease in peak intensity falls within the allowable range is calculated by interpolation (step S6-4), and the process proceeds to step S7.
If the decrease in peak intensity is out of the allowable range in step S6-1 (NO in step S6-1), filtering processing with a raised cutoff frequency is performed, and peak intensities before and after the filtering processing are compared (step S6-2-2). Here, whether a decrease in peak intensity falls within the allowable range is determined again (step S6-3-2).
The process returns to step S6-2-2 if the decrease in peak intensity is out of the allowable range (NO). If the decrease in peak intensity falls within the allowable range (YES), the minimum cutoff frequency (first cutoff frequency) with the decrease in peak intensity falling within the allowable range is calculated by interpolation (step S6-4), and the process proceeds to step S7.
If an increase width and a decrease width of the cutoff frequency in steps S6-2-1 and S6-2-2 are set to 10% or less of the frequency acquired in step S3, the first cutoff frequency can be accurately calculated.
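The branching search of steps S6-1 to S6-4 can be sketched as below. The sketch assumes a callable decrease_at that performs the filtering at a given cutoff frequency and returns the fractional decrease in peak intensity; that callable, the function name, and the max_steps guard are hypothetical. The step width defaults to 10% of the initial cutoff frequency, and linear interpolation is assumed.

```python
def first_cutoff(decrease_at, initial_cutoff, allowed=0.01,
                 step_fraction=0.1, max_steps=1000):
    """Search for the minimum cutoff whose peak decrease stays within `allowed`."""
    step = step_fraction * initial_cutoff
    cutoff = initial_cutoff
    decrease = decrease_at(cutoff)
    # S6-1: lower the cutoff if within the range, otherwise raise it.
    direction = -1 if decrease <= allowed else 1
    for _ in range(max_steps):
        nxt = cutoff + direction * step
        if nxt <= 0:
            return cutoff  # cannot lower further; keep the last accepted cutoff
        nxt_decrease = decrease_at(nxt)  # S6-2-1 / S6-2-2
        if direction < 0:
            crossed = nxt_decrease > allowed   # S6-3-1: left the range
            inside, outside = (cutoff, decrease), (nxt, nxt_decrease)
        else:
            crossed = nxt_decrease <= allowed  # S6-3-2: entered the range
            inside, outside = (nxt, nxt_decrease), (cutoff, decrease)
        if crossed:
            # S6-4: interpolate between the bracketing cutoffs.
            (c_in, d_in), (c_out, d_out) = inside, outside
            return c_in + (allowed - d_in) / (d_out - d_in) * (c_out - c_in)
        cutoff, decrease = nxt, nxt_decrease
    raise RuntimeError("no crossing found within max_steps")
```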
In the first embodiment described above in (1), in steps S4 to S6 of
In the present embodiment, however, filtering processing and peak intensity comparison are performed collectively to some extent, and a cutoff frequency at which a decrease in peak intensity due to the filtering processing becomes a predetermined value is calculated.
Hereinafter, an electrophoresis data correction method of the present embodiment will be described with reference to a flowchart of
A plurality of cutoff frequencies are set based on an initial cutoff frequency acquired in step S3′, and each filtering processing to cut components on the high frequency side is performed on electrophoresis data which is a target of time-frequency analysis (step S4′).
The plurality of cutoff frequencies may be set with a predetermined step size, for example, with the initial cutoff frequency as an upper limit.
Peak intensities before and after each filtering processing are compared (step S5′), and the minimum cutoff frequency (first cutoff frequency) with a decrease in peak intensity falling within an allowable range is calculated by interpolation (step S6′). Thereafter, filtering processing with the calculated first cutoff frequency is applied to electrophoresis data measured at a plurality of wavelengths to be corrected (step S7′), whereby the correction of the electrophoresis data ends.
In step S7′, a user may be notified of the calculated first cutoff frequency such that the user can set a cutoff frequency to be used for correction. That is, the filtering processing unit 8C may perform the filtering processing based on the cutoff frequency set by the user, thereby correcting the electrophoresis data.
The first cutoff frequency can be accurately calculated if the step size of the frequency is set to 10% or less of the initial cutoff frequency upon setting the plurality of cutoff frequencies based on the initial cutoff frequency acquired in step S3′.
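The batch procedure of the present embodiment can be sketched as follows, assuming a callable decrease_at that filters at a given cutoff frequency and returns the fractional decrease in peak intensity. The callable, the function name, the evenly spaced grid with the initial cutoff frequency as the upper limit, and the linear interpolation are illustrative assumptions.

```python
def first_cutoff_from_grid(decrease_at, initial_cutoff, allowed=0.01,
                           step_fraction=0.1):
    """Evaluate a grid of cutoffs at once, then interpolate the crossing."""
    step = step_fraction * initial_cutoff
    n = int(round(1 / step_fraction))
    # Descending grid from the initial cutoff down to one step above zero.
    cutoffs = [initial_cutoff - i * step for i in range(n)]
    decreases = [decrease_at(c) for c in cutoffs]  # filtering and comparison in a batch
    if decreases[0] > allowed:
        return None  # even the initial cutoff is outside the allowable range
    for i in range(1, len(cutoffs)):
        if decreases[i] > allowed:
            c_in, d_in = cutoffs[i - 1], decreases[i - 1]
            c_out, d_out = cutoffs[i], decreases[i]
            return c_in + (allowed - d_in) / (d_out - d_in) * (c_out - c_in)
    return cutoffs[-1]  # every grid point is within the allowable range
```

Because every grid point is evaluated independently, the filtering runs can be executed in parallel, at the cost of some evaluations that a sequential search would have skipped.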
In this manner, it is possible to achieve an increase in sensitivity or an increase in dynamic range by data processing while eliminating the need for constructing a database in advance according to the second embodiment, which is similar to the first embodiment.
The electrophoresis data is corrected in the first embodiment described in (1) and the second embodiment described in (3). That is, data to be corrected is measurement value data (first data) obtained by electrophoresis, and this data is corrected by performing the filtering processing with the first cutoff frequency.
In a third embodiment, data after color call is corrected. That is, a method according to the third embodiment includes correcting the post-color-call data for the measurement value data (first data) obtained by electrophoresis by performing filtering processing with a first cutoff frequency.
The color call will be described. By performing electrophoresis for fluorescent dyes, a matrix that is information indicating fluorescence spectra of the respective fluorescent dyes used in a reagent kit is obtained. Based on this matrix, electrophoresis data, which is data of a signal spectrum for each wavelength band, can be converted into data of a signal spectrum for each type of fluorescent dye (post-color-call data). The post-color-call data also includes data at a plurality of wavelengths.
The color call is processing of acquiring signal spectrum data for each type of fluorescent dye used as a label. The color call can be performed, for example, by weighting data of measurement wavelengths of the electrophoresis data according to respective measurement wavelengths. Weighting factors for the measurement wavelengths vary depending on the type of fluorescent dye.
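The weighting described above amounts to a weighted sum over measurement wavelengths for each dye. The following is a minimal sketch; the function name and the weight values in the usage note are hypothetical, and in practice the weights would be derived from the matrix obtained by electrophoresis of the fluorescent dyes.

```python
def color_call(channels, weights):
    """Convert per-wavelength traces into per-dye traces.

    channels: one equal-length trace per measurement wavelength.
    weights: one list per dye, holding a factor for each wavelength.
    """
    n_points = len(channels[0])
    return [
        [sum(w * channel[t] for w, channel in zip(dye_weights, channels))
         for t in range(n_points)]
        for dye_weights in weights
    ]
```

For example, with two wavelengths and two dyes, the hypothetical weights [[1.0, 0.0], [-0.5, 1.0]] leave the first trace unchanged and remove half of the first wavelength's signal from the second.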
First, a description will be given with reference to
In a case of comparing
A decrease in height of the spike by the correction according to the first or second embodiment will be described later.
In a case of comparing
Next,
In a case of comparing
From the above, it can be said that noise of the post-color-call data can be reduced by correcting the post-color-call data.
The post-color-call data correction device 11 is connected to a capillary electrophoresis sequencer (not illustrated) through the communication interface 17.
The storage unit 16 stores an operating system (OS) and a post-color-call data correction program 18. When the CPU 12 executes the post-color-call data correction program 18, the post-color-call data correction device 11 functions as a data selection unit 18A, a time-frequency analysis unit 18B, a filtering processing unit 18C, a peak intensity comparison unit 18D, a cutoff frequency adjustment unit 18E, a smoothing processing unit 18F, and a frequency acquisition unit 18G which will be described later.
The post-color-call data correction device 11 is configured to execute the method according to the present embodiment. Further, the post-color-call data correction program 18 causes a computer to execute such a method, thereby causing the computer to function as the post-color-call data correction device 11.
Hereinafter, a post-color-call data correction method using the post-color-call data correction device 11 will be described with reference to a flowchart of
First, data (specific wavelength data) corresponding to one or more measurement wavelengths which is a target of time-frequency analysis is selected from electrophoresis data (step S1″). This electrophoresis data is original data of post-color-call data to be corrected. The selection can be made, for example, based on a user's instruction. Further, the selection may be automatically performed by the post-color-call data correction device 11 based on a predetermined criterion.
Subsequent steps S2″ to S6″ are similar to steps S2 to S6 in the flowchart of
In steps S2″ to S6″, constituent elements indicated by reference signs 12 to 18 and 18A to 18G in
The filtering processing unit 18C applies filtering processing using the first cutoff frequency calculated in step S6″ to post-color-call data to be corrected (step S7″), whereby the correction of the post-color-call data ends.
In step S7″, the user may be notified of the calculated first cutoff frequency through the display unit 14 such that the user can set a cutoff frequency to be used for correction. That is, the filtering processing unit 18C may perform the filtering processing based on the cutoff frequency set by the user, thereby correcting the post-color-call data.
In this manner, it is possible to achieve an increase in sensitivity or an increase in dynamic range by data processing while eliminating the need for constructing a database in advance according to the third embodiment, which is similar to the first and second embodiments.
In the third embodiment described in (4), the correction target is the post-color-call data, but the first cutoff frequency is calculated using the electrophoresis data that is the original data thereof. In a fourth embodiment, a first cutoff frequency is calculated using post-color-call data as first data, instead of electrophoresis data, to correct the post-color-call data.
That is, in the present embodiment, the first data is the post-color-call data of measurement value data obtained by electrophoresis, and a method according to the present embodiment includes correcting the post-color-call data by performing filtering processing with the first cutoff frequency. Note that the post-color-call data is detection intensity waveform data containing a sample-derived component and a noise component, which is similar to the measurement value data.
The post-color-call data correction device 11 can have the same configuration as that of the third embodiment (
Hereinafter, a post-color-call data correction method according to the present embodiment will be described with reference to a flowchart of
First, data (specific wavelength data) corresponding to one or more measurement wavelengths which is a target of time-frequency analysis is selected from the post-color-call data (step S11). The selection can be made, for example, based on a user's instruction. Further, the selection may be automatically performed by the post-color-call data correction device 11 based on a predetermined criterion.
The data selection unit 18A can select data of a measurement wavelength at which it is determined that there is no spike based on a predetermined criterion from the post-color-call data including data of a plurality of wavelengths, or can select data of a measurement wavelength at which it is determined that a peak value of a spike falls within the same range as a peak value of a sample-derived component based on a predetermined criterion.
If there is no corresponding specific wavelength data (NO in step S12), the process ends without performing analysis.
If the corresponding specific wavelength data exists (YES in step S12), the time-frequency analysis unit 18B acquires a power spectrum from the specific wavelength data, and the frequency acquisition unit 18G acquires, from the power spectrum, the maximum frequency (initial cutoff frequency) at which the power of the sample-derived component is higher than a white noise level (step S13).
Upon acquiring the initial cutoff frequency by the frequency acquisition unit 18G, the smoothing processing unit 18F may smooth the power spectrum.
Further, the user may read the maximum frequency at which the power of the sample-derived component is higher than the white noise level from the power spectrum or the smoothed power spectrum and input a value of the initial cutoff frequency. The frequency acquisition unit 18G may acquire this value.
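The acquisition of the initial cutoff frequency in step S13 can be sketched as follows. This is only an illustrative sketch and not the implementation of the frequency acquisition unit 18G: the function names, the naive DFT, the moving-average smoothing, the estimate of the white noise level as the median power of the highest-frequency quarter of the spectrum, and the factor `margin` are all assumptions introduced here. The scan starts from the low-frequency side and stops at the first bin whose power is no longer above the noise level, which is one possible reading of "the maximum frequency at which the power of the sample-derived component is higher than a white noise level".

```python
import cmath
import math


def power_spectrum(signal, sample_rate):
    """One-sided power spectrum by a naive DFT (the DC bin is omitted)."""
    n = len(signal)
    freqs, powers = [], []
    for k in range(1, n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        freqs.append(k * sample_rate / n)
        powers.append(abs(acc) ** 2 / n)
    return freqs, powers


def smooth(values, half_width=2):
    """Moving average; the window is shortened at both ends of the list."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_width)
        hi = min(len(values), i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def initial_cutoff_frequency(freqs, powers, noise_fraction=0.25, margin=3.0):
    """Highest frequency, scanning upward from the lowest bin, at which the
    power still exceeds margin * (estimated white noise level).

    Assumption: the white noise level is the median power of the top
    `noise_fraction` of the frequency bins.
    """
    tail_start = int(len(powers) * (1.0 - noise_fraction))
    tail = sorted(powers[tail_start:])
    noise_level = tail[len(tail) // 2]
    cutoff = None
    for f, p in zip(freqs, powers):
        if p <= margin * noise_level:
            break  # power has dropped to the noise level
        cutoff = f
    return cutoff
```

In practice a fast Fourier transform from a numerical library would replace the naive DFT; it is used here only to keep the sketch free of dependencies. Passing `smooth(powers)` instead of the raw powers makes the scan less sensitive to bin-to-bin fluctuations of the noise.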
The initial cutoff frequency depends on measurement conditions of electrophoresis data which is original data. Therefore, prior to the start of the process in
In a case where the initial cutoff frequency for the representative measurement condition has been acquired in advance and the original electrophoresis data is data measured under the representative measurement condition, the process may proceed to filtering processing (step S14) to be described later without performing the determination in step S12.
Note that the initial cutoff frequency for the representative measurement condition may be acquired from the power spectrum of the post-color-call data, or may be acquired from a power spectrum of the electrophoresis data which is the original data.
Further, the user may set an expected value of the maximum frequency at which the power of the sample-derived component is higher than the white noise level. In a case where the user sets the expected value, the expected value may be set as the initial cutoff frequency, and the process may proceed to the filtering processing (step S14) to be described later without performing the determination in step S12.
In step S14, the filtering processing unit 18C performs the filtering processing using the initial cutoff frequency acquired as described above. The filtering processing is to cut off some or all of components on the high frequency side of the initial cutoff frequency, and can be performed using, for example, a low-pass filter, a band-pass filter, or a combination thereof.
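As one possible illustration of the filtering processing in step S14, a low-pass filter can be sketched as a windowed-sinc FIR filter. The text does not specify the filter design, so the function name, the Hamming window, the number of taps, and the edge handling below are assumptions made only for this sketch.

```python
import math


def lowpass_fir(signal, cutoff, sample_rate, num_taps=101):
    """Low-pass filter a sequence with a Hamming-windowed sinc kernel.

    The kernel is symmetric and applied centered on each sample, so peak
    positions are not shifted. Samples beyond either end of the data are
    taken to equal the nearest end sample (an assumption of this sketch).
    """
    fc = cutoff / sample_rate          # normalized cutoff, must be below 0.5
    half = num_taps // 2
    kernel = []
    for i in range(-half, half + 1):
        if i == 0:
            sinc = 2.0 * fc
        else:
            sinc = math.sin(2.0 * math.pi * fc * i) / (math.pi * i)
        window = 0.54 + 0.46 * math.cos(math.pi * i / half)
        kernel.append(sinc * window)
    total = sum(kernel)
    kernel = [k / total for k in kernel]   # unity gain at zero frequency

    n = len(signal)
    out = []
    for t in range(n):
        acc = 0.0
        for j, k in enumerate(kernel):
            idx = min(max(t + j - half, 0), n - 1)
            acc += k * signal[idx]
        out.append(acc)
    return out
```

A larger `num_taps` narrows the transition band around the cutoff frequency at the cost of more computation. A band-pass variant could be obtained by subtracting two such low-pass outputs with different cutoff frequencies.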
Next, the peak intensity comparison unit 18D compares peak components before and after the filtering processing (step S15).
The cutoff frequency adjustment unit 18E changes a cutoff frequency from the initial cutoff frequency, and calculates a cutoff frequency (first cutoff frequency) that is the minimum frequency among cutoff frequencies at which a decrease in peak intensity due to the filtering processing falls within a predetermined allowable range (step S16).
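The search in steps S15 and S16 can be sketched as follows. The names `peak_decrease` and `first_cutoff_frequency`, the list of candidate cutoff frequencies, and the use of the worst-case decrease over all given peak positions are assumptions for illustration; `filter_fn` stands for any filtering routine that takes the data and a cutoff frequency and returns the filtered data.

```python
def peak_decrease(before, after, peak_indices):
    """Largest relative drop in peak height over the given peak positions."""
    return max((before[i] - after[i]) / before[i] for i in peak_indices)


def first_cutoff_frequency(signal, peak_indices, candidates, filter_fn,
                           allowable=0.01):
    """Minimum candidate cutoff frequency at which the decrease in peak
    intensity caused by filtering stays within the allowable range.

    `candidates` would typically be frequencies around the initial cutoff
    frequency. Returns None if no candidate satisfies the condition.
    """
    acceptable = []
    for fc in candidates:
        filtered = filter_fn(signal, fc)
        if peak_decrease(signal, filtered, peak_indices) <= allowable:
            acceptable.append(fc)
    return min(acceptable) if acceptable else None
```

Lowering the cutoff frequency removes more noise but also flattens narrow sample-derived peaks, so the minimum acceptable cutoff frequency is the strongest filter that still keeps the peak intensity loss within the allowable range.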
The filtering processing unit 18C applies filtering processing using the calculated cutoff frequency to post-color-call data to be corrected (step S17), whereby the correction of the post-color-call data ends.
In step S17, the user may be notified of the calculated first cutoff frequency through the display unit 14 such that the user can set a cutoff frequency to be used for correction. That is, the filtering processing unit 18C may perform the filtering processing based on the cutoff frequency set by the user, thereby correcting the post-color-call data.
In this manner, the fourth embodiment, like the first to third embodiments, makes it possible to achieve an increase in sensitivity or an increase in dynamic range by data processing while eliminating the need for constructing a database in advance.
In a fifth embodiment, spike determination using correction of electrophoresis data and post-color-call data is performed. That is, a method according to the fifth embodiment is a method for determining whether a peak in data related to electrophoresis is a sample-derived peak or a spike.
As described in the first to fourth embodiments, a sharp peak called a spike, in which a plurality of wavelengths or colors overlap each other, sometimes appears in the electrophoresis data because of mixed-in bubbles or foreign matter, even if a sample has not been migrated.
The spike must be distinguished from a sample-derived peak waveform during analysis such as sequence analysis or fragment analysis, and various methods are used for this purpose. Specific examples of a method for determining a spike include determination methods respectively using a peak height, a half-value width, and a range of overlapping wavelengths or colors, and a method using a combination thereof.
However, a peak size, the half-value width, and the range of overlapping wavelengths or colors are different for each spike, and thus, a spike whose peak size is close to a sample-derived peak is sometimes erroneously determined as the sample-derived peak.
The spike can be determined with high accuracy by using the correction of the electrophoresis data and the post-color-call data described in the first to fourth embodiments. Hereinafter, the spike determination using the correction of the electrophoresis data will be described with an example.
For cases wherein the electrophoresis data is corrected by setting an allowable range of a decrease in peak intensity to 1% or less, peak waveforms before the correction are illustrated in
Arrows (a) to (e) in
An arrow in
An arrow in
An arrow in
As described above, the peak height of most sample-derived peaks decreases as a result of the correction, but the change rate is 1% or less, which falls within the predetermined allowable range for the decrease in intensity of a sample-derived peak component. The height of a sample-derived peak sometimes increases as a result of the correction, but the change rate in that case is also 1% or less because the increase is smaller in magnitude than the decrease.
On the other hand, the peak height decreases by the correction in most of the spikes, but the change rate thereof is 10% or more, which is larger than that of the sample-derived peak. Further, the peak height of the spike sometimes increases by the correction, but the change rate thereof is higher than that of the sample-derived peak even in the case of the increase.
Therefore, it is possible to determine whether the peak is the sample-derived peak or the spike based on a change rate of a peak intensity caused by the correction. For example, first, the peak intensity change rate is calculated for each peak based on a peak intensity before correction and a peak intensity after correction. Then, a peak at which an absolute value of the peak intensity change rate is greater than a predetermined threshold at one or more measurement wavelengths can be determined to be the spike, and a peak at which the absolute value of the peak intensity change rate is not greater than the predetermined threshold can be determined to be the sample-derived peak.
Although the height of the peak is used as an index of the peak intensity in the present embodiment, an area of the peak may be used as the index of the peak intensity.
Hereinafter, a description will be given using the height of the peak as the index of the peak intensity. Except for a specific spike to be described later, the sample-derived peak and the spike can be discriminated by, for example, determining a peak to be the spike when the absolute value of its peak height change rate caused by the correction is at least twice the upper limit of the allowable range used in step S6 (for example, 1%). In this case, assuming that the allowable range is 1% or less, a peak whose absolute peak height change rate caused by the correction is 2% or more is determined to be the spike.
Such a threshold can be set to an arbitrary value, but most of sample-derived peaks can be correctly determined as the sample-derived peaks if the threshold is set to a value exceeding an upper limit of the allowable range of step S6 (a value higher than 1% in the above example). If the threshold is twice or more the upper limit of the allowable range of step S6, more sample-derived peaks can be correctly determined as the sample-derived peaks.
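The determination based on the peak intensity change rate can be sketched as follows. The function names and the data layout (one intensity value per measurement wavelength for each peak) are assumptions for illustration. The default threshold of 0.02 corresponds to twice the 1% allowable range in the example above, and the comparison uses "greater than the threshold" as in the general description of the method.

```python
def change_rate(before, after):
    """Relative change of a peak intensity caused by the correction."""
    return (after - before) / before


def classify_peak(before_by_wavelength, after_by_wavelength, threshold=0.02):
    """Return "spike" if the absolute change rate exceeds the threshold at
    one or more measurement wavelengths, otherwise "sample".

    Both arguments list the intensity of one peak (height or area) at each
    measurement wavelength, before and after the correction.
    """
    for before, after in zip(before_by_wavelength, after_by_wavelength):
        if abs(change_rate(before, after)) > threshold:
            return "spike"
    return "sample"
```

For example, a peak whose heights change from 1000 and 800 to 995 and 797 has change rates of -0.5% and about -0.4% and is classified as a sample-derived peak, whereas a peak that drops from 1000 to 850 at any one wavelength (-15%) is classified as a spike.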
Here, a description will be given with reference to
An arrow in
In this manner, a spike in which the peak height is saturated at the measurement upper limit value at a plurality of successive points is difficult to identify as a spike based on the absolute value of the peak height change rate caused by the correction.
Such a spike in which the peak height is saturated at the measurement upper limit value may, however, be identifiable by an existing determination method. Examples of the existing determination method include a method of performing determination based on a peak height before correction, a method of performing determination based on a half-value width of a peak before correction, a method of performing determination based on whether or not a peak before correction overlaps at a plurality of measurement wavelengths, a method of performing determination based on a range of a color in which the peak before correction appears, a combination thereof, and the like.
Therefore, if a method for discriminating between the sample-derived peak and the spike according to the present embodiment is used in combination with the existing determination method, more peaks can be correctly determined.
Although the example of the spike determination using the correction of the electrophoresis data has been described as above, the spike determination can be similarly performed even in the case of the post-color-call data.
Further,
As illustrated in
Therefore, the data correction may be performed after the spike is removed. Specifically, the correction is first performed on electrophoresis data or post-color-call data as described in the first to fourth embodiments. Next, a spike is determined both by the method in the fifth embodiment, which is based on the peak intensities before and after correction, and by a conventional spike determination method.
Those skilled in the art can appropriately determine an adjustment method in a case where a determination result obtained by the method in the fifth embodiment and a determination result obtained by the conventional spike determination method do not match. For example, a peak determined as a spike by either method may be determined to be the spike, or only a peak determined as a spike by both the methods may be determined to be the spike.
Then, the spike is removed from the electrophoresis data or post-color-call data before correction, and the electrophoresis data or post-color-call data from which the spike has been removed is corrected again. As a result, it is possible to prevent the waveform that does not originally exist and corresponds to the cutoff frequency from appearing at the bottom of the corrected spike.
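The order of operations described above (correct, determine spikes, remove them from the data before correction, correct again) can be sketched as follows. All names are hypothetical, and the four callables stand for whichever correction, determination, and removal routines are actually used; the `mode` argument illustrates the two ways of reconciling the determination results mentioned above.

```python
def correct_with_spike_removal(raw, correct, flag_by_change_rate,
                               flag_conventional, remove_spikes,
                               mode="either"):
    """Correct data, detect spikes, remove them from the raw data, and
    correct the cleaned data again.

    correct(data)                       -> corrected data
    flag_by_change_rate(raw, corrected) -> list of bool, one per peak
    flag_conventional(raw)              -> list of bool, one per peak
    remove_spikes(raw, flags)           -> raw data with flagged peaks removed
    """
    corrected = correct(raw)
    by_rate = flag_by_change_rate(raw, corrected)
    by_conventional = flag_conventional(raw)

    if mode == "either":    # spike if flagged by at least one method
        flags = [a or b for a, b in zip(by_rate, by_conventional)]
    elif mode == "both":    # spike only if flagged by both methods
        flags = [a and b for a, b in zip(by_rate, by_conventional)]
    else:
        raise ValueError("mode must be 'either' or 'both'")

    cleaned = remove_spikes(raw, flags)   # removal acts on uncorrected data
    return correct(cleaned), flags
```

The point of the sketch is that the first correction is used only to obtain the change rates; the data that is finally output comes from correcting the spike-free version of the original data.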
Various methods can be used to remove a spike. For example, there is a method of removing the data points (plots) forming a spike, and then complementing the removed data points by nonlinear curve fitting or nonlinear peak fitting using the data points around the removed plots.
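The removal and complementing of data points can be sketched as follows. Note that this sketch deliberately swaps in plain linear interpolation between the neighboring data points in place of the nonlinear curve fitting or nonlinear peak fitting described above, only to keep the example short and dependency-free; the function name and the index convention are likewise assumptions.

```python
def remove_spike(signal, start, stop):
    """Replace samples signal[start:stop] (the points forming a spike) with
    values interpolated linearly between the two neighboring samples.

    Linear interpolation is a simplification used for this sketch; a
    nonlinear fit to the surrounding points could be substituted.
    """
    if start <= 0 or stop >= len(signal) or start >= stop:
        raise ValueError("spike must lie strictly inside the data")
    left = signal[start - 1]
    right = signal[stop]
    span = stop - start + 1
    out = list(signal)
    for i in range(start, stop):
        fraction = (i - start + 1) / span
        out[i] = left + (right - left) * fraction
    return out
```

For example, `remove_spike([0, 1, 2, 100, 200, 5, 6], 3, 5)` replaces the two spike points with values lying on the straight line between the neighboring samples 2 and 5, and leaves the other samples unchanged.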
A process of removing a spike from electrophoresis data including the spike, complementing a data point by nonlinear curve fitting, and then, performing correction will be described using the following example. The data illustrated in
In this manner, the sample-derived peak and the spike can be more appropriately identified according to the fifth embodiment. As a result, the spike can be more easily removed, so that the noise included in the electrophoresis data can be further reduced, and the increase in sensitivity or the increase in dynamic range can be achieved.
Number | Date | Country | Kind |
---|---|---|---|
2021-050981 | Mar 2021 | JP | national |