This invention relates generally to RNA-based detection of viruses. In particular, this invention is related to RNA-based compositions, methods and devices for rapid detection of influenza viruses.
Over the past few centuries, respiratory diseases caused by RNA viruses have caused global epidemics. The notorious COVID-19, SARS, MERS and Spanish flu all belong to this category, and together such outbreaks have claimed millions of lives.
Communicable diseases such as influenza A are life-threatening and can claim millions of lives, especially among the elderly, young children and patients with chronic diseases. In 1957, the Asian flu claimed more than 2 million lives. In 1968, the Hong Kong flu caused 1 million deaths worldwide. In recent years, in Hong Kong alone, thousands of people have died of influenza, and the number is increasing every year.
However, the symptoms of cold and influenza are hard to distinguish. In a survey which collected about 300 responses from China, Hong Kong or Singapore, more than 70% of the respondents could not differentiate between cold and flu, and around 50% stated that they do not seek medical help when they get flu-like symptoms. Inaccurate diagnosis of influenza leads to inappropriate or delayed treatment of flu-like symptoms and puts lives of patients and communities in danger.
Moreover, currently, many healthcare systems for epidemic diseases are highly centralized, which means that people can only receive testing and treatment at certain hospitals and clinics. This traditional system is highly vulnerable and may even collapse when dealing with respiratory infectious diseases, as they often break out in large volumes and high densities.
Therefore, there is a need to develop a cheap, rapid, accurate and convenient tool for detection of influenza which permits on-site detection of influenza by the general public. The tool will also allow patients to monitor their infectious status on a regular basis throughout treatment.
At the time of this invention, the two most widely used methods for detecting influenza for clinical purposes are quantitative polymerase chain reaction (qPCR) (Patel P. 2011) and rapid tests using influenza-specific antibodies (e.g. ID NOW™ Influenza A & B 2 assay from Abbott).
qPCR can be used to detect RNA of viruses for the purpose of detection or identification. It has high accuracy but requires expertise and must be conducted in a laboratory setting. Moreover, it takes a long time (around 6 hours) to complete the testing and is therefore not suitable for on-site testing.
Rapid tests using influenza-specific antibodies such as enzyme-linked immunosorbent assay (ELISA) are also used in clinics. However, antibody-based tests are often more expensive and less specific than the nucleic acid-based method. For instance, the current rapid tests use color change on the test paper to indicate the testing results. Due to the difficulty of recognizing different colors by human eyes, the false positive rate is as high as 30% to 50% (Nie, 2014).
A relatively new approach for high-throughput screening and rapid detection of pathogens, including influenza viruses, is the “toehold switch”, a highly specific RNA probe complementary to the target RNA that releases a ribosome binding site upon binding to the target (Green A A, 2014). This tool overcomes the limitations of the qPCR and rapid antibody methods in terms of time and location, and provides a preliminary tool for pandemic control (Pardee K, 2016). This technique, however, is yet to be commercialized. Past toehold switch designs utilized fluorescent proteins and hydrolases (e.g. lacZ) as reporters to balance detection accuracy and sensitivity (Green A A, 2014; Pardee K, 2016; CUHK iGEM Team, 2017), and have several limitations, such as specific spectral requirements for fluorescence detection, high costs of enzyme substrates and long waiting times of up to 4 hours (Pardee K, 2016).
An RNA aptamer is a single-stranded RNA molecule that can bind to a specific target. With higher thermal stability, smaller size and shorter development time, aptamers are believed to be an alternative to antibodies, especially in the field of diagnostics. Although rapid tests using RNA probes such as RNA aptamer probes (RAPID) offer advantages over rapid tests using antibodies, such as lower costs and higher specificity, aptamers are harder to design, and even if specific aptamers are successfully designed, the test is not very affordable for the general public because the production cost of aptamers remains relatively high. Thus it is desirable to develop a tool for optimizing aptamer sequence design, as well as methods for mass production and screening of aptamers, such as a bacterial system, which may reduce the cost to 1-2 US dollars per test.
In view of the foregoing, the present invention provides RNA-based compositions, methods and devices that are capable of rapid and accurate detection of influenza viruses outside of a laboratory setting, thereby providing more convenient and affordable testing. This tool is expected not only to reduce the pressure on healthcare systems during epidemic seasons but also to facilitate large-scale screening, as aptamers are much easier and thus cheaper to produce compared with antibodies and RT-PCR.
The present invention provides compositions, methods, devices and systems for detecting target gene sequences using fluorescent RNA aptamer probes. The present invention may be used for detecting influenza viruses but can be adapted for detection of other types of pathogenic organisms or other genetic material.
In one embodiment, the present invention provides RNA aptamer probes which specifically bind to gene sequences of a certain type or subtype of influenza virus and, in the presence of certain fluorogens, produce detectable fluorescent signals upon binding to the target gene sequences. In some embodiments, the RNA aptamer probes emit no or negligible fluorescence in the absence of their respective target RNA sequences. Upon binding to their respective target RNA sequences, the RNA aptamer probes change their conformation, which enables them to interact with a fluorogen in a way that induces fluorescence or leads to an increase in intensity of the fluorescence produced by the complex. It is to be understood that when RNA aptamer probes are described herein as fluorescing, emitting fluorescent light or fluorescence, the RNA aptamer probe refers to the RNA aptamer in complex with the fluorogen.
In one embodiment, the present invention provides a method or system for designing RNA aptamer probes for detecting genetic materials of a particular type or subtype of influenza virus or another organism.
In some embodiments, the present invention provides software that may be used to design RNA aptamer probes. In some embodiments, the system is equipped with a neural network that trains the processing ability of the system in differentiating between positive and negative signals.
In one embodiment, the present invention provides a device for detecting and processing light or fluorescent signals produced by the present RNA aptamer probes or other light-emitting moieties which indicate the presence of target organisms or their genetic material. In some embodiments the devices are battery operated and may be used in conjunction with mobile phone cameras for detecting the fluorescent signal.
In one embodiment, the present invention provides an integrated system for a subject to self-test for influenza virus outside of a laboratory setting. In some embodiments, the present integrated system comprises one or more of: a module for collecting a sample of nasal fluid from a subject, a module for treating the collected sample with a detecting reagent comprising one or more fluorogen-bearing influenza-specific probes, a light-shielded module for taking one or more images recording light emitted from the treated sample, and a module for processing the images and outputting results indicating the presence or absence of a particular type or subtype of influenza virus. In some embodiments, the present integrated system is linked with a mobile phone of the user and configured to enable the user to take images of their samples using their mobile phones and upload the images to the present integrated system for image-processing and analysis, and to receive results from the integrated system via the mobile phone.
In one embodiment, the present invention provides a system for detecting and processing light or fluorescent signals given out by the present RNA aptamer probes or other light-emitting moieties which indicate the presence of target organisms or their genetic material. In some embodiments, the system is equipped with a neural network that trains the processing ability of the system in differentiating between positive and negative signals.
Various embodiments of the present invention may be used to collect data for monitoring and control of influenza as well as data that may be used for improvement of the design of the probes representing some embodiments of the invention. In some embodiments, machine learning may be used for probe design and for data analysis.
In some embodiments, the methods, systems and devices of the present invention may be used to detect genetic material of various pathogenic organisms or other genetic material of interest.
In one embodiment, the present invention provides compositions of nucleic acids which are capable of binding to target nucleic acid sequences of a particular organism, such as influenza virus, and are capable of binding to a fluorophore molecule serving as a reporter. Fluorophore and fluorogen are used interchangeably in this description.
In one embodiment, the present invention provides compositions of RNA aptamer probes. In some embodiments, the present RNA aptamer probes comprise an aptamer structure, a sequence complementary to the target sequence and a fluorogen-binding site. In some embodiments, the present RNA aptamer probes can serve as an RNA aptamer probe which specifically binds to its target sequences (such as gene sequence of a certain type or subtype of influenza virus) upon which it is able to interact with a fluorogen and produce detectable fluorescent signals. In some embodiments, the present RNA aptamer probe produces no or negligible level of fluorescence in the absence of its respective target sequence. Upon binding to its target sequence, the probe changes conformation, which causes it to interact with a fluorogen molecule in a way that produces fluorescence or increases the level of fluorescence.
In some embodiments, the present RNA aptamer probes are modified from light-up RNA aptamers (LURAs). Light-up RNA aptamers are able to bind to fluorogens and have been developed for RNA detection (Bouhedda F, 2018). The Spinach and Broccoli RNA aptamers, which form complexes with the fluorogen DFHBI (3,5-difluoro-4-hydroxybenzylidene imidazolinone), are examples of LURAs. However, the present RNA aptamer probes are not limited to those aptamers or any LURAs existing at the time of this invention. Other RNA aptamer structures which can be modified to recognize specific gene sequences and bind to fluorogens can be employed for generating the present RNA aptamer probes. By the same token, the present invention is not limited to DFHBI; other fluorogens or reporting molecules which work with the chosen aptamer structure can be used.
The Spinach aptamer, along with its structural characteristics and photophysics, is well-characterized (Bouhedda F, 2018). According to the crystal structures of the Spinach and iSpinach-D5 aptamers, the Spinach aptamer generally consists of two arms, P1 and P2, surrounding a G-quadruplex that contains the docking site of its fluorogen, DFHBI. While the docking site is indispensable for the formation of the Spinach-DFHBI complex, previous mutagenesis studies that shortened the arms have shown the length of the P2 arm to be less important. By contrast, it was found that the P1 arm length has a dramatic effect on the fluorescence level, and a single-base deletion can lead to the complete loss of fluorescence in E. coli.
The present invention provides modular light-up RNA aptamers targeting influenza RNA. In one embodiment, the RNA aptamer is obtained by adding 11-base sequences complementary to specific target viral RNA sequences to each side of the P1-truncated Spinach aptamer. The aptamer is modified by deleting one base pair at its stem, which functions as a stabilizer of the fluorescence-activating G-quadruplex structure. The modified Spinach aptamer has a misfolded or unfolded conformation when it is not bound to the target influenza RNA, and will change to a correct conformation when it hybridizes to the RNA (
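The assembly of a probe from a 22-nucleotide target region can be illustrated with a minimal sketch. It assumes, as one plausible arrangement not confirmed by this description, that the reverse complement of the target is split into two 11-nucleotide arms placed on the 5' and 3' sides of the aptamer core. The function names and the core placeholder are hypothetical; the actual core sequence is not reproduced here.

```python
# Watson-Crick complements for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Hypothetical placeholder for the P1-truncated Spinach core sequence.
CORE_PLACEHOLDER = "NNNN"


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence (T is read as U)."""
    rna = rna.upper().replace("T", "U")
    return "".join(COMPLEMENT[base] for base in reversed(rna))


def build_probe(target_22nt, core=CORE_PLACEHOLDER):
    """Flank the aptamer core with two 11-nt arms complementary to the target.

    Assumption: the 5' arm is the first half of the target's reverse
    complement and the 3' arm is the second half.
    """
    if len(target_22nt) != 22:
        raise ValueError("target region must be 22 nucleotides long")
    antisense = reverse_complement(target_22nt)
    five_prime_arm, three_prime_arm = antisense[:11], antisense[11:]
    return five_prime_arm + core + three_prime_arm
```

With a real core sequence substituted for the placeholder, `build_probe` would return a candidate whose two arms hybridize to adjacent halves of the target region.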
There were no known algorithms for predicting the binding of RNA to DFHBI and the resulting fluorescence level at the time of this invention. According to previous data, shortening of the P2 arm did not lead to a significant change in the fluorescence level of the Spinach aptamer (Ong, 2017). Thus, two presumptions were made in the present probe design process: (1) the formation of the DFHBI docking site is dependent on the correct folding of the P1 arm and (2) the P1 arm folding is optimized when the hybridization of the variable regions is the most favorable. Based on these two assumptions, RNA aptamer probes containing sequences complementary to a total of 22-bp gene sequences of influenza virus A were designed using Invitrogen BLOCK-iT siRNA Designer, which can find the region of RNA with the least amount of secondary structures, as well as human genome BLAST (
Table 1 lists genes of influenza virus A and their accession numbers for design of RNA aptamer probes representing some embodiments of the present invention. The hemagglutinin genes (H1, H3 and H7) and neuraminidase genes (N1, N2 and N9) were selected for influenza subtyping, while the region of the Polymerase Basic 2 gene (PB2) that is ubiquitous in most influenza A genomes was chosen for influenza detection. After inputting the selected sequences into BLOCK-iT designer, candidate sequences with GC content of around 50% were chosen and a 22-bp region of each of the chosen candidate sequences was randomly selected for probe design (
As shown in
A total of 27 RNA aptamer probes were designed. All probes have the P1 and P2 arms and the docking site sequences as shown in Table 2 (refer also to
In vitro transcription kits were used to produce RNA probes and their target RNAs for assays. Example 1 and
In order to investigate the effectiveness of the designed aptamers, their refolding ability upon binding to the target RNA sequences was tested according to the procedures described in Example 2.
The on/off ratio (i.e. the ratio of the fluorescent signal produced in the presence of the target sequence to the fluorescent signal produced when the target sequence is absent) is indicative of the ability of an aptamer probe to detect its respective RNA target. If the intensity of fluorescence obtained from the aptamer-target RNA pair increases in a statistically significant manner as compared to the signals obtained from the aptamer alone (i.e., a statistically significant on/off ratio), the aptamer candidate is selected for further investigation. The results in
Though the present aptamers were designed to target hemagglutinin or neuraminidase genes of specific influenza subtypes, unwanted binding between the aptamers specific to a particular subtype and sequences from non-target subtype(s) may occur. Five of the tested aptamers (i.e. for N9, N2, H7, H3, PB2) that performed well in the above-mentioned refolding assay were selected to investigate their cross-reactivity. Using the procedures for the refolding assay described in Example 2, the aptamers were mixed with their target or non-target sequences and the resulting fluorescence was measured. An aptamer is regarded as specific if it gives a statistically significant on/off signal in response to its target RNA but not to non-target RNA.
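The on/off ratio and the comparison between target and aptamer-only readings can be sketched as follows. This description does not name the statistical test used, so Welch's t-test is assumed here purely for illustration, and the function names are hypothetical. The sketch returns the t statistic and degrees of freedom; the p-value would be looked up with a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def on_off_ratio(on_readings, off_readings):
    """Ratio of mean fluorescence with target to mean fluorescence without."""
    return mean(on_readings) / mean(off_readings)


def welch_t(on_readings, off_readings):
    """Welch's t statistic and degrees of freedom for two sets of replicates.

    Assumption: replicate readings are independent and each group has at
    least two readings.
    """
    n_on, n_off = len(on_readings), len(off_readings)
    # Squared standard errors of the two group means.
    se2_on = variance(on_readings) / n_on
    se2_off = variance(off_readings) / n_off
    t = (mean(on_readings) - mean(off_readings)) / sqrt(se2_on + se2_off)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_on + se2_off) ** 2 / (
        se2_on ** 2 / (n_on - 1) + se2_off ** 2 / (n_off - 1)
    )
    return t, df
```

For a specificity check, the same two functions would be applied once with the target RNA readings and once with each non-target RNA's readings against the aptamer-only readings.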
It is important to ascertain the minimum amount of target viral RNA needed to distinguish between the fluorescent signals from negative and positive samples under the blue light box by naked eye, in order to understand at which stage of influenza latency viral RNA could be detected by visual examination. Generally, the detection limit of an aptamer probe depends on the level of background noise generated by the aptamer.
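A detection limit of this kind can be made quantitative with the criterion used in this description: the signal of the aptamer alone plus three standard deviations. The sketch below assumes replicate aptamer-only readings and a table of mean signals per target concentration; the function names and the use of the sample standard deviation are assumptions for illustration.

```python
from statistics import mean, stdev


def detection_threshold(negative_readings, k=3.0):
    """Mean of the aptamer-only signal plus k sample standard deviations."""
    return mean(negative_readings) + k * stdev(negative_readings)


def limit_of_detection(titration, negative_readings, k=3.0):
    """Lowest tested concentration whose signal exceeds the threshold.

    titration maps target concentration (e.g. in micromolar) to mean signal.
    Returns None if no tested concentration exceeds the threshold.
    """
    threshold = detection_threshold(negative_readings, k)
    for concentration in sorted(titration):
        if titration[concentration] > threshold:
            return concentration
    return None
```

The result is limited to the concentrations actually tested, so a finer titration series gives a finer estimate.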
Example 3 describes the procedures for determining the minimum amount of target RNA required to obtain a visually distinguishable difference between positive and negative signals using two aptamer probes, N2-694 and N9-545, which represent a probe with a lower sensitivity and a probe with more background fluorescence respectively. Limit of detection can be determined by visual examination by naked eye which is less accurate, or by taking the value of the minimum amount of target RNA required to generate a signal that is larger than the signal generated by the aptamer only (i.e. negative signal) plus 3 standard deviations (the threshold value). In the upper panel of
Visually, for N2-694 probe, more than 0.2 μM of target RNA was needed to visualize the difference, while about 0.2-0.5 μM of target RNA was required for N9-545 probe (See lower panels in
Because in some embodiments of the invention the test sample is nasal fluid obtained from individual subjects, the performance of the RNA aptamers in the presence of nasal fluid was also evaluated to fully assess the detecting ability of the present RNA aptamer probes. In particular, performance of RNA aptamers under the various ionic conditions (sodium, potassium, calcium and magnesium ions) found in nasal fluid was tested as described in Example 4. The results are shown in
As an example, N2-694 and N9-545 aptamers were tested. Different concentrations of sodium, potassium, calcium and magnesium ions were added to the aptamer folding reaction mixture of N2-694 and N9-545 probes, mimicking the addition of nasal fluid to the freeze-dried aptamer kit by household users. Results obtained indicated that the two aptamer probes behaved similarly at different ion concentrations. In the range of target ionic concentrations (i.e., the ionic concentrations after addition of the nasal fluid which mimics the real situation in which the present invention is used; namely, 138-139 mM of sodium ion, 131-140 mM of potassium ion, 1-1.85 mM of calcium ion and 5.47-5.17 mM of magnesium ion, see Tables 9 and 10), the two probes performed well in elevated concentrations of sodium, potassium and magnesium ions, but an increase in calcium ion concentration resulted in a decrease of the fluorescence signal of both probes. Both probes were able to give good on/off ratios in the range of target ionic concentrations and hence are suitable candidates for the purpose of on-site detection of influenza virus A.
Overall, the results indicated that the present aptamer system is not adversely affected by sodium, potassium and magnesium ions naturally present in the nasal fluid and is slightly affected by an elevated concentration of calcium ion. It is likely that the present aptamers would perform satisfactorily in terms of detection of target RNA molecules when real samples instead of folding assay buffer are used.
A real-time PCR system was used to monitor the change in fluorescent signals and the time required for signal development.
Using N2-694 and N9-545 as the candidate probes, it was shown that the probe-target pairs required about 10 minutes of cooling to give a detectable signal, and the temperature at that point was around 75° C. (
After refolding was completed, melting curve (dissociation) analysis of the two probe-target pairs was performed with the same real-time PCR system (
As mentioned above, the aptamers being tested in the present invention were designed from randomly selected sequences. The present invention further provides methods for rational design of RNA aptamer probes which may allow the design of more effective RNA aptamers (e.g. aptamers with a higher on/off ratio). An automated system for rational design of aptamers can also be built to enable high-throughput design (i.e., rapid design of a large number of aptamer probes specific to various target sequences).
By comparing candidate aptamers which gave good and poor performance in the preceding studies, some methods of the present invention identify parameter(s) which may be adjusted and optimized to achieve a higher on/off ratio.
RNA aptamer probes which do not fluoresce in the absence of the target sequences but fluoresce upon binding to their target sequences and interacting with a fluorogen are desirable for the present purpose. That is, the RNA aptamer should have no or low fluorescence when not bound to its target sequence (also referred to herein as autofluorescence) and a high target-induced fluorescence. It is presumed that an RNA aptamer will have a low autofluorescence and a high induced fluorescence if:
Hence, for the purpose of rational design, the following data obtained from the preceding experiments were compared to evaluate the degree of auto-fluorescence and target-induced fluorescence of various aptamers:
Generally, aptamers are designed using the method known as Systematic evolution of ligands by exponential enrichment (SELEX). However, SELEX is not cost-efficient, has a long development cycle and may not provide the optimal design. Currently, there is no known software that can be used to design RNA aptamers directly.
One embodiment of the present invention provides a method to screen and evaluate aptamers. It can serve as a tool to optimally design RNA aptamers.
One embodiment of the present invention provides software implementing the method of screening and evaluating aptamers. The output of this software is cross-validated against the results of experimental tests of the designed aptamers. Parameters used for designing the aptamers may be continuously fine-tuned based on experimental results by using regression analysis. Thus, as this system is used and more experimental data are added into the system, prediction of the optimal aptamer design by the software will become more and more accurate. Steps of this method are schematically depicted in Figures 11 and 12.
The screening and evaluation module and the database system are the two core components of the software. The screening and evaluation module evaluates candidate aptamer designs based on several factors. In one embodiment of the present invention, an on/off ratio is used as a measure of performance of aptamer probes. The on/off ratio is the ratio of fluorescent signal produced by a probe and a fluorogen in the presence of its target sequence to fluorescent signal produced when its target sequence is absent. Application of the method to design of miniSpinach aptamer probes targeting influenza virus RNA is described below as an example. The algorithm of the present method, or similar algorithms, can be applied to other types of aptamers and targeting sequences. Multiple linear regression analysis is used to model the relationship between the selected parameters and the performance of aptamer probes. Other types of regression analysis, such as polynomial regression, logarithmic regression and others may be used in various embodiments of the present invention.
This module screens and evaluates candidate aptamer designs. After a .fasta file containing the viral RNA sequence and a range of window sizes are entered into the software, a window slides from the first position to the end of the whole virus sequence as illustrated in
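The window scan described above can be sketched as follows. The sketch parses FASTA text, slides windows of each requested size along the sequence, and keeps windows whose GC content is near 50%, the criterion mentioned earlier for candidate selection. The function names and the 40-60% default bounds are assumptions for illustration, not the behaviour of the actual software.

```python
def read_fasta(text):
    """Return the sequence of the first record in FASTA-formatted text."""
    sequence_lines = []
    seen_header = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the start of a second record
            seen_header = True
            continue
        sequence_lines.append(line)
    return "".join(sequence_lines).upper()


def sliding_windows(sequence, window_sizes):
    """Yield (start, window) for every position and every window size."""
    for size in window_sizes:
        for start in range(len(sequence) - size + 1):
            yield start, sequence[start:start + size]


def gc_content(sequence):
    """Fraction of G and C bases in a non-empty sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def candidate_windows(sequence, window_sizes, gc_min=0.4, gc_max=0.6):
    """Keep windows whose GC content lies within the (assumed) bounds."""
    return [
        (start, window)
        for start, window in sliding_windows(sequence, window_sizes)
        if gc_min <= gc_content(window) <= gc_max
    ]
```

Each surviving window would then be passed to the scoring steps that evaluate auto-fluorescence and target-induced fluorescence.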
This part elucidated the correlation between the degree of destabilization of the truncated miniSpinach and the degree of auto-fluorescence of the destabilized aptamer in the presence of a fluorogen. In particular, the correlation between probability of binding between certain base pairs of the aptamer which may be responsible for stabilizing the aptamer structure and fluorescence obtained from the aptamer alone were evaluated. The correlation, if robustly established, may be used to determine whether an aptamer design likely gives rise to auto-fluorescence and thus is not suitable for making the present RNA aptamer probes.
Binding probability between certain pairs in an aptamer is an important indicator of whether the truncated miniSpinach is destabilized. Candidate aptamers are derived from a truncated miniSpinach (P1-a4-b5), which is produced by removing one base pair in the stem of the original fluorescing miniSpinach (P1-a5-b5) (described in Ong, 2017), reducing the number of base pairs in the stem from 5 to 4. Therefore, it is expected that a “well-destabilized” aptamer probe (one that does not autofluoresce) is less likely to have strong interactions between the remaining 4 base pairs in stem a, i.e., interactions between nucleotides 14-62, 15-61, 16-60 and 17-59 in a candidate aptamer probe. A scoring equation is determined by plotting the experimentally determined mean fluorescent count of the aptamers in the absence of target (N=17) against the binding probability between nucleotides 14-62, 15-61, 16-60 and 17-59 calculated by CentroidFold [2] and by multiple linear regression.
The best fit (largest R²) scoring equation for this dataset is:
Mean fluorescent Count=Constant+Score A+Error,
A high Score A suggests the aptamer design is more likely to be auto-fluorescing.
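The fitting step behind a score of this kind can be sketched as an ordinary least-squares regression of mean fluorescent count on the four base-pair binding probabilities. The sketch below is a pure-Python illustration with hypothetical function names; the demonstration values are made up to show the call pattern and are not measured data or the fitted coefficients of the present method.

```python
def fit_linear(features, responses):
    """Least-squares fit with intercept; returns [constant, coef_1, ...]."""
    rows = [[1.0] + list(row) for row in features]
    n = len(rows[0])
    # Normal equations: (X^T X) beta = X^T y, stored as an augmented matrix.
    matrix = [
        [sum(r[i] * r[j] for r in rows) for j in range(n)]
        + [sum(r[i] * y for r, y in zip(rows, responses))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(matrix[r][col]))
        matrix[col], matrix[pivot] = matrix[pivot], matrix[col]
        scale = matrix[col][col]
        matrix[col] = [value / scale for value in matrix[col]]
        for r in range(n):
            if r != col:
                factor = matrix[r][col]
                matrix[r] = [a - factor * b for a, b in zip(matrix[r], matrix[col])]
    return [matrix[i][n] for i in range(n)]


def predict(beta, row):
    """Constant plus the sum of linear terms (the score) for one candidate."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


def r_squared(features, responses, beta):
    """Coefficient of determination of the fitted model."""
    mean_y = sum(responses) / len(responses)
    ss_res = sum((y - predict(beta, row)) ** 2 for row, y in zip(features, responses))
    ss_tot = sum((y - mean_y) ** 2 for y in responses)
    return 1.0 - ss_res / ss_tot


# Made-up illustrative inputs: pairing probabilities for 14-62, 15-61,
# 16-60 and 17-59, and mean fluorescent counts without target.
demo_probabilities = [
    [0.1, 0.1, 0.1, 0.1],
    [0.9, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.1, 0.1],
    [0.1, 0.1, 0.9, 0.1],
    [0.1, 0.1, 0.1, 0.9],
    [0.5, 0.5, 0.5, 0.5],
]
demo_counts = [120.0, 300.0, 260.0, 410.0, 380.0, 700.0]
demo_beta = fit_linear(demo_probabilities, demo_counts)
demo_r2 = r_squared(demo_probabilities, demo_counts, demo_beta)
```

Refitting as new experimental records are added is one simple way to realize the continuous fine-tuning of parameters mentioned earlier.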
As
Having a destabilized miniSpinach with no or low auto-fluorescence is insufficient as the destabilized miniSpinach is not necessarily inducible by the target RNA and hence may not exhibit target-induced fluorescence. Therefore, it is desirable to have another score for identifying aptamer designs that are more likely to be inducible by their target RNA through the analysis of the on/off ratio obtained as described in Example 2.
The effect of free energy on the performance of aptamer (as measured by the on/off ratio) is well supported by experimental data. Assuming that the formation of heterodimer between the aptamer and target is in equilibrium, and considering that the free energy (Delta G value) is related to thermodynamic stability, the following factors are selected for the regression analysis:
The on/off ratio is plotted against all factors above, and the best fit (largest R²) equation is then determined using multiple linear regression. Frequency of the MFE structure in the aptamer-target heterodimer included in the analysis may be related to the structural stability of the aptamer-target heterodimer and the structural dynamics of the assembled heterodimer.
The best fit equation for the data set studied (15 probes) is found to be:
On/Off Ratio=Constant+Score I+Error,
where Score I is the sum of linear terms in the regression analysis, given by
Score I is an indicator of the probability of formation of aptamer-target heterodimer. Higher Score I indicates higher probability of formation of aptamer-target heterodimer.
This part elucidated the correlation between the binding affinity of the destabilized miniSpinach for its target RNA and the fluorescence of the aptamer-target pair. This correlation, if robustly established, may be used to determine whether a destabilized miniSpinach can be re-stabilized by target RNA, thereby giving rise to target-induced fluorescence, and hence is suitable for making the present RNA aptamer probes.
As
Since the regression only considered variables included in the equation, which only concern the binding between the RNA molecules but not the docking of DFHBI, it is not surprising that the R² value is relatively low. In the “turn-on” event, only representative variables for molecular dynamics between aptamer and target, as well as representative variables for structural dynamics (frequency of the MFE structure in the complex), were included; there are no representative variables for the molecular dynamics that account for the binding of the heterodimer to DFHBI, owing to difficulties in predicting the interaction between an RNA and a small molecule (which is not an RNA).
In sum, although the R² values for both scoring methods are not high, they are still far from random. Therefore, it is reasonable to conclude that the two scoring models have the potential to assist the rational design of miniSpinach aptamers that are more likely to give a significant on/off signal inducible by their target RNA sequence, and also to facilitate automated, high-throughput screening of aptamer designs for targeting short or long target RNA sequences.
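One way the two scores could be combined in an automated screen is sketched below: candidates whose Score A exceeds a cutoff (likely auto-fluorescent) are discarded, and the rest are ranked by Score I from highest to lowest. The combination rule, the cutoff value and the field names are assumptions for illustration; this description does not prescribe how the scores are combined.

```python
def rank_candidates(candidates, max_score_a):
    """Filter by Score A, then sort the remainder by descending Score I.

    candidates is a list of dicts with the hypothetical keys
    'name', 'score_a' and 'score_i'.
    """
    kept = [c for c in candidates if c["score_a"] <= max_score_a]
    return sorted(kept, key=lambda c: c["score_i"], reverse=True)
```

Filtering before ranking reflects the point made above that low auto-fluorescence is necessary but not sufficient, so Score I decides among the candidates that pass.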
Secondary structures play a key role in determining the expected performance of an aptamer. In the present method, secondary structures of candidate aptamers are predicted using the Vienna RNA python package. After prediction, two outcomes are considered:
The ratio of different base pairs seems to influence the binding stability of RNA.
The melting temperature of an RNA refers to the temperature at which it becomes single-stranded. Since the RNA aptamer has to be single-stranded in order to interact with the target sequence and DFHBI, the melting temperature is also critical to the performance of RNA aptamer probes.
Experimental performance data is generated for aptamer probes designed according to the present method and may be entered into the database system. Database may contain the following information:
Scores.
This is the output of the evaluation function, e.g. Score I and Score A.
Autofluorescence level and fluorescence level in the presence of the target sequence.
This is the direct reading of fluorescence level before and after adding target sequences. The absolute fluorescence value may vary in different settings, depending, for example, on the measurement setting and equipment used.
Fold change of fluorescence level.
The measured absolute fluorescence value may vary in different settings, depending, for example, on the measurement settings and equipment used. Thus, fold-change of fluorescent level may be used as a meaningful parameter. Fold change is a ratio of the fluorescent level in the presence of the target sequence to the fluorescent level in the absence of the target sequence.
Optimal detection environment.
The optimal detection environment may include such factors as ion composition of the sample, temperature, and concentrations of various sample components. Some embodiments of the present invention may be able to predict the optimal detection environment based on the experimental data.
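The database fields listed above can be pictured as a single record per probe. The sketch below is one possible layout with hypothetical class and field names, in which the fold change is derived from the two stored fluorescence readings rather than stored separately.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AptamerRecord:
    """One probe's entry; all field names are illustrative assumptions."""
    name: str
    sequence: str
    scores: Dict[str, float]            # e.g. {"score_a": ..., "score_i": ...}
    autofluorescence: float             # reading without target
    induced_fluorescence: float         # reading with target
    optimal_environment: Dict[str, float] = field(default_factory=dict)

    @property
    def fold_change(self) -> float:
        """Signal with target divided by signal without target."""
        return self.induced_fluorescence / self.autofluorescence
```

Deriving the fold change from the raw readings keeps the record consistent when readings are re-measured on different equipment.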
The performance of the present methods may be evaluated by experimentally testing various aptamers designed using the method. The experimental data may be used to further refine and improve the methods.
In some embodiments, the screening step can comprise the steps of:
In one embodiment, the present invention provides a method for designing a sequence of an RNA aptamer capable of binding to a target nucleic acid, the RNA aptamer comprises a G-quadruplex structure that is capable of binding to a fluorogen. In one embodiment, the method comprises:
In one embodiment, the present method or system for designing a sequence of an RNA aptamer capable of binding to a target nucleic acid is implemented in combination with other methods or systems such as those available in the Vienna RNA secondary structure server (L. Ivo, 2003).
In one embodiment, the present invention provides a method and system for producing RNA aptamers using bacterial expression system.
Bacterial expression systems are generally less costly than in vitro cell-free transcription kits; thus the present RNA aptamer probes and tests can be made more affordable to the public if the probes can be mass-produced by a bacterial expression system. Research and development costs can also be reduced, since the processes of screening, characterization and optimization usually require considerable amounts of probes and targets.
To explore the possibility of producing and screening RNA aptamer probes using a bacterial system, RNA aptamer probes and their RNA targets were co-transformed and their interaction was evaluated by the whole-cell assay described in Example 5. The on/off ratio observed in the in vitro cell-free refolding experiments was not observed in the whole-cell assay (
To verify, total RNA was extracted from the E. coli obtained after the whole-cell assay and an amount of RNA equivalent to the amount of RNA extracted from the same number of cells as was used in the whole cell assay was tested for its fluorescence level. Surprisingly, a 7-fold recovery of the fluorescence level of the positive control (miniSpinach) was observed in total RNA, while no recovery was observed in the probe-target pairs (
In one embodiment, the present invention provides a battery-operated and mobile-phone-based device for detecting and processing light or fluorescent signals given out by the present RNA aptamer probes or other light-emitting moieties which indicate the presence of target organisms or their genetic materials.
At the time of this invention, there is no comparable mobile-phone-based and light-based device for detection of influenza viruses. Current medical devices for influenza detection are costly and usually require expertise to operate; hence they are not convenient for use by the general public, and the fees charged for clinical tests are high. Compared with currently available devices, the present device has an improved light path design, which is accomplished by changing the light path by 90 degrees to reduce background noise from excitation light rays and by using a convex lens to convert the excitation light rays to parallel rays to avoid capturing undesirable light with the mobile phone camera. This lens may be referred to herein as the conversion lens.
Apart from detection based on RNA aptamer probes (RAPID), some embodiments of the present invention can be used for color detection, fluorescent detection using other types of probes, and emissive light detection. This may be achieved by changing the light source in the fluorometer. Some embodiments of the present invention have multiple light sources built into the hardware, allowing the user to select the desired light source.
In some embodiments, the present invention provides a fluorometer. The various embodiments of the fluorometer are also referred to herein as Tracer. In some embodiments, Tracer is a battery-operated, mobile-phone-based fluorometer which can record not only the intensity but also the distribution and color pattern of the fluorescent signal. In some embodiments, Tracer is made up of a black housing made of polylactic acid (PLA), a light emission system and an optical system, as shown in
The various embodiments of Tracer are designed to measure the fluorescent signal given out by the RNA aptamer probes, but are also capable of detecting light signals given out by other light-emitting moieties. In some embodiments, the power of Tracer is provided by a replaceable battery cell.
In some embodiments, the housing is a black shell made of polylactic acid (PLA) and designed to optimize the measuring environment so that light outside Tracer does not influence the measuring results. In some embodiments, the light-emitting diode emits visible blue light with a central wavelength of 450 nm, and the light is made parallel by a plane mirror and a convex lens. In some embodiments, the power supply system produces a stable 700 mA current so that the light-emitting diode can work with a power of 5 W. In some embodiments, the battery box is placed on the back of Tracer so that users can replace the battery by themselves.
In some embodiments, when Tracer is switched on, it emits excitation light with a central wavelength of 450 nm. This central wavelength corresponds to the peak value of the absorption spectrum for DFHBI. The central wavelength used may be selected based on the properties of a particular fluorophore. Fluorescent signals can be collected by a mobile phone camera. In other embodiments, excitation light with different central wavelengths may be used. The choice of the excitation light wavelength depends on which fluorogen is used (see Bouhedda, 2018).
Some embodiments of Tracer have a shell made of black PLA. This prevents outside light from penetrating the shell so that the measuring results will not be affected by the environment. In other embodiments, the shell may be made of other materials and be of different colors. Various materials that prevent light from penetrating the shell may be used.
In some embodiments, users can adjust the movable convex lens to help the mobile phone camera to focus. This lens may be referred to herein as focusing lens.
In some embodiments, the battery box is placed on the back of Tracer so that users can replace the battery by themselves. With an internal current regulator, the working power of the LED remains stable at 5 W regardless of the battery voltage.
In some embodiments, the present device comprises a housing for holding various components of the device. In some embodiments, the housing is a shell made of black PLA. The black PLA shell is a black box that holds all other components of Tracer inside. The refractive index of PLA is as low as 3%, which can protect the diagnostic results from the influence of the outside environment. Moreover, with a melting point of around 160° C., PLA provides great heat stability. Also, PLA is an environmentally friendly material as it is biodegradable.
In other embodiments, the shell may be made of other materials and be of different colors. A suitable material for the shell may be chosen based on such considerations as the material's refractive index, melting point, light absorption, weight, strength, durability and cost. If the refractive index is too high, it may be difficult to make the excitation light parallel. A reflective coating may be applied to shells made of various materials to reduce or eliminate interference from outside light.
In one embodiment, the present device comprises a light emission system for generating light signals. In some embodiments, the light emission system provides Tracer with stable blue light with a central wavelength of 450 nm. This central wavelength may be selected when DFHBI is used as a fluorophore, as it corresponds to the peak value of the absorption spectrum for DFHBI. The light emission system consists of three parts: an LED, an LED current regulator and a power supply.
In some embodiments, as illustrated in
A high-power LED generates a great amount of heat. Thus, in some embodiments, a heat sink, such as a star heat sink, is used to prevent overheating.
An LED current regulator can be included in some embodiments of the present device so that its performance will not be affected by the voltage of the battery. In some embodiments, two LED current regulators (AMC7135, from ADDtek, Taiwan) are used in parallel to regulate the current to 700 mA.
In one embodiment, the present device comprises an optical system for manipulating light signals.
In some embodiments, the light emitted by the light-emitting diode is converted to parallel rays by a convex lens and is reflected by the plane mirror. The plane mirror changes the direction of the light by 90 degrees and directs it toward the sample. The light excites the fluorophore and makes it emit fluorescent light (emission light). This fluorescent light is filtered by a bandpass filter so that the signal captured by the mobile phone camera is not affected by the excitation light. The light paths of the excitation light and the emission light are designed to be perpendicular in order to minimize interference. In some embodiments, the convex lens is moveable to assist in camera focusing.
In some embodiments, the divergent blue light emitted by the light-emitting diode is converted to parallel by the convex lens (
In some embodiments, a moveable convex lens may be used. In some embodiments, the moveable convex lens is an LA1289-A-ML convex lens with the following specifications: diameter 0.5 inches, ARC 350-700 nm, weight 0.05 lbs. This lens may be referred to herein as the focusing lens.
Lenses of other dimensions may be used in different embodiments. Smaller lenses allow the overall dimensions of the device to be minimized.
In some embodiments, the bandpass filter is used to filter out the excitation light so that it does not interfere with the emission light. In some embodiments, a Thorlabs FB510-10 bandpass filter with a diameter of 0.5 inches is used.
Tracers representing some embodiments of the present invention are operated as follows. A battery is installed. The sample is put into the sample holder of Tracer and the lid is closed. For satisfactory performance, the lid of Tracer should not be opened while Tracer is on. Tracer is then switched on. The mobile phone camera is aimed at the signal collection port, which is an opening in the shell of the device through which fluorescence may be observed and an image of the sample may be taken. The moveable lens is adjusted manually until a clear image is displayed on the mobile phone screen. Pictures are then taken with the mobile phone camera. Tracer may then be switched off and the sample taken out.
Example 6 describes the components of an embodiment of Tracer and its estimated production cost.
Example 7 describes some procedures that were used to evaluate the performance of Tracer. The evaluation comprises two major parts: accuracy and precision.
In one embodiment, the present invention provides a system for operating the present device and processing images obtained by the present device. In some embodiments, the system has modules performing various functions, such as calibration, image processing and machine learning.
At the time of this invention, software or a system capable of processing a large number of fluorescent images and equipped with machine learning for producing more accurate results was lacking.
In some embodiments, the present system comprises a module for calibration so that the present device is compatible with mobile phones of different configurations (see Reference 13, for example).
In some embodiments, the present system comprises a module for image processing.
Light-up aptamers provide a rapid, cheap and convenient way for on-site virus detection, which also opens the possibility of a self-detection method for the general public. To facilitate detection of viruses by the untrained public, software representing one embodiment of the present invention may be used with a mobile phone camera to detect the fluorescent signal given off by light-up aptamers. Though the software is currently used to detect fluorescent light, it can potentially be used to detect any color or light signal.
In some embodiments, the software includes five main parts: pre-calibration, mobile phone camera calibration module, deep neural network image processing module, diagnosis system and database system.
The software reads in the image uploaded by a user along with the mobile phone model number. The input image is then calibrated based on the model number of the phone used to produce the image. The image is processed by the deep neural network image processing module. The diagnostic system outputs the diagnostic result based on the result of image processing and other information input by the user. If the result is positive, the user is advised to see a doctor. After that, feedback is collected and used for training the image processing module and the diagnostic system.
In some embodiments, a convolutional neural network is used.
The main steps of the image analysis of some embodiments of the present invention are described below.
Pre-calibration eliminates possible background noise that is present when no sample is inside the device. The user is asked to take three pictures using their own mobile phone without turning on the excitation light or putting in the sample. Before the image of the sample is processed, the average of these three images is subtracted from it to remove background noise.
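The pre-calibration step can be sketched as follows. This is a minimal illustration, assuming grayscale images represented as nested lists of pixel intensities; the function names, the tiny 2×2 example images and the clipping of negative differences to zero are assumptions made for illustration and are not taken from the described software.

```python
from typing import List

Image = List[List[float]]  # rows of grayscale pixel intensities


def mean_image(images: List[Image]) -> Image:
    """Pixel-wise average of equally sized images."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [[sum(img[r][c] for img in images) / n for c in range(cols)]
            for r in range(rows)]


def subtract_background(sample: Image, dark_frames: List[Image]) -> Image:
    """Subtract the averaged dark frames from the sample image.

    Negative differences are clipped to zero (an assumption of this sketch).
    """
    background = mean_image(dark_frames)
    return [[max(s - b, 0.0) for s, b in zip(sample_row, background_row)]
            for sample_row, background_row in zip(sample, background)]


# Three hypothetical dark frames (no excitation light, no sample).
dark = [[[2, 4], [6, 8]], [[4, 4], [6, 5]], [[3, 4], [6, 2]]]
sample = [[10, 3], [50, 8]]
corrected = subtract_background(sample, dark)
print(corrected)
```

In practice an array library would normally be used for full-size images; plain lists are used here only to keep the sketch self-contained.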
Users may use different mobile phones, and the color, brightness and other characteristics of the pictures may vary among different mobile phone cameras. Moreover, mobile phone cameras can also introduce distortion to the images. Thus, a mobile phone camera calibration system is implemented to make sure that differences between phone models do not influence the final diagnostic results. Since calibration takes a long time, calibration results for several popular mobile phone models are included in the database (e.g., iPhone). The calibration system is implemented based on OpenCV. Two phantoms may be used for the calibration process.
The images taken by a mobile phone camera can be distorted. In the calibration of distortion, both tangential and radial distortion are considered (implementation details are described in reference 16). A 5×5 black and white phantom is used to calibrate the distortion of the camera in some embodiments of the present invention (see
The same color may be reproduced differently by different cameras. Therefore, a color correction module is implemented. A 24-color phantom for color correction is used in some embodiments of the present invention (see
This module uses a convolutional neural network to classify the input images into positive and negative classes. The input images are convolved by convolution layers and pooled by pooling layers. At the end, the images are classified by fully connected layers. The accuracy on the test data is 83.3%. This module may be run on local computers and trained on locally obtained datasets. It may also be run on remote servers and/or utilize cloud computing. The module may be trained on locally generated datasets or on datasets generated at various locations and by various users and pooled together.
In another embodiment, the deep neural network image processing module employs a three-layer convolutional neural network to classify the input images taken with the previously described hardware into positive and negative classes. The input images are convolved by three convolution layers and pooled by pooling layers. At the end, the images are classified by fully connected layers. Each input image is first resized to 128 pixels in width and 128 pixels in height before being fed into the neural network. The first convolution layer has 32 filters, which means that this layer has 32 output channels, and its kernel size is 3×3 with stride 1. Following a batch normalization function, a ReLU function is employed as the non-linear activation function of the first layer. The following two convolution layers are similar to the first one except for the number of filters: the second convolution layer has 64 filters and the third has 128. A 2×2 max pooling layer is attached after each convolution layer. Then, two fully connected layers are employed. The output dimension of the first fully connected layer is 64 and that of the last layer is one, to indicate whether the image is positive or negative. The first fully connected layer employs a ReLU function as the activation function, while the second one employs a sigmoid function. Since the outcome is binary, a binary cross entropy loss function is applied.
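The layer dimensions described above can be checked with a short shape calculation, together with the binary cross entropy loss for a single prediction. This is only a sketch under stated assumptions: the text does not specify the convolution padding or the pooling stride, so the code assumes non-overlapping 2×2 pooling and lets the padding scheme be selected; all function names are illustrative.

```python
import math


def conv_out(size: int, kernel: int = 3, stride: int = 1, same_padding: bool = True) -> int:
    """Spatial size after a convolution (padding scheme is an assumption)."""
    if same_padding:
        return -(-size // stride)  # ceil(size / stride)
    return (size - kernel) // stride + 1


def pool_out(size: int, window: int = 2) -> int:
    """Spatial size after non-overlapping max pooling (assumed stride = window)."""
    return size // window


def trace_shapes(input_size: int = 128, filters=(32, 64, 128), same_padding: bool = True):
    """Return (height, width, channels) after each convolution + pooling stage."""
    shapes = []
    size = input_size
    for n_filters in filters:
        size = pool_out(conv_out(size, same_padding=same_padding))
        shapes.append((size, size, n_filters))
    return shapes


def binary_cross_entropy(label: int, probability: float) -> float:
    """Loss for one sample, where probability is the sigmoid output."""
    return -(label * math.log(probability) + (1 - label) * math.log(1 - probability))


shapes = trace_shapes()
height, width, channels = shapes[-1]
flattened = height * width * channels  # input size of the first fully connected layer
print(shapes, flattened)
```

Under the "same" padding assumption, the first fully connected layer receives a 16×16×128 feature map; with unpadded convolutions the final map would instead be 14×14×128.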
The symptoms and basic information of the user are also crucial to the diagnosis of influenza. To make the model available to users, a diagnostic system may be implemented on a website in some embodiments of the present invention. The diagnostic system may combine information collected from the user with the image processing result.
Users upload their images, and each image is classified by the neural network module. The result is returned to the user immediately. The online system contains only the trained neural network model and is used only to test the user's images. The model is trained on local computers, and the weights of the model are updated routinely.
Information collected from the user may include: age; gender; geographic location; symptoms, such as body temperature, runny nose, sore throat, cough, muscle ache and other symptoms; when the symptoms started to occur; and vaccination status. The types of information collected may be adjusted based on the pathogens that are being detected using the present invention and/or the disease that is being diagnosed.
The database of some embodiments of the present invention includes images with positive/negative annotation based on the experimental data, all the raw input data from the user and the user's feedback after he/she sees a doctor. The database is used to train the image processing and diagnostic system. Because geographical location data of the user may also be collected, the data may be used for disease control.
Other models may be designed to reduce the misclassification rate. The models may be trained with real clinical data. A self-calibration module may be included in some embodiments. If a user's phone model is not included in the database, the user can use the phantom inside the kit to calibrate the camera. After the user uses the phantom to calibrate the image, the calibration result may be included in the database.
Integrated System for Self-Detection of Influenza without Clinical or Laboratory Equipment
In one embodiment, the present invention provides a system adapted for self-detection of influenza virus by individual subjects without the need of any laboratory apparatuses or skills.
In one embodiment, the present invention provides an integrated system for a subject to conduct a detection of influenza virus outside of laboratory setting. In some embodiments, the present integrated system comprises one or more of: a module for collecting a sample of nasal fluid from a subject, a module for treating the collected sample with a detecting reagent comprising one or more fluorogen-bearing and influenza-specific probes, a light-shielded module for taking one or more images recording light emitted from the treated sample, and a module for processing the images and outputting results indicating the presence or absence of particular type or subtype of influenza virus. In some embodiments, the present integrated system is linked with a mobile phone of the user and is configured to enable the user to take images of their samples using their mobile phones and upload the images to the present integrated system for image-processing and analysis, and to receive results from the integrated system via the mobile phone.
In some embodiments, the present invention provides a method for self-detection of influenza virus by individual subjects without the need for any laboratory equipment or skills. In some embodiments, the method comprises:
Alternatively, the reaction can be done in a tube/cuvette instead of on a paper strip. The step may be:
In various embodiments, RNA aptamer probes may be provided embedded on a strip, freeze-dried in a tube or in other suitable form. In vitro transcription reaction mix for RNA aptamer probes may be provided instead of the RNA aptamer probes themselves. In such a case, in vitro transcription step is performed to obtain aptamer probes. Fluorogen may be added to the mix containing the patient sample and the RNA aptamer probes before or after the incubation step.
In some embodiments, image processing software is trained with positive and negative controls, such that a user does not need to also measure positive and negative control samples. In other embodiments, positive and negative control samples may be provided. Negative control samples may contain the same components as the sample obtained from the subject, except the subject sample itself, and a composition mimicking nasal fluid (or other biological material that may be used as a sample for testing). Positive control samples may contain labeled probes and a known amount of the RNA they bind to, as well as a composition mimicking nasal fluid (or other biological material that may be used as a sample for testing).
In some embodiments, image processing software is trained with positive and negative controls, such that a user does not need to also measure positive and negative control samples. In other embodiments, positive and negative control samples may be provided. Negative control samples may contain the same components as the sample obtained from the subject, except the subject sample itself. Positive control samples may contain labeled probes and a known amount of the RNA they bind to.
In some embodiments, results are uploaded through a website or a mobile phone application, and the analysis may be performed on a remote server. In other embodiments, the image analysis software may be installed on the phone or on a user's computer.
The present invention can be adapted for detecting signals other than fluorescent signals. For example, light sources of various kinds can be added to the device so that the user can determine which light they would like to use for various types of detection, such as color detection, fluorescent detection and emissive light detection.
The present invention is applicable to samples of various kinds containing the target sequence. The sample can be a biological sample collected from the subject directly, or a sample derived from a biological sample collected from the subject. In one embodiment where the present invention is used for detection of influenza virus, applicable samples can be nasal fluid, saliva, tears or any other biological samples that contain viral genetic material.
The present invention provides a nucleic acid probe for detecting a target nucleic acid sequence. In one embodiment, nucleic acid probe of this invention comprises: (a) a fluorogen binding region comprising an aptamer sequence forming a G-quadruplex structure; (b) a first targeting sequence which interacts with a first portion of the target nucleic acid sequence; and (c) a second targeting sequence which interacts with a second portion of the target nucleic acid sequence; wherein interaction between the first targeting sequence and the first portion of said target nucleic acid sequence and interaction between the second targeting sequence and the second portion of said target nucleic acid sequence triggers conformational change of said G-quadruplex structure, which is then able to interact with a fluorogen in a way that induces fluorescence.
In one embodiment, the target sequence is a sequence present in the genome of a pathogen.
In one embodiment, the pathogen is influenza virus.
In one embodiment, the first targeting sequence comprises at least 11 nucleotides.
In one embodiment, the second targeting sequence comprises at least 11 nucleotides.
In one embodiment, the G-quadruplex structure in a stabilized form has a higher binding affinity for a fluorogen than the destabilized form.
In one embodiment, the G-quadruplex structure gains stability and interacts with a fluorogen in a way that induces fluorescence when the first targeting sequence interacts with the first portion of the target nucleic acid sequence, or when the second targeting sequence interacts with the second portion of the target nucleic acid sequence, or both.
In one embodiment, the fluorogen binding region comprises the sequence of SEQ ID NO: 2 and SEQ ID NO: 4.
In one embodiment, the first targeting sequence comprises a sequence selected from the group consisting of even numbered sequences selected from the group of SEQ ID NO: 88-141.
In one embodiment, the second targeting sequence comprises a sequence selected from the group consisting of odd numbered sequences selected from the group of SEQ ID NO: 88-141.
In one embodiment, the fluorogen is 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI).
In one embodiment, the binding of said probe to the target nucleic acid sequence enables said probe to interact with a fluorogen in a way that produces a visible fluorescent signal.
The present invention also provides a method for detecting a target nucleic acid sequence in a sample. In one embodiment, the method comprises: (1) providing a biological sample containing nucleic acids from a subject; (2) adding a nucleic acid probe and a fluorogen to said sample, wherein the nucleic acid probe comprises: a fluorogen binding region comprising an aptamer sequence forming a G-quadruplex structure; a first targeting sequence which interacts with a first portion of the target nucleic acid sequence; and a second targeting sequence which interacts with a second portion of the target nucleic acid sequence; (3) measuring fluorescence in said sample using a device capable of measuring fluorescence, wherein fluorescence indicates the presence of said target nucleic acid sequence.
In one embodiment, the target nucleic acid sequence is a nucleic acid sequence from a pathogen, and the nucleic acid probe is capable of hybridizing with said nucleic acid sequence.
In one embodiment, the pathogen is influenza virus.
In one embodiment, the device in step (3) is Tracer.
In one embodiment, the biological sample from the subject is one or more of the following: nasal fluid, saliva and tears.
In one embodiment, the target nucleic acid sequence is a nucleic acid sequence from a specific subtype of influenza virus, and the nucleic acid probe is capable of hybridizing to said target nucleic acid sequence.
The present invention further provides an imaging device configured for taking fluorescent images from a fluorescence-emitting sample using a mobile communication device. In one embodiment, the imaging device comprises (a) a housing comprising a movable opening and a signal collection port; (b) a sample holder to hold the fluorescence-emitting sample; (c) a power source; (d) a light source comprising one or more light emitting diodes and a current regulator; and (e) an optical module comprising a converging element, a focusing element and a filtering element.
In one embodiment, said device further comprises a heat exchanger.
In one embodiment, the converging element is a converging lens and the focusing element is a focusing lens that can be manually adjusted by a user.
In one embodiment, the sample is in liquid form at the time of imaging or has been deposited on a solid medium at the time of imaging.
In one embodiment, the power source is a battery.
In one embodiment, the present invention provides a method for detection of a target nucleic acid by an individual using said device, comprising the steps of: (1) providing a nucleic acid probe and a fluorogen to the sample, wherein the nucleic acid probe comprises: a fluorogen binding region comprising an aptamer sequence capable of forming a G-quadruplex structure; a first targeting sequence which interacts with a first portion of the target nucleic acid sequence; and a second targeting sequence which interacts with a second portion of the target nucleic acid sequence; (2) providing a biological sample from a subject; (3) combining the nucleic acid probe and the biological sample to obtain a test sample; (4) measuring fluorescence in the test sample using a device capable of measuring fluorescence to obtain fluorescence data and (5) determining whether the target nucleic acid is present in the sample by analyzing the fluorescence data obtained in step (4).
The present invention also provides an integrated system for detection of a target nucleic acid by an individual. In one embodiment, the integrated system comprises one or more of: a module for collecting a sample from a subject, a module for treating the collected sample with a detecting reagent comprising one or more fluorogen-bearing and influenza-specific probes, a light-shielded module for taking one or more images recording light emitted from the treated sample, and a module for processing the images and outputting results indicating the presence or absence of particular type or subtype of influenza virus.
In one embodiment, the system is used in conjunction with a mobile communication device, wherein the system is configured to enable the individual to do one or more of the following using the mobile communication device: take images recording light emitted from the treated sample, upload the images to the integrated system and receive results from the integrated system.
The present invention further provides a method for designing a sequence of an RNA aptamer capable of binding to a target nucleic acid, wherein the RNA aptamer forms a G-quadruplex structure that is capable of binding to a fluorogen, the method comprising: (a) selecting a target nucleic acid sequence; (b) generating a plurality of candidate sequences of an oligonucleotide having a hybridizing sequence substantially complementary to the target nucleic acid sequence; (c) evaluating the secondary structure of the candidate sequences and determining the likelihood of giving a fluorescence in the absence of the target nucleic acid by the candidate sequences; (d) for one or more of the candidate sequences, determining the binding probability between the candidate sequence and nucleotides involved in the formation or stabilization of the G-quadruplex structure, thereby determining the likelihood of giving a fluorescence in the absence of the target nucleic acid by the candidate sequence; (e) for one or more of the candidate sequences, determining the minimal free energy of one or more of: (i) the heterodimer of aptamer-target nucleic acid, (ii) the homodimer of aptamer, (iii) the homodimer of target nucleic acid, and the frequency of the aforementioned heterodimer and homodimer, thereby determining the likelihood of giving a fluorescence upon binding to the target nucleic acid by the candidate sequence; and (f) designing the sequence of RNA aptamer according to the results obtained in steps (c), (d) and (e).
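Step (b) of the method above, generating candidate hybridizing sequences, can be sketched as a sliding window over the target. This is an illustrative sketch only: the function names and the example target fragment are made up, the window length of 11 merely echoes the minimum targeting-sequence length mentioned elsewhere in this description, and the sketch produces perfectly complementary sequences whereas the method allows substantially complementary ones. Steps (c) to (e) would additionally require an RNA folding tool and are not shown.

```python
from typing import Iterator, Tuple

# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def candidate_hybridizing_sequences(target: str, length: int = 11) -> Iterator[Tuple[int, str]]:
    """Yield (start position, hybridizing sequence) for each window of the target."""
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        yield start, reverse_complement(window)


target = "AUGGCCAUUGUAAUGG"  # made-up 16-nucleotide target fragment
candidates = list(candidate_hybridizing_sequences(target))
for start, sequence in candidates:
    print(start, sequence)
```

Each candidate would then be appended to the aptamer scaffold and scored in the later steps before a final sequence is chosen.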
The present invention also provides a system for designing a sequence of an RNA aptamer capable of binding to a target nucleic acid, wherein the RNA aptamer forms a G-quadruplex structure that is capable of binding to a fluorogen, comprising: (a) a sequence processing component for retrieving and processing sequence information, wherein said sequence processing component is further operable for (i) receiving a target nucleic acid sequence and selecting a portion of the target nucleic acid; and (ii) generating a plurality of candidate sequences having a hybridizing sequence complementary to the selected sequence; (b) a storage component for storing a training data set and a test data set for parameters indicative of the performance of the candidate sequences in binding to the selected sequence; (c) a structure prediction component for predicting the secondary structure of the candidate sequences and determining the likelihood of giving a fluorescence in the absence of the selected sequence by the candidate sequences; (d) a processing component comprising a machine learning component, wherein said processing component is further operable for (i) receiving data from and delivering data to said storage component; (ii) determining the binding probability between the candidate sequences and nucleotides involved in the formation or stabilization of the G-quadruplex structure, thereby determining the likelihood of giving a fluorescence in the absence of the selected sequence by the candidate sequence; (iii) determining the minimal free energy of one or more of: (i) the heterodimer of aptamer-target nucleic acid, (ii) the homodimer of aptamer, (iii) the homodimer of target nucleic acid, and the frequency of the aforementioned heterodimer and homodimer, thereby determining the likelihood of giving a fluorescence upon binding to the selected sequence by the candidate sequences; (iv) processing the data in the training data set to obtain a plurality of training data points; (v) processing the data in the test data set to obtain a plurality of test data points; (vi) testing the machine learning component using said test data points to obtain a test output; (vii) processing the test output and determining whether the test output is an optimal solution; and (viii) directing the sequence processing component to select another portion of the target nucleic acid based on the results of (vii).
In one embodiment, the module for processing the images and outputting results indicating the presence or absence of a particular type or subtype of influenza virus comprises a pre-calibration module, a calibration module, an image processing module and a diagnostic module.
In one embodiment, this invention provides a method for designing a probe for detecting a target sequence of nucleic acid in presence of a fluorogen, comprising the steps of: (a) selecting a target sequence; (b) selecting an aptamer sequence for forming a secondary structure comprising a fluorogen docking site that is destabilized; (c) generating one or more detecting sequences substantially complementary to a region on said target sequence and adding said one or more detecting sequences to an end of said aptamer sequence to form a probe sequence; (d) determining binding probability between complementary pairs of nucleotides in said probe sequence responsible for stabilizing of said fluorogen docking site; (e) obtaining values of one or more non-structural features related to said probe sequence and said target sequence; (f) obtaining a first value indicative of probability of forming a heterodimer of said probe sequence and said target sequence from the results of (e); (g) obtaining a second value indicative of probability of autofluorescence of said probe sequence from the results of (d); and (h) determining if said probe sequence is a suitable probe candidate based on said first and second values.
In one embodiment, said non-structural features comprise: (a) minimal free energy of said heterodimer; (b) minimal free energy of a homodimer of said probe sequence; (c) minimal free energy of a homodimer of said target sequence; (d) value of delta G for binding of said heterodimer; and (e) frequency of minimal free energy structure of said heterodimer.
In one embodiment, said first value is obtained from the following equation:
first value=A (minimal free energy of homodimer of said probe sequence)+B (minimal free energy of homodimer of said target sequence)+C (minimal free energy of heterodimer of said probe sequence and said target sequence)+D (value of delta G for binding of heterodimer of said probe sequence and said target sequence)+E (frequency of minimal free energy structure of said heterodimer)
on/off ratio=constant+first value+error.
In one embodiment, said second value is sum of binding probabilities between complementary pairs of nucleotides in said probe sequence responsible for stabilizing of said fluorogen docking site, wherein binding probability of each of said complementary pairs of nucleotides has a specific coefficient obtained by multiple linear regression based on the following equation:
mean fluorescent count=constant+second value+error.
In one embodiment, said aptamer sequence comprises SEQ ID NO: 143.
In one embodiment, said aptamer sequence comprises SEQ ID NO: 1 and SEQ ID NO: 5 to form a P1 arm linked to said fluorogen docking site. In another embodiment, said complementary pairs of nucleotides of step (d) comprises the nucleotides 1 to 4 of SEQ ID NO: 1 being complementary to nucleotides 7 to 4 of SEQ ID NO: 5 respectively.
In one embodiment, said one or more detecting sequences comprises two detecting sequences, each linked to an end of said aptamer sequence.
In one embodiment, said one or more non-structural features of step (e) are identified by: (a) determining normalized mutual information scores for a plurality of non-structural features of said probe sequence to identify a shortlist of non-structural features; and (b) conducting principal component analysis on said shortlist of non-structural features to identify said one or more non-structural features of step (e).
In one embodiment, said target sequence is a region in the genome of a pathogen. In another embodiment, said pathogen is an RNA virus. In a further embodiment, said RNA virus is selected from the group consisting of influenza virus, SARS-CoV, Zika virus and hepatitis C virus.
In one embodiment, the method of this invention further comprises the step of experimentally validating said probe sequence of step (h) and fine tuning said first and second values of steps (f) and (g).
In one embodiment, this invention provides a probe designed based on the method of this invention.
In one embodiment, said probe sequence comprises: (a) an RNA selected from SEQ ID NOs: 143-170; or (b) an RNA obtained by DNA transcription of SEQ ID NOs: 6-32.
In one embodiment, said one or more detecting sequences comprise a sequence selected from the group of SEQ ID NOs: 88-141.
In one embodiment, said fluorogen is 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI).
In one embodiment, this invention provides a probe for detecting a target sequence of nucleic acid in presence of a fluorogen, comprising: (a) an aptamer sequence comprising SEQ ID NO: 143; and (b) one or more detecting sequences comprising SEQ ID NOs: 88-141.
In one embodiment, said probe comprises: (a) an RNA obtained by DNA transcription of SEQ ID NOs: 6-32; or (b) an RNA selected from SEQ ID NOs: 144-170.
In one embodiment, said one or more detecting sequences are two detecting sequences, each linked to an end of said aptamer sequence and complementary to a continuous region on said target sequence.
Oligonucleotides (oligos) that can hybridize to each other and be extended in a PCR reaction were designed in order to overcome the limitations in length of conventional oligo synthesis. After PCR using Phusion polymerase, the amplified DNA products were separated by gel electrophoresis and purified.
The purified DNA products were then used as a template in NEB HiScribe T7 Quick High Yield RNA Synthesis Kit for in vitro transcription. After DNase I treatment, the reaction was directly purified using 1:1 phenol:chloroform extraction, and the RNA concentration was measured with NanoDrop 2000. Molar concentrations of the RNA probes and target RNA molecules were calculated based on their molecular weights according to their length.
After in vitro transcription, the resulting RNA aptamers (1 μM) were mixed with their 22-bp RNA target molecules (1 μM), 2× aptamer folding buffer (20 mM Tris-HCl, 200 mM KCl, 10 mM MgCl2) and 1.5 μL of fluorophore DFHBI (200 μM) (Kikuchi N, 2016). Controls were set up by mixing the above components without the RNA target molecule (aptamer only) or without both the RNA aptamer and the RNA target molecule (blank), while the positive control was set up by mixing the aptamer folding buffer, DFHBI and untruncated miniSpinach. The mixtures were incubated in a dry bath at 90° C. for 5 minutes to allow refolding of the RNAs, and then incubated at 37° C. for 45 minutes. Fluorescent signals were observed under a blue light box (ChemiDoc) and the fluorescent intensity was measured by a CLARIOstar plate reader at 447/501 Ex/Em.
The assay was performed in triplicate and Student's t-test was used for statistical analysis. Probability values (p-values) of 0.005 or less are regarded as statistically significant. The results are shown in
The aptamer refolding assay as described in Example 2 was used for specificity and sensitivity analysis.
For specificity, aptamer probe candidates (1 μM) were mixed with their target or non-target sequences (1 μM) and DFHBI, and the resulting fluorescence was measured. A blank consisting of only buffer, DFHBI and nuclease-free water was prepared. The data were analyzed using Two-Way ANOVA (
For sensitivity, 2 μM aptamer (N2-694 and N9-545 aptamers) was mixed with its target at different concentrations (i.e., 1.5 μM, 1 μM, 0.5 μM, 0.2 μM, 0.1 μM and 0.05 μM) and DFHBI, and the resulting fluorescence was measured. Blank consisted of only buffer, DFHBI and nuclease free water. Detection limit of the aptamer probes was determined by fluorescent signals measured by CLARIOstar plate reader with Ex/Em 447/501 and visually by photos taken by ChemiDoc Imager under SYBR Green mode with Blue Trans Light Excitation (
Folding of the present RNA aptamer probes in the presence of sodium ion, potassium ion, calcium ion or magnesium ion at different concentrations was tested.
Modified from aptamer refolding assay described in Example 2, ions were added to the aptamer folding buffer in the form of salt solution. Table 9 lists the concentration of different ions in nasal fluid and in aptamer folding buffer. Table 10 lists the range of concentrations of each type of ion in the final reaction mixtures.
A Bio-Rad Real-time PCR system was used to monitor the change in fluorescent signals and the time required for signal development.
Reaction mixtures in a 96-well plate were set up as follows: aptamer probe (1 μM), target RNA (1 μM), 30 μL of 2× folding buffer, 1.5 μL of DFHBI (200 μM) and nuclease-free water to make a final volume of 50 μL.
An aptamer control was prepared similarly by replacing the target RNA with nuclease-free water, and untruncated miniSpinach was used as the positive control.
The thermocycler was set up as follows:
The reaction mixtures were first heated at 95° C. for 5 minutes, then allowed to be cooled slowly to 25° C.
For melting curve (dissociation) analysis, reaction mixtures which underwent refolding as described in the preceding paragraph were heated from 4° C. to 95° C. over the course of one hour.
RNA aptamer probes and their RNA targets were co-expressed using the Novagen Duet vector system in E. coli using standard procedures for co-expression of recombinant proteins in E. coli.
pRSFDuet-1 vector was modified to generate a T7 promoter-based aptamer expression system, and target genes were cloned into the multiple cloning site of pACYCDuet.
After IPTG induction in co-transformed BL21 Star (DE3) cells (provided by the NUS Singapore-A team; see http://2018.igem.org/Team:Hong_Kong-CUHK/Collaborations), the cells were collected and resuspended in a medium containing 200 μM DFHBI, and fluorescence was measured by a BMG CLARIOstar microplate reader (
MiniSpinach was used as a positive control.
This example describes the components of a device representing some embodiments of the present invention and its estimated production cost.
The following components were used:
Table 11 provides estimated costs of manufacture. The prices are shown as of September 2019 and assume that 100 filters are purchased.
This example describes the procedures for evaluating the performance of Tracer of Example 6. The evaluation comprises two major parts: accuracy and precision.
A plate reader routinely used in laboratories was used as a comparison device for Tracer. The mobile phone used to collect signals was an iPhone 6S. E. coli cells expressing GFP were lysed, and serial dilutions of green fluorescent protein (GFP) solution were prepared without further GFP purification and used as test samples. GFP solutions with relative concentrations of 0.00001 to 0.5 were prepared. The samples to be measured by Tracer were transferred to sample tubes (tubes from Gene Company LTD, part #23140 were used in this experiment), while samples to be measured by the microplate reader were transferred into microplates. Samples were put into Tracer one by one for detection. Images were collected using the mobile phone. The obtained images were analyzed using MATLAB as follows: the detected light was first outlined on the image, the image was converted into greyscale and the average relative light intensity was calculated. For comparison, fluorescent signals of each sample were measured using the plate reader. Each sample was measured three times and the measurements were averaged. Results are shown in Table 12.
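By way of a non-limiting illustration, the image analysis steps described above (outlining the detected light, converting to greyscale and averaging the relative intensity) may be sketched in Python rather than MATLAB. The function names, the boolean mask standing in for the outline, and the BT.601 luminance weights are assumptions made for this sketch and are not taken from the analysis actually performed.

```python
# Illustrative sketch only: pixels are (R, G, B) tuples with values 0-255,
# and `mask` is a hypothetical boolean grid marking the outlined light spot.

def to_grey(pixel):
    """Convert one RGB pixel to a greyscale value (assumed BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


def mean_intensity(image, mask):
    """Average relative intensity (0-1) over the pixels selected by the mask."""
    values = [
        to_grey(pixel) / 255.0
        for row, mask_row in zip(image, mask)
        for pixel, inside in zip(row, mask_row)
        if inside
    ]
    if not values:
        raise ValueError("mask selects no pixels")
    return sum(values) / len(values)


def average_replicates(readings):
    """Average repeated measurements of the same sample."""
    return sum(readings) / len(readings)
```

In such a sketch, each of the three measurements of a sample would be passed through mean_intensity and the three results combined with average_replicates.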
As can be seen from the graph in
GFP solutions with concentrations very close to each other were prepared. The range of fluorescent intensity level of the GFP solutions was made comparable with that of the present RNA aptamer probe. The measurements and analysis were performed as described above for the accuracy determination.
As can be seen from the overall trend of the graphs in
In this example, a rational way of designing RNA aptamers is proposed. Based on observation of experimental data and literature review, 20 non-structural parameters and 3 structural parameters that can potentially affect the performance of the Spinach aptamer are identified. Then feature selection based on mutual information and dimension reduction by principal component analysis are used to analyze the 20 non-structural parameters. Finally, multivariate linear regression is performed on the experimental data to generate a scoring function that can be used to predict the performance (fold change) of aptamers. Using the scoring function and other structural information, a program that can screen and select RNA aptamer designs is designed.
The aptamers are designed to target a 22-bp subsequence of influenza virus. Spinach is a sequence of RNA that can bind with 3,5-difluoro-4-hydroxybenzylidene imidazolinone (DFHBI) and give out green fluorescent light.
According to its crystal structures, Spinach consists of two arms, P1 and P2, surrounding a docking site of DFHBI formed by a G-quadruplex. After truncating the P1 arm of Spinach, a destabilized form of Spinach is obtained. Then an 11-bp arm is added to each side of the truncated sequence. Note that the two 11-bp sequences added are complementary to the 22-bp subsequence being targeted. After modification, the Spinach aptamer will only fold correctly and light up when hybridized to the target RNA (influenza RNA) (Ong, 2017).
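By way of a non-limiting illustration, the addition of two 11-bp arms complementary to a 22-bp target may be sketched in Python as follows. The function names are hypothetical, the core passed in is a placeholder rather than any sequence of this disclosure, and the assignment of the two halves of the reverse complement to the 5' and 3' ends of the probe is an assumption of this sketch.

```python
# Illustrative sketch only: assemble a probe as 5'-arm + core + 3'-arm.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    """Return the reverse complement of an RNA sequence, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(rna.upper()))


def build_probe(target, core, arm_length=11):
    """Flank a truncated aptamer core with two arms complementary to the target.

    The 5' arm is assumed to pair with the 3' half of the target and the
    3' arm with the 5' half, so that both arms bind one continuous region.
    """
    if len(target) != 2 * arm_length:
        raise ValueError("target length must equal twice the arm length")
    detecting = reverse_complement(target)
    five_prime_arm = detecting[:arm_length]
    three_prime_arm = detecting[arm_length:]
    return five_prime_arm + core + three_prime_arm
```

For example, build_probe(target, core) with a 22-nt target returns a probe whose first and last 11 nucleotides together form the reverse complement of the target.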
In this example, 2 pairs of aptamer and target are generated:
Oligo synthesis is used for both the RNA aptamers and the target sequences. The products are purified by HPLC, and concentrations are measured using a NanoDrop.
1 μM aptamer, 1 μM target and 1.5 μL of 200 μM DFHBI are mixed in buffer (20 mM Tris-HCl, 200 mM KCl, 10 mM MgCl2, pH 7.5), then refolded at 90° C. for 5 minutes and incubated at 37° C. for 45 minutes (2018 iGEM team).
Influenza virus RNA has a total length of around 14,000 nucleotides (Duesberg, 1968), while an aptamer can only detect a short strand (usually 20-80 bp). Therefore, it is impossible to detect the whole RNA strand. A proposed solution is to target only a 22-bp subsequence of the influenza RNA. Since the performance of an aptamer changes when the target subsequence is changed, how to choose the target subsequence rationally becomes an important question. This part discusses how a model can be built and how the performance of an aptamer design can be predicted based on its sequence.
26 pairs of target and probe sequences are designed, synthesized, and tested for data analysis. Note that since the data size is very small, some special techniques such as dimension reduction and cross-validation will be introduced in later sections to avoid overfitting.
Through observation and literature review, some features that can potentially affect fold change are listed below. These features are divided into two main types, structural and non-structural. Non-structural features are calculated from the sequences of targets and probes, while structural features refer to the secondary structure predicted by calculating minimum free energy and binding probability. Non-structural features can be easily quantified; therefore, regression is used to model them. Structural features are hard to quantify; thus, they are mainly used to reject unpromising designs.
Non-Structural Features:
The performance of Spinach aptamers is quantified by fold change. Fold change indicates by how many times the fluorescence level changes after the target is added to the Spinach aptamer solution, relative to the level before the target is added.
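By way of a non-limiting illustration, fold change may be computed from replicate readings as sketched below. Averaging the replicates before taking the ratio, and the absence of any blank subtraction, are assumptions of this sketch.

```python
# Illustrative sketch only: fold change as the ratio of mean fluorescence
# with target to mean fluorescence without target (aptamer only).

def fold_change(with_target, without_target):
    """Return mean(with_target) / mean(without_target) for replicate readings."""
    on_signal = sum(with_target) / len(with_target)
    off_signal = sum(without_target) / len(without_target)
    if off_signal <= 0:
        raise ValueError("aptamer-only signal must be positive")
    return on_signal / off_signal
```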
Non-structural features are classified into 4 types: nucleobase percentage, melting temperature, minimum free energy, and delta G value. In this section, the relationship between quantified non-structural features and fold change will be found.
1. Nucleobase Percentage
Nucleobase percentages refer to the proportions of A, G, C, and U in the target and the probe. Since an A-U pair forms two hydrogen bonds and a C-G pair forms three, the ratio of C-G pairs may affect the thermal stability of the probe-target heterodimer.
2. Melting Temperature
The melting temperature is the temperature at which 50% of the base pairs have been broken. It gives information on when and how RNA strands hybridize and is therefore a potential parameter that can affect the binding of the probe and target. The analysis takes into consideration both target MFE and probe MFE.
3. Minimum Free Energy
One RNA sequence can have different secondary structures under the same condition, and structure with minimum free energy (MFE) is the most thermodynamically stable one. The MFE of target, probe, target-probe heterodimer, target-target homodimer, and probe-probe homodimer are considered.
4. Delta G
Delta G value is closely related to the thermodynamic stability of the reaction product. Delta G for Heterodimer binding (P-T), Delta G for Homodimer binding (P-P), and Delta G for Homodimer binding (T-T) are all considered.
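By way of a non-limiting illustration, the nucleobase percentage feature of item 1 above may be computed directly from a sequence as sketched below; the function names are hypothetical. The melting temperature, minimum free energy and delta G features of items 2 to 4 require a thermodynamic folding tool and are not reproduced in this sketch.

```python
# Illustrative sketch only: base composition of an RNA sequence.
from collections import Counter


def nucleobase_fractions(rna):
    """Return the fraction of A, G, C and U in the sequence."""
    seq = rna.upper()
    if not seq:
        raise ValueError("empty sequence")
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "AGCU"}


def gc_fraction(rna):
    """Return the combined fraction of G and C, a proxy for duplex stability."""
    fractions = nucleobase_fractions(rna)
    return fractions["G"] + fractions["C"]
```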
Structural Features:
There are two types of errors in aptamer design. Type one error, referred to herein as a false negative, indicates that the probe-target heterodimer does not fold correctly and cannot bind with DFHBI. Type two error refers to a false positive, meaning that fluorescence is detected when no target is present. Type two error is caused by the misfolding of one or multiple probe strands.
Secondary structures of candidate aptamers can be predicted by calculating the minimum free energy using the ViennaRNA Python package (Hofacker, 2003). If the predicted secondary structure of a probe monomer or a probe-probe homodimer indicates the formation of the G-quadruplex docking site of DFHBI, the aptamer design is likely to have a high false positive rate (light-up without target). In the program, such aptamer designs will be discarded. Similarly, if the MFE structure of the probe-target heterodimer does not form a binding structure, the design is predicted to have a high false negative rate (no signal when the target is present) and will also be discarded.
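By way of a non-limiting illustration, the rejection of designs based on predicted structure may be sketched as a check on a dot-bracket string such as those produced by secondary structure prediction tools. The function names are hypothetical, and the use of a list of stabilizing base pairs as a stand-in for "formation of the docking site" is an assumption of this sketch rather than the criterion actually used in the program.

```python
# Illustrative sketch only: inspect a predicted dot-bracket structure.

def pair_table(structure):
    """Map each paired position (0-based) to its partner position."""
    stack, pairs = [], {}
    for index, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(index)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner = stack.pop()
            pairs[index] = partner
            pairs[partner] = index
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def stem_is_formed(structure, stem_pairs):
    """Return True if every listed (i, j) pair is present in the structure."""
    pairs = pair_table(structure)
    return all(pairs.get(i) == j for i, j in stem_pairs)


def likely_false_positive(probe_only_structure, stabilizing_pairs):
    """Flag a design whose probe-only fold already contains the stabilizing stem."""
    return stem_is_formed(probe_only_structure, stabilizing_pairs)
```

Under the same assumption, a design could be flagged as a likely false negative when stem_is_formed returns False for the predicted probe-target heterodimer structure.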
To see whether each parameter can indeed indicate the value of fold change, normalized mutual information is calculated for each non-structural feature. This step is mainly used to filter out less relevant features.
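By way of a non-limiting illustration, a normalized mutual information score between one feature and fold change may be computed as sketched below. Equal-width binning of the continuous values and normalization by the arithmetic mean of the two entropies are assumptions of this sketch; the disclosure does not specify either choice.

```python
# Illustrative sketch only: normalized mutual information between two
# discretized variables, using the standard library.
import math
from collections import Counter


def discretize(values, bins=3):
    """Assign each value to one of `bins` equal-width bins."""
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    width = (high - low) / bins
    return [min(int((v - low) / width), bins - 1) for v in values]


def entropy(labels):
    """Shannon entropy (natural log) of a list of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(x_labels, y_labels):
    """Mutual information divided by the mean of the two entropies."""
    n = len(x_labels)
    px, py = Counter(x_labels), Counter(y_labels)
    joint = Counter(zip(x_labels, y_labels))
    mi = sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in joint.items()
    )
    mean_entropy = (entropy(x_labels) + entropy(y_labels)) / 2
    return mi / mean_entropy if mean_entropy > 0 else 0.0
```

Features whose score against the discretized fold change falls below a chosen cut-off would then be filtered out.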
Principal component analysis is a common technique to reduce the dimension of feature space (Wold, 1987). When performing PCA, the information threshold is set to 0.95, which indicates that after dimension reduction, at least 95% of the original information should be preserved. By PCA, the dimension of feature space is reduced from 19 to 5.
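By way of a non-limiting illustration, the effect of the 0.95 information threshold may be sketched as choosing the smallest number of principal components whose cumulative explained variance reaches the threshold. The function name and the example variances are hypothetical; in practice a library routine such as the PCA class of Scikit-learn can perform this selection when given a fractional number of components.

```python
# Illustrative sketch only: pick the number of components from the variances
# (eigenvalues of the covariance matrix) explained by each component.

def components_for_threshold(variances, threshold=0.95):
    """Return the smallest k whose top-k variances explain >= threshold."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)
```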
Linear regression is performed on the five-dimensional data obtained from PCA in the above step. Note that in this case, there are only 26 sets of data available. Since linear regression is a simple regression function with few regression coefficients, using linear regression can reduce the possibility of overfitting.
Intercept: 3.1860075866538455 coefficient: [−0.03175369, −0.27855286, −0.03642713, 0.08569549, 0.45963195, −0.34116744]
To evaluate the regression model, the standard approach is to split the data into a training set and a testing set, usually with a ratio of 6:4. However, in this case, there are only 26 sets of data, so simply splitting the data once into training and testing sets may not be able to tell whether the model is overfitted. Therefore, a technique called cross-validation is used. Each time, 15 data points are randomly picked as the training set, and the remaining 11 as the testing set. The process is repeated 10 times, and the error of the model is calculated by averaging the results of the 10 trials. As can be seen from the error scores of this model, the score varies a lot with different divisions of training and testing set, which indicates overfitting due to small data size.
−36.86732892, −2.31933821, −11.36632332, 65.89540659, −1.81236746, 0.42632973, −38.64697631, 0.84348793, −168.59368642, −3.72672957
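By way of a non-limiting illustration, the repeated random 15/11 split described above may be sketched as follows. For brevity the sketch fits a one-feature least-squares line rather than a regression on five principal components, and the use of the coefficient of determination as the score is an assumption, since the metric behind the listed scores is not stated.

```python
# Illustrative sketch only: repeated random train/test splits with a
# single-feature least-squares fit.
import random


def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return slope, mean_y - slope * mean_x


def r_squared(ys, predictions):
    """Coefficient of determination; it can be negative for poor fits."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    return 1.0 - ss_res / ss_tot if ss_tot else 0.0


def repeated_random_split(xs, ys, n_train=15, n_trials=10, seed=0):
    """Score the model on n_trials random splits and return all scores."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    scores = []
    for _ in range(n_trials):
        rng.shuffle(indices)
        train, test = indices[:n_train], indices[n_train:]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        predictions = [slope * xs[i] + intercept for i in test]
        scores.append(r_squared([ys[i] for i in test], predictions))
    return scores
```

A large spread among the returned scores, as observed above, suggests that the fitted model depends strongly on which points fall into the training set.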
The structural and non-structural parameters can be calculated using a Python package called ViennaRNA. The Scikit-learn package is used for data analysis (Pedregosa, 2011).
In the data analysis part of the last section, a regression function is calculated to predict the performance of a specific aptamer. The software introduced will make use of the structural information and scoring function mentioned above to help screen and select the aptamer with the best performance in prediction.
The input of the software is a .fasta file containing the viral RNA sequence. A screening window slides from the first position to the end of the whole sequence. In each position, the sequence inside the window is selected as a candidate for aptamer design and is evaluated based on its MFE structure and the scoring function discussed previously. Each time the window moves by one nucleotide. After reaching the endpoint of the sequence, the program outputs aptamer sequences with scores higher than the predefined threshold.
Traditionally, RNA aptamers are generated using the SELEX procedure, which takes a long time and has no guarantee of generating the optimum design. In this example, a rational way of designing RNA aptamers is proposed. Based on pairs of Spinach aptamers and targets, several structural and non-structural parameters that can potentially affect the performance of an aptamer are identified. Structural features can help detect unpromising designs, while non-structural ones can be used to derive a regression model. For non-structural features, feature selection based on mutual information is used to filter out irrelevant parameters, and principal component analysis is used to reduce data dimension. Multivariate linear regression is then performed in the reduced dimension to generate a scoring function that can be used to predict the performance (fold change) of aptamers. This model is used along with structural features in the designing software to help screen and select aptamer designs. As more data points are obtained, the prediction of the software will become more accurate. Besides Spinach-DFHBI and influenza virus, the same algorithm can potentially be applied to other fluorogens and other target RNA sequences.
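By way of a non-limiting illustration, the sliding-window screen may be sketched as follows. The function names are hypothetical, and the score argument is a placeholder for the combination of MFE structure checks and the regression-based scoring function, which is not reproduced here.

```python
# Illustrative sketch only: slide a fixed-size window over a viral sequence
# and keep the windows whose score exceeds a threshold.

def read_fasta_sequence(lines):
    """Concatenate the sequence lines of the first record in a FASTA file."""
    parts, seen_header = [], False
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seen_header:
                break  # stop at the second record
            seen_header = True
            continue
        parts.append(line.upper())
    return "".join(parts)


def screen_windows(sequence, score, window=22, threshold=0.0):
    """Return (start, subsequence, score) for windows scoring above threshold.

    `score` is a placeholder callable mapping a candidate target
    subsequence to a number; the window advances one nucleotide at a time.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        candidate = sequence[start:start + window]
        value = score(candidate)
        if value > threshold:
            hits.append((start, candidate, value))
    return hits
```

In use, the lines of the .fasta file would be passed to read_fasta_sequence and the resulting sequence to screen_windows together with the chosen scoring callable.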
Number | Date | Country
---|---|---
62911901 | Oct 2019 | US