This application claims priority to Chinese application number 201811549079.8, filed Dec. 18, 2018. The above-mentioned patent application is incorporated herein by reference in its entirety.
The present invention relates to the field of molecular genetics techniques, and in particular, to a method for identifying plant lncRNA.
Long non-coding RNA (“lncRNA”) refers to a class of regulatory transcripts that have no protein-coding function and are greater than 200 nt in length. Research indicates that lncRNAs can regulate the expression of genes at multiple levels, thus affecting the growth and development of plants, such as rice pollen fertility and Arabidopsis photomorphogenesis. Plant growth is a complex process regulated by multiple genes at multiple levels, and the interactions between the various genetic factors are diverse. At present, the mechanisms of action of lncRNAs are still unclear. The study of the interaction between an lncRNA and a gene is mainly based on the principle of complementary base pairing, and an lncRNA can regulate gene expression by interacting with its target gene in cis or in trans at the transcriptional, post-transcriptional, and epigenetic levels.
At present, predictions of the interactions between plant lncRNAs and their target genes consider only the sequence similarity between the two transcripts, which produces false positive results when identifying lncRNA and target gene interactions. Moreover, the prediction mode is relatively simple and cannot accurately detect a functional gene that interacts with the lncRNA. Therefore, the prior art lacks a method for accurately identifying plant lncRNA and target gene interaction.
Thus, it is desirable to provide a method for identification of plant lncRNA and target gene interaction, to address these and other deficiencies of the current art.
To achieve the above purposes and overcome the technical defects in the art, a method is provided that can accurately identify the interaction relationship between a plant lncRNA and a gene.
In one embodiment, a method for identifying plant lncRNA and target gene interaction includes the following steps: (1) obtaining population SNP genotype data of a plant candidate lncRNA and a plant candidate gene; (2) obtaining population expression abundance data of the plant candidate gene in the tested tissue; (3) performing phenotypic measurement on a tested trait to obtain the population phenotypic data; (4) performing association analysis using the population SNP genotype data in step (1) and the target trait population phenotypic data in step (3) to determine SNP loci significantly associated with the plant target trait; the determining condition including: the SNP loci significantly associated with the plant target trait simultaneously include the SNP loci in the plant candidate lncRNA and the SNP loci in the plant candidate gene; (5) performing association mapping analysis using the population SNP genotype data in step (1) and the population expression quantity data in step (2) to determine the SNP loci significantly associated with the expression level of the plant candidate gene; the determining condition including: the SNP loci within the plant candidate lncRNA are significantly associated with the expression level of the candidate gene; (6) calculating the correlation coefficient r between the population expression level data in step (2) and the target trait population phenotypic data in step (3) to determine the correlation therebetween; the determining condition including: the correlation coefficient r>0.5 or r<−0.5; the formula for calculating the correlation coefficient r being as follows:
where X is the expression quantity data of the plant candidate gene in the detected tissue, and Y is the target trait population phenotypic data; and (7) when the determining conditions in steps (4) through (6) are satisfied simultaneously, indicating that the plant candidate lncRNA and the plant candidate gene have an interaction relationship, and together affect the phenotypic variation of the plant target trait.
In one embodiment, the plant candidate lncRNA and the plant candidate gene in step (1) are expressed in the same tissue of a plant.
In another embodiment, the population SNP genotype data in step (1) is obtained based on plant whole genome re-sequencing data.
In a further embodiment, the frequency of the population SNP genotype of the plant candidate lncRNA and the plant candidate gene in step (1) is greater than 10%.
In yet another embodiment, software used for the association analysis in step (4) and step (5) is TASSEL v5.0.
In one embodiment, a model used for the association analysis is a mixed linear model.
In another embodiment, the association analysis method includes: obtaining a significance level P value of each SNP locus associated with the phenotype by using the software TASSEL v5.0; performing an FDR test on the P values by using Q-value software to obtain Q values; and screening SNP loci with P≤0.01 and Q≤0.1 as SNP loci significantly associated with the plant target trait.
In a further embodiment, the method for obtaining the population SNP genotype data in step (1) includes: performing whole genome sequencing on each individual in the used natural population to respectively obtain genomic sequences; performing sequence alignment on the genomic sequences to obtain whole genome genotype SNP data; and performing alignment using the plant candidate lncRNA and the plant candidate gene to the reference genome, and combining the whole genome genotype SNP data to obtain the population SNP genotype data of the plant candidate lncRNA and the plant candidate gene.
Embodiments of the invention provide a method for identifying plant lncRNA and gene interaction. Previous approaches to the interaction relationship between an lncRNA and its target gene consider only sequence similarity, so the genes identified as interacting with the lncRNA include false positives. Moreover, identifying the interaction relationship between the lncRNA and the gene through sequence similarity alone lacks biological significance. Therefore, the present invention utilizes a population genetics strategy to provide a method for identifying plant lncRNA and target gene interaction that can accurately detect a functional gene interacting with the lncRNA, which has important biological significance.
The results of examples of the present invention show that the interaction relationship between the Populus tomentosa lncRNA LNC-0052611 and the gene Pto-COMT25 is obtained by the method provided by the present invention, and the interaction relationship affects the phenotypic variation of the diameter at breast height (DBH) of P. tomentosa.
Various additional features and advantages of the invention will become more apparent to those of ordinary skill in the art upon review of the following detailed description of one or more illustrative embodiments taken in conjunction with the accompanying drawing. The accompanying drawing, which is incorporated in and constitutes a part of this specification, illustrates one or more embodiments of the invention and, together with the general description given above and the detailed description given below, explains the one or more embodiments of the invention.
The sole FIGURE is a flowchart showing the analysis of an identification method according to one embodiment of the invention.
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawing. To make the objectives, features, and advantages of the present invention clearer, the embodiments are described in more detail below with reference to the accompanying drawing and specific implementations.
In some embodiments, the present invention provides a method for identifying plant lncRNA and gene interaction, including the following steps: (1) obtaining population SNP genotype data of a plant candidate lncRNA and a plant candidate gene; (2) obtaining population expression data of the plant candidate gene in the studied tissue; (3) performing phenotypic measurement of a tested trait to obtain the population phenotypic data; (4) performing association analysis using the population SNP genotype data in step (1) and the target trait population phenotypic data in step (3) to determine SNP loci significantly associated with the plant target trait; the determining condition including: the SNP loci significantly associated with the plant target trait simultaneously include the SNP loci in the plant candidate lncRNA and the SNP loci in the plant candidate gene; (5) performing association analysis using the population SNP genotype data in step (1) and the population expression quantity data in step (2) to determine the SNP loci associated with the expression level of the plant candidate gene; the determining condition including: the SNP loci within the plant candidate lncRNA are significantly associated with the expression level of the candidate gene; (6) calculating the correlation coefficient r between the population expression level data in step (2) and the target trait population phenotypic data in step (3) to determine the correlation therebetween; the determining condition including: the correlation coefficient r>0.5 or r<−0.5; the formula for calculating the correlation coefficient r being as follows:
where X is the expression quantity data of the plant candidate gene in the detected tissue, and Y is the target trait population phenotypic data; and (7) when the determining conditions in steps (4) through (6) are satisfied simultaneously, indicating that the plant candidate lncRNA and the plant candidate gene have an interaction relationship, and together affect the phenotypic variation of the plant target trait.
The method obtains the population SNP genotype data of the plant candidate lncRNA and the plant candidate gene.
The type of the plant is not particularly limited in the present invention, and in examples of the present invention, the plant is preferably P. tomentosa.
In one embodiment, the plant candidate lncRNA and the plant candidate gene are preferably expressed in the same tissue of the plant. In another embodiment, the frequency of the population SNP genotype of the plant candidate lncRNA and the plant candidate gene is preferably greater than 10%.
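The 10% frequency screen can be illustrated with a minimal sketch. It assumes that "frequency of the population SNP genotype" is read as the minor allele frequency of a biallelic SNP, which is an interpretation and not something the text states; the function names and the genotype encoding (two-character strings such as "AG") are hypothetical.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Minor allele frequency of one biallelic SNP.

    genotypes: list of diploid calls such as "AA", "AG", "GG";
    missing calls are given as None and are skipped.
    (Encoding and biallelic assumption are illustrative only.)
    """
    counts = Counter()
    for call in genotypes:
        if call is None:
            continue
        counts.update(call)  # counts both alleles of the diploid call
    total = sum(counts.values())
    if total == 0 or len(counts) < 2:
        return 0.0  # entirely missing or monomorphic
    return min(counts.values()) / total


def filter_snps(snp_table, threshold=0.10):
    """Keep SNPs whose minor allele frequency exceeds the threshold."""
    return {
        name: calls
        for name, calls in snp_table.items()
        if minor_allele_frequency(calls) > threshold
    }


# Toy data, not taken from the study.
toy = {
    "SNP_a": ["AA", "AG", "GG", "AA"],
    "SNP_b": ["CC"] * 19 + ["CT"],
}
kept = filter_snps(toy)
```

In this toy input, SNP_a has a minor allele frequency of 3/8 and is kept, while SNP_b has 1/40 and is dropped.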
In a further embodiment, the population SNP genotype data is preferably obtained based on plant whole genome re-sequencing data. The method for obtaining the population SNP genotype data preferably includes: performing whole genome sequencing on each individual in the used natural population to respectively obtain genome sequences; performing sequence alignment on the genome sequences to obtain the whole genome genotype SNP data; and aligning the plant candidate lncRNA and the plant candidate gene to the reference genome, and combining the result with the whole genome genotype SNP data to obtain the population SNP genotype data of the plant candidate lncRNA and the plant candidate gene. The software used for the alignment is preferably BioEdit. The reference genome is preferably a published genome of the plant. The method begins with whole genome re-sequencing, in which each SNP locus has a fixed position on the genome. The positions of the two candidate genetic factors (the lncRNA and the candidate gene) in the reference genome are then determined by sequence alignment. The SNP data within each candidate can therefore be determined from its position in the genome.
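The position-based extraction described above can be sketched as a simple interval lookup. This is only an illustration: the tuple layout, the chromosome name, and the coordinates are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
def snps_in_region(genome_snps, chrom, start, end):
    """Return IDs of SNPs that fall inside one candidate's genomic interval.

    genome_snps: iterable of (chromosome, position, snp_id) tuples taken
    from the whole genome SNP data; start and end are the 1-based,
    inclusive coordinates obtained by aligning the candidate to the
    reference genome.
    """
    return [
        snp_id
        for snp_chrom, pos, snp_id in genome_snps
        if snp_chrom == chrom and start <= pos <= end
    ]


# Hypothetical whole genome SNP records and candidate coordinates.
genome_snps = [
    ("Chr01", 1050, "s1"),
    ("Chr01", 1980, "s2"),
    ("Chr01", 5200, "s3"),
    ("Chr02", 1500, "s4"),
]
candidate_snps = snps_in_region(genome_snps, "Chr01", 1000, 2000)
```

Running the same lookup once for the lncRNA interval and once for the gene interval gives the two SNP sets used in the later association steps.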
In some embodiments, whole genome sequencing is preferably respectively performed on individuals in the used natural population to respectively obtain genomic sequences. The method for sequencing the whole genome is not particularly limited in the present invention, and a conventional sequencing method can be used.
In a further embodiment, sequence alignment is performed on the genomic sequences to obtain whole genome SNP genotype data. The method for sequence alignment is not particularly limited in the present invention, and a conventional sequence alignment method can be used.
In yet another embodiment, alignment is performed using the plant candidate lncRNA and the plant candidate gene to a reference genome, and the whole genome genotype SNP data is combined to obtain the population SNP genotype data.
In one embodiment, population expression quantity data of the plant candidate gene in the tissue is obtained. The method for obtaining the population expression quantity data of the plant candidate gene in the tissue is not particularly limited in the present invention, and a conventional method for obtaining the expression quantity data of the tissue can be used. The tissue is preferably a single specific tissue. The tissue in which the population expression of the plant candidate gene is measured is preferably the same tissue in which the plant candidate lncRNA and the plant candidate gene are both expressed. The tissue is not otherwise limited in the present invention, and any tissue of the plant can be used.
In another embodiment, phenotypic measurement is performed on a plant target trait to obtain the population phenotypic data. The method for performing phenotypic measurement on the plant target trait is not particularly limited in the present invention, and a conventional method can be used. The target trait is not particularly limited in the present invention, and any trait of the plant can be used.
In a further embodiment, association analysis is performed using the population SNP genotype data and the target trait population phenotypic data to determine the SNP loci significantly associated with the plant target trait, where the determining condition includes: the SNP loci significantly associated with the plant target trait simultaneously include SNP loci in the plant candidate lncRNA and SNP loci in the plant candidate gene. Software used for the association analysis is preferably TASSEL v5.0. A model used for the association analysis is preferably a mixed linear model. The method of association analysis preferably includes: obtaining a significance level P value of each SNP locus associated with the phenotype by using the software TASSEL v5.0; performing an FDR test on the P values by using Q-value software to obtain Q values; and screening SNP loci with P≤0.01 and Q≤0.1 as SNP loci significantly associated with the plant target trait. The purpose of performing the multiple-testing correction to obtain a Q value is to exclude false positive results. The resulting significantly associated SNP loci need to contain SNP loci from both the plant candidate lncRNA and the plant candidate gene, but the number and attributes of the SNP loci are not limited.
In a further embodiment, association analysis is performed on the population SNP genotype data and the population expression data to determine the SNP loci associated with the expression level of the plant candidate gene, where the determining condition includes: the SNP loci of the plant candidate lncRNA are significantly associated with the expression level of the candidate gene. The method for performing association analysis on the population SNP genotype data and the population expression data is the same as the method for performing association analysis on the population SNP genotype data and the target trait population phenotypic data, and will not be described again herein. The SNP loci in the plant candidate lncRNA need to be significantly associated with the expression level of the plant candidate gene, but the number and attributes of the SNP loci are not limited.
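The P≤0.01 and Q≤0.1 screen can be sketched as follows. Note the substitution: the text obtains Q values from Q-value software, whereas this sketch uses the Benjamini-Hochberg adjustment as a simple stand-in, so its adjusted values will generally differ from those of the Q-value software. The function names and P values are hypothetical; the P values would in practice come from the TASSEL output.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (stand-in for Q values)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_snps(p_by_snp, p_cut=0.01, q_cut=0.1):
    """Names of SNPs passing both the P and the adjusted-P thresholds."""
    names = list(p_by_snp)
    q_values = benjamini_hochberg([p_by_snp[n] for n in names])
    return [
        name
        for name, q in zip(names, q_values)
        if p_by_snp[name] <= p_cut and q <= q_cut
    ]


# Hypothetical P values for four SNPs.
toy_p = {"s1": 0.001, "s2": 0.009, "s3": 0.04, "s4": 0.5}
hits = significant_snps(toy_p)
```

The determining condition of this step is then met when the resulting list contains at least one SNP from the candidate lncRNA and at least one from the candidate gene.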
In yet another embodiment, the correlation coefficient r between the population expression data and the target trait population phenotypic data is calculated to determine the correlation therebetween, where the determining condition includes: the correlation coefficient r>0.5 or r<−0.5, and the formula for calculating the correlation coefficient r is as follows:
where X is the expression quantity data of the plant candidate gene in the detected tissue, and Y is the target trait population phenotypic data.
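The formula itself is not reproduced in this text. Assuming the intended coefficient is the standard Pearson product-moment correlation (an assumption; the sample size n, the indexed values X_i and Y_i, and the sample means are notation introduced here), it can be written as:

```latex
r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}
         {\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2}\,
          \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}}
```

Here X_i and Y_i are the expression quantity and the trait value of the i-th individual, and the bars denote their means over the n individuals of the population.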
In one embodiment, if the correlation coefficient r>0.5 or r<−0.5, a strong correlation exists between the population expression data and the target trait population phenotypic data, indicating that the expression level of the plant candidate gene can greatly affect the variation of the target trait. If the correlation coefficient r lies between −0.5 and 0.5, the correlation therebetween is low.
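The correlation condition can be sketched in a few lines, assuming that r is the standard Pearson correlation coefficient (the text does not name the coefficient, so this is an assumption). The function names and the sample values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant data")
    return sxy / math.sqrt(sxx * syy)


def passes_correlation_condition(r, threshold=0.5):
    """True when r > threshold or r < -threshold, as in step (6)."""
    return r > threshold or r < -threshold


# Hypothetical expression values (X) and trait values (Y).
expression = [1.0, 2.0, 3.0, 4.0]
trait = [2.1, 3.9, 6.2, 7.8]
r = pearson_r(expression, trait)
strong = passes_correlation_condition(r)
```

The inequalities are strict, so a coefficient of exactly 0.5 or −0.5 does not satisfy the condition.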
In another embodiment, when the determining conditions in steps (4) through (6) are satisfied simultaneously, it is indicated that the plant candidate lncRNA and the plant candidate gene have an interaction relationship, and together affect the phenotypic variation of the plant target trait. In the present invention, the interaction pre-selection between the plant candidate lncRNA and the plant candidate gene is premised on the regulation of the selected target trait.
The method for identifying plant lncRNA and gene interaction according to the present invention will be further described in detail below with reference to specific examples. The technical solutions of the present invention include, but are not limited to, the following examples.
The interaction between the P. tomentosa lncRNA LNC-0052611 and the gene Pto-COMT25 is identified using a method for identifying plant lncRNA and gene interaction provided by embodiments of the present invention.
Step S1: SNP genotype data of the lncRNA LNC-0052611 and the gene Pto-COMT25 in the natural population of P. tomentosa is obtained, including the following specific steps:
Step S11: the one-year-old “LM50” clone of P. tomentosa planted in Guan County, Shandong Province was taken as the experimental material, and its mature xylem was collected for transcriptome sequencing. To prevent RNA degradation, the collected mature xylem was placed in liquid nitrogen (−196° C.) for storage immediately after collection. RNA of the collected mature xylem was extracted using a Plant Qiagen RNeasy kit (Qiagen China, Shanghai, China) and, after quality assessment, was transferred to a biotechnology company for lncRNA and transcriptome sequencing to detect the lncRNAs and mRNAs expressed in the tissue. The lncRNA LNC-0052611 and the gene Pto-COMT25, both expressed in the tissue, were selected as candidate genetic factors, and the interaction relationship therebetween was further analyzed.
Step S12: Firstly, genomic DNA was extracted from the 435 individuals of the natural population of P. tomentosa and used for re-sequencing, and the poplar reference genome, i.e., the genome of P. trichocarpa, was used for sequence alignment to obtain whole genome SNP genotype data. Secondly, the P. tomentosa lncRNA LNC-0052611 and the gene Pto-COMT25 were aligned to the reference genome using BioEdit software in order to extract the population SNP genotype data of the two candidate genetic factors. Finally, the loci with SNP genotype frequencies greater than 10% were screened as candidate SNPs for the P. tomentosa lncRNA LNC-0052611 and the gene Pto-COMT25. See Table 1 for details of the candidate SNPs.
Step 2: the mature xylems of 435 individuals in the natural population of P. tomentosa are collected, and the RNAs thereof are extracted respectively and transferred to the biotechnology company for transcriptome sequencing to obtain the population expression abundance data of genes expressed in the xylem of P. tomentosa, and the expression abundance of the candidate gene Pto-COMT25 in 435 individuals of the population is extracted.
Step 3: the DBH index of 435 individuals in the natural population of P. tomentosa is determined by using a growth trait measurement tool, and the phenotypic data of the index in the population is obtained.
Step 4: association analysis is performed between the SNPs within the lncRNA LNC-0052611 and the gene Pto-COMT25 and the population DBH index of P. tomentosa by using a mixed linear model in TASSEL v5.0 software, in order to determine the SNP loci significantly associated with the DBH of P. tomentosa, where the determining condition includes: the SNP loci significantly associated with the plant target trait simultaneously include SNP loci in the plant candidate lncRNA and SNP loci in the plant candidate gene. The results show that SNP7 in the lncRNA LNC-0052611 and SNP45 and SNP61 in Pto-COMT25 are significantly associated with the DBH trait (Table 2).
Step 5: association analysis is performed on the SNPs in the lncRNA and the population expression levels of Pto-COMT25 by using the mixed linear model in the TASSEL v5.0 software, and the SNP loci significantly associated with the expression level of Pto-COMT25 are screened, where the screening condition includes: the SNP loci within the plant candidate lncRNA are significantly associated with the expression level of the candidate gene. It is found that SNP2, SNP6, SNP7, and SNP11 in lncRNA LNC-0052611 are significantly associated with the expression level of Pto-COMT25 (Table 3), which indicates that LNC-0052611 can affect the expression of Pto-COMT25 to some extent.
Step 6: the correlation coefficient is calculated using the following formula:
where X is the expression quantity data of the plant candidate gene in the detected tissue, and Y is the target trait population phenotypic data. The correlation coefficient between the expression quantity of Pto-COMT25 in the population and the DBH trait of the population is analyzed. The result shows that the correlation coefficient between the expression quantity and the DBH trait is r=0.553, which exceeds the 0.5 threshold and indicates that the expression level of Pto-COMT25 can affect the variation of the DBH trait in P. tomentosa to some extent.
Step 7: the calculation results of steps (4) through (6) are considered together. The association results in step (4) show that the SNP loci in lncRNA LNC-0052611 and Pto-COMT25 have a significant genetic effect on the variation of the DBH trait in P. tomentosa, which indicates that LNC-0052611 and Pto-COMT25 may affect the DBH of P. tomentosa. The analysis results in step (5) indicate that LNC-0052611 may regulate the expression of Pto-COMT25. The results in step (6) indicate that the expression level of Pto-COMT25 may affect the variation of the DBH trait of P. tomentosa to some extent. In view of the foregoing three points, an interaction relationship between lncRNA LNC-0052611 and the gene Pto-COMT25 exists, and their interaction affects the variation of the DBH trait in P. tomentosa.
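The way the three determining conditions combine can be sketched as a single check, here applied to the results reported in this example (SNP7, SNP45 and SNP61 for the trait; SNP2, SNP6, SNP7 and SNP11 for expression; r=0.553). The function and parameter names are hypothetical and serve only to illustrate the logic.

```python
def interaction_supported(trait_snps_lncrna, trait_snps_gene,
                          expression_snps_lncrna, r, r_cut=0.5):
    """Return True when conditions (4), (5) and (6) all hold.

    trait_snps_lncrna / trait_snps_gene: SNPs in the lncRNA and in the
    gene that are significantly associated with the trait.
    expression_snps_lncrna: SNPs in the lncRNA that are significantly
    associated with the expression level of the gene.
    r: correlation between gene expression and the trait.
    """
    # (4) trait-associated SNPs occur in both the lncRNA and the gene
    cond4 = bool(trait_snps_lncrna) and bool(trait_snps_gene)
    # (5) at least one lncRNA SNP is associated with gene expression
    cond5 = bool(expression_snps_lncrna)
    # (6) correlation is strong in either direction
    cond6 = r > r_cut or r < -r_cut
    return cond4 and cond5 and cond6


result = interaction_supported(
    trait_snps_lncrna={"SNP7"},
    trait_snps_gene={"SNP45", "SNP61"},
    expression_snps_lncrna={"SNP2", "SNP6", "SNP7", "SNP11"},
    r=0.553,
)
print(result)
```

If any one of the three inputs fails its condition, for example when no lncRNA SNP is associated with the trait, the check returns False and no interaction is inferred.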
It can be concluded from the above that an interaction relationship between P. tomentosa lncRNA LNC-0052611 and the gene Pto-COMT25 exists, and the interaction relationship affects the phenotypic variation of the DBH of P. tomentosa.
The embodiments described above are only descriptions of preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Various variations and modifications can be made to the technical solution of the present invention by those of ordinary skill in the art without departing from the design and spirit of the present invention. Such variations and modifications all fall within the scope defined by the claims of the present invention.
Number | Date | Country | Kind |
---|---|---|---|
201811549079.8 | Dec 2018 | CN | national |