The present invention relates to an apparatus and method for identifying the secondary structure of a protein using alpha carbon coordinates, and more particularly to an apparatus and method for identifying the secondary structure of a protein when only alpha carbon coordinates of the protein are given.
The structure of a protein, a biomolecule responsible for important functions related to life phenomena in vivo, is receiving attention because its functions are closely related to its structure. The structure of a protein is described at primary, secondary, tertiary and quaternary levels. The primary structure is the amino acid sequence of the protein, and the secondary structure indicates a helix, a strand or a random coil, which are predetermined patterns made up of amino acid residues. The tertiary structure is the three-dimensional structure composed of secondary structures, and the quaternary structure is the form in which several protein chains interact with each other. Among these protein structures, the secondary structure is the basis of the tertiary structure, and thus a clear definition of the secondary structure is regarded as important in research into protein structures.
Methods of defining the secondary structure include using the pattern of hydrogen bonding between the hydrogen atom (H) of an amide and the oxygen atom (O) of a carbonyl, derived from the atom coordinates of the protein obtained via X-ray or nuclear magnetic resonance (NMR). In order to utilize this method, the positions of the backbone atoms of the protein, such as H, N, C and O, have to be accurately determined, and their coordinates are used to calculate the presence or absence of hydrogen bonds, thereby determining the secondary structure. A typical program using this method is DSSP (Dictionary of Protein Secondary Structure).
When DSSP runs, it estimates the position of H using information about O, N and C of respective amino acids, calculates the hydrogen bond energy using the coordinates of the four atoms, defines a hydrogen bond as present when the calculated energy is less than −0.5 kcal/mol, and determines the secondary structure based on information about the hydrogen bonds. The hydrogen bond energy is calculated by the following Equation 1.
wherein q1 is the charge amount of hydrogen, q2 is the charge amount of oxygen, rON is the O—N distance, rCH is the C—H distance, rOH is the O—H distance, and rCN is the C—N distance.
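Equation 1 itself does not appear in this text. For reference, the hydrogen bond energy as published for DSSP, written with the symbols defined above, has the form shown below. The factor f = 332 and the partial charges of 0.42e (on C and O) and 0.20e (on N and H) come from the published DSSP method rather than from this description; with distances in ångströms, E is obtained in kcal/mol.

```latex
E = q_1 \, q_2 \left( \frac{1}{r_{ON}} + \frac{1}{r_{CH}} - \frac{1}{r_{OH}} - \frac{1}{r_{CN}} \right) f
```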
Another method of determining the secondary structure of a protein is STRIDE (Structural Identification). In STRIDE, the position of a hydrogen is estimated and then the secondary structure is determined using such information, and this method is different from DSSP in terms of the energy calculation equation which determines the presence or absence of a hydrogen bond. On the other hand, PROSS determines the secondary structure using dihedral angles of a backbone.
According to DEFINE and VoTAP, the secondary structure of a protein is defined using only information about the alpha carbon (Cα). DEFINE simply compares the distance between alpha carbons with the standard distance of an ideal secondary structure and thus the secondary structure is defined when the corresponding distance is equal to the standard distance. VoTAP defines the three-dimensional structure of Voronoi tessellation per amino acid of a protein thus determining the secondary structure using the state of contact surface between the three-dimensional structures.
When X-ray or electron microscopy (EM) is used to investigate the structure of a very large protein, it mainly provides only the alpha carbon coordinates, and even for a protein having a known structure, only the alpha carbon coordinates may be known for some amino acids. In this case, the presence or absence of a hydrogen bond has to be determined using only the alpha carbon coordinates to thereby identify the secondary structure of a protein. Compared to DSSP, which uses all of the known atom coordinates, the above conventional methods have an accuracy of a little over 80%. In order to accurately determine the presence or absence of a hydrogen bond, the orientation of the four atoms (H, N, C, O) of the hydrogen bond is very important, but it is not easy to determine based on only the alpha carbon coordinates. Thus there is a need for a method that is able to accurately determine the secondary structure of a protein using only the alpha carbon coordinates.
Therefore, an object of the present invention is to provide an apparatus and method for identifying the secondary structure of a protein using alpha carbon coordinates, in which the secondary structure of a protein for which only the alpha carbon coordinates are known may be identified with high accuracy.
Another object of the present invention is to provide a computer-readable storage medium which stores a program that may execute, on a computer, the method of identifying the secondary structure of a protein using alpha carbon coordinates, in which the secondary structure of a protein for which only the alpha carbon coordinates are known may be identified with high accuracy.
In order to accomplish the above objects, the present invention provides an apparatus for identifying the secondary structure of a protein using alpha carbon coordinates, comprising a pseudo center fixing unit configured to receive a series of alpha carbon coordinates included in amino acid sequences of a target protein so that pseudo centers corresponding to respective alpha carbons are disposed at positions fixed between the respective alpha carbons and alpha carbons adjacent thereto; a helix determination unit configured to determine, based on a dihedral angle and a distance between a preset number of consecutive pseudo centers among the pseudo centers fixed for the target protein, whether the secondary structure formed by a plurality of amino acids corresponding to the consecutive pseudo centers is a helix; and a strand determination unit configured to determine, based on distances between pseudo centers included in different pseudo center sequences in a plurality of pseudo center sequences comprising a preset number of consecutive pseudo centers among pseudo centers other than those corresponding to the helix, whether the secondary structure formed by a plurality of amino acids corresponding to the pseudo centers of respective pseudo center sequences is a strand.
In addition, the present invention provides a method of identifying the secondary structure of a protein using alpha carbon coordinates, comprising disposing pseudo centers corresponding to respective alpha carbons at positions fixed between the respective alpha carbons and alpha carbons adjacent thereto based on a series of alpha carbon coordinates included in amino acid sequences of a target protein; determining, based on a dihedral angle and a distance between a preset number of consecutive pseudo centers among the pseudo centers fixed for the target protein, whether the secondary structure formed by a plurality of amino acids corresponding to the consecutive pseudo centers is a helix; and determining, based on distances between pseudo centers included in different pseudo center sequences in a plurality of pseudo center sequences comprising a preset number of consecutive pseudo centers among pseudo centers other than those corresponding to the helix, whether the secondary structure formed by a plurality of amino acids corresponding to the pseudo centers of respective pseudo center sequences is a strand.
In an apparatus and method for identifying the secondary structure of a protein using alpha carbon coordinates according to the present invention, the secondary structure including amino acids corresponding to pseudo centers can be identified based on the dihedral angle or the distance between pseudo centers using pseudo centers fixed between alpha carbons instead of using the given alpha carbon coordinates in unchanged form. Thereby, even when only the alpha carbon coordinates are known for the amino acids of a protein, the secondary structure of a corresponding protein can be identified, thus increasing the accuracy compared to conventional methods using alpha carbon coordinates.
Hereinafter, a detailed description will be given of an apparatus and method for identifying the secondary structure of a protein using alpha carbon coordinates according to preferred embodiments of the invention with reference to the appended drawings.
As illustrated in
The secondary structure of a protein, which will be identified using the apparatus for identifying the secondary structure of a protein according to the present invention, is first described. Amino acids which are the building blocks of a protein have an amino group (N—H), a carbonyl group (C═O), a side chain (R), and alpha carbon (Cα).
The secondary structure of a protein designates a structure in specific form via hydrogen bonding between the amino acids of a protein as illustrated in
The helix may include an alpha helix and a 3/10 helix, and the strand may include a parallel strand and an anti-parallel strand. A small ring which connects two secondary structures is referred to as a turn, and the structure other than the above three types of secondary structure is a random coil. The ratio of secondary structures of known proteins is given in Table 1 below.
Table 2 below shows the kind and ratio of the helix defined and subdivided by DSSP.
As is apparent from Table 2, the right-handed alpha helix and the right-handed 3/10 helix are abundant, accounting for 36% and 3.8%, respectively, while the others are present to the extent of less than 0.001%.
The alpha helix, which is the most frequent type of helix, includes hydrogen bonds between amino acids that are separated by three intervening amino acids, and the four consecutive amino acids form a helical shape.
The 3/10 helix includes hydrogen bonds between amino acids that are separated by two intervening amino acids, and the three consecutive amino acids form a helical shape.
In addition, the strand which is another kind of secondary structure includes the parallel strand (which is referred to as “parallel”) and the anti-parallel strand (which is referred to as “anti-parallel”).
Although the secondary structure such as a turn is present in addition to the helix and the strand, its ratio of appearance is very small and it is mostly handled like a random coil, and hereinafter in the present invention the turn will be regarded as a random coil.
In order to identify the secondary structure of a given protein as mentioned above, that is, a helix including the alpha helix and the 3/10 helix and a strand including the parallel strand and the anti-parallel strand, the apparatus according to the present invention uses alpha carbon coordinates of the target protein in lieu of hydrogen bond energy. Also, the alpha carbon coordinates are not used unchanged but pseudo centers which are newly disposed between alpha carbons are used.
In order to use the pseudo centers to identify the secondary structure of a protein, the pseudo center fixing unit 110 receives a series of alpha carbon coordinates included in the amino acid sequences of a target protein so that pseudo centers corresponding to respective alpha carbons are disposed at positions fixed between respective alpha carbons and alpha carbons adjacent thereto.
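The placement of pseudo centers can be sketched as follows. This is a minimal illustration only: the text here does not state the exact fixed position, so the sketch assumes the midpoint between each alpha carbon and the next one in the sequence. The function name `pseudo_centers` and the tuple-based coordinate format are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def pseudo_centers(ca_coords: Sequence[Point]) -> List[Point]:
    """Return one pseudo center per pair of consecutive alpha carbons.

    Assumption (not stated in the description): each pseudo center is the
    midpoint of alpha carbon i and alpha carbon i + 1, so a chain of n
    alpha carbons yields n - 1 pseudo centers.
    """
    centers: List[Point] = []
    for a, b in zip(ca_coords, ca_coords[1:]):
        centers.append(tuple((x + y) / 2.0 for x, y in zip(a, b)))
    return centers
```

Under this assumption, pseudo center i corresponds to the pair of alpha carbons i and i + 1; any other fixed fraction along the segment could be substituted without changing the rest of the procedure.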
As illustrated in (b) of
However, in some cases where EM or X-rays are used, as illustrated in (c) of
As is apparent from
Below, a method of identifying the secondary structure of a target protein is described in detail using the pseudo centers corresponding to respective alpha carbons by the pseudo center fixing unit 110.
The helix determination unit 120 is configured such that pseudo centers fixed for the target protein are classified into a plurality of groups comprising the preset number of consecutive pseudo centers, and whether the secondary structure formed by a plurality of amino acids corresponding to the pseudo centers of respective groups is a helix is determined based on the dihedral angle and the distance between the pseudo centers of respective groups.
As mentioned above, the helix includes an alpha helix and a 3/10 helix, and thus the helix determination unit 120 includes an alpha helix determination unit 122 and a 3/10 helix determination unit 124.
The alpha helix determination unit 122 is configured such that the secondary structure formed by the amino acid sequence corresponding to four pseudo centers is determined to be an alpha helix under conditions in which the distance between the first and fourth pseudo centers in four consecutive pseudo centers among the pseudo centers falls in the preset first distance range and the dihedral angle defined by the four pseudo centers falls in the preset first angle range.
Even when the distance between the pseudo center at position N and the pseudo center at position N+3 falls in the first distance range, if the directions of oxygen and hydrogen are inappropriate, the hydrogen bond is not formed. Hence, the dihedral angle must be known to estimate the direction of hydrogen. The dihedral angle is an angle defined by four points, which are pseudo centers at positions N, N+1, N+2 and N+3. If the dihedral angle falls in the preset first angle range, the given sequence is judged to be an alpha helix. For example, when the distance between the pseudo center at position N and the pseudo center at position N+3 falls in the first distance range of 4.21 to 5.23 and the dihedral angle defined by the pseudo centers at positions N, N+1, N+2 and N+3 falls in the first angle range of 43.52° to 78.32°, four consecutive amino acids at positions N to N+3 are determined to form the alpha helix.
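The alpha helix test on one window of four pseudo centers might be sketched as below, using the example ranges quoted above. The function names are hypothetical, the dihedral is computed as a signed angle with the usual right-handed convention (an assumption, since the description does not state a sign convention), and distances are taken in the same unit as the input coordinates.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

ALPHA_DIST_RANGE = (4.21, 5.23)     # first distance range from the example
ALPHA_ANGLE_RANGE = (43.52, 78.32)  # first angle range, in degrees


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees defined by four points."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def is_alpha_window(pc: Sequence[Point], n: int) -> bool:
    """Check pseudo centers n..n+3 against the alpha helix conditions."""
    p0, p1, p2, p3 = pc[n:n + 4]
    d = math.dist(p0, p3)
    angle = dihedral_deg(p0, p1, p2, p3)
    return (ALPHA_DIST_RANGE[0] <= d <= ALPHA_DIST_RANGE[1]
            and ALPHA_ANGLE_RANGE[0] <= angle <= ALPHA_ANGLE_RANGE[1])
```

Both conditions must hold together: the distance alone does not establish the orientation needed for the hydrogen bond, which is why the dihedral angle is checked as well.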
On the other hand, in the case of proline, the oxygen (O) of the carbonyl group may form a hydrogen bond with another amino acid, but the amino group has no hydrogen (H) and thus cannot form a hydrogen bond with another amino acid, unlike the other 19 kinds of amino acids. Accordingly, in the case of proline, even when the dihedral angle and the distance between pseudo centers satisfy the preset ranges and thus a hydrogen bond is judged to be formed, the hydrogen bond cannot actually be formed and is not counted as a bond.
In conclusion, the determination of whether a hydrogen bond is present is repeated, and if such bonds are present consecutively, the secondary structure of the given protein is defined as an alpha helix.
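The repeated determination and the proline exclusion can be illustrated with the sketch below. It assumes that a per-window geometric result is already available, that the caller knows for each window whether the residue that would donate the hydrogen is proline, and that at least two consecutive bonds are needed for a helix segment. The name `helix_segments`, the argument layout and the `min_run` default are all hypothetical and are not taken from the description.

```python
from typing import List, Sequence, Tuple


def helix_segments(window_ok: Sequence[bool],
                   donor_is_proline: Sequence[bool],
                   min_run: int = 2) -> List[Tuple[int, int]]:
    """Return (first, last) window indices of runs of consecutive bonds.

    window_ok[i]        -- window i satisfied the distance and angle ranges
    donor_is_proline[i] -- the would-be hydrogen donor of window i is proline,
                           so the bond is discarded even if the geometry fits
    min_run             -- assumed minimum number of consecutive bonds
    """
    bonds = [ok and not pro for ok, pro in zip(window_ok, donor_is_proline)]
    segments: List[Tuple[int, int]] = []
    start = None
    for i, bonded in enumerate(bonds + [False]):  # sentinel closes last run
        if bonded and start is None:
            start = i
        elif not bonded and start is not None:
            if i - start >= min_run:
                segments.append((start, i - 1))
            start = None
    return segments
```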
The 3/10 helix determination unit 124 is configured such that the secondary structure formed by the amino acid sequence corresponding to four pseudo centers is determined to be a 3/10 helix under conditions in which, in the four consecutive pseudo centers, the distance between the first and third pseudo centers, the distance between the second and fourth pseudo centers and the distance between the first and fourth pseudo centers respectively fall in the preset second, third and fourth distance ranges, and the dihedral angle defined by the four pseudo centers falls in the preset second angle range.
As in the alpha helix, even when the distance between pseudo centers at positions N and N+2 falls in the preset distance range, it still has to be checked whether the directions of oxygen (O) and hydrogen (H) are appropriate to form the hydrogen bond using the dihedral angle. In the alpha helix, because pseudo centers which form the hydrogen bond are spaced apart from each other between which three pseudo centers are interposed, the dihedral angle is calculated using the pseudo centers at both ends forming the hydrogen bond and the pseudo centers (N, N+1, N+2, N+3) therebetween. However, in the case of the 3/10 helix, pseudo centers which form the hydrogen bond are spaced apart from each other between which two pseudo centers are interposed, and thus upon calculating the dihedral angle, the pseudo center at position N+3 (which is 7′ in
As the pseudo center at position N+3 is additionally included, the 3/10 helix determination unit 124 uses, as the additional determination conditions, the distance between the pseudo center at position N+1 (which is 5′ in
In the most preferred embodiment, the secondary structure formed by the amino acid sequence corresponding to four pseudo centers at positions N to N+3 may be determined to be a 3/10 helix under conditions in which the distance between the pseudo center at position N and the pseudo center at position N+2 falls in the second distance range of 4.82 or less, the distance between the pseudo center at position N+1 and the pseudo center at position N+3 falls in the third distance range of 5.24 or less, the distance between the pseudo center at position N and the pseudo center at position N+3 falls in the fourth distance range of 5.14 to 9.12, and the dihedral angle defined by the pseudo centers at positions N, N+1, N+2 and N+3 falls in the second angle range of 42.1° to 119.5°.
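The four conditions of this embodiment can be combined in a single check, sketched below. The function names are hypothetical, "within" is read as an upper bound only, and the dihedral is computed as a signed angle with a right-handed convention, which is an assumption.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def dihedral_deg(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees defined by four points."""
    b1 = tuple(q - p for p, q in zip(p0, p1))
    b2 = tuple(q - p for p, q in zip(p1, p2))
    b3 = tuple(q - p for p, q in zip(p2, p3))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    n1, n2 = cross(b1, b2), cross(b2, b3)
    return math.degrees(math.atan2(math.sqrt(dot(b2, b2)) * dot(b1, n2),
                                   dot(n1, n2)))


def is_310_window(pc: Sequence[Point], n: int) -> bool:
    """Check pseudo centers n..n+3 against the 3/10 helix conditions."""
    p0, p1, p2, p3 = pc[n:n + 4]
    return (math.dist(p0, p2) <= 4.82                # second distance range
            and math.dist(p1, p3) <= 5.24            # third distance range
            and 5.14 <= math.dist(p0, p3) <= 9.12    # fourth distance range
            and 42.1 <= dihedral_deg(p0, p1, p2, p3) <= 119.5)  # second angle range
```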
After determining whether the secondary structure is a helix using the pseudo centers fixed for the target protein, it is determined whether pseudo centers other than those corresponding to the helix correspond to a strand.
The strand determination unit 130 is configured such that, based on distances between pseudo centers included in different pseudo center sequences in a plurality of pseudo center sequences comprising the preset number of consecutive pseudo centers among pseudo centers other than those corresponding to the helix, whether the secondary structure formed by a plurality of amino acids corresponding to the pseudo centers of respective pseudo center sequences is a strand is determined.
In the secondary structure of the protein as mentioned above, the strand includes a parallel strand and an anti-parallel strand, and thus the strand determination unit 130 includes a parallel determination unit 132 and an anti-parallel determination unit 134.
The parallel determination unit 132 is configured such that, under conditions in which the distance between pseudo centers respectively included in different pseudo center sequences proceeding in the same direction falls in the preset fifth distance range and the distance between consecutive pseudo centers of the pseudo centers respectively included in the different pseudo center sequences falls in the preset sixth distance range, the secondary structure formed by the amino acid sequences corresponding to the different pseudo center sequences is determined to be a parallel strand.
In order to identify the secondary structure of a given protein, as illustrated in
The anti-parallel determination unit 134 is configured such that, under conditions in which the distance between pseudo centers respectively included in different pseudo center sequences proceeding in the opposite directions falls in the preset seventh distance range, the distance between consecutive pseudo centers of the pseudo centers respectively included in the different pseudo center sequences falls in the preset eighth distance range, and the distance between alpha carbons respectively corresponding to the pseudo centers respectively included in the different pseudo center sequences falls in the preset ninth distance range, the secondary structure formed by the amino acid sequences corresponding to the different pseudo center sequences is determined to be an anti-parallel strand.
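The parallel and anti-parallel conditions share the same shape and might be sketched as one check, shown below. Several points are assumptions made only for illustration: the numeric ranges are not given in this text and are therefore passed in as arguments; "consecutive pseudo centers" is read as the next pseudo center on each sequence (moving backwards on the second sequence in the anti-parallel case); and pseudo center i is taken to correspond to alpha carbon i. The name `strand_pair_ok` is hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Range = Tuple[float, float]


def _in_range(value: float, bounds: Range) -> bool:
    return bounds[0] <= value <= bounds[1]


def strand_pair_ok(pc: Sequence[Point], ca: Sequence[Point], i: int, j: int,
                   pair_range: Range, next_range: Range,
                   ca_range: Optional[Range] = None,
                   antiparallel: bool = False) -> bool:
    """Check pseudo centers i and j, taken from two different sequences.

    pair_range -- fifth (parallel) or seventh (anti-parallel) distance range
    next_range -- sixth (parallel) or eighth (anti-parallel) distance range
    ca_range   -- ninth distance range, required only for anti-parallel
    """
    j_next = j - 1 if antiparallel else j + 1
    if i + 1 >= len(pc) or not 0 <= j_next < len(pc):
        return False
    if not _in_range(math.dist(pc[i], pc[j]), pair_range):
        return False
    if not _in_range(math.dist(pc[i + 1], pc[j_next]), next_range):
        return False
    if antiparallel:
        if ca_range is None:
            raise ValueError("ca_range is required for the anti-parallel test")
        # Additional alpha-carbon distance condition of unit 134.
        if not _in_range(math.dist(ca[i], ca[j]), ca_range):
            return False
    return True
```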
In the case when the above conditions in the anti-parallel strand, particularly pieces of information about the distances between pseudo centers and between alpha carbons are used, the accuracy is high in the middle portion of the amino acid sequence but relatively lower at the ends of the sequence. Thus, in the case of amino acids located at ends of the amino acid sequence, the amino acids associated with hydrogen bonds have to be discriminated from the amino acids that are not associated with hydrogen bonds. Specifically, in the case of the front end of the amino acid sequence in
In the case of the ends (amino acids at positions 20 and 62) of
In two amino acid sequences forming the anti-parallel strand, any one amino acid may form the hydrogen bond with another amino acid included in the third amino acid sequence.
Finally, the secondary structure of a protein, which does not correspond to the above four kinds of secondary structure (alpha helix, 3/10 helix, parallel strand and anti-parallel strand) is identified to be that of a random coil.
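The order of determination, with helices first, strands only among the remaining pseudo centers, and everything else a random coil, can be sketched as a simple labelling pass. The function name, the label strings and the precedence of the alpha helix over the 3/10 helix are assumptions made for illustration; the description only states that helix determination precedes strand determination.

```python
from typing import Iterable, List


def assign_labels(n: int,
                  alpha: Iterable[int],
                  three_ten: Iterable[int],
                  parallel: Iterable[int],
                  antiparallel: Iterable[int]) -> List[str]:
    """Label n positions; an earlier category is never overwritten."""
    labels = ["random coil"] * n
    ordered = (("alpha helix", alpha),
               ("3/10 helix", three_ten),
               ("parallel strand", parallel),
               ("anti-parallel strand", antiparallel))
    for name, members in ordered:
        for i in members:
            if labels[i] == "random coil":
                labels[i] = name
    return labels
```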
On the other hand, in the case of a helix, building blocks in which the distance between pseudo centers falls in the predetermined range may mainly correspond to a 1:1 ratio. Even if they correspond to a 1:2 ratio, a predetermined pattern of N+3 or N+4 should be formed to make the helix, and thus building blocks forming the hydrogen bonds may be estimated. In the case of a strand, it is provided in parallel form as in (h) of
Because one building block forms a hydrogen bond with one other building block, when building blocks correspond at a 1:2 ratio or more, one of them is selected using information about the building blocks corresponding at a 1:1 ratio. For example, when the distance between the building blocks at positions 10 and 30 falls in the preset range and the distances between the building block at position 11 and each of the building blocks at positions 30, 31 and 32 all fall in the preset ranges, the building blocks at positions 11 and 31 may be set to form the hydrogen bond, based on the building blocks at positions 10 and 30, which correspond at a 1:1 ratio.
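The selection among several candidate partners can be sketched as follows, using the positions from the example above. The sketch assumes that partners advance by one position together in a parallel arrangement and in opposite directions in an anti-parallel one; the function name `resolve_partners` and the `direction` parameter are hypothetical.

```python
from typing import Dict, List


def resolve_partners(candidates: Dict[int, List[int]],
                     direction: int = 1) -> Dict[int, int]:
    """Pick one partner per position, anchored on 1:1 correspondences.

    candidates -- position -> partner positions whose distance is in range
    direction  -- +1 for parallel, -1 for anti-parallel (assumed convention)
    """
    # Positions with exactly one candidate are accepted as they are.
    resolved = {pos: c[0] for pos, c in candidates.items() if len(c) == 1}
    for pos in sorted(candidates):
        if pos in resolved:
            continue
        # Look at the already resolved neighbour before or after this position.
        for neighbour, offset in ((pos - 1, 1), (pos + 1, -1)):
            if neighbour in resolved:
                expected = resolved[neighbour] + direction * offset
                if expected in candidates[pos]:
                    resolved[pos] = expected
                    break
    return resolved
```

With the example in the text, `resolve_partners({10: [30], 11: [30, 31, 32]})` pairs position 11 with position 31.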
The apparatus for identifying the secondary structure of a protein according to the present invention and DSSP were each applied to a pre-established data set of 183 proteins, and their accuracies were compared. The results are shown in Table 3 below.
As is apparent from Table 3, the accuracy of the invention, taking DSSP as the reference, is 90.91%, which is higher than the 83.2% that is the highest accuracy obtained when using conventional methods depending on alpha carbon coordinates.
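One plausible way to compute such an accuracy is the per-residue agreement between the assigned labels and the DSSP labels, sketched below. The description does not state how the reported figures were calculated, so this measure and the name `agreement_percent` are assumptions for illustration only.

```python
from typing import Sequence


def agreement_percent(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Percentage of positions whose label matches the reference label."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        raise ValueError("label sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)
```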
As illustrated in
Next, the helix determination unit 120 determines whether the amino acid sequence corresponding to consecutive pseudo centers forms the helix depending on the dihedral angle and the distance between the preset number of consecutive pseudo centers among the pseudo centers fixed for the target protein at step S1920. As such, the alpha helix determination unit 122 determines that the secondary structure formed by the amino acid sequence corresponding to four pseudo centers is an alpha helix under conditions in which, in the four consecutive pseudo centers, the distance between the first and fourth pseudo centers falls in the preset first distance range and the dihedral angle formed by the four pseudo centers falls in the preset first angle range at step S1930.
The 3/10 helix determination unit 124 determines, at step S1940, that the secondary structure formed by the amino acid sequence corresponding to four pseudo centers is a 3/10 helix under conditions in which, in the four consecutive pseudo centers, the distance between the first and third pseudo centers, the distance between the second and fourth pseudo centers and the distance between the first and fourth pseudo centers respectively fall in the preset second, third and fourth distance ranges and the dihedral angle formed by the four pseudo centers falls in the preset second angle range.
The strand determination unit 130 calculates the distances between pseudo centers included in different pseudo center sequences in a plurality of pseudo center sequences comprising the preset number of consecutive pseudo centers among pseudo centers other than those corresponding to the helix. The distances between pseudo centers are calculated to determine whether the corresponding structure is a strand at step S1950. To this end, the parallel determination unit 132 determines that, under conditions in which the distance between pseudo centers included in different pseudo center sequences proceeding in the same direction falls in the preset fifth distance range and the distance between consecutive pseudo centers of the pseudo centers included in the different pseudo center sequences falls in the preset sixth distance range, the secondary structure formed by the amino acid sequences corresponding to the different pseudo center sequences is regarded as a parallel strand at step S1960.
The anti-parallel determination unit 134 calculates, as the additional determination condition, the distance between alpha carbons respectively corresponding to pseudo centers included in different pseudo center sequences at step S1970. Under conditions in which the distance between pseudo centers included in different pseudo center sequences proceeding in the opposite directions falls in the preset seventh distance range, the distance between consecutive pseudo centers of the pseudo centers included in the different pseudo center sequences falls in the preset eighth distance range, and the distance between alpha carbons respectively corresponding to the pseudo centers included in the different pseudo center sequences falls in the preset ninth distance range, the unit 134 determines at step S1980 that the secondary structure formed by the amino acid sequences corresponding to the different pseudo center sequences is an anti-parallel strand.
The present invention may be implemented in the form of computer-readable code that is stored in a computer-readable storage medium. The computer-readable storage medium includes all types of storage devices in which computer system-readable data can be stored. Examples of the computer-readable storage medium are ROM (Read Only Memory), RAM (Random Access Memory), CD-ROM (Compact Disk-Read Only Memory), magnetic tape, a floppy disk, an optical data storage device, etc. Furthermore, the computer-readable storage medium may be implemented in the form of carrier waves (e.g. in the case of transmission via the Internet). Moreover, the computer-readable storage medium may be distributed across computer systems connected via a network, and may be configured such that computer-readable code is stored and executed in a distributed manner.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, without departing from the scope and spirit of the invention as disclosed in the accompanying claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2010-0031907 | Apr 2010 | KR | national |
This application is the National Stage of International Application No. PCT/KR2010/006033, filed on Sep. 6, 2010, and claims priority to and the benefit of Korean Patent Application No. 2010-0031907, filed on Apr. 7, 2010, the disclosure of which is incorporated herein by reference in its entirety.
Filing Document | Filing Date | Country | Kind | 371c Date |
---|---|---|---|---|
PCT/KR10/06033 | 9/6/2010 | WO | 00 | 9/14/2012 |