Claims
- 1. A method for classifying object sequences, comprising the computer implemented steps of:
obtaining a set of known aligned sequences, some of which form a first class exclusive of other sequences in the set, each known sequence in the set having a respective set of n_i elements, different elements possessing different physical properties from a respective set of q_i physical properties of interest, where i is sequence alignment position;
for each known sequence, forming a respective vector of q_i bits, a bit being set to 1 to indicate that a physical property is found in an element of the sequence and a bit being set to 0 to indicate that a physical property is absent from an element of the sequence;
for each bit, defining a profile as a function of the probability of the bit being set to 1;
given a test sequence to classify, forming a respective representative vector of q bits for the test sequence;
assigning a score for the test sequence as a function of the defined profiles per bit and the bit values in the representative vector of the test sequence; and
calculating probability of the test sequence being of the first class as a function of the assigned score.
- 2. A method as claimed in claim 1 wherein the set of physical properties of interest includes hydrophobicity, helix propensity, sheet propensity, hydrogen donor propensity, hydrogen acceptor propensity, the state of being charged, aromaticity, sidechain linearity unbranched, sidechain volume, Phi-Psi flexibility and crosslinkability.
- 3. A method as claimed in claim 1 wherein the step of defining a profile includes defining probability of two terms LO(1) and LO(0) for each bit, where LO(1) is the log odds ratio of the probability of the bit being set to 1 given a sequence of the first class and the probability of the bit being set to 1 given a sequence not of the first class, and LO(0) is the log odds ratio of the probability of the bit being set to 0 given a sequence of the first class and the probability of the bit being set to 0 given a sequence not of the first class.
- 4. A method as claimed in claim 3 wherein the step of assigning a score includes:
for each bit in the representative vector of the test sequence, computing a bitwise score equal to (the value of the bit multiplied by the product of the probability of the bit equaling 1 in the first class and LO(1) of the corresponding bit in the representative vector of a known sequence) plus the product of (1-value of the bit) and the product of the probability of the bit equaling 0 in the first class and LO(0) of the corresponding bit in the representative vector of the known sequence.
- 5. A method as claimed in claim 1 further comprising normalizing the assigned score; and
the step of calculating probability includes calculating Eq 22.
- 6. A method as claimed in claim 5 wherein the step of calculating probability further includes calculating probability that distribution of the normalized score of the test sequence is equal to distribution of normalized scores for the known sequences of the first class.
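For illustration only, the profile and scoring steps recited in claims 1, 3 and 4 can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the additive pseudocount used to avoid taking the log of zero, the natural-log base, and the summing of bitwise scores into one total are all assumptions introduced here. The normalization of claim 5 and the probability calculation of claims 5 and 6 are not shown because the equation they refer to does not appear in this section.

```python
import math


def bit_probabilities(vectors, pseudocount=1.0):
    """Estimate P(bit = 1) for each bit position from a list of bit vectors.

    The additive pseudocount is an assumption; it keeps every estimate
    strictly between 0 and 1 so the log odds below are always defined.
    """
    n = len(vectors)
    width = len(vectors[0])
    return [
        (sum(v[j] for v in vectors) + pseudocount) / (n + 2.0 * pseudocount)
        for j in range(width)
    ]


def build_profile(in_class, out_class, pseudocount=1.0):
    """Return one (p1, LO1, LO0) tuple per bit.

    p1  : estimated probability of the bit being 1 in the first class
    LO1 : log of P(bit=1 | first class) / P(bit=1 | not first class)
    LO0 : log of P(bit=0 | first class) / P(bit=0 | not first class)
    """
    p_in = bit_probabilities(in_class, pseudocount)
    p_out = bit_probabilities(out_class, pseudocount)
    profile = []
    for a, b in zip(p_in, p_out):
        lo1 = math.log(a / b)
        lo0 = math.log((1.0 - a) / (1.0 - b))
        profile.append((a, lo1, lo0))
    return profile


def score(test_vector, profile):
    """Sum the bitwise scores described in claim 4.

    Each bitwise score is x * P(bit=1 | class) * LO1
    plus (1 - x) * P(bit=0 | class) * LO0, where x is the test bit.
    Summing over bits is an assumption made for this sketch.
    """
    total = 0.0
    for x, (p1, lo1, lo0) in zip(test_vector, profile):
        total += x * p1 * lo1 + (1 - x) * (1.0 - p1) * lo0
    return total


# Hypothetical toy data: three bits per sequence, three sequences per group.
first_class = [[1, 0, 1], [1, 0, 1], [1, 1, 1]]
other_class = [[0, 1, 1], [0, 1, 0], [0, 0, 1]]
toy_profile = build_profile(first_class, other_class)
```

With this toy data, a test vector resembling the first class, such as [1, 0, 1], receives a higher score than one resembling the other sequences, such as [0, 1, 0].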
RELATED APPLICATION
[0001] This application is a continuation of PCT/US01/44000, filed Nov. 6, 2001 and claims the benefit of U.S. Provisional Application No. 60/246,196, filed Nov. 6, 2000, the entire teachings of which are incorporated herein by reference.
Provisional Applications (1)
| Number | Date | Country |
|---|---|---|
| 60246196 | Nov 2000 | US |
Continuations (1)
| | Number | Date | Country |
|---|---|---|---|
| Parent | PCT/US01/44000 | Nov 2001 | US |
| Child | 10430685 | May 2003 | US |