METHOD OF PREDICTING CHROMATOGRAPHIC ELUTION ORDER OF COMPOUNDS

Information

  • Patent Application
  • 20200394513
  • Publication Number
    20200394513
  • Date Filed
    January 10, 2020
  • Date Published
    December 17, 2020
Abstract
Disclosed is a method for predicting an elution order of compounds in a mixture. The method includes (a) building a quantitative structure-retention relationship (QSRR) model and (b) predicting a chromatographic elution order of the compounds in the mixture on the basis of the QSRR model using mathematical programming. The mathematical programming is a non-linear programming technique in which a predicted elution order of the compounds is used as a constraint or a multi-objective optimization (MOO) in which a retention time prediction error and an elution order prediction error are used as objective functions. With the use of the method of the present disclosure, it is possible to optimize separation of complex mixtures in reversed-phase chromatography by enabling identification of accurate positions of individual compounds that provides higher certainty in identifying a given compound, e.g., during an “omics” analysis (proteomics, metabolomics, etc.).
Description
CROSS REFERENCE TO RELATED APPLICATION

The present application claims priority to Korean Patent Application No. 10-2019-0069924, filed Jun. 13, 2019, the entire contents of which are incorporated herein for all purposes by this reference.


BACKGROUND
1. Technical Field

The present disclosure relates to a method of predicting an order in which compounds in a mixture are eluted and, more particularly, to a method of accurately predicting elution order of compounds in a mixture on the basis of quantitative structure-retention relationships (QSRR) by using mathematical programming.


2. Description of Related Technology

QSRR-based prediction of the retention time of each compound in a mixture is one of the most widely used and powerful tools for developing chromatographic separation methods. While many studies and inventions have described the advantages of predicting the retention time of each compound in a mixture analyzed on a stationary phase in a chromatographic column, only a few studies on prediction of elution order have been reported. In general, a QSRR model is considered a highly accurate method for theoretically estimating the order of eluted compounds, as compared with the order observed experimentally. However, QSRR models are applicable only to simple mixtures (i.e., mixtures of known composition).


Furthermore, QSRR models are not well suited to reversed-phase liquid chromatography (RP-LC), which accounts for more than 90% of separations of complex mixtures (pharmaceuticals, proteins, etc.), because the RP-LC mechanism, although well described, is complex and not yet fully understood. Thus, although QSRR models can predict retention time with a reasonable error in some cases, they often yield poor prediction accuracy when the elution order of hundreds or thousands of very close or overlapping peaks is to be predicted. Moreover, no research literature has presented an approach to solving this problem for complex chromatography such as RP-LC.


SUMMARY

The present disclosure has been made to solve the problems occurring in the related art, and an objective of the present disclosure is to provide a method of predicting the elution order of compounds in a mixture, thereby optimizing separation of a complex mixture such as pharmaceuticals, peptides or proteins through reversed-phase liquid chromatography.


In order to accomplish the above objective, the present disclosure provides a method of predicting chromatographic elution order of compounds in a mixture, the method including: (a) building a model of quantitative structure-retention relationships (QSRR); and (b) predicting the chromatographic elution order of the compounds in the mixture on the basis of the QSRR model by using mathematical programming, wherein the mathematical programming is (i) a non-linear programming technique in which the predicted elution order is used as a constraint or (ii) a multi-objective optimization (MOO) technique in which both the retention time prediction error and the elution order prediction error are used as objective functions.


In addition, the QSRR model obtained through (a) modeling may be a linear model represented by the following formula:






$$t_{R,j} = a_1 x_{j,1} + a_2 x_{j,2} + \dots + a_n x_{j,n}$$


wherein $t_{R,j}$ are retention times of respective compounds j arranged in ascending order, $x_{j,i}$ (i=1, . . . , n) are molecular descriptors of respective compounds j, and $a_i$ (i=1, . . . , n) are regression coefficients.


In addition, the QSRR relation model obtained through the (a) modeling may be a non-linear model obtained by using a non-linear regression technique such as an artificial neural network (ANN).


In addition, in (b) prediction, the chromatographic elution order of the compounds in the mixture may be predicted under constraints of an inequality (Formula II) disclosed below by using non-linear programming (Formula I) described below.










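$$\min_{\bar{a}} \left\{ \sum_{j=1}^{m} \left( t_{R,j} - a_1 x_{j,1} - a_2 x_{j,2} - a_3 x_{j,3} \right)^2 + \sum_{j=1}^{m} \alpha_j \right\} \quad (\mathrm{I})$$

$$a_1 \left( x_{j,1} - x_{j+1,1} \right) + a_2 \left( x_{j,2} - x_{j+1,2} \right) + a_3 \left( x_{j,3} - x_{j+1,3} \right) - \alpha_j \le 0 \quad (\mathrm{II})$$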







In the inequality, $\alpha_j$ is a positive relaxation parameter, and $\bar{a}$ is a decision vector consisting of $a_1$, $a_2$, $a_3$, and $\alpha_j$ (j=1, 2, . . . , m−1).


In (b) prediction, the chromatographic elution order of the compounds in the mixture may be predicted by performing a multi-objective optimization based on objective functions that respectively represent a retention time prediction error represented by Formula III and an elution order prediction error represented by Formula IV.










$$\%\,\mathrm{RMSE}(t_R) = \sqrt{ \sum \left( \frac{t_R - \hat{t}_R}{t_R} \right)^2 \Big/ m } \times 100 \quad (\mathrm{III})$$







In Formula III, $t_R$ and $\hat{t}_R$ represent the retention time measured through an analytical experiment and the retention time predicted by the model, respectively, and m represents the number of compounds measured for each column.










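$$\%\,\mathrm{RMSE}(\mathrm{order}) = \sqrt{ \sum \left( \frac{\mathrm{order}_{pred.} - \mathrm{order}_{obs.}}{\mathrm{order}_{obs.}} \right)^2 \Big/ m } \times 100 \quad (\mathrm{IV})$$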







Herein, in Formula IV, $\mathrm{order}_{obs.}$ and $\mathrm{order}_{pred.}$ represent the elution order determined through an analytical experiment and the elution order determined from the predicted retention time, respectively.


In (b) prediction, multi-objective optimization is carried out and a Pareto-optimal solution is selected by performing the steps described below:


(b-1) selecting a knee point that is an optimal compromise between the retention time prediction error and the elution order prediction error from a Pareto front consisting of Pareto solutions;


(b-2) moving to the next Pareto solution that reduces the elution order prediction error;


(b-3) verifying the solution using an applicability domain; and


(b-4) repeating (b-1) and (b-2) until an increase in the retention time prediction error reaches a first predetermined threshold or an outlier in the applicability domain exceeds a second predetermined threshold.


The present disclosure relates to a method of accurately predicting elution order of compounds in a mixture using non-linear programming or multi-objective optimization (MOO) by using predicted elution order of compounds as a constraint or using a retention time prediction error and an elution order prediction error as objective functions. According to the present disclosure, separation of complex mixtures through chromatography, especially reversed-phase chromatography, can be optimized by enabling accurate identification of positions of compounds, which provides greater certainty in identifying a given compound.


The present disclosure relates to a method of accurately predicting elution order of compounds in a mixture through mathematical programming (non-linear programming) or multi-objective optimization (MOO) by using predicted elution order of compounds as a constraint or using a retention time error and an elution order error as objective functions on the basis of a quantitative structure-retention relationship (QSRR). According to the present disclosure, separation of complex mixtures through chromatography, especially reversed-phase chromatography, can be optimized by enabling accurate identification of positions of compounds, which provides greater certainty in identifying a given compound.





BRIEF DESCRIPTION OF THE DRAWINGS


FIG. 1 is a conceptual explanatory diagram illustrating a method of predicting chromatographic elution order of compounds by using a multi-objective optimization (MOO) technique;



FIGS. 2A, 2B and 2C illustrate the results of a multi-objective optimization technique in which a retention time prediction error and an elution order prediction error are used as objective functions and, more particularly, illustrate the results of comparison between % RMSE values of MLR and MLR-MOO for each column in two examples, wherein FIG. 2A is retention time, FIG. 2B is elution order, and FIG. 2C is % RMSE value difference between an MLR model and an MLR-MOO model;



FIGS. 3A, 3B, 3C, 3D, 3E and 3F illustrate the results of a multi-objective optimization technique in which a retention time prediction error and an elution order prediction error are used as objective functions and, more particularly, illustrate the performance of the present disclosure for a linear QSRR (MLR), wherein FIG. 3A is retention time, FIG. 3B is elution order, and FIG. 3C is applicability domain in Example 1 (CS1); and FIG. 3D is retention time, FIG. 3E is elution order, and FIG. 3F is applicability domain in Example 2 (CS2);



FIGS. 4A, 4B and 4C illustrate the results of non-linear programming using predicted elution order as a constraint and, more particularly, illustrate the results of comparison between % RMSE values of MLR and NLP (nonlinear programming) for each column in two examples in which FIG. 4A is retention time, FIG. 4B is elution order, and FIG. 4C is % RMSE value difference between an MLR model and an MLR-MOO model;



FIGS. 5A, 5B and 5C illustrate the results of non-linear programming using the predicted elution order as a constraint and, more particularly, show the performance of the present disclosure (Example 2 (CS2)) for a linear QSRR (MLR), in which FIG. 5A is retention time prediction, FIG. 5B is elution order prediction, and FIG. 5C is applicability domain; and



FIGS. 6A and 6B are graphs showing an MLR-MOO Pareto front wherein FIG. 6A is Example 1 (CS1) and FIG. 6B is Example 2 (CS2).





DETAILED DESCRIPTION

In describing embodiments of the present disclosure, well-known functions or constructions will not be described in detail when they may obscure the gist of the present invention. Embodiments in accordance with the concept of the present invention can undergo various changes to have various forms, and only some specific embodiments are illustrated in the drawings and described in detail in the present disclosure. While specific embodiments of the present disclosure are described herein below, they are only for illustrative purposes and should not be construed to limit the scope of the present disclosure. The present disclosure should be construed to cover not only the specific embodiments but also cover all modifications, equivalents, and substitutions that fall within the concept and technical spirit of the present disclosure.


The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present disclosure. As used herein, the singular forms “a”, “an”, and “the” are intended to include the plural forms as well unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “includes”, or “has” when used in the present disclosure specify the presence of stated features, regions, integers, steps, operations, elements and/or components, but do not preclude the presence or addition of one or more other features, regions, integers, steps, operations, elements, components and/or combinations thereof.


The present disclosure describes a method for solving problems with prediction of elution order through QSRR-based mathematical programming in which a retention time error and an elution order error are defined as parts of an objective function. For a linear model, the gist of the present disclosure is a general QSRR model defined by Formula 1.






$$t_{R,j} = a_1 x_{j,1} + a_2 x_{j,2} + \dots + a_n x_{j,n} \quad (1)$$


In Formula 1, $t_{R,j}$ are the retention times of compounds j arranged in ascending order, $x_{j,i}$ (i=1, . . . , n) are the molecular descriptors of compounds j, and $a_i$ (i=1, . . . , n) are regression coefficients; $t_{R,j}$ and $x_{j,i}$ are mean-centered, that is, adjusted so that their means are 0. The $a_i$ can be obtained through multiple linear regression (MLR). When the QSRR is non-linear, a non-linear modeling technique such as an artificial neural network (ANN) may be used.
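As an illustration of the MLR step, the following is a minimal sketch that mean-centers the descriptors and retention times and then solves the least-squares normal equations. The function names (`center`, `solve_linear_system`, `fit_mlr`) and the numerical data are hypothetical and are not part of the disclosure; only the Python standard library is assumed.

```python
from statistics import fmean


def center(values):
    """Subtract the mean so the values average to zero."""
    mu = fmean(values)
    return [v - mu for v in values]


def solve_linear_system(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_mlr(X, t):
    """Least-squares coefficients a_i for mean-centered descriptors and times."""
    cols = [center(col) for col in zip(*X)]  # one centered list per descriptor
    tc = center(t)
    n = len(cols)
    # Normal equations: (Xc^T Xc) a = Xc^T tc
    gram = [[sum(p * q for p, q in zip(cols[i], cols[k])) for k in range(n)]
            for i in range(n)]
    rhs = [sum(p * q for p, q in zip(cols[i], tc)) for i in range(n)]
    return solve_linear_system(gram, rhs)


# Hypothetical data: three descriptors per compound, retention times in minutes.
X = [[1.0, 0.2, 3.0], [2.0, 0.1, 2.5], [0.5, 0.9, 4.0],
     [1.5, 0.4, 1.0], [3.0, 0.7, 2.0]]
true_a = [2.0, -1.0, 0.5]
# A constant offset is added on purpose; mean-centering removes it.
t = [sum(a * x for a, x in zip(true_a, row)) + 4.0 for row in X]

a = fit_mlr(X, t)
print(a)
```

Because the synthetic retention times are an exact linear function of the descriptors plus a constant, the fitted coefficients should agree with `true_a` up to rounding error.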


(1) Constrained Non-Linear Programming (NLP) Using Predicted Elution Order as Constraint


The fitting of a QSRR model can be expressed as the mathematical programming problem of Formula 2.










$$\min_{a} \sum_{j} \left( t_{R,j} - \hat{t}_{R,j} \right)^2 \quad (2)$$







In Formula 2, $\hat{t}_{R,j} = f(x)$, where the modeled QSRR relation f(x) is a function with $a_i$ as parameters. For example, as in the embodiment described below, when n=3 and MLR is used as the relation model, Formula 2 can be expressed as Formula 3.











$$\min_{a} \sum_{j} \left( t_{R,j} - \hat{t}_{R,j} \right)^2 = \min_{a} \sum_{j} \left( t_{R,j} - a_1 x_{j,1} - a_2 x_{j,2} - a_3 x_{j,3} \right)^2 \quad (3)$$







That is, a typical QSRR problem reduces to a non-linear programming problem. By arranging the predicted retention times in ascending order, it is easy to predict the elution order therefrom. However, as described above, although the QSRR model obtained from Formula 2 can predict the retention times of a number of peaks with adequate accuracy (i.e., within a tolerable prediction error range), prediction of elution order with the QSRR often results in low accuracy. This follows from the regression equation, which minimizes only the retention time error and does not take the order of the peaks into account.
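The point that small retention time errors can still scramble the elution order may be seen in a short sketch. The helper `elution_order` and the retention times below are hypothetical, chosen only to show two closely eluting peaks swapping places.

```python
def elution_order(retention_times):
    """Return 1-based elution ranks: the earliest-eluting compound gets rank 1."""
    by_time = sorted(range(len(retention_times)), key=lambda j: retention_times[j])
    ranks = [0] * len(retention_times)
    for rank, j in enumerate(by_time, start=1):
        ranks[j] = rank
    return ranks


# Hypothetical observed and predicted retention times (minutes) for four compounds.
t_obs = [2.10, 2.15, 5.00, 7.30]
t_pred = [2.14, 2.12, 5.20, 7.00]

order_obs = elution_order(t_obs)
order_pred = elution_order(t_pred)
print(order_obs, order_pred)
```

Every predicted time in this example lies within 5% of its observed value, yet the first two compounds exchange ranks, which is the situation the inequality constraints introduced next are meant to prevent.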


In terms of mathematical programming, the problems mentioned above seem to be solved by introducing the following inequality constraints:











$$\min_{a} \sum_{j} \left( t_{R,j} - a_1 x_{j,1} - a_2 x_{j,2} - a_3 x_{j,3} \right)^2$$

subject to:

$$\hat{t}_{R,j} \le \hat{t}_{R,j+1}$$

or

$$a_1 x_{j,1} + a_2 x_{j,2} + a_3 x_{j,3} \le a_1 x_{j+1,1} + a_2 x_{j+1,2} + a_3 x_{j+1,3} \quad (4)$$







For m compounds, the inequality constraints can be written in vector-matrix notation as x·a≤0, where x is an ((m−1)×3) matrix whose j-th row (j=1, 2, . . . , m−1) is $[(x_{j,1}-x_{j+1,1})\ (x_{j,2}-x_{j+1,2})\ (x_{j,3}-x_{j+1,3})]$, and $a=[a_1\ a_2\ a_3]^T$.
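The construction of the constraint matrix can be sketched as follows. The functions `difference_rows` and `violated_pairs`, as well as the descriptor values and coefficients, are hypothetical; the sketch only checks which rows of x·a≤0 a given coefficient vector violates and does not solve the optimization.

```python
def difference_rows(X):
    """Rows x_j - x_{j+1} for descriptors X listed in observed elution order."""
    return [[p - q for p, q in zip(X[j], X[j + 1])] for j in range(len(X) - 1)]


def violated_pairs(X, a):
    """0-based indices j where the predicted time of j exceeds that of j+1."""
    return [j for j, row in enumerate(difference_rows(X))
            if sum(d * ai for d, ai in zip(row, a)) > 0]


# Hypothetical descriptors for m = 4 compounds, listed in observed elution order.
X = [[1.0, 0.0, 2.0], [2.0, 1.0, 1.0], [1.5, 1.0, 3.0], [4.0, 2.0, 0.0]]
a = [1.0, 0.5, 0.2]

D = difference_rows(X)      # (m - 1) x 3 matrix
bad = violated_pairs(X, a)  # pairs whose predicted order is reversed
print(D, bad)
```

With these numbers the second and third compounds are predicted in the wrong order, so exactly one of the three constraints is violated.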


However, numerical experiments have shown that the above constraints are so restrictive that even simple mixtures yield poor results. Therefore, the relaxed inequality constraints shown below are used so that a low retention time prediction error and a low elution order prediction error are obtained simultaneously.













$$\min_{\bar{a}} \left\{ \sum_{j=1}^{m} \left( t_{R,j} - a_1 x_{j,1} - a_2 x_{j,2} - a_3 x_{j,3} \right)^2 + \sum_{j=1}^{m} \alpha_j \right\}$$

subject to:

$$a_1 \left( x_{j,1} - x_{j+1,1} \right) + a_2 \left( x_{j,2} - x_{j+1,2} \right) + a_3 \left( x_{j,3} - x_{j+1,3} \right) - \alpha_j \le 0 \quad (5)$$







Here, $\alpha_j$ is a positive relaxation parameter, and $\bar{a}$ is a decision vector composed of $a_1$, $a_2$, $a_3$ and $\alpha_j$ (j=1, 2, . . . , m−1). The inequality constraints for m compounds can be expressed in the vector-matrix notation shown below.













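$$
\begin{bmatrix}
x_{1,1}-x_{2,1} & x_{1,2}-x_{2,2} & x_{1,3}-x_{2,3} & -1 & 0 & \cdots & 0 & \cdots & 0 \\
x_{2,1}-x_{3,1} & x_{2,2}-x_{3,2} & x_{2,3}-x_{3,3} & 0 & -1 & \cdots & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & \ddots & \vdots & & \vdots \\
x_{j,1}-x_{j+1,1} & x_{j,2}-x_{j+1,2} & x_{j,3}-x_{j+1,3} & 0 & 0 & \cdots & -1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \vdots & \vdots & & \vdots & \ddots & \vdots \\
x_{m-1,1}-x_{m,1} & x_{m-1,2}-x_{m,2} & x_{m-1,3}-x_{m,3} & 0 & 0 & \cdots & 0 & \cdots & -1
\end{bmatrix}
\begin{bmatrix}
a_1 \\ a_2 \\ a_3 \\ \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_j \\ \vdots \\ \alpha_{m-1}
\end{bmatrix}
\le 0 \quad (6)
$$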
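To make the role of the relaxation parameters concrete, the sketch below evaluates the relaxed objective of Formula 5 for a fixed coefficient vector. It rests on two assumptions that are not stated in the text: each slack takes its smallest feasible non-negative value, max(0, t̂_j − t̂_{j+1}), and the slack sum runs over the m−1 adjacent pairs. The function name `relaxed_objective` and all numbers are hypothetical, and the data are left uncentered for readability.

```python
def relaxed_objective(X, t, a):
    """Relaxed objective value and slacks for coefficients a.

    Assumption: each slack alpha_j takes its smallest feasible non-negative
    value, max(0, t_hat_j - t_hat_{j+1}), so every relaxed constraint holds.
    """
    t_hat = [sum(ai * x for ai, x in zip(a, row)) for row in X]
    sse = sum((tj - th) ** 2 for tj, th in zip(t, t_hat))
    alphas = [max(0.0, t_hat[j] - t_hat[j + 1]) for j in range(len(t_hat) - 1)]
    return sse + sum(alphas), alphas


# Hypothetical data: compounds listed in ascending observed retention time.
X = [[1.0, 0.0, 2.0], [2.0, 1.0, 1.0], [1.5, 1.0, 3.0], [4.0, 2.0, 0.0]]
t = [1.5, 2.5, 3.0, 5.0]
a = [1.0, 0.5, 0.2]

value, alphas = relaxed_objective(X, t, a)
print(value, alphas)
```

Only the pair predicted in the wrong order contributes a non-zero slack, so an optimizer minimizing this quantity is pushed toward coefficient vectors that restore the observed order without being forced to satisfy every hard constraint of Formula 4.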








(2) Multi-Objective Optimization using Retention Time Prediction Error and Elution Order Prediction Error as Objective Function (MOO)


A multi-objective optimization (MOO) problem is an optimization problem with multiple objective functions. Its general form is given by Formula 7.





$$\min \left( g_1(\alpha),\ g_2(\alpha),\ \dots,\ g_k(\alpha) \right) \quad \text{subject to } \alpha \in A \quad (7)$$


In Formula 7, the integer k (≥2) is the number of objective functions g, and the set A is the feasible set of decision vectors α. In a multi-objective optimization, there is normally no single solution that minimizes all objective functions simultaneously. Therefore, attention is paid to Pareto optimal solutions, that is, solutions for which no objective function can be improved without degrading at least one of the other objective functions.
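The definition of Pareto optimality can be illustrated with a brute-force filter over candidate solutions. The functions `dominates` and `pareto_front` and the error pairs below are hypothetical; the disclosure obtains its Pareto solutions with a genetic algorithm, which this sketch does not reproduce.

```python
def dominates(p, q):
    """True if p is no worse than q in every objective and better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(points):
    """Keep the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (%RMSE(tR), %RMSE(order)) pairs for five candidate coefficient sets.
candidates = [(8.0, 50.0), (9.0, 44.0), (10.0, 45.0), (12.0, 40.0), (9.5, 44.0)]
front = pareto_front(candidates)
print(front)
```

The two discarded candidates are each beaten or matched on both errors by (9.0, 44.0); the three that remain trade a lower retention time error against a lower elution order error.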


In the present disclosure, two objective functions are used, one representing the error of the retention time prediction and the other representing the error of the elution order prediction. The Pareto optimal solution is then selected according to the following procedure: (1) selecting the knee point which is the best compromise between the retention time prediction error and the elution order prediction error from the Pareto front consisting of the Pareto solutions; (2) moving to the next Pareto solution to reduce the elution order prediction error; (3) validating the solution using the applicability domain; and (4) repeating (2) and (3) until an increase in the retention time prediction error reaches a first predetermined threshold or until an outlier in the applicability domain exceeds a second predetermined threshold. This is conceptually illustrated in FIG. 1.
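The selection procedure above can be sketched as a walk along a sorted Pareto front. Several details are assumptions made only for illustration: the knee point is taken as the point farthest from the straight line joining the two ends of the min-max normalized front (the text does not define the knee rule), the increase in retention time error is measured as an absolute difference from the knee, a solution that reaches the first threshold is rejected, and the applicability-domain check is represented by a hypothetical callback `count_outliers`.

```python
def knee_index(front):
    """Index of the point farthest from the line joining the front's two ends.

    Assumption: this distance rule on min-max normalized objectives stands in
    for the knee-point definition, which the text does not spell out.
    """
    xs = [p[0] for p in front]
    ys = [p[1] for p in front]
    nx = [(x - min(xs)) / (max(xs) - min(xs)) for x in xs]
    ny = [(y - min(ys)) / (max(ys) - min(ys)) for y in ys]
    (x0, y0), (x1, y1) = (nx[0], ny[0]), (nx[-1], ny[-1])
    dist = [abs((x1 - x0) * (y0 - y) - (x0 - x) * (y1 - y0)) for x, y in zip(nx, ny)]
    return max(range(len(front)), key=dist.__getitem__)


def select_solution(front, count_outliers, max_rt_increase, max_outliers):
    """Walk from the knee toward lower order error until a threshold is hit.

    front: (rt_error, order_error) pairs sorted by increasing rt_error.
    count_outliers: hypothetical callback giving the number of
        applicability-domain outliers for the solution at a front index.
    """
    knee = knee_index(front)
    chosen = knee
    for i in range(knee + 1, len(front)):
        if front[i][0] - front[knee][0] >= max_rt_increase:
            break  # first threshold: retention time error grew too much
        if count_outliers(i) > max_outliers:
            break  # second threshold: too many applicability-domain outliers
        chosen = i
    return chosen


# Hypothetical Pareto front and outlier counts.
front = [(8.0, 50.0), (8.5, 44.0), (9.0, 43.0), (12.0, 40.0)]
outliers = [0, 1, 1, 4]

knee = knee_index(front)
chosen = select_solution(front, lambda i: outliers[i],
                         max_rt_increase=1.0, max_outliers=2)
print(knee, chosen)
```

In this example the walk moves one step past the knee and stops because the following solution would raise the retention time error by more than the allowed amount.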


Hereinafter, the present disclosure will be described in more detail with reference to Examples.


The examples presented herein are merely illustrative of the present disclosure and are not intended to limit the scope of the present disclosure.


EXAMPLE

The following two examples demonstrate the applicability of the present disclosure: (i) CS1 which is a mixture of 62 organic compounds and (ii) CS2 which is a mixture of 98 synthetic peptides. Analysis for the first example CS1 was performed using a Supelcosil LC column with a gradient time of 10 minutes at 35° C. Analysis for the second example CS2 was performed using seven chromatographic columns (i.e., Xterra, Licrospher, PRP, Discovery RP-Amide C-16, Licrospher CN, Discovery HS F5-3 and Chromolith) at different gradient settings and temperatures. Chromatographic analysis data were obtained from the references.


The molecular descriptors used in each example for QSRR relation modeling are listed in Table 1 below.









TABLE 1
Molecular descriptors used in Examples (CS1 and CS2)

Example  Molecular descriptor  Explanation
-------  --------------------  -----------------------------------------------------------------------
CS1      μ                     dipole moment
         δmin                  excess charge of the most negatively charged atom
         SASA                  solvent-accessible surface area
CS2      SumAA                 sum of retention times of respective 20 naturally occurring amino acids
         νDWvol.               Van der Waals volume
         clogP                 calculated octanol-water partition coefficient









In both examples, a linear model was considered as a specific form of the QSRR relation model, and control MLR model coefficients were calculated using the least-squares method for comparison. The solution of the non-linear programming problem with the relaxed constraints of Formula 5 was obtained using the interior-point method. The solution of the multi-objective optimization problem of Formula 7 was obtained using a genetic algorithm. In both methods, the coefficients of the control MLR model obtained for comparison were used as initial values in the optimization.


For the multi-objective optimization, the percentage root mean square error (% RMSE) of the retention time was used as an objective function representing a retention time prediction error.










$$\%\,\mathrm{RMSE}(t_R) = \sqrt{ \sum \left( \frac{t_R - \hat{t}_R}{t_R} \right)^2 \Big/ m } \times 100 \quad (8)$$







Here, $t_R$ and $\hat{t}_R$ respectively represent the retention time measured through the analytical experiment and the retention time predicted by the model, and m represents the number of compounds measured for each column. The elution order is predicted by sorting the retention times predicted by the QSRR model in ascending order, and the % RMSE of the elution order was used as the objective function representing the elution order prediction error.










$$\%\,\mathrm{RMSE}(\mathrm{order}) = \sqrt{ \sum \left( \frac{\mathrm{order}_{pred.} - \mathrm{order}_{obs.}}{\mathrm{order}_{obs.}} \right)^2 \Big/ m } \times 100 \quad (9)$$







Here, $\mathrm{order}_{obs.}$ and $\mathrm{order}_{pred.}$ respectively represent the elution order determined through the analysis and the elution order determined from the predicted retention time.
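The two error measures can be computed with one helper, as in the following sketch. It assumes the formulas are root-mean-square relative errors taken over the m compounds; the function names `pct_rmse` and `elution_order` and the retention times are hypothetical.

```python
from math import sqrt


def pct_rmse(observed, predicted):
    """Percentage root mean square of the relative errors over m compounds."""
    m = len(observed)
    total = sum(((o - p) / o) ** 2 for o, p in zip(observed, predicted))
    return sqrt(total / m) * 100


def elution_order(retention_times):
    """1-based elution ranks obtained by sorting retention times in ascending order."""
    by_time = sorted(range(len(retention_times)), key=lambda j: retention_times[j])
    ranks = [0] * len(retention_times)
    for rank, j in enumerate(by_time, start=1):
        ranks[j] = rank
    return ranks


# Hypothetical observed and predicted retention times (minutes).
t_obs = [2.0, 4.0, 5.0, 10.0]
t_pred = [2.2, 5.2, 5.0, 9.0]

rmse_t = pct_rmse(t_obs, t_pred)
rmse_order = pct_rmse(elution_order(t_obs), elution_order(t_pred))
print(round(rmse_t, 2), round(rmse_order, 2))
```

Because the ranks start at 1, the relative order error is always well defined; the same is true of the retention time error as long as no observed retention time is zero.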


When both of the methods NLP and MOO used a linear QSRR model (MLR), the accuracy of the elution order prediction was significantly increased (see FIGS. 2B and 4B) at the expense of the accuracy of the retention time prediction in both examples (see FIGS. 2A and 4A). As illustrated in FIGS. 2C and 4C, the maximum increases in % RMSE (tR) were about 15% and 20%, respectively while the maximum decreases in % RMSE (order) were about 80% and 260%, respectively.


Of the seven RP-LC columns used for both methods in both examples, FIGS. 3 and 5 illustrate the prediction results for two columns in both examples (CS1: Supelcosil LC, tG=10 min, T=35° C.; CS2: Xterra, tG=20 min, T=40° C.). That is, the prediction performance for each of the retention time and the elution order and the corresponding applicability domain were shown. The two examples CS1 and CS2 (CS1 (FIGS. 3A and 3B); CS2 (FIGS. 3D, 3E, 5A, and 5B)) show reasonable retention time prediction performance and elution order prediction performance. Nearly all analyte compounds included in both examples were well predicted and structurally important analytes included in a training set were within each applicability domain (FIGS. 3C, 3F, and 5C). The developed model is therefore considered to be stable and robust for the structurally distant analytes.


The optimal solutions obtained according to the procedure of finding the Pareto optimal solution while starting from the knee point are shown in FIG. 6A (CS1) and FIG. 6B (CS2). Predetermined thresholds were 10% (for % RMSE (tR)) at the knee point and 2 outliers in the applicability domain, respectively. In addition, as shown in Table 2 below, even in the case of the multi-objective optimization method, it was possible to attain a large decrease in elution order prediction error with a small increase in retention time prediction error in both of the examples.









TABLE 2
% RMSE at knee point and optimal point

Example  Point          % RMSE(tR)  % RMSE(order)
-------  -------------  ----------  -------------
CS1      Knee point     8.67        43.7
         Optimal point  9.33        42.0
CS2      Knee point     11.6        19.8
         Optimal point  12.1        18.1










While exemplary embodiments of the present disclosure have been described with reference to the accompanying drawings, those skilled in the art will appreciate that the present disclosure can be implemented in other different forms without departing from the technical spirit or essential characteristics of the exemplary embodiments. Therefore, it is noted that the exemplary embodiments described above are only for illustrative purposes and are not restrictive in all aspects.

Claims
  • 1. A method of predicting a chromatographic elution order of compounds in a mixture, the method comprising: (a) modeling a quantitative structure-retention relationship (QSRR) model; and (b) predicting a chromatographic elution order of the compounds in the mixture from the QSRR model using mathematical programming, wherein the mathematical programming is (i) a non-linear programming technique using a predicted elution order of the compounds as a constraint or (ii) a multi-objective optimization (MOO) technique using a retention time prediction error and an elution order prediction error as objective functions.
  • 2. The method according to claim 1, wherein the QSRR model obtained through the (a) modeling is a linear model represented by the following formula: $t_{R,j} = a_1 x_{j,1} + a_2 x_{j,2} + \dots + a_n x_{j,n}$, where $t_{R,j}$ are retention times of respective compounds j sorted in ascending order, $x_{j,i}$ (i=1, . . . , n) are molecular descriptors of respective compounds j, and $a_i$ (i=1, . . . , n) are regression coefficients.
  • 3. The method according to claim 1, wherein the QSRR model obtained through the (a) modeling is a non-linear model obtained by using artificial neural networks (ANN).
  • 4. The method according to claim 1, wherein on the (b) predicting, the chromatographic elution order of the compounds in the mixture is predicted by applying the following non-linear programming I under the following inequality constraints II:
  • 5. The method according to claim 1, wherein in the (b) predicting, the chromatographic elution order of the compounds in the mixture is predicted by performing multi-objective optimization on the basis of an objective function representing a retention time prediction error represented by Formula III and an elution order prediction error represented by Formula IV,
  • 6. The method according to claim 5, wherein in the (b) predicting, the multi-objective optimization of selecting a Pareto optimal solution is performed through the following: (b-1) selecting a knee point which is an optimal compromise between the retention time prediction error and the elution order prediction error from a Pareto front including the Pareto solutions; (b-2) moving to the next Pareto solution to reduce the elution order prediction error; (b-3) verifying the solution using an applicability domain; and (b-4) repeating (b-1) and (b-2) until an increase in the retention time prediction error reaches a first predetermined threshold or an outlier in the applicability domain exceeds a second predetermined threshold.
Priority Claims (1)
Number Date Country Kind
10-2019-0069924 Jun 2019 KR national