This application is based on and claims priority under 35 U.S.C. § 119 to Korean Patent Application No. 10-2021-0084874, filed on Jun. 29, 2021 in the Korean Intellectual Property Office, and Korean Patent Application No. 10-2021-0111204, filed on Aug. 23, 2021 in the Korean Intellectual Property Office, the disclosures of which are incorporated by reference herein in their entirety.
The present disclosure relates to the solubility of a solute in a solvent, and more particularly, to a system and method for estimating solubility.
Solubility may indicate a characteristic of a solute dissolving in a solvent. Different solutes may have different solubilities in one solvent, and a solute may have different solubilities in different solvents. The solubility of a solute in a solvent may be used as a significant indicator for determining the use of a solution. Solubility may be measured through experiments, but when the purpose of the experiments is to derive (e.g., identify) a solute and a solvent having a desired solubility, it may be practically difficult to repeat the experiments for each of the many combinations of various solutes and solvents.
The teachings herein describe a system and method for quickly and accurately estimating solubility.
According to an aspect of the present disclosure, a method of estimating solubility includes obtaining input data representing a chemical structure of a target material; generating at least one descriptor based on the input data; obtaining at least one solubility parameter by providing the at least one descriptor to a machine learning model trained based on chemical structures and sample solubility parameters of sample materials; and calculating the solubility based on the at least one solubility parameter. The at least one descriptor includes at least one of a zero-dimensional descriptor, a one-dimensional descriptor, a two-dimensional descriptor, or a three-dimensional descriptor, each representing the chemical structure of the target material.
According to another aspect of the present disclosure, a system includes at least one processor; and a non-transitory storage medium storing instructions allowing the at least one processor to perform operations for solubility estimation when the instructions are executed by the at least one processor. The operations include an operation of obtaining input data representing a chemical structure of a target material; an operation of generating at least one descriptor based on the input data; an operation of obtaining at least one solubility parameter by providing the at least one descriptor to a machine learning model trained based on chemical structures and sample solubility parameters of sample materials; and an operation of calculating the solubility based on the at least one solubility parameter. The at least one descriptor includes at least one of a zero-dimensional descriptor, a one-dimensional descriptor, a two-dimensional descriptor, or a three-dimensional descriptor, each representing the chemical structure of the target material.
According to a further aspect of the present disclosure, a method of estimating solubility includes generating a machine learning model trained to derive at least one solubility parameter from at least one descriptor defining a chemical structure of a material. The generating of the trained machine learning model includes obtaining training data with respect to an attribute of a sample material; generating a plurality of sample descriptors based on the training data; extracting at least one sample solubility parameter of the sample material from the training data; and training the machine learning model based on the plurality of sample descriptors and the at least one sample solubility parameter. The plurality of sample descriptors include at least one of a zero-dimensional descriptor, a one-dimensional descriptor, a two-dimensional descriptor, or a three-dimensional descriptor, each representing a chemical structure of the sample material.
Embodiments of the inventive concept(s) described herein will be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
The operations of methods described below may be performed by appropriate units, e.g., various hardware and/or software components, circuits, and/or modules. Software may include an ordered list of executable instructions for implementing logical functions and may be used by an instruction execution system, apparatus, or device, or embodied in any relevant processor-readable medium. An example of an instruction execution system, apparatus, or device is one that includes one or more single-core and/or multi-core processors that execute executable instructions.
The steps or blocks and functions of a method or algorithm described below may be embodied directly in hardware, a software module executed by a processor, or a combination thereof. When functions are implemented by software, the functions may be stored as at least one instruction or code in a non-transitory tangible computer-readable medium. A software module may be in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium.
Referring to
The input data may have a form representing the chemical structure of the target material. In some embodiments, the input data may include a string including a series of characters, which defines the chemical structure of the target material. For example, the input data may include a string expressed based on simplified molecular-input line-entry system (SMILES) code, SMILES arbitrary target specification (SMARTS) code, international chemical identifier (InChI) code, or the like. Examples of a string are described with reference to
At least one descriptor may be generated in operation S40. The descriptor may have a value representing a characteristic of the target material, and there may be various descriptors corresponding to the target material. For example, as described below with reference to
At least one solubility parameter may be obtained in operation S60. For example, as shown in
The machine learning model ML may have a structure that is trained on training data. For example, the machine learning model ML may include an artificial neural network, a decision tree, a support vector machine, a Bayesian network, and/or a genetic algorithm. Hereinafter, the machine learning model ML is mainly described as an artificial neural network, but embodiments are not limited thereto. As a non-limiting example, the artificial neural network may include a convolutional neural network (CNN), a region with CNN (R-CNN), a region proposal network (RPN), a recurrent neural network (RNN), a stacking-based deep neural network (S-DNN), a state-space dynamic neural network (S-SDNN), a deconvolution network, a deep belief network (DBN), a restricted Boltzmann machine (RBM), a fully convolutional network, a long short-term memory (LSTM) network, or a classification network.
Solubility may be calculated in operation S80. For example, the solubility may be calculated based on the at least one solubility parameter obtained in operation S60. To determine the capacity of a solute to dissolve in a solvent, Gibbs free energy of mixing may be used. The Gibbs free energy of mixing may be calculated based on the variation of enthalpy of mixing, absolute temperature, and the variation of entropy of mixing (i.e., ΔG = ΔH − TΔS). A Gibbs free energy of mixing that is less than zero may indicate that a solute dissolves well in a solvent. A Gibbs free energy of mixing that is greater than zero may indicate that a solute does not dissolve well in a solvent. The Gibbs free energy of mixing may be related to a dispersion force, a dipolar intermolecular force, and a hydrogen bond; and a distance R between solubility parameters in a Hansen space may be defined as Equation 1.
$$R^{2} = 4(\delta_{DA} - \delta_{DB})^{2} + (\delta_{PA} - \delta_{PB})^{2} + (\delta_{HA} - \delta_{HB})^{2} \qquad \text{[Equation 1]}$$
In Equation 1, δDA is a dispersion force (or energy by a dispersion force) between solute molecules. δPA is a dipolar intermolecular force (or energy by a dipolar intermolecular force) between the solute molecules. δHA is a force (or energy) of a hydrogen bond between the solute molecules. δDB is a dispersion force (or energy by a dispersion force) between solvent molecules. δPB is a dipolar intermolecular force (or energy by a dipolar intermolecular force) between the solvent molecules. δHB is a force (or energy) of a hydrogen bond between the solvent molecules. δDA, δPA, δHA, δDB, δPB, and δHB may be collectively referred to as Hansen solubility parameters. At least one solubility parameter may include a dispersion force parameter, a polar force parameter, and/or a hydrogen bond force parameter as Hansen solubility parameters. δDA, δPA, and δHA may be referred to herein as first solubility parameters. δDB, δPB, and δHB may be referred to herein as second solubility parameters. Solubility may be proportional to the reciprocal (1/R) of the distance R between solubility parameters. For example, solubility may be calculated as the reciprocal of R, i.e., 1/R. To calculate solubility based on Equation 1, solubility parameters of a solute and solubility parameters of a solvent may be obtained, and solubility may be calculated based on the obtained solubility parameters.
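The calculation in Equation 1 can be sketched in Python as follows. This is a minimal illustration, not the disclosed implementation: the `HansenParameters` type, the function names, and the numeric values are assumptions introduced here, and solubility is taken to be exactly 1/R, which is one of the options the text describes.

```python
import math
from typing import NamedTuple


class HansenParameters(NamedTuple):
    """Hansen solubility parameters of one material (hypothetical container)."""
    dispersion: float  # delta_D
    polar: float       # delta_P
    hydrogen: float    # delta_H


def hansen_distance(solute: HansenParameters, solvent: HansenParameters) -> float:
    """Distance R in Hansen space, following Equation 1."""
    r_squared = (
        4.0 * (solute.dispersion - solvent.dispersion) ** 2
        + (solute.polar - solvent.polar) ** 2
        + (solute.hydrogen - solvent.hydrogen) ** 2
    )
    return math.sqrt(r_squared)


def solubility_from_distance(r: float) -> float:
    """Solubility taken as 1/R; R == 0 is treated as unbounded (assumption)."""
    return math.inf if r == 0.0 else 1.0 / r


# Illustrative values only, not measured data.
solute = HansenParameters(18.0, 5.0, 7.0)
solvent = HansenParameters(16.0, 5.0, 4.0)
r = hansen_distance(solute, solvent)  # sqrt(4*2**2 + 0**2 + 3**2) = 5.0
print(solubility_from_distance(r))    # 0.2
```

A smaller distance R gives a larger 1/R, so candidate solute and solvent pairs may be ranked by this value without running an experiment for each pair.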
As described above, solubility may be quickly and accurately estimated using the machine learning model ML that is trained to output a solubility parameter from the chemical structure of a material, and accordingly, a solute and/or a solvent that is required by an application may be easily determined. In addition, because of easy determination of the solute and/or the solvent, the efficiency of applications using a solution may be enhanced. For example, as described with reference to
The semiconductor processes may include various sub-processes of forming patterns of the integrated circuit. For example, semiconductor processes may include photolithography, which may refer to a process of forming a pattern by transferring a geometric pattern from a photomask to a photosensitive chemical photoresist using light. Photoresist may include a positive photoresist, of which an exposed portion is soluble in developer, and a negative photoresist, of which an unexposed portion is soluble in developer.
Referring to
A positive photoresist may be applied to the wafer in a second state 22. As shown in
A photomask may be aligned above in the second state 22, and light, e.g., extreme ultraviolet (EUV) light, may be radiated to the aligned photomask. Accordingly, the positive photoresist exposed to the light, as shown in
Developer may be provided in the third state 23, and accordingly, a portion of a photoresist layer, which has been irradiated with the light, i.e., the material Y, may be dissolved in the developer and removed in a fourth state 24. A process of removing the portion of the photoresist layer, which has been chemically modified by light, may be referred to as developing. As shown in
As described above, the photoresist material X may be required to dissolve well in the first solvent, and the material Y modified from the photoresist material X by light may be required to dissolve well in the second solvent. Accordingly, to exactly form a designed pattern, it may be important to determine the photoresist material X, the first solvent, and the second solvent. To determine the photoresist material X, the first solvent, and the second solvent, the method of estimating solubility described above with reference to
Referring to
As described above with reference to
Referring to
In some embodiments, descriptors may include electrostatic descriptors. For example, as shown in the shaded portions in the table 50 of
Referring to
The machine learning model ML may generate Hansen solubility parameters δD, δP, and δH from the descriptor DES. For example, the machine learning model ML may have states, e.g., network topology, bias, and a weight, which are determined through training, and generate the Hansen solubility parameters δD, δP, and δH by processing the descriptor DES. When a solute is a composite of at least two materials, one or more solubility parameters may be calculated based on a weighted sum of at least two solubility parameters respectively corresponding to the at least two materials. When the solvent is a mixture of at least two solvents, one or more solubility parameters may be calculated based on a weighted sum of at least two solubility parameters corresponding to the at least two solvents. As described above with reference to
Referring to
Operation S40′ may include operations S42 and S44. First descriptors may be generated in operation S42, and second descriptors may be generated in operation S44. For example, the first descriptors indicating attributes of the solute may be generated from the first input data obtained in operation S22, and the second descriptors indicating attributes of the solvent may be generated from the second input data obtained in operation S24.
Operation S60′ may include operations S62 and S64. First solubility parameters may be obtained in operation S62, and second solubility parameters may be obtained in operation S64. For example, the first descriptors generated in operation S42 may be provided to the machine learning model ML in
Solubility may be calculated in operation S80′. For example, the distance R between solubility parameters in a Hansen space may be calculated from the first solubility parameters obtained in operation S62 and the second solubility parameters obtained in operation S64 using Equation 1, and the solubility of the solute in the solvent may be calculated as the reciprocal, 1/R, of the distance R.
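Operations S62, S64, and S80′ can be strung together as in the sketch below. The machine learning model ML is represented here by any callable that maps a sequence of descriptors to (δD, δP, δH). The `toy_model` function is a made-up stand-in for a trained model, and every name in the block is an assumption for illustration.

```python
import math
from typing import Callable, Sequence, Tuple

Descriptors = Sequence[float]
Parameters = Tuple[float, float, float]  # (delta_D, delta_P, delta_H)


def estimate_solubility(
    solute_descriptors: Descriptors,
    solvent_descriptors: Descriptors,
    model: Callable[[Descriptors], Parameters],
) -> float:
    """Return 1/R for a solute/solvent pair (sketch of S62, S64, and S80')."""
    d_a, p_a, h_a = model(solute_descriptors)   # S62: first solubility parameters
    d_b, p_b, h_b = model(solvent_descriptors)  # S64: second solubility parameters
    # S80': distance in Hansen space per Equation 1, then its reciprocal.
    r = math.sqrt(4.0 * (d_a - d_b) ** 2 + (p_a - p_b) ** 2 + (h_a - h_b) ** 2)
    if r == 0.0:
        return math.inf  # assumption: identical parameters give unbounded 1/R
    return 1.0 / r


def toy_model(descriptors: Descriptors) -> Parameters:
    """Placeholder, NOT a trained model: echoes the first three descriptors."""
    return (descriptors[0], descriptors[1], descriptors[2])


# Illustrative descriptor values only.
print(estimate_solubility([18.0, 5.0, 7.0], [16.0, 5.0, 4.0], toy_model))  # 0.2
```

Passing the model as a callable keeps the distance calculation independent of how the solubility parameters are predicted, so the same function works with any trained regressor wrapped to this signature.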
Whether the solute is a composite may be determined in operation S62_2. As described above with reference to
When the solute is a composite, solubility parameters of materials may be obtained in operation S62_4. In other words, the solute that is a composite may be constituted of at least two materials, and solubility parameters corresponding to descriptors of each of the materials may be obtained from the machine learning model ML. For example, descriptors of the polymer of the photoresist material 30 of
The first solubility parameters may be calculated in operation S62_6. In other words, the solubility parameters of the solute may be calculated based on the solubility parameters obtained in operation S62_4. For example, when the solute is constituted of N materials (e.g., compounds) (where N is an integer greater than 1), the Hansen solubility parameters δDA, δPA, and δHA of the solute may be calculated using Equation 2.
In Equation 2, ci is a proportion of the mass or volume of an i-th material in the solute, δDAi is a dispersion force of the i-th material, δPAi is a polar force of the i-th material, and δHAi is an H-bond force of the i-th material, where 1≤i≤N.
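Written out from these definitions and the weighted-sum description of a composite solute, Equation 2 plausibly takes the following form. This is a reconstruction rather than a verbatim reproduction, and it assumes the proportions c_i sum to 1:

$$\delta_{DA} = \sum_{i=1}^{N} c_{i}\,\delta_{DAi}, \qquad \delta_{PA} = \sum_{i=1}^{N} c_{i}\,\delta_{PAi}, \qquad \delta_{HA} = \sum_{i=1}^{N} c_{i}\,\delta_{HAi}$$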
When the solute is not a composite, the first solubility parameters may be obtained in operation S62_8. For example, descriptors of the solute may be provided to the machine learning model ML, and the machine learning model ML may provide the Hansen solubility parameters δDA, δPA, and δHA of the solute.
Whether the solvent is a mixture may be determined in operation S64_2. The solvent may be a mixture of at least two solvents, and the solubility parameters of the mixture, i.e., the second solubility parameters, may be derived using a different method than the solubility parameters of a solvent that is not a mixture. As shown in
When the solvent is a mixture, solubility parameters of solvents may be obtained in operation S64_4. In other words, solubility parameters corresponding to descriptors of each of the solvents in the mixture may be obtained from the machine learning model ML. For example, descriptors of a first solvent in the mixture may be provided to the machine learning model ML, and the machine learning model ML may provide the solubility parameters of the first solvent. Descriptors of a second solvent in the mixture may be provided to the machine learning model ML, and the machine learning model ML may provide the solubility parameters of the second solvent.
The second solubility parameters may be calculated in operation S64_6. In other words, the second solubility parameters may be calculated based on the solubility parameters obtained in operation S64_4. For example, when the mixture is constituted of M solvents (where M is an integer greater than 1), the Hansen solubility parameters δDB, δPB, and δHB of the mixture may be calculated using Equation 3.
In Equation 3, cj is a proportion of the mass or volume of a j-th solvent in the mixture, δDBj is a dispersion force of the j-th solvent, δPBj is a polar force of the j-th solvent, and δHBj is an H-bond force of the j-th solvent, where 1≤j≤M.
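Written out from these definitions and the weighted-sum description of a solvent mixture, Equation 3 plausibly takes the following form. This is a reconstruction rather than a verbatim reproduction, and it assumes the proportions c_j sum to 1:

$$\delta_{DB} = \sum_{j=1}^{M} c_{j}\,\delta_{DBj}, \qquad \delta_{PB} = \sum_{j=1}^{M} c_{j}\,\delta_{PBj}, \qquad \delta_{HB} = \sum_{j=1}^{M} c_{j}\,\delta_{HBj}$$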
When the solvent is not a mixture, the second solubility parameters may be obtained in operation S64_8. For example, descriptors of the solvent may be provided to the machine learning model ML, and the machine learning model ML may provide the Hansen solubility parameters δDB, δPB, and δHB of the solvent.
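The weighted sum used for a composite solute (operation S62_6) and for a solvent mixture (operation S64_6) can be sketched with one helper. The function name, the tuple layout, and the normalization of proportions are assumptions made for this illustration. The text only states that each weight is a mass or volume proportion.

```python
from typing import Sequence, Tuple

Parameters = Tuple[float, float, float]  # (delta_D, delta_P, delta_H)


def combine_parameters(
    components: Sequence[Tuple[float, Parameters]],
) -> Parameters:
    """Proportion-weighted sum of per-component Hansen parameters.

    Each entry is (proportion, parameters). Proportions are normalized so
    that they sum to 1 (an assumption made for this sketch).
    """
    if not components:
        raise ValueError("at least one component is required")
    total = sum(proportion for proportion, _ in components)
    if total <= 0.0:
        raise ValueError("proportions must sum to a positive value")
    delta_d = sum(c * params[0] for c, params in components) / total
    delta_p = sum(c * params[1] for c, params in components) / total
    delta_h = sum(c * params[2] for c, params in components) / total
    return (delta_d, delta_p, delta_h)


# Illustrative values only: an equal-parts mixture of two solvents.
mixture = combine_parameters([
    (0.5, (16.0, 4.0, 8.0)),
    (0.5, (18.0, 6.0, 2.0)),
])
print(mixture)  # (17.0, 5.0, 5.0)
```

In this sketch the per-component parameters would come from the machine learning model ML, one prediction per material or solvent, before being combined.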
Referring to
Training data may be obtained in operation S11. For example, the training data may include information about attributes of a plurality of sample materials. For example, the training data may include information about the chemical structure of a sample material and solubility parameters of the sample material, which are obtained through experiments. In the training data, the information about the chemical structure of a sample material may have various formats. For example, the information may have a string format including a series of characters, as described above with reference to
A plurality of sample descriptors may be generated in operation S13. For example, the training data obtained in operation S11 may include information about the chemical structures of a plurality of sample materials, and descriptors of the sample materials, i.e., sample descriptors, may be generated based on the information included in the training data. Each of the sample descriptors corresponding to a single sample material may include at least one number indicating an attribute of the sample material, as described above with reference to
Sample solubility parameters may be extracted from the training data in operation S15. As described above, the training data may include solubility parameters of each of the sample materials, wherein the solubility parameters are obtained through experiments, and the solubility parameters of sample materials, i.e., the sample solubility parameters, may be extracted from the training data in operation S15.
A database may be generated in operation S17. The database may be directly used for the training of the machine learning model ML. In some embodiments, the database may have the structure of
The machine learning model ML may be trained in operation S19. For example, the machine learning model ML may be trained based on a method using the database generated in operation S17. The machine learning model ML may be trained based on chemical structures of sample materials and sample solubility parameters. In some embodiments, the machine learning model ML may be trained based on supervised learning using a random forest and/or a Gaussian process. The machine learning model ML may be trained based on regression learning using at least one of a random forest and a Gaussian process. An example of operation S19 is described below with reference to
Referring to
A descriptor feature group may be set in operation S19_4. For example, sample descriptors corresponding to a reference importance level or higher may be selected from among the sample descriptors based on the importance levels identified in operation S19_2, and a descriptor feature group constituted of the selected sample descriptors may be set. In some embodiments, the at least one descriptor of the target material, which is generated in operation S40 in
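Selecting the descriptor feature group in operation S19_4 amounts to a threshold filter over importance levels. A minimal sketch follows, assuming the importance levels are already available as a mapping from descriptor name to score. The descriptor names, scores, and reference level shown are hypothetical.

```python
from typing import Dict, List


def select_feature_group(
    importances: Dict[str, float], reference_level: float
) -> List[str]:
    """Keep descriptors whose importance is at or above the reference level."""
    selected = [
        name for name, score in importances.items() if score >= reference_level
    ]
    # Order by descending importance, breaking ties by name, for stable output.
    return sorted(selected, key=lambda name: (-importances[name], name))


# Hypothetical importance levels, for illustration only.
importances = {
    "molecular_weight": 0.30,
    "dipole_moment": 0.25,
    "ring_count": 0.05,
}
print(select_feature_group(importances, reference_level=0.10))
# ['molecular_weight', 'dipole_moment']
```

Only the descriptors in the returned group would then need to be generated for a target material at estimation time.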
The computing system 140 may include a stationary computing system, such as a desktop computer, a workstation, or a server, or a portable computing system such as a laptop computer. Referring to
The at least one processor 141 may be referred to as a processing unit and may execute a program like a CPU, a GPU, an NPU, or a DSP. For example, the at least one processor 141 may access the memory subsystem 144 through the bus 146 and execute instructions stored in the memory subsystem 144. In some embodiments, the computing system 140 may further include an accelerator as a dedicated hardware device designed to perform a certain function at a high speed. In some embodiments, the machine learning model ML in
The I/O interface 142 may include an input device, such as a keyboard or a pointing device, and/or an output device such as a display device or a printer, or may provide access to an input device and/or an output device. A user may trigger execution of a program 145_1 and/or loading of data 145_2 through the I/O interface 142, input the input data in
The network interface 143 may provide access to a network outside the computing system 140. For example, the network may include a plurality of computing systems and communication links. The communication links may include wired links, optical links, wireless links, or other types of links.
The memory subsystem 144 may store the program 145_1 for the method of estimating solubility, which has been described above with reference to the accompanying drawings, or at least part of the program 145_1. The at least one processor 141 may perform at least part of the method of estimating solubility by executing a program (or instructions) stored in the memory subsystem 144. The memory subsystem 144 may include ROM, RAM, or the like.
The storage 145 as a non-transitory storage medium may not lose data stored therein even when power supplied to the computing system 140 is interrupted. For example, the storage 145 may include a non-volatile memory device or a storage medium such as a magnetic tape, an optical disk, or a magnetic disk. The storage 145 may be detachable from the computing system 140. As shown in
While the inventive concept(s) of the present disclosure have been particularly shown and described with reference to embodiments thereof, it will be understood that various changes in form and details may be made therein without departing from the spirit and scope of the following claims.
Number | Date | Country | Kind |
---|---|---|---|
10-2021-0084874 | Jun 2021 | KR | national |
10-2021-0111204 | Aug 2021 | KR | national |