A Microfiche Source Code Appendix forms part of this application. The appendix, which includes a source code listing relating to an embodiment of the invention, includes 33 frames on 1 sheet of microfiche.
This patent document (including the source code appendix) contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document as it appears in the Patent and Trademark Office file or records, but otherwise reserves all copyright rights whatsoever.
This application relates to enhancing structure diagram generation.
A molecule is typically represented in a computer by a connection table that identifies atoms in the molecule and specifies connections (“bonds”) among the identified atoms. The connection table may also describe associated properties such as atom type, bond order, charge, and stereochemistry. A diagrammatic representation of the molecule may be derived from the connection table. Examples of a connection table and a corresponding diagram are illustrated in
In chemistry, with reference to
A ring system, which is also known as a “cyclic system”, is a group of rings such that (1) each ring shares one or more bonds with another ring in the group and (2) the group cannot be divided into smaller cyclic systems. An arrangement in which two rings are connected by a linking, non-cyclic (“acyclic”) bond is considered to include two cyclic systems, not one. As used herein, “ring system” has a meaning consistent with an understanding that a spiro ring includes two distinct ring systems.
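The ring system definition above can be illustrated with a minimal sketch. It assumes the rings have already been perceived and are supplied as lists of atom indices in cyclic order; the function name `ring_systems` and this input format are hypothetical and are not taken from the source code appendix. Rings that share at least one bond are merged into the same system, so two rings joined only at a single spiro atom, or only by an acyclic linking bond, remain separate systems.

```python
from itertools import combinations


def ring_systems(rings):
    """Group rings into ring systems.

    rings: list of rings, each a list of atom indices in cyclic order.
    Returns a sorted list of groups, each a list of ring indices.
    """
    def bonds(ring):
        # A bond is an unordered pair of adjacent atoms around the ring.
        n = len(ring)
        return {frozenset((ring[i], ring[(i + 1) % n])) for i in range(n)}

    bond_sets = [bonds(r) for r in rings]
    parent = list(range(len(rings)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(rings)), 2):
        if bond_sets[i] & bond_sets[j]:  # a shared bond joins the two rings
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(rings)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For example, two six-membered rings fused along one bond would be reported as one system, while two five-membered rings sharing only one atom would be reported as two.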
A method and a system are provided for enhancing structure diagram generation (“SDG”). In SDG, aesthetic two-dimensional (“2-D”) coordinates for use in a diagrammatic representation (“diagram”) of a molecule are derived from a connection table for the molecule. SDG may also improve the aesthetic qualities of a chemical structure diagram having existing coordinates, if available. SDG is enhanced by expressing the symmetry present in the molecule, by making use of symmetry in the 2-D dynamics used to lay out rings and chains, by construction of bridges using an open polygon method together with a potential function, and by an elegant approach to the relative positioning of molecules (“free rectangle method”).
Other features and advantages will become apparent from the following description, including the drawings, and from the claims.
Structure diagram generation (“SDG”) is a process in which two-dimensional (“2-D”) coordinates are derived from a connection table for a structure, allowing a diagram of the molecule to be displayed or printed. SDG is described in detail in H. E. Helson, “Structure Diagram Generation”, in “Reviews in Computational Chemistry”, K. B. Lipkowitz and D. B. Boyd, Eds., Wiley-VCH, New York, 1999, Vol. 13, at 313–398, which is incorporated herein. This application is filed simultaneously with a United States patent application entitled DERIVING CHEMICAL STRUCTURAL INFORMATION, Ser. No. 09/502,810 filed Feb. 11, 2000, which is incorporated herein.
Existing methods implement SDG in a less refined form. For example, earlier versions of CambridgeSoft Corporation's ChemDraw® program employed a feature that allowed users to regularize bond lengths and angles, and lay out ring systems. ChemDraw®, however, did not consult symmetry when creating two-dimensional organic structures, was unable to satisfactorily fabricate bridges, and lacked the ability to inter-position molecules. Other methods of chemical structure generation employed different methodologies but suffered from the same shortcomings; in particular, they gave no consideration to symmetry. Thus, the creation of two-dimensional organic structures has been problematic in the art.
The present invention provides new methods of SDG. The new methods enhance SDG by improving the layout of chemical structure diagrams. The enhanced SDG allows users to recognize chemical molecules more quickly. The enhanced SDG also allows users to recognize important features of chemical molecules, such as symmetry, more quickly. The enhanced SDG methods are therefore also useful for purposes of publication. In addition, the methods allow users to improve chemical structure diagrams quickly and efficiently, thus avoiding tedious manipulation.
In SDG, the two-dimensional coordinates may be derived with or without preexisting coordinates. Cases without preexisting coordinates (“de novo” cases) are common and include chemical name translation, isomer enumeration, translation from a linear notation such as SMILES, nickname/superatom expansion, and automated structure elucidation.
In cases in which preexisting coordinates are available (“structure cleanup” cases), it may be possible to improve a structure diagram while preserving some or all existing stylistic choices. For example, if a structure diagram is drawn with or imported into a structure drawing program, the program may be directed to “clean up” undesirable aspects of the structure diagram. In another example, diagram improvements may be needed in the case of a synthesis planning program, in which structure diagrams are generally well drawn but may have had bonds broken and reformed in awkward locations.
SDG may also be needed in conversions of structure diagrams from three-dimensional (“3-D”) to 2-D. In at least some cases, structure diagrams that are stored and manipulated in a 3-D form may be converted to 2-D diagrams upon display to make the structure diagrams more easily recognizable to human users.
As a result, a connection table to which SDG is applied may have 2-D or 3-D coordinates or may lack coordinates.
SDG includes at least four possible stages (“phases”): perception, pre-assembly analysis, assembly, and post-assembly. The pre-assembly phase, if applicable, may include deriving a feature such as the shape of a ring system that is subsequently attached whole to an acyclic portion in the assembly phase. In the assembly phase, the neighbors of an atom that has been positioned (a “seed” atom) are each examined in turn, and are positioned at respective aesthetic angles and distances from the seed atom.
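The positioning of neighbors around a seed atom in the assembly phase can be sketched as follows. This is only an illustration under stated assumptions: the function name `place_neighbors`, the rule of spreading new neighbors evenly around the seed, and the 120 degree zigzag angle used for a single new neighbor are all hypothetical choices, not the rules used by the described implementation.

```python
import math


def place_neighbors(seed, placed_neighbor, count, bond_length=1.0):
    """Position `count` unplaced neighbors around an already placed seed atom.

    seed, placed_neighbor: (x, y) coordinates of the seed atom and of one
    neighbor that has already been positioned.
    Returns a list of (x, y) coordinates for the new neighbors.
    """
    sx, sy = seed
    # Direction from the seed to the neighbor that is already placed.
    base = math.atan2(placed_neighbor[1] - sy, placed_neighbor[0] - sx)
    if count == 1:
        # Assumed chain convention: a 120 degree zigzag rather than a straight line.
        step = math.radians(120.0)
    else:
        # Assumed convention: spread all bonds evenly around the seed.
        step = 2.0 * math.pi / (count + 1)
    return [
        (sx + bond_length * math.cos(base + step * i),
         sy + bond_length * math.sin(base + step * i))
        for i in range(1, count + 1)
    ]
```

Each returned position would then serve as a new seed for its own unplaced neighbors until the acyclic portion is fully laid out.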
SDG is enhanced as described below.
In a first aspect of the enhancement, symmetry is used in the assembly phase, i.e., for general layout. Chemical structure diagrams that express molecular symmetry facilitate human interpretation of the chemical structures that are represented. For example, the presence of symmetry provides clues for the molecular substance's synthesis. Symmetry affects the substance's physical properties (particularly those affected by entropy), such as melting and boiling points, and heat of vaporization. Symmetry can affect the substance's light-bending properties. In particular, a substance that has a plane of symmetry is not “optically active”. In general, since the human eye tends to recognize symmetry quickly, a diagram that expresses a molecule's symmetry allows the symmetrical characteristics of the molecule to be rapidly perceived by a human viewer.
According to the enhancement, when a diagram is to be produced for a molecule, symmetry inherent in the molecule is perceived, and during layout of the structure diagram, representations of atoms and bonds are positioned to express the perceived symmetry. In a specific implementation, a plane of symmetry (also known as a mirror plane) perceived in a molecule is expressed vertically or horizontally (see
In a first step in using symmetry in general layout (see
Additionally, a “pivot” point is determined for each orbit (step 1020). The pivot point is determined to be the one or more atoms or bonds that resides at the graph-theoretic center of the atoms and bonds in the orbit, i.e., those atoms (bonds) having the smallest value of the largest graph-theoretic distance to any other atom (bond) in the orbit. The graph-theoretic distance between two atoms (bonds) is equal to the number of bonds (atoms) in the shortest path between them. For example, in
The “order” of each orbit for each instance of symmetry is also determined (step 1030). The order indicates whether the instance corresponds to a two-fold rotation, a three-fold rotation, a four-fold rotation, and so on, or a reflection. In cases in which the symmetry of an orbit includes both reflection and an N-fold rotation, N being greater than 2, it is advantageous to treat the instance as having an order indicating that the instance corresponds to the N-fold rotation. Thus, rotational symmetry takes priority over reflection if the associated rotation is at least three-fold.
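The pivot determination described above (the graph-theoretic center of an orbit) can be sketched for atoms as follows. The adjacency-dictionary input and the names `graph_distances` and `pivot_atoms` are assumptions made for illustration; the sketch also assumes that every atom in the orbit is reachable from every other through the molecule's bonds.

```python
from collections import deque


def graph_distances(adjacency, start):
    """Breadth-first search: number of bonds on the shortest path from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for neighbor in adjacency[atom]:
            if neighbor not in dist:
                dist[neighbor] = dist[atom] + 1
                queue.append(neighbor)
    return dist


def pivot_atoms(adjacency, orbit):
    """Return the orbit atoms whose largest distance to any other orbit atom is smallest."""
    eccentricity = {}
    for atom in orbit:
        dist = graph_distances(adjacency, atom)
        eccentricity[atom] = max(dist[other] for other in orbit)
    best = min(eccentricity.values())
    return sorted(atom for atom in orbit if eccentricity[atom] == best)
```

A chain with an odd number of atoms yields a single pivot atom, while a chain with an even number of atoms yields two, which corresponds to a pivot located on the bond between them.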
When an atom or bond is positioned during the assembly phase (step 1040) (see
After all atoms and bonds have been placed, the structure diagram is rotated so that its mirror plane is horizontal or vertical (step 1060) (see
In another aspect of the enhancement of SDG, symmetry is used in a “dynamics” method of layout. A 2-D version of molecular dynamics is used in some situations to lay out structure diagrams of molecules in connection with designing new ring systems, improving existing ring systems, or laying out or improving acyclic portions. Such an effort may use a predefined set of optimal bond lengths and angles (“parameters”), or may seek to equalize adjacent lengths and angles. The process is iterative, wherein in each iteration the difference between a current parameter and an optimal parameter is calculated for each atom and bond, and is interpreted as a corrective force on the atom or bond, which affects the position of the atom or bond as submitted to the next iteration. The iterative process continues until the net corrective force on every atom or bond is zero or nearly zero, so that the structure diagram for the molecule is determined to be at equilibrium.
A method of adding symmetry as a parameter in dynamic ring layout is now described (
The instance of symmetry, regardless of character and origin, may be represented in any of several ways. In a specific implementation, the instance is represented by two lists of groups: a list of equivalent triplets of atoms, and a list of equivalent pairs of bonds (see
In each iteration for each triplet, a respective force term (“Fa”) is added for the atom in the center of the triplet (step 2020). An optimal interior angle (“optimal angle”) of the triplet of atoms is derived, as the average of the interior angles of all the triplets in an orbit, i.e., in a group of symmetrically equivalent atoms or bonds. Fa is based on, and in a specific implementation is proportional to, the difference between the optimal angle and the current angle. Fa acts along the angle's bisector, in a direction that would bring the angle closer to the optimal angle. Fa may compete with other terms, such as a bond angle term for equalizing adjacent bond angles.
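A minimal sketch of the angle term Fa follows. It assumes atom coordinates are held in a dictionary keyed by atom index and that each triplet is given as (end, center, end); the names `interior_angle` and `angle_forces` and the proportionality constant `k` are hypothetical. The optimal angle is taken as the average interior angle over the triplets of one orbit, and the force on each center atom acts along the bisector, toward the interior when the angle must widen and away from it when the angle must narrow.

```python
import math


def interior_angle(a, b, c):
    """Interior angle at b, in radians, for the triplet a-b-c."""
    ax, ay = a[0] - b[0], a[1] - b[1]
    cx, cy = c[0] - b[0], c[1] - b[1]
    cos_value = (ax * cx + ay * cy) / (math.hypot(ax, ay) * math.hypot(cx, cy))
    return math.acos(max(-1.0, min(1.0, cos_value)))


def angle_forces(coords, triplets, k=0.1):
    """Return {center atom: (fx, fy)} for the symmetric triplets of one orbit."""
    angles = [interior_angle(coords[a], coords[b], coords[c]) for a, b, c in triplets]
    optimal = sum(angles) / len(angles)  # average interior angle over the orbit
    forces = {}
    for (a, b, c), angle in zip(triplets, angles):
        bx, by = coords[b]
        ax, ay = coords[a][0] - bx, coords[a][1] - by
        cx, cy = coords[c][0] - bx, coords[c][1] - by
        la, lc = math.hypot(ax, ay), math.hypot(cx, cy)
        # The bisector direction is the sum of the unit vectors toward the two ends.
        sx, sy = ax / la + cx / lc, ay / la + cy / lc
        norm = math.hypot(sx, sy)
        fx, fy = forces.get(b, (0.0, 0.0))
        if norm > 1e-12:  # a straight (180 degree) triplet has no defined bisector
            magnitude = k * (optimal - angle)  # positive: widen the angle
            fx += magnitude * sx / norm
            fy += magnitude * sy / norm
        forces[b] = (fx, fy)
    return forces
```

Moving the center atom toward the interior of the angle widens it, which is why a positive difference between the optimal and current angle pushes the atom along the bisector rather than against it.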
In each iteration, another respective force term (“Fb”) is added for each symmetric bond (step 2030). Fb has the effect of lengthening or shortening a bond to make the bond's length more similar to the lengths of the other bonds in the orbit. A bond's length is changed by moving the atoms at the bond's endpoints closer together or farther apart. Thus Fb is expressed as a force on each of the two atoms at the bond's endpoints. Fb may compete with other terms, such as a bond length term for equalizing adjacent bond lengths.
During each iteration, a net force on each atom is calculated as the sum of the forces, including Fa and Fb, acting on the atom (step 2040). Each atom is moved by an amount proportional to the respective net force. In a specific implementation, the iterative process is determined to be complete when the largest net force in the iteration is smaller than a specified threshold (step 2050).
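The iteration described above can be sketched with the bond length term Fb alone. This is a simplified illustration under stated assumptions: the name `equalize_bond_lengths`, the constant `k`, the threshold value, and the iteration cap are hypothetical, the angle term Fa and any competing terms are omitted, and no bond is assumed to have zero length.

```python
import math


def equalize_bond_lengths(coords, orbit_bonds, k=0.5, threshold=1e-6, max_iter=1000):
    """Iteratively move atoms so that the bonds of one orbit approach a common length.

    coords: {atom: (x, y)}; orbit_bonds: list of (atom, atom) pairs in one orbit.
    Returns a new {atom: (x, y)} dictionary.
    """
    coords = {atom: list(p) for atom, p in coords.items()}
    for _ in range(max_iter):
        lengths = [
            math.hypot(coords[b][0] - coords[a][0], coords[b][1] - coords[a][1])
            for a, b in orbit_bonds
        ]
        mean = sum(lengths) / len(lengths)
        net = {atom: [0.0, 0.0] for atom in coords}
        for (a, b), length in zip(orbit_bonds, lengths):
            ux = (coords[b][0] - coords[a][0]) / length
            uy = (coords[b][1] - coords[a][1]) / length
            # Fb is applied to both endpoints; positive values pull them together.
            f = k * (length - mean) / 2.0
            net[a][0] += f * ux
            net[a][1] += f * uy
            net[b][0] -= f * ux
            net[b][1] -= f * uy
        largest = max(math.hypot(fx, fy) for fx, fy in net.values())
        if largest < threshold:  # equilibrium: every net force is nearly zero
            break
        for atom, (fx, fy) in net.items():
            coords[atom][0] += fx  # displacement proportional to the net force
            coords[atom][1] += fy
    return {atom: tuple(p) for atom, p in coords.items()}
```

In a fuller version, the forces from Fa and from the ordinary bond length and bond angle terms would be accumulated into the same net force before the atoms are moved.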
Construction of bridged cyclic systems may involve problems of atom and bond overlap, and irregular angles. In another aspect of the enhancement of SDG, bridges in cyclic systems are constructed using an open polygon method in conjunction with a potential function. In an example illustrated in
In the open polygon method, coordinates of missing points are derived from two grounding points, the number of missing points, and an optimal bond length (“d”), such that, as shown in
The open polygon method can be used to create bridges. In
Rating = c1*Congestion + c2*max(0, (180−α) − threshold) + c3*|scale − 1.0|
In such a function, c1, c2 and c3 are constants determined in a specific implementation; and scale is the ratio of d to the standard bond length. In a specific implementation, the bond angle term is active only above a certain threshold, such as 120 degrees. The version of the bridge that minimizes the rating is chosen.
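The rating function above can be written out as a short sketch. The function names `bridge_rating` and `choose_bridge` and the default values chosen for c1, c2, and c3 are hypothetical; only the form of the expression and the example threshold of 120 degrees come from the description. The congestion value and the angle α are assumed to be computed elsewhere for each candidate bridge.

```python
def bridge_rating(congestion, alpha_deg, d, standard_bond_length,
                  c1=1.0, c2=0.05, c3=1.0, threshold=120.0):
    """Rate one candidate bridge; lower is better.

    congestion: crowding measure for the candidate (computed elsewhere).
    alpha_deg: the angle alpha of the rating expression, in degrees.
    d: bond length used for the candidate bridge.
    """
    scale = d / standard_bond_length
    angle_term = max(0.0, (180.0 - alpha_deg) - threshold)
    return c1 * congestion + c2 * angle_term + c3 * abs(scale - 1.0)


def choose_bridge(candidates, standard_bond_length=1.0):
    """Pick the candidate with the smallest rating.

    candidates: list of dicts with keys 'congestion', 'alpha_deg', and 'd'.
    """
    return min(
        candidates,
        key=lambda c: bridge_rating(
            c["congestion"], c["alpha_deg"], c["d"], standard_bond_length
        ),
    )
```

Because the three terms are simply summed, the constants control the trade-off between avoiding crowding, avoiding extreme angles, and keeping bond lengths close to the standard length.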
In another aspect of the enhancement of SDG, a placement procedure is executed to arrange molecule structure diagrams closely together without overlapping. In at least some cases, the procedure is executed as a final step of SDG, after the molecule structure diagrams have been produced individually. The procedure is analytic in that the procedure does not rely on an indefinite number of iterations and is not affected by the starting positions of the components.
A specific implementation of the procedure is now described (
A “free rectangle” list is maintained that keeps track of which areas of the display area are unused (step 3020). The list is initialized to one free rectangle that occupies all of 2-D space and extends from negative infinity to positive infinity in both X and Y dimensions.
The boxes are sorted, and each is treated as follows, in order of decreasing area (step 3030). A free rectangle is selected that is closest to the center of the boxes and that is large enough to contain the instant box (step 3040). The center of a collection (“conglomeration”) of boxes is defined as the average of the centers of the boxes weighted by the boxes' respective areas, or, as the center of the smallest rectangle that can enclose the boxes. The instant box is positioned flush with that corner of the free rectangle that is closest to the center of the growing collection (initially at coordinates (0,0)), and is imprinted on the free rectangle (step 3050). In imprinting, the original free rectangle is replaced by zero or more new free rectangles. New free rectangles may be created in the leftover space, i.e., wherever the box does not occlude the original rectangle (see
In a specific implementation, overlapping free rectangles may be merged to help avoid a profusion of inconsequential free rectangles (step 3060). For example, rules may be enforced that dictate that two free rectangles should be merged such that the resulting free rectangle does not extend over any points not contained in either progenitor, provided that the percentage of area lost in the merger is less than a specified size, such as ten percent of the original area.
The conglomerate of boxes is translated so that its center is at coordinates (0,0) (step 3070). The molecule diagram coordinates are translated so that their centers coincide with their corresponding box centers (step 3080).
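Two pieces of the placement procedure above can be sketched as follows: imprinting a box on a free rectangle, and computing the area-weighted center of a collection of boxes. Rectangles are represented here as (x0, y0, x1, y1) tuples, and the names `imprint` and `weighted_center` are hypothetical. The sketch assumes that the leftover space is described by up to four maximal, possibly overlapping, free rectangles; the described implementation may divide the leftover space differently.

```python
def imprint(free, box):
    """Replace a free rectangle by the free rectangles left after placing a box.

    free, box: (x0, y0, x1, y1) with x0 < x1 and y0 < y1; free may be infinite.
    Returns zero or more new free rectangles.
    """
    fx0, fy0, fx1, fy1 = free
    bx0, by0, bx1, by1 = box
    # If the box does not occlude the free rectangle, it is left unchanged.
    if bx0 >= fx1 or bx1 <= fx0 or by0 >= fy1 or by1 <= fy0:
        return [free]
    pieces = []
    if bx0 > fx0:
        pieces.append((fx0, fy0, bx0, fy1))  # strip to the left of the box
    if bx1 < fx1:
        pieces.append((bx1, fy0, fx1, fy1))  # strip to the right of the box
    if by0 > fy0:
        pieces.append((fx0, fy0, fx1, by0))  # strip below the box
    if by1 < fy1:
        pieces.append((fx0, by1, fx1, fy1))  # strip above the box
    return pieces


def weighted_center(boxes):
    """Average of the box centers, weighted by the boxes' areas."""
    total = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in boxes)
    cx = sum((x1 - x0) * (y1 - y0) * (x0 + x1) / 2.0 for x0, y0, x1, y1 in boxes) / total
    cy = sum((x1 - x0) * (y1 - y0) * (y0 + y1) / 2.0 for x0, y0, x1, y1 in boxes) / total
    return cx, cy
```

Starting from a single free rectangle spanning all of 2-D space, imprinting the first box yields four half-plane rectangles. A box placed flush with a corner of a finite free rectangle leaves at most two, and a box that fills the free rectangle exactly leaves none.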
A practical example of the molecule arrangement procedure is illustrated in
All or a portion of the procedures described above may be implemented in hardware or software, or a combination of both. In at least some cases, it is advantageous if the technique is implemented in computer programs executing on one or more programmable computers, such as a personal computer running or able to run an operating system such as UNIX, Linux, Microsoft Windows 95, 98, 2000, or NT, or MacOS, that each include a processor, a storage medium readable by the processor (including volatile and non-volatile memory and/or storage elements), at least one input device such as a keyboard, and at least one output device. Program code is applied to data entered using the input device to perform the technique described above and to generate output information. The output information is applied to one or more output devices such as a display screen of the computer.
In at least some cases, it is advantageous if each program is implemented in a high level procedural or object-oriented programming language such as Perl, C, C++, or Java to communicate with a computer system. However, the programs can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language.
In at least some cases, it is advantageous if each such computer program is stored on a storage medium or device, such as ROM or optical or magnetic disc, that is readable by a general or special purpose programmable computer for configuring and operating the computer when the storage medium or device is read by the computer to perform the procedures described in this document. The system may also be considered to be implemented as a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner.
Other embodiments are within the scope of the following claims. For example, a non-human entity such as a computer program may serve as a source for input information such as the connection table or as a recipient of output information such as diagrammatic data. In another example, one or more techniques based on the description herein may be applied to adapting structure diagrams for purposes other than presentation to a human user.
This application claims the benefit of U.S. Provisional Application Ser. No. 60/119,654 entitled STRUCTURE DIAGRAM GENERATION filed on Feb. 11, 1999, incorporated herein.