Canonizing molecular graphs

A chemical compound should be unambiguously identifiable by a unique label. For decades, traditional chemical nomenclature describing the structure served this purpose more or less well. However, as the compounds under study became more and more complex, so did their chemical names. As a result, many chemical names are now lengthy, unwieldy, and difficult to pronounce. In chemists' everyday work, names were thus superseded by structure drawings, which many consider the natural language of molecular science. [Pg.204]

On the other hand, a structure can be drawn in various ways, so there is no one-to-one correspondence between a compound and a particular drawing. Further, the atoms in a drawing may be numbered in many ways ($n! = n(n-1)\cdots 2 \cdot 1$ numberings for a compound containing $n$ atoms), so that the computer representations derived from it (connection tables, bond matrices), although unambiguous, are not unique. This was discussed above in detail and leads to the introduction of unlabeled structures. [Pg.204]

For some time registry numbers seemed to be a solution to the problem, at least for the bench chemist and the layman, in that a new registry number is assigned to a compound when it is first registered by Chemical Abstracts Service (CAS RN) or Beilstein (BRN). This number then serves as the compound's unique ID. This procedure leaves the agency with the problem of comparing a seemingly new compound to all those already present in the database. As a further fundamental limitation, an RN is not available for an unpublished compound. Furthermore, registry numbers can also be given to mixtures; thus one compound may have several RNs. [Pg.204]

Many canonization methods have been proposed. For earlier procedures described in the chemical literature see the paper by Jochum and Gasteiger and references cited therein [143]. Randic considered a bond matrix canonical if the minimum binary number resulted when the rows of its upper half were concatenated [241]. Hendrickson instead used the maximum number obtained from the upper half matrix [122], while Kvasnicka and Pospichal preferred the maximum number obtained from the lower half matrix [170,171]. Although such an extremality requirement obviously leads to a unique numbering, it is not necessary. Rather, the goal may be achieved using any one of many procedures, provided it is well-defined and leaves no room for arbitrariness. Other canonization procedures have also been developed [37,330]. [Pg.205]
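The extremality idea can be illustrated with a minimal sketch. It assumes a plain 0/1 adjacency matrix rather than a full bond matrix, and it tries all n! numberings by brute force, which is only feasible for very small graphs; the function names are illustrative and do not come from the cited papers.

```python
from itertools import permutations

def upper_triangle_code(adj, order):
    """Concatenate the upper-half rows of the matrix renumbered by `order`."""
    n = len(order)
    return "".join(
        str(adj[order[i]][order[j]])
        for i in range(n)
        for j in range(i + 1, n)
    )

def canonical_code(adj, maximize=False):
    """Return the minimum (or maximum) code over all n! numberings.

    All codes have equal length and contain only '0' and '1', so string
    comparison agrees with comparison of the binary numbers.
    """
    pick = max if maximize else min
    return pick(upper_triangle_code(adj, p) for p in permutations(range(len(adj))))

# The same three-atom chain numbered in two different ways.
chain_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # central atom is number 1
chain_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # central atom is number 0

print(canonical_code(chain_a), canonical_code(chain_b))
```

Both numberings yield the same minimal code, whereas the matrices themselves differ. Passing maximize=True selects the maximal code instead, which is equally well-defined.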

The real merit of graph invariants in the present context is that they often allow the comparison of two compounds without the need for a rigorous isomorphism test. Similarly, vertex-in-graph invariants, though sometimes identical for nonequivalent nodes, often allow easy comparison of graph nodes, which renders the ensuing rigorous canonization far less difficult. [Pg.205]
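As a minimal sketch of this use of invariants, the example below takes the sorted degree sequence as a graph invariant and the vertex degree as a vertex-in-graph invariant. Both choices are assumptions made for illustration; the text does not prescribe particular invariants. Differing invariants prove that two graphs are not isomorphic, while equal invariants prove nothing and still require a rigorous test.

```python
from collections import defaultdict

def degree_sequence(graph):
    """Graph invariant: the sorted vertex degrees (independent of numbering)."""
    return tuple(sorted(len(neighbors) for neighbors in graph.values()))

def may_be_isomorphic(g, h):
    """Cheap prefilter: False is conclusive, True only means 'not ruled out'."""
    return len(g) == len(h) and degree_sequence(g) == degree_sequence(h)

def vertex_classes(graph):
    """Group vertices by degree; only vertices in the same class can be equivalent."""
    classes = defaultdict(list)
    for vertex, neighbors in graph.items():
        classes[len(neighbors)].append(vertex)
    return dict(classes)

# Carbon skeletons as adjacency lists.
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(may_be_isomorphic(n_butane, isobutane))
print(vertex_classes(n_butane))
```

The partition into vertex classes shrinks the set of numberings a subsequent canonization has to consider, since only vertices within one class need to be permuted against each other.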

V. Kvasnicka, J. Pospichal, Canonical Indexing and Constructive Enumeration of Molecular Graphs, J.Chem.Inf.Comput.Sci., 30 (1990) 99-105. [Pg.57]

Local vertex invariants (LOVIs) are used to calculate several molecular topological indices by applying different operators, such as addition of LOVIs, addition of squares of LOVIs, and addition of reciprocal geometric means for every pair of adjacent vertices. Moreover, they can be used to obtain a canonical numbering of molecular graphs and to compare molecules in order to study molecular branching and centricity. [Pg.281]
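The three operators named above can be sketched as follows. The example assumes the vertex degree as the LOVI; with that choice the sum of squares is the first Zagreb index and the sum of reciprocal geometric means is the Randic connectivity index. The function and variable names are illustrative.

```python
from math import sqrt

def lovi_indices(edges, lovi):
    """Apply three operators to a table of local vertex invariants.

    Returns (sum of LOVIs, sum of squared LOVIs,
             sum over edges of 1 / sqrt(LOVI_u * LOVI_v)).
    """
    total = sum(lovi.values())
    sum_of_squares = sum(x * x for x in lovi.values())
    reciprocal_gm = sum(1.0 / sqrt(lovi[u] * lovi[v]) for u, v in edges)
    return total, sum_of_squares, reciprocal_gm

def degrees(edges):
    """Vertex degree, used here as the LOVI (an assumption of this sketch)."""
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg

n_butane = [(0, 1), (1, 2), (2, 3)]
isobutane = [(0, 1), (0, 2), (0, 3)]

for name, edges in (("n-butane", n_butane), ("isobutane", isobutane)):
    print(name, lovi_indices(edges, degrees(edges)))
```

The plain sum of degrees is the same for both isomers (twice the number of edges), while the other two operators separate the branched from the unbranched skeleton.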

Faulon, J.-L. (1998). Isomorphism, Automorphism Partitioning, and Canonical Labeling Can Be Solved in Polynomial-Time for Molecular Graphs. J. Chem. Inf. Comput. Sci., 38, 432-444. [Pg.566]

The elements of the decimal adjacency vector are integers that were used for canonical numbering of molecular graphs [Randic, 1974]. [Pg.6]
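A minimal sketch of such a vector, assuming that each element is obtained by reading the corresponding row of the adjacency matrix as a binary number (the excerpt itself does not spell out the construction):

```python
def decimal_adjacency_vector(adj):
    """Read every 0/1 row of the adjacency matrix as one binary integer."""
    return [int("".join(str(bit) for bit in row), 2) for row in adj]

# The same three-atom chain under two different numberings.
chain_a = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]  # central atom is number 1
chain_b = [[0, 1, 1], [1, 0, 0], [1, 0, 0]]  # central atom is number 0

print(decimal_adjacency_vector(chain_a))
print(decimal_adjacency_vector(chain_b))
```

The vector depends on the numbering, which is what makes it usable as a criterion for selecting one numbering as canonical.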

Two-dimensional representations alternative to the molecular graph are the linear notation systems, for example, the Wiswesser Line Notation system (WLN) [Smith and Baker, 1975], SMILES [Weininger, 1988, 1990, 2003; Weininger, Weininger et al., 1989; Convard, Dubost et al., 1994; Hinze and Welz, 1996], and SMARTS (SMART - Daylight Chemical Information Systems, 2004). CAST (CAnonical representation of STereochemistry) is a method that gives a linear notation that canonically represents stereochemistry around a specific site in a molecule [Satoh, Koshino et al., 2000, 2001, 2002]... [Pg.514]

Faulon, J.-L. (1998) Isomorphism, automorphism partitioning, and canonical labeling can be solved in polynomial-time for molecular graphs. J. Chem. Inf. Comput. Sci., 38, 432-444. [Pg.1037]

A precondition for an efficient manipulation of BE-matrices in the computer is a canonical indexing of the atoms in a molecule. In order to generate a unique numbering we use the connectivity matrix of the molecular graph and the labels already assigned to its vertices, i.e. the chemical symbols. [Pg.48]


Canonization of molecular structures. Often two or more seemingly different molecular graphs represent one and the same chemical compound. In particular, the atoms in a molecule can be numbered in various ways, which may lead to problems in compound identification. To avoid such problems, structural formulas have to be generated in a canonized data structure, so that two libraries can easily be compared to detect overlaps. [Pg.7]

Line (1) runs through the whole library of molecular graphs M_i. In line (2) the size of the substructures S is limited by setting a lower and an upper limit for the number of edges. In line (3) S is canonically numbered, and in line (4) the count of S in M_i is incremented. If a substructure is encountered for the first time, it is inserted into Map and associated with a vector of zeroes with one entry per molecular graph in the library. At the end, Map[S][i] contains the count of S in M_i. [Pg.251]
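The listing these line numbers refer to is not reproduced here. The following is a hedged sketch of a procedure matching the description, not the original listing: substructures are taken to be connected edge subsets, and the canonical numbering is found by brute force over all vertex permutations, so it is only practical for small edge limits. All names are illustrative.

```python
from itertools import combinations, permutations

def is_connected(edges):
    """Depth-first search over the vertices touched by `edges`."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(adj)

def canonical_key(edges, labels):
    """Smallest (atom labels, edge list) pair over all renumberings."""
    verts = sorted({v for e in edges for v in e})
    best = None
    for perm in permutations(range(len(verts))):
        new = dict(zip(verts, perm))
        key = (
            tuple(labels[v] for v in sorted(verts, key=new.get)),
            tuple(sorted(tuple(sorted((new[u], new[v]))) for u, v in edges)),
        )
        if best is None or key < best:
            best = key
    return best

def count_substructures(library, min_edges, max_edges):
    table = {}  # plays the role of Map
    for i, (labels, edges) in enumerate(library):          # line (1)
        for k in range(min_edges, max_edges + 1):          # line (2)
            for subset in combinations(edges, k):
                if not is_connected(subset):
                    continue
                key = canonical_key(subset, labels)        # line (3)
                counts = table.setdefault(key, [0] * len(library))
                counts[i] += 1                             # line (4)
    return table

# Heavy-atom skeletons of ethanol (C-C-O) and dimethyl ether (C-O-C).
library = [
    (["C", "C", "O"], [(0, 1), (1, 2)]),
    (["C", "O", "C"], [(0, 1), (1, 2)]),
]
table = count_substructures(library, 1, 2)
for key, counts in sorted(table.items()):
    print(key, counts)
```

Because every substructure is reduced to its canonical key before the lookup, the C-O bond found in both molecules lands in the same table entry regardless of how the atoms were numbered in each input.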

V. Kvasnicka and J. Pospichal. Canonical indexing and constructive enumeration of molecular graphs. J. Chem. Inf. Comput. Sci., 30:99-105, 1990. [Pg.466]

The problems of canonical coding, graph isomorphism, and graph automorphism have both mathematical and chemical significance. The mathematical formulation of the problem is briefly set out below, and some connections with the chemical counterpart are presented. In the subsequent sections, the main algorithms used in chemistry for canonical coding of molecular graphs and constitutional symmetry perception are presented and compared. [Pg.168]

In a series of papers Uchino used the matrix multiplication method for obtaining the canonical code and automorphisms of a molecular graph. He considered adjacency, distance, and open walks matrices in a series of efficient algorithms which offer the automorphism partition of graphs. [Pg.181]
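The general flavor of a matrix-multiplication approach can be sketched as follows. This is not Uchino's algorithm: it is a simple illustration that assumes only the adjacency matrix, uses the row sums of its successive powers (walk counts) as vertex signatures, and groups vertices with equal signatures. Vertices in the same automorphism class always share a signature, but the converse is not guaranteed, so the result is only a candidate partition.

```python
def matmul(a, b):
    """Product of two square integer matrices given as nested lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]

def walk_count_classes(adj):
    """Partition vertices by the number of walks of length 1..n starting at them."""
    n = len(adj)
    power = adj
    signatures = [[] for _ in range(n)]
    for _ in range(n):
        for v in range(n):
            signatures[v].append(sum(power[v]))  # row sum of the current power
        power = matmul(power, adj)
    classes = {}
    for v, sig in enumerate(signatures):
        classes.setdefault(tuple(sig), []).append(v)
    return sorted(classes.values())

# Carbon skeleton of n-butane: 0-1-2-3.
n_butane = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
print(walk_count_classes(n_butane))
```

For this chain the terminal atoms fall into one class and the inner atoms into another, which coincides with the true automorphism partition.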
