Chemical structure, representation connectivity tables

A connection table has been the predominant form of chemical structure representation in computer systems since the early 1980s and it is an alternative way of representing a molecular graph. Graph theory methods can equally well be applied to connection table representations of a molecule. [Pg.40]

One of the most widely used chemical structure-encoding schemas in the pharmaceutical industry is the MDL Connection Table (CT) File Format. Both Molfile and SD File are based on the MDL CT File Format to represent chemical structures. A Molfile represents a single chemical structure. An SD File contains one to many records, each of which has a chemical structure and other data that are associated with the structure. The MDL Connection Table File Format also supports the RG File, which describes a single Rgroup query; the rxnfile, which contains structural information of a single reaction; the RD File, which has one to many records, each of which has a reaction and data associated with the reaction; and lastly, MDL's newly developed XML representation of the above, the XD File. The CT File Format definition can be downloaded from the MDL website http://www.mdl.com/downloads/public/ctfile/ctfile.jsp. [Pg.3]
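As a rough illustration of how a Molfile carries a connection table, the sketch below reads the atom and bond blocks of a small hand-written file. It assumes the fixed-width V2000 layout (three header lines, a counts line, then one line per atom and per bond); the function name `parse_molfile` and the ethanol record are made up for this example, and properties such as charges, stereo flags, and `M` lines are ignored.

```python
# Hypothetical, hand-written V2000-style record for ethanol (heavy atoms only).
ETHANOL_MOLFILE = "\n".join([
    "ethanol",
    "  hand-written example",
    "",
    "  3  2  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "    2.2500    1.3000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0",
    "  2  3  1  0",
    "M  END",
])


def parse_molfile(text):
    """Return (atoms, bonds) from a V2000-style Molfile string.

    atoms is a list of element symbols; bonds is a list of
    (first atom number, second atom number, bond type) tuples,
    with atoms numbered from 1 as in the file.
    """
    lines = text.splitlines()
    counts = lines[3]                      # counts line follows 3 header lines
    n_atoms = int(counts[0:3])             # fixed-width field, columns 1-3
    n_bonds = int(counts[3:6])             # fixed-width field, columns 4-6

    # Element symbol sits after three 10-character coordinates and a space.
    atoms = [lines[4 + i][31:34].strip() for i in range(n_atoms)]

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds


atoms, bonds = parse_molfile(ETHANOL_MOLFILE)
print(atoms)
print(bonds)
```

Slicing by column rather than splitting on whitespace is deliberate: in fixed-width formats adjacent numeric fields can run together when counts reach three digits.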

The conversion of chemical names and identifiers into appropriate chemical structure representations offers the ideal path for chemists and organizations to mine chemical information. Because chemical names are not unique and a multitude of labels can map to a single chemical entity, the facile conversion of alphanumeric text identifiers to a connection table representation enables superior data capture, representation, indexing, and mining. The industry's need to mine more information from both the historical corpus and new sources is obvious, and a number of researchers have initiated research into the domain of chemical identifier text mining and conversion. Multiple efforts have been made in the field of bioinformatics research [8] and, while interesting as a parallel, in this chapter we will focus on the efforts to extract and convert identifiers related to chemical entities rather than, for example, genes, enzymes, or proteins. [Pg.23]

The IUPAC International Chemical Identifier (InChI) is a relatively recent arrival on the chemical structure representation scene, and combines some of the characteristics of connection table, line notation and registry number identifier. A comprehensive technical description has yet to be published, though substantial details are given in the documentation which accompanies the open-source software provided by IUPAC, and a number of authors have provided good overviews. ... [Pg.171]

Data attached to arbitrary atoms (and bonds) can be used to (a) annotate a particular set of atoms, e.g., as part of a pharmacophore or as reactive sites; (b) annotate fragments, e.g., with stoichiometric multipliers of a salt or solvate, as major or minor, or as active or inert; (c) describe an unknown portion of a structure by attaching descriptive data to a nvdl or atom; (d) explain more fully the nature of a particular site (atom), e.g., stereochemical purity, isotopic purity (see Figure 10). In short, the use of SGroup data permits a user-extensible chemical structure representation. The user can define new data fields and attach values to atoms. These values are fully a part of the connection table and are searchable both with exact match and with substructure searching. [Pg.229]

The usual chemical structure representation (CSR) of a compound in chemical information systems is the connection table, adjacency matrix, linear notation, or structural lists which describe the molecular structure. The CSR contains all the information required to characterize fully the chemical structure of a compound, i.e., the chemical nature of atoms, bonding information, stereochemistry, and geometry (see Structure Representation). [Pg.172]

The atom-bond connection table is the predominant form of chemical structure representation in computer systems. A connection table can have a greater or lesser degree of sophistication, and can contain a larger or smaller amount of information. At its most basic level, the connection table represents the structure by listing the atoms and bonds present in tabular form (Table 1). Each atom is arbitrarily numbered, and each row in the table shows for an atom its element type and the number(s) of the atom(s) to which it is directly connected. The bond order of the connection is shown as an integer code (1 = single bond, 2 = double bond, etc.). [Pg.2820]
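The basic table described above can be sketched in a few lines. This is only an illustration, not the layout of any particular system: the names `AtomRow`, `build_connection_table`, and `format_table` are invented here, and the molecule (acetaldehyde, hydrogens suppressed) is chosen for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class AtomRow:
    """One row of the table: atom number, element, and bonded neighbors."""
    number: int
    element: str
    neighbors: list = field(default_factory=list)  # (neighbor number, bond order)


def build_connection_table(elements, bonds):
    """elements: symbols in numbering order; bonds: (atom, atom, order), 1-based."""
    rows = {i: AtomRow(i, el) for i, el in enumerate(elements, start=1)}
    for a, b, order in bonds:
        # Each bond appears in the rows of both atoms it joins.
        rows[a].neighbors.append((b, order))
        rows[b].neighbors.append((a, order))
    return [rows[i] for i in sorted(rows)]


def format_table(table):
    """Render rows as text: number, element, then neighbor(bond order) pairs."""
    lines = []
    for row in table:
        nbrs = "  ".join(f"{n}({o})" for n, o in row.neighbors)
        lines.append(f"{row.number:>3} {row.element:<2} {nbrs}")
    return "\n".join(lines)


# Acetaldehyde skeleton: C1-C2 single bond, C2=O3 double bond.
acetaldehyde = build_connection_table(["C", "C", "O"], [(1, 2, 1), (2, 3, 2)])
print(format_table(acetaldehyde))
```

A more sophisticated table would add further columns per atom (charge, isotope, stereo descriptors, coordinates) without changing this basic shape.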

For transmission and exchange of chemical structure representations, appropriate Internet protocols will become increasingly important, though basic connection table file formats (and for large databases, where compactness will remain important, modern line notations) will retain significant use for some time to come, and will indeed remain important aspects of the Internet protocols, albeit hidden from all but the programmers. [Pg.2825]

To learn more about connection tables and matrix representations of chemical structures... [Pg.15]

The representation of a chemical reaction should include the connection table of all participating species (starting materials, reagents, solvents, catalysts, products) as well as information on reaction conditions (temperature, concentration, time, etc.) and observations (yield, reaction rates, heat of reaction, etc.). However, reactions are only insufficiently represented by the structure of their starting materials and products,... [Pg.199]

Figure 6-1. Different forms of representation of a chemical graph: a) labeled (numbered) graph; b) adjacency matrix; c) connectivity table, type I; d) connectivity table, type II; f) line notations; g) structural index.

Figure 10.3-16. Graphical representation of the chemical structure of the reactants and products of a chemical reaction: a) as a 2D image; b) with structure diagrams showing all atoms and bonds of the reactants and products to indicate how this information is stored in a connection table.

Four main approaches have been suggested for the representation of chemical structures in machine-readable form: fragment codes, systematic nomenclature, linear notations, and connection tables. [Pg.188]

Structure searching is the chemical equivalent of graph isomorphism, that is, the matching of one graph against another to determine whether they are identical. This can be carried out very rapidly if a unique structure representation is available, because a character-by-character match will then suffice to compare two structures for identity. However, connection tables are not necessarily unique, because very many different tables can be created for the same molecule depending upon the way in which the atoms in the molecule are numbered. Specifically, for a molecule containing N atoms, there are N ... [Pg.189]
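The dependence of the table on atom numbering can be seen in a small sketch. The helper names (`connection_table`, `renumber`, `same_structure`) are invented for this illustration, atoms are numbered from 0, and the brute-force comparison over all renumberings is shown only to make the point; it is far too slow for real molecules.

```python
from itertools import permutations


def connection_table(elements, bonds):
    """Hashable table: one (element, sorted neighbor list) row per atom."""
    rows = [[] for _ in elements]
    for a, b, order in bonds:
        rows[a].append((b, order))
        rows[b].append((a, order))
    return tuple((el, tuple(sorted(nb))) for el, nb in zip(elements, rows))


def renumber(elements, bonds, perm):
    """Apply a renumbering, where perm[old_index] gives the new index."""
    new_elements = [None] * len(elements)
    for old, new in enumerate(perm):
        new_elements[new] = elements[old]
    new_bonds = [(perm[a], perm[b], order) for a, b, order in bonds]
    return new_elements, new_bonds


def same_structure(mol_a, mol_b):
    """Brute force: does some renumbering of mol_a reproduce mol_b's table?"""
    elements_a, bonds_a = mol_a
    if len(elements_a) != len(mol_b[0]):
        return False
    target = connection_table(*mol_b)
    return any(
        connection_table(*renumber(elements_a, bonds_a, perm)) == target
        for perm in permutations(range(len(elements_a)))
    )


# Acetaldehyde skeleton numbered two different ways, and ethanol for contrast.
numbering_1 = (["C", "C", "O"], [(0, 1, 1), (1, 2, 2)])
numbering_2 = (["O", "C", "C"], [(0, 1, 2), (1, 2, 1)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

print(connection_table(*numbering_1) == connection_table(*numbering_2))
print(same_structure(numbering_1, numbering_2))
print(same_structure(numbering_1, ethanol))
```

The first comparison fails even though both inputs describe the same molecule, which is why a plain character-by-character match only works once a unique numbering has been fixed.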

Historically, most chemists have modeled the structure of molecules using a highly idealized platonic representation, where atoms are represented as vertices and bonds as paths between vertices. Chemoinformatics has very successfully adopted this representation and based many of its techniques around the metaphor of the "connection table", i.e., a list of all atoms and bonds that occur in the molecule. While this approach is quite successful for well-defined chemical entities, it begins to break down for rapidly interconverting isomers, for example, and is completely inappropriate for polymers. In the majority of cases, the successful application of chemoinformatics to a given problem depends on the availability of a connection table. [Pg.112]

Eakin [13] describes the chemical structure information system at Imperial Chemical Industries Ltd., where registration is based on Wiswesser Line Notation. For connection tables, the unique, unambiguous representation is derived automatically, i.e., a single, invariant numbering of the connection table is algorithmically derived. [Pg.135]

With the variety of chemical substance representations, i.e., fragment codes, systematic nomenclature, linear notations, and connection tables, a diversity of approaches and techniques are used for substructure searching. Whereas unique, unambiguous representations are essential for some registration processes, it is important to note that this often cannot be used to advantage in substructure searching. With connection tables, there is no assurance that the atoms cited in the substructure will be cited in the same order as the corresponding atoms in the structure. With nomenclature or notation representation systems, a substructural unit may be described by different terms or... [Pg.135]

Depending on the sophistication needed, substructure searching can be accomplished with a variety of the representations of a chemical substance. Some substructure searches can only be adequately answered by a complete atom-by-atom and bond-by-bond search for which a connection table, with its explicit description of full structural detail, is essential. [Pg.137]
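An atom-by-atom, bond-by-bond search of this kind can be sketched as a simple backtracking match over two connection tables. This is a minimal illustration under stated assumptions, not the procedure of any system cited here: the function names are invented, atoms are numbered from 0, the query is matched in index order, and no screening or other speed-up is applied.

```python
def adjacency(elements, bonds):
    """Map each atom index to a dict of {neighbor index: bond order}."""
    adj = {i: {} for i in range(len(elements))}
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order
    return adj


def find_substructure(query, target):
    """Return a {query atom: target atom} mapping, or None if no match exists."""
    q_el, q_bonds = query
    t_el, t_bonds = target
    q_adj = adjacency(q_el, q_bonds)
    t_adj = adjacency(t_el, t_bonds)

    def extend(mapping):
        if len(mapping) == len(q_el):
            return dict(mapping)           # every query atom is placed
        q = len(mapping)                   # next query atom, in index order
        for t in range(len(t_el)):
            if t in mapping.values() or q_el[q] != t_el[t]:
                continue                   # target atom used, or wrong element
            # Every query bond to an already-placed atom must exist in the
            # target with the same bond order.
            if all(t_adj[t].get(mapping[n]) == order
                   for n, order in q_adj[q].items() if n in mapping):
                mapping[q] = t
                result = extend(mapping)
                if result is not None:
                    return result
                del mapping[q]             # backtrack and try the next atom
        return None

    return extend({})


carbonyl = (["C", "O"], [(0, 1, 2)])                                  # C=O query
acetic_acid = (["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
ethanol = (["C", "C", "O"], [(0, 1, 1), (1, 2, 1)])

print(find_substructure(carbonyl, acetic_acid))
print(find_substructure(carbonyl, ethanol))
```

The search never relies on the order in which atoms are cited in either table, only on element types and bonds, which is what makes the connection table suitable for this task.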

Substantial attention has been given to, and progress made in, the development of procedures to effect conversion between chemical substance representations. Zamora and Davis [26] describe an algorithm to convert a coordinate representation of a chemical substance (derived from input by a chemical typewriter) to a connection table. An approach for interactive input of a structure diagram and conversion of this representation to a connection table suitable for substructure searching is discussed by Feldmann [27]. The conversion of systematic nomenclature to connection tables offers a powerful editing tool as well as a potential mechanism for conversion of name files to connection tables; this type of conversion is described by Vander Stouw [28]. [Pg.140]

The conversion from a connection table to other unambiguous representations is substantially more difficult. The connection table is the least structured representation and incorporates no concepts of chemical significance beyond the list of atoms, bonds, and connections. A complex set of rules must be applied in order to derive nomenclature and linear notation representations. To translate from these more structured representations to a connection table requires primarily the interpretation of symbols and syntax. The opposite conversion, from the connection table to linear notation, nomenclature, or coordinate representation first requires the detailed analysis of the connection table to identify appropriate substructural units. The complex ordering rules of the nomenclature or notation system or the esthetic rules for graphic display are then applied to derive the desired representation. [Pg.141]

It can be seen that much of the information in the CM is redundant (each bond is listed twice) and that a large portion of the matrix is empty (its elements are equal to zero). This indicates that the structure can be represented more economically with a table of constant width w. Such a representation requires only wN instead of N² variables. In the i-th row of the new representation, the w data items associated with the i-th atom (chemical symbol of the element, and the sequential numbers of its neighbors with the corresponding bond types) are stored. Such a representation is called the connection table of a chemical structure, or CT (Fig. 4.2). [Pg.70]
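The step from matrix to table can be sketched as follows. The row layout used here (symbol, then neighbor number and bond type pairs, padded with zeros) and the choice of at most four neighbors per atom are assumptions made for this illustration, as are the function names; Fig. 4.2 of the source may arrange the columns differently.

```python
def matrix_to_ct(symbols, cm, max_neighbors=4):
    """Flatten a connectivity matrix into fixed-width connection table rows.

    Assumed row layout: [symbol, nbr1, type1, nbr2, type2, ...], padded with
    zeros to the constant width w = 1 + 2 * max_neighbors. Atoms are
    numbered from 1 in the output.
    """
    width = 1 + 2 * max_neighbors
    table = []
    for i, symbol in enumerate(symbols):
        row = [symbol]
        for j, bond_type in enumerate(cm[i]):
            if bond_type:                      # zero means "no bond"
                row += [j + 1, bond_type]
        if len(row) > width:
            raise ValueError(f"atom {i + 1} has more than {max_neighbors} neighbors")
        row += [0] * (width - len(row))        # pad to constant width
        table.append(row)
    return table


def storage_cells(n_atoms, width):
    """Cells needed by the full matrix versus the fixed-width table."""
    return n_atoms * n_atoms, width * n_atoms


# Acetic acid skeleton (hydrogens suppressed): C1-C2, C2=O3, C2-O4.
symbols = ["C", "C", "O", "O"]
cm = [
    [0, 1, 0, 0],
    [1, 0, 2, 1],
    [0, 2, 0, 0],
    [0, 1, 0, 0],
]
ct = matrix_to_ct(symbols, cm)
for row in ct:
    print(row)
print(storage_cells(20, 9))
```

For a molecule this small the table is actually larger than the matrix; the saving appears once the number of atoms exceeds the row width, because the matrix grows quadratically while the table grows linearly.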

As a compound is represented by a square matrix having a number of rows and columns equal to its number of atoms, it is clear that such a representation cannot be used for input purposes, but is rather an internal representation of structures and reactions. Furthermore, as the numbering of the atoms in a chemical compound is a priori arbitrary, several connectivity tables can be deduced from a given compound. Thus, special algorithms are required to obtain a canonical connectivity table. For example, the Chemical Abstracts Service uses an algorithm devised by Morgan [234]. [Pg.320]
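The core idea behind Morgan-style canonical numbering can be sketched as below. This is a simplified illustration, not the algorithm as used by Chemical Abstracts: it computes extended connectivity values by repeated summation over neighbors, then numbers atoms breadth-first from the highest-valued atom. Ties are broken by element symbol and then by input index, which is an assumption of this sketch; a full implementation needs further tie-breaking rules (bond orders, exhaustive comparison) to guarantee a unique result. The molecule is assumed to be connected, and all names are invented here.

```python
def extended_connectivity(neighbors):
    """Iterate neighbor sums until the number of distinct values stops rising."""
    ec = {atom: len(nbrs) for atom, nbrs in neighbors.items()}  # start: degree
    n_classes = len(set(ec.values()))
    while True:
        new_ec = {atom: sum(ec[n] for n in nbrs) for atom, nbrs in neighbors.items()}
        new_classes = len(set(new_ec.values()))
        if new_classes <= n_classes:
            return ec                      # keep the last set that improved
        ec, n_classes = new_ec, new_classes


def canonical_table(elements, bonds):
    """Return a connection table under a Morgan-like numbering (0-based)."""
    neighbors = {i: [] for i in range(len(elements))}
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)
    ec = extended_connectivity(neighbors)

    def rank(atom):
        # Simplified tie-break: highest EC first, then element, then index.
        return (-ec[atom], elements[atom], atom)

    order = [min(neighbors, key=rank)]
    seen = set(order)
    for atom in order:                     # list grows as we walk it (BFS)
        for nbr in sorted(neighbors[atom], key=rank):
            if nbr not in seen:
                seen.add(nbr)
                order.append(nbr)

    new_index = {old: new for new, old in enumerate(order)}
    return tuple(
        (elements[old], tuple(sorted(new_index[n] for n in neighbors[old])))
        for old in order
    )


# 2-Methylbutane carbon skeleton under two arbitrary numberings.
table_a = canonical_table(["C"] * 5, [(0, 1), (1, 2), (2, 3), (1, 4)])
table_b = canonical_table(["C"] * 5, [(3, 4), (4, 0), (0, 2), (4, 1)])
print(table_a)
print(table_a == table_b)
```

Once both inputs are brought to the same numbering, the two tables can be compared directly, which is the property registration systems rely on.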

Barnard, J. M., M. F. Lynch, and S. M. Welford. 1982. Computer storage and retrieval of generic structures in chemical patents. 4. An extended connection table representation for generic structures. J. Chem. Inf. Comput. Sci. 22(3): 160-164. [Pg.74]
