Connection tables encoding

The additional stereoinformation has to be derived from the graphical representation and encoded into stereodescriptors, as described above. The stereodescriptors are then stored in corresponding fields of the connection table (Figure 2-76) [50, 51]. [Pg.82]

The molecular editor consists of a Java applet that is embedded in the HTML document. It encodes the drawing into a connection table in mol format, which is sent to the web server. [Pg.528]

Leatherface is a 2-D molecular editor that modifies properties of atoms and bonds in molecular connection tables according to rules specified by the user. Unlike Permute, Leatherface encodes no chemical knowledge and neither processes nor generates 3-D structures. Its real strength is that it allows the user to impose a very detailed and precisely specified chemical view on large numbers of connection tables. [Pg.279]

If one's purpose is to determine only the presence or absence in a database of a specific structure, this can be accomplished with the search option IDENT, as is shown in Figure 11. This program hash-encodes the query structure connection table and searches through a file of hash-encoded connection tables for an exact match. The search, which is very fast by substructure search standards, has been designed specifically for those users who, to comply with the Toxic Substances Control Act [26], have to determine the presence or absence of specific compounds in Environmental Protection Agency files. [Pg.271]
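The exact-match idea can be sketched in a few lines of Python. This is only an illustration: the excerpt does not say which hash function or canonical form IDENT uses, so the SHA-256 digest, the `canonical_key` and `ident` helpers, and the one-entry `index` below are all assumptions. The sketch also assumes that atoms are already canonically numbered, which a real system would have to ensure first.

```python
import hashlib

def canonical_key(atoms, bonds):
    """Serialize a connection table deterministically.

    Assumes the atoms are already in a canonical order; only the
    bond list is normalized here (endpoint order and list order).
    """
    norm = sorted((min(a, b), max(a, b), order) for a, b, order in bonds)
    atom_part = ",".join(atoms)
    bond_part = ",".join(f"{a}-{b}:{o}" for a, b, o in norm)
    return atom_part + "|" + bond_part

def hash_table(atoms, bonds):
    """Hash-encode a connection table (SHA-256 is an assumed choice)."""
    return hashlib.sha256(canonical_key(atoms, bonds).encode("utf-8")).hexdigest()

# Hypothetical "file" of hash-encoded connection tables.
index = {
    hash_table(["C", "C", "O"], [(0, 1, 1), (1, 2, 1)]): "ethanol",
}

def ident(atoms, bonds):
    """Return the stored name for an exact structure match, else None."""
    return index.get(hash_table(atoms, bonds))
```

A lookup costs one hash computation plus one dictionary probe, which is why an identity search of this kind is so much cheaper than a substructure search.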

Three sets of molecular descriptors that can be computed from a molecular connection table are defined. The descriptors are based on the subdivision and classification of the molecular surface area according to atomic properties (such as contribution to logP, molar refractivity, and partial charge). The resulting 32 descriptors are shown (a) to be weakly correlated with each other, (b) to encode many traditional molecular descriptors, and (c) to be useful for QSAR, QSPR, and compound classification. [Pg.261]

Better starting points for developing QSPRs are connection tables that encode the molecular constitution, including information about atom and bond types. Molecular... [Pg.12]

One of the most widely used chemical structure-encoding schemas in the pharmaceutical industry is the MDL Connection Table (CT) File Format. Both Molfile and SD File are based on the MDL CT File Format. A Molfile represents a single chemical structure. An SD File contains one or more records, each of which holds a chemical structure and the data associated with it. The MDL CT File Format also supports the RG File, which describes a single Rgroup query; the rxnfile, which contains the structural information of a single reaction; the RD File, which has one or more records, each holding a reaction and its associated data; and lastly the XD File, MDL's newly developed XML representation of the above. The CT File Format definition can be downloaded from the MDL website http://www.mdl.com/downloads/public/ctfile/ctfile.jsp. [Pg.3]
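As an illustration of what a Molfile connection table looks like, the following Python sketch reads the atom and bond blocks of a V2000 Molfile using its fixed-width columns. It is a minimal reader, not a full implementation: the function name `parse_molfile_v2000` and the hand-written ethanol record are assumptions made for this example. The three header lines, the property block, and all atom and bond fields beyond coordinates, symbol, and bond type are ignored.

```python
# Hand-written example record (hydrogens omitted); header lines are not parsed.
ETHANOL_MOLFILE = """\
ethanol
  sketch
  hand-written example
  3  2  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.5000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    2.2500    1.2990    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0
  2  3  1  0
M  END
"""

def parse_molfile_v2000(text):
    """Return (atoms, bonds) from a V2000 Molfile string.

    atoms: list of dicts with x, y, z and element symbol
    bonds: list of (first_atom, second_atom, bond_type), 1-based as in the file
    """
    lines = text.splitlines()
    counts = lines[3]                      # line 4 is the counts line
    n_atoms = int(counts[0:3])             # columns 1-3: number of atoms
    n_bonds = int(counts[3:6])             # columns 4-6: number of bonds

    atoms = []
    for line in lines[4:4 + n_atoms]:
        atoms.append({
            "x": float(line[0:10]),
            "y": float(line[10:20]),
            "z": float(line[20:30]),
            "symbol": line[31:34].strip(),
        })

    bonds = []
    for line in lines[4 + n_atoms:4 + n_atoms + n_bonds]:
        bonds.append((int(line[0:3]), int(line[3:6]), int(line[6:9])))
    return atoms, bonds
```

Calling `parse_molfile_v2000(ETHANOL_MOLFILE)` yields three atoms (C, C, O) and two single bonds. An SD File could be handled by splitting the text on its `$$$$` record delimiter and passing each record's structure part to the same function.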

This is a representation of the 2D chemical structural formula, encoded as a connectivity table (Chapter 1, Section 1.1), i.e. as a graph in which the nodes are atoms and the edges are bonds; hence the structure may be described in terms of atom and bond properties, as illustrated in Figure 3.1a. [Pg.77]

A special type of bond-type assignment problem concerns tautomerism. Some chemical databases provide specific tautomeric bond types, or analyse a query for either the keto or enol construction and add the missing one (as an .OR. option) automatically. In the CSD, only the single tautomeric form identified by the crystal structure determination is encoded in the connectivity tables. The software has not yet been upgraded to accommodate searches for both forms; hence the onus is on the user to encode both representations if that is the desired goal of a query. [Pg.104]

For many computer tasks and for the transfer of structural information from one computer program to another, a linear representation of the chemical structure may be more suitable. A popular linear representation is the SMILES notation. Part of its appeal is that for acyclic structures the SMILES is similar to the traditional linear diagram. For example, ethane is denoted by CC and ethylene by C=C. Examples of additional SMILES are given in Figure 4. SMILES is the basis of a chemical information system, and this notation provides a convenient framework for more sophisticated computer coding of chemistry described below. For some internal computer functions, structures encoded in a linear notation may be converted to connection tables. [Pg.218]
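The conversion from a linear notation to a connection table can be illustrated with a deliberately small Python sketch. It covers only a fragment of SMILES: single-letter element symbols, the bond symbols `-`, `=` and `#`, and parenthesized branches. Ring closures, aromatic atoms, bracket atoms, and two-letter elements such as Cl are not handled, and the function name `smiles_to_table` is an assumption made for this example.

```python
BOND_ORDERS = {"-": 1, "=": 2, "#": 3}
SINGLE_LETTER_ATOMS = "BCNOPSFI"

def smiles_to_table(smiles):
    """Convert a simple acyclic SMILES string to (atoms, bonds).

    atoms: list of element symbols, indexed from 0
    bonds: list of (atom_index, atom_index, bond_order)
    """
    atoms, bonds, branch_stack = [], [], []
    prev = None    # index of the atom the next atom will bond to
    order = 1      # bond order pending for the next bond
    for ch in smiles:
        if ch in SINGLE_LETTER_ATOMS:
            atoms.append(ch)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, order))
            prev, order = idx, 1
        elif ch in BOND_ORDERS:
            order = BOND_ORDERS[ch]
        elif ch == "(":
            branch_stack.append(prev)     # remember the branch point
        elif ch == ")":
            prev = branch_stack.pop()     # return to the branch point
        else:
            raise ValueError(f"unsupported SMILES character: {ch!r}")
    return atoms, bonds
```

For the two examples in the text, `smiles_to_table("CC")` gives two carbon atoms joined by a bond of order 1, and `smiles_to_table("C=C")` gives the same atoms joined by a bond of order 2.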

There are only three fundamental ways in which connection table formats differ: what information is stored, how that information is represented, and how those representations are encoded. GEMINI achieves a high degree of generality by dividing its task into these three parts. [Pg.195]

A string-processing language is used to specify a connection table format. This language allows succinct specification of both the content and encoding of most external formats that we have encountered, but is currently limited to character-oriented files. [Pg.195]

Connection table formats specify an encoding for each kind of informational representation that can be used. An encoding is a convention used to indicate the possible values that representations can take. The number of possible encodings for any given representation is very high. For example, the R/S representation of absolute chirality can be encoded as the characters "R" and "S", the digits "1" and "2", or many others. [Pg.197]
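The distinction between a representation and its encodings can be made concrete with a short Python sketch. The representation here is R/S chirality; the two encodings are the ones named in the text. The names `Chirality`, `ENCODINGS`, and `decode_chirality` are hypothetical, and the pairing of "1" with R and "2" with S is an assumption, since the excerpt does not state which digit stands for which descriptor.

```python
from enum import Enum

class Chirality(Enum):
    """One representation (R/S) with a single internal encoding."""
    R = "R"
    S = "S"

# Two external encodings of the same representation.
ENCODINGS = {
    "letters": {"R": Chirality.R, "S": Chirality.S},
    "digits": {"1": Chirality.R, "2": Chirality.S},   # assumed pairing
}

def decode_chirality(raw, encoding):
    """Map an externally encoded value to the internal Chirality value."""
    return ENCODINGS[encoding][raw.strip()]
```

Both `decode_chirality("R", "letters")` and `decode_chirality("1", "digits")` return `Chirality.R`: the encodings differ, but the information they carry is the same.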

A complete encoding convention for a connection table format also specifies how the representational encodings are organised (e.g., in a file or character stream). [Pg.197]

The approach used here for connection table interpretation assumes that there is a limited number of kinds of information in the desired formats, and that it is known how to convert the information from each of these to a suitable internal form. The interpretation of any instance of a connection table can then be divided into two completely independent parts: extracting the information from the table and converting it to internal form. This allows automated interpretation, given the meaning (representation and encoding) of each field of a connection table format. [Pg.197]
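The two-part interpretation can be sketched in Python as follows. This is not GEMINI's string-processing language, which the excerpt does not show; the fixed-width atom record, the `FORMAT` description, and the `extract` and `convert` helpers are all invented for illustration. The point is only that extraction knows nothing about meaning, and conversion knows nothing about file layout.

```python
# Hypothetical format description: field name -> (start, end, converter).
# The column positions and the chirality codes are assumptions.
FORMAT = {
    "symbol": (0, 2, str.strip),
    "charge": (2, 5, int),
    "chirality": (5, 6, {"0": None, "1": "R", "2": "S"}.get),
}

def extract(line, fmt):
    """Part 1: pull the raw text of each field out of the record."""
    return {name: line[start:end] for name, (start, end, _) in fmt.items()}

def convert(raw, fmt):
    """Part 2: turn each raw field into its internal form."""
    return {name: fmt[name][2](value) for name, value in raw.items()}

record = "C  -12"   # symbol "C", charge -1, chirality code "2"
internal = convert(extract(record, FORMAT), FORMAT)
```

Because the two steps share only the format description, supporting another file layout means writing a new description rather than new parsing code.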

Establish a set of information types which is adequate to describe the information contained in the desired connection table formats. Select a single data representation and encoding for each type of information; this will become GEMINI's internal encoding. [Pg.197]

Collect the set of data representations used in the desired connection table formats, and select a single encoding convention for each one. This will be known as the set of transfer variables. [Pg.197]

The choice of the set of variables used internally to represent transfer information is not critical as long as it accurately describes all information to be derived from the desired connection table format. A slight benefit is associated with choosing a set of variables which store information independently. Any encoding convention may be used which is adequate to specify all desired values (including missing, as appropriate). [Pg.198]

There are few enough different informational representations used in connection tables that they can be explicitly enumerated; this is the set of transfer variables. An encoding convention is selected for each one. The GEMINI transfer variables are shown in Table 1. [Pg.198]

Since target connection tables use representations other than those used in GEMINI's internal encoding, conversions will sometimes be needed to produce an internal encoding. The methods used for this are known as conversion algorithms. [Pg.198]

Note that the set of transfer variables is redundant with respect to the variables used in the internal encoding (i.e., an internal variable can be specified by more than one transfer variable). This is a natural consequence of the multiplicity of representations available for connection table information. Furthermore, a connection table format may be inherently redundant (i.e., the same information may be specified more than once, or in more than one way). Therefore, another algorithm is needed to specify the order in which conversion algorithms are applied to the transfer variables to produce an internal encoding. This algorithm is listed in Table 2. [Pg.198]
