Descriptor binary

If the binary descriptors for the objects s and t are substructure keys, the Hamming distance (Eq. (6)) gives the number of substructures that differ between s and t (components that are 1 in either s or t but not in both). On the other hand, the Tanimoto coefficient (Eq. (7)) is a measure of the number of substructures that s and t have in common (i.e., the frequency a) relative to the total number of substructures they could share (given by the number of components that are 1 in either s or t). [Pg.407]
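The two measures can be illustrated with a minimal Python sketch. Eqs. (6) and (7) are not reproduced in this excerpt, so the sketch assumes the usual definitions: a counts components that are 1 in both s and t (as in the text), while b and c (names assumed here) count components that are 1 only in s and only in t. The example vectors are invented for illustration.

```python
def contingency(s, t):
    """Return (a, b, c) for two equal-length 0/1 sequences.

    a: components that are 1 in both s and t
    b: components that are 1 only in s (assumed name)
    c: components that are 1 only in t (assumed name)
    """
    if len(s) != len(t):
        raise ValueError("descriptors must have the same length")
    a = sum(1 for x, y in zip(s, t) if x == 1 and y == 1)
    b = sum(1 for x, y in zip(s, t) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(s, t) if x == 0 and y == 1)
    return a, b, c


def hamming_distance(s, t):
    """Number of components that are 1 in either s or t but not in both."""
    _, b, c = contingency(s, t)
    return b + c


def tanimoto_coefficient(s, t):
    """Shared substructures relative to those present in either object."""
    a, b, c = contingency(s, t)
    total = a + b + c
    # Convention chosen for this sketch: two all-zero descriptors count as identical.
    return 1.0 if total == 0 else a / total


# Invented 8-bit substructure keys, for illustration only.
s = [1, 1, 0, 1, 0, 0, 1, 0]
t = [1, 0, 0, 1, 1, 0, 1, 0]
print(hamming_distance(s, t))      # 2
print(tanimoto_coefficient(s, t))  # 0.6
```

Here three substructures are shared and two differ, so the Hamming distance is 2 and the Tanimoto coefficient is 3/5.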

The similarity matrices are constructed by an in-house program developed inside CHIRBASE using the application development kit of ISIS. They contain the similarity coefficients as expressed by the Tanimoto method. In ISIS, the Tanimoto coefficients are calculated from a set of binary descriptors or molecular keys coding the structural fragments of the molecules. [Pg.113]

To demonstrate the use of binary substructure descriptors and Tanimoto indices for cluster analysis of chemical structures, we consider the 20 standard amino acids (Figure 6.3) and characterize each molecular structure by eight binary variables describing presence/absence of eight substructures (Figure 6.4). Note that in most practical applications (for instance, evaluation of results from searches in structure databases) more diverse molecular structures have to be handled, and usually several hundred different substructures are considered. Table 6.1 contains the binary substructure descriptors (variables) with value 0 if the substructure is absent and 1 if the substructure is present in the amino acid; these numbers form the A-matrix. Binary substructure descriptors have been calculated by the software SubMat (Scsibrany and Varmuza 2004), which requires as input the molecular structures in one file and the substructures in another file. All structures are in Molfile format (Gasteiger and Engel 2003); the output is an ASCII file with the binary descriptors. [Pg.270]
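As a rough sketch of the step from a binary descriptor matrix to input for cluster analysis, the following Python code computes pairwise Tanimoto distances (1 minus the Tanimoto index) between rows. The toy matrix is hypothetical and is not taken from Table 6.1, and the function names are chosen here for illustration.

```python
def tanimoto(row_i, row_j):
    """Tanimoto index of two 0/1 rows: bits set in both / bits set in either."""
    both = sum(1 for x, y in zip(row_i, row_j) if x and y)
    either = sum(1 for x, y in zip(row_i, row_j) if x or y)
    # Convention for this sketch: two all-zero rows count as identical.
    return 1.0 if either == 0 else both / either


def tanimoto_distance_matrix(rows):
    """Symmetric matrix of distances 1 - T between all pairs of rows."""
    n = len(rows)
    return [[1.0 - tanimoto(rows[i], rows[j]) for j in range(n)]
            for i in range(n)]


# Hypothetical toy data: 4 structures x 8 substructure flags (NOT Table 6.1).
toy = [
    [1, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1, 0, 1],
]
D = tanimoto_distance_matrix(toy)
for row in D:
    print(["%.2f" % d for d in row])
```

In this toy case rows 0 and 1 are close to each other, as are rows 2 and 3, while the two pairs share no substructure and are at distance 1. A distance matrix of this kind could then be passed to a hierarchical clustering routine.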

FIGURE 6.5 PCA score plot (a) of n = 20 standard amino acid structures characterized by m = 8 binary descriptors (27.1% and 20.5% of the total variance preserved in PC1 and PC2). In the lower plots (b) presence/absence of four selected substructures is indicated. [Pg.272]

For similarity searching, all molecules are described by an appropriate binary descriptor (consisting of only zeros and ones). Such a binary fingerprint contains all structural information for a particular molecule and was applied at Aventis to identify new Kv1.5 inhibitors in the compound collection. [Pg.228]

For binary descriptors, the most commonly used distance function is the Tanimoto (or Jaccard) coefficient given in (5). Here x and y are two binary sets (encoded molecules). AND is the bitwise and operation (a bit in the result is set if both corresponding bits in the two operands are set), and IOR is the bitwise inclusive or operation (a bit in the result is set if either of the corresponding bits in the two operands is set). The result, T, is a measure of the number of features shared by the two molecules relative to the ones they could have in common. [Pg.139]
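The bitwise formulation can be sketched in Python by storing each fingerprint as an integer. Equation (5) is not reproduced in this excerpt; the sketch assumes the common form T = |AND(x, y)| / |IOR(x, y)|, where |.| counts the set bits. The helper names and the example bit patterns are illustrative only.

```python
def popcount(bits: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto_bitwise(x: int, y: int) -> float:
    """Assumed form of Eq. (5): set bits of (x AND y) over set bits of (x IOR y)."""
    shared = popcount(x & y)     # features present in both molecules
    possible = popcount(x | y)   # features present in at least one molecule
    # Convention for this sketch: two empty fingerprints count as identical.
    return 1.0 if possible == 0 else shared / possible


# Invented 8-bit fingerprints.
x = 0b11010010
y = 0b10011010
print(tanimoto_bitwise(x, y))  # 0.6
```

Because the whole comparison reduces to two bitwise operations and two bit counts, it stays cheap even for long fingerprints.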

When using binary descriptors, the Euclidean distance can be rewritten in the form given in (8), where NOT(y) denotes the binary complement of y, and the expression XOR(x, NOT(y)) represents the number of bits that are identical in x and y (either ones or zeroes). [Pg.139]
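Equation (8) is not reproduced in this excerpt. A plausible reading, assumed in the sketch below, is D = sqrt(N - |XOR(x, NOT(y))|), where N is the fingerprint length and |.| counts set bits: each differing bit contributes exactly 1 to the squared Euclidean distance. The Python code checks this assumed form against a direct component-wise calculation; all names are illustrative.

```python
import math


def popcount(bits: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def identical_bits(x: int, y: int, n_bits: int) -> int:
    """Count positions where x and y agree, via XOR(x, NOT(y))."""
    mask = (1 << n_bits) - 1
    not_y = ~y & mask            # complement restricted to n_bits
    return popcount((x ^ not_y) & mask)


def euclidean_binary(x: int, y: int, n_bits: int) -> float:
    """Assumed form of Eq. (8): sqrt(N - number of identical bits)."""
    return math.sqrt(n_bits - identical_bits(x, y, n_bits))


def euclidean_direct(x: int, y: int, n_bits: int) -> float:
    """Reference calculation from the individual 0/1 components."""
    return math.sqrt(sum((((x >> k) & 1) - ((y >> k) & 1)) ** 2
                         for k in range(n_bits)))


x, y, n = 0b11010010, 0b10011010, 8
print(identical_bits(x, y, n))    # 6
print(euclidean_binary(x, y, n))  # sqrt(2)
```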

A special case of cell-based methods is a diversity measure proposed for binary fingerprints. Unlike continuous descriptors, binary descriptors such as structural keys and hashed fingerprints can be compared using fast binary operations to give rapid estimates of molecular similarity, diversity, and complementarity. The most common example of a diversity measure applied to binary descriptors is the binary union (inclusive or). This can be exploited in a number of different ways; elegant examples can be found in the following references. ... [Pg.142]
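One simple way to use the binary union as a diversity estimate is sketched below in Python, under the assumption that diversity is scored by the number of bits set in the inclusive or of all fingerprints in a set. The function names are hypothetical and the scheme is only one of several possible uses, not necessarily the one in the cited references.

```python
from functools import reduce


def popcount(bits: int) -> int:
    """Number of set bits in a non-negative integer."""
    return bin(bits).count("1")


def union_of(fingerprints):
    """Inclusive or of all fingerprints (each stored as an integer)."""
    return reduce(lambda acc, fp: acc | fp, fingerprints, 0)


def union_coverage(fingerprints):
    """Number of distinct features covered by the set as a whole."""
    return popcount(union_of(fingerprints))


def new_bits_from(candidate, fingerprints):
    """Features a candidate would add that the set does not yet cover."""
    return popcount(candidate & ~union_of(fingerprints))


# Invented 8-bit fingerprints.
selected = [0b00001111, 0b00111100]
print(union_coverage(selected))             # 6
print(new_bits_from(0b11000011, selected))  # 2
print(new_bits_from(0b00110011, selected))  # 0
```

A candidate that adds no new bits is redundant with respect to the current selection, which gives a quick complementarity check.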

In addition to binary descriptors, molecular holograms are also useful for similarity searches. Similar to a fingerprint, a hologram is a vector that contains numerical... [Pg.4017]

The total number of binary descriptors (i.e. the matrix columns) is... [Pg.183]

The most common subcases of indicator variables are the binary descriptors, which are bi-valued variables taking the value of 1 when the considered characteristic is present in the molecule and the value of 0 when the characteristic is absent; these descriptors are usually indicated by the symbol I_char, where char is the considered characteristic. [Pg.234]

Binary descriptors should be used when the considered characteristic is really a dual characteristic of the molecule or when the considered quantity cannot be represented in a more informative numerical form. In any case, the -> mean information content of a binary descriptor I_char is low (the maximum value is 1 when the proportions of 0 and 1 are equal); thus the standardized Shannon's entropy, I_char / log2 n, where n is the number of elements, gives a measure of the efficiency of the collected information. [Pg.234]
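The statement about the information content can be checked numerically. The Python sketch below assumes the standard Shannon definition, the negative sum of p * log2(p) over the two values 0 and 1, and divides by log2 n to standardize, following the reading of the passage above. The function names and sample data are illustrative.

```python
import math


def mean_information_content(values):
    """Shannon entropy (in bits) of a 0/1 descriptor over n elements."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one element")
    p1 = sum(values) / n                 # proportion of ones
    total = 0.0
    for p in (p1, 1.0 - p1):
        if p > 0:                        # 0 * log2(0) is taken as 0
            total -= p * math.log2(p)
    return total


def standardized_entropy(values):
    """Mean information content divided by log2 n (assumed standardization)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two elements")
    return mean_information_content(values) / math.log2(n)


balanced = [1, 0, 1, 0, 1, 0, 1, 0]   # equal proportions of 0 and 1
skewed = [1, 0, 0, 0, 0, 0, 0, 0]     # characteristic present in one element only
print(mean_information_content(balanced))  # 1.0, the maximum for a binary descriptor
print(standardized_entropy(balanced))      # 1 / log2(8)
print(mean_information_content(skewed))    # below 1.0
```

Even in the best case the binary descriptor carries 1 bit, which is only a third of the log2 8 = 3 bits available for eight elements.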

Binary descriptors are used in -> Free-Wilson analysis and DARC/PELCO analysis. [Pg.234]

Special distance measures must be used for data whose variables are represented by -> binary descriptors, i.e. variables represented by values either zero or one. [Pg.397]

Wavelet transforms are useful for compression of descriptors for searches in binary descriptor databases and as alternative representations of molecules for neural networks in classification tasks. [Pg.97]

If training has been performed in reverse mode, a descriptor command will be available (instead of a property command), which opens a chart containing a comparison of two descriptors. In contrast to a property vector, a descriptor can be directly searched for in a binary descriptor database (e.g., to search for corresponding structures). The result window then contains a hit list and two three-dimensional molecule models: one displaying the original molecule of the test set entry (if available), and the other showing the molecule of the currently selected entry in the hit list of similar molecules. [Pg.157]

A molecule with an RDF descriptor most similar to the one retrieved from the neural network is searched for in the binary descriptor database using the minimum RMS error or the highest correlation coefficient between the descriptors. [Pg.184]
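The two search criteria can be sketched in Python as follows. This is only an illustration of ranking by minimum RMS error or highest (Pearson) correlation coefficient; the database is a plain dictionary with invented names and values, not the binary database format used by the software described here.

```python
import math


def rms_error(u, v):
    """Root-mean-square difference between two equal-length descriptors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def pearson(u, v):
    """Pearson correlation coefficient; 0.0 if either vector is constant."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    if su == 0 or sv == 0:
        return 0.0
    return cov / (su * sv)


def best_by_rms(query, database):
    """Entry whose descriptor has the smallest RMS error to the query."""
    return min(database, key=lambda name: rms_error(query, database[name]))


def best_by_correlation(query, database):
    """Entry whose descriptor correlates most strongly with the query."""
    return max(database, key=lambda name: pearson(query, database[name]))


# Hypothetical 4-component descriptors; real RDF descriptors are much longer.
database = {
    "mol_A": [0.0, 1.0, 2.0, 1.0],
    "mol_B": [2.0, 1.0, 0.0, 1.0],
    "mol_C": [1.0, 1.0, 1.0, 2.0],
}
query = [0.0, 1.0, 2.0, 1.1]
print(best_by_rms(query, database))          # mol_A
print(best_by_correlation(query, database))  # mol_A
```

The two criteria need not agree in general: correlation ignores differences in overall scale and offset, whereas RMS error penalizes them.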

The query compound is considered unknown; that is, only the infrared spectrum is used for prediction. The prediction of a molecule is performed by a search for the most similar descriptors in a binary descriptor database. The database contains compressed, low-pass filtered, D20-transformed RDF descriptors of 64 components each. The descriptors originally used for training (Cartesian RDF, 128 components) were compressed in the same way before the search process. [Pg.184]
