Big Chemical Encyclopedia

Chemical substances, components, reactions, process design ...


Searching for Similar Molecules in a Data Set

We have seen that RDF descriptors are one-dimensional representations of the 3D structure of a molecule. A classification of molecular structures containing characteristic structural features shows how effectively the descriptor preserves the 3D structure information. For this experiment, Cartesian RDF descriptors were calculated for a mixed data set of 100 benzene derivatives and 100 cyclohexane derivatives. Each compound was assigned to one of these two classes, and a Kohonen neural network was trained with these data. The task for the Kohonen network was to classify the compounds according to their Cartesian RDF descriptors. [Pg.191]
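As a rough illustration of what such a descriptor looks like in practice, the sketch below evaluates the commonly used RDF form g(r) = sum over atom pairs of A_i * A_j * exp(-B * (r - r_ij)^2) on a fixed grid of distances. It is a minimal sketch, not the implementation behind the experiment: the function names, the 0.1 Å grid step (128 components up to 12.8 Å), the smoothing parameter B = 100 Å^-2, the unit atom weights, and the idealised ring geometries are all assumptions made for this example.

```python
import math

def rdf_descriptor(coords, n_components=128, r_max=12.8, smoothing=100.0, weights=None):
    """Return an RDF-style descriptor for a list of (x, y, z) atom positions in angstroms.

    Assumed defaults (not from the source): 0.1 A grid step, B = 100 A^-2,
    and atom property A_i = 1 for every atom.
    """
    n_atoms = len(coords)
    if weights is None:
        weights = [1.0] * n_atoms
    step = r_max / n_components
    g = [0.0] * n_components
    # Sum one Gaussian per unique atom pair, centred on the interatomic distance.
    for i in range(n_atoms - 1):
        for j in range(i + 1, n_atoms):
            r_ij = math.dist(coords[i], coords[j])
            pair_weight = weights[i] * weights[j]
            for k in range(n_components):
                r = k * step
                g[k] += pair_weight * math.exp(-smoothing * (r - r_ij) ** 2)
    return g

def ring(radius, pucker):
    """Idealised six-membered ring (illustrative geometry only, not real data).

    pucker = 0 gives a planar hexagon; pucker > 0 moves alternate atoms
    above and below the ring plane.
    """
    return [
        (radius * math.cos(k * math.pi / 3),
         radius * math.sin(k * math.pi / 3),
         pucker * (-1) ** k)
        for k in range(6)
    ]

planar = rdf_descriptor(ring(1.39, 0.0))     # benzene-like carbon skeleton
puckered = rdf_descriptor(ring(1.39, 0.25))  # cyclohexane-like, nonplanar skeleton
```

The two vectors have the same length regardless of the number of atoms, which is what allows molecules of different size to be fed to the same network. Because puckering changes the interatomic distances, the peaks of `puckered` are shifted relative to those of `planar`.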

FIGURE 6.10 Results of the classification with 100 benzene derivatives (dark) and 100 cycloaliphatic compounds (light). The topological map shows a clear distinction between compounds with planar benzene ring systems and nonplanar cyclic systems (rectangular network, Cartesian RDF, 128 components). [Pg.191]

Obviously, a Kohonen network is able to distinguish between planar and nonplanar structures via their descriptors. The practical value of this classification is that if we present a new, unclassified compound to the trained network, its central neuron indicates whether it is a benzene or a cyclohexane derivative. [Pg.192]
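The train-then-classify workflow described above can be sketched with a small self-organizing map in plain Python. This is a minimal sketch under stated assumptions rather than the network used in the experiment: the function names, the 4 x 4 rectangular grid, the linear decay of learning rate and neighbourhood radius, and the majority-vote labelling of neurons are choices made for this example, and the four-component toy vectors stand in for real 128-component RDF descriptors.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def winner(weights, x):
    """Index of the neuron whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda n: dist2(weights[n], x))

def train_som(data, rows=4, cols=4, epochs=30, seed=0):
    """Train a rectangular Kohonen map; returns one weight vector per neuron."""
    rng = random.Random(seed)
    # Initialise each neuron with a randomly chosen training vector.
    weights = [list(rng.choice(data)) for _ in range(rows * cols)]
    order = list(range(len(data)))
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs              # decays linearly towards 0
        rate = 0.5 * frac                        # learning rate
        radius = max(1.0, 0.5 * max(rows, cols) * frac)
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            cr, cc = divmod(winner(weights, x), cols)
            # Pull the winning neuron and its grid neighbours towards x.
            for n in range(rows * cols):
                nr, nc = divmod(n, cols)
                d2 = (nr - cr) ** 2 + (nc - cc) ** 2
                if d2 <= radius ** 2:
                    h = rate * math.exp(-d2 / (2.0 * radius ** 2))
                    weights[n] = [w + h * (v - w) for w, v in zip(weights[n], x)]
    return weights

def label_neurons(weights, data, labels):
    """Give each occupied neuron the majority class of the training vectors it wins."""
    votes = {}
    for x, lab in zip(data, labels):
        votes.setdefault(winner(weights, x), []).append(lab)
    return {n: max(sorted(set(v)), key=v.count) for n, v in votes.items()}

def classify(weights, neuron_labels, x):
    """Class of the closest labelled neuron (the winner itself whenever it carries a label)."""
    best = min(neuron_labels, key=lambda n: dist2(weights[n], x))
    return neuron_labels[best]

# Toy stand-ins for descriptor vectors of the two classes (not real data).
toy_rng = random.Random(42)
toy_data = ([[1.0 + toy_rng.uniform(-0.05, 0.05), 0.0, 0.0, 0.0] for _ in range(10)]
            + [[0.0, 0.0, 0.0, 1.0 + toy_rng.uniform(-0.05, 0.05)] for _ in range(10)])
toy_labels = ["benzene"] * 10 + ["cyclohexane"] * 10
som = train_som(toy_data)
neuron_classes = label_neurons(som, toy_data, toy_labels)
print(classify(som, neuron_classes, [0.97, 0.0, 0.0, 0.0]))
```

Classifying a new vector only requires one distance calculation per neuron. The fallback to the nearest labelled neuron handles the case where the winning neuron never received a training compound.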

The training for such an experiment is performed within seconds on a conventional personal computer. The prediction for a new compound takes just a few milliseconds, so the network is able to assign thousands of structures to their proper structure type within a few seconds. A comparable substructure search with conventional algorithms would take at least several hours. The method is ideally suited for automated analysis in high-throughput screening tasks. [Pg.192]

However, the task just described is quite a simple one, and the question is whether a classification is still successful and accurate with other structure types. If we extend a training data set with another class of structure, a new cluster occurs. The next experiment shows this with a small set of steroids included in the training set. Parts of the neurons containing descriptors of cyclic aliphatic compounds split, and the steroid descriptors appear. [Pg.192]



© 2024 chempedia.info