Structural data retrieval

In the first naive attempt, a search for structures of compounds containing the N—C—C grouping, without any constraints on bond types or on the number of substituents at the N and C atoms, returned no fewer than 7481 structures. This result prompted us to eliminate such formally enamine-like compounds as porphyrins, phthalocyanines, tetraazamacrocyclic compounds, and purine and pyrimidine derivatives, and to formulate more precise conditions for selecting proper enamine groupings. To this end, a new structural data retrieval from CSDS was performed in four stages. [Pg.93]

III. STATISTICAL ANALYSIS OF STRUCTURAL DATA RETRIEVED FROM THE CSD... [Pg.172]

Table 13.2. Relative occurrence of typical hydrogen-bonding contacts between functional groups of small molecule ligands and various polar amino acid residues in protein/ligand complexes, as indicated by structural data retrieved from the Brookhaven File (bb = backbone, Ar = aromatic moiety)... [Pg.558]

Structural data are readily available from the database, but it must be assumed that they were not originally collected with the requirements of the particular correlation in mind. It is essential that the substructure investigated be tightly defined, and that this is confirmed by careful checks of structures in the data set retrieved. (This requirement is squarely at odds with that for statistical respectability, for which the largest possible data set is generally desirable.) A last resort, if the question is important enough, is to collect new data. The structures examined can then be designed to answer specific questions. The answers to the questions will not, however, be available for months or even years. [Pg.92]

After the spectral matching process has been completed, the compounds with the top-matching daughter spectra are identified and retrieved for each daughter spectrum in the reference compound. The molecular structures of the compounds with the best-matching spectra are drawn and compared for common substructures. The common substructures yield candidate spectrum/substructure correlations. Additional compounds are then tested to confirm or modify each correlation. Once the daughter spectrum is correlated with one or more substructures, this daughter spectrum is stored in the spectrum data base and is linked to the associated substructures stored in the structure data base. [Pg.328]
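The retrieval step described above can be sketched in a few lines. This is only an illustration under stated assumptions: the excerpt does not say which similarity measure was used, so cosine similarity is swapped in here, and the spectrum representation (a dict mapping m/z to intensity), the function names, and the library entries are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two spectra given as {m/z: intensity} dicts.

    Assumption: cosine similarity stands in for the unspecified matching
    criterion of the original procedure.
    """
    shared = set(a) & set(b)
    dot = sum(a[mz] * b[mz] for mz in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query, library, n=3):
    """Return the n library entries whose spectra best match the query,
    as (name, score) pairs sorted from best to worst."""
    scored = [(name, cosine_similarity(query, spectrum))
              for name, spectrum in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Hypothetical daughter spectra, for illustration only.
LIBRARY = {
    "compound_A": {91: 100.0, 65: 20.0},
    "compound_B": {77: 100.0, 51: 30.0},
    "compound_C": {91: 50.0, 77: 50.0},
}

if __name__ == "__main__":
    query = {91: 100.0, 65: 20.0}
    for name, score in top_matches(query, LIBRARY, n=2):
        print(name, round(score, 3))
```

The structures of the top-ranked compounds would then be compared by hand or by a substructure tool for common fragments; that step is not covered by this sketch.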

Data Retrieval System (DRS)(2). We find it particularly important to have the plotting programs readily available since this is, as I will illustrate by several examples, an important first step in analysis of structure-activity data. [Pg.301]

Chemoinformatics refers to the systems and scientific methods used to store, retrieve, and analyze the immense amount of molecular data that are generated in modern drug-discovery efforts. In general, these data fall into one of four categories: structural, numerical, annotation/text, and graphical. However, it is fair to say that the molecular structure data are the most distinctive aspect, the one that differentiates chemoinformatics from other database applications (1). Molecular structure refers to the 1-, 2-, or 3-D representations of molecules. Examples of numerical data include biological activity, pKa, log P, or analytical results, to name a few. Annotation includes information such as experimental notes that are associated with a structure or data point. Finally, any structure... [Pg.65]

Resch et al. [39] summarise and discuss the whole chain from data retrieval through processing and analysis to visualisation. The general design and structure of such a chain is depicted in Fig. 2, starting with generalised sensors. These sensors can be traditional fixed monitors, but can also be mobile sensors installed on cars, ships or at short-term locations, for example on lampposts. These mobile or moveable sensors can be equipped with a geo-positioning system (GPS) to be employed in geo-information systems (GIS) (e.g. [40]). These applications have been made possible by miniaturisation of the GPS as well as the transfer of data by mobile phone systems. [Pg.290]

Physicochemical properties of amino acids are very useful descriptors for understanding the structures and properties of proteins. These properties are expressed numerically in indexes that can be retrieved from the AAindex database. Design an index database of physicochemical properties of amino acids with Microsoft Access that may facilitate the data retrieval according to their chemical similarities ... [Pg.101]
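One possible layout for such an index database is sketched below. It is a sketch under assumptions, not a solution to the exercise as set: SQLite (through Python's standard sqlite3 module) is swapped in for Microsoft Access, the table and column names are hypothetical, and only six residues are loaded, using the Kyte-Doolittle hydropathy scale as the single example index.

```python
import sqlite3

# Kyte-Doolittle hydropathy values for a handful of residues (one-letter codes).
HYDROPATHY = {"I": 4.5, "V": 4.2, "L": 3.8, "D": -3.5, "K": -3.9, "R": -4.5}
NAMES = {"I": "isoleucine", "V": "valine", "L": "leucine",
         "D": "aspartic acid", "K": "lysine", "R": "arginine"}


def build_db():
    """Create an in-memory database with one table per entity:
    residues, property indexes, and the value of each index per residue."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE amino_acid (
            code TEXT PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE property_index (
            index_id    TEXT PRIMARY KEY,
            description TEXT NOT NULL
        );
        CREATE TABLE index_value (
            index_id TEXT NOT NULL REFERENCES property_index(index_id),
            code     TEXT NOT NULL REFERENCES amino_acid(code),
            value    REAL NOT NULL,
            PRIMARY KEY (index_id, code)
        );
    """)
    conn.executemany("INSERT INTO amino_acid VALUES (?, ?)", NAMES.items())
    conn.execute("INSERT INTO property_index VALUES (?, ?)",
                 ("hydropathy", "Kyte-Doolittle hydropathy scale"))
    conn.executemany(
        "INSERT INTO index_value VALUES ('hydropathy', ?, ?)",
        HYDROPATHY.items(),
    )
    return conn


def similar_residues(conn, index_id, code, tolerance):
    """Return codes of residues whose value on the given index lies within
    `tolerance` of the reference residue, closest first."""
    rows = conn.execute("""
        SELECT other.code
        FROM index_value AS ref
        JOIN index_value AS other ON other.index_id = ref.index_id
        WHERE ref.index_id = ?
          AND ref.code = ?
          AND other.code <> ref.code
          AND ABS(other.value - ref.value) <= ?
        ORDER BY ABS(other.value - ref.value), other.code
    """, (index_id, code, tolerance)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    db = build_db()
    print(similar_residues(db, "hydropathy", "I", 1.0))
```

Keeping the index values in their own table means further indexes can be added as rows rather than as new columns, so the same similarity query works for any property.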

II. Product Summary: Jubilant Biosys has created some content products in the bioinformatics and chemoinformatics area. These products leverage Jubilant's curation services to incorporate extensive curated databases with structured query modules and front-ends for data retrieval. The content is for the drug discovery process, specifically in the areas of target prioritization and lead identification. The databases are available in Oracle, SD format, and ISIS/Base DB formats and can be exported. The database can be queried across text, structure, substructure and sequences with built-in query modules. Some of the key parameters on which information is curated are ... [Pg.164]

The Protein Data Bank (PDB; http://www.pdb.org) is the worldwide repository of three-dimensional structural data of biological macromolecules, such as proteins and nucleic acids (Berman et al. 2003). The Protein Data Bank uses several text file-based formats for data deposition, processing, and archiving. The oldest of these is the Protein Data Bank format (Bernstein 1977), which is used both for deposition and for retrieval of results. It is a plain-text format whose main part, a so-called primary structure section, contains the atomic coordinates within the sequence of residues (e.g., nucleotides or amino acids) in each chain of the macromolecule. Embedded in these records are chain identifiers and sequence numbers that allow other records to reference parts of the sequence. Apart from structural data, the PDB format also allows various metadata to be stored, such as bibliographic data, experimental conditions, additional stereochemistry information, and so on. However, the number of metadata types available is rather limited owing to the age of the PDB format and to its relatively strict syntax rules. [Pg.91]
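To make the record layout concrete, the following minimal sketch reads the chain identifier, residue sequence number, and coordinates from the fixed-width ATOM and HETATM records of a PDB-format file. The class and function names are hypothetical, the sample line is made up for illustration, and the parser deliberately ignores everything else in the format (alternate locations, insertion codes, occupancy, metadata records).

```python
from typing import List, NamedTuple


class Atom(NamedTuple):
    serial: int      # atom serial number
    name: str        # atom name, e.g. "CA"
    res_name: str    # residue name, e.g. "MET"
    chain_id: str    # chain identifier
    res_seq: int     # residue sequence number
    x: float         # orthogonal coordinates in angstroms
    y: float
    z: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record using the fixed column positions
    of the PDB format (slices are 0-based, columns are 1-based)."""
    return Atom(
        serial=int(line[6:11]),         # columns 7-11
        name=line[12:16].strip(),       # columns 13-16
        res_name=line[17:20].strip(),   # columns 18-20
        chain_id=line[21],              # column 22
        res_seq=int(line[22:26]),       # columns 23-26
        x=float(line[30:38]),           # columns 31-38
        y=float(line[38:46]),           # columns 39-46
        z=float(line[46:54]),           # columns 47-54
    )


def parse_atoms(text: str) -> List[Atom]:
    """Collect all coordinate records from the text of a PDB-format file,
    skipping every other record type."""
    return [
        parse_atom_line(line)
        for line in text.splitlines()
        if line.startswith(("ATOM  ", "HETATM"))
    ]


# Illustrative record, not taken from a real entry.
SAMPLE = "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
```

Because the format is column-based rather than delimiter-based, splitting on whitespace is unreliable (adjacent fields can run together), which is why the sketch slices by position.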

As will become apparent in the later chapters, the statistical nature of the conclusions is such that the present amount of data in the crystallographic data base is inadequate. We find ourselves presenting results that show well-defined trends, but in doing so, have come to realize that we need a tenfold increase in the information from X-ray and neutron crystallography before the conclusions drawn can become really definitive. This tenfold increase in structural data will surely come within the next decade, and will be useful only if the publication, data storage, and retrieval mechanisms keep pace with the accelerating rate of data acquisition. [Pg.14]

Her early endeavors to characterize the branch points in starch, and pursuit of the then-elusive α-(1→6) linkage in starch, drew Dr. Jeanes to dextrans, since these polysaccharides contain α-(1→6) linkages as their main structural feature. Dextrans are a family of D-glucans produced microbially from sucrose; they contain from 50 to 100% α-(1→6) linkages, depending on the microbial strain used. Dr. Jeanes became an authority on dextran sources, structures, and industrial applications. She published comprehensive bibliographies on dextran, the first in 1950 and another in 1978. These were a labor of love, produced with much effort before the days of automated data retrieval. [Pg.8]

Folder/directory structures, file structures, file sizes, file properties, file access, storage media, storage capacity, data retrieval rates, user interfaces... [Pg.717]

At about the same time, Terp et al. [11] used GRID/CPCA to analyze 10 MMPs with the intention of highlighting regions that could be potential sites for obtaining selectivity. Some of the structures were retrieved from the RCSB protein data bank [53], others were obtained through homology modeling [56]. To facilitate the analysis, the authors used the cut-out tool to focus on each of the six subsites in turn. [Pg.72]

The starting dataset used to develop the 3D quantitative structure-property relationship (3D-QSPR) model consisted of 370 commercially available compounds. Activity data and 2D structures were retrieved from the Cerep database [18]. Inhibition of CYP3A4 was reported as inhibition of the formation of 6β-hydroxytestosterone [19]. Ketoconazole was used as the reference compound so that all values are expressed as percentages. The log of the normalized CYP3A4 inhibition per-... [Pg.209]

Big Chemical Encyclopedia
