Duplicate structures

The transformation used above to enumerate tautomers would lead to identical products when applied to symmetrically substituted pyrazoles. The set of structures generated in the enumeration process is converted to a sorted list of canonical SMILES [23] from which duplicates are easily eliminated. Structures registered in alternative tautomeric forms are converted to identical lists of SMILES that can each be represented by their common first member. This effectively extends the definition of canonical SMILES to cover an ensemble of tautomeric forms and makes it possible to check for duplicate structures without having to register multiple forms [16, 26]. [Pg.281]
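The idea of letting the first member of a sorted SMILES list stand for a whole tautomeric ensemble can be sketched as below. This is a minimal illustration under stated assumptions, not the cited authors' code: the function name `ensemble_key` is hypothetical, the input strings are assumed to have been canonicalized already by some external toolkit, and the two methylpyrazole SMILES are illustrative placeholders rather than the output of any particular program.

```python
def ensemble_key(tautomer_smiles):
    """Return one representative string for a set of tautomeric forms.

    Assumes every input string is already a canonical SMILES, so that
    identical structures are identical strings.
    """
    unique_sorted = sorted(set(tautomer_smiles))  # drop duplicates, then sort
    return unique_sorted[0]  # the common first member represents the ensemble


# Hypothetical enumeration results for one compound registered in two forms.
# Enumeration from either form is assumed to yield the same set of tautomers.
from_form_a = ["Cc1cc[nH]n1", "Cc1ccn[nH]1", "Cc1cc[nH]n1"]
from_form_b = ["Cc1ccn[nH]1", "Cc1cc[nH]n1"]

same_compound = ensemble_key(from_form_a) == ensemble_key(from_form_b)
```

Because both registrations reduce to the same key, a duplicate check only needs to compare one string per compound.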

There are nine heptane isomers of formula C7H16. Write structural formulas for each. Name each by the IUPAC system. (In working a problem such as this, proceed systematically by constructing first the heptane, then all the possible hexanes, the pentanes, and so on. Should you inadvertently duplicate a structure, this will become apparent when you name it; duplicate names usually are easier to spot than duplicate structures.)... [Pg.65]

Approximately 3,500 structures were removed because of missing atoms (usually H) in the coordinate sets, atoms with incorrect valences (usually H with 0 or more than one bond) and bad bond angles. Finally, duplicate structures were removed to leave just the lowest R-factor entry. The space groups included in the analysis are given in Table 2. [Pg.190]
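Keeping only the lowest R-factor entry among duplicates might look like the following sketch. The tuple layout, the function name `keep_lowest_r_factor`, and the sample values are all assumptions made for illustration; the excerpt does not say how duplicates were identified, so a precomputed `structure_key` is assumed here.

```python
def keep_lowest_r_factor(entries):
    """Keep, for each structure key, the entry with the smallest R-factor.

    entries: iterable of (structure_key, entry_id, r_factor) tuples.
    Returns a dict mapping structure_key -> (entry_id, r_factor).
    """
    best = {}
    for key, entry_id, r_factor in entries:
        if key not in best or r_factor < best[key][1]:
            best[key] = (entry_id, r_factor)
    return best


# Hypothetical data: two determinations of compound-1, one of compound-2.
entries = [
    ("compound-1", "entry-a", 0.062),
    ("compound-1", "entry-b", 0.041),
    ("compound-2", "entry-c", 0.055),
]
kept = keep_lowest_r_factor(entries)
```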

Normalization of the structure data set (e.g., removal of explicit hydrogen atoms, standard representation of nitro, azido and similar groups, deionization, and removal of duplicate structures). [Pg.165]

Before finally storing the structure in the database, the registration program may search the database for some level of match to the input structure or reaction, and skip the registration if it is a duplicate. This is sometimes termed "deduplication" through "exact match" searching. There is usually some redundancy in chemical databases, and to save search time and disk space, most companies do not store duplicate structures or reactions, but rather store pointers to them. [Pg.378]
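A registration step that skips exact-match duplicates and hands back a pointer to the existing record could be sketched as follows. The `Registry` class and its methods are hypothetical, and the sketch assumes each incoming structure has already been reduced to a canonical string key; real registration systems typically support several levels of match.

```python
class Registry:
    """Toy registry: one stored record per canonical structure key."""

    def __init__(self):
        self._id_by_key = {}
        self._next_id = 1

    def register(self, canonical_key):
        """Return (record_id, was_new).

        If the key is already present, return the existing ID (the
        "pointer") instead of storing the structure a second time.
        """
        existing = self._id_by_key.get(canonical_key)
        if existing is not None:
            return existing, False
        new_id = self._next_id
        self._id_by_key[canonical_key] = new_id
        self._next_id += 1
        return new_id, True


registry = Registry()
first = registry.register("CCO")       # new structure
second = registry.register("CCO")      # exact-match duplicate
third = registry.register("CC(=O)O")   # another new structure
```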

Information that is put into this Database is derived from published reports of crystal structure determinations. The data extracted from the scientific literature in this way include the atomic coordinates, information on the space group, chemical connectivity, and the literature reference to each structure determination. Each compound listed in the Database is identified by a six-letter code (the REFCODE), unique to each crystal structure determination. Duplicate structures and remeasurements of the same crystal structure are identified by an additional two digits after the REFCODE. Scientific journals are scanned regularly by the Database staff for reports of crystal structure determinations, and the data are then entered into this Database. Structural data are also deposited by journals (for example, Chemical Communications) that publish articles but do not have space for atomic coordinates. All crystallographic data reported in the literature are tested by the Database staff for internal consistency, precision, and chemical reasonableness. In... [Pg.693]

SECS proved to be useful in mechanistic as well as synthetic research. The program was used to systematically generate all Wagner-Meerwein products, with elimination of duplicate structures and calculation of strain energies, in order to find the most likely mechanism for acid-catalyzed rearrangement of tetrahydro-Binor-S to diamantane 24... [Pg.294]

Another experimental technique to study the self-association of phenols is to investigate how molecules of phenols pack together in the crystalline state. This type of analysis is made possible by the availability of the computer-based CSD. The CSD contains unit-cell dimensions of more than 230,000 (April 2001 release) three-dimensional crystal-structure determinations that have been studied by X-ray or neutron diffraction. Each crystal structure is identified by a unique six-letter code, called its REFCOD, with an additional two digits for duplicate structures and measurements. [Pg.549]
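The code layout described here (six letters, optionally followed by two digits) lends itself to grouping entries that belong to the same compound. The sketch below is based only on that description: the regular expression is an assumption rather than an official format specification, the function name `group_by_family` is hypothetical, and the sample codes are made-up placeholders.

```python
import re
from collections import defaultdict

# Six uppercase letters, optionally followed by two digits (assumed format).
REFCODE_PATTERN = re.compile(r"^([A-Z]{6})(\d{2})?$")


def group_by_family(refcodes):
    """Group codes by their six-letter stem, keeping input order."""
    families = defaultdict(list)
    for code in refcodes:
        match = REFCODE_PATTERN.match(code)
        if match is None:
            raise ValueError(f"not in the assumed format: {code!r}")
        families[match.group(1)].append(code)
    return dict(families)


# Placeholder codes, not real database entries.
codes = ["ABCDEF", "ABCDEF01", "ABCDEF02", "GHIJKL"]
families = group_by_family(codes)
```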

(Hint: Systematize your approach to these problems. For the isomers of a six-carbon formula, for example, start with the isomer containing all six carbons in a straight chain, then the isomers containing a five-carbon chain, then a four-carbon chain, etc. Carefully check your answers to AVOID DUPLICATE STRUCTURES.)... [Pg.45]

It might be helpful to delay the creation of the indexes when the schema is first created and its tables populated. This is especially true if millions of compounds are to be entered at one time. However, if there are duplicate structures and the table contains even two rows with the same isosmi, it will not be possible to create a unique index on the isosmi column until only a unique set of isosmi values exists. The creation of a unique index does not fix nonunique values. It simply prevents nonunique values. In order to find duplicate structures in a table, the following SQL can be used. [Pg.162]
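The SQL from the source is not reproduced in this excerpt. A generic way to list duplicated values is a GROUP BY with a HAVING filter, sketched here through Python's built-in `sqlite3` module so that it runs without a database server. The table name `structure` and the sample rows are assumptions; only the column name `isosmi` comes from the text, and SQLite stands in for whatever database the source used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; only the isosmi column name is taken from the text.
conn.execute("CREATE TABLE structure (id INTEGER PRIMARY KEY, isosmi TEXT)")
conn.executemany(
    "INSERT INTO structure (isosmi) VALUES (?)",
    [("CCO",), ("CC(=O)O",), ("CCO",), ("c1ccccc1",)],
)

# List every isosmi value that occurs in more than one row.
duplicates = conn.execute(
    """
    SELECT isosmi, COUNT(*) AS n
    FROM structure
    GROUP BY isosmi
    HAVING COUNT(*) > 1
    ORDER BY isosmi
    """
).fetchall()

# While duplicates remain, creating a unique index is rejected.
try:
    conn.execute("CREATE UNIQUE INDEX structure_isosmi ON structure (isosmi)")
    index_created = True
except sqlite3.DatabaseError:
    index_created = False
```

Once the rows reported by the query have been merged or deleted, the unique index can be created and will then prevent new duplicates.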

CONFORT performs an exhaustive conformational analysis of a molecule [71]. Two different search modes either generate a user-defined number of conformations, or output a maximally diverse set of conformations, which was used in this study. The diversity metric is based on interconformational distances that circumvent the generation of duplicate structures. The conformations are relaxed and optimized by applying only internal coordinates and analytic gradients and by the Tripos force field package. [Pg.207]

As always one must guard against drawing duplicate structural formulas of the same isomer. Location of the double bond between the far right pair of carbons gives a duplicate of I(a) while location of the double bond between the second and third carbons from the right-hand side gives a duplicate of I(i). There is only one constitutional isomer with skeleton II ... [Pg.234]

Z = 4 (both structures have a single independent molecule in the cell). A suitable clustering of the complete set of predicted structures should thus remove duplicate structures of this type. [Pg.350]

When a reaction is applied to a given list of structures, it is frequently true that some product structures occur many times in the "raw products list." In mechanistic studies, this is the desired result because each occurrence of a product represents a unique reaction pathway (see Results and Discussion). In structure elucidation studies, though, the important information is the chemical identity of, not the pathways to, each product, and in such applications it is necessary to eliminate duplicate structures. This is not a simple matter because although structures are chemically equivalent their representations within the... [Pg.201]

One issue we have not yet addressed with orderly generation is computational complexity. Although orderly generation is certainly faster than labeled enumeration followed by a removal of the duplicated structures, is it the optimum solution? First we have to ask what optimum means when... [Pg.236]

Such duplicates should be avoided to keep search and answer spaces as small as possible. Thus, the duplicate structures must be detected during structure generation and removed. In addition, it is important to describe chemical compounds as precisely as possible, e.g., during the search for QSPRs. In fact, there are molecular descriptors that... [Pg.69]

Remove duplicate structures (two structures are considered identical if they have the same atom types and the corresponding atoms are less than 0.5 Å apart). In the authors' example, after this step, 12,654 structures remained. [Pg.95]
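The identity criterion above can be sketched as a pairwise comparison. This is an illustration under assumptions, not the authors' implementation: the function names are hypothetical, each structure is assumed to be a list of `(element, (x, y, z))` pairs with atoms already in corresponding order, and all structures are assumed to share one coordinate frame.

```python
import math


def is_duplicate(struct_a, struct_b, tolerance=0.5):
    """True if atom types match and each atom pair is < tolerance apart."""
    if len(struct_a) != len(struct_b):
        return False
    for (elem_a, pos_a), (elem_b, pos_b) in zip(struct_a, struct_b):
        if elem_a != elem_b:
            return False
        if math.dist(pos_a, pos_b) >= tolerance:
            return False
    return True


def remove_duplicates(structures, tolerance=0.5):
    """Keep the first occurrence of each distinct structure."""
    unique = []
    for candidate in structures:
        if not any(is_duplicate(candidate, kept, tolerance) for kept in unique):
            unique.append(candidate)
    return unique


# Hypothetical two-atom structures; coordinates in angstroms.
a = [("C", (0.0, 0.0, 0.0)), ("O", (1.2, 0.0, 0.0))]
b = [("C", (0.1, 0.0, 0.0)), ("O", (1.3, 0.1, 0.0))]  # within 0.5 of a
c = [("C", (0.0, 0.0, 0.0)), ("O", (0.0, 1.2, 0.0))]  # O atom far from a's
unique = remove_duplicates([a, b, c])
```

The all-pairs loop is quadratic in the number of structures, which is acceptable for a sketch but would likely need a spatial index or hashing step for large sets.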

Due to the close developmental relationship of the urinary and the genital tract, malformations frequently occur in both of these systems. Major renal anomalies are common in patients presenting with unilateral obstruction or agenesis of duplicated structures derived from the mullerian duct. [Pg.144]

The program ONESMILE removes duplicate structures from a sorted list of SMILES. Thus, after the MODSMI transformations, sorting and ONESMILE would be used to produce a file of the unique molecules. Notice that this is possible only because all MODSMI operations produce the unique SMILES with the result that each particular molecular structure is represented by the same SMILES string regardless of the order of the atoms in the structure from which it originated. [Pg.322]
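The sort-then-remove-duplicates step can be sketched in a few lines. This is not the ONESMILE program itself, only an illustration of why sorting helps: once the list is sorted, identical strings are adjacent and a single pass suffices. The function name `unique_from_sorted` is hypothetical, and the input strings are assumed to be unique (canonical) SMILES already.

```python
from itertools import groupby


def unique_from_sorted(sorted_smiles):
    """Collapse runs of identical adjacent strings in a sorted list."""
    return [smiles for smiles, _ in groupby(sorted_smiles)]


# Hypothetical list of already-canonical SMILES containing repeats.
raw = ["CCO", "c1ccccc1", "CCO", "CC(=O)O", "c1ccccc1"]
unique = unique_from_sorted(sorted(raw))
```

If the input strings were not canonical, two different spellings of the same molecule would survive this step, which is the point the excerpt makes about the MODSMI output.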
