Hamming distance frequencies

If the binary descriptors for the objects s and t are substructure keys, the Hamming distance (Eq. (6)) gives the number of substructures that differ between s and t (components that are 1 in either s or t but not in both). The Tanimoto coefficient (Eq. (7)), on the other hand, measures the number of substructures that s and t have in common (i.e., the frequency a) relative to the total number of substructures they could share (given by the number of components that are 1 in either s or t). [Pg.407]
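As a minimal sketch of the two measures described above, the following assumes the descriptors are equal-length lists of 0/1 values. The function names, the example keys, and the convention of returning 1.0 for two all-zero descriptors are illustrative assumptions, not taken from the source; Eqs. (6) and (7) themselves are not reproduced in this excerpt.

```python
def hamming_distance(s, t):
    """Count components that are 1 in either s or t but not in both."""
    if len(s) != len(t):
        raise ValueError("descriptors must have equal length")
    return sum(x != y for x, y in zip(s, t))


def tanimoto(s, t):
    """Shared 1-components (a) divided by components that are 1 in s or t."""
    if len(s) != len(t):
        raise ValueError("descriptors must have equal length")
    a = sum(1 for x, y in zip(s, t) if x and y)
    either = sum(1 for x, y in zip(s, t) if x or y)
    # Assumed convention: two all-zero descriptors count as identical.
    return a / either if either else 1.0


# Hypothetical substructure keys for two objects s and t.
s = [1, 1, 0, 1, 0, 0, 1, 0]
t = [1, 0, 0, 1, 1, 0, 1, 0]

print(hamming_distance(s, t))  # 2
print(tanimoto(s, t))          # 0.6
```

Here s and t share three substructures out of five that occur in at least one of them, and differ in two.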

Relatedness refers to the Hamming distance between master sequence and mutant and is expressed by the number of mutation events that are required to produce the mutant from the master. The frequency of individual mutants in the quasispecies is determined by their fitness and the Hamming distance from the master sequence. [Pg.196]
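To illustrate relatedness as a count of point mutations, the sketch below groups a small population by Hamming distance from a master sequence. The sequences and names are hypothetical, and only substitutions are considered, so all sequences are assumed to have the same length.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical master sequence and a small sample of the population.
master = "GACUUGCA"
population = ["GACUUGCA", "GACUAGCA", "GACUUGCA", "CACUUGCU", "GACUUGCC"]

# Error class d -> number of sampled sequences at Hamming distance d.
error_classes = Counter(hamming(master, seq) for seq in population)

for d in sorted(error_classes):
    print(d, error_classes[d])
```

This only tallies how the sample is distributed over error classes; it does not model the fitness term that, per the text, also determines mutant frequencies.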

Fig. 2.5. A quasi-species-type mutant distribution around a master sequence. The quasi-species is an ordered distribution of polynucleotide sequences (RNA or DNA) in sequence space. A fittest genotype or master sequence Im, which is commonly present at highest frequency, is surrounded in sequence space by a cloud of closely related sequences. Relatedness of sequences is expressed (in terms of error classes) by the number of mutations which are required to produce them as mutants of the master sequence. In the case of point mutations the distance between sequences is the Hamming distance. In precise terms, the quasi-species is defined as the stable stationary solution of Eq. (2) [16,19,20]. In reality, such a stationary solution exists only if the error rate of replication lies below a maximal value called the error threshold. In this region, i.e. below...

An important feature of the replication-mutation kinetics of Eq. (2) is its straightforward accessibility to justifiable model assumptions. As an example we discuss the uniform error model [18,19]. This refers to a molecule which is reproduced sequentially, i.e. digit by digit from one end of the (linear) polymer to the other. The basic assumption is that the accuracy of replication is independent of the particular site and the nature of the monomer at this position. Then, the frequency of mutation depends exclusively on the number of monomers that have to be exchanged in order to mutate from Ik to Ij, which are counted by the Hamming distance of the two strings, d(Ij,Ik) ... [Pg.12]
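The excerpt breaks off before giving an expression, so the following is not the source's formula. It is a sketch of the form the uniform error model is commonly given, assuming a single-digit accuracy q that is the same at every site and errors spread evenly over the other monomers. The names `q`, `n` and `alphabet_size` are introduced here for illustration.

```python
from math import comb


def mutation_probability(d, n, q, alphabet_size=2):
    """Probability that copying a template of length n yields one specific
    sequence at Hamming distance d, under the assumed uniform error model."""
    if not 0 <= d <= n:
        raise ValueError("d must lie between 0 and n")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    # Each of the d wrong digits is one particular alternative monomer.
    per_site_error = (1.0 - q) / (alphabet_size - 1)
    return q ** (n - d) * per_site_error ** d


n, q = 10, 0.99
for d in range(3):
    print(d, mutation_probability(d, n, q))
```

Under these assumptions the probability depends on the two sequences only through d, which is the point the passage makes.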

The frequency at which the set of sequences with Hamming distance d is produced as error copies of the template is... [Pg.246]
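The expression itself is elided in this excerpt and is left as is. For orientation only, the sketch below shows what follows if one assumes a uniform error model with single-digit accuracy q for a template of length n: the whole error class at distance d is then produced with a binomial frequency. The function name and parameter values are hypothetical.

```python
from math import comb


def class_frequency(d, n, q):
    """Frequency with which any sequence at Hamming distance d is produced
    from a template of length n, assuming uniform single-digit accuracy q."""
    if not 0 <= d <= n:
        raise ValueError("d must lie between 0 and n")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    # comb(n, d) choices of erroneous sites, each wrong with probability 1 - q.
    return comb(n, d) * q ** (n - d) * (1.0 - q) ** d


n, q = 10, 0.99
frequencies = [class_frequency(d, n, q) for d in range(n + 1)]
print(frequencies[:3])
print(sum(frequencies))  # close to 1, up to floating-point rounding
```

Under this assumption the class frequency does not depend on the alphabet size: the number of sequences in the class grows by the same factor by which the probability of each individual sequence shrinks.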

Big Chemical Encyclopedia