Statistics of Sequences

Bryant, S. H., and Altschul, S. F. (1995). Statistics of sequence-structure threading. Curr. [Pg.271]

Sanchez has also derived a melting theory of random copolymers from the statistics of sequence placements, such that for crystalline content Xc and monomer mole fraction x. [Pg.278]

In conclusion, the isospecific steric control should be due to the chirality of M, whereas the syndiospecific steric control should be due to the chirality of the last unit of the growing chain end. This conclusion also agrees with the statistics of sequence distribution observed in ethylene-propylene copolymers (74). [Pg.46]

Analysis of the global statistics of protein sequences has recently allowed light to be shed on another puzzle, that of the origin of extant sequences [170]. One proposition is that proteins evolved from random amino acid chains, which predicts that their length distribution is a combination of the exponentially distributed random variable giving the intervals between start and stop codons, and the probability that a given sequence can fold up to form a compact... [Pg.2844]

For example, Stolorz et al. [88] derived a Bayesian formalism for secondary structure prediction, although their method does not use Bayesian statistics. They attempt to find an expression for $p(\mu \mid \mathrm{seq}) = p(\mathrm{seq} \mid \mu)\,p(\mu)/p(\mathrm{seq})$, where $\mu$ is the secondary structure at the middle position of seq, a sequence window of prescribed length. As described earlier in Section II, this is a use of Bayes' rule but is not Bayesian statistics, which depends on the equation $p(\theta \mid y) = p(y \mid \theta)\,p(\theta)/p(y)$, where $y$ is data that connect the parameters in some way to observables. The data are not sequences alone but the combination of sequence and secondary structure that can be culled from the PDB. The parameters we are after are the probabilities of each secondary structure type as a function of the sequence in the sequence window, based on PDB data. The sequence can be thought of as an explanatory variable. That is, we are looking for... [Pg.338]
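As a rough illustration of the Bayes' rule expression above (not the method of Stolorz et al.), the following sketch estimates the probability of each secondary structure type given a sequence window from simple counts. The function name `posterior_from_counts`, the three-residue windows, and the toy data are all hypothetical.

```python
from collections import Counter


def posterior_from_counts(pairs, window):
    """Estimate p(structure | window) by Bayes' rule from (window, structure) pairs."""
    n = len(pairs)
    structure_counts = Counter(s for _, s in pairs)
    window_count = sum(1 for w, _ in pairs if w == window)
    if window_count == 0:
        raise ValueError("window not observed in the training pairs")
    evidence = window_count / n  # p(seq)
    posterior = {}
    for s, n_s in structure_counts.items():
        joint = sum(1 for w, t in pairs if w == window and t == s)
        likelihood = joint / n_s  # p(seq | structure)
        prior = n_s / n           # p(structure)
        posterior[s] = likelihood * prior / evidence
    return posterior


# Hypothetical toy data: (sequence window, structure at the middle position),
# with H = helix, E = strand, C = coil.
toy_pairs = [("AEL", "H"), ("AEL", "H"), ("AEL", "C"),
             ("GPG", "C"), ("VIV", "E"), ("GPG", "C")]
post = posterior_from_counts(toy_pairs, "AEL")
```

With counts this small the estimate is only a frequency table; the posterior values over all structure types sum to one for any observed window.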

Next consider a product ensemble, $U^N$, composed of sequences of $N$ statistically independent symbols, each drawn from the ensemble $U$. The probability of a sequence, $u = (u_1, \ldots, u_N)$, is given by... [Pg.198]
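For statistically independent symbols, the probability of a sequence is the product of the individual symbol probabilities. The sketch below illustrates that standard fact; the three-letter ensemble and the name `sequence_probability` are assumptions made for the example, not taken from the excerpt.

```python
import math
from itertools import product


def sequence_probability(sequence, symbol_probs):
    """Probability of a sequence of independent symbols: product of symbol probabilities."""
    return math.prod(symbol_probs[s] for s in sequence)


# Hypothetical single-letter ensemble U.
U = {"a": 0.5, "b": 0.3, "c": 0.2}
p = sequence_probability("aab", U)

# Summed over every sequence of a fixed length, the probabilities give one.
total = sum(sequence_probability(seq, U) for seq in product(U, repeat=2))
```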

Let $X^N Y^N$ be a product ensemble of sequences of $N$ input letters, $x = (x_1, \ldots, x_N)$, and $N$ output letters, $y = (y_1, \ldots, y_N)$, from a discrete memoryless channel. The probability distribution on the input, $\Pr(x)$, is arbitrary and does not assume statistical independence between letters. However, since the channel is memoryless, $\Pr(y \mid x)$ satisfies... [Pg.212]
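For a memoryless channel, the conditional probability of an output sequence factorizes over letter positions whatever the input distribution. A minimal sketch of that property follows, using a binary symmetric channel with an assumed crossover probability of 0.1; the channel and the name `channel_probability` are illustrative choices, not part of the excerpt.

```python
import math


def channel_probability(y, x, transition):
    """Pr(y | x) for a memoryless channel: product of per-letter transition probabilities."""
    if len(x) != len(y):
        raise ValueError("input and output sequences must have equal length")
    return math.prod(transition[x_n][y_n] for x_n, y_n in zip(x, y))


# Hypothetical binary symmetric channel, indexed as transition[input][output].
eps = 0.1
bsc = {"0": {"0": 1 - eps, "1": eps},
       "1": {"0": eps, "1": 1 - eps}}

# Input 011 received as 010: two letters correct, one flipped.
p = channel_probability("010", "011", bsc)
```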

Statistical characteristics of the second type define the microstructure of copolymer chains. The best known characteristics in this category are the fractions (probabilities) $P(U_k)$ of sequences $U_k$ involving $k$ monomeric units. The simplest among them are the dyads $U_2$, the complete set of which, for example, for a binary copolymer is composed of four pairs of monomeric units: M1M1, M1M2, M2M1, M2M2. The number of types of k-ads in chains of m-component copolymers grows exponentially as $m^k$, so that for practical purposes it is generally enough to restrict consideration to sequences $\{U_k\}$ with moderate values of $k$. Their calculation turns out to be rather useful... [Pg.165]
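The counting of k-ad types and their fractions can be sketched as follows. The single-letter monomer labels (A for M1, B for M2), the example chain, and the function names are hypothetical; k-ads are read in one direction along the chain, so AB and BA count as distinct dyads.

```python
from collections import Counter
from itertools import product


def kad_types(monomers, k):
    """All ordered k-ads that can be built from the given monomer labels (m**k of them)."""
    return ["".join(p) for p in product(monomers, repeat=k)]


def kad_fractions(chain, k):
    """Fraction of each k-ad among all overlapping windows of length k in one chain."""
    windows = [chain[i:i + k] for i in range(len(chain) - k + 1)]
    counts = Counter(windows)
    return {kad: c / len(windows) for kad, c in counts.items()}


dyads = kad_types("AB", 2)      # the four dyads of a binary copolymer
triads = kad_types("ABC", 3)    # three-component copolymer, k = 3
fractions = kad_fractions("AABABBAB", 2)
```

For a real sample the fractions would be averaged over many chains; a single short chain is used here only to keep the arithmetic easy to follow.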

A user-friendly computer program has been developed (A. S. Yakovlev, S. I. Kuchanov, "Copolymerization for Windows") which makes it possible, at any value of conversion, to calculate for m = 2-6, along with the composition of the monomer mixture x, such statistical characteristics as the instantaneous X and average ⟨X⟩ copolymer composition as well as the fractions P(U_k) of sequences U_k with k = 2-4 and... [Pg.180]

In some very recent work by Karssenberg et al. [130], attempts have been made to improve the analytical ability of a technique like NMR spectroscopy to predict the distribution of sequence lengths in polyethylene-alkene copolymers. They analyzed the entire 13C-NMR spectrum for homogeneous ethylene-propene copolymers. They used quantitative methods based on Markov statistics to obtain sequence length distributions as shown in Figure 22 [130]. The... [Pg.162]

The prime objective of this concise review is to provide an illustration of the interaction of these two disciplines using particular examples. In choosing the examples, we seek to demonstrate the potentialities of the conformation-dependent design of the sequences of monomeric units in heteropolymer macromolecules. Under such a design, their chemical structure is controlled not only by the kinetic parameters of a reaction system but also by the conformational statistics of polymer chains. [Pg.143]

Eigenvalues of the operator Qr are real, while the largest of them, λ1, equals unity by definition. As a result, in the limit n → ∞ all items in the sum (Eq. 38), excluding the first one, Q Q f = Xr/Xfh, will vanish. In this case, chemical correlators will decay exponentially along the chain on the scale n* ~ 1/ln λ. At values n < n* the law of the decay of these correlators differs, however, from the exponential one even for binary copolymers. This obviously testifies to non-Markovian statistics of the sequence distribution in molecules (see expression Eq. 11). The closer λ is to unity, the greater are the values of n*. The situation when n* ≫ 1 corresponds to proteinlike copolymers. [Pg.158]

The FID library was applied to the task of predicting the protein folds encoded in complete genomes using the recently developed program IMPALA, which is a modification of PSI-BLAST that effectively reverses the search protocol (Schaffer et al., 1999). PSI-BLAST compares a PSSM to a database of sequences; by contrast, a single search by IMPALA is a comparison of a sequence to a library of PSSMs (Fig. 3B). Statistical tests with IMPALA have shown that the theory used for the evaluation of BLAST results is applicable with minimal modifications. [Pg.258]

Now let us examine the distribution and position of disulfides in proteins. The simplest consideration is distribution in the sequence (see Fig. 51), which is apparently quite random, except that there must be at least two residues in between connected half-cystines. Even rather conspicuous patterns such as two consecutive halfcystines in separate disulfides turn out, when the distribution is plotted for the solved structures (Fig. 51), to occur at only about the random expected frequency. The sequence distribution of halfcystines is influenced by the statistics of close contacts in the three-dimensional structures, but apparently there are no strong preferences of the cystines that could influence the three-dimensional structure. [Pg.229]

Fox, R.J. and Huisman, G.W., Enzyme optimization: moving from blind evolution to statistical exploration of sequence-function space. Trends Biotechnol., 2008, 26, 132-138. [Pg.115]

Statistical copolymers are copolymers in which the sequential distribution of the monomeric units obeys known statistical laws; e.g., the monomeric-unit sequence distribution may follow Markovian statistics of zeroth (Bernoullian), first, second or a higher order. Kinetically, the elementary processes leading to the formation of a statistical sequence of monomeric units do not necessarily proceed with equal a priori probability. These processes can lead to various types of sequence distribution comprising those in which the arrangement of monomeric units tends towards alternation, tends towards... [Pg.370]
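A first-order Markovian sequence distribution can be illustrated by growing a chain in which each unit depends only on the previous one. The sketch below is a toy model under assumed transition probabilities, not a kinetic simulation; the labels A and B, the numbers, and the name `grow_chain` are hypothetical. Setting both rows of the table equal would reduce it to the zeroth-order (Bernoullian) case.

```python
import random


def grow_chain(transition, start, length, rng):
    """Grow a chain where each new unit depends only on the last unit (first-order Markov)."""
    chain = [start]
    while len(chain) < length:
        probs = transition[chain[-1]]
        units = list(probs)
        weights = [probs[u] for u in units]
        chain.append(rng.choices(units, weights=weights)[0])
    return "".join(chain)


# Hypothetical transition probabilities with a strong tendency towards alternation.
alternating = {"A": {"A": 0.1, "B": 0.9},
               "B": {"A": 0.9, "B": 0.1}}

rng = random.Random(0)  # fixed seed so the run is repeatable
chain = grow_chain(alternating, "A", 2000, rng)

# Fraction of adjacent pairs made of unlike units (AB or BA dyads).
hetero_fraction = sum(a != b for a, b in zip(chain, chain[1:])) / (len(chain) - 1)
```

With these numbers the unlike-dyad fraction should come out close to 0.9, which is what a tendency towards alternation means in terms of dyad statistics.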

Those block copolymers, derived from more than two monomers, that also exhibit statistical block sequence arrangements are named according to the principles of Rule 2.1. [Pg.375]

The study of the stereoregularity of the polymers prepared also provides information about the stereoregulating mechanism. The probability of formation of the different types of sequences was determined on the basis of the resonance of the quaternary carbon of pVP (12). The NMR spectrum performed at 15 MHz allows one to determine the concentration of triads. The values summarized in Table 4 do not agree with those expected for Bernoullian statistics. Hence, more than the last unit of the living chain is involved in the process. In order to obtain more precise information about the process, it is necessary to measure the probability of formation of pentads. Such measurements are possible with spectra performed at 63 MHz (Figure 18). In spite... [Pg.260]
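One common way to check triad data against Bernoullian statistics is sketched below, assuming the triads are the stereochemical ones (mm, mr, rr) and that a single probability P_m of meso placement governs each addition. Under that model mm = P_m^2, mr = 2 P_m (1 - P_m) and rr = (1 - P_m)^2, so the ratio 4·mm·rr/mr^2 equals one. The triad values used here are hypothetical and are not those of Table 4.

```python
def bernoullian_triads(p_m):
    """Triad fractions predicted by a Bernoullian model with meso probability p_m."""
    return {"mm": p_m ** 2, "mr": 2 * p_m * (1 - p_m), "rr": (1 - p_m) ** 2}


def bernoulli_ratio(mm, mr, rr):
    """4*mm*rr / mr**2, which equals 1 when the triads follow Bernoullian statistics."""
    return 4 * mm * rr / mr ** 2


model = bernoullian_triads(0.3)
ratio_model = bernoulli_ratio(**model)

# Hypothetical measured triad fractions (illustration only).
ratio_measured = bernoulli_ratio(mm=0.5, mr=0.2, rr=0.3)
```

A measured ratio far from one, as in the hypothetical case above, would indicate that the placement depends on more than a single probability, which is the kind of disagreement the excerpt describes.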

This relation gives a physical interpretation for the parameter $\sigma$: $\sigma^{-1/2}$ equals the average length of a helical sequence in a sufficiently long chain at the midpoint of a helix-coil transition. Thus, as $\sigma$ becomes smaller, the helical portion of such a chain consists, on the statistical average, of a smaller number of sequences. [Pg.76]
