PSOLA

The caveat of the PSOLA analysis and resynthesis method is that it only works satisfactorily on sounds containing regular cycles displaying a single energy point, referred to as the local maximum. The human voice fulfils this requirement, but not all sounds do. The good news is that PSOLA is fast, as it deals directly with the sound samples in the time domain. [Pg.118]

The Praat system on the accompanying CD-ROM (in the folder praat) features an efficient PSOLA tool. Please refer to the user manual, which is accessed via the program's Menu panel, for instructions on how to use the PSOLA tool. [Pg.119]

The PSOLA method. The PSOLA (Pitch Synchronous OverLap-Add) method [Moulines and Charpentier, 1990] was designed mainly for the modification of speech signals. For time-scale modifications, the method is a slight variation of the technique described above, in which the length of the repeated/discarded segments is adjusted... [Pg.450]

Figure 7.8 Pitch-scale modification with the PSOLA method. The short-time segments extracted from the original signal (top) are overlap/added at a different rate in the modified signal. Here, the pitch is raised, so the modified pitch period is shorter than the original (P'(t) < P(t)), and segment 2 is repeated to compensate for the modification of the duration.
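The placement of segments described in the caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure from the cited text: the function name `synthesis_plan`, the constant `pitch_factor`, and the nearest-epoch rule for choosing which segment to reuse are all hypothetical choices made for the example.

```python
def synthesis_plan(analysis_epochs, pitch_factor):
    """Return (synthesis_epochs, frame_indices) for a constant pitch scaling.

    Assumes at least two analysis epochs, given as sample indices.
    pitch_factor > 1 raises the pitch, i.e. shortens the epoch spacing.
    """
    last = analysis_epochs[-1]
    synthesis_epochs, frame_indices = [], []
    t = float(analysis_epochs[0])
    while t <= last:
        # Reuse the analysis segment whose epoch is closest in time, so the
        # overall duration is preserved; some segments end up used twice.
        nearest = min(range(len(analysis_epochs)),
                      key=lambda i: abs(analysis_epochs[i] - t))
        synthesis_epochs.append(round(t))
        frame_indices.append(nearest)
        # Local period around that segment, then scaled by the pitch factor.
        prv = max(nearest - 1, 0)
        nxt = min(nearest + 1, len(analysis_epochs) - 1)
        period = (analysis_epochs[nxt] - analysis_epochs[prv]) / (nxt - prv)
        t += period / pitch_factor
    return synthesis_epochs, frame_indices


epochs, indices = synthesis_plan([0, 100, 200, 300, 400], 1.25)
print(epochs)   # synthesis epochs, spaced 80 samples apart
print(indices)  # analysis segment used at each synthesis epoch
```

With a period of 100 samples and a factor of 1.25, six segments are needed to fill the span that five occupied originally, so one segment index appears twice in the plan.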

Perhaps the most widely used second-generation signal processing techniques are the family called pitch synchronous overlap and add (shortened to PSOLA and pronounced /p ax s ow l ax/). These techniques are used to modify the pitch and timing of speech but do so without performing any explicit source/filter separation. The basis of all the PSOLA techniques is to isolate individual pitch periods in the original speech, perform modification, and then resynthesise to create the final waveform. [Pg.427]

Time domain pitch synchronous overlap and add or TD-PSOLA is widely regarded as the most popular PSOLA technique and indeed may well be the most popular algorithm overall for pitch and timing adjustment [194], [322], [474]. [Pg.427]

One of the key steps in both TD-PSOLA and FD-PSOLA is proper manipulation of the epochs. First, an epoch detector of the type described in Section 12.7.2 is used to find the instants of glottal closure. This results in the analysis epoch sequence T^a = <t^a_1, t^a_2, ..., t^a_M>. From this, the local... [Pg.428]
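The excerpt breaks off before saying how a local quantity is derived from the epoch sequence. One common choice, assumed here rather than taken from the text, is to estimate the local pitch period at each epoch as half the distance between its two neighbouring epochs. The function name `local_pitch_periods` and the handling of the two end epochs are hypothetical.

```python
def local_pitch_periods(epochs):
    """Estimate the pitch period (in samples) at each analysis epoch.

    Interior epochs use half the distance between the previous and the next
    epoch; the two end epochs fall back to their single neighbour.
    """
    if len(epochs) < 2:
        raise ValueError("need at least two epochs")
    periods = []
    for m in range(len(epochs)):
        prev_m = max(m - 1, 0)
        next_m = min(m + 1, len(epochs) - 1)
        periods.append((epochs[next_m] - epochs[prev_m]) / (next_m - prev_m))
    return periods


print(local_pitch_periods([0, 100, 200, 310]))
```

Averaging over two neighbours smooths small errors in individual epoch positions, at the cost of blurring rapid pitch changes.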

Figure 14.2 Timing manipulation with PSOLA. Here the original pitch is kept but the section of speech is made longer by the duplication of frames.
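The duplication of frames in the caption can be expressed as a mapping from output frame slots to analysis frames. The sketch below is only one simple way to build such a mapping, assuming a constant stretch factor; `duration_frame_map` and `stretch` are hypothetical names.

```python
import math


def duration_frame_map(n_frames, stretch):
    """Map each output frame slot to an analysis frame index.

    stretch > 1 lengthens the speech (frames are duplicated);
    stretch < 1 shortens it (frames are skipped).
    """
    n_out = max(1, round(n_frames * stretch))
    return [min(n_frames - 1, math.floor(j / stretch)) for j in range(n_out)]


print(duration_frame_map(4, 2.0))  # every frame used twice
print(duration_frame_map(4, 0.5))  # every second frame dropped
```

Because each frame keeps its original length and spacing, the pitch is unchanged; only the number of frames differs.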

Figure 14.3 Pitch manipulation with PSOLA. A new set of synthesis epochs are created which are...

We can now ask ourselves, how does TD-PSOLA work? Or, in other words, after all we have said about explicit source/filter separation, how is it that we have been able to change the characteristics... [Pg.431]

Figure 14.5 Explanation of how it is possible to change pitch in PSOLA without changing the spectral envelope characteristics.

The above technique bears some similarities to the TD-PSOLA technique in that it uses a pitch-synchronous analysis to isolate individual pitch periods, after which modification and resynthesis is performed. In fact, in a technique called linear prediction pitch synchronous overlap and add, or LP-PSOLA, we can use PSOLA more or less directly on the residual rather than the waveform. As above, epoch detection is used to find the epochs. The residual is then separated into a number of symmetrical frames centred on the epochs. Pitch modification is performed by moving the residual frames closer together or further apart, and duration modification is performed by duplication or elimination of frames, in just the same way as in TD-PSOLA. The only difference is that these operations are performed on the residual, which is then fed into the LP filter to produce speech. This technique differs from the Hunt technique only in the shape of the frames. Both techniques use window functions with their highest point at the epoch; in Hunt's technique the windows are asymmetrical, with the idea that they are capturing a single impulse, whereas in LP-PSOLA the windows are symmetrical. In listening tests, the two techniques produced virtually identical quality speech. [Pg.435]
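The analysis and resynthesis shell around the residual can be sketched as below. This is a minimal illustration, not the system described in the text: the predictor coefficients are fixed, hypothetical values rather than coefficients estimated per frame, and the PSOLA modification that would be applied to the residual between the two steps is left out. The function names are assumptions.

```python
def lp_residual(signal, coeffs):
    """Inverse filter: e[n] = s[n] - sum_k a_k * s[n-k]."""
    residual = []
    for n, s in enumerate(signal):
        pred = sum(a * signal[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        residual.append(s - pred)
    return residual


def lp_synthesise(residual, coeffs):
    """All-pole LP filter: s[n] = e[n] + sum_k a_k * s[n-k]."""
    out = []
    for n, e in enumerate(residual):
        pred = sum(a * out[n - k]
                   for k, a in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(e + pred)
    return out


speech = [1.0, 0.5, -0.25, 0.75, 0.0, -1.0]   # toy samples
coeffs = [0.9, -0.2]                          # hypothetical predictor
residual = lp_residual(speech, coeffs)
# In LP-PSOLA the residual frames would be moved or duplicated here.
rebuilt = lp_synthesise(residual, coeffs)
```

With an unmodified residual the filter reproduces the input up to rounding error, which is a useful sanity check before any modification is inserted between the two steps.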

The family of techniques known as sinusoidal models use this as their basic building block and perform speech modification by finding the sinusoidal components for a waveform and then altering the parameters of the above equation, namely the amplitudes, phases and frequencies. These models have some advantages over models such as TD-PSOLA in that they allow adjustments in the frequency domain. While frequency domain adjustments are possible in the linear prediction techniques, the sinusoidal techniques facilitate this with far fewer assumptions about the nature of the signal and in particular don't assume a source and all-pole filter model. [Pg.436]
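The equation the passage refers to is not reproduced in this excerpt. Assuming it has the usual sum-of-sinusoids form, in which each component contributes an amplitude times the cosine of its frequency term plus a phase, a frame could be generated from its parameters roughly as follows. The function name and argument names are hypothetical.

```python
import math


def synthesise_frame(amplitudes, frequencies_hz, phases, n_samples, sample_rate):
    """Sum of sinusoids: s[n] = sum_l A_l * cos(2*pi*f_l*n/fs + phi_l)."""
    frame = []
    for n in range(n_samples):
        t = n / sample_rate
        frame.append(sum(a * math.cos(2 * math.pi * f * t + p)
                         for a, f, p in zip(amplitudes, frequencies_hz, phases)))
    return frame


# Two harmonics of a 100 Hz fundamental, 10 ms at 8 kHz.
frame = synthesise_frame([1.0, 0.5], [100.0, 200.0], [0.0, 0.0], 80, 8000)
```

Modification then amounts to changing the three parameter lists before calling the synthesis function, which is what makes frequency domain adjustments direct in this family of models.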

In principle we could perform Fourier analysis to find the model parameters, but for reasons explained below, it is in fact advantageous to follow a different procedure that is geared towards our synthesis goals. For purposes of modifying pitch, it is useful to perform the analysis in a pitch synchronous manner and in fact one of the main advantages of sinusoidal modelling is that the accuracy of this does not have to be as high as that for PSOLA [420], [293], [520]. [Pg.436]

Given the parameters of the model, we can reconstruct a time domain waveform for each frame by use of the synthesis Equation 14.3. Figure 14.6 shows a real and resynthesised frame of speech. An entire waveform can be resynthesised by overlapping and adding the frames just as with the PSOLA method (in fact, the use of overlap-add techniques was first developed in conjunction with sinusoidal models). [Pg.438]
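Overlapping and adding frames can be sketched in a few lines. The example assumes Hann-windowed frames placed at a hop of half the frame length, a common choice that is not stated in the text; `hann` and `overlap_add` are hypothetical helper names.

```python
import math


def hann(n_points):
    """Periodic Hann window; copies shifted by n_points // 2 sum to one."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / n_points)
            for n in range(n_points)]


def overlap_add(frames, starts, out_len):
    """Add each frame into the output buffer at its start position."""
    out = [0.0] * out_len
    for frame, start in zip(frames, starts):
        for i, value in enumerate(frame):
            if 0 <= start + i < out_len:
                out[start + i] += value
    return out


# Windowed frames of a constant signal of ones, hop = half the frame length.
frames = [hann(8) for _ in range(4)]
rebuilt = overlap_add(frames, [0, 4, 8, 12], 20)
```

Away from the two ends, where only one window contributes, the overlapping windows sum to one, so an unmodified signal passes through the frame and overlap-add stages unchanged.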

The only real drawback of TD-PSOLA in terms of quality is that it is very sensitive to errors in epoch placements. In fact, it is safe to say that if the epochs are not marked with extremely high accuracy, then the speech quality from TD-PSOLA systems can sound very poor. The effect of inaccurate epochs is to make the synthetic speech sound hoarse, as if the speaker is straining their voice, or has an infection or some other ailment. This is not surprising, as it is known that the effect of hoarseness in natural speech arises because of irregular periodicity in the source. [Pg.441]

PSOLA operates in the time domain. It separates the original speech into frames pitch-synchronously and performs modification by overlapping and adding these frames onto a new set of epochs, created to match the synthesis specification. [Pg.446]
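The summary above can be turned into a small time domain sketch. It is an illustration under stated assumptions rather than a reference implementation: epochs are given as sample indices, each frame spans one pitch period either side of its epoch, a symmetric Hann window is used, and the synthesis epochs and the frame mapping are supplied by the caller. The name `td_psola` and its arguments are hypothetical.

```python
import math


def td_psola(signal, analysis_epochs, synthesis_epochs, frame_map):
    """Overlap-add Hann-windowed analysis frames onto synthesis epochs.

    frame_map[j] is the index of the analysis epoch whose frame is placed
    at synthesis_epochs[j]. Needs at least two analysis epochs.
    """
    out = [0.0] * len(signal)
    for target, m in zip(synthesis_epochs, frame_map):
        centre = analysis_epochs[m]
        # Frame half-width: distance to a neighbouring analysis epoch.
        if m + 1 < len(analysis_epochs):
            half = analysis_epochs[m + 1] - centre
        else:
            half = centre - analysis_epochs[m - 1]
        for k in range(-half, half + 1):
            src, dst = centre + k, target + k
            if 0 <= src < len(signal) and 0 <= dst < len(out):
                # Symmetric Hann window with its peak at the epoch.
                w = 0.5 + 0.5 * math.cos(math.pi * k / half)
                out[dst] += w * signal[src]
    return out


# Toy pulse train with a period of 100 samples, raised to a period of 80.
signal = [0.0] * 500
for e in (100, 200, 300, 400):
    signal[e] = 1.0
raised = td_psola(signal, [100, 200, 300, 400], [100, 180, 260, 340], [0, 1, 2, 2])
```

Passing the analysis epochs unchanged as synthesis epochs, with an identity frame map, returns the original samples between the first and last epoch when the period is constant, which makes a convenient check that the windowing is set up correctly.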

Residual excited linear prediction performs LP analysis, but uses the whole residual in resynthesis rather than an impulse. The residual is modified in a manner very similar to that of PSOLA. [Pg.446]

MBROLA is a PSOLA-like technique which uses sinusoidal modelling to decompose each frame and, from this, resynthesises the database at a constant pitch and phase, thus alleviating many of the problems caused by inaccurate epoch detection. [Pg.446]
