Parameter Modifications (Step 6)
The possibilities that STFT techniques offer for modifying the
analysis results before resynthesis have an enormous number of
musical applications. Quatieri and McAulay
[20] give a good
discussion of some useful modifications for speech applications. By
scaling and/or resampling the amplitude and the frequency
trajectories, a host of sound transformations can be accomplished.
Time-scale modifications can be accomplished by resampling the
amplitude, frequency, and phase trajectories. This can be done
simply by changing the hop size
in the resynthesis (although for
best results the hop size should change adaptively, avoiding
time-scale modifications during voice consonants or attacks, for
example). This has the effect of slowing down or speeding up the
sound while maintaining pitch and formant structure. Obviously this
can also be done for a time-varying modification by having a
time-varying hop size. However, due to the sinusoidal
representation, when a considerable time stretch is done in a
``noisy'' part of a sound, the individual sinewaves start to be heard
and the noise-like quality is lost.
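
As a rough illustration of time scaling by changing the resynthesis hop size, the following Python/NumPy sketch renders a single partial from per-frame amplitude and frequency values. It is a simplified sketch rather than the PARSHL implementation: the function names, the test trajectory, and the sample values are assumptions made for this example, and the measured phases are discarded in favor of integrating frequency.

```python
import numpy as np

def synthesize_partial(amps, freqs_hz, hop, fs):
    """Render one sinusoidal partial from per-frame amplitude/frequency values.

    `hop` is the number of output samples between consecutive frames.
    Assumption: measured phases are ignored; phase is the running
    integral of the interpolated frequency.
    """
    amps = np.asarray(amps, dtype=float)
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    frames = np.arange(len(amps))
    # Fractional frame position of every output sample.
    pos = np.arange((len(amps) - 1) * hop) / hop
    a = np.interp(pos, frames, amps)          # linear amplitude interpolation
    f = np.interp(pos, frames, freqs_hz)      # linear frequency interpolation
    phase = 2.0 * np.pi * np.cumsum(f) / fs   # integrate frequency into phase
    return a * np.sin(phase)

def peak_frequency(x, fs):
    """Frequency (Hz) of the largest FFT magnitude bin."""
    return np.argmax(np.abs(np.fft.rfft(x))) * fs / len(x)

fs = 8000
analysis_hop = 64
amps = np.linspace(1.0, 0.0, 11)   # hypothetical decaying partial, 11 frames
freqs = np.full(11, 440.0)         # constant 440 Hz frequency trajectory

original = synthesize_partial(amps, freqs, analysis_hop, fs)
stretched = synthesize_partial(amps, freqs, 2 * analysis_hop, fs)  # twice as slow

print(len(original), len(stretched))
print(peak_frequency(original, fs), peak_frequency(stretched, fs))
```

Doubling the hop doubles the duration while the frequency trajectory, and hence the pitch, is left alone. A time-varying stretch would use a different hop between each pair of frames.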
Frequency transformations, with or without time scaling, are also
possible. A simple one is to scale the frequencies to alter pitch and
formant structure together. A more powerful class of spectral
modifications comes about by decoupling the sinusoidal frequencies
(which convey pitch and inharmonicity information) from the spectral
envelope (which conveys formant structure so important to speech
perception and timbre). By measuring the formant envelope of a
harmonic spectrum (e.g., by drawing straight lines or splines across
the tops of the sinusoidal peaks in the spectrum and then smoothing),
modifications can be introduced which only alter the pitch or only
alter the formants. Other ways to measure formant envelopes include
cepstral smoothing [15] and the fitting of
low-order LPC models to the inverse FFT of the squared magnitude of
the spectrum [9]. By flattening the spectrum of one sound (dividing
out its formant envelope) and multiplying the result by the
formant envelope of a second sound, ``cross-synthesis'' is obtained.
Much more complex modifications are possible.
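
The cross-synthesis idea can be sketched for a single frame in Python/NumPy, using cepstral smoothing as the envelope estimator. This is a minimal sketch under stated assumptions: the function names, the number of cepstral coefficients, the FFT size, and the noise test signals are all hypothetical choices for illustration. Note also that low-order cepstral smoothing follows the average log spectrum rather than the tops of the peaks, so it is only one possible stand-in for a formant envelope.

```python
import numpy as np

def cepstral_envelope(mag, n_coeffs):
    """Smooth a two-sided magnitude spectrum by keeping only the
    low-quefrency part of its real cepstrum (cepstral smoothing)."""
    log_mag = np.log(np.maximum(mag, 1e-12))   # guard against log(0)
    cep = np.fft.ifft(log_mag).real
    lifter = np.zeros(len(cep))
    lifter[:n_coeffs] = 1.0
    if n_coeffs > 1:
        lifter[-(n_coeffs - 1):] = 1.0         # matching negative quefrencies
    return np.exp(np.fft.fft(cep * lifter).real)

def cross_synthesize(frame_a, frame_b, n_fft=1024, n_coeffs=30):
    """Return the spectrum of frame_a, flattened, then shaped by the
    smoothed envelope of frame_b."""
    A = np.fft.fft(frame_a * np.hanning(len(frame_a)), n_fft)
    B = np.fft.fft(frame_b * np.hanning(len(frame_b)), n_fft)
    env_a = cepstral_envelope(np.abs(A), n_coeffs)
    env_b = cepstral_envelope(np.abs(B), n_coeffs)
    flat_a = A / env_a        # divide out frame_a's own envelope
    return flat_a * env_b     # impose frame_b's envelope

# Hypothetical test frames: white noise and a lowpass-colored noise.
rng = np.random.default_rng(0)
excitation = rng.standard_normal(512)
formant_source = np.convolve(rng.standard_normal(512), [1.0, 0.9], mode="same")

Y = cross_synthesize(excitation, formant_source)
print(Y.shape)
```

Because the envelopes are real and positive, the phase of the first sound is preserved and only its magnitude is reshaped. Before inverse transforming and overlap-adding, the zero-padding considerations discussed in this section still apply.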
Not all spectral modifications are ``legal,'' however. As mentioned
earlier, multiplicative modifications (simple filtering,
equalization, etc.) are straightforward; we simply zero-pad
sufficiently to accommodate spreading in time due to convolution. It
is also possible to approximate nonlinear functions of the spectrum
in terms of polynomial expansions (which are purely multiplicative).
When using data-derived filters, such as measured formant envelopes,
it is a good idea to smooth the spectral envelopes sufficiently that
their inverse FFT is shorter in duration than the amount of
zero-padding provided. One way to monitor time-aliasing distortion is
to measure the signal energy at the midpoint of the inverse-FFT
output buffer, relative to the total energy in the buffer, just
before adding it to the final outgoing overlap-add reconstruction;
little relative energy in the ``maximum-positive'' and
``minimum-negative'' time regions indicates little time aliasing. The general
problem to avoid here is drastic spectral modifications that
correspond to long filters in the time domain for which insufficient
zero-padding has been provided. An inverse FFT of the spectral
modification function will show its time duration and indicate
zero-padding requirements. The general rule (worth remembering in
any audio filtering context) is ``be gentle in the frequency
domain.''
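
Both checks described above, the midpoint energy of the inverse-FFT buffer and the time duration of the spectral modification itself, can be sketched in Python/NumPy as follows. The function names, the -60 dB threshold, the width of the ``midpoint'' region, and the two example filters are assumptions made for this illustration, not values taken from the text.

```python
import numpy as np

def midpoint_energy_ratio(buffer, width=None):
    """Fraction of the buffer's energy in a region centered on its midpoint
    (the maximum-positive / minimum-negative time region)."""
    buffer = np.asarray(buffer, dtype=float)
    n = len(buffer)
    if width is None:
        width = max(1, n // 8)   # assumed size of the midpoint region
    start = n // 2 - width // 2
    total = np.sum(buffer ** 2)
    if total == 0.0:
        return 0.0
    return float(np.sum(buffer[start:start + width] ** 2) / total)

def impulse_response_extent(H, threshold_db=-60.0):
    """Time span (samples) over which the inverse FFT of H stays within
    threshold_db of its peak, treating the upper half as negative time."""
    h = np.abs(np.fft.ifft(H))
    idx = np.nonzero(h > h.max() * 10.0 ** (threshold_db / 20.0))[0]
    n = len(H)
    t = np.where(idx < n // 2, idx, idx - n)
    return int(t.max() - t.min() + 1)

n_fft, frame_len = 1024, 257
half = frame_len // 2
frame = np.hanning(frame_len) * np.cos(2 * np.pi * 0.05 * np.arange(frame_len))

# Zero-phase zero-padding: frame center at index 0, zeros in the middle.
buf = np.zeros(n_fft)
buf[:half + 1] = frame[half:]
buf[-half:] = frame[:half]
X = np.fft.fft(buf)

f = np.fft.fftfreq(n_fft)                    # frequency in cycles/sample
filters = {
    "gentle": np.exp(-0.5 * (f / 0.1) ** 2),       # smooth Gaussian lowpass
    "drastic": (np.abs(f) < 0.05).astype(float),   # brick-wall lowpass
}

results = {}
for name, H in filters.items():
    y = np.fft.ifft(X * H).real              # filtered frame, before overlap-add
    results[name] = (impulse_response_extent(H), midpoint_energy_ratio(y))
    print(name, results[name])
```

In this sketch the available zero-padding is `n_fft - frame_len` samples. A modification whose impulse-response extent exceeds that amount should be expected to time-alias, which shows up as non-negligible energy near the buffer midpoint.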