In this paper, an analysis/synthesis technique based on a sinusoidal
representation was presented that has proven well suited to signals
accurately characterized as a sum of inharmonic sinusoids with slowly
varying amplitudes and frequencies. Previously used harmonic vocoder
techniques have been relatively unwieldy in the inharmonic case, and
less robust even in the harmonic case. PARSHL obtains the sinusoidal
representation of the input sound by tracking the amplitude,
frequency, and phase of the most prominent peaks in a series of
spectra computed via the Fast Fourier Transform (FFT) of successive,
overlapping, windowed data frames taken over the duration of the
sound. We have mentioned some of the musical applications of this
sinusoidal representation.
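The analysis step just summarized can be sketched as follows. This is a minimal modern illustration in Python/NumPy, not the original implementation (which ran on dedicated hardware); the function name, parameter values, and the simple local-maximum peak test are all illustrative, and the frame-to-frame trajectory matching that PARSHL performs is omitted:

```python
import numpy as np

def spectral_peaks(x, fs, frame_size=1024, hop=256, n_peaks=10):
    """Return, for each overlapping windowed frame of x, a list of
    (time_sec, freq_hz, magnitude) triples for the strongest spectral peaks."""
    window = np.hamming(frame_size)
    peaks = []
    for start in range(0, len(x) - frame_size + 1, hop):
        frame = x[start:start + frame_size] * window
        spectrum = np.abs(np.fft.rfft(frame))
        # A bin is a peak candidate if its magnitude exceeds both neighbors.
        is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
        bins = np.flatnonzero(is_peak) + 1
        # Keep only the n_peaks largest candidates in this frame.
        bins = bins[np.argsort(spectrum[bins])[::-1][:n_peaks]]
        t = start / fs
        peaks.append([(t, b * fs / frame_size, spectrum[b]) for b in sorted(bins)])
    return peaks
```

A full analyzer would refine each peak frequency by interpolating around the maximum bin and then link peaks across frames into amplitude/frequency trajectories; both refinements are left out here for brevity.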
Continuing the work with this analysis/synthesis technique, we are
implementing PARSHL on a Lisp Machine with an attached FPS AP120B
array processor. We plan to study further its sound-transformation
possibilities and its use in conjunction with other analysis/synthesis
techniques such as Linear Predictive Coding (LPC) [10].
The basic ``FFT processor'' at the heart of PARSHL provides a ready
point of departure for many other STFT applications such as FIR
filtering, speech coding, noise reduction, adaptive equalization,
cross-synthesis, and many more. The basic parameter trade-offs
discussed in this paper are universal across all of these
applications.
Although PARSHL was designed to analyze piano recordings, it has
proven very successful in extracting additive synthesis parameters
for radically inharmonic sounds. It produces interesting effects when
made to extract peak trajectories from signals that are not well
described as sums of sinusoids (such as noise or ocean recordings).
PARSHL has even demonstrated that speech can remain intelligible after
being reduced to only its three strongest sinusoidal components.
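Resynthesis from such a reduced peak set amounts to summing a small bank of sinusoidal oscillators whose amplitudes and frequencies are interpolated from frame to frame. The sketch below (Python/NumPy, illustrative names and conventions, not the paper's implementation) pairs peaks between frames by index for brevity; a real tracker would match them by frequency proximity:

```python
import numpy as np

def resynthesize(frames, fs, hop=256):
    """Additively resynthesize a signal from per-frame peak lists.
    frames[i] is a list of (freq_hz, amplitude) pairs for frame i;
    frequency and amplitude are linearly interpolated across each hop."""
    n_osc = max(len(f) for f in frames)
    phase = np.zeros(n_osc)          # running phase of each oscillator
    out = np.zeros(len(frames) * hop)
    for i in range(len(frames) - 1):
        cur, nxt = frames[i], frames[i + 1]
        for k in range(min(len(cur), len(nxt))):
            f0, a0 = cur[k]
            f1, a1 = nxt[k]
            t = np.arange(hop)
            freq = f0 + (f1 - f0) * t / hop      # interpolated frequency
            amp = a0 + (a1 - a0) * t / hop       # interpolated amplitude
            ph = phase[k] + 2 * np.pi * np.cumsum(freq) / fs
            out[i * hop:(i + 1) * hop] += amp * np.sin(ph)
            phase[k] = ph[-1] % (2 * np.pi)      # keep phase continuous
    return out
```

Keeping a running phase per oscillator avoids clicks at frame boundaries; restricting the bank to the three strongest peaks per frame reproduces the kind of extreme data reduction described above.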
The surprising success of additive synthesis from spectral peaks
suggests a close connection with audio perception. Perhaps timbre
perception is based on data reduction in the brain similar to that
carried out by PARSHL. This data reduction goes beyond what is
provided by critical-band masking. Perhaps a higher-level theory of
``timbral masking'' or ``main feature dominance'' is appropriate,
wherein the principal spectral features serve to define the timbre,
masking lower-level (though unmasked) structure. The lower-level
features would have to be restricted to qualitatively similar
behavior in order that they be ``implied'' by the louder features.
Another point of view is that the spectral peaks are analogous to the
outlines of figures in a picture--they capture enough of the
perceptual cues to trigger the proper percept; memory itself may then
serve to fill in the implied spectral features (at least for a time).
Techniques such as PARSHL provide a powerful analysis tool for
extracting signal parameters matched to the characteristics of
hearing. Such an approach is perhaps the best single way to obtain
cost-effective, analysis-based synthesis of any sound.