In this paper, an analysis/synthesis technique based on a sinusoidal
representation was presented that has proven well suited to signals
accurately characterized as a sum of inharmonic sinusoids with slowly
varying amplitudes and frequencies. Previously used harmonic vocoder
techniques have been relatively unwieldy in the inharmonic case, and
less robust even in the harmonic case. PARSHL obtains the sinusoidal
representation of the input sound by tracking the amplitude,
frequency, and phase of the most prominent peaks in a series of
spectra computed via the Fast Fourier Transform (FFT) of successive,
overlapping, windowed data frames taken over the duration of the
sound. We have mentioned some of the musical applications of this
sinusoidal representation.
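The analysis step just summarized can be sketched as follows. This is a minimal modern illustration in Python/NumPy, not the original implementation (which ran on dedicated hardware); the function name, parameter values, and the simple local-maximum peak test are all illustrative, and the frame-to-frame trajectory matching that PARSHL performs is omitted:

```python
import numpy as np

def spectral_peaks(x, fs, frame_size=1024, hop=256, n_peaks=10):
    """Return, for each overlapping windowed frame of x, a list of
    (time_sec, freq_hz, magnitude) triples for the strongest spectral peaks."""
    window = np.hamming(frame_size)
    peaks = []
    for start in range(0, len(x) - frame_size + 1, hop):
        frame = x[start:start + frame_size] * window
        spectrum = np.abs(np.fft.rfft(frame))
        # A bin is a peak candidate if its magnitude exceeds both neighbors.
        is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] > spectrum[2:])
        bins = np.flatnonzero(is_peak) + 1
        # Keep only the n_peaks largest candidates in this frame.
        bins = bins[np.argsort(spectrum[bins])[::-1][:n_peaks]]
        t = start / fs
        peaks.append([(t, b * fs / frame_size, spectrum[b]) for b in sorted(bins)])
    return peaks
```

A full analyzer would refine each peak frequency by interpolating around the maximum bin and then link peaks across frames into amplitude/frequency trajectories; both refinements are left out here for brevity.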
Continuing the work with this analysis/synthesis technique, we are
implementing PARSHL on a Lisp Machine with an attached FPS AP120B
array processor. We plan to study further its sound-transformation
possibilities and its use in conjunction with other analysis/synthesis
techniques such as Linear Predictive Coding (LPC) [10].
The basic ``FFT processor'' at the heart of PARSHL provides a ready
point of departure for many other STFT applications such as FIR
filtering, speech coding, noise reduction, adaptive equalization,
cross-synthesis, and many more. The basic parameter trade-offs
discussed in this paper are universal across all of these
applications.
Although PARSHL was designed to analyze piano recordings, it has
proven very successful in extracting additive synthesis parameters
for radically inharmonic sounds. It produces interesting effects when
made to extract peak trajectories from signals that are not well
described as sums of sinusoids (such as noise or ocean recordings).
PARSHL has even demonstrated that speech can remain intelligible after
being reduced to only its three strongest sinusoidal components.
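Resynthesis from such a reduced peak set amounts to summing a small bank of sinusoidal oscillators whose amplitudes and frequencies are interpolated from frame to frame. The sketch below (Python/NumPy, illustrative names and conventions, not the paper's implementation) pairs peaks between frames by index for brevity; a real tracker would match them by frequency proximity:

```python
import numpy as np

def resynthesize(frames, fs, hop=256):
    """Additively resynthesize a signal from per-frame peak lists.
    frames[i] is a list of (freq_hz, amplitude) pairs for frame i;
    frequency and amplitude are linearly interpolated across each hop."""
    n_osc = max(len(f) for f in frames)
    phase = np.zeros(n_osc)          # running phase of each oscillator
    out = np.zeros(len(frames) * hop)
    for i in range(len(frames) - 1):
        cur, nxt = frames[i], frames[i + 1]
        for k in range(min(len(cur), len(nxt))):
            f0, a0 = cur[k]
            f1, a1 = nxt[k]
            t = np.arange(hop)
            freq = f0 + (f1 - f0) * t / hop      # interpolated frequency
            amp = a0 + (a1 - a0) * t / hop       # interpolated amplitude
            ph = phase[k] + 2 * np.pi * np.cumsum(freq) / fs
            out[i * hop:(i + 1) * hop] += amp * np.sin(ph)
            phase[k] = ph[-1] % (2 * np.pi)      # keep phase continuous
    return out
```

Keeping a running phase per oscillator avoids clicks at frame boundaries; restricting the bank to the three strongest peaks per frame reproduces the kind of extreme data reduction described above.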
The surprising success of additive synthesis from spectral peaks
suggests a close connection with audio perception. Perhaps timbre
perception is based on data reduction in the brain similar to that
carried out by PARSHL. This data reduction goes beyond what is
provided by critical-band masking. Perhaps a higher-level theory of
``timbral masking'' or ``main feature dominance'' is appropriate,
wherein the principal spectral features serve to define the timbre,
masking lower-level (though unmasked) structure. The lower-level
features would have to be restricted to qualitatively similar
behavior in order that they be ``implied'' by the louder features.
Another point of view is that the spectral peaks are analogous to the
outlines of figures in a picture--they capture enough of the
perceptual cues to trigger the proper percept; memory itself may then
serve to fill in the implied spectral features (at least for a time).
Techniques such as PARSHL provide a powerful analysis tool for
extracting signal parameters matched to the characteristics of
hearing. Such an approach is perhaps the best single way to obtain
cost-effective, analysis-based synthesis of any sound.