Parameter Modifications (Step 6)




The possibilities that STFT techniques offer for modifying the analysis results before resynthesis have an enormous number of musical applications. Quatieri and McAulay [20] give a good discussion of some useful modifications for speech applications. By scaling and/or resampling the amplitude and the frequency trajectories, a host of sound transformations can be accomplished.

Time-scale modifications can be accomplished by resampling the amplitude, frequency, and phase trajectories. This can be done simply by changing the hop size in the resynthesis (although for best results the hop size should change adaptively, avoiding time-scale modifications during voice consonants or attacks, for example). This has the effect of slowing down or speeding up the sound while maintaining pitch and formant structure. A time-varying modification can be obtained in the same way by using a time-varying hop size. However, due to the sinusoidal representation, when a considerable time stretch is done in a ``noisy'' part of a sound, the individual sinewaves start to be heard and the noise-like quality is lost.
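As a rough illustration of time scaling by changing the resynthesis hop size, the following sketch resynthesizes a single partial from per-frame amplitude and frequency values. It is not the PARSHL implementation: the function name `synthesize_partial`, the linear interpolation between frames, and the choice to ignore measured phase and simply integrate frequency are all simplifying assumptions made here.

```python
import math

def synthesize_partial(amps, freqs_hz, hop, sample_rate):
    """Resynthesize one partial from per-frame amplitude and frequency.

    Hypothetical helper: amplitude and frequency are linearly interpolated
    across each hop, and phase is the running integral of frequency
    (measured phases are ignored in this simplified sketch).
    """
    out = []
    phase = 0.0
    for k in range(len(amps) - 1):
        for n in range(hop):
            t = n / hop  # position within the current hop, 0 <= t < 1
            a = (1 - t) * amps[k] + t * amps[k + 1]
            f = (1 - t) * freqs_hz[k] + t * freqs_hz[k + 1]
            out.append(a * math.sin(phase))
            phase += 2 * math.pi * f / sample_rate
    return out

# Same analysis frames, two different synthesis hop sizes.
amps = [1.0, 0.8, 0.6, 0.4, 0.2]
freqs = [440.0] * 5
original = synthesize_partial(amps, freqs, hop=64, sample_rate=8000)
stretched = synthesize_partial(amps, freqs, hop=128, sample_rate=8000)
print(len(original), len(stretched))
```

Doubling the synthesis hop doubles the output duration, while the frequency of the partial, and hence the pitch, is unchanged because the per-sample phase increment depends only on the frame frequencies.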

Frequency transformations, with or without time scaling, are also possible. A simple one is to scale the frequencies to alter pitch and formant structure together. A more powerful class of spectral modifications comes about by decoupling the sinusoidal frequencies (which convey pitch and inharmonicity information) from the spectral envelope (which conveys formant structure so important to speech perception and timbre). By measuring the formant envelope of a harmonic spectrum (e.g., by drawing straight lines or splines across the tops of the sinusoidal peaks in the spectrum and then smoothing), modifications can be introduced which only alter the pitch or only alter the formants. Other ways to measure formant envelopes include cepstral smoothing [15] and the fitting of low-order LPC models to the inverse FFT of the squared magnitude of the spectrum [9]. By modulating the flattened (by dividing out the formant envelope) spectrum of one sound by the formant-envelope of a second sound, ``cross-synthesis'' is obtained. Much more complex modifications are possible.
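The decoupling of partial frequencies from the spectral envelope can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in PARSHL: the envelope is a plain piecewise-linear curve through the peaks with no smoothing, each sound is represented by a list of `(frequency, amplitude)` peaks for a single frame, and the names `envelope`, `shift_pitch_keep_formants`, and `cross_synthesize` are hypothetical.

```python
from bisect import bisect_right

def envelope(peaks):
    """Return a piecewise-linear envelope function through (freq, amp) peaks."""
    peaks = sorted(peaks)
    freqs = [f for f, _ in peaks]
    amps = [a for _, a in peaks]

    def env(f):
        # Hold the end values outside the measured frequency range.
        if f <= freqs[0]:
            return amps[0]
        if f >= freqs[-1]:
            return amps[-1]
        i = bisect_right(freqs, f)  # freqs[i-1] <= f < freqs[i]
        t = (f - freqs[i - 1]) / (freqs[i] - freqs[i - 1])
        return (1 - t) * amps[i - 1] + t * amps[i]

    return env

def shift_pitch_keep_formants(peaks, ratio):
    """Scale partial frequencies by `ratio` but re-read amplitudes from the
    original envelope, so the formant structure stays where it was."""
    env = envelope(peaks)
    return [(f * ratio, (a / env(f)) * env(f * ratio)) for f, a in peaks]

def cross_synthesize(peaks_a, peaks_b):
    """Flatten sound A by its own envelope, then impose the envelope of B."""
    env_a = envelope(peaks_a)
    env_b = envelope(peaks_b)
    return [(f, (a / env_a(f)) * env_b(f)) for f, a in peaks_a]

voice = [(100.0, 1.0), (200.0, 0.5), (300.0, 0.25)]
other = [(100.0, 0.2), (300.0, 0.6)]
print(shift_pitch_keep_formants(voice, 1.5))
print(cross_synthesize(voice, other))
```

Because the unsmoothed envelope passes exactly through the peaks, the flattened amplitudes here are all 1; with a smoothed envelope (as the text recommends) the flattened spectrum would retain the peak-to-peak detail that the envelope leaves out.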

Not all spectral modifications are ``legal,'' however. As mentioned earlier, multiplicative modifications (simple filtering, equalization, etc.) are straightforward; we simply zero-pad sufficiently to accommodate spreading in time due to convolution. It is also possible to approximate nonlinear functions of the spectrum in terms of polynomial expansions (which are purely multiplicative). When using data-derived filters, such as measured formant envelopes, it is a good idea to smooth the spectral envelopes sufficiently that their inverse FFT is shorter in duration than the amount of zero-padding provided. One way to monitor time-aliasing distortion is to measure the signal energy at the midpoint of the inverse-FFT output buffer, relative to the total energy in the buffer, just before adding it to the final outgoing overlap-add reconstruction; little relative energy in the ``maximum-positive'' and ``minimum-negative'' time regions indicates little time aliasing. The general problem to avoid here is drastic spectral modifications which correspond to long filters in the time domain for which insufficient zero-padding has been provided. An inverse FFT of the spectral modification function will show its time duration and indicate zero-padding requirements. The general rule (worth remembering in any audio filtering context) is ``be gentle in the frequency domain.''
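The two checks described above can be sketched in a few lines. The helper names `min_fft_size` and `midpoint_energy_ratio`, the power-of-two FFT size, and the `width` parameter (the fraction of the buffer treated as the "midpoint" region) are assumptions made for this illustration. The buffer is assumed to use the usual zero-phase layout, with time zero at index 0 and negative times wrapped to the end, so that the middle of the buffer holds the largest positive and most negative times.

```python
def min_fft_size(frame_len, filter_len):
    """Smallest power-of-two FFT size that holds the linear convolution of a
    frame of `frame_len` samples with a filter of `filter_len` samples."""
    needed = frame_len + filter_len - 1
    size = 1
    while size < needed:
        size *= 2
    return size

def midpoint_energy_ratio(buffer, width=0.1):
    """Energy near the middle of an inverse-FFT output buffer relative to the
    total energy; a small value suggests little time aliasing."""
    n = len(buffer)
    half = max(1, int(n * width / 2))
    mid = n // 2
    central = sum(x * x for x in buffer[mid - half:mid + half])
    total = sum(x * x for x in buffer)
    return central / total if total > 0 else 0.0

# A response concentrated near time zero (both ends of the buffer).
well_padded = [1.0, 0.5] + [0.0] * 13 + [0.5]
# A response that has spread all the way to the middle of the buffer.
aliased = [1.0, 0.5] + [0.0] * 6 + [0.4] + [0.0] * 6 + [0.5]
print(midpoint_energy_ratio(well_padded), midpoint_energy_ratio(aliased))
print(min_fft_size(1000, 25))
```

In practice the ratio would be compared against a small threshold chosen by listening or by experiment; the source does not specify one.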


``PARSHL: An Analysis/Synthesis Program for Non-Harmonic Sounds Based on a Sinusoidal Representation'', by Julius O. Smith III and Xavier Serra, Proceedings of the International Computer Music Conference (ICMC-87, Tokyo), Computer Music Association, 1987.
Copyright © 2005-12-28 by Julius O. Smith III and Xavier Serra
Center for Computer Research in Music and Acoustics (CCRMA), Stanford University