Sinusoidal Modeling Systems

With the phase vocoder, the instantaneous amplitude and frequency are normally computed only for each ``channel filter''. A consequence of using a fixed-frequency filter bank is that the frequency of each sinusoid is not normally allowed to vary outside the bandwidth of its channel bandpass filter, unless one is willing to combine channel signals in some fashion, which requires extra work. Ordinarily, the bandpass center frequencies are harmonically spaced, i.e., they are integer multiples of a base frequency. So, for example, when analyzing a piano tone, the intrinsic progressive sharpening of its overtones leads to some sinusoids falling ``in the cracks'' between adjacent filter channels. This is not an insurmountable problem, since adjacent channels can be combined in a straightforward manner to provide accurate amplitude and frequency envelopes, but it is inconvenient and outside the original scope of the phase vocoder, which, recall, was developed originally for speech. (Speech is fundamentally periodic, ignoring ``jitter'', when voiced at a constant pitch.)

Moreover, it is relatively unwieldy to work with the instantaneous amplitude and frequency signals from all of the filter-bank channels. For these reasons, the phase vocoder has largely been replaced by sinusoidal modeling in the context of analysis for additive synthesis of inharmonic sounds, except in constrained computational environments (such as real-time systems).

In sinusoidal modeling, the fixed, uniform filter bank of the vocoder is replaced by a sparse, peak-adaptive filter bank, implemented by following magnitude peaks in a sequence of FFTs. The efficiency of the split-radix Cooley-Tukey FFT makes it computationally feasible to implement an enormous number of bandpass filters in a fine-grained analysis filter bank, from which the sparse, adaptive analysis filter bank is derived. An early paper in this area is included as Appendix H.
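To make the piano example concrete, the following minimal sketch estimates how far each partial drifts from the center of its harmonically spaced channel. It assumes the common stiff-string approximation f_n = n f0 sqrt(1 + B n^2), which is not stated in the text above. The inharmonicity coefficient `B = 4e-4`, the base frequency of 110 Hz, and the function names are all hypothetical values chosen for illustration only.

```python
import math

def partial_frequency(n, f0, B):
    """Frequency (Hz) of partial n under the assumed stiff-string model."""
    return n * f0 * math.sqrt(1.0 + B * n * n)

def channel_offsets(B, num_partials):
    """Offset of each partial from its channel center n*f0, in channel spacings.

    An offset of 0.5 means the partial sits exactly on the boundary between
    channel n and channel n+1 of a harmonically spaced filter bank.
    The offset does not depend on f0, only on B and n.
    """
    return [n * (math.sqrt(1.0 + B * n * n) - 1.0)
            for n in range(1, num_partials + 1)]

def first_partial_in_crack(B, num_partials=64, boundary=0.5):
    """First partial number whose offset reaches the channel boundary, or None."""
    for n, offset in enumerate(channel_offsets(B, num_partials), start=1):
        if offset >= boundary:
            return n
    return None

if __name__ == "__main__":
    f0, B = 110.0, 4e-4  # hypothetical base frequency and inharmonicity
    for n, offset in enumerate(channel_offsets(B, 16), start=1):
        print(f"partial {n:2d}: {partial_frequency(n, f0, B):8.2f} Hz, "
              f"offset {offset:+.3f} channels")
    print("first partial at a channel boundary:", first_partial_in_crack(B))
```

Under this assumed model the offset grows roughly as the cube of the partial number, so the low partials stay well inside their channels while the higher ones approach and then cross the boundary with the next channel.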

Thus, modern sinusoidal models can be regarded as ``pruned phase vocoders'' in that they follow only the peaks of the short-time spectrum rather than the instantaneous amplitude and frequency from every channel of a uniform filter bank. Peak-tracking in a sliding short-time Fourier transform has a long history going back at least to 1957 [210,281]. Sinusoidal modeling based on the STFT of speech was introduced by Quatieri and McAulay [221,169,222,174,191,223]. STFT sinusoidal modeling in computer music began with the development of a pruned phase vocoder for piano tones [271,246] (processing details included in Appendix H).
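The idea of following only the spectral peaks can be sketched in a few lines of standard-library Python. This is an illustrative toy, not the method of any of the cited systems: the Hann window, the parabolic interpolation of dB magnitudes, the `floor_db` threshold, and the greedy nearest-frequency linking with `max_jump_hz` are all assumptions made here, and the function names are hypothetical.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + t, even[k] - t
    return out

def frame_peaks(frame, sample_rate, floor_db=-25.0):
    """Return (frequency_hz, level_db) for each magnitude peak in one frame."""
    n = len(frame)
    if n < 4 or n & (n - 1):
        raise ValueError("frame length must be a power of two >= 4")
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]
    spectrum = fft([w * s for w, s in zip(window, frame)])
    mag_db = [20 * math.log10(abs(c) + 1e-12) for c in spectrum[: n // 2 + 1]]
    threshold = max(mag_db) + floor_db  # ignore peaks far below the strongest
    peaks = []
    for k in range(1, n // 2):
        a, b, c = mag_db[k - 1], mag_db[k], mag_db[k + 1]
        if b > a and b >= c and b > threshold:
            # Fit a parabola through the three dB samples around the maximum.
            p = 0.5 * (a - c) / (a - 2 * b + c)  # bin offset in [-0.5, 0.5]
            peaks.append(((k + p) * sample_rate / n, b - 0.25 * (a - c) * p))
    return peaks

def track_peaks(frames, sample_rate, max_jump_hz=30.0):
    """Greedily link each peak to the nearest track alive in the previous frame."""
    tracks = []  # each track is a list of (frame_index, frequency_hz, level_db)
    for i, frame in enumerate(frames):
        active = [t for t in tracks if t[-1][0] == i - 1]
        for freq, level in frame_peaks(frame, sample_rate):
            best = min(active, key=lambda t: abs(t[-1][1] - freq), default=None)
            if best is not None and abs(best[-1][1] - freq) <= max_jump_hz:
                best.append((i, freq, level))
                active = [t for t in active if t is not best]
            else:
                tracks.append([(i, freq, level)])  # start a new track
    return tracks

if __name__ == "__main__":
    sample_rate, n, hop = 8000, 512, 256
    signal = [math.sin(2 * math.pi * 440.0 * t / sample_rate)
              + 0.5 * math.sin(2 * math.pi * 1000.5 * t / sample_rate)
              for t in range(n + 3 * hop)]
    frames = [signal[s:s + n] for s in range(0, len(signal) - n + 1, hop)]
    for track in track_peaks(frames, sample_rate):
        mean_hz = sum(f for _, f, _ in track) / len(track)
        print(f"track of {len(track)} frames near {mean_hz:.1f} Hz")
```

Each track is then a sparse amplitude and frequency envelope for one sinusoid, in contrast to the phase vocoder, which carries such envelopes for every channel whether or not a partial is present. A practical system would also need to handle track births and deaths, noise peaks, and closely spaced partials, none of which this sketch attempts.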

``Spectral Audio Signal Processing'', by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1. Copyright © 2022-02-28 by Julius O. Smith III Center for Computer Research in Music and Acoustics (CCRMA), Stanford University
