Vocoder-Based Additive-Synthesis Limitations

Using the phase vocoder to compute amplitude and frequency envelopes for additive synthesis works best for quasi-periodic signals. For inharmonic signals, the vocoder analysis method can be unwieldy: the restriction of one sinusoid per subband leads to many "empty" bands, since radix-2 FFT filter banks are always uniformly spaced. As a result, we have to compute many more filter bands than are actually needed, and the empty bands need to be "pruned" in some way (e.g., based on an energy detector within each band). The unwieldiness of a uniform filter bank for tracking inharmonic partial overtones through time led to the development of sinusoidal modeling based on the STFT, as described in §G.11.2 below.
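The energy-detector pruning mentioned above can be sketched as follows. This is a minimal illustration, not a method taken from the text: the function name `prune_empty_bands`, the bands-by-frames array layout, and the relative threshold of -60 dB are all assumptions chosen for the example.

```python
import numpy as np

def prune_empty_bands(band_signals, rel_threshold_db=-60.0):
    """Return indices of bands that carry significant energy.

    band_signals: array of shape (num_bands, num_frames), real or complex
                  (hypothetical layout: one row per filter-bank channel).
    rel_threshold_db: bands whose mean energy falls more than this many dB
                  below the strongest band are treated as "empty".
    """
    band_signals = np.asarray(band_signals)
    # Simple energy detector: mean squared magnitude in each band
    energy = np.mean(np.abs(band_signals) ** 2, axis=1)
    peak = energy.max()
    if peak == 0.0:
        # Silent input: every band is empty
        return np.array([], dtype=int)
    # Convert the dB threshold to a linear energy ratio
    threshold = peak * 10.0 ** (rel_threshold_db / 10.0)
    return np.flatnonzero(energy >= threshold)
```

In practice the threshold would need tuning, since window sidelobes leak some energy from each partial into neighboring bands.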

Another limitation of phase-vocoder analysis was that it did not capture the attack transient very well in the computed amplitude and frequency envelopes. This is because an attack transient typically fills only part of an STFT analysis window. Moreover, filter-bank amplitude and frequency envelopes provide an inefficient model for noise-like signals, such as a flute with a breathy attack. These limitations are addressed by sinusoidal modeling, sines+noise modeling, and sines+noise+transients modeling, as discussed starting in §10.4 below.
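The smearing of an attack can be seen in a small experiment. The sketch below assumes a Hann window, a single channel tuned exactly to the sinusoid's frequency, and an abrupt onset; the function name `channel_amplitude` and all parameter values are illustrative choices, not taken from the text.

```python
import numpy as np

def channel_amplitude(x, freq_hz, fs, win_len, hop):
    """Amplitude envelope of one STFT channel centered at freq_hz."""
    x = np.asarray(x, dtype=float)
    m = np.arange(win_len)
    w = np.hanning(win_len)
    # Windowed complex sinusoid: one bin of a DFT evaluated at freq_hz
    kernel = w * np.exp(-2j * np.pi * freq_hz * m / fs)
    starts = range(0, len(x) - win_len + 1, hop)
    mags = np.array([np.abs(np.dot(x[s:s + win_len], kernel)) for s in starts])
    # Scale so that a steady unit-amplitude sinusoid reads close to 1
    return 2.0 * mags / w.sum()

# Illustrative signal: silence, then a unit-amplitude sinusoid with an abrupt onset
fs = 8000.0
freq_hz = 1000.0
onset = 512
n = np.arange(1536)
x = np.where(n >= onset, np.cos(2 * np.pi * freq_hz * (n - onset) / fs), 0.0)

env = channel_amplitude(x, freq_hz, fs, win_len=256, hop=64)
```

Frames that straddle the onset report an intermediate amplitude, so the measured envelope ramps up over roughly one window length even though the true attack is instantaneous.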

The phase vocoder was not typically implemented as an identity system, mainly because of the large data reduction applied to the envelopes (piecewise-linear approximation). However, it could be used as an identity system by keeping the envelopes at the full signal sampling rate and retaining the initial phase information for each channel. Instantaneous phase is then reconstructed as the initial phase plus the time-integral of the instantaneous frequency (given by the frequency envelope).
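The phase reconstruction described above can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions: the envelopes are given at the full sampling rate, frequency is in Hz, the time-integral is approximated by a running sum, and a cosine carrier is used. The function name `synthesize_channel` is hypothetical.

```python
import numpy as np

def synthesize_channel(amp_env, freq_env_hz, initial_phase, fs):
    """Resynthesize one vocoder channel from full-rate envelopes.

    amp_env, freq_env_hz: arrays with one value per output sample.
    initial_phase: channel phase (radians) at sample 0.
    fs: sampling rate in Hz.
    """
    amp_env = np.asarray(amp_env, dtype=float)
    freq_env_hz = np.asarray(freq_env_hz, dtype=float)
    # Instantaneous frequency in radians per sample
    omega = 2.0 * np.pi * freq_env_hz / fs
    # Running sum approximates the time-integral of frequency;
    # phase[0] equals the retained initial phase.
    phase = initial_phase + np.concatenate(([0.0], np.cumsum(omega[1:])))
    return amp_env * np.cos(phase)
```

The full additive-synthesis output would be the sum of such channel signals over all retained channels.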

"Spectral Audio Signal Processing", by Julius O. Smith III, W3K Publishing, 2011, ISBN 978-0-9745607-3-1. Copyright © 2022-02-28 by Julius O. Smith III, Center for Computer Research in Music and Acoustics (CCRMA), Stanford University.
