Pitch perception is one of the many fascinating mysteries of hearing. Starting with a basic definition of pitch, we present a progression of puzzling phenomena. We then show how a model of pitch perception, built on our HERO neural architecture, resolves the apparent paradoxes.
The pitch of a pure tone is simply its frequency. Here are 3 pure tones, G2(.wav) at 199.3 Hz, Ab2(.wav) at 211.1 Hz, and F#2(.wav) at 188.1 Hz, with Ab2 and F#2 each a semitone (about 6%) away from G2.
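The 6% figure can be checked with a little arithmetic. This is a minimal sketch that assumes equal temperament, where a semitone is a frequency ratio of 2^(1/12); the text does not say how the demo tones were tuned, and `shift_semitones` is a hypothetical helper name.

```python
SEMITONE = 2 ** (1 / 12)  # equal-tempered semitone ratio, about 1.0595 (assumed tuning)


def shift_semitones(freq_hz: float, n: int) -> float:
    """Return freq_hz shifted by n equal-tempered semitones (n may be negative)."""
    return freq_hz * SEMITONE ** n


g2 = 199.3                      # frequency quoted in the text for the tone labelled G2
ab2 = shift_semitones(g2, +1)   # one semitone up
fs2 = shift_semitones(g2, -1)   # one semitone down

print(f"F#2 = {fs2:.2f} Hz, G2 = {g2:.2f} Hz, Ab2 = {ab2:.2f} Hz")
print(f"semitone step = {100 * (SEMITONE - 1):.2f}%")
```

Under this assumption the computed neighbours land within a tenth of a hertz of the 211.1 Hz and 188.1 Hz values quoted above.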
A harmonic tone consists of a sum of pure tones (each one called a partial), whose frequencies are in integer ratios of 1, 2, 3, ... Partials related in this way are called the harmonics of the tone. The frequency of the first harmonic is the fundamental frequency. Here is a harmonic tone with a fundamental frequency equal to the pure G2's frequency(.wav).
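As an illustration of this definition, a harmonic tone can be sketched as a sum of sine partials at integer multiples of the fundamental. The number of harmonics (10), the equal amplitudes, the sample rate and the function name `harmonic_tone` are all assumptions made for the example; they are not the settings used for the demo files.

```python
import math


def harmonic_tone(f0, harmonics, duration=0.5, sample_rate=16000):
    """Sum equal-amplitude sine partials at f0 * h for each h in harmonics.

    The sum is scaled by 1 / len(harmonics) so samples stay within [-1, 1].
    """
    n_samples = int(duration * sample_rate)
    scale = 1.0 / len(harmonics)
    return [
        scale * sum(math.sin(2 * math.pi * f0 * h * t / sample_rate) for h in harmonics)
        for t in range(n_samples)
    ]


# Harmonics 1..10 of the 199.3 Hz fundamental quoted in the text.
full = harmonic_tone(199.3, range(1, 11))
print(len(full), "samples")
```

Passing `range(2, 11)` instead of `range(1, 11)` gives the same tone with its first harmonic left out.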
Now we juxtapose the pure G2 with the harmonic tone(.wav). They have the same pitch, but sound very different. The harmonic tone is lower than pure Ab2(.wav), and higher than pure F#2(.wav).
The pitch of a harmonic tone is defined to be its fundamental frequency, that is, the frequency of the pure tone that has the same pitch.
Here's a juxtaposition of two harmonic tones of G2(.wav). They sound virtually the same, with a subtle difference. The second harmonic tone is missing its first harmonic, and therefore has no energy at its fundamental frequency. When we juxtapose the pure G2 with this second G2(.wav), they have the same pitch. The above definition of pitch still holds operationally, but the fundamental frequency is not directly measurable, and must now be derived from the higher harmonics. The pitch of a tone without a fundamental is termed "virtual pitch".
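One way to see that the fundamental can still be derived from the higher harmonics is that the waveform keeps repeating at the fundamental period even when the first harmonic is removed. The sketch below uses plain autocorrelation to find that period. This is a generic textbook technique swapped in for illustration, not the HERO model, and it uses a round 200 Hz fundamental (an assumption, chosen so that the period is a whole number of samples).

```python
import math

SAMPLE_RATE = 16000


def tone(f0, harmonics, n_samples):
    """Equal-amplitude sum of sine partials at f0 * h."""
    return [
        sum(math.sin(2 * math.pi * f0 * h * t / SAMPLE_RATE) for h in harmonics)
        for t in range(n_samples)
    ]


def best_period(signal, min_lag, max_lag):
    """Lag (in samples) with the largest autocorrelation in [min_lag, max_lag]."""
    window = len(signal) - max_lag

    def acf(lag):
        return sum(signal[i] * signal[i + lag] for i in range(window))

    return max(range(min_lag, max_lag + 1), key=acf)


f0 = 200.0                                   # illustrative stand-in for the fundamental
complete = tone(f0, range(1, 11), 1000)      # harmonics 1..10
missing = tone(f0, range(2, 11), 1000)       # same tone without its first harmonic

lag_complete = best_period(complete, 20, 120)
lag_missing = best_period(missing, 20, 120)
print(SAMPLE_RATE / lag_complete, SAMPLE_RATE / lag_missing)
```

Both tones repeat with the same period, which corresponds to the fundamental frequency even though the second tone contains no partial there.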
Virtual fundamental frequency has some practical applications in the making of pianos. Instead of providing the strings that would vibrate at the lowest frequencies, piano makers use shorter strings that vibrate at the frequencies of the higher harmonics to create the sensation of the absent fundamental frequency.
Assuming harmonicity, or near harmonicity with each partial's frequency varying within a few percent of the harmonic frequency, many algorithms (e.g., Martin, Wyse) have been proposed to calculate the virtual fundamental frequency when any number of harmonics may be absent. Some psychoacoustic experiments indicate that the virtual fundamental frequency is perceived even when as many as 10 of the lower partials are missing.
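To make the computational problem concrete, here is a deliberately simple sketch of one way to estimate a virtual fundamental from a list of partial frequencies. It is not the algorithm of Martin or of Wyse, and the names `fits` and `virtual_f0`, the 3% tolerance and the divisor limit are all assumptions. It tries the lowest partial divided by 1, 2, 3, ... and returns the first (highest) candidate whose integer multiples account for every partial within the tolerance.

```python
def fits(partial, candidate, tolerance):
    """True if partial lies within a relative tolerance of a multiple of candidate."""
    n = round(partial / candidate)
    return n >= 1 and abs(partial - n * candidate) <= tolerance * n * candidate


def virtual_f0(partials, max_divisor=20, tolerance=0.03):
    """Highest candidate (lowest partial / k) whose multiples explain all partials.

    Returns None if no candidate up to max_divisor fits.
    """
    lowest = min(partials)
    for k in range(1, max_divisor + 1):
        candidate = lowest / k
        if all(fits(p, candidate, tolerance) for p in partials):
            return candidate
    return None


# Harmonics 2, 3 and 4 of 199.3 Hz, with the fundamental itself absent.
print(virtual_f0([398.6, 597.9, 797.2]))
# Slightly mistuned partials near multiples of 200 Hz.
print(virtual_f0([400.0, 597.0, 803.0]))
```

With only high-numbered partials, a loose tolerance lets several candidates fit, so this first-fit rule becomes unreliable; a practical estimator would need a finer scoring rule.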
Here is a G2 with its first 10 harmonics missing(.wav). It has a definite pitch.
The comparisons with the pure G2(.wav), Ab2(.wav) and F#2(.wav), however, are not as convincing as the previous ones.
See the HERO model transcription of a tone with its first 3 harmonics missing.
Here is a pure G2 followed by another complex tone(.wav) having the same pitch. We combine the harmonic G2 with this new complex tone(.wav). The quality is weird, but it has a single definite pitch.
The complex tone is called a stretched tone. Its fundamental frequency is 190.08 Hz, which is not equal to the pure G2's frequency.
The ratios of its partials deviate significantly from harmonicity. The partials are related by 1^s, 2^s, 3^s, ..., where s is a stretching/compressing exponent near 1.0. The stretched tone example here is stretched by s=1.04963. The ratio of the second partial to the first, 2.07, is called a stretched octave. The ratios between any two partials are uniformly stretched by the same exponent.
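The stretching rule is easy to tabulate. This short sketch computes the partial ratios n^s for the exponent quoted above and for the harmonic case s = 1; the function name `partial_ratios` and the choice of six partials are illustrative assumptions.

```python
def partial_ratios(s, count):
    """Ratios of partials 1..count to the first partial, under n ** s stretching."""
    return [n ** s for n in range(1, count + 1)]


S = 1.04963                          # stretching exponent quoted in the text
harmonic = partial_ratios(1.0, 6)    # s = 1 gives the ordinary harmonic series
stretched = partial_ratios(S, 6)

for n, (h, st) in enumerate(zip(harmonic, stretched), start=1):
    print(f"partial {n}: harmonic {h:.3f}  stretched {st:.3f}")
```

Because every ratio is raised to the same exponent, the interval between partials 2 and 4 equals the stretched octave between partials 1 and 2.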
[Figure panels: harmonic tone | stretched tone]
Scales may be constructed for stretched tones in an analogous fashion to harmonic tones, to form the basis for new types of music. The book "Tuning, Timbre, Spectrum, Scale" and companion CD by Sethares give an in-depth exposition of this and related subjects.
See the HERO model transcription of a stretched tone and a harmonic tone of the same pitch.
Here is a sequence of four tones G2.B2.G2.B2(.wav) repeating a two-tone pattern of low to high. Here is a different sequence of four tones F#3.G3.F#3.G3(.wav) repeating a two-tone pattern of low to high as well, one octave up. In fact, the second G2 tone in the first sequence and the second G3 tone in the second sequence are the same harmonic G2 with its fundamental missing. In the first instance, its pitch is perceived as being lower than B2, and in the second instance, its pitch is perceived as being higher than F#3. Omission of the fundamental introduces an octave ambiguity, which is resolved by context.
Researchers have created many other examples (Shepard scale, Deutsch tritone paradox) in which the judgment of whether one tone is higher than another is confounded because the spectral contents harbour pitch ambiguity.
The Shepard scale is an apparently endlessly ascending or descending sequence of discrete tones. A subtler version is the Risset glide. Here is one version of it(.wav). Do you hear any hints of vowels? Are they distracting you from finding the deception?
Here is how the HERO model perceives it.
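The circularity of the discrete Shepard scale can be sketched with the construction usually described for Shepard tones: octave-spaced sine partials under a fixed bell-shaped amplitude envelope over log frequency. The function name and every parameter value below are illustrative assumptions, not the settings behind the demo file, and the sketch covers the stepped scale rather than the continuous Risset glide.

```python
import math


def shepard_components(step, steps_per_octave=12, base_hz=20.0,
                       octaves=9, center_octave=4.5, width=1.5):
    """(frequency, amplitude) pairs for one tone of a Shepard scale.

    Partials sit one octave apart. Their amplitudes follow a fixed Gaussian
    over log frequency, so partials fade in at the bottom and out at the top.
    """
    frac = (step % steps_per_octave) / steps_per_octave
    components = []
    for k in range(octaves):
        position = k + frac                      # octaves above base_hz
        freq = base_hz * 2 ** position
        amp = math.exp(-0.5 * ((position - center_octave) / width) ** 2)
        components.append((freq, amp))
    return components


first = shepard_components(0)
next_up = shepard_components(1)
after_octave = shepard_components(12)
print(first[0], next_up[0], after_octave[0])
```

Each step raises every partial by a semitone, yet after twelve steps the set of partials is identical to the starting set, which is why the scale can seem to rise without end.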
Take an upward harmonic glide beginning at G2 (.wav) and a downward harmonic glide beginning at A4(.wav). Make them collide(.wav) and you hear one tone gliding up from below then bouncing down, and another gliding down from above, and then bouncing back up.
Now collide an upward pure glide beginning at G2 with the same downward harmonic glide(.wav). Can you hear the glides persisting through their crossing point?
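A pair of colliding pure glides can be sketched as follows. The text gives only the starting notes, so the sketch assumes mirror-image glides that are linear in log frequency, with each one ending where the other starts. The 440.0 Hz value for the upper note, the one-partial (pure) glides and the function names are also assumptions made for illustration.

```python
import math


def glide_frequency(f_start, f_end, t, duration):
    """Instantaneous frequency of a glide that is linear in log frequency."""
    return f_start * (f_end / f_start) ** (t / duration)


def glide_samples(f_start, f_end, duration=0.5, sample_rate=16000):
    """Pure-tone glide; phase is accumulated sample by sample."""
    phase = 0.0
    samples = []
    for i in range(int(duration * sample_rate)):
        samples.append(math.sin(phase))
        freq = glide_frequency(f_start, f_end, i / sample_rate, duration)
        phase += 2 * math.pi * freq / sample_rate
    return samples


low, high = 199.3, 440.0            # 440.0 Hz is an assumed value for the upper note
up = glide_samples(low, high)
down = glide_samples(high, low)
mixed = [u + d for u, d in zip(up, down)]   # the "collision" is just the sum

crossing = glide_frequency(low, high, 0.25, 0.5)
print(f"glides cross at {crossing:.1f} Hz, halfway through")
```

Under these assumptions the two glides meet at the midpoint in time, at the geometric mean of the two end frequencies. Nothing in the summed signal itself says whether the pitches bounce or cross; that is decided by the listener.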
[Figure panels: bouncing pitches | crossing pitches]
All else being equal, colliding glides tend to be perceived as bouncing pitches. This tendency is weakened if the glides have very different spectral shapes.
Here is how the HERO model hears them.
The preference for bouncing pitches may be related to a preference for vibrato in sustaining notes on musical instruments or voice.
Vibrato is a low-rate (about 5 Hz) and modest (a few %) modulation of the frequencies of a tone. A more technical term for it is micromodulation. Bregman discusses at length the role micromodulation may play in gluing partials together into a stream, and consequently in separating one stream from other simultaneous streams.
Here is G2 with a vibrato of 4 Hz(.wav), and a frequency excursion of 1.25 semitones. The tone is heard as sustained, but a high pitch and a low pitch may be distinguished. With the same frequency excursion, we now increase the rate of modulation to 8 Hz(.wav). The tone is now heard to have apparent amplitude fluctuation at the vibrato rate, but no distinct high and low pitches. At a modulation rate of 16 Hz(.wav), a single pitch is heard with tremolo. At 32 Hz(.wav), the tone is "buzzy".
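The vibrato examples can be sketched as a sinusoidal modulation of log frequency. The text does not say whether the 1.25-semitone excursion is peak-to-peak or peak deviation, so this sketch assumes peak-to-peak. It also modulates a single sine partial rather than a full harmonic tone, and the function names are hypothetical.

```python
import math


def vibrato_frequency(f0, rate_hz, excursion_semitones, t):
    """Instantaneous frequency at time t, swinging +/- half the excursion around f0."""
    half = excursion_semitones / 2.0
    return f0 * 2 ** (half * math.sin(2 * math.pi * rate_hz * t) / 12)


def vibrato_tone(f0, rate_hz, excursion_semitones, duration=0.5, sample_rate=16000):
    """Sine tone with vibrato; phase is accumulated from the instantaneous frequency."""
    phase = 0.0
    samples = []
    for i in range(int(duration * sample_rate)):
        samples.append(math.sin(phase))
        freq = vibrato_frequency(f0, rate_hz, excursion_semitones, i / sample_rate)
        phase += 2 * math.pi * freq / sample_rate
    return samples


f0, excursion = 199.3, 1.25
tones = {rate: vibrato_tone(f0, rate, excursion) for rate in (4, 8, 16, 32)}

# For the 4 Hz case, the extremes fall a quarter and three quarters of the way through a cycle.
highest = vibrato_frequency(f0, 4, excursion, 1 / 16)
lowest = vibrato_frequency(f0, 4, excursion, 3 / 16)
print(f"{lowest:.1f} Hz to {highest:.1f} Hz")
```

The frequency range is the same at every modulation rate; only the speed of the swing changes, which is what separates the four examples.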
[Figure panels: 4 Hz vibrato | 8 Hz vibrato | 16 Hz vibrato | 32 Hz vibrato]
Here is what the HERO model makes of vibratos.
As confusing as pitch is for a single tone, the difficulties are compounded with multiple tones, multiple voices or instruments. Yet people are able to hear the pitches of a particular instrument in an orchestra. This ability is related to the ability to recognize speech in noise. Solving the computational problem of pitch would enable automatic transcription of pitch. It would also point the way to solving the problem of recognizing speech in noise.
See how the HERO model picks out the dominant voice in some polyphonic passages.