The study of animal vocalizations and sounds first became important about 1950 following the development of portable tape-recorders and sound spectrographs.   These instruments allowed scientists to record and then to analyze in detail the structure of sounds.

How Animals Make Sounds

Many vocalizations and other sounds produced by animals are specialized for communication with other individuals of the same species.   Animals use a multitude of different mechanisms to produce sounds.  

Mammals use two thin folds in the larynx (vocal cords) to produce sound. Small muscles control the tension on the vocal cords and thus the timing and pitch of the sounds. The cavities in the throat, mouth, and nose sometimes resonate and thus modify the nature of the sounds issuing from the animal's mouth.

Birds produce their songs by means of a special organ (the syrinx) which consists of two (or in some species four) thin membranes in the wall of the bronchi or lower trachea.   As many as twelve pairs of tiny muscles control the tension of these membranes.   Birds can produce exceptionally pure whistles with rapidly changing pitch and intricate timing.  

Frogs have a larynx with vocal cords, although the structures are not homologous with those of mammals. Crickets scrape a row of small teeth (a file) on one wing against a hardened edge (a scraper) on the other. There are many other mechanisms of sound production by animals.

The important point is that, regardless of the exact mechanism for sound production, the detailed structure of a sound reflects the precise neuromuscular coordination that produces it.   Analysis of sounds thus provides a way to compare neuromuscular coordinations in detail.   It provides an opportunity to study the structure of behavior with great precision and convenience.

Sound and Sound Frequencies

Sound is a pressure wave, in which the molecules of the medium (air or water) move minute distances to create rapidly alternating higher and lower pressure. These changes in pressure are transduced by a microphone into changes in electrical voltage. The amplitude of the changes in pressure determines the intensity (our sensation of the loudness) of the sound. The rate of the changes (cycles/second or hertz, abbreviated Hz) is the frequency of sound. The frequency of sound determines our sensation of pitch.

A sound that consists of a sinusoidal change of pressure at a constant frequency is a pure tone.   A clear whistle is an example.   The frequency of a tone often changes (called frequency modulation).   Many birds' songs consist of pure tones that change rapidly in frequency.   We are for the most part unaware of the intricate changes in frequency in birds' songs, but a spectrograph can display these changes for us to study at leisure.
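As a rough illustration of the difference between a pure tone and a frequency-modulated tone, the following Python sketch generates samples of each. The sampling rate, the function names, and the example frequencies are assumptions chosen for illustration; they are not part of the lab software.

```python
import math

SAMPLE_RATE = 44100  # samples per second (an assumed value for this sketch)

def pure_tone(freq_hz, duration_s, amplitude=1.0):
    """Return samples of a sinusoid at one constant frequency."""
    n = int(round(duration_s * SAMPLE_RATE))
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n)]

def linear_sweep(f_start_hz, f_end_hz, duration_s, amplitude=1.0):
    """Return samples of a tone whose frequency changes linearly with time."""
    n = int(round(duration_s * SAMPLE_RATE))
    rate = (f_end_hz - f_start_hz) / duration_s  # frequency change in Hz per second
    samples = []
    for i in range(n):
        t = i / SAMPLE_RATE
        # The phase is the integral of the instantaneous frequency
        # f(t) = f_start_hz + rate * t.
        phase = 2 * math.pi * (f_start_hz * t + 0.5 * rate * t * t)
        samples.append(amplitude * math.sin(phase))
    return samples

# Hypothetical examples: a 50 ms whistle at 2000 Hz, and a 50 ms
# upward sweep from 2000 Hz to 6000 Hz.
whistle = pure_tone(2000, 0.05)
sweep = linear_sweep(2000, 6000, 0.05)
```

On a spectrogram, the first signal would appear as a horizontal line and the second as a line rising to the right.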

A spectrograph is an instrument that displays the frequency of sound as a function of time.   Such a display is called a spectrogram or sonagram.   A pure whistle thus appears as a horizontal line at a particular frequency (which, remember, determines our sensation of its pitch).   Many birds' songs appear as a series of lines that sweep upward or downward, or up and downward, sometimes in extremely brief intervals of time.   Nevertheless, close inspection of the spectrogram reveals that only one frequency is present at any instant.

Many sounds do not contain just a single frequency at a time.   In fact, it is rather difficult to produce such a sound.   Instead, sounds often have many frequency components, and some consist of white noise (all frequencies are present at once).  

Frequency Components of Complex Sounds

Any waveform can be decomposed into a series of sinusoidal frequency components, each with its own amplitude and phase. This mathematical process is called a Fourier Transform. A fundamental theorem states that a waveform can be converted into a unique set of frequency components, and vice versa. A spectrograph does this transformation for us. It decomposes a complex waveform into its simultaneous frequency components.
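The decomposition can be sketched in a few lines of Python. This example uses a direct (slow) discrete Fourier transform rather than the fast algorithm a real spectrograph would use, and the sampling rate, waveform, and function name are assumptions chosen so that the arithmetic comes out exactly.

```python
import cmath
import math

def dft_amplitudes(samples):
    """Return the amplitude of each frequency component, from 0 Hz up to
    half the sampling rate, using a direct discrete Fourier transform."""
    n = len(samples)
    amplitudes = []
    for k in range(n // 2 + 1):
        total = sum(samples[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n))
        # Scale so that a sinusoid of amplitude A shows up as A in its bin.
        scale = 1 / n if (k == 0 or 2 * k == n) else 2 / n
        amplitudes.append(abs(total) * scale)
    return amplitudes

SAMPLE_RATE = 8000  # samples per second (assumed, kept low so this runs quickly)
N = 80              # number of samples; components are spaced 8000 / 80 = 100 Hz apart

# A complex waveform: a 500 Hz component of amplitude 1.0
# plus a 1200 Hz component of amplitude 0.5.
wave = [1.0 * math.sin(2 * math.pi * 500 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 1200 * i / SAMPLE_RATE)
        for i in range(N)]

# List (frequency in Hz, amplitude) for every component that is not negligible.
peaks = [(k * SAMPLE_RATE // N, round(a, 3))
         for k, a in enumerate(dft_amplitudes(wave)) if a > 0.01]
print(peaks)  # [(500, 1.0), (1200, 0.5)]
```

The transform recovers exactly the two components that were mixed together, with their original amplitudes.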

A spectrograph performs this operation repeatedly, in successive small intervals of time.   So a spectrogram displays the frequency components present in each small interval of time.   The duration of the time interval for analysis is called the analysis period.  
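A minimal sketch of this repeated analysis is shown below. It makes simplifying assumptions that real software usually does not: successive intervals do not overlap, no smoothing window is applied, and a slow direct transform stands in for the fast one. All names and values are hypothetical.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
FRAME_SIZE = 80     # samples per analysis period: 80 / 8000 = 10 ms

def frame_amplitudes(frame):
    """Magnitude of each frequency component in one analysis period."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(samples, frame_size):
    """Analyze successive, non-overlapping analysis periods.
    Each row of the result holds the frequency components of one period."""
    return [frame_amplitudes(samples[start:start + frame_size])
            for start in range(0, len(samples) - frame_size + 1, frame_size)]

def tone(freq_hz, n_samples):
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]

# Test signal: 20 ms at 1000 Hz followed by 20 ms at 2000 Hz.
samples = tone(1000, 160) + tone(2000, 160)

rows = spectrogram(samples, FRAME_SIZE)
# Frequency of the strongest component in each analysis period.
peak_hz = [row.index(max(row)) * SAMPLE_RATE // FRAME_SIZE for row in rows]
print(peak_hz)  # [1000, 1000, 2000, 2000]
```

Plotting each row as a column of gray levels, one column per analysis period, would give the familiar spectrogram picture.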

If pulses of sound are separated by intervals longer than the analysis period, the spectrogram displays these pulses separated by gaps (analysis periods with no sound).   In contrast, if pulses of sound are separated by intervals shorter than the analysis period, then the spectrogram displays continuous horizontal bands, the frequency components of a pulsed sound.   A spectrograph often allows us to choose the duration of the analysis period, so we can change the appearance of pulses of sound.
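The effect of the analysis period on a pulsed sound can be checked numerically. The sketch below builds an idealized pulse train (one-sample clicks every 5 ms) and analyzes it with a short and a long analysis period. The sampling rate, pulse interval, and function names are assumptions for illustration only.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed, kept low so this runs quickly)
PULSE_INTERVAL = 40  # samples between pulses: 40 / 8000 = 5 ms (200 pulses per second)

# 40 ms of silence interrupted by a one-sample pulse every 5 ms.
signal = [1.0 if i % PULSE_INTERVAL == 0 else 0.0 for i in range(320)]

def frames(samples, size):
    """Cut the signal into successive, non-overlapping analysis periods."""
    return [samples[s:s + size] for s in range(0, len(samples) - size + 1, size)]

def amplitudes(frame):
    """Magnitude of each frequency component in one analysis period."""
    n = len(frame)
    return [abs(sum(frame[i] * cmath.exp(-2j * math.pi * k * i / n)
                    for i in range(n)))
            for k in range(n // 2 + 1)]

# Short analysis period (20 samples = 2.5 ms, shorter than the pulse interval):
# periods that contain a pulse alternate with silent periods (gaps).
short_energy = [sum(x * x for x in f) for f in frames(signal, 20)]
print(short_energy[:4])  # [1.0, 0.0, 1.0, 0.0]

# Long analysis period (160 samples = 20 ms, spanning four pulses):
# energy appears only in evenly spaced frequency bands.
long_frame = frames(signal, 160)[0]
bin_hz = SAMPLE_RATE / 160  # 50 Hz between adjacent frequency components
band_hz = [k * bin_hz for k, a in enumerate(amplitudes(long_frame)) if a > 1e-6]
print(band_hz[:4])  # [0.0, 200.0, 400.0, 600.0]
```

With the long analysis period the bands fall at multiples of 200 Hz, which is the pulse repetition rate (1 / 0.005 s). The band at 0 Hz reflects only the fact that these idealized pulses are all positive.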

Our ears perform a similar operation in analyzing the frequency components of sound. The cochlea is a mechanical device for separating the frequency components of sounds. These frequency components determine our sensations of pitch. The cochlea, like the spectrograph, analyzes sound in small intervals of time, about 1/50 to 1/20 of a second (0.02 - 0.05 seconds or 20-50 milliseconds).

A sound consisting of pulses separated by intervals longer than 20-50 ms is perceived as a series of distinct pulses.   In contrast, a sound consisting of pulses repeated more often than once every 20 ms is perceived as a continuous sound with a specific timbre determined by its frequency components.


Sound Spectrograph

In this lab, our spectrograph is a computer with frequency-analysis software.   The spectrogram is displayed on the monitor.

You can think of the spectrogram as a result of the following steps:  

  1. First, the computer digitizes the continuous waveform of a sound into a series of numbers (44100 numbers/second) that indicate the amplitude of the waveform at evenly spaced moments of time.
  2. Then the software computes the Fourier Transform of this waveform (the exact algorithm is called a Fast Fourier Transform or FFT).   An FFT converts the series of numbers representing the waveform into a series of numbers representing frequency components.  
  3. The software repeatedly computes FFTs in successive brief intervals of time. The duration of these intervals, as explained above, is called the analysis period. It is set by choosing the length of the FFT. The length of the FFT (also called the Transform Size) is simply the number of points converted into frequency components. The more points in the FFT, the longer the analysis period.
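The relation between Transform Size and analysis period follows directly from the sampling rate. The sketch below works out the numbers for a few transform sizes, assuming the 44100 samples/second rate given in step 1. The function names are hypothetical, and real software may overlap successive intervals or apply a smoothing window, which this arithmetic ignores.

```python
SAMPLE_RATE = 44100  # samples per second, as in step 1 above

def analysis_period_ms(transform_size, sample_rate=SAMPLE_RATE):
    """Duration of the stretch of sound covered by one FFT, in milliseconds."""
    return 1000 * transform_size / sample_rate

def frequency_spacing_hz(transform_size, sample_rate=SAMPLE_RATE):
    """Spacing between adjacent frequency components produced by the FFT."""
    return sample_rate / transform_size

for size in (128, 512, 2048):
    print(f"Transform Size {size}: "
          f"analysis period {analysis_period_ms(size):.2f} ms, "
          f"frequency spacing {frequency_spacing_hz(size):.1f} Hz")
# Transform Size 128: analysis period 2.90 ms, frequency spacing 344.5 Hz
# Transform Size 512: analysis period 11.61 ms, frequency spacing 86.1 Hz
# Transform Size 2048: analysis period 46.44 ms, frequency spacing 21.5 Hz
```

Note the trade-off: a longer analysis period gives finer detail in frequency but coarser detail in time, and a shorter one does the reverse.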

Animal Vocalizations

First look at the spectrograms of some birds' songs.   Notice the vertical scale in frequency (depends on the display, often 0-11000 Hz or 0-11 kHz) and the horizontal scale in seconds (also depends on the display, often tens or hundreds of milliseconds).

Confirm that in many species a single frequency is present at a time and notice that this frequency can change remarkably fast.   Calculate the slopes of some frequency sweeps (Hz/second).
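The slope of a sweep is simply the change in frequency divided by the change in time. A small Python helper makes the unit conversion explicit; the function name and the readings below are hypothetical values, not measurements from any particular song.

```python
def sweep_slope_hz_per_s(f_start_hz, f_end_hz, t_start_s, t_end_s):
    """Slope of a frequency sweep read off a spectrogram, in Hz per second."""
    return (f_end_hz - f_start_hz) / (t_end_s - t_start_s)

# Hypothetical reading: a sweep that rises from 3000 Hz to 5000 Hz
# between 0 ms and 20 ms on the time axis.
slope = sweep_slope_hz_per_s(3000, 5000, 0.0, 0.020)
print(f"{slope:.0f} Hz/second")  # 100000 Hz/second
```

Remember to convert milliseconds to seconds before dividing. A downward sweep gives a negative slope.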

Look at spectrograms of some other animals (crickets, frogs, wolves).

Human Voice

Look at spectrograms of the human voice when you choose a relatively short analysis period (for instance, Transform Size = 128). Human voices consist of a very rapid series of pulses of sound. Each pulse is created by the vocal cords momentarily opening or closing. These impulses appear as thin vertical lines on the spectrogram. What does this indicate about the frequency composition of the pulses? What is the repetition rate of these pulses?

Use the microphone to compare the voices of men and women.

The frequency composition of a voice depends on the resonant frequencies of the cavities above the larynx (the throat, mouth, and nose). Frequencies in the impulses from the larynx that match a resonance of these cavities are emphasized in the sound that issues from the mouth. Other frequencies are attenuated. The emphasized frequencies are called formants.

Compare spectrograms of the same person's voice with a relatively long analysis period (Transform Size = 512) and with a relatively short analysis period (Transform Size = 128). Does the sound appear as a series of closely spaced vertical lines (impulses) in one case, but a series of closely spaced horizontal lines (harmonics) in the other case? Which analysis period produces which display? Why? Check whether the spacing of impulses corresponds to the spacing of harmonics as follows:

interval between frequency components (in Hz) =
1 / interval between impulses (in seconds)
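
To check a measurement against this relation, the arithmetic can be written as a short Python function. The function name and the 5 ms interval are hypothetical; substitute the interval you measure on your own spectrogram.

```python
def harmonic_spacing_hz(impulse_interval_s):
    """Expected spacing between frequency components (in Hz) for impulses
    that repeat at the given interval (in seconds)."""
    return 1 / impulse_interval_s

# Hypothetical measurement: impulses 5 ms apart on the short-period display.
spacing = harmonic_spacing_hz(0.005)
print(f"{spacing:.0f} Hz")  # 200 Hz
```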

What is the difference between harmonics and formants?   Which is responsible for the basic difference between male and female voices?   What anatomical difference between men and women could produce this difference in sound production?

