Updated: 20 June 2007


R. Haven Wiley

A spectrogram (or sonagram) is a display of frequency as a function of time.   The intensity of each frequency component at a particular time is represented by the darkness of the display at that point.   A spectrograph is the instrument that produces such a display.

This introduction makes no attempt to explain the mathematical principles of spectrum analysis.   It just introduces some rules of thumb for those who need to use a spectrograph quickly.   Following this introduction to Some Fundamentals of Spectrum Analysis, there is a step-by-step (but brief!) manual for using the computer program WildSpectra.

Fundamental theory shows that any waveform can be decomposed into a series of sine waves and cosine waves (called frequency components) with appropriate amplitudes and phases.   Adding the sine and cosine waves produces the original waveform.   Sound is such a waveform.   It is pressure as a function of time (converted by a microphone into electrical voltage as a function of time).

Our ears decompose sound waveforms into frequency components.   Human ears analyze about 1/20 to 1/50 of a second (0.05-0.02 s) of sound at a time.   Within each of these little time periods, the cochlea mechanically separates the frequency components.   We can hear these frequency components changing every 0.05-0.02 s.   For instance, pulses of sound that occur less frequently than 20-50 times/second sound to us like distinct pulses.   Pulses that occur more frequently than 50 times/second sound like a continuous sound with the frequency components of a pulsed waveform.

The time period within which an ear (or some other spectrum analyzer) analyzes the frequency components in sound is called the analysis period.   How pulses of sound appear in a spectrogram thus depends on whether the pulses occur more or less rapidly than the analysis period.

Digital spectral analysis has many basic similarities with an ear.   In particular, it analyzes sound within a particular (user selected) analysis period.   The analysis period affects how a pulsed sound will appear in the resulting spectrogram.

Nowadays the first step in spectral analysis is to digitize the sound (convert the continuously varying electrical signal from a microphone into a series of numbers representing the voltage at evenly spaced points in time).   Basic theory determines how many points you need in order to describe a signal completely.

Consider a signal that includes no frequency components above W.   Then all of the information required to describe this signal is captured by digitizing the signal at a frequency of 2W.   In other words, if a waveform includes no frequencies above W, then a series of numbers can be used to reproduce the original waveform provided the digitizing frequency equals or exceeds 2W, a minimum called the Nyquist rate.
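A small numerical sketch (with made-up frequencies, not taken from any particular recording) shows what goes wrong below this rate: a component above half the digitizing frequency is indistinguishable, at the sample points, from a lower-frequency "alias".

```python
import numpy as np

# Hypothetical example: a 7 kHz sine digitized at only 10,000 points/second,
# below the required rate of 2 * 7,000 = 14,000 points/second.
fs = 10_000                      # digitizing frequency (points/second)
t = np.arange(100) / fs          # 100 evenly spaced sample times

too_high = np.sin(2 * np.pi * 7_000 * t)   # frequency above fs/2
alias    = np.sin(2 * np.pi * 3_000 * t)   # its alias below fs/2

# At the sample points the two waveforms are indistinguishable
# (the 7 kHz wave equals the 3 kHz wave with inverted sign).
print(np.allclose(too_high, -alias))   # True
```

Once digitized, nothing can tell these two signals apart, which is why the digitizing frequency must exceed twice the highest frequency present.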


FFT stands for Fast Fourier Transform.   Fast, because it uses a particular computational algorithm published by Cooley and Tukey in 1965.   Fourier Transform, because Fourier discovered the mathematical relationship between any waveform and its frequency components in the early 1800's.

To perform a Fourier or spectral analysis, you must understand the relationships among three parameters:

  • digitizing frequency (at least twice the maximal frequency in the signal as just explained)

  • analysis period

  • transform size (number of points analyzed at a time, also called frame length or FFT length).

These three parameters have a simple relationship:

transform size (in units of numbers of points)   =
digitizing frequency (number of points / second)   X   analysis period (seconds)

This simple equation leads to the first law of spectrum analysis:

When you select the transform size,
you also determine the analysis period of the spectrum.
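With hypothetical but typical numbers (a digitizing frequency of 44,100 points/second and a transform size of 1,024 points; neither value comes from WildSpectra itself), the relationship can be checked directly:

```python
# Hypothetical values, chosen only for illustration.
digitizing_frequency = 44_100      # points / second
transform_size = 1_024             # points per FFT frame

# Rearranging: analysis period = transform size / digitizing frequency
analysis_period = transform_size / digitizing_frequency
print(f"analysis period = {analysis_period * 1000:.2f} ms")   # about 23.22 ms
```

Doubling the transform size to 2,048 points would double the analysis period to about 46 ms, with the digitizing frequency unchanged.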

A second law makes a similar statement about the number of frequency components, although it is a little trickier to understand.   An FFT converts a series of numbers representing a waveform into an equally large series of numbers representing frequencies.   In other words, a series of numbers in the time domain is converted to an equal series of numbers in the frequency domain.

Actually, half of the numbers in the frequency domain are the amplitudes of sine waves and the other half are the amplitudes of cosine waves with the same frequencies.   Adding the sine and the cosine waves at any one frequency produces a sine wave with a particular phase in relation to sine waves at other frequencies.   If we ignore the phase information (just as vertebrate ears do) by ignoring the cosine terms, we are left with just the sine terms or a number of frequency components equal to 1/2 the number of digitized points.   Thus we reach the second law of spectrum analysis:

When you select the number of digitized points to analyze,
you also determine the number of frequency components in the spectrum.

Some of these frequency components might have amplitudes of zero and thus not show up in a spectrogram, but the number of possible frequency components is nevertheless fixed.   Once again there is a simple relationship:

spacing of frequency components in the analysis (Hz or cycles/second)   =
maximum frequency in the signal (Hz) / number of possible frequency components

Since the number of possible frequency components equals half the transform size, this is the same as:

spacing of frequency components (Hz)   =
maximal possible frequency W / (transform size / 2)
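As a check, numpy's FFT helper lays out its frequency grid with exactly this spacing.   The numbers below (digitizing frequency 44,100 points/second, so W = 22,050 Hz, and transform size 1,024) are hypothetical values for illustration:

```python
import numpy as np

fs = 44_100                # digitizing frequency (hypothetical)
n = 1_024                  # transform size

w = fs / 2                             # maximal possible frequency W
spacing = w / (n / 2)                  # equivalently fs / n
print(f"spacing = {spacing:.2f} Hz")   # about 43.07 Hz

# numpy's real-FFT frequency grid uses the same spacing between components
freqs = np.fft.rfftfreq(n, d=1/fs)
print(np.isclose(freqs[1] - freqs[0], spacing))   # True
```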

Note the very important inverse relationship between time resolution and frequency resolution.   Transform size determines the analysis period (as emphasized above) and hence the time resolution.   The larger the transform size, the longer the analysis period and the lower the time resolution.   Transform size also determines frequency resolution.   The larger the transform size, the smaller the spacing of frequency components and the greater the frequency resolution.

So you cannot have it both ways:   you either have a large transform size, with high frequency resolution and low time resolution, or you have a small transform size, with low frequency resolution and high time resolution.   Or you can compromise!
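A short sketch makes the trade-off concrete.   Comparing a small and a large transform size at a hypothetical digitizing frequency of 44,100 points/second:

```python
# Hypothetical comparison: the digitizing frequency and the two transform
# sizes are illustrative values, not settings from any particular program.
fs = 44_100                             # digitizing frequency (points/second)

for n in (256, 4_096):                  # small and large transform sizes
    analysis_period_ms = 1000 * n / fs  # time resolution (longer = coarser)
    spacing_hz = fs / n                 # frequency resolution (smaller = finer)
    print(f"N = {n}: analysis period {analysis_period_ms:.1f} ms, "
          f"component spacing {spacing_hz:.1f} Hz")
```

The small transform resolves events a few milliseconds apart but smears frequencies over 170 Hz; the large transform resolves frequencies about 11 Hz apart but smears time over roughly 93 ms.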


Four kinds of waveforms are important in biological signals:   a pure sine wave, an impulse (click or pop), a square wave, and a pulsed sine wave.   It helps to understand the spectra of these waveforms.


(1)   A pure sine wave is easy:   it has a single sinusoidal frequency component and appears as a single horizontal line on a spectrogram.


(2)   An impulse includes all frequencies in an instant of time:   it appears as a vertical line on a spectrogram.
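This is easy to verify numerically.   A minimal sketch (frame length chosen arbitrarily): the spectrum of a single digitized impulse is flat, with every frequency component at the same amplitude.

```python
import numpy as np

# A digitized impulse: one nonzero sample in an otherwise silent frame.
n = 512                    # transform size (hypothetical)
impulse = np.zeros(n)
impulse[0] = 1.0

spectrum = np.abs(np.fft.rfft(impulse))

# Every frequency component has the same amplitude: the flat column of
# energy that draws a vertical line on a spectrogram.
print(np.allclose(spectrum, spectrum[0]))   # True
```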


(3)   A square wave occurs when pressure rises and falls abruptly in a regular sequence.   For instance, our vocal cords produce something like a square wave of pressure:   as they flap open and closed, pressure rises and falls abruptly.  

If we set the analysis period to some duration shorter than the interval between the opening and closing of the vocal cords, then in some intervals nothing changes (the cords stay open or stay closed for the entire interval) and in some intervals the cords pop open or flap shut.   So the spectrogram displays a series of impulses, each represented by a vertical line on the spectrogram (all frequency components) separated by gaps.

If we set the analysis period to some duration longer than the interval between the opening and closing of the vocal cords, then during each analysis period pressure rises and falls one or more times in a square wave.   Any periodic non-sinusoidal signal (a square wave is an example) has a series of frequency components.   The lowest frequency component (h) occurs at a frequency equal to

h = 1 / (period of a signal such as a square wave).

The other frequency components are integer multiples of the first one:   2h, 3h, 4h and so forth.   Usually the first component (called the first harmonic or fundamental frequency) is the strongest.

For vocal cords flapping open and closed, a spectrogram displays either a series of vertical impulses (when the analysis period is very short) or a series of horizontal harmonics (when the analysis period is longer).   Note that the same information is available from each display:

period of impulses = 1 / lowest frequency component = 1 / h.
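The harmonic series can be checked numerically.   In this sketch every value is hypothetical: a pressure waveform that is "open" for part of each cycle, with a period of 80 samples at 8,000 points/second, so h = 8,000 / 80 = 100 Hz.

```python
import numpy as np

# Hypothetical pulse waveform standing in for flapping vocal cords.
fs = 8_000                 # digitizing frequency (points/second)
period = 80                # samples per cycle, so h = fs / period = 100 Hz
n = 1_600                  # transform size spanning 20 full cycles

t = np.arange(n)
wave = ((t % period) < 25).astype(float)   # "open" for 25 of every 80 samples

spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(n, d=1/fs)

# The nonzero components fall only at integer multiples of h = 100 Hz.
harmonics = freqs[(freqs > 0) & (spectrum > 1.0)]
print(harmonics[:4])   # 100, 200, 300, 400 Hz
```

Reading off the lowest component recovers the period, just as the equation above states: 1 / 100 Hz = 0.01 s per cycle.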


(4)   The fourth waveform of interest is a pulsed sine wave, a sine wave broken into pulses with silence between them.   There are many examples of such sounds made by animals:   birds' trills, crickets' songs, frogs' mating or advertising calls.

If the analysis period is shorter than the pulses, then the spectrogram displays a series of short horizontal lines separated by gaps.   The horizontal lines represent the frequency of the sine wave during the pulses.   This frequency is called the carrier frequency.

If the analysis period is longer than a complete period (pulse plus gap), then the spectrogram displays the horizontal line for the carrier frequency and a series of sidebands above and below it.   This spectrum is really a combination of the spectrum for a pure sine wave (a horizontal line) and the spectrum for a square wave (a series of harmonics with the fundamental equal to 1 / period).   Rather than frequency components spaced equally above the baseline, we see frequency components spaced equally above and below the carrier frequency.   The former are called harmonics, the latter sidebands.   For sidebands, as for harmonics, the spacing equals 1 / period.
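The sidebands can also be demonstrated numerically.   All the values in this sketch are made up for illustration: a 1,000 Hz carrier pulsed with a period of 0.02 s (so 1 / period = 50 Hz), digitized at 8,000 points/second, analyzed over ten complete pulse cycles.

```python
import numpy as np

# Hypothetical pulsed sine wave (e.g. a trill or cricket-like song).
fs = 8_000                  # digitizing frequency (points/second)
carrier = 1_000             # Hz: frequency of the sine wave within each pulse
period = 160                # samples per pulse cycle -> 1/period = 50 Hz

t = np.arange(1_600)        # analysis period spanning 10 pulse cycles
pulses = ((t % period) < 40).astype(float)        # sine on for 40 of 160 samples
wave = np.sin(2 * np.pi * carrier * t / fs) * pulses

spectrum = np.abs(np.fft.rfft(wave))
freqs = np.fft.rfftfreq(len(wave), d=1/fs)

# Strong components near the carrier: the carrier itself plus sidebands
# spaced 50 Hz (= 1/period) above and below it.
near = (freqs >= 825) & (freqs <= 1175)
print(freqs[near][spectrum[near] > 50])   # 850, 900, ..., 1150 Hz
```

The carrier at 1,000 Hz is flanked above and below by components 50 Hz apart, exactly the 1 / period spacing described in the text.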


This quick introduction has skipped over many variations and details.   The spectrum of a square wave (and thus of a pulsed sine wave) changes when the square wave is not symmetrical (the pulses and gaps are not equal in length).   But you can still recognize the basic pattern.   Enough for now ...