Updated: 20 June 2007

SOME FUNDAMENTALS OF SPECTRUM ANALYSIS
R. Haven Wiley

A spectrogram (or sonagram) is a display of frequency as a function of
time. The intensity of each frequency component at a particular
time is represented by the darkness of the display at that point.
A spectrograph is the instrument that produces such a display.
This introduction makes no attempt to explain the mathematical principles
of spectrum analysis. It just introduces some rules of thumb for
those who need to use a spectrograph quickly. Following this
introduction to Some Fundamentals of Spectrum Analysis, there is a
step-by-step (but brief!) manual for using the computer program
WildSpectra.
Fundamental theory shows that any waveform can be decomposed into a series
of sine waves and cosine waves (called frequency components) with
appropriate amplitudes and phases. Adding the sine and cosine waves
produces the original waveform. Sound is such a waveform. It
is pressure as a function of time (converted by a microphone into
electrical voltage as a function of time).
Our ears decompose sound waveforms into frequency components.
Human ears analyze about 1/20 to 1/50 of a second (0.05-0.02 s) of sound
at a time. Within each of these little time periods, the cochlea
mechanically separates the frequency components. We can hear these
frequency components changing every 0.05-0.02 s. For instance,
pulses of sound that occur less frequently than 20-50 times/second sound
to us like distinct pulses. Pulses that occur more frequently than
50 times/second sound like a continuous sound with the frequency
components of a pulsed waveform.
The time period within which an ear (or some other spectrum analyzer)
analyzes the frequency components in sound is called the analysis period.
How pulses of sound appear in a spectrogram thus depends on
whether the interval between pulses is shorter or longer than the
analysis period.
Digital spectral analysis has many basic similarities with an ear. In
particular, it analyzes sound within a particular (user selected) analysis
period. The analysis period affects how a pulsed sound will appear in
the resulting spectrogram.
Nowadays the first step in spectral analysis is to digitize the sound (convert
the continuously varying electrical signal from a microphone into a series of
numbers representing the voltage at evenly spaced points in time).
Basic theory determines how many points you need in order to describe a signal
completely.
Consider a signal that includes no frequency components above W. Then
all of the information required to describe this signal is captured by
digitizing the signal at a frequency of 2W. In other words, if a
waveform includes no frequencies above W, then a series of numbers can be used
to reproduce the original waveform provided the digitizing frequency equals or
exceeds 2W, called the Nyquist rate.
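To see what goes wrong when the digitizing frequency is too low, here is a small Python sketch. The digitizing frequency of 8000 points/second and the two tone frequencies are arbitrary values chosen for illustration, and the function name sample is made up for this example. A 9000 Hz tone lies above half the digitizing frequency, and it yields the same series of numbers as a 1000 Hz tone, so the two cannot be told apart once digitized.

```python
import math

def sample(freq_hz, rate_hz, n_points):
    """Return n_points values of a unit sine wave at freq_hz, digitized at rate_hz."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz) for k in range(n_points)]

rate = 8000.0                     # assumed digitizing frequency (points/second)
low = sample(1000.0, rate, 16)    # 1000 Hz is below rate / 2 = 4000 Hz
high = sample(9000.0, rate, 16)   # 9000 Hz is above rate / 2

# Largest difference between the two digitized series (rounding error only)
max_difference = max(abs(a - b) for a, b in zip(low, high))
print(max_difference)
```

The difference is no larger than rounding error, which is why frequencies above W must be absent (or filtered out) before digitizing.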
FOURIER (SPECTRAL) ANALYSIS OF SIGNALS
FFT stands for Fast Fourier Transform. Fast, because it uses a
particular computational algorithm published by Cooley and Tukey in 1965.
Fourier Transform, because Fourier discovered the mathematical
relationship between any waveform and its frequency components in the
early 1800's.
To perform a Fourier or spectral analysis, you must understand the
relationships among three parameters:
- digitizing frequency (at least twice the maximal
frequency in the signal as just explained)
- analysis period
- transform size (number of points analyzed at a time, also
called frame length or FFT length).
These three parameters have a simple relationship:
transform size (number of points) =
digitizing frequency (number of points / second) x analysis period (seconds)
This simple equation leads to the . . .
FIRST LAW OF FREQUENCY ANALYSIS
When you select the transform size,
you also determine the analysis period of the spectrum.
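As a quick check on the arithmetic, the relationship can be rearranged to give the analysis period for any transform size. This Python sketch assumes a digitizing frequency of 22050 points/second, chosen only as an example; the function name analysis_period is made up for this illustration.

```python
def analysis_period(transform_size, digitizing_frequency):
    """Analysis period in seconds = transform size / digitizing frequency."""
    return transform_size / digitizing_frequency

rate = 22050  # assumed digitizing frequency (points/second)
for size in (128, 256, 512, 1024):
    period_ms = analysis_period(size, rate) * 1000
    print(f"transform size {size:4d} -> analysis period {period_ms:.2f} ms")
```

Doubling the transform size doubles the analysis period, as long as the digitizing frequency stays the same.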
A second law makes a similar statement about the number of frequency
components, although it is a little trickier to understand. An FFT
converts a series of numbers representing a waveform into an equally large
series of numbers representing frequencies. In other words, a
series of numbers in the time domain is converted to an equally long
series of numbers in the frequency domain.
Actually, half of the numbers in the frequency domain are the amplitudes
of sine waves and the other half are the amplitudes of cosine waves with
the same frequencies. Adding the sine and the cosine waves at any
one frequency produces a sine wave with a particular phase in relation to
sine waves at other frequencies. If we ignore the phase information
(just as vertebrate ears do) by combining the sine and cosine amplitudes
at each frequency into a single overall amplitude, we are left with a
number of frequency components equal to 1/2 the number of digitized
points. Thus we reach the . . .
SECOND LAW OF FREQUENCY ANALYSIS
When you select the number of digitized points to analyze,
you also determine the number of frequency components in the spectrum.
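The counting can be made concrete with a small Python sketch. It uses a slow, direct Fourier transform rather than a real FFT (the results are the same, only the speed differs), and the 16-point test signal with 3 cycles and a phase of 0.7 radians is made up for illustration. For each frequency it reports the cosine amplitude, the sine amplitude, and the overall amplitude left when phase is ignored.

```python
import math

def dft_components(samples):
    """Return (cosine amplitude, sine amplitude, overall amplitude) per frequency.

    Slow direct transform, for illustration only; a real FFT gives the same numbers.
    Scaled so that a unit sine wave has amplitude 1 (the zero-frequency term is
    scaled the same way for simplicity).
    """
    n = len(samples)
    components = []
    for j in range(n // 2):
        cos_amp = sum(x * math.cos(2 * math.pi * j * k / n) for k, x in enumerate(samples)) * 2 / n
        sin_amp = sum(x * math.sin(2 * math.pi * j * k / n) for k, x in enumerate(samples)) * 2 / n
        components.append((cos_amp, sin_amp, math.hypot(cos_amp, sin_amp)))
    return components

n = 16  # number of digitized points (transform size)
signal = [math.sin(2 * math.pi * 3 * k / n + 0.7) for k in range(n)]  # 3 cycles, phase 0.7 rad

components = dft_components(signal)
print(len(components), "frequency components from", n, "points")
for j, (c, s, amplitude) in enumerate(components):
    print(j, round(c, 3), round(s, 3), round(amplitude, 3))
```

Sixteen points give eight frequency components. Only the component at 3 cycles per analysis period has a non-zero amplitude; the phase of the test signal decides how that amplitude is split between the cosine and sine terms, but not the overall amplitude.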
Some of these frequency components might have amplitudes of zero and thus
not show up in a spectrogram, but the number of possible frequency
components is nevertheless fixed. Once again there is a simple
relationship:
spacing of frequency components in the analysis (Hz or cycles/second) =
maximum frequency in the signal (Hz) / number of possible frequency components
or
spacing of frequency components (Hz) =
maximal possible frequency W / (transform size / 2)
Note the very important inverse relationship between time resolution
and frequency resolution. Transform size determines the
analysis period (as emphasized above) and hence the time resolution.
The larger the transform size, the longer the analysis period and
the lower the time resolution. Transform size also determines
frequency resolution. The larger the transform size, the smaller
the spacing of frequency components and the greater the frequency
resolution.
So you cannot have it both ways: you either have a large transform
size, with high frequency resolution and low time resolution, or you
have a small transform size, with low frequency resolution and high time
resolution. Or you can compromise!
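The trade-off is easy to tabulate. This Python sketch assumes a digitizing frequency of 22050 points/second and takes the maximal frequency W to be exactly half of that; both are assumptions for illustration, and the function name resolution is made up.

```python
def resolution(transform_size, digitizing_frequency):
    """Return (analysis period in seconds, spacing of frequency components in Hz)."""
    analysis_period = transform_size / digitizing_frequency
    max_frequency = digitizing_frequency / 2   # W, assuming digitizing at exactly 2W
    spacing = max_frequency / (transform_size / 2)
    return analysis_period, spacing

rate = 22050  # assumed digitizing frequency (points/second)
for size in (64, 256, 1024, 4096):
    period, spacing = resolution(size, rate)
    print(f"{size:5d} points: analysis period {period * 1000:7.2f} ms, spacing {spacing:7.2f} Hz")
```

Under these assumptions the spacing of frequency components is simply 1 / analysis period, so the product of the two is always 1: whatever you gain in one you lose in the other.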
FOUR IMPORTANT KINDS OF WAVEFORMS
Four kinds of waveforms are important in biological signals:
a pure sine wave, an impulse (click or pop), a square wave, and a
pulsed sine wave. It helps to understand the spectra of these
waveforms.
(1) A pure sine wave is easy: it has a single sinusoidal
frequency component and appears as a single horizontal line on a spectrogram.
(2) An impulse includes all frequencies in an instant of time:
it appears as a vertical line on a spectrogram.
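These first two cases can be checked numerically. The Python sketch below uses a slow, direct Fourier transform (not a real FFT) on a single analysis period of 64 points; the test signals, a sine wave with 5 cycles and a single-point impulse, are made up for illustration.

```python
import math

def magnitudes(samples):
    """Slow Fourier transform: size of each frequency component, phase ignored."""
    n = len(samples)
    result = []
    for j in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * j * k / n) for k, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * j * k / n) for k, x in enumerate(samples))
        result.append(math.hypot(re, im))
    return result

n = 64  # transform size (points in one analysis period)
sine = [math.sin(2 * math.pi * 5 * k / n) for k in range(n)]   # 5 cycles per analysis period
impulse = [1.0 if k == 10 else 0.0 for k in range(n)]          # a single click

sine_spectrum = magnitudes(sine)
impulse_spectrum = magnitudes(impulse)

# Count the frequency components that are clearly non-zero in each spectrum
print(sum(1 for m in sine_spectrum if m > 1e-6))
print(sum(1 for m in impulse_spectrum if m > 1e-6))
```

The sine wave puts all of its energy into one frequency component (one point on a horizontal line), while the impulse spreads equal energy across every component (one vertical line).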
(3) A square wave occurs when pressure rises and falls abruptly
in a regular sequence. For instance, our vocal cords produce something
like a square wave of pressure: as they flap open and closed, pressure
rises and falls abruptly.
If we set the analysis period to some duration shorter than the interval
between the opening and closing of the vocal cords, then in some
intervals nothing changes (the cords stay open or stay closed for the
entire interval) and in some intervals the cords pop open or flap shut.
So the spectrogram displays a series of impulses, each represented
by a vertical line on the spectrogram (all frequency components)
separated by gaps.
If we set the analysis period to some duration longer than the interval
between the opening and closing of the vocal cords, then during each
analysis period pressure rises and falls one or more times in a square
wave. Any periodic non-sinusoidal signal (a square wave is an
example) has a series of frequency components. The lowest
frequency component (h) occurs at a frequency equal to
h = 1 / (period of a signal such as a square wave).
The other frequency components are integer multiples of the first one:
2h, 3h, 4h and so forth. Usually the first component (called
the first harmonic or fundamental frequency) is the strongest.
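A numerical check of the harmonic series, as a Python sketch with a slow, direct Fourier transform (not a real FFT). The digitizing frequency of 1280 points/second, the transform size of 128 points, and the square-wave period of 16 points are all made-up values, chosen so that h = 80 Hz and the frequency components fall 10 Hz apart.

```python
import math

def magnitudes(samples):
    """Slow Fourier transform: amplitude of each frequency component, phase ignored."""
    n = len(samples)
    result = []
    for j in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * j * k / n) for k, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * j * k / n) for k, x in enumerate(samples))
        result.append(math.hypot(re, im) * 2 / n)
    return result

rate = 1280             # assumed digitizing frequency (points/second)
n = 128                 # transform size; spacing of components = 1280 / 128 = 10 Hz
points_per_cycle = 16   # square-wave period = 16 / 1280 s, so h = 80 Hz

# Symmetrical square wave: 8 points high, 8 points low, repeated
square = [1.0 if (k % points_per_cycle) < points_per_cycle // 2 else -1.0 for k in range(n)]
spectrum = magnitudes(square)

spacing = rate / n
harmonics = [j * spacing for j, m in enumerate(spectrum) if m > 1e-6]
strongest = spectrum.index(max(spectrum)) * spacing
print("frequencies with non-zero amplitude (Hz):", harmonics)
print("strongest component (Hz):", strongest)
```

Every non-zero component falls on an integer multiple of h = 80 Hz, and the fundamental is the strongest. For a perfectly symmetrical square wave like this one the even multiples (2h, 4h, ...) happen to have zero amplitude, so only h, 3h, 5h and 7h show up; unequal pulses and gaps bring the even multiples back.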
For vocal cords flapping open and closed, a spectrogram displays either
a series of vertical impulses (when the analysis period is very short) or
a series of horizontal harmonics (when the analysis period is longer).
Note that the same information is available from each display:
period of impulses = 1 / lowest frequency component = 1 / h.
(4) The fourth waveform of interest is a pulsed sine wave, a
sine wave broken into pulses with silence between them. There are many
examples of such sounds made by animals: birds' trills, crickets'
songs, frogs' mating or advertising calls.
If the analysis period is shorter than the pulses, then the spectrogram
displays a series of short horizontal lines separated by gaps. The
horizontal lines represent the frequency of the sine wave during the
pulses. This frequency is called the carrier frequency.
If the analysis period is longer than a complete period (pulse plus gap),
then the spectrogram displays the horizontal line for the carrier
frequency and a series of sidebands above and below it. This
spectrum is really a combination of the spectrum for a pure sine wave (a
horizontal line) and the spectrum for a square wave (a series of
harmonics with the fundamental equal to 1 / period). Rather than
frequency components spaced equally above the baseline, we see frequency
components spaced equally above and below the carrier frequency.
The former are called harmonics, the latter sidebands. For
sidebands, as for harmonics, the spacing equals 1 / period.
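The sideband spacing can be checked with a Python sketch, again using a slow, direct Fourier transform (not a real FFT). All of the numbers are made up for illustration: a digitizing frequency of 2560 points/second, a transform size of 256 points, a 640 Hz carrier, and pulses that repeat every 32 points (period 0.0125 s, so 1 / period = 80 Hz) with pulse and gap of equal length.

```python
import math

def magnitudes(samples):
    """Slow Fourier transform: amplitude of each frequency component, phase ignored."""
    n = len(samples)
    result = []
    for j in range(n // 2):
        re = sum(x * math.cos(2 * math.pi * j * k / n) for k, x in enumerate(samples))
        im = sum(x * math.sin(2 * math.pi * j * k / n) for k, x in enumerate(samples))
        result.append(math.hypot(re, im) * 2 / n)
    return result

rate = 2560          # assumed digitizing frequency (points/second)
n = 256              # transform size; analysis period 0.1 s, spacing 10 Hz
carrier_hz = 640.0   # assumed carrier frequency
pulse_period = 32    # points per pulse plus gap: 32 / 2560 = 0.0125 s

# Carrier switched on for 16 points and off for 16 points, repeatedly
pulsed = [
    (1.0 if (k % pulse_period) < pulse_period // 2 else 0.0)
    * math.sin(2 * math.pi * carrier_hz * k / rate)
    for k in range(n)
]
spectrum = magnitudes(pulsed)

spacing = rate / n
peaks = [j * spacing for j, m in enumerate(spectrum) if m > 1e-6]
strongest = spectrum.index(max(spectrum)) * spacing
offsets = sorted(set(abs(p - carrier_hz) for p in peaks))
print("strongest component (Hz):", strongest)
print("distances of components from the carrier (Hz):", offsets)
```

The strongest component is the carrier, and every other non-zero component sits at a multiple of 80 Hz (1 / period) above or below it. The analysis period here (0.1 s) spans eight complete pulse periods, which is the condition for sidebands to appear.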
This quick introduction has skipped over many variations and details.
The spectrum of a square wave (and thus of a pulsed sine wave)
changes when the square wave is not symmetrical (the pulses and gaps are
not equal in length). But you can still recognize the basic
pattern. Enough for now ...
