Chapter 1: the sounds of language
Phonetics: the study of how speech sounds are made and perceived.
- The sounds of speech
- The production of speech sounds
- Combination of speech sounds
- Description of speech sounds
- Representation of speech sounds by written symbols.
Phonology: the study of how a language organizes those speech sounds into a meaningful system.
- The distribution of sounds in a language
- The patterning of speech sounds in a language
- (tacit) Rules that govern pronunciation of sounds in a language
- Underlying representation: relationship between phonemes, the way words are stored in our brain.
- Surface representation: relationship between allophones, the way words are pronounced.
Phonetics
Vocal tract
Sound is vibrating air. Speaking means using your vocal tract (lungs, trachea, larynx, mouth and nose) to
get air moving and vibrating. Speaking begins with breath. To begin, you lower your diaphragm,
which enlarges the lungs and draws air in. Then the diaphragm relaxes and air is forced up the
windpipe, or trachea. At the top of the trachea is the larynx, visible from outside as the Adam’s apple.
Inside the larynx lie the vocal folds, across the top of the trachea. The air flowing out of the trachea
causes them to flap open and closed very quickly.
Above the larynx, at the base of the tongue, is the epiglottis. This is a muscular structure that folds
down over the larynx when you swallow, to prevent food from going into your lungs. There is an open area
at the back of the mouth, the pharynx. It gives the tongue freedom for front and back movement.
Non-human animals have the larynx high up at the back of the mouth; because they have no comparable
pharynx, they cannot produce the range of sounds that human speech requires.
We have active and passive articulators that we use to shape speech sounds. Active articulators
include the lips and tongue. Although the tongue has no bones, its parts can move independently. The
tongue front (tongue tip + tongue blade), the tongue body (the main mass, or dorsum) and the tongue
root (the part in the pharynx) are separate active articulators.
Passive articulators lie along the top of the vocal tract: the alveolar ridge (the bony rise behind the
upper teeth) and the postalveolar region (from the alveolar ridge to the hard palate, the top of the
mouth). In the back of the mouth you have the soft palate or velum, a muscular structure that regulates
the velar port: the opening in the back of the mouth that connects nose and mouth. At the end of the
velum is the uvula, the little pink hanging thing.
How will the speaker get the air moving? The usual choice is pulmonic egressive: air moving out of
the lungs. What to do with the vocal folds? Sounds produced with vocal fold vibration are voiced;
otherwise they are voiceless. With some sounds, the vocal folds are held apart far enough and long
enough to allow an extra puff of air to come out: aspiration. Will the velum be open or not? If the
velum is open, so that air flows into the nose, the sound is nasal. If the velum is closed, the sound is
oral. Which active articulator will be used?
The manners of articulation include:
- Stop: the airflow out of the mouth is completely cut off (p, k, t).
- Fricative: a stream of air forced between the articulators becomes turbulent and noisy
(s, z, f, v).
- Affricate: stop + fricative (ch).
- Approximant: the active articulator moves to narrow the vocal tract, but not so much that fricative
noise is made. Glides (yell, well), r and l are approximants. L-sounds are called laterals, and
r-sounds rhotics.
- Vowels: most open manner, vocal tract is wide open, air flows out freely.
Oral stops, fricatives and affricates are obstruents: they make audible sounds by obstructing the
airflow. Nasal stops, approximants and vowels are sonorants: they make audible sounds by letting the
air resonate. Sonorants are almost always voiced.
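The obstruent/sonorant split above depends only on the manner of articulation, so it can be written as a simple lookup. The following is a minimal sketch: the names OBSTRUENT_MANNERS, SONORANT_MANNERS and sound_class are made up for illustration, and the example sounds are a small hand-picked sample.

```python
# Manner classes as grouped in the notes; the set names are illustrative.
OBSTRUENT_MANNERS = {"oral stop", "fricative", "affricate"}
SONORANT_MANNERS = {"nasal stop", "approximant", "vowel"}


def sound_class(manner: str) -> str:
    """Return 'obstruent' or 'sonorant' for a manner of articulation."""
    if manner in OBSTRUENT_MANNERS:
        return "obstruent"
    if manner in SONORANT_MANNERS:
        return "sonorant"
    raise ValueError(f"unknown manner: {manner!r}")


# A few example sounds with their manners (hand-picked sample, not a full inventory).
EXAMPLES = {"p": "oral stop", "s": "fricative", "m": "nasal stop", "l": "approximant"}

for symbol, manner in EXAMPLES.items():
    print(symbol, manner, sound_class(manner))
```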
Writing down sounds using a phonetic alphabet is called phonetic transcription. The International
Phonetic Alphabet (IPA) is meant to be universal: there should be enough symbols that every sound in
every language can be represented. The alphabet is also meant to be unambiguous: every sound has one
symbol, and every symbol one sound. In the IPA consonant chart, when a cell holds two symbols, the
left one is voiceless.
Each articulator can move to more than one place of articulation:
- Bilabial: the lower lip and upper lip come together (p, b, m).
- Labiodental: the lower lip makes contact with the upper teeth (f, v).
- Dental: the tongue tip moves to the upper teeth (the fricatives θ, ð).
- Alveolar: the tongue tip is at the alveolar ridge (t, d, n, l, s, z).
- Postalveolar: the blade of the tongue makes a constriction at the postalveolar place of
articulation (ʃ).
- Retroflex: the tip of the tongue curls back (r).
- Palatal: the whole middle section of the tongue is pushed straight up.
- Velar: the tongue body moves up to make a constriction against the velum, high in the back of the
mouth (k, g).
- Uvular: the constriction is further back. Begin with a k or g, then move the tongue backwards.
- Pharyngeal: the constriction is made deep in the throat, with the tongue root moving toward the
pharyngeal wall. Both uvular and pharyngeal consonants are found in Arabic and Hebrew.
- Laryngeal: consonants can be made with only the larynx as articulator.
No single language makes consonants using all of the places of articulation.
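Voicing, place and manner together give each consonant a three-part label. As a rough sketch, a small table of consonants can be turned into such labels. The Consonant type and the function names are made up for illustration, and the table covers only a hand-picked subset of English consonants.

```python
from typing import NamedTuple


class Consonant(NamedTuple):
    voiced: bool
    place: str
    manner: str


# Hand-picked subset of English consonants, not a complete inventory.
CONSONANTS = {
    "p": Consonant(False, "bilabial", "stop"),
    "b": Consonant(True, "bilabial", "stop"),
    "f": Consonant(False, "labiodental", "fricative"),
    "v": Consonant(True, "labiodental", "fricative"),
    "t": Consonant(False, "alveolar", "stop"),
    "d": Consonant(True, "alveolar", "stop"),
    "s": Consonant(False, "alveolar", "fricative"),
    "z": Consonant(True, "alveolar", "fricative"),
    "ʃ": Consonant(False, "postalveolar", "fricative"),
    "k": Consonant(False, "velar", "stop"),
    "g": Consonant(True, "velar", "stop"),
}


def describe(symbol: str) -> str:
    """Build the voicing-place-manner label for a consonant in the table."""
    c = CONSONANTS[symbol]
    voicing = "voiced" if c.voiced else "voiceless"
    return f"{voicing} {c.place} {c.manner}"


def differ_only_in_voicing(a: str, b: str) -> bool:
    """True if two consonants share place and manner but not voicing."""
    ca, cb = CONSONANTS[a], CONSONANTS[b]
    return ca.place == cb.place and ca.manner == cb.manner and ca.voiced != cb.voiced


print(describe("z"))
print(differ_only_in_voicing("p", "b"))
```

Pairs such as p/b or s/z differ only in voicing, which is why they share a cell in the IPA chart.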
The English language uses more than a dozen different vowel sounds, and vowel quality also differs
per dialect. We can classify vowels by the highest point of the tongue during the vowel. The
tongue body moves up for the high vowels (ɪ, i, ɨ, u, ʊ), down for the low vowels (a, æ) and stays in
the middle for the mid vowels (e, ɛ, o, ə, ʌ, ɔ). The tongue moves forward in the mouth for the front
vowels (ɪ, i, e, æ, ɛ) and backward for the back vowels (u, o, ɔ, a, ʊ). Vowels also differ in lip
rounding. The back vowels are rounded (except a); all the other ones are unrounded. Tense vowels
(i, e, o, u) are longer and slightly higher. Lax ones are (ɪ, ə, ɔ, ʊ).
ə - schwa, the ‘uh’ sound
ð - th sound, as in ‘brother’
ʃ - sh sound, as in ‘shop’
ʒ - si sound, as in ‘decision’
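The height and backness classes of the vowels can be combined into a two-part label per vowel. This is a minimal sketch using the vowel lists from the notes. Labelling vowels that appear in neither the front nor the back list as "central" is an assumption made here, and the function name classify_vowel is illustrative.

```python
# Vowel classes as listed in the notes (ɪ is the lax high front vowel).
HEIGHT = {
    "high": ["ɪ", "i", "ɨ", "u", "ʊ"],
    "mid": ["e", "ɛ", "o", "ə", "ʌ", "ɔ"],
    "low": ["a", "æ"],
}
BACKNESS = {
    "front": ["ɪ", "i", "e", "æ", "ɛ"],
    "back": ["u", "o", "ɔ", "a", "ʊ"],
}


def classify_vowel(vowel: str) -> tuple:
    """Return (height, backness) for a vowel symbol from the lists above."""
    height = next((h for h, vowels in HEIGHT.items() if vowel in vowels), None)
    if height is None:
        raise ValueError(f"unknown vowel: {vowel!r}")
    # Assumption: vowels listed as neither front nor back are treated as central.
    backness = next((b for b, vowels in BACKNESS.items() if vowel in vowels), "central")
    return height, backness


for v in ["i", "ɔ", "ə", "æ"]:
    print(v, classify_vowel(v))
```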
Speaking involves stringing sounds together into larger units. Aspects that influence stretches of sound
larger than a single segment are called suprasegmentals:
- Length: some differences in vowel length are unintentional results of how different vowels are
articulated. Low vowels (mouth wide open) take longer to articulate than high vowels, for which little
movement is necessary. In some languages, two segments may differ in length alone: some are
intentionally held longer, like aa, pp (= aː, pː). Long consonants are called geminates.
- Tone and intonation: the pitch of a voice tells us a lot about the speaker: female or male, old or
young, etc. High pitch can tell us that someone is frightened, low pitch that someone is angry. Tone is
the use of pitch to convey meaning at the word level; intonation is the use of pitch to convey meaning
at the sentence or discourse level. With intonation, the pitch differences indicate only the role that
the reference to the object (e.g. a feline) is playing in the conversation. Most languages also use
pitch to distinguish different words: tone. The majority of the world’s languages are tonal.
- Syllable structure: sonority is the relative openness of the vocal tract, which corresponds directly to
the relative loudness of a sound. Low vowels are the most sonorous, voiceless stops the least sonorous.
A syllable can be defined as a way of organizing sounds around a peak of sonority. The most sonorous
element of a syllable, the peak itself, is the nucleus. Lower sonority sounds before the nucleus are
called the onset, those following the nucleus, the coda. The nucleus and coda together form the
rhyme. Sonority doesn’t account for everything.
- Stress: a prominence relation between syllables: certain syllables are longer, louder or more clearly
articulated. Some languages don’t use stress. A foot is a grouping of a stressed syllable and
neighboring unstressed syllables. Languages in which stress is completely predictable are called fixed
stress systems. In other languages stress is unpredictable and you just have to memorize the stress
patterns: these are lexical stress systems. Paradigmatic stress is a system in which the stress patterns
depend on what part of speech a word is (verb, noun).
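The syllable structure described in the list above (a sonority peak as nucleus, with onset before it and coda after it) can be sketched for a single syllable. Only the two ends of the scale come from the notes (voiceless stops lowest, low vowels highest). The intermediate ranks and the names SONORITY and split_syllable are assumptions made for illustration.

```python
# Illustrative sonority ranks: higher means more sonorous.
# Only the endpoints (voiceless stops, low vowels) are taken from the notes.
SONORITY = {
    "p": 1, "t": 1, "k": 1,   # voiceless stops
    "b": 2, "d": 2, "g": 2,   # voiced stops
    "f": 3, "s": 3,           # voiceless fricatives
    "v": 4, "z": 4,           # voiced fricatives
    "m": 5, "n": 5,           # nasals
    "l": 6, "r": 6,           # laterals and rhotics
    "j": 7, "w": 7,           # glides
    "i": 8, "u": 8,           # high vowels
    "e": 9, "o": 9,           # mid vowels
    "a": 10,                  # low vowel
}


def split_syllable(segments):
    """Split one syllable into onset, nucleus, coda and rhyme around its sonority peak."""
    segments = list(segments)
    peak = max(range(len(segments)), key=lambda i: SONORITY[segments[i]])
    onset = segments[:peak]
    nucleus = segments[peak]
    coda = segments[peak + 1:]
    return {"onset": onset, "nucleus": nucleus, "coda": coda, "rhyme": [nucleus] + coda}


print(split_syllable("plant"))
```

This handles one syllable at a time and, as the notes say, sonority does not account for everything, so real syllabification needs more than this peak search.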
Acoustic phonetics
Speech sounds are caused by moving air. Articulation is all about getting air to move in ways that can
be heard. Different objects have different frequencies of vibration, which determine the pitch of the
sound. Faster vibration means a higher-pitched sound. The air particles follow the same back and forth
vibration. These moving patterns of vibration are called waves. When the sound waves reach our ears,
they set the eardrum vibrating according to the same pattern. Inside our ears, the vibrations set off
nerve impulses, which are interpreted by our brains as sound.
Speech can be compared to playing a wind instrument. The vocal folds are the reed, the source of the
vibration. The column of air in the mouth is the instrument, the filter of the vibration. And the
speaker is the musician, changing the shape of the column of air to produce different sounds.
Thinking of speech like this is the source-filter theory of speech production. As air passes out of the
trachea and over the vocal folds, the folds begin to vibrate. They flap open and closed at a frequency
between 100 and 300 Hz. On top of the basic flapping movement, there are many smaller ripples
in the moving vocal folds. These ripples create harmonics. The basic rate of vibration, the fundamental
frequency, determines the pitch, but the harmonics create the different qualities of different sounds.
As the vocal folds vibrate, the air in the vocal tract vibrates in the same way. The air in the vocal tract
filters the harmonic structure of the sound. Differently shaped bodies of air tend to vibrate at
different frequencies. Harmonics that are in tune with those frequencies are amplified; those that are
not in tune are reduced. The speaker controls the filter by moving tongue and lips, amplifying some
harmonics and blocking out others. The most strongly amplified frequencies are called formants.
Different vowels have different formant structures. So, depending on the shape of tongue and lips,
each vowel sound has a characteristic, complex pattern of vibration. The vibration moves out of the
lips, into the world. The sound waves travel through the air (at about 340 m/s) until they impinge on a
membrane tuned to receive them.
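The arithmetic behind the source-filter picture can be sketched in a few lines. Harmonics lie at whole-number multiples of the fundamental frequency (a standard acoustic fact, not spelled out in the notes), and the harmonic closest to a formant is the one the filter boosts most. The formant values used below are hypothetical, and the function names are made up for illustration.

```python
SPEED_OF_SOUND = 340.0  # m/s, the approximate value given in the notes


def harmonics(f0, max_freq):
    """Whole-number multiples of the fundamental frequency f0, up to max_freq (Hz)."""
    return [f0 * n for n in range(1, int(max_freq // f0) + 1)]


def nearest_harmonic(formant, f0):
    """The harmonic of f0 that lies closest to a formant frequency (Hz)."""
    n = max(1, round(formant / f0))
    return n * f0


def wavelength(freq):
    """Wavelength in metres of a sound wave with the given frequency (Hz)."""
    return SPEED_OF_SOUND / freq


f0 = 100  # fundamental frequency, lower end of the range in the notes
print(harmonics(f0, 500))
# Hypothetical formant frequencies, chosen only for illustration.
for formant in (730, 1090):
    print(formant, "Hz formant boosts the harmonic at", nearest_harmonic(formant, f0), "Hz")
print(wavelength(170), "m")
```

Changing f0 changes the pitch and shifts every harmonic, while the formants stay where the shape of the vocal tract puts them.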
The ear
- The outer ear: consists of the visible shell of the ear (pinna) and the ear canal leading to the eardrum.
The pinna helps to capture sounds and to locate them in space. The ear canal protects the eardrum and
amplifies sounds that are relevant for speech.
- The middle ear: the eardrum vibrates when sound waves come down the ear canal. Behind the eardrum
are 3 bones: the ossicles. Transfer through the middle ear helps to amplify very soft sounds and tone
down very loud ones.
- The inner ear: the vibration travels through the ossicles to the cochlea. Here, the hearing takes
place. The cochlea is divided into a lower and an upper chamber, filled with fluid. The cochlear
membrane separates the 2 chambers and is 3 mm thick, bony at one end and thin at the other. All along
the membrane are tiny hair cells, the cilia, each attached to a nerve ending and waving in the cochlear
fluid.
The chain is: the eardrum vibrates, the ossicles vibrate, the oval window (the membrane of the inner
ear) vibrates, and this sets up vibration in the fluid of the inner ear. The vibration patterns in the
fluid mirror those of the sound created in the vocal tract of the speaker. In response to a given
vibration, particular cilia become active, sending signals to the brain about the frequencies in the
incoming sound wave. The brain turns the frequencies into sounds.
Recall that objects of different sizes tend to vibrate at different frequencies; this holds for the
structures in the ear as well.
For measuring speech, ears are the most reliable. Vibrations can be displayed with oscilloscopes and
sound spectrographs. Now, speech analysis is done by computer, after analog-to-digital conversion. The
computer can show a waveform: the varying amplitude of the vibrations is plotted over time. Vowels have
the greatest amplitude, obstruents the least, and nasals and unstressed vowels are intermediate. It can
also display a pitch track: a frequency/time figure that shows the fundamental frequency. Neither of
these figures tells us about the quality of vowel sounds, or about the vocal tract shape that made them.
In a spectrogram, the computer teases apart the component frequencies of the sound wave, also in a
frequency/time figure. A dark bar at a certain frequency means that frequency is strongly represented in
the sound. Each vowel has a pattern of two or three most prominent frequencies (formants) above the
fundamental frequency of the vocal folds.
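To make "teasing apart the component frequencies" concrete, the sketch below builds a short digitized waveform from three sine waves and recovers their frequencies with a plain discrete Fourier transform. This gives a single spectrum; a real spectrogram repeats the same analysis on successive short stretches of the signal. The sample rate, the component frequencies and amplitudes, and the function names are assumptions chosen for illustration.

```python
import cmath
import math

SAMPLE_RATE = 8000  # samples per second (assumed)
N_SAMPLES = 400     # 0.05 s of signal, giving a frequency resolution of 20 Hz


def synthesize(components):
    """Digitized waveform: a sum of sine waves given as (frequency_hz, amplitude) pairs."""
    return [
        sum(a * math.sin(2 * math.pi * f * t / SAMPLE_RATE) for f, a in components)
        for t in range(N_SAMPLES)
    ]


def spectrum(signal):
    """Map each analysis frequency (Hz) to its strength, using a plain DFT."""
    n = len(signal)
    result = {}
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        result[k * SAMPLE_RATE / n] = abs(total) / n
    return result


def strongest(spec, count):
    """The `count` most strongly represented frequencies, in ascending order."""
    return sorted(sorted(spec, key=spec.get, reverse=True)[:count])


# A 200 Hz fundamental plus two of its harmonics, with made-up amplitudes.
signal = synthesize([(200, 1.0), (400, 0.3), (600, 0.6)])
spec = spectrum(signal)
print(strongest(spec, 3))
```

The waveform alone only shows amplitude over time; the spectrum shows which frequencies carry the energy, which is the information a dark bar in a spectrogram represents.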
Phonology
Pairs of words that differ in only a single sound in the same position within the words are called
minimal pairs (den-then). The difference between those sounds is contrastive: change one sound and