Showing Visually What Your Speech Sounds Like

When working on pronunciation and accent modification, I give learners as much information as possible to describe sounds and suprasegmental patterns: verbal explanations, physical demonstrations, auditory models, visual cues, gestural enhancements. I tap the desk, clap my hands, breathe deeply, wave my hands in the air, dance around the room; I use mirrors, rubber bands, strips of paper, straws, and tongue depressors. But sometimes learners just can't get beyond the theory to make adequate changes in producing the invisible aspects of prosody: stress, rhythm, linking, and phrasing.

Showing spectrograms of speech, whether theirs alongside mine or different renditions of my own, is helpful for learners with a strong visual learning modality. Using low-cost or free software, such as Apple GarageBand or Sourceforge Audacity, I record speech samples. In addition to creating sound files to be played, I take screenshots of the sound tracks that can be viewed while listening to the sound files.

A spectrogram lets us see the mountain peaks and valleys. The varying levels of amplitude – the height and width of the mountains – give information about how loudly, how strongly, and how long a syllable was pronounced. The depth and width of the valleys give information about phrasing, pause groups, hesitations, and linking. From the visual representation, we can see how the word stress, phrase stress, rhythm, phrasing, and voice modulation of one speaker (or one rendition of a speaker) differ from those of another.
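
For readers who like a little scripting, the same peaks and valleys can be measured rather than only eyeballed. The following is a minimal Python sketch, not a description of how Audacity or GarageBand draw their displays. The function names, the frame size, and the 0.05 loudness threshold are all illustrative assumptions, and the "speech" here is a synthetic tone standing in for a real recording.

```python
import math

def rms_envelope(samples, frame_size):
    """Return one RMS loudness value per non-overlapping frame of samples."""
    envelope = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        envelope.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return envelope

def find_valleys(envelope, threshold):
    """Return (first_frame, last_frame) pairs where loudness stays below threshold."""
    valleys = []
    start = None
    for i, level in enumerate(envelope):
        if level < threshold:
            if start is None:
                start = i          # a valley begins
        elif start is not None:
            valleys.append((start, i - 1))  # the valley just ended
            start = None
    if start is not None:          # recording ended inside a valley
        valleys.append((start, len(envelope) - 1))
    return valleys

def tone(amplitude, n, rate=8000, freq=200):
    """Synthetic stand-in for a syllable: a sine tone of n samples."""
    return [amplitude * math.sin(2 * math.pi * freq * t / rate) for t in range(n)]

# A loud "stressed syllable", a pause, then a quieter "reduced syllable".
samples = tone(0.8, 1600) + [0.0] * 800 + tone(0.3, 1600)

envelope = rms_envelope(samples, frame_size=400)   # 10 frames of 50 ms each
valleys = find_valleys(envelope, threshold=0.05)   # assumed silence threshold

print(valleys)                    # [(4, 5)]: the pause spans frames 4 and 5
print(envelope[0] > envelope[6])  # True: the first "syllable" is the taller mountain
```

With a real recording, the threshold and frame size would need tuning by ear and eye; background noise in particular raises the floor of the valleys.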

In this example, I recorded a story, "Father's Idea of Fun," from my book, Phrase by Phrase Pronunciation in American English. On one track I recorded a very monotonous rendition, inspired by suprasegmental features in the voice recordings made by some of my students. On a second track, I recorded a more expressive rendition, applying prosodic elements intended to serve as a model.

With the spectrograms of the two tracks juxtaposed, the difference in rhythm is visible. In version one, syllable and word length hardly vary, whereas in version two the longer stressed syllables are interspersed with short, weaker, reduced syllables and words. The key words (focus words) in version two are evident from the higher mountains, whereas the flat speech of version one appears as a collection of much shorter hills.

You can listen to the sound file in stereo, with both tracks playing at the same time. Alternatively, you can use your computer’s sound control panel to listen to each track individually. Listen to the left channel separately and look at the top track in the figures below. Listen to the right channel separately and look at the bottom track in the figures. Or, using stereo earbuds, use one earbud at a time to listen to each track.
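
If the sound control panel is awkward to use, another option is to split the stereo file into two separate mono files, one per rendition. Here is a minimal sketch using Python's standard `wave` module. It assumes the recording has been exported as an uncompressed two-channel WAV file; the function name `split_stereo` and the file names in the usage note are hypothetical.

```python
import wave

def split_stereo(stereo_path, left_path, right_path):
    """Write the left and right channels of a stereo WAV file as two mono WAV files."""
    with wave.open(stereo_path, "rb") as src:
        if src.getnchannels() != 2:
            raise ValueError("expected a two-channel (stereo) WAV file")
        width = src.getsampwidth()   # bytes per sample
        rate = src.getframerate()    # samples per second
        frames = src.readframes(src.getnframes())

    # Stereo WAV data is interleaved: left sample, right sample, left, right, ...
    step = 2 * width
    left = b"".join(frames[i:i + width] for i in range(0, len(frames), step))
    right = b"".join(frames[i + width:i + step] for i in range(0, len(frames), step))

    for path, data in ((left_path, left), (right_path, right)):
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(width)
            dst.setframerate(rate)
            dst.writeframes(data)
```

A call such as `split_stereo("story.wav", "monotone.wav", "expressive.wav")` would then give learners one file for each track.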





The spectrograms above were made with Audacity; the ones below were made with GarageBand. They show shorter utterances and compare the speech of a fluent but accented French speaker of English with my own.

Although there is certainly no single right way to say a sentence, we can enhance our learners' understanding of how their stress, rhythm, and phrase patterns sound to our ears by letting them see how their speech looks to our eyes. Using sound recording software and pictures of the spectrograms, we can prepare both auditory and visual models for them to work toward.

