The melody of speech


In spoken language, it’s not only the words we say that matter, but also how we say them. When we speak, we constantly produce small changes in the tone of our voice, how loudly we speak, or when we decide to pause for a bit (the technical term for these aspects of speech is prosody). How we speak can convey how we feel about something, but it can also affect the meaning of our words. As we will see next, the same sentence can have different meanings, just by changing how we say it.

Speech melody and meaning go hand in hand

In a recent post on Talkling, Sophie Slaats explained how ChatGPT can have trouble interpreting certain sentences. The example sentence “Rose saw the man with the binoculars” can be interpreted in two ways: it can mean either that Rose is looking through binoculars or that Rose sees a man who is carrying them. By changing how we say the sentence, we can make it clear which of these two interpretations we mean. It turns out that when speakers want to convey that Rose has the binoculars, they briefly pause between “man” and “with the…”. Before the pause, the tone of their voice goes up a little. However, if they instead want to say that the man was carrying the binoculars, they pause briefly between “saw” and “the man…”, again with a slight rise in tone before the pause. In this way, speakers use the way they say things to clarify which words in a sentence belong together. Listeners, in turn, know very well how to interpret these pauses and differences in tone, and so they understand which words in the sentence are grouped together. The words of the two sentences are exactly the same; it is the melody of speech that creates the difference in meaning. Put differently, the melody and rhythm of speech directly influence the grammatical structure that our conversational partner pieces together in their mind.
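
The grouping effect described above can be made concrete with a toy sketch. It simply splits the example sentence at the word that precedes the pause, so the two groupings can be compared side by side. The function name `group_by_pause` is made up for this illustration, and the sketch is not a model of how listeners actually process speech.

```python
def group_by_pause(sentence, pause_after):
    """Split a sentence into two groups at the word that precedes the pause.

    Hypothetical helper for illustration only.
    """
    words = sentence.split()
    cut = words.index(pause_after) + 1  # the pause falls right after this word
    return " ".join(words[:cut]), " ".join(words[cut:])


sentence = "Rose saw the man with the binoculars"

# Pause after "man": "with the binoculars" stands apart, so Rose has them.
rose_has_them = group_by_pause(sentence, "man")

# Pause after "saw": "the man with the binoculars" stays together.
man_has_them = group_by_pause(sentence, "saw")

print(rose_has_them)
print(man_has_them)
```

The same words end up in different groups depending only on where the pause falls, which is the point of the example.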

Differences between speakers

While humans are really good at using their voice to communicate, they do this in very different ways. Because every person’s vocal tract is unique, there are big differences in how high or low and how loudly or quietly someone speaks. On top of that, we all know people who speak very melodiously, whereas others speak in a tone that stays largely flat. This means that when we interpret someone’s speech, we have to be quite flexible to take all of this into account. For example, when we listen to an utterance in isolation, it may be quite difficult to know whether someone meant it as a question or a statement. Let’s say we compared the way your neighbour asks the question “You’re leaving?” with the way your sibling states “You’re leaving!” If we looked at the measurements of the tone height and duration of those two examples (your neighbour’s question and your sibling’s statement), they could actually be exactly the same because of the differences between speakers. Judging by just those two physical properties of speech (tone height and speech duration), it is sometimes impossible to know whether what we just heard was meant as a statement or a question. So how do listeners solve this issue? Researchers think that as soon as you hear someone speak, you quickly collect information about how their sentences sound on average. Because you know what that speaker usually sounds like when they state something or ask a question, you quickly figure out whether what you just heard was likely to be one or the other. Just as well! When we mistake “You’re leaving?” for “You’re leaving.”, chances are that the conversation can still be saved. But imagine that, instead of hearing The editor thought: “This writer is silly”, you arrive at the interpretation “The editor,” thought this writer, “is silly.”
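
The idea of judging an utterance against what a speaker usually sounds like can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the method researchers use. The pitch values are invented, the function names are hypothetical, and the rule "higher than usual means question" is a deliberate simplification.

```python
from statistics import mean, stdev


def relative_to_speaker(value, speaker_history):
    """Express a measurement relative to what is typical for this speaker."""
    return (value - mean(speaker_history)) / stdev(speaker_history)


def guess_utterance_type(pitch, speaker_history):
    """Crude assumption: higher than the speaker's average suggests a question."""
    if relative_to_speaker(pitch, speaker_history) > 0:
        return "question"
    return "statement"


# Invented pitch measurements in Hz, for illustration only.
neighbour_history = [180, 190, 200, 210, 220]  # speaker with a lower voice
sibling_history = [240, 250, 260, 270, 280]    # speaker with a higher voice

same_raw_pitch = 230  # identical measurement for both utterances

neighbour_guess = guess_utterance_type(same_raw_pitch, neighbour_history)
sibling_guess = guess_utterance_type(same_raw_pitch, sibling_history)

print(neighbour_guess)
print(sibling_guess)
```

With these made-up numbers, the same raw measurement is high for the neighbour and low for the sibling, so it is read as a question from one and a statement from the other.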