Genetic factors in vocabulary development – Interview with Dr. Ellen Verhoef


Dr. Ellen Verhoef was a PhD student at the Max Planck Institute for Psycholinguistics. She defended her thesis, entitled ‘Why do we change how we speak? Multivariate genetic analyses of language and related traits across development and disorder’, on March 5, 2021.

What was the main question in your dissertation?

With my PhD work I wanted to broaden our knowledge of the genetic factors that contribute to differences in language abilities in the first few years of life. I explored whether I could link such genetic factors with subsequent language, literacy and cognitive development, and also investigated the genetic overlap of these traits with neurodevelopmental disorders, such as autism and attention-deficit/hyperactivity disorder.

Can you explain the (theoretical) background a bit more?

When we look at language development in children, we often distinguish between speaking and understanding. Skills related to spoken language and language understanding start to develop in utero, and rapidly evolve during the first few years of life. Infants usually understand their first words between the ages of six and nine months. The first spoken word often emerges a bit later, around the first birthday.

There are large differences between children with regard to their language development. Some children understand or can produce more words than others at a certain age. When we study a large group of children, we see that the number of words children speak at a certain age tells us something about how many words they can produce at a later age. For example, a child who speaks relatively few words at the age of one year also tends to speak relatively few words at a later age. The number of words children speak at a very young age can even tell us something about children’s language, literacy and cognitive abilities in middle childhood (7-13 years). For any parents reading, it is important to note that these predictions only describe general patterns observed when studying groups of many children at the same time. They do not provide enough precision to predict how well any one child will be able to read at the age of nine years based on the number of words they speak at the age of two years.

Children not only differ in their language abilities, they also differ in the genetic code stored in their cells (DNA), which we can collect, for example, via saliva or blood. Our DNA consists of millions of building blocks, and every building block in which people may differ from one another is called a DNA variant. Some DNA variants occur in only very few people, whereas others are more common. We define a DNA variant as common if it occurs with a frequency of at least 1% in the European population. Previous research has shown that all common DNA variants together can account for part (13-14%) of the differences in young children’s vocabulary. We also know that relationships between early word production and later language and reading skills can be partially explained by DNA variants.

Why is it important to answer this question?

It is important to know more about the biological and genetic bases of language because language is such an important aspect of our lives. Early-life language skills are partly predictive of later language, literacy and cognitive skills. A better understanding of the biology underlying such links may, eventually, allow for earlier identification of children who are at risk for language delays. This could lead to interventions at an earlier age, so that language difficulties may have a smaller impact on their later lives.

Can you tell us about one particular project? (question, method, outcome)

In one project I studied whether DNA variants that are associated with the number of words children speak at three years of age also have an effect on the number of words children understand at three years of age, and on thirteen language and literacy skills assessed between middle childhood and early adolescence.

For this research, I used an advanced statistical technique that combines a well-established method known as “structural equation modelling” with genetic information from DNA variants. This technique makes it possible to examine genetic relationships between multiple language- and literacy-related skills within one analysis. With other techniques, we can only study two skills at the same time. Each genetic factor identified with this technique captures the effects of multiple DNA variants together.

From these analyses we discovered that genetic factors associated with the number of words three-year-olds can speak and understand also have an impact on children’s language and especially literacy skills in middle childhood. Interestingly, the largest effects were observed for a genetic factor related to word understanding and not word production. This distinction is relevant as there are, so far, few genetic studies available on early word understanding. Additional studies are required to investigate why understanding seems especially important for later performance.

What was your most interesting/important finding?

My most interesting finding was that DNA variants that affect differences in vocabulary size during the first few years of life also have an impact on language- and literacy-related skills later in life.

What are the consequences/implications of this finding? How does this push science or society forward?

These findings suggest that genetic information that impacts later language and literacy skills is already reflected in a measure of vocabulary administered years earlier in life. In particular, the role of early-life word understanding was not known before. We need to perform additional studies to investigate the exact genes and mechanisms involved in these relationships over development.

What do you want to do next?

I am currently a postdoctoral researcher at the MPI and will continue this line of research into other aspects of early development that are related to language, including motor and social skills. In addition, I would like to identify which sets of genes contribute to links between vocabulary, language and literacy skills across development in order to get a better understanding of how genes influence the development of our brains, and ultimately, the development of our language system.

Interviewers: Julia Egger, Merel Wolf
Editing: Merel Wolf
Dutch translation: Lynn Eekhof
German translation: Lynn Eekhof
Final editing: Eva Poort