Use a Language Corpus to Improve Your Vocabulary


22 July 2023

A language corpus is a huge collection of texts taken from print, video, audio and other sources. It is a powerful tool that can help you learn more about English words and phrases.

In today's article, we look at the most widely used corpus: the Corpus of Contemporary American English, or COCA.

COCA contains more than one billion words from 1990 to 2019. And it has many ways to help learners find out more about how words are used.

Screenshot of Corpus of Contemporary American English, captured on July 20, 2023.

Take the adjective profound as an example. In a dictionary, you will find that one meaning of profound is the adjective deep. So, you might think English speakers will say that a body of water is profound.

But English speakers almost never say that.

A corpus like COCA shows that the most common words that follow profound are effect and ability. And the most common words that follow the adjective deep are breath and water. For example, we say:

Her teaching has a profound effect on students' learning.

You need to be a strong swimmer to swim in deep water.

COCA makes it easy to read and hear examples of how to use vocabulary correctly.

How to use COCA

First, you will need to register with COCA to use the collection. You can register for two kinds of accounts. A free account limits you to 20 searches for words within any 24-hour period. A paid account will give you 200 searches within any 24-hour period.

Here are the steps for using COCA to find examples of how a word is used. We will use the word profound to show this.

  • On the home page, type the word profound in the white search box on the left side of the screen.
  • Click "find matching strings" just below the search box. This will take you to a new webpage.
  • On this new page, you will see the word profound in blue. Click on it.

  • You will now see a page with 100 examples of the word profound, shown in green. The source of each example is shown on the left side.
  • If you click on the source on the left, COCA will show you a larger part of the text containing the example.
  • If you want to see more examples, click the two forward page arrows >> at the top or bottom left. This brings you to the next 100 examples. In this way, you may find examples from more kinds of sources.

COCA also has a tool that lets you see many useful things about a word on one page. We will show this with the word deep.

  • On the COCA homepage, click on "Word" in blue just above the white search box.
  • Type the word deep in the search box.
  • Click on "See detailed info for the word" just below the search box.
  • Now you will see a page with definitions, audio pronunciation, links to translations, synonyms, links to videos with the word, and more. This page is called the "Word homepage".

  • Scroll down the page. You will see another long list of example sentences, with words in many colors. The adjective deep is in green and the nouns that follow it are in blue.

You will see that in these examples, the noun breath occurs most often after the adjective deep.

You may wonder: how can we know for sure that the most common words that follow deep are breath and water?

Scroll back to the top of the "Word homepage" for deep. On the right, you will see the word COLLOCATES in blue. (A collocation is a word or phrase that is often used with another word or phrase.) Clicking on COLLOCATES will show you lists of words that follow deep. And you will see that breath and water are the first two words on the list.

Why the corpus is helpful

Knowing which words sound best together can be hard for people learning English. A corpus can be helpful because it shows you the ways that words are used together.

See if you can learn to use COCA by searching for the adjectives utter and diverse. Utter is both a verb and an adjective, so you must tell COCA to search only for adjectives. Here is how to do that:

  • Just to the right of the white search box, click on the [POS] letters in grey.
  • Next, click on POS again and then select "adj. ALL". This makes COCA search only for adjectives.
  • In the search box, delete the letters ADJ and type the word you want to search.

Now follow the steps shown above for profound and deep, but this time, use the words utter and diverse.

If you find on the "Word homepage" for utter that COCA has searched for the verb form, you can change it to the adjective. To do this, click on ADJ in blue just below the word utter in the top left area of the page.

When you perform a search, what do you notice about the nouns that follow utter? About the nouns that follow diverse?

Write to us and let us know what you found.

Learning how to use a corpus can take some time. COCA has a guide showing you the many ways to use it. You can also find tutorials on YouTube.

Just remember that a corpus is a tool that can help you deepen your understanding of vocabulary and the ways to use it.

In a future Education Tips article, we will explain more about the importance of collocations and how a corpus can help you learn them.

I'm Andrew Smith. And I'm Jill Robbins.

Andrew Smith wrote this article for VOA Learning English.

____________________________________________________

Words in This Story

corpus -n. a large collection of spoken or written texts that are used for language research.

pronunciation -n. the way a word or phrase is pronounced.

translation -n. the meaning of a word in a different language.

synonym -n. a word that means the same as, or is very close in meaning to, another word.