Introducing a New General Service List

Introduction

In ELT, especially with beginner learners, a key question is what general vocabulary they need to know. One place to start answering this question is vocabulary lists. One of the most famous is West’s General Service List (GSL) from 1953 (though based on an Interim Report on Vocabulary Selection from 1936). However, it has often been criticised over the years, for example for the contradictory principles behind its creation and for its limited utility. There is now a general consensus that the GSL is too outdated to serve as a basis for decisions on ELT vocabulary instruction. Therefore, this study proposes a New General Service List, compiled on the basis of four language corpora with a total of over 12 billion running words. ‘The New GSL is conceived of as a list of the most frequent English vocabulary [in British English] suitable for both receptive and productive use, primarily intended for beginner learners.’

Article: Brezina, V. & D. Gablasova (2015), ‘Is There a Core General Vocabulary? Introducing the New General Service List’, Applied Linguistics, Vol. 36/1, pp. 1-22.

This article explains how Brezina and Gablasova compiled their New GSL and the reasoning behind some of their decisions. Understanding the linguistic background to such lists can help teachers make informed choices when using them to plan curricula or design materials.

As well as being more up-to-date, Brezina & Gablasova’s New GSL differs from West’s list in another key way: it takes lemmas (i.e. headwords plus inflectional forms of the same word class) as lexical units, rather than word families (i.e. headword plus inflectional, derivative and nominal forms). The authors give the example of ‘develop’ to illustrate this:

Headword: develop
Lemma: develops, developed, developing
Word family: develops, developed, developing, undeveloped, underdeveloped, development, developments, developer, developers
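To make the difference concrete, here is a minimal Python sketch of the two groupings, using only the forms from the ‘develop’ example above. The dictionary layout and the function name `forms_only_in_family` are illustrative assumptions, not something taken from the article.

```python
# Forms from the 'develop' example; the data layout is an illustrative assumption.
LEMMA = {"develop": ["develop", "develops", "developed", "developing"]}

WORD_FAMILY = {
    "develop": [
        "develop", "develops", "developed", "developing",
        "undeveloped", "underdeveloped",
        "development", "developments",
        "developer", "developers",
    ]
}

def forms_only_in_family(headword):
    """Return the forms a word-family entry adds beyond the lemma."""
    lemma_forms = set(LEMMA[headword])
    return [form for form in WORD_FAMILY[headword] if form not in lemma_forms]

extra = forms_only_in_family("develop")
print(extra)
```

The six forms returned are exactly those that a word-family list assumes a learner can work out from ‘develop’. A lemma-based list makes no such assumption, so a form like ‘development’ would only appear if it earned its own place on the list.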

Brezina & Gablasova suggest that this makes their New GSL more usable in ELT, as it avoids the problems of assuming that (1) the meaning of a derived word is ‘largely transparent and can be understood on the basis of the knowledge of the individual morphological components’ and (2) learners have sufficient morphological skills to connect the meanings of derived words to the meaning of the headword. They also say that working with lemmas enabled them to limit the scope of their New GSL to the most frequent vocabulary items more precisely than would be possible with word families.

Research

This New GSL was based on an analysis of four major corpora: The Lancaster-Oslo-Bergen Corpus (LOB), The British National Corpus (BNC), The BE06 Corpus of British English (BE06), and EnTenTen12. The selection of the lemmas for inclusion in the New GSL was based on three main criteria: (1) word frequency, (2) dispersion, and (3) stability of a lexical item across different corpora (i.e. its use and frequency is stable across a range of written and spoken contexts).

The main part of Brezina & Gablasova’s research consisted of these steps:

  1. Creation of wordlists based on the four corpora (LOB, BNC, BE06, and EnTenTen12).
  2. Comparison of wordlists pairwise (RQ1).
  3. Identification of a common lexical core among the four wordlists and extraction of the shared items (RQ2).
  4. Identification of lexical items reflecting recent vocabulary changes in the English language based on BE06 and EnTenTen12 (RQ3).
  5. Creation of the New GSL.
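The five steps above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the four ‘corpora’ are invented token lists, the list size is 4 rather than thousands, and items are ranked by raw frequency alone, so the dispersion and stability criteria used in the study are not modelled. The names `CORPORA`, `LIST_SIZE` and `make_wordlist` are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Toy stand-ins for lemmatised corpora (hypothetical data, not real counts).
CORPORA = {
    "LOB": "the the the be be time time make make letter".split(),
    "BNC": "the the the be be time time people people make".split(),
    "BE06": "the the the be be time time website website people".split(),
    "EnTenTen12": "the the the be be website website email email time".split(),
}
LIST_SIZE = 4  # the study's lists are far longer; 4 keeps the toy readable

def make_wordlist(tokens, size):
    """Step 1: rank lemmas by raw frequency (ties broken alphabetically)."""
    counts = Counter(tokens)
    ranked = sorted(counts, key=lambda w: (-counts[w], w))
    return set(ranked[:size])

wordlists = {name: make_wordlist(toks, LIST_SIZE) for name, toks in CORPORA.items()}

# Step 2: pairwise comparison, as the share of items two lists have in common.
pairwise = {
    (a, b): len(wordlists[a] & wordlists[b]) / LIST_SIZE
    for a, b in combinations(wordlists, 2)
}

# Step 3: the common core is whatever appears in all four lists.
core = set.intersection(*wordlists.values())

# Step 4: items shared by the two recent corpora but absent from the older two.
recent = (wordlists["BE06"] & wordlists["EnTenTen12"]) - (
    wordlists["LOB"] | wordlists["BNC"]
)

# Step 5: the final list combines both groups.
new_list = core | recent

print(sorted(core))      # ['be', 'the']
print(sorted(recent))    # ['website']
print(sorted(new_list))  # ['be', 'the', 'website']
```

Treating each wordlist as a set keeps steps 2 to 5 down to plain intersections, unions and differences, which is also a handy way for teachers to compare any two vocabulary lists they already have.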

The lexical core common to all four wordlists showed almost 71 per cent overlap. A comparison of the two most recent corpora (BE06-3000 and EnTenTen-3000) showed new frequently used words that did not occur in the other two. These were new words (e.g. email, website), new meanings/functions of old words (e.g. user, network, mobile), or old words with recent prominence (e.g. medium, kid). The New GSL presented here is thus composed of 2494 items: 2116 from the common lexical core across all four corpora, and 378 representing recent lexical development.
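The figures in this paragraph can be checked with some quick arithmetic. The item counts come from the article as summarised above; the wordlist size of 3000 is an assumption, inferred from the labels ‘BE06-3000’ and ‘EnTenTen-3000’.

```python
core_items = 2116      # shared by all four wordlists (figure from the article)
recent_items = 378     # items reflecting recent lexical development (from the article)
wordlist_size = 3000   # assumption, inferred from labels such as "BE06-3000"

total = core_items + recent_items
core_share = core_items / wordlist_size

print(total)                # 2494
print(f"{core_share:.1%}")  # 70.5%
```

If each wordlist does contain 3000 items, a shared core of 2116 works out at about 70.5 per cent, which is consistent with the ‘almost 71 per cent’ overlap reported.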

Comparison to West and AWL

Brezina & Gablasova’s New GSL is around 40% shorter than West’s list, mainly due to the decision to use lemmas instead of word families as lexical units. Even so, their core of under 3000 lemmas covers around 80% of the text in the four corpora.
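Coverage here means the share of running words in a text that are on the list. A minimal sketch of that calculation, assuming the text has already been lemmatised, might look like this. The sample sentence, the tiny wordlist and the function name `coverage` are all hypothetical, and the authors’ own procedure may differ in detail.

```python
def coverage(tokens, wordlist):
    """Share of running words (tokens) in a text whose lemma is on the list."""
    if not tokens:
        return 0.0
    covered = sum(1 for token in tokens if token in wordlist)
    return covered / len(tokens)

# Hypothetical, already lemmatised sample text and a tiny stand-in wordlist.
sample = "the team develop a new website for the school".split()
tiny_list = {"the", "a", "new", "for", "team", "develop", "website"}

print(coverage(sample, tiny_list))
```

Note that coverage counts every occurrence of a word, so a short list of very frequent lemmas can cover a large share of a text even though it contains only a small fraction of the different words in it.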

The majority of items from the New GSL’s first 1,000 words are included in West’s first 1,000 word families. However, the largest proportion of the New GSL’s second 1,000 words also overlaps with West’s first 1,000 word families, which may indicate that West’s first 1,000 word families include some less frequent derivational or nominal forms of high-frequency words.

Brezina & Gablasova also compare their New GSL to Coxhead’s (2000) AWL, as this is an extension based on West’s GSL. This comparison highlights what other researchers have previously criticised, namely that some words in the AWL would perhaps better be categorised as general vocabulary rather than academic vocabulary. For example, 97 items from the New GSL’s first 1,000 words occurred in the specialized AWL, such as ‘couple, image, team, computer, area, individual, environment, and job.’

Also, 178 of the New GSL’s items are included in neither West’s GSL nor the AWL, especially words representing modern colloquial language use and referring to new technologies and developments. This again highlights how outdated West’s list is.

Conclusion

Brezina & Gablasova’s research indicates that there might really be a stable common core of vocabulary items which it would be advisable for English learners to know.

For those involved in linguistics research, the study demonstrates a transparent methodology, which would also allow for extensions of this New GSL, although some limitations are mentioned by the authors.

For ELT teachers and materials writers, this wordlist can provide a sound basis for decisions on vocabulary instruction. As an appendix to their article, Brezina & Gablasova provide their New GSL as a Word document. It displays the 2494 words in a practical layout that shows the word classes and frequency bands of the lexical items. You can find a copy here: new_GSL_alphabetical.

 

References
  • Brezina, V. & D. Gablasova (2015), ‘Is There a Core General Vocabulary? Introducing the New General Service List’, Applied Linguistics, Vol. 36/1, pp. 1-22.
  • Coxhead, A. (2000), ‘A new academic word list’, TESOL Quarterly, Vol. 34/2, pp. 213-38.
  • West, M. (1953), A General Service List of English Words: with Semantic Frequencies and a Supplementary Word-List for the Writing of Popular Science and Technology (Longman).
Clare Maas
Lecturer in EFL and EAP at Trier University (Germany)
Clare holds post-graduate qualifications from the University of Wales and Trinity College London. Before moving into tertiary education, she taught English at German grammar schools, and English for Specific Purposes at several language academies in the UK and Germany. Her professional interests include EAP materials development and CPD for teachers. She also blogs at ClaresELTCompendium.wordpress.com.

2 thoughts on “Introducing a New General Service List”

  1. readers may note that there is a developing website that can be used to profile vocab in text amongst other features [http://corpora.lancs.ac.uk/vocab/]
    also worth pointing out that there is another wordlist called the New General Service List or NGSL [http://www.newgeneralservicelist.org/] which uses word families; hence both the new-GSL (which Clare wrote about) and the NGSL can be seen as complementary depending on whether you want the advantages of lemmas or word families
    ta
    mura

    1. Thanks, Mura! I decided to summarise this article because I think it’s interesting for teachers who use such lists to see how they are compiled – only then can they really decide whether they, as you say, want the advantages of lemmas or word families! 🙂
