Gap-fill, Sentence Writing or Composition – Which Task Leads to Better Vocabulary Learning?

Di Zou (The Education University of Hong Kong) investigates the effectiveness of three different vocabulary tasks and comes up with interesting conclusions. But things may not be as straightforward as they seem.


L2 vocabulary researchers generally agree that there is a correlation between the level of engagement with a vocabulary learning task and vocabulary retention. In other words, the more mental effort a task requires, the higher the chances that the new words will be remembered. To conceptualise mental effort, two well-known L2 vocabulary researchers, Batia Laufer and Jan Hulstijn, proposed the Involvement Load Hypothesis (2001). According to this hypothesis, the amount of involvement in a vocabulary task can be measured in terms of three factors: need, search and evaluation. A number of studies have tried to test the hypothesis by comparing the effectiveness of different vocabulary tasks, for example writing sentences with new words vs. a cloze exercise (Keating, 2008), or sentence writing vs. sentence completion combined with dictionary consultation (Laufer, 2003). These studies have yielded somewhat contradictory results. Keating's study, carried out with Spanish beginner learners of English, found that a sentence writing task was more involving (i.e. it induced a higher involvement load) than a cloze exercise, but it also took longer to complete. Conversely, a 2006 study by Keith Folse (famous for his book Vocabulary Myths) found that completing a series of cloze tasks (three times) was more effective than writing sentences with new words.

The Study

In a recent special issue of Language Teaching Research (guest-edited by Batia Laufer herself), Di Zou puts the Involvement Load Hypothesis to the test again, comparing the effectiveness of:

  • Cloze exercise
  • Sentence writing
  • Composition writing

About 30-40 participants – students at a university in Hong Kong, all non-English majors – were assigned to one of three groups according to the task types above. All three groups were given a list of 10 target words with glosses based on dictionary definitions. Group 1 was given a text on the topic of procrastination with the target words gapped out. Group 2 had to write sentences with the new words. Group 3 also had to use the new words in writing, but had to produce a coherent composition containing all the target words. Note that Groups 2 and 3 were not given any text.

An unannounced test was administered to all the participants at the end of the experiment, and another one (a 'delayed post-test') a week later, in order to measure how many words the participants had learned. These quantitative measures were supplemented by self-reporting, both during the tasks ('think-aloud' protocols) and after completing them (interviews), in order to probe deeper into the strategies the students used while doing the tasks.


Participants in the composition writing group got the highest scores on the test (15.9), followed by the sentence writing group (12.3), with the cloze exercise group scoring lowest (8.3). Predictably, the delayed post-test yielded slightly lower scores in all three groups, but the order remained the same, with the composition writing group in the lead.


The researcher discusses the results in light of the cognitive processes involved in encoding information, such as chunking (not chunking of the Michael Lewis kind, i.e. multi-word units) and hierarchical organisation. She shows how the participants doing the composition task had to structure new information in a meaningful way in order to produce a coherent piece of writing. To that end, they had to relate the new words to each other as well as to the chosen macro-context. Thus, the involvement load was highest in this group. Conversely, students writing sentences with the new words didn't have to associate the words with each other because they wrote isolated sentences; they only had to create micro-contexts for the individual target words. Finally, the gap-fill group wasn't involved in any systematic organisation of information; in fact, some of them didn't even try to make sense of the text and focused only on the sentences with blanks.

Personal Thoughts

Despite the optimistic reporting of the results, I have some reservations about the author's claim that gap-fills are a less effective way of learning new words. Many of the sentences produced by the participants – especially in the composition group with its (seemingly) superior results – are full of miscollocations and inappropriacies:

When a disaster happens, life is indispensable rather than money or power.

Jack got seriously drunk and divulged Linda’s privacy to others

(target words are underlined)

Vocabulary learning involves not only learning the meaning and form (spelling and pronunciation) of a new word, but also the ability to use the word and knowledge of the restrictions on its use. Admittedly, remembering the meaning and form is the first step in the word learning process, whereas the ability to use a word develops gradually over time (and developing it was not the aim of the study). However, getting students to write original sentences or a composition with new words without first focusing on how the words should be used is nothing short of setting them up for failure. Not only may this lead to students forming unhelpful primings, it is also demanding for the teacher, who will then need to give students feedback on their writing, specifically with regard to usage and collocation. If almost every student-produced sentence with a new word is a mangled mess, individual feedback will be time-consuming for the teacher and potentially frustrating for students.

This time would be better spent producing a simple fill-in-the-blank exercise by taking example sentences from a good learners' dictionary and deleting the target words. Such exercises are easier to check and, as Folse (2006) points out, "students will always end up with a correct English example sentence to study". In sum, I wouldn't ditch gap-fill exercises and rush into more productive activities, such as writing sentences with new words, in the initial stages of learning, especially with 'difficult' higher-level words such as those in this study.


Folse, K. S. (2006). The effect of type of written exercise on L2 vocabulary retention. TESOL Quarterly, 40(2), 273-293.

Keating, G. D. (2008). Task effectiveness and word learning in a second language: The involvement load hypothesis on trial. Language Teaching Research, 12(3), 365-386.

Laufer, B. (2003). Vocabulary acquisition in a second language: Do learners really acquire most vocabulary by reading? Some empirical evidence. Canadian Modern Language Review, 59(4), 567-587.

Laufer, B., & Hulstijn, J. (2001). Incidental vocabulary acquisition in a second language: The construct of task-induced involvement. Applied Linguistics, 22(1), 1-26.

Zou, D. (2017). Vocabulary acquisition through cloze exercises, sentence-writing and composition-writing: Extending the evaluation component of the involvement load hypothesis. Language Teaching Research, 21(1), 54-75.

Leo Selivan
(Soon-to-be) author and blogger at Leoxicon
Leo started his teaching career with the British Council in Tel Aviv, where he taught adult and teen learners before moving into materials and course development, and teacher training. As a teacher trainer, he delivered professional development workshops all over the Eastern Mediterranean and the Caucasus. Currently, Leo is a freelance lecturer working with both pre- and in-service teachers.

20 thoughts on “Gap-fill, Sentence Writing or Composition – Which Task Leads to Better Vocabulary Learning?”

  1. There’s a similar article, exploring the Laufer / Hulstijn hypothesis, in the context of vocabulary-focused post-reading tasks, in the latest volume of ‘System’. The results are not conclusive in any way.
    I'm wondering about the extent to which this hypothesis can be meaningfully explored. Besides the task type itself, there are so many variables: not least the learnability of the item, which will depend, among other things, on individual learner differences and the classroom context (e.g. which task types the learners are most exposed to, how their work is evaluated). So … is it possible for technicist research of this kind to tell us anything useful? Can it tell us anything that is actionable in language teaching? The hypothesis makes intuitive sense, but is it falsifiable? Perhaps not. And, if not, what is the point of such research?

    1. Hi, Phil.

      I must admit that it was your post on the OUP blogsite that prompted me to look into this matter, particularly this line: “a gap-fill repeated a number of times is likely to lead to more learning in the same amount of time than a more creative or imaginative exercise”. So I’m happy this summary has caught your eye!

      Just like you, I’m quite fond of gap-fills, but, as someone who’s willing to explore and experiment, I have tried getting students to study new items out of context on their own and then write sentences with them. It was far from successful – particularly with higher levels / more difficult words – and I ended up re-teaching the target items.

      Thank you for the inspiration and this comment.

      P.S. For other readers, Phil’s post can be found here:

  2. Thanks for reporting on that, Leo. It's really good to hear that others (yourself and Philip) also feel rather skeptical about this kind of vocabulary research. So often I find myself reading vocab research and shouting at the page "Yes, but what about …?!" As Philip says, and as you point out in this case, there always seem to be so many factors that aren't taken into account, to the point where I often feel the research becomes pretty meaningless. I do wonder if it's even possible to design research into vocab acquisition that can really tell us anything useful and generalizable. I guess I'll just have to keep on reading … and shouting at the page …

    1. Thank you, Julie. I often feel like you when reading research articles. I don't know if you've seen my article in Modern English Teacher earlier this year about the applicability and usefulness of some SLA research – it was full of 'What about…?' and 'So what?' moments. Nevertheless, there's a lot of good research, and I don't want to be completely dismissive of the study at hand. I think her interpretation of the superior results in the Composition group is convincing and, combined with other studies exploring involvement load, helps us better understand the complicated process of L2 vocabulary learning.

  3. I may be missing something, but why would one assume that the three different activity types are mutually exclusive? I would think that they could be put on a (more or less) receptive to productive continuum. Agreed, there is not always time to work through the full progression, and learning is inevitably messy so learners will make mistakes at all levels, but isn’t there value in learning from mistakes as well?

    I also agree wholeheartedly with Philip Kerr – I don’t really understand how this study controlled for the variables he mentioned, as well as others, although maybe that is clarified in the article itself.

    1. Hi, Randi. Good to have you here!

      Absolutely, they are not mutually exclusive, and I agree with your progression / continuum, where you first work on receptive knowledge using gap-fills (though technically gap-fills can also be used for productive practice if you don’t supply a word bank and learners have to retrieve the target items from memory) and then slowly move to more challenging tasks, in which learners have to use target items in their own contexts.

      As regards controlling for variables, it’s quite hard with an experimental design like this (3 groups – 3 treatments), especially with such variables as individual learner differences and, possibly, what Philip referred to as the “classroom context”, don’t you think?

  4. Hello all,
    From what I can understand, Joe Barcroft's TOPRA model seems quite promising? In a 2017 paper he explains how levels of processing theory (on which the involvement load hypothesis draws) can't be extended "unqualified" from its L1 results into the L2 domain; he argues that his TOPRA model, which separates the semantic, structural and mapping (form-meaning connection) components of vocabulary learning, provides a better way to theorise word learning that can lead to more robust teaching applications (though actual experiments using TOPRA have yet to use classroom-type instruction).

    1. Thank you, Mura.
      I’m familiar with some of Barcroft’s research but not this paper – it’s very new, perhaps you should write a summary for here 🙂

      Knowing his other work I’m not surprised that his model would compartmentalize different aspects of word knowledge. Let’s see if subsequent research investigating the model yields promising results.

      P.S. I can’t even imagine what TOPRA stands for…

  5. Thank you for this thought-provoking article! I totally agree with you about the difficulties in having students produce their own sentences using new vocabulary words. If they don’t fully understand a particular word yet, then they obviously won’t know how to use it correctly in a sentence. So why ask them to do something they don’t yet know how to do?

    Personally, in my experience both as a language learner and teacher, I’ve found that learners need lots of input before they are able to produce output. In other words, when it comes to learning vocabulary, students need to hear or read a particular word in meaningful contexts many, many times. After hearing it used many, many times, the students are eventually able to use the word on their own. I wrote more about this on my blog, in case you’re interested:

    Thanks for your post!

    1. I’m glad you’ve found it thought-provoking, Allison!
      I wouldn’t say that being able to use a new word is contingent on fully understanding its meaning; it depends more on knowing its collocations, grammatical patterns etc. I do agree with you though that A LOT of repeated encounters are necessary before new language becomes part of the learners’ productive repertoire.

      Thank you for your comment and the link – I’ll check out your post.

  6. There is much to be said for testing with several sentences containing the target word and gapping different words, e.g. its collocates, prepositions, co-text that construes prosody, etc., and even "half gapping" its affixes. There is also much to be said for gapping paragraphs rather than isolated, unrelated sentences. There is also much to be said for teaching different features of the same word at different times. A test of the effectiveness of gap-fills could be profitably extended to these alternatives.

  7. Hi Leo

    As a part-time EFL teacher with no time for serious research I’m very happy when useful posts like this pop up on Facebook. Thank you to all researchers!

    I agree with you that we shouldn’t rush into creative tasks when learning something new. I tend to be approached by immigrant workers who have been unhappy in large classes and want 1:1. Typically, they speak with enthusiasm but are low in confidence when it comes to grammar and accuracy. Also, each has had a unique learning journey that means that we could be plugging a gap from A1 today, and working on a B1 skill next week. Sometimes they need to learn a particular function (for work reasons) and I can’t just assume that all the relevant foundations have been laid. I’ve found that my clients’ learning is more successful if they’re given controlled, supported tasks before being asked to do something creative like writing sentences. Gap-fills, along with “put the words in the right order”, and taking turns with me (with questions I always take turns both asking and answering), help to build their confidence. As you quoted, they end up with correct English example sentences that they can study – helpful when responding to the challenge of composition.

    Coincidentally, last night a new client actually raised this subject and thanked me for my approach. She also liked the way I had colour-coded words to make the text easier to understand. Bells began to ring in my head. If you had dyslexia, even if you were very intelligent you might need to receive information in smaller, easy-to-read chunks, and to have a little more thinking time to process it. Going straight into creative writing could put some learners at a disadvantage compared to their peers.

    This is hardly a scientific sample but I’d be interested to know if it reflects the experience of other teachers in similar situations. What my (admittedly, limited) experience tells me to do is this: SCAFFOLD, scaffold, then remove the remaining supports but be ready to help if necessary.

    1. Hi Rebecca,

      It seems that your classroom observations are in agreement with others, for example Randi’s earlier comment about the receptive to productive continuum and Allison’s comment that learners need a lot of exposure before they can produce new items.

      Just to reiterate, I’m not totally against more creative tasks which push learners towards more productive use and require them to stretch their resources (perhaps, not at the first encounter though). But I think these tasks should be given after some focus on the use and not just the meaning of target items. In other words, yes, they should be scaffolded.

      I’m glad you found this post useful and hope you come back to visit ELT Research Bites.

  8. I do not agree.

    Were the example sentences:
    “When a disaster happens, life is indispensable rather than money or power.
    Jack got seriously drunk and divulged Linda’s privacy to others”
    a result of the test or of the productive sentence creation treatment? I currently do not have access to the paper, but will receive it shortly.

    If the treatment, one should assume that a productive task like sentence creation will generate errors. That is generally a purpose of a task; a focus on form following the task is meant to catch and correct the errors. Use of this technique does lead to more rapid learning.

    Gap-fill is typically a focus on form. It is a type of rote learning. Although both techniques do work, task-based learning has proven to be a more effective means of acquiring a language.

    1. Thank you for the comment, Jake.

      It is an interesting take on the results: a task (composition writing) as opposed to a rote-learning exercise (gap-fill), although the study does not conceptualise these two techniques that way. But leaving aside the debate over whether tasks are superior to exercises, students in this experiment were effectively asked to do what they had not been taught: to USE the words when they had only been taught their MEANINGS, via glosses that did not include example sentences. I do not question that this may lead to more rapid learning and reinforcement of the form-meaning link, but the amount of time a teacher will need to spend correcting all the lexical inappropriacies the learners produce afterwards will simply negate the effect.

      The main point I was trying to make is that gap-fills, if designed appropriately, can help focus not only on the meaning but also use / co-text / collocations and, if used repeatedly, can result in better learning (like in Folse’s study). And, of course, they don’t have to be mechanical, rote exercises and can be approached creatively, as James suggested above.

      To answer your question, the above sentences were produced in the course of the experiment, during the composition writing stage.

      Let me know if you have some further thoughts after you read the article.

  9. Niovi Hatzinikolaou
    Very interesting indeed, and thanks for sharing! I agree that sentence writing can often be counterproductive, as students are often not familiar with the usage. In my experience, sentence writing works well with elementary students, as the vocabulary is quite simple at this level. I have noticed that more advanced students regularly misuse words when trying to provide a context for them. This is because noticing of a word's typical context, its collocations etc. is minimal, even in the gap-filling exercises that are present in all coursebooks. The question, then, is how we can make students aware of how words are used. It's certainly not fun, and it's extremely time-consuming, to have them focus on every single word they encounter and talk about its usage.

    1. Hello Niovi,

      I’m pretty much on the same page with you, particularly regarding different approaches needed for elementary and post-intermediate learners. Clarifying usage may seem time-consuming but it’s time well spent. With higher-level learners it should be done at the same time as clarifying meaning because less frequent words (such as the ones in the study) tend to be restricted to certain contexts and occur with a limited range of collocates.

      To illustrate using one of the target items in the study, I would teach "renege" together with "promise(s)", which, according to the Corpus of Contemporary American English (COCA), most commonly follows "renege": "renege on (your) promise". Similarly, teaching "divulge" without its common collocates – information, secret, details – is, in my opinion, denying the learner an important aspect of knowing the word.

      Thank you for taking the time to comment.
