Gap-fill, Sentence Writing or Composition – Which Task Leads to Better Vocabulary Learning?

Di Zou (The Education University of Hong Kong) investigates the effectiveness of three different vocabulary tasks and comes up with interesting conclusions. But things may not be as straightforward as they seem.


L2 vocabulary researchers generally agree that there is a correlation between the level of engagement with the vocabulary learning task and vocabulary retention. In other words, the more mental effort is required to complete the task, the higher the chances that the new words will be remembered. In order to conceptualise mental effort, two well-known L2 vocabulary researchers, Batia Laufer and Jan Hulstijn, proposed the Involvement Load Hypothesis (2001), according to which the amount of involvement in a vocabulary task can be measured according to three factors: need, search and evaluation. A number of studies have tried to test the hypothesis by comparing the effectiveness of vocabulary tasks, for example: writing sentences with new words vs. a cloze exercise (Keating, 2008) or sentence writing vs. sentence completion combined with dictionary consultation (Laufer, 2003). These studies have yielded somewhat contradictory results. Keating’s study carried out with Spanish beginner learners of English found that a sentence writing task was more involving (i.e. it induced a higher involvement load) than a cloze exercise, but also took longer to complete. Conversely, the 2006 study conducted by Keith Folse (famous for his book Vocabulary Myths) found that completing a series of cloze tasks (3 times) was more effective than writing sentences with new words.

The Study

In a recent special issue of Language Teaching Research (guest-edited by Batia Laufer herself), Di Zou puts the Involvement Load Hypothesis to the test again comparing the effectiveness of:

  • Cloze exercise
  • Sentence writing
  • Composition writing

About 30-40 participants – students at a university in Hong Kong, all non-English majors – were assigned to one of the three groups according to the task type above. All three groups were provided with a list of 10 target words and their glosses based on dictionary definitions. Group 1 was given a text on the topic of procrastination with target words gapped out. Group 2 had to write sentences with new words. Similarly, Group 3 was asked to use new words in writing but had to produce a coherent composition containing all target words. Note that Groups 2 and 3 were not given any text.

An unexpected test was administered to all the participants at the end of the experiment and another one (‘delayed post-test’) a week later in order to measure how many words were learned by the participants. These quantitative measures were supplemented by self-reporting – both during the tasks (‘think-aloud’ protocols) and after completion of the task (interviews) – in order to probe deeper into the strategies used by the students while completing the tasks.


Participants in the composition writing group got the highest scores on the test (15.9), followed by the sentence writing group (12.3) and the cloze exercise group with the lowest score (8.3). Predictably, the delayed post-test yielded slightly lower scores in all three groups but the order remained the same with the composition writing group in the lead.


The researcher discusses the results in light of cognitive processes involved in encoding information such as chunking (not chunking of the Michael Lewis kind, i.e. multi-word units) and hierarchical organisation. She shows how the participants doing the composition task had to structure new information in a meaningful way in order to produce a coherent piece of writing. To that effect they had to relate the new words to each other as well as to the chosen macro-context. Thus, in this group the involvement load was the highest. Conversely, students writing sentences with new words didn’t have to associate words with each other because they had to write isolated sentences – they only had to create micro-contexts for the individual target words. Finally, the gap-fill group wasn’t involved in systematic organisation of information; in fact, some of them didn’t even try to make sense of the text, focusing only on the sentences with blanks.

Personal Thoughts

Despite the optimistic reporting of the results I have some reservations about the author’s claim that gap-fills are a less effective form of learning new words. Many of the sentences produced by the participants – especially in the composition group with (seemingly) superior results – are full of miscollocations and inappropriacies:

When a disaster happens, life is indispensable rather than money or power.

Jack got seriously drunk and divulged Linda’s privacy to others

(target words are underlined)

Vocabulary learning involves not only learning the meaning and form (spelling & pronunciation) of a new word, but also being able to use the word and knowing the restrictions on its use. Admittedly, remembering the meaning and form is the first step in the word learning process whereas the ability to use it develops gradually over time (and it was not the aim of the study). However, getting students to write original sentences or a composition with new words without first focusing on how the words should be used is nothing short of setting them up for failure. Not only may this lead to students forming unhelpful primings, it is also demanding on the teacher, who will then need to give students feedback on their writing, specifically with regard to usage and collocation. If almost every student-produced sentence with a new word is a mangled mess, individual feedback will be time consuming for the teacher and potentially frustrating for students.

This time would be better spent on producing a simple fill-in-the-blank exercise by taking examples from a good learners’ dictionary (see here) and deleting the target words. Such exercises are easier to check and, as Folse (2006) points out, “students will always end up with a correct English example sentence to study”. In sum, I wouldn’t ditch gap-fill exercises and rush into more productive activities such as writing sentences with new words in the initial stages of learning, especially with ‘difficult’ higher-level words such as those in this study.
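
For teachers who are comfortable with a little scripting, the "take example sentences and delete the target words" step can be sketched in Python. This is only an illustration under stated assumptions: the function name make_gap_fill and the blank marker are hypothetical, the two example sentences are made up here (they come neither from the study nor from any dictionary), and the matching is deliberately crude, treating any word that starts with the target item as an inflected form of it.

```python
import re


def make_gap_fill(sentences, target_words, blank="_____"):
    """Return (gapped_sentence, answers) pairs for a list of example sentences.

    Assumption: any word beginning with a target item (e.g. "divulged" for
    "divulge") counts as a form of that item. Irregular forms are not handled.
    """
    items = []
    for sentence in sentences:
        gapped = sentence
        answers = []
        for word in target_words:
            # Whole-word match on the target plus any trailing letters.
            pattern = re.compile(r"\b" + re.escape(word) + r"\w*", re.IGNORECASE)
            # Record the removed forms (grouped by target word) for the answer key.
            answers.extend(pattern.findall(gapped))
            gapped = pattern.sub(blank, gapped)
        items.append((gapped, answers))
    return items


# Made-up example sentences, for illustration only.
examples = [
    "She refused to divulge any details of the deal.",
    "The government reneged on its promise to cut taxes.",
]

for gapped, answers in make_gap_fill(examples, ["divulge", "renege"]):
    print(gapped, "| answer:", ", ".join(answers))
```

Because the collocates (details, promise) stay in the sentence while only the target word is removed, learners still end up with a correct example sentence to study once the gap is filled.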


24 thoughts on “Gap-fill, Sentence Writing or Composition – Which Task Leads to Better Vocabulary Learning?”

  1. Prof. Batia Laufer has tried commenting here but her comment didn’t go through (twice), so she asked me to post it on her behalf.

    Hello all,

    Thanks, Leo, for sending me the link to your paper. I can see it generated a lively discussion. As I am one of the two people responsible for the Involvement Load Hypothesis, I feel I should comment on some of the comments related to the hypothesis.

    Regarding uncontrolled variables – in our paper on ILH we specifically say that the hypothesis predicts task effectiveness when other variables are held constant, e.g. word difficulty, learner proficiency, number of encounters with the word, etc. Folse compared 3 gap fills with one sentence writing, i.e. introduced another variable into the comparison. This is not what we had in mind when we suggested the theory. (Some variations, like within group differences, are taken care of by appropriate statistics).

    Another assumption about task effectiveness is that a task given to learners can be performed. If they are unable to write sentences, let alone compositions, they shouldn’t do it. We cannot expect something to be effective if it is not done properly.

    And a comment about research papers in general. Some research is good, but not very useful for teaching, some is not very good (Philip mentioned a paper in System 2016. It is methodologically flawed and the results cannot be trusted…). But there is a lot of good research that has important implications for teaching and curriculum. The problem is that it is often difficult to distinguish between these types of studies.

    Finally, a word on TOPRA (type of processing resource allocation) model mentioned in the discussion. I’d be extremely careful with applying the findings of the studies to teaching. The results supporting TOPRA are based on laboratory-like experiments (e.g. look at a word and write a sentence in 24 seconds) and on tasks that teachers hardly use in real life (e.g. count the letters in a word, think of how pleasant the referent of the word is).



  2. Thanks for your comments Leo,

    This is a very interesting discussion. I have read Zou’s article now and still concur with her conclusions. As a simplification, sentence composition is mentally more demanding than gap fill, so the initial productive task is more challenging. The greater challenge leads to a greater number of errors. However, I believe “noticing” is also occurring at a higher level in composition. So, when errors are corrected they will be noticed at a deeper level with composition than with gap fill. The resulting memory retention should also be higher.

    As has been noted, the multiplicity of controlled and uncontrolled variables in this sort of research can obscure meaningful results. Knowing that leads me back to first principles in SLA and to my original conclusion that activities (in this case composition) that simulate tasks will result in more acquisition. These are variables that I can control.

    1. As a followup to your comment above. I agree that good gap fills are easier to create, for students to complete and for instructors to mark. They are less time consuming. Good tasks are also easy to create, but more demanding to complete. The ultimate goal remains the same: Correct production/comprehension of a lexical item. My specific point is it will take fewer tasks to achieve the same goal.

      1. Thank you for revisiting this discussion, Jake.

        Absolutely. ‘Noticing’ will occur if the learners are later shown a good model or asked to compare their sentences with example sentences in a dictionary (so the time spent on that should also be taken into account).

        I also agree that the goal is the correct comprehension/production, and I’m all for ‘pushed output’ – pushing learners to try out and experiment with new lexical items. But I believe this should be done in more controlled conditions in the initial stages of learning. My personal experience and the observations of other commenters on here (see, for example, Niovi) show that giving learners free production tasks with new lexical items too soon will often result in miscollocations and inappropriacies or, in Folse’s (2006) words, “a word heap”. This might not be the case with beginners when target words are pretty basic, more generic and have wider collocability but it becomes more prominent with post-intermediate learners when the nature of new words demands a more careful study of usage, restrictions on their usage, collocations, patterns etc. Interestingly, I came to similar conclusions when investigating something else (contextualised vs decontextualised teaching) – I blogged about it a long time ago (you can see it HERE).

        I do see your point about fewer but more demanding tasks (as opposed to a series of shorter, less demanding tasks) but, all things being equal, we should also think about potential frustration for the teacher correcting and learners receiving feedback on their failed attempts to use new lexis.

  3. Niovi Hatzinikolaou
    Very interesting indeed and thanks for sharing! I agree that often sentence writing can be counterproductive as students are often not familiar with the usage. From my experience, sentence writing works well with elementary students as the vocabulary is quite simple at this level. I have noticed that more advanced students regularly misuse words when trying to provide a context for them. This is because noticing of the typical context of a word, its collocations etc is minimal even in gap filling exercises which are present in all coursebooks. The question then is how can we make students aware of how words are used? It’s certainly not fun and extremely time consuming to have them focus on every single word they encounter and talk about their usage.

    1. Hello Niovi,

      I’m pretty much on the same page with you, particularly regarding different approaches needed for elementary and post-intermediate learners. Clarifying usage may seem time-consuming but it’s time well spent. With higher-level learners it should be done at the same time as clarifying meaning because less frequent words (such as the ones in the study) tend to be restricted to certain contexts and occur with a limited range of collocates.

      To illustrate using one of the target items in the study, I would teach “renege” together with “promise(s)” which, according to the Corpus of Contemporary American English (COCA) most commonly follows “renege”: “renege on (your) promise”. Similarly, teaching “divulge” without its common collocates – information, secret, details – is, in my opinion, denying the learner an important aspect of knowing the word.

      Thank you for taking the time to comment.

  4. I do not agree.

    Were the example sentences:
    “When a disaster happens, life is indispensable rather than money or power.
    Jack got seriously drunk and divulged Linda’s privacy to others”
    a result of the test or of the productive sentence creation treatment? I currently do not have access to the paper, but will receive it shortly.

    If the treatment, one should assume that a productive task like sentence creation will generate errors. That is generally a purpose of a task: a focus on form following a task is meant to catch and correct the error. Use of this technique does lead to more rapid learning.

    Gap fill is typically a focus on form. It is a type of rote learning. Although both techniques do work, task based has proven to be a more effective means to acquire a language.

    1. Thank you for the comment, Jake.

      It is an interesting take on the results: a task (composition writing) as opposed to a rote-learning exercise (gap-fill), although the study does not conceptualise these two techniques that way. But leaving aside the debate of whether tasks are superior to exercises, students in this experiment were effectively asked to do what they have not been taught – to USE the words when they were only taught their MEANINGS via glosses which did not include example sentences. I do not question the fact that this may lead to more rapid learning and reinforcement of the form-meaning link but the amount of time a teacher will need to spend correcting all the lexical inappropriacies produced by learners afterwards will simply negate the effect.

      The main point I was trying to make is that gap-fills, if designed appropriately, can help focus not only on the meaning but also use / co-text / collocations and, if used repeatedly, can result in better learning (like in Folse’s study). And, of course, they don’t have to be mechanical, rote exercises and can be approached creatively, as James suggested above.

      To answer your question, the above sentences were produced in the course of the experiment, during the composition writing stage.

      Let me know if you have some further thoughts after you read the article.

  5. Hi Leo

    As a part-time EFL teacher with no time for serious research I’m very happy when useful posts like this pop up on Facebook. Thank you to all researchers!

    I agree with you that we shouldn’t rush into creative tasks when learning something new. I tend to be approached by immigrant workers who have been unhappy in large classes and want 1:1. Typically, they speak with enthusiasm but are low in confidence when it comes to grammar and accuracy. Also, each has had a unique learning journey that means that we could be plugging a gap from A1 today, and working on a B1 skill next week. Sometimes they need to learn a particular function (for work reasons) and I can’t just assume that all the relevant foundations have been laid. I’ve found that my clients’ learning is more successful if they’re given controlled, supported tasks before being asked to do something creative like writing sentences. Gap-fills, along with “put the words in the right order”, and taking turns with me (with questions I always take turns both asking and answering), help to build their confidence. As you quoted, they end up with correct English example sentences that they can study – helpful when responding to the challenge of composition.

    Coincidentally, last night a new client actually raised this subject and thanked me for my approach. She also liked the way I had colour-coded words to make the text easier to understand. Bells began to ring in my head. If you had dyslexia, even if you were very intelligent you might need to receive information in smaller, easy-to-read chunks, and to have a little more thinking time to process it. Going straight into creative writing could put some learners at a disadvantage compared to their peers.

    This is hardly a scientific sample but I’d be interested to know if it reflects the experience of other teachers in similar situations. What my (admittedly, limited) experience tells me to do is this: SCAFFOLD, scaffold, then remove the remaining supports but be ready to help if necessary.

    1. Hi Rebecca,

      It seems that your classroom observations are in agreement with others, for example Randi’s earlier comment about the receptive to productive continuum and Allison’s comment that learners need a lot of exposure before they can produce new items.

      Just to reiterate, I’m not totally against more creative tasks which push learners towards more productive use and require them to stretch their resources (perhaps, not at the first encounter though). But I think these tasks should be given after some focus on the use and not just the meaning of target items. In other words, yes, they should be scaffolded.

      I’m glad you found this post useful and hope you come back to visit ELT Research Bites.

  6. There is much to be said for testing with several sentences containing the target word and gapping different words, e.g. its collocates, prepositions, co-text that construes prosody, etc. and even “half gapping” its affixes. There is also much to be said for gapping paragraphs, rather than isolated, unrelated sentences. There is also much to be said for teaching different features of the same word at different times. A test of the effectiveness of gapfills could be profitably extended to these alternatives.

  7. Thank you for this thought-provoking article! I totally agree with you about the difficulties in having students produce their own sentences using new vocabulary words. If they don’t fully understand a particular word yet, then they obviously won’t know how to use it correctly in a sentence. So why ask them to do something they don’t yet know how to do?

    Personally, in my experience both as a language learner and teacher, I’ve found that learners need lots of input before they are able to produce output. In other words, when it comes to learning vocabulary, students need to hear or read a particular word in meaningful contexts many, many times. After hearing it used many, many times, the students are eventually able to use the word on their own. I wrote more about this on my blog, in case you’re interested:

    Thanks for your post!

    1. I’m glad you’ve found it thought-provoking, Allison!
      I wouldn’t say that being able to use a new word is contingent on fully understanding its meaning; it depends more on knowing its collocations, grammatical patterns etc. I do agree with you though that A LOT of repeated encounters are necessary before new language becomes part of the learners’ productive repertoire.

      Thank you for your comment and the link – I’ll check out your post.

  8. hello all
    from what i can understand Joe Barcroft’s TOPRA model seems quite promising? in a 2017 paper he explains how levels of processing theory (on which involvement load hypothesis draws influence) can’t be extended “unqualified” from its L1 results into L2 domain; he argues his TOPRA model which separates semantic, structural and mapping (form-meaning connections) components of vocabulary learning provides a better way to theorise word learning that can lead to more robust teaching applications (though actual experiments using TOPRA have yet to use classroom type instructions)

    1. Thank you, Mura.
      I’m familiar with some of Barcroft’s research but not this paper – it’s very new, perhaps you should write a summary for here 🙂

      Knowing his other work I’m not surprised that his model would compartmentalize different aspects of word knowledge. Let’s see if subsequent research investigating the model yields promising results.

      P.S. I can’t even imagine what TOPRA stands for…

  9. I may be missing something, but why would one assume that the three different activity types are mutually exclusive? I would think that they could be put on a (more or less) receptive to productive continuum. Agreed, there is not always time to work through the full progression, and learning is inevitably messy so learners will make mistakes at all levels, but isn’t there value in learning from mistakes as well?

    I also agree wholeheartedly with Philip Kerr – I don’t really understand how this study controlled for the variables he mentioned, as well as others, although maybe that is clarified in the article itself.

    1. Hi, Randi. Good to have you here!

      Absolutely, they are not mutually exclusive, and I agree with your progression / continuum, where you first work on receptive knowledge using gap-fills (though technically gap-fills can also be used for productive practice if you don’t supply a word bank and learners have to retrieve the target items from memory) and then slowly move to more challenging tasks, in which learners have to use target items in their own contexts.

      As regards controlling for variables, it’s quite hard with an experimental design like this (3 groups – 3 treatments), especially with such variables as individual learner differences and, possibly, what Philip referred to as the “classroom context”, don’t you think?

  10. Thanks for reporting on that, Leo. It’s really good to hear that others (yourself and Philip) also feel rather skeptical about this kind of vocabulary research. So often I find myself reading vocab research and shouting at the page “Yes, but what about …?!” As Philip says, and as you point out in this case, there always seem to be so many factors that aren’t taken into account, to the point where I often feel the research becomes pretty meaningless. I do wonder if it’s even possible to design research into vocab acquisition that can really tell us anything useful and generalizable. I guess I’ll just have to keep on reading … and shouting at the page …

    1. Thank you, Julie. I often feel like you when reading research articles. I don’t know if you’ve seen my article in Modern English Teacher earlier this year about the applicability and usefulness of some SLA research – it was full of ‘What about…?’ and ‘So what?’ moments. Nevertheless, there’s a lot of good research, and I don’t want to be completely dismissive of the study in hand. I think her interpretation of the superior results in the Composition group is convincing and, combined with other studies exploring the Involvement Load, helps better understand the complicated process of L2 vocabulary learning.

  11. There’s a similar article, exploring the Laufer / Hulstijn hypothesis, in the context of vocabulary-focused post-reading tasks, in the latest volume of ‘System’. The results are not conclusive in any way.
    I’m wondering about the extent to which this hypothesis can be meaningfully explored. Besides the task type itself, there are so many variables: not least, the learnability of the item, which will depend, among other things, on individual learner differences, and the classroom context (e.g. what task types are the learners most subjected to, how is their work evaluated). So … is it possible for technicist research of this kind to tell us anything useful? Can it tell us anything that is actionable in language teaching? The hypothesis makes intuitive sense, but is it falsifiable? Perhaps not. And, if not, what is the point of such research?

    1. Hi, Phil.

      I must admit that it was your post on the OUP blogsite that prompted me to look into this matter, particularly this line: “a gap-fill repeated a number of times is likely to lead to more learning in the same amount of time than a more creative or imaginative exercise”. So I’m happy this summary has caught your eye!

      Just like you, I’m quite fond of gap-fills, but, as someone who’s willing to explore and experiment, I have tried getting students to study new items out of context on their own and then write sentences with them. It was far from successful – particularly with higher levels / more difficult words – and I ended up re-teaching the target items.

      Thank you for the inspiration and this comment.

      P.S. For other readers, Phil’s post can be found here:

