Corpus Use, Search Engines, and Writing Proficiency

Corpus and search engine FACing

Fluency, accuracy and complexity (FAC) are traditional measures used to assess writing proficiency. Corpora and search engines have both been used to help students with their written work. Luo’s open-access article reports a quasi-experimental study of the effects of data-driven learning (DDL) on the writing of Chinese college students, comparing the BNCweb corpus interface with the Baidu search engine interface. Gains were seen in fluency and accuracy but not complexity.

Luo, Q. (2016). The effects of data-driven learning activities on EFL learners’ writing development. SpringerPlus, 5(1), 1255.


  • Participants
    48 Chinese college freshmen with an intermediate level of English
    26 experimental group (BNCweb corpus interface), 22 control group (Baidu search engine interface)
    Course credits for participating
  • Instruments
    Same pre- and post-test: a 30-minute writing task, “My view on cell-phones”
    Search engine Baidu
    Juku Grading system – an automatic essay grader
  • Treatment
    Intact groups met twice a week. Both groups took the pre-test. The experimental group received training on BNCweb; the control group received training on the Baidu search engine. Both groups were given sentences (taken from a learner corpus) with errors underlined that needed to be either corrected or improved. After training, both groups were asked to write and revise 5 essays. First drafts were marked by Luo, with parts underlined that needed to be corrected or revised. Participants finished the second draft at home. URL links and other stored information were submitted along with the second draft as a way to check that BNCweb and Baidu had actually been consulted. Feedback was given on second drafts. As well as writing the post-test essay, ten participants from the experimental group were chosen at random for a semi-structured interview, conducted in Chinese, about their experience.
  • Measures collected
    Fluency: average number of words per essay
    Accuracy: proportion of error-free T-units per T-unit; errors per 100 words (errors include collocations, verbs, articles, prepositions, word choice, subject-verb agreement, sentence structure)
    Complexity: vocabulary – type-token ratio; grammar – mean number of clauses per T-unit
  • Data results
    The experimental group showed a significant increase in both fluency and accuracy between pre- and post-test, with no significant difference for complexity. The same pattern held when the experimental group was compared with the control group: significant improvements in fluency and accuracy but not in complexity. In the semi-structured interviews most participants reported positive sentiments, though some said they preferred Baidu for finding complex language.
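For readers who want to see how these measures cash out in practice, here is a minimal sketch (mine, not from the study) of the calculations behind each one. Word counting uses naive whitespace tokenisation; T-unit, clause, and error counts would come from manual annotation in a study like Luo’s, so here they are simply passed in as given numbers.

```python
# Illustrative sketch of the FAC measures; tokenisation is deliberately naive.

def fluency(essay: str) -> int:
    """Fluency: total number of words in the essay."""
    return len(essay.split())

def type_token_ratio(essay: str) -> float:
    """Lexical complexity: unique word forms divided by total words."""
    tokens = [w.lower() for w in essay.split()]
    return len(set(tokens)) / len(tokens)

def errors_per_100_words(error_count: int, word_count: int) -> float:
    """Accuracy: number of errors per 100 words."""
    return error_count / word_count * 100

def error_free_tunit_ratio(error_free_tunits: int, total_tunits: int) -> float:
    """Accuracy: proportion of T-units containing no errors."""
    return error_free_tunits / total_tunits

def clauses_per_tunit(clause_count: int, total_tunits: int) -> float:
    """Grammatical complexity: mean number of clauses per T-unit."""
    return clause_count / total_tunits
```

For example, an essay of 200 words with 4 annotated errors scores `errors_per_100_words(4, 200) == 2.0`, and 12 error-free T-units out of 16 gives a ratio of 0.75.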

The results show positive impacts of learner corpus consultation on writing fluency and accuracy, but not on complexity. More information on the nature of the writing tasks in the treatment (e.g. length, topic) would have been useful, as these could have confounded the post-test writing.

What I think teachers would also benefit from is awareness of the factors affecting DDL use that are discussed in the literature review, namely: task type (this study looked at error correction and revision); direct vs. indirect DDL (this study used both); language proficiency (this study used intermediate learners); and choice of corpora.

I was not surprised that the study found users of BNCweb stuck to simple searches and avoided the more sophisticated ones. Search engines still have an advantage here which can be exploited; read more about that in Google Giveth and Google Taketh – Developments in Google as a Corpus.

Mura Nava
Interested in most things language-wise. Member of TaWSIG, the Teachers as Workers Special Interest Group, which promotes discussion and action on working conditions in language teaching.
