Measuring Speech: A Comprehensibility Scale for Teachers

How do you accurately measure someone’s comprehensibility? For that matter, what does comprehensibility really mean? Is it a synonym of intelligibility? To what extent do individual sounds, suprasegmentals, fluency, vocabulary, and grammar play a role in all of this? Although pronunciation can often be easily assessed, these assessments are often poorly designed, are too limited (or too detailed), and sometimes use the “native speaker” as the criterion by which speech is judged (even though this idea is contentious). In the 2017 article summarized below, Isaacs*, Trofimovich, and Foote describe the process by which they used evidence and expert raters to design a “user-oriented second language comprehensibility scale for English-medium universities”.

Intelligibility and Comprehensibility

Many rating scales use these terms interchangeably or without clearly defining them. However, while there is no exact consensus on the meaning of these two terms, they tend to be measured in different ways in research experiments. Therefore, it’s important to make a clear distinction between them.


Intelligibility refers to understanding a speaker in terms of each uttered word. This is usually measured by how many words a listener can detect, based on their transcription of the speech. Exams such as the TOEFL iBT or IELTS, however, use this term without offering a clear explanation or elaboration of what it actually means. While these exams refer to intelligibility, their raters are not transcribing speech but rather judging how easy or difficult the language is to understand. Therefore, what they are actually focusing on is comprehensibility.

Comprehensibility refers to the “perceived ease or difficulty in understanding L2 speech”. In research, this is measured using a rating scale (very difficult to very easy to understand). The current scale focuses on comprehensibility, that is, how easy it is to understand speech, because research has shown that the ease or difficulty of listening has a greater effect on listeners than the ability to clearly parse every uttered word.

The authors defined comprehensibility as how “effortful” the speech is to understand and described the language features that are most relevant for determining this at different ability levels.
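To make the distinction concrete, here is a rough sketch of how the two measures differ in practice. This is not the authors' procedure: the function names, the simple word-matching approach (which ignores word order and punctuation), and the 1-to-9 rating range in the example are all assumptions made purely for illustration.

```python
from collections import Counter
from statistics import mean


def intelligibility_score(spoken, transcribed):
    """Proportion of the speaker's words that appear in the listener's transcription.

    Assumption: words are compared case-insensitively as a bag of words,
    so word order and punctuation are not handled.
    """
    target = Counter(spoken.lower().split())
    heard = Counter(transcribed.lower().split())
    matched = sum((target & heard).values())  # words present in both, counting repeats
    total = sum(target.values())
    return matched / total if total else 0.0


def comprehensibility_score(ratings):
    """Average of listeners' ease-of-understanding ratings.

    Assumption: each rating is a number on a hypothetical scale where
    higher means easier to understand.
    """
    return mean(ratings)


# Intelligibility: compare what was said with what a listener wrote down.
word_accuracy = intelligibility_score(
    "the cat sat on the mat",
    "the cat sat on a mat",
)

# Comprehensibility: listeners simply rate how easy the speech was to follow.
ease_rating = comprehensibility_score([7, 8, 6])
```

The point of the contrast is that the first number comes from checking a transcription word by word, while the second comes only from listeners' impressions of effort, which is what the scale in this article is built around.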

Goals of the Scale

The researchers wanted to create a formative tool for assessing comprehensibility of English speakers from a variety of language backgrounds based on oral production tasks. However, they also saw the potential of their scale to inform instruction:

  1. As a diagnostic tool to find strengths and weaknesses, which could help teachers select the specific features of language that most affect comprehensibility to focus on in class (leaving aside how accented the speakers sound)
  2. As a way of promoting “pronunciation literacy” and understanding of various dimensions of comprehensibility, such as pronunciation, fluency, and lexicogrammar
  3. As a method to promote pronunciation integration through oral tasks
  4. As a self-awareness raising tool for students in order to help them find areas to focus on.

Method of Development

The scale was developed based on recordings of international university students in Canada (150) and the UK (85) who completed a series of tasks: a picture description task (from Isaacs and Trofimovich, 2012), a graph description task (from the retired Test of Spoken English), IELTS speaking task 2, and a TOEFL iBT integrated speaking task.

These tasks were then rated by experienced EAP teachers from Canada (6) and the UK (4). All of these raters had master’s degrees and experience with pronunciation, and some had experience teaching or assessing tests like the IELTS.

Raters listened to several samples specifically chosen to meet a certain goal of analysis for the scale. Comments were recorded and used to revise the scale, making it usable for students from different language backgrounds based on their performance on different speaking tasks. The revised scale was presented at the next session and the process was repeated. The scale was whittled down and expanded throughout its development based on comments and suggestions by the raters, as well as previous research on pronunciation, comprehensibility, or oral task assessment.

Some of the changes during this process included clarifying descriptors, emphasizing authentic as opposed to native-like speech, splitting levels, and creating separate bands, with pronunciation and fluency as the main categories and vocabulary and grammar as secondary ones. While both sets of categories are important for comprehension, pronunciation and fluency were seen as the most crucial, as these “open the door” to finer-grained analyses of language. Basically, you need to be able to understand the words and meaning first before you can assess grammar and vocabulary.

Levels 5 (highest) to 3 of the 6-level scale.


Takeaway

This research article detailed the rationale and method of development of the scale, but did not offer validation of the scale. That is actually the next step in their research. Until that research is published, what we have is a unique scale that tries to have clear language descriptions applicable to speakers from a variety of language backgrounds, is supposedly easy for teachers to use (students would need only a few minutes to record the various oral tasks), and focuses on a critical element of language use: how easy it is to understand someone.

At first glance, the descriptors make the scale seem extremely subjective. But it’s important to keep in mind that this is intended. Comprehensibility is a subjective measurement of how much effort a listener makes in understanding a speaker. A teacher, boss, or store clerk is going to judge someone not by the individual words they choose or the exactness of their grammar, but by how well the overall message coming at them can be understood. Hence, the subjective nature of the scale seems apt.

One issue that concerns me is that this subjective rating was purposefully designed to be based on the experiences of EAP instructors rather than the lay public. This is something they made explicit in their comments to the raters during development. According to the authors, “The goal was to make the scale as user-friendly as possible for EAP teachers to incorporate in their classrooms, encouraging them to rely on their expertise when assessing students.” While this makes sense, it means that the overall rating does not paint a complete picture of the student, as an EAP teacher, who usually has experience working with linguistically diverse students, will perceive language differently than people outside of the ELT world. Perhaps non-expert raters (professors and “lay people”) can be used in the future as a way to compare assessments of comprehensibility and get an overall better picture.

Where Can I Get the Scale?

The scale is freely available from here. It includes the scale and instructions.


Isaacs, T., & Trofimovich, P. (2012). Deconstructing comprehensibility: Identifying the linguistic influences on listeners’ L2 comprehensibility ratings. Studies in Second Language Acquisition, 34, 475–505.

Isaacs, T., Trofimovich, P., & Foote, J. A. (2017). Developing a user-oriented second language comprehensibility scale for English-medium universities. Language Testing. Retrieved from 

*Thank you very much to Dr. Isaacs, who took the time to read and contribute to this summary!

Featured photo by Couleur (Pixabay)


Anthony Schmidt
English language Instructor at University of Tennessee, Knoxville
Anthony Schmidt is editor of ELT Research Bites. He also has his own blog. Offline, he is a full-time English language instructor in a university IEP program. He is interested in all aspects of applied linguistics, in particular English for Academic Purposes.

3 thoughts on “Measuring Speech: A Comprehensibility Scale for Teachers”

  1. Hi,
    Thanks for this summary. Really interesting stuff.
    My concern with this and most other pronunciation research is that most of the time the raters are ‘native speakers’ from the Inner Circle (as is the case here). This of course doesn’t reflect the diversity of the English language. It also skews the results, because what might not be intelligible or comprehensible to a ‘native speaker’ might well be perfectly intelligible or comprehensible to a ‘non-native speaker’. If you’re really aiming to create a scale that is to be used internationally, rather than for example in ESL contexts only, what you’d want to do is to have a fairly representative sample of raters from all three of the Kachruvian Circles. What do you think?

    1. Hi Marek,

      Your idea certainly echoes one of my criticisms in the Takeaway above. Testing this scale on only “native speakers” does limit its applicability internationally. The fact that this establishes comprehensibility – rather than intelligibility and accentedness – as a goal is a step in the right direction, though. Yes, it was devised by native speakers, but they tried as best they could not to judge speakers against the native model. That being said, they were native speakers, so they really only could judge them by what was comprehensible to them. So, there is a kind of paradox there. Nevertheless, the instrument is not meant to be used for just any context, but for inner circle universities where students will need to be understood by inner circle native speakers. For that purpose, I think it was devised well. I guess the increasing internationalization of faculty and students, however, kind of muddles who this is or will really be useful for. Again, it’s a step in the right direction and needs further research to refine it or test its validity.

      1. Hi Anthony,
        I absolutely agree that it’s a step in the right direction. However, as you pointed out yourself, these students will not just need to be understood by ‘native speakers’, but by a much more international audience, which could have been reflected when designing the scale. Let’s see what happens at the validation stage and whether they go for a more international group of raters.
        By the way, I’m a huge fan of the blog! There’s such a huge gap between research and practice in ELT, so it’s great to see you trying to bridge it here. Keep it up!
