Assessing Grammar
Differing Notions Of ‘Grammar’ for Assessment
Introduction
The study of grammar has had a long and important role in the history of second
language and foreign language teaching. For centuries, to learn another
language, or what I will refer to generically as an L2, meant to know the
grammatical structures of that language and to cite prescriptions for its use.
Grammar was used to mean the analysis of a language system, and the study of
grammar was not just considered an essential feature of language learning, but
was thought to be sufficient for learners to actually acquire another language
(Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study –
to the extent that in the Middle Ages in Europe, it was thought to be the
foundation of all knowledge and the gateway to sacred and secular understanding
(Hillocks and Smith, 1991). Thus, the central role of grammar in language
teaching remained relatively uncontested until the late twentieth century. Even
a few decades ago, it would have been hard to imagine language instruction
without immediately thinking of grammar.
Grammar and linguistics
When most language teachers, second language acquisition (SLA) researchers and
language testers think of ‘grammar’, they call to mind one of the many
paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for
the study and analysis of language. Such linguistic grammars are typically
derived from data taken from native speakers and minimally constructed to
describe well-formed utterances within an individual framework. These grammars
strive for internal consistency and are mainly accessible to those who have
been trained in that particular paradigm. Since the 1950s, there have been many
such linguistic theories – too numerous to list here – that have been proposed
to explain language phenomena. Many of these theories have helped shape how L2
educators currently define grammar in educational contexts. Although it is
beyond the purview of this book to provide a comprehensive review of these
theories, it is, nonetheless, helpful to mention a few, considering both the
impact they have had on L2 education and the role they play in helping define
grammar for assessment purposes.
Form-based perspectives of language
Several syntactocentric, or form-based, theories of language have provided grammatical
insights to L2 teachers. I will describe three: traditional grammar, structural
linguistics and transformational-generative grammar. One of the oldest theories
to describe the structure of language is traditional grammar. Originally based
on the study of Latin and Greek, traditional grammar drew on data from literary
texts to provide rich and lengthy descriptions of linguistic form. Unlike some
other syntactocentric theories, traditional grammar also revealed the
linguistic meanings of these forms and provided information on their usage in a
sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied
an extensive set of prescriptive rules along with the exceptions.
Traditional grammar has been criticized for its inability to provide descriptions of the
language that could adequately incorporate the exceptions into the framework
and for its lack of generalizability to other languages. In other words,
traditional grammar postulated a separate, uniquely language-specific set of
rules or ‘parameters’ for every language.
Form- and use-based perspectives of language
The three theories of linguistic analysis described thus far have provided insights
to L2 educators on several grammatical forms. These insights provide
information to explain what structures are theoretically possible in a
language. Other linguistic theories, however, are better equipped to examine
how speakers and writers actually exploit linguistic forms during language use.
For example, if we wish to explain how seemingly similar structures like I like
to read and I like reading connote different meanings, we might turn to those
theories that study grammatical form and use interfaces. This would address
questions such as: Why does a language need two or more structures that are
similar in meaning? Are similar forms used to convey different specialized
meanings? To what degree are similar forms a function of written versus spoken
language, or to what degree are these forms characteristic of a particular
social group or a specific situation? It is important for us to discuss these
questions briefly if we ultimately wish to test grammatical forms along with
their meanings and uses in context.
Communication-based perspectives of language
Other theories have provided grammatical insights from a communication-based
perspective. Such a perspective expresses the notion that language involves
more than linguistic form. It moves beyond the view of language as patterns of
morphosyntax observed within relatively decontextualized sentences or
sentences found within naturally occurring corpora. Rather, a communication-based
perspective views grammar as a set of linguistic norms, preferences and
expectations that an individual invokes to convey a host of pragmatic meanings
that are appropriate, acceptable and natural depending on the situation. The
assumption here is that linguistic form has no absolute, fixed meaning in
language use but is mutable and open to interpretation by those who use it in a
given circumstance. Grammar in this context is often co-terminous with language
itself, and stands not only for form, but also for meaningfulness and pragmatic
appropriacy, acceptability or naturalness – a topic I will return to later
since I believe that a blurring of these concepts is misleading and potentially
problematic for language educators.
What is pedagogical grammar?
A pedagogical grammar represents an eclectic, but principled description of the
target-language forms, created for the express purpose of helping teachers
understand the linguistic resources of communication. These grammars provide
information about how language is organized and offer relatively accessible ways
of describing complex, linguistic phenomena for pedagogical purposes.
Research On L2 Grammar Teaching, Learning And Assessment
Research on L2 teaching and learning
Over the years, several of the questions mentioned above have intrigued language
teachers, inspiring them to experiment with different methods, approaches and
techniques in the teaching of grammar. To determine if students had actually learned
under the different conditions, teachers have used diverse forms of assessment
and drawn their own conclusions about their students. In so doing, these
teachers have acquired a considerable amount of anecdotal evidence on the
strengths and weaknesses of using different practices to implement L2 grammar
instruction. These experiences have led most teachers nowadays to subscribe to an
eclectic approach to grammar instruction, whereby they draw upon a variety of
different instructional techniques, depending on the individual needs, goals and
learning styles of their students.
Comparative methods studies
The comparative methods studies sought to compare the effects of different
language-teaching methods on the acquisition of an L2. These studies occurred
principally in the 1960s and 1970s, and stemmed from a reaction to the
grammar-translation method, which had dominated language instruction during the
first half of the twentieth century. More generally, these studies were in
reaction to form-focused instruction (referred to as ‘focus on forms’ by Long,
1991), which used a traditional structural syllabus of grammatical forms as the
organizing principle for L2 instruction. According to Ellis (1997),
form-focused instruction contrasts with meaning-focused instruction in that
meaning-focused instruction emphasizes the communication of messages (i.e., the
act of making a suggestion and the content of such a suggestion) while
form-focused instruction stresses the learning of linguistic forms. These can be
further contrasted with form-and-meaning focused instruction (referred to by
Long (1991) as ‘focus-on-form’), where grammar instruction occurs in a
meaning-based environment and where learners strive to communicate meaning
while paying attention to form.
Non-interventionist studies
While some language educators were examining different methods of teaching grammar in
the 1960s, others were feeling a growing sense of dissatisfaction with the
central role of grammar in the L2 curriculum. As a result, questions regarding
the centrality of grammar were again raised by a small group of L2 teachers and
syllabus designers who felt that the teaching of grammar in any form simply did
not produce the desired classroom results. Newmark (1966), in fact, asserted
that grammatical analysis and the systematic practice of grammatical forms were
actually interfering with the process of L2 learning, rather than promoting it,
and if left uninterrupted, second language acquisition, similar to first
language acquisition, would proceed naturally.
At the same time, the role of grammar in the L2 curriculum was also being
questioned by some SLA researchers (e.g., Dulay and Burt, 1973; Bailey, Madden
and Krashen, 1974) who had been studying L2 learning in instructed and
naturalistic settings. In their attempts to characterize the L2 learner’s
interlanguage at one or more points along the path toward target-like proficiency,
several researchers came to similar conclusions about L2 development.
Empirical studies in support of non-intervention
The non-interventionist position was examined empirically by Prabhu (1987) in a
project known as the Communicational Teaching Project (CTP) in southern India. This
study sought to demonstrate that the development of grammatical ability could
be achieved through a task-based, rather than a form-focused, approach to language
teaching, provided that the tasks required learners to engage in meaningful
communication. In the CTP, Prabhu (1987) argued against the notion that the development
of grammatical ability depended on a systematic presentation of grammar
followed by planned practice. However, in an effort to evaluate the CTP program,
Beretta and Davies (1985) compared classes involved in the CTP with classes
outside the project taught with a structural-oral-situational method. They
administered a battery of tests to the students, and found that the CTP
learners outperformed the control group on a task-based test, whereas the
non-CTP learners did better on a traditional structure test. These results lent
partial support to the non-interventionist position by showing that task-based
classrooms based on meaningful communication can also be effective in promoting SLA.
However, these results also showed, once again, that students do best when they are
taught and tested in similar ways.
Possible implications of fixed developmental order to language assessment
The notion that structures appear to be acquired in a fixed developmental order and
in a fixed developmental sequence might conceivably have some relevance to the
assessment of grammatical ability. First of all, these findings could give
language testers an empirical basis for constructing grammar tests that would
account for the variability inherent in a learner’s interlanguage. In other
words, information on the acquisitional order of grammatical items could
conceivably serve as a basis for selecting grammatical content for tests that
aim to measure different levels of developmental progression, as Chang
(2002, 2004) did in examining the underlying structure of a test that attempted
to measure knowledge of relative clauses.
Problems with the use of developmental sequences as a basis for assessment
Although developmental sequence research offers an intuitively appealing complement to
accuracy-based assessments in terms of interpreting test scores, I believe this
method is fraught with a number of serious problems, and language educators
should use extreme caution in applying this method to language testing. This is
because our understanding of natural acquisitional sequences is incomplete and
at too early a stage of research to be the basis for concrete assessment
recommendations (Lightbown, 1985; Hudson, 1993). First, the number of
grammatical sequences that show a fixed order of acquisition is very limited,
far too limited for all but the most restricted types of grammar tests.
Interventionist studies
Not all L2 educators are in agreement with the non-interventionist position on
grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have
maintained that although some L2 learners are successful in acquiring selected
linguistic features without explicit grammar instruction, the majority fail to
do so. Testimony to this is the large number of non-native speakers who
emigrate to countries around the world, live there all their lives and fail to
learn the target language, or fail to learn it well enough to realize their
personal, social and long-term career goals. In these situations, language
teachers affirm that formal grammar instruction of some sort can be of benefit.
Empirical studies in support of intervention
Aside from anecdotal evidence, the non-interventionist position has come under
intense attack on both theoretical and empirical grounds with several SLA
researchers affirming that efforts to teach L2 grammar typically result in the
development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995)
investigated the effectiveness of L2 grammar instruction on SLA in comparison
with no formal instruction. They found that when coupled with meaning-focused
instruction, the formal instruction of grammar appears to be more effective
than exposure to meaning or form alone. Long (1991) also argued for a focus on
both meaning and form in classrooms that are organized around meaningful and
sustained communicative interaction.
Research on instructional techniques and their effects on acquisition
Much of the recent research on teaching grammar has focused on four types of
instructional techniques and their effects on acquisition. Although a complete
discussion of teaching interventions is outside the purview of this book (see
Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or
rule-based techniques, input-based techniques, feedback-based techniques and
practice-based techniques (Norris and Ortega, 2000). Form- or rule-based
techniques revolve around the instruction of grammatical forms.
Grammar processing and second language development
In the grammar-learning process, explicit grammatical knowledge refers to a
conscious knowledge of grammatical forms and their meanings. Explicit knowledge
is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b).
DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it
involves the explanation of a rule or the request to focus on a grammatical
feature. Instruction can be explicitly deductive, where learners are given
rules and asked to apply them, or explicitly inductive, where they are given
samples of language from which to generate rules and make generalizations.
Implications for assessing grammar
The studies investigating the effects of teaching and learning on grammatical
performance present a number of challenges for language assessment. First of
all, the notion that grammatical knowledge structures can be differentiated
according to whether they are fully automatized (i.e., implicit) or not (i.e.,
explicit) raises important questions for the testing of grammatical ability
(Ellis, 2001b). Given the many purposes of assessment, we might wish to test
explicit knowledge of grammar, implicit knowledge of grammar or both. For
example, in certain classroom contexts, we might want to assess the learners’
explicit knowledge of one or more grammatical forms, and could, therefore, ask
learners to answer multiple-choice or short-answer questions related to these
forms.
The Role Of Grammar In Models Of Communicative Language Ability
The role of grammar in models of communicative competence
Every language educator who has ever attempted to measure a student’s communicative
language ability has wondered: ‘What exactly does a student need to “know” in
terms of grammar to be able to use it well enough for some real-world purpose?’
In other words, they have been faced with the challenge of defining grammar for
communicative purposes. To complicate matters further, linguistic notions of
grammar have changed over time, as we have seen, and this has significantly
increased the number of components that could be called ‘grammar’. In short,
definitions of grammar and grammatical knowledge have changed over time and
across context, and I expect this will be no different in the future.
Rea-Dickins’ definition of grammar
In discussing more specifically how grammatical knowledge might be tested within a
communicative framework, Rea-Dickins (1991) defined ‘grammar’ as the single
embodiment of syntax, semantics and pragmatics. She argued against Canale and
Swain’s (1980) and Bachman’s (1990b) multi-componential view of communicative
competence on the grounds that componential representations overlook the
interdependence and interaction between and among the various components. She
further stated that in Canale and Swain’s (1980) model, the notion of
grammatical competence was limited since it defined grammar as ‘structure’ on
the one hand and as ‘structure and semantics’ on the other, but ignored the
notion of ‘structure as pragmatics’. Similarly, she added that in Bachman’s
(1990b) model, grammar was defined as structure at the sentence level and as
cohesion at the suprasentential level, but this model failed to account for the
pragmatic dimension of communicative grammar.
Larsen-Freeman’s definition of grammar
Another conceptualization of grammar that merits attention is Larsen-Freeman’s (1991,
1997) framework for the teaching of grammar in communicative language teaching
contexts. Drawing on several linguistic theories and influenced by language
teaching pedagogy, she has also characterized grammatical knowledge along three
dimensions: linguistic form,
semantic meaning and pragmatic use. Form is defined as both morphology, or how
words are formed, and syntactic patterns, or how words are strung together.
This dimension is primarily concerned with linguistic accuracy. The meaning
dimension describes the inherent or literal message conveyed by a lexical item
or a lexico-grammatical feature. This dimension is mainly concerned with the
meaningfulness of an utterance.
What is meant by ‘grammar’ for assessment purposes?
Regardless of the assessment purpose, if we wish to make inferences about grammatical
ability on the basis of a grammar test or some other form of assessment, it is
important to know what we mean by ‘grammar’ when attempting to specify
components of grammatical knowledge for measurement purposes. With this goal in
mind, we need a definition of grammatical knowledge that is broad enough to
provide a theoretical basis for the construction and validation of tests in a
number of contexts. At the same time, we need our definition to be precise
enough to distinguish it from other areas of language ability.
Towards A Definition Of Grammatical Ability
Defining grammatical constructs
Although our basic underlying model of grammar will remain the same in all testing
situations (i.e., grammatical form and meaning), what it means to ‘know’
grammar for different contexts will most likely change (see Chapelle, 1998). In
other words, the type, range and scope of grammatical features required to
communicate accurately and meaningfully will vary from one situation to
another. For example, the type of grammatical knowledge needed to write a
formal academic essay would be very different from that needed to make a train
reservation. Given the many possible ways of interpreting what it means to
‘know’ grammar, it is important that we define what we mean by ‘grammatical
knowledge’ for any given testing situation. A clear definition of what we
believe it means to ‘know’ grammar for a particular testing context will then
allow us to construct tests that measure grammatical ability.
Definition of key terms
Before continuing this discussion, it might be helpful if I clarified some of the key
terms.
Knowledge of phonological or graphological form and meaning
Knowledge of phonological/graphological form enables us to understand and produce
features of the sound or writing system, with the exception of meaning-based
orthographies such as Chinese characters, as they are used to convey meaning in
testing or language-use situations.
Knowledge of lexical form and meaning
Knowledge of lexical form enables us to understand and produce those features of words
that encode grammar rather than those that reveal meaning. This includes words
that mark gender (e.g., waitress), countability (e.g., people) or part of
speech (e.g., relate, relation). For example, when the word think in English is
followed by the preposition about before a noun, this is considered the grammatical
dimension of lexis, representing a co-occurrence restriction with prepositions.
One area of lexical form that poses a challenge to learners of some languages
is word formation. This includes compounding in English with a noun + noun or a
verb + particle pattern.
Knowledge of morphosyntactic form and meaning
Knowledge of morphosyntactic form permits us to understand and produce both the
morphological and syntactic forms of the language. This includes the articles,
prepositions, pronouns, affixes (e.g., -est), syntactic structures, word order,
simple, compound and complex sentences, mood, voice and modality. A learner who
knows the morphosyntactic form of the English conditionals would know that: (1)
an if-clause sets up a condition and a result clause expresses the outcome; (2)
both clauses can be in the sentence-initial position in English; (3) if can be
deleted under certain conditions as long as the subject and operator are
inverted; and (4) certain tense restrictions are imposed on if and result clauses.
Knowledge of cohesive form and meaning
Knowledge of cohesive form enables us to use the phonological, lexical and
morphosyntactic features of the language in order to interpret and express
cohesion on both the sentence and the discourse levels. Cohesive form is
directly related to cohesive meaning through cohesive devices (e.g., she, this,
here) which create links between cohesive forms and their referential meanings
within the linguistic environment or the surrounding co-text. Halliday and
Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive
meaning.
Knowledge of information management form and meaning
Knowledge of information management form allows us to use linguistic forms as a resource
for interpreting and expressing the information structure of discourse. Some
resources that help manage the presentation of information include, for
example, prosody, word order, tense-aspect and parallel structures. These forms
are used to create information management meaning.
Knowledge of interactional form and meaning
Knowledge of interactional form enables us to understand and use linguistic forms as a
resource for understanding and managing talk-in-interaction. These forms include
discourse markers and communication management strategies. Discourse markers
consist of a set of adverbs, conjunctions and lexicalized expressions used to
signal certain language functions.
Designing Test Tasks To Measure L2 Grammatical Ability
How does test development begin?
Every grammar-test development project begins with a desire to obtain (and often
provide) information about how well a student knows grammar in order to convey
meaning in some situation where the target language is used. The information
obtained from this assessment then forms the basis for decision-making. Those
situations in which we use the target language to communicate in real life or
in which we use it for instruction or testing are referred to as the target
language use (TLU) situations (Bachman and Palmer, 1996). Within these
situations, the tasks or activities requiring language to achieve a
communicative goal are called the target language use tasks.
What do we mean by ‘task’?
The notion of ‘task’ in language-learning contexts has been conceptualized in many
different ways over the years. Traditionally, ‘task’ has referred to any
activity that requires students to do something for the express purpose of
learning the target language. A task then is any activity (e.g., short answers,
role-plays) as long as it involves a linguistic or nonlinguistic (e.g., circling the
answer) response to input. Traditional learning or teaching tasks are
characterized as having an intended pedagogical purpose – which may or may not
be made explicit; they have a set of instructions that control the kind of
activity to be performed; they contain input (e.g., questions); and they elicit
a response.
What are the characteristics of grammatical test tasks?
As the goal of grammar assessment is to provide as useful a measurement as possible
of our students’ grammatical ability, we need to design test tasks in which the
variability of our students’ scores is attributed to the differences in their
grammatical ability, and not to uncontrolled or irrelevant variability
resulting from the types of tasks or the quality of the tasks that we have put
on our tests. As all language teachers know, the kinds of tasks we use in tests
and their quality can greatly influence how students will perform.
The Bachman and Palmer framework
Bachman and Palmer’s (1996) framework of task characteristics represents the most
recent thinking in language assessment of the potential relationships between
task characteristics and test performance. In this framework, they outline five
general aspects of tasks, each of which is characterized by a set of
distinctive features. These five aspects describe characteristics of (1) the
setting, (2) the test rubrics, (3) the input, (4) the expected response and (5)
the relationship between the input and response.
Describing grammar test tasks
When language teachers consider tasks for grammar tests, they call to mind a large
repertoire of task types that have been commonly used in teaching and testing
contexts. We now know that these holistic task types constitute collections of
task characteristics for eliciting performance and that these holistic task
types can vary on a number of dimensions. We also need to remember that the
tasks we include on tests should strive to match the types of language-use
tasks found in real-life or language instructional domains.
Selected-response task types
Selected-response tasks present input in the form of an item, and test takers are expected to
select the response. Other than that, all other task characteristics can vary.
For example, the form of the input can be language, non-language or both, and
the length of the input can vary from a word to larger pieces of discourse. In
terms of the response, selected-response tasks are intended to measure
recognition or recall of grammatical form and/or meaning.
Limited-production task types
Limited-production tasks are intended to assess one or more areas of grammatical knowledge
depending on the construct definition. Unlike selected-response items, which
usually have only one possible answer, the range of possible answers for
limited-production tasks can, at times, be large – even when the response
involves a single word.
Developing Tests To Measure L2 Grammatical Ability
What makes a grammar test ‘useful’?
We concluded in the last chapter that the goal of every grammar test was to obtain
(and provide) information on how well a student knows or can use grammar to
convey meaning in some situation where the target language is used. The
responses to the test items can then be used as a basis for assigning scores
and for making inferences about the student’s underlying grammatical ability.
We discussed these responses in terms of inferences because it is not possible
to observe a person’s grammatical ability directly; rather, we must infer the
underlying ability from responses to questions or from samples of actual
performance.
The quality of reliability
Similarly, the scores from tests or components of tests can also be characterized as being
reliable when the tests provide the same results every time we administer them,
regardless of the conditions under which they are administered.
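One common way to put a number on this kind of consistency is to administer the same test twice to the same students and correlate the two sets of scores. What follows is a minimal sketch under stated assumptions, not a procedure taken from this text: the function name pearson_r and both score lists are hypothetical, and the Pearson correlation is only one of several possible reliability estimates.

```python
from statistics import mean


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    mean_first, mean_second = mean(first), mean(second)
    # Sum of cross-products of deviations from each mean.
    covariation = sum(
        (a - mean_first) * (b - mean_second) for a, b in zip(first, second)
    )
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return covariation / (spread_first * spread_second) ** 0.5


# Hypothetical scores for five students on two administrations of one test.
first_administration = [12, 15, 9, 18, 14]
second_administration = [13, 14, 10, 17, 15]

r = pearson_r(first_administration, second_administration)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 would suggest that students are ranked in much the same way on both occasions; a low value would suggest that the scores depend heavily on the conditions of administration.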
The quality of construct validity
The second quality that all ‘useful’ tests possess is construct validity. Bachman
and Palmer (1996) define construct validity as ‘the extent to which we can
interpret a given test score as an indicator of the ability(ies), or
construct(s), we want to measure. Construct validity also has to do with the
domain of generalization to which our score interpretations generalize’ (p.
21). In other words, construct validity not only refers to the meaningfulness
and appropriateness of the interpretations we make based on test scores, but it
also pertains to the degree to which the score-based interpretations can be
extrapolated beyond the testing situation to a particular TLU domain (Messick,
1993).
The quality of authenticity
A third quality of test usefulness is authenticity, a notion much discussed in
language testing since the late 1970s, when communicative approaches to
language teaching were first taking root. Building on these discussions,
Bachman and Palmer (1996) refer to ‘authenticity’ as the degree of
correspondence between the test-task characteristics and the TLU task
characteristics. Given the framework for test-task characteristics discussed in
Chapter 5, they provide a systematic way of matching test tasks with TLU tasks
in terms of the features of the test setting, rubrics, input, expected response
and the relationship between the input and response.
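The idea of matching test tasks with TLU tasks can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the class TaskCharacteristics, the function shared_aspects and the one-line labels are hypothetical, and reducing each of the five aspects to a single label is a drastic simplification of the framework, in which each aspect has many features.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class TaskCharacteristics:
    """One coarse label per aspect of the task framework (hypothetical)."""
    setting: str
    rubrics: str
    task_input: str
    expected_response: str
    input_response_relationship: str


def shared_aspects(test_task, tlu_task):
    """Return the names of the aspects on which the two tasks agree."""
    test_features = asdict(test_task)
    tlu_features = asdict(tlu_task)
    return [
        name for name in test_features
        if test_features[name] == tlu_features[name]
    ]


# Hypothetical descriptions of a test task and a real-life (TLU) task.
test_task = TaskCharacteristics(
    setting="classroom",
    rubrics="timed, written instructions",
    task_input="short written message",
    expected_response="short written reply",
    input_response_relationship="non-reciprocal",
)
tlu_task = TaskCharacteristics(
    setting="office",
    rubrics="untimed, no instructions",
    task_input="short written message",
    expected_response="short written reply",
    input_response_relationship="non-reciprocal",
)

print(shared_aspects(test_task, tlu_task))
```

In this sketch, the more aspects the two descriptions share, the closer the correspondence between the test task and the TLU task; a real comparison would look at the individual features within each aspect.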
The quality of interactiveness
A fourth quality of test usefulness outlined by Bachman and Palmer (1996) is
interactiveness. This quality refers to the degree to which the aspects of the
test-taker’s language ability we want to measure (e.g., grammatical knowledge,
language knowledge) are engaged by the test task characteristics (e.g., the
input, response, and relationship between the input and response) based on the
test constructs. In other words, the task should engage the characteristics we
want to measure (e.g., grammatical knowledge) given the test purpose, and
nothing else (e.g., topical knowledge, affective schemata); otherwise, this may
mask the very constructs we are trying to measure. In the case of grammar
assessment, test tasks can be characterized as ‘interactive’ to the extent that
they require individuals to draw on and manage their cognitive and
metacognitive strategies (i.e., their strategic competence) in order to use
grammatical knowledge accurately and meaningfully.
The quality of impact
Testing plays an important role in society. Tests serve as gate-keeping devices or
doors to opportunity. They can be used to punish and to praise. It is,
therefore, important to recognize that tests reflect and represent the social,
cultural and political values of any given society, and in the evaluation of
test usefulness, we must take into consideration the possible consequences that
may ensue from the decision to use test results for decision-making. Bachman
and Palmer (1996) refer to the degree to which testing and test score decisions
influence all aspects of society and the individuals within that society as
test impact.
The quality of practicality
Scores from a grammar test could be highly reliable and provide a basis for making
valid inferences, but at the same time completely lacking in practicality. It
may be completely beyond our means with respect to the available human,
material or time resources. Test practicality is not a quality of a test
itself, but is a function of the extent to which we are able to balance the
costs associated with designing, developing, administering, and scoring a test
in light of the available resources (Bachman, personal communication, 2002).
Overview of grammar-test construction
Each testing situation is specific unto itself, with a specific purpose, a specific
audience and a specific set of parameters that will affect the test design and
development process. As a result, there is no one ‘right’ way to develop a
test; nor are there any recipes for ‘good’ tests that could generalize to all
situations. There are, however, several frameworks of test development that
have been proposed (e.g., Alderson, Clapham and Wall, 1995; Bachman and Palmer,
1996; Brown, 1996; Davidson and Lynch, 2002) which serve to guide the
test-development process so that the qualities of test usefulness will not be
ignored.
Illustrative Tests Of Grammatical Ability
The First Certificate in English Language Test (FCE)
Purpose
The First Certificate in English (FCE) exam was first developed by the University
of Cambridge Local Examinations Syndicate (UCLES, now Cambridge ESOL) in 1939
and has been revised periodically ever since. This exam is the most widely
taken Cambridge ESOL examination with an annual candidature of over 270,000
(see http://www.cambridgeesol.org/exam/index.cfm). The purpose of the FCE
(Cambridge ESOL, 2001a) is to assess the general English language proficiency
of learners as measured by their abilities in reading, writing, speaking,
listening, and knowledge of the lexical and grammatical systems of English
(Cambridge ESOL, 1995, p. 4). More specifically, the FCE is a level-three exam
in the Cambridge main suite of exams, and consists of five compulsory subtests
or ‘papers’: reading, writing, use of English, listening and speaking
(Cambridge ESOL, 1996, p. 8). Students who pass the FCE are assumed to have
sufficient proficiency to handle routine office jobs (clerical, managerial) and
to take courses given in English (Cambridge ESOL, 2001a, p. 6). Given that the
FCE can be used as certification of English language proficiency for certain
types of jobs, it is considered a high-stakes test.
The Comprehensive English Language Test (CELT)
Purpose
The Comprehensive English Language Test (CELT) (Harris and Palmer, 1970a, 1986) was
designed to measure the English language ability of nonnative speakers of
English. The authors claim in the technical manual (Harris and Palmer, 1970b)
that this test is most appropriate for students at the intermediate or advanced
levels of proficiency. English language proficiency is measured by means of a
structure subtest, a vocabulary subtest and a listening subtest. According to
the authors, these subtests can be used alone or in combination (p. 1). Scores
from the CELT have been used to make decisions related to placement in a
language program, acceptance into a university and achievement in a language
course (Harris and Palmer, 1970b, p. 1), and for this reason, it may be
considered a high-stakes test. One or more subtests of the CELT have also been
used as a measure of English language proficiency in SLA research.
Learning-Oriented Assessments Of Grammatical Ability
What is learning-oriented assessment of grammar?
In reaction to conventional testing practices typified by large-scale,
discrete-point, multiple-choice tests of language ability, several educators (e.g.,
Herman, Aschbacher and Winters, 1992; Short, 1993; Shohamy, 1995; Shepard,
2000) have advocated reforms so that assessment practices might better capture
educational outcomes and might be more consistent with classroom goals, curricula
and instruction.
Implementing learning-oriented assessment of grammar
Considerations from grammar-testing theory
The development procedures for constructing large-scale assessments of grammatical
ability discussed in Chapter 6 are similar to those needed to develop
learning-oriented assessments of grammar for classroom purposes with the
exception that the decisions made from classroom assessments will be somewhat
different due to the learning-oriented mandate of classroom assessment. Also,
given the usual low-stakes nature of the decisions in classroom assessment, the
amount of resources that needs to be expended is generally less than that
required for large-scale assessment. In this section, without repeating what
was discussed in Chapter 6, I will highlight some of the implications this
mandate might have for test design and operationalization.
Considerations from L2 learning theory
Given that learning-oriented assessment involves the collection and interpretation of
evidence about performance so that judgments can be made about further language
development, learning-oriented assessment of grammar needs to be rooted not
only in a theory of grammar testing or language proficiency, but also in a
theory of L2 learning. What is striking in the literature is that models of
language ability rarely refer to models of language learning, and models of
language learning rarely make reference to models of language ability. In
learning-oriented assessment, the consideration of both perspectives is
critical.
Illustrative example of learning-oriented assessment
Let us now turn to an illustration of a learning-oriented achievement test of
grammatical ability.
Making assessment learning-oriented
The On Target achievement tests were designed with a clear learning mandate. The
content of the tests had to be strictly aligned with the content of the
curriculum. This obviously had several implications for the test design and its
operationalization. From a testing perspective, the primary purpose of the Unit
7 achievement test was to measure the students’ explicit as well as their
implicit knowledge of grammatical form and meaning on both the sentence and
discourse levels.
Challenges and new directions in assessing grammatical ability
The state of grammar assessment
In the last fifty years, language testers have dedicated a great deal of time to
discussing the nature of language proficiency and the testing of the four
skills, the qualities of test usefulness (e.g., reliability, authenticity), the
relationships between test-taker or task characteristics and performance, and
numerous statistical procedures for examining data and providing evidence of
test validity.
Challenge 1: Defining grammatical ability
One major challenge revolves around how grammatical ability has been defined both
theoretically and operationally in language testing. As we saw in Chapters 3
and 4, in the 1960s and 1970s language teaching and language testing maintained a
strong syntactocentric view of language rooted largely in linguistic
structuralism. Moreover, models of language ability, such as those proposed by
Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment
concentrated on measuring language elements – defined in terms of
morphosyntactic forms on the sentence level – while performing language skills.
Challenge 2: Scoring grammatical ability
A second challenge relates to scoring, as the specification of both form and
meaning is likely to influence the ways in which grammar assessments are
scored. As we discussed in Chapter 6, responses with multiple criteria for
correctness may necessitate different scoring procedures. For example, the use
of dichotomous scoring, even with certain selected response items, might need
to give way to partial-credit scoring, since some wrong answers may reflect
partial development either in form or meaning. As a result, language educators
might need to adapt their scoring procedures to reflect the two dimensions of
grammatical knowledge.
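The contrast between dichotomous and partial-credit scoring can be sketched in
a few lines of Python. This is only an illustration under stated assumptions:
the Response class, the two function names and the 0-2 point scale are
hypothetical and do not come from any published scoring rubric. Each response
is assumed to have already been judged separately for form and for meaning.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical rater judgments for one test-taker response.
    form_correct: bool
    meaning_correct: bool


def dichotomous_score(r: Response) -> int:
    """Right/wrong scoring: credit only if both form and meaning are correct."""
    return 1 if (r.form_correct and r.meaning_correct) else 0


def partial_credit_score(r: Response) -> int:
    """Assumed scale: one point per dimension, so scores range from 0 to 2."""
    return int(r.form_correct) + int(r.meaning_correct)


responses = [
    Response(form_correct=True, meaning_correct=True),
    Response(form_correct=True, meaning_correct=False),
    Response(form_correct=False, meaning_correct=True),
    Response(form_correct=False, meaning_correct=False),
]

dichotomous = [dichotomous_score(r) for r in responses]
partial = [partial_credit_score(r) for r in responses]
print(dichotomous)
print(partial)
```

Under dichotomous scoring, the second and third responses are indistinguishable
from the fourth, whereas the partial-credit scale records that one of the two
dimensions has developed.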
Challenge 3: Assessing meanings
The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of
communicative language ability can be defined and assessed. The ‘communicative’
in communicative language teaching, communicative language testing,
communicative language ability, or communicative competence refers to the
conveyance of ideas, information, feelings, attitudes and other intangible
meanings (e.g., social status) through language.
Challenge 4: Reconsidering grammar-test tasks
The fourth challenge relates to the design of test tasks that are capable of both
measuring grammatical ability and providing authentic and engaging measures of
grammatical performance. Since the early 1960s, language educators have
associated grammar tests with discrete-point, multiple-choice tests of
grammatical form. These and other ‘traditional’ test tasks (e.g.,
grammaticality judgments) have been severely criticized for lacking in
authenticity, for not engaging test-takers in language use, and for promoting
behaviors that are not readily consistent with communicative language teaching.
Challenge 5: Assessing the development of grammatical ability
The fifth challenge revolves around the argument, made by some researchers, that
grammatical assessments should be constructed, scored and interpreted with
developmental proficiency levels in mind. This notion stems from the work of
several SLA researchers (e.g., Clahsen, 1985; Pienemann and Johnson, 1987;
Ellis, 2001b) who maintain that the principal finding from years of SLA
research is that structures appear to be acquired in a fixed order and a fixed
developmental sequence. Furthermore, instruction on forms in non-contiguous
stages appears to be ineffective. As a result, the acquisitional development of
learners, they argue, should be a major consideration in L2 grammar
testing.
Final remarks
Despite loud claims in the 1970s and 1980s by a few influential SLA researchers that
instruction, and in particular explicit grammar instruction, had no effect on
language learning, most language teachers around the world never really gave up
grammar teaching. Furthermore, these claims have instigated an explosion of
empirical research in SLA, the results of which have made a compelling case for
the effectiveness of certain types of both explicit and implicit grammar
instruction. This research has also highlighted the important role that meaning
plays in learning grammatical forms.
Assessing Vocabulary
The Place Of Vocabulary In Language Assessment
Recent trends in language testing
However, scholars in the field of language testing have a rather different perspective
on vocabulary-test items of the conventional kind. Such items fit neatly into
what language testers call the discrete-point approach to testing. This
involves designing tests to assess whether learners have knowledge of
particular structural elements of the language: word meanings, word forms,
sentence patterns, sound contrasts and so on. In the last thirty years of the
twentieth century, language testers progressively moved away from this
approach, to the extent that such tests are now quite out of step with current
thinking about how to design language tests, especially for proficiency
assessment.
Three dimensions of vocabulary assessment
Up to this point, I have outlined two contrasting perspectives on the role of vocabulary
in language assessment. One point of view is that it is perfectly sensible to
write tests that measure whether learners know the meaning and usage of a set
of words, taken as independent semantic units. The other view is that
vocabulary must always be assessed in the context of a language-use task, where
it interacts in a natural way with other components of language knowledge. To
some extent, the two views are complementary in that they relate to different
purposes of assessment.
Discrete - embedded
The first dimension focuses on the construct which underlies the assessment
instrument. In language testing, the term construct refers to the mental
attribute or ability that a test is designed to measure. In the case of a
traditional vocabulary test, the construct can usually be labelled as
‘vocabulary knowledge’ of some kind. The practical significance of defining the
construct is that it allows us to clarify the meaning of the test results.
Normally we want to interpret the scores on a vocabulary test as a measure of
some aspect of the learners' vocabulary knowledge, such as their progress in
learning words from the last several units in the course book, their ability to
supply derived forms of base words (like scientist and scientific, from
science), or their skill at inferring the meaning of unknown words in a reading
passage. Thus, a discrete test takes vocabulary knowledge as a distinct
construct, separated from other components of language competence.
Selective - comprehensive
The second dimension concerns the range of vocabulary to be included in the
assessment. A conventional vocabulary test is based on a set of target words
selected by the test-writer, and the test-takers are assessed according to how
well they demonstrate their knowledge of the meaning or use of those words.
This is what I call a selective vocabulary measure. The target words may either
be selected as individual words and then incorporated into separate test items,
or alternatively the test-writer first chooses a suitable text and then uses
certain words from it as the basis for the vocabulary assessment.
Context-independent - context-dependent
The role of context, which is an old issue in vocabulary testing, is the basis for
the third dimension. Traditionally contextualisation has meant that a word is
presented to test-takers in a sentence rather than as an isolated element. From
a contemporary perspective, it is necessary to broaden the notion of context to
include whole texts and, more generally, discourse. In addition, we need to recognise
that contextualisation is more than just a matter of the way in which
vocabulary is presented. The key question is to what extent the test-takers are
being assessed on the basis of their ability to engage with the context
provided in the test.
An overview of the book
The three dimensions are not intended to form a comprehensive model of vocabulary
assessment. Rather, they provide a basis for locating the variety of assessment
procedures currently in use within a common framework and, in particular, they
offer points of contact between tests which treat words as discrete units and
ones that assess vocabulary more integratively in a task-based testing context.
At various points through the book I refer to the dimensions and exemplify
them. Since a large proportion of work on vocabulary assessment to date has
involved instruments which are relatively discrete, selective and
context-independent in nature, this approach may seem to be predominant in several of
the following chapters. However, my aim is to present a balanced view of the
subject, and I discuss measures that are more embedded, comprehensive and
context-dependent wherever the opportunity arises, and especially in the last
two chapters of the book.
References:
Purpura, James. 2004. Assessing Grammar. Cambridge, United Kingdom: Cambridge University Press.
Read, John. 2000. Assessing Vocabulary. Cambridge, United Kingdom: Cambridge University Press.