Wednesday, 13 May 2020

Summary: Assessing Grammar and Assessing Vocabulary


Assessing Grammar
Differing Notions Of ‘Grammar’ for Assessment
Introduction
The study of grammar has had a long and important role in the history of second language and foreign language teaching. For centuries, to learn another language, or what I will refer to generically as an L2, meant to know the grammatical structures of that language and to cite prescriptions for its use. Grammar was used to mean the analysis of a language system, and the study of grammar was not just considered an essential feature of language learning, but was thought to be sufficient for learners to actually acquire another language (Rutherford, 1988). Grammar in and of itself was deemed to be worthy of study – to the extent that in the Middle Ages in Europe, it was thought to be the foundation of all knowledge and the gateway to sacred and secular understanding (Hillocks and Smith, 1991). Thus, the central role of grammar in language teaching remained relatively uncontested until the late twentieth century. Even a few decades ago, it would have been hard to imagine language instruction without immediately thinking of grammar.
Grammar and linguistics
When most language teachers, second language acquisition (SLA) researchers and language testers think of ‘grammar’, they call to mind one of the many paradigms (e.g., ‘traditional grammar’ or ‘universal grammar’) available for the study and analysis of language. Such linguistic grammars are typically derived from data taken from native speakers and minimally constructed to describe well-formed utterances within an individual framework. These grammars strive for internal consistency and are mainly accessible to those who have been trained in that particular paradigm. Since the 1950s, there have been many such linguistic theories – too numerous to list here – that have been proposed to explain language phenomena. Many of these theories have helped shape how L2 educators currently define grammar in educational contexts. Although it is beyond the purview of this book to provide a comprehensive review of these theories, it is, nonetheless, helpful to mention a few, considering both the impact they have had on L2 education and the role they play in helping define grammar for assessment purposes.
Form-based perspectives of language
Several syntactocentric, or form-based, theories of language have provided grammatical insights to L2 teachers. I will describe three: traditional grammar, structural linguistics and transformational-generative grammar. One of the oldest theories to describe the structure of language is traditional grammar. Originally based on the study of Latin and Greek, traditional grammar drew on data from literary texts to provide rich and lengthy descriptions of linguistic form. Unlike some other syntactocentric theories, traditional grammar also revealed the linguistic meanings of these forms and provided information on their usage in a sentence (Celce-Murcia and Larsen-Freeman, 1999). Traditional grammar supplied an extensive set of prescriptive rules along with the exceptions.
Traditional grammar has been criticized for its inability to provide descriptions of the language that could adequately incorporate the exceptions into the framework and for its lack of generalizability to other languages. In other words, traditional grammar postulated a separate, uniquely language-specific set of rules or ‘parameters’ for every language.
Form- and use-based perspectives of language
The three theories of linguistic analysis described thus far have provided insights to L2 educators on several grammatical forms. These insights provide information to explain what structures are theoretically possible in a language. Other linguistic theories, however, are better equipped to examine how speakers and writers actually exploit linguistic forms during language use. For example, if we wish to explain how seemingly similar structures like I like to read and I like reading connote different meanings, we might turn to those theories that study grammatical form and use interfaces. This would address questions such as: Why does a language need two or more structures that are similar in meaning? Are similar forms used to convey different specialized meanings? To what degree are similar forms a function of written versus spoken language, or to what degree are these forms characteristic of a particular social group or a specific situation? It is important for us to discuss these questions briefly if we ultimately wish to test grammatical forms along with their meanings and uses in context.
Communication-based perspectives of language
Other theories have provided grammatical insights from a communication-based perspective. Such a perspective expresses the notion that language involves more than linguistic form. It moves beyond the view of language as patterns of morphosyntax observed within relatively decontextualized sentences or sentences found within naturally occurring corpora. Rather, a communication-based perspective views grammar as a set of linguistic norms, preferences and expectations that an individual invokes to convey a host of pragmatic meanings that are appropriate, acceptable and natural depending on the situation. The assumption here is that linguistic form has no absolute, fixed meaning in language use but is mutable and open to interpretation by those who use it in a given circumstance. Grammar in this context is often co-terminous with language itself, and stands not only for form, but also for meaningfulness and pragmatic appropriacy, acceptability or naturalness – a topic I will return to later since I believe that a blurring of these concepts is misleading and potentially problematic for language educators.
What is pedagogical grammar?
A pedagogical grammar represents an eclectic, but principled description of the target-language forms, created for the express purpose of helping teachers understand the linguistic resources of communication. These grammars provide information about how language is organized and offer relatively accessible ways of describing complex, linguistic phenomena for pedagogical purposes.
Research On L2 Grammar Teaching, Learning And Assessment
Research on L2 teaching and learning
Over the years, several of the questions mentioned above have intrigued language teachers, inspiring them to experiment with different methods, approaches and techniques in the teaching of grammar. To determine if students had actually learned under the different conditions, teachers have used diverse forms of assessment and drawn their own conclusions about their students. In so doing, these teachers have acquired a considerable amount of anecdotal evidence on the strengths and weaknesses of using different practices to implement L2 grammar instruction. These experiences have led most teachers nowadays to ascribe to an eclectic approach to grammar instruction, whereby they draw upon a variety of different instructional techniques, depending on the individual needs, goals and learning styles of their students.
Comparative methods studies
The comparative methods studies sought to compare the effects of different language-teaching methods on the acquisition of an L2. These studies occurred principally in the 1960s and 1970s, and stemmed from a reaction to the grammar-translation method, which had dominated language instruction during the first half of the twentieth century. More generally, these studies were in reaction to form-focused instruction (referred to as ‘focus on forms’ by Long, 1991), which used a traditional structural syllabus of grammatical forms as the organizing principle for L2 instruction. According to Ellis (1997), form-focused instruction contrasts with meaning-focused instruction in that meaning-focused instruction emphasizes the communication of messages (i.e., the act of making a suggestion and the content of such a suggestion) while form-focused instruction stresses the learning of linguistic forms. These can be further contrasted with form-and-meaning focused instruction (referred to by Long (1991) as ‘focus-on-form’), where grammar instruction occurs in a meaning-based environment and where learners strive to communicate meaning while paying attention to form.
Non-interventionist studies
While some language educators were examining different methods of teaching grammar in the 1960s, others were feeling a growing sense of dissatisfaction with the central role of grammar in the L2 curriculum. As a result, questions regarding the centrality of grammar were again raised by a small group of L2 teachers and syllabus designers who felt that the teaching of grammar in any form simply did not produce the desired classroom results. Newmark (1966), in fact, asserted that grammatical analysis and the systematic practice of grammatical forms were actually interfering with the process of L2 learning, rather than promoting it, and that, if left uninterrupted, second language acquisition, like first language acquisition, would proceed naturally.
At the same time, the role of grammar in the L2 curriculum was also being questioned by some SLA researchers (e.g., Dulay and Burt, 1973; Bailey, Madden and Krashen, 1974) who had been studying L2 learning in instructed and naturalistic settings. In their attempts to characterize the L2 learner’s interlanguage at one or more points along the path toward target-like proficiency, several researchers came to similar conclusions about L2 development.
Empirical studies in support of non-intervention
The non-interventionist position was examined empirically by Prabhu (1987) in a project known as the Communicational Teaching Project (CTP) in southern India. This study sought to demonstrate that the development of grammatical ability could be achieved through a task-based, rather than a form-focused, approach to language teaching, provided that the tasks required learners to engage in meaningful communication. In the CTP, Prabhu (1987) argued against the notion that the development of grammatical ability depended on a systematic presentation of grammar followed by planned practice. However, in an effort to evaluate the CTP program, Beretta and Davies (1985) compared classes involved in the CTP with classes outside the project taught with a structural-oral-situational method. They administered a battery of tests to the students, and found that the CTP learners outperformed the control group on a task-based test, whereas the non-CTP learners did better on a traditional structure test. These results lent partial support to the non-interventionist position by showing that task-based classrooms based on meaningful communication can also be effective in promoting SLA. However, these results also showed, once again, that students do best when they are taught and tested in similar ways.
Possible implications of fixed developmental order to language assessment
The notion that structures appear to be acquired in a fixed developmental order and in a fixed developmental sequence might conceivably have some relevance to the assessment of grammatical ability. First of all, these findings could give language testers an empirical basis for constructing grammar tests that would account for the variability inherent in a learner’s interlanguage. In other words, information on the acquisitional order of grammatical items could conceivably serve as a basis for selecting grammatical content for tests that aim to measure different levels of developmental progression, as Chang (2002, 2004) did in examining the underlying structure of a test that attempted to measure knowledge of relative clauses.
Problems with the use of developmental sequences as a basis for assessment
Although developmental sequence research offers an intuitively appealing complement to accuracy-based assessments in terms of interpreting test scores, I believe this method is fraught with a number of serious problems, and language educators should use extreme caution in applying this method to language testing. This is because our understanding of natural acquisitional sequences is incomplete and at too early a stage of research to be the basis for concrete assessment recommendations (Lightbown, 1985; Hudson, 1993). First, the number of grammatical sequences that show a fixed order of acquisition is very limited, far too limited for all but the most restricted types of grammar tests.
Interventionist studies
Not all L2 educators are in agreement with the non-interventionist position on grammar instruction. In fact, several (e.g., Schmidt, 1983; Swain, 1991) have maintained that although some L2 learners are successful in acquiring selected linguistic features without explicit grammar instruction, the majority fail to do so. Testimony to this is the large number of non-native speakers who emigrate to countries around the world, live there all their lives and fail to learn the target language, or fail to learn it well enough to realize their personal, social and long-term career goals. In these situations, language teachers affirm that formal grammar instruction of some sort can be of benefit.
Empirical studies in support of intervention
Aside from anecdotal evidence, the non-interventionist position has come under intense attack on both theoretical and empirical grounds, with several SLA researchers affirming that efforts to teach L2 grammar typically result in the development of L2 grammatical ability. Hulstijn (1989) and Alanen (1995) investigated the effectiveness of L2 grammar instruction on SLA in comparison with no formal instruction. They found that when coupled with meaning-focused instruction, the formal instruction of grammar appears to be more effective than exposure to meaning or form alone. Long (1991) also argued for a focus on both meaning and form in classrooms that are organized around meaningful and sustained communicative interaction.
Research on instructional techniques and their effects on acquisition
Much of the recent research on teaching grammar has focused on four types of instructional techniques and their effects on acquisition. Although a complete discussion of teaching interventions is outside the purview of this book (see Ellis, 1997; Doughty and Williams, 1998), these techniques include form- or rule-based techniques, input-based techniques, feedback-based techniques and practice-based techniques (Norris and Ortega, 2000). Form- or rule-based techniques revolve around the instruction of grammatical forms.
Grammar processing and second language development
In the grammar-learning process, explicit grammatical knowledge refers to a conscious knowledge of grammatical forms and their meanings. Explicit knowledge is usually accessed slowly, even when it is almost fully automatized (Ellis, 2001b). DeKeyser (1995) characterizes grammatical instruction as ‘explicit’ when it involves the explanation of a rule or the request to focus on a grammatical feature. Instruction can be explicitly deductive, where learners are given rules and asked to apply them, or explicitly inductive, where they are given samples of language from which to generate rules and make generalizations.
Implications for assessing grammar
The studies investigating the effects of teaching and learning on grammatical performance present a number of challenges for language assessment. First of all, the notion that grammatical knowledge structures can be differentiated according to whether they are fully automatized (i.e., implicit) or not (i.e., explicit) raises important questions for the testing of grammatical ability (Ellis, 2001b). Given the many purposes of assessment, we might wish to test explicit knowledge of grammar, implicit knowledge of grammar or both. For example, in certain classroom contexts, we might want to assess the learners’ explicit knowledge of one or more grammatical forms, and could, therefore, ask learners to answer multiple-choice or short-answer questions related to these forms.

The Role Of Grammar In Models Of Communicative Language Ability        
The role of grammar in models of communicative competence
Every language educator who has ever attempted to measure a student’s communicative language ability has wondered: ‘What exactly does a student need to “know” in terms of grammar to be able to use it well enough for some real-world purpose?’ In other words, they have been faced with the challenge of defining grammar for communicative purposes. To complicate matters further, linguistic notions of grammar have changed over time, as we have seen, and this has significantly increased the number of components that could be called ‘grammar’. In short, definitions of grammar and grammatical knowledge have changed over time and across contexts, and I expect this will be no different in the future.
Rea-Dickins’ definition of grammar
In discussing more specifically how grammatical knowledge might be tested within a communicative framework, Rea-Dickins (1991) defined ‘grammar’ as the single embodiment of syntax, semantics and pragmatics. She argued against Canale and Swain’s (1980) and Bachman’s (1990b) multi-componential view of communicative competence on the grounds that componential representations overlook the interdependence and interaction between and among the various components. She further stated that in Canale and Swain’s (1980) model, the notion of grammatical competence was limited since it defined grammar as ‘structure’ on the one hand and as ‘structure and semantics’ on the other, but ignored the notion of ‘structure as pragmatics’. Similarly, she added that in Bachman’s (1990b) model, grammar was defined as structure at the sentence level and as cohesion at the suprasentential level, but this model failed to account for the pragmatic dimension of communicative grammar.
Larsen-Freeman’s definition of grammar
Another conceptualization of grammar that merits attention is Larsen-Freeman’s (1991, 1997) framework for the teaching of grammar in communicative language teaching contexts. Drawing on several linguistic theories and influenced by language teaching pedagogy, she has also characterized grammatical knowledge along three dimensions: linguistic form, semantic meaning and pragmatic use. Form is defined as both morphology, or how words are formed, and syntactic patterns, or how words are strung together. This dimension is primarily concerned with linguistic accuracy. The meaning dimension describes the inherent or literal message conveyed by a lexical item or a lexico-grammatical feature. This dimension is mainly concerned with the meaningfulness of an utterance.
What is meant by ‘grammar’ for assessment purposes?
Regardless of the assessment purpose, if we wish to make inferences about grammatical ability on the basis of a grammar test or some other form of assessment, it is important to know what we mean by ‘grammar’ when attempting to specify components of grammatical knowledge for measurement purposes. With this goal in mind, we need a definition of grammatical knowledge that is broad enough to provide a theoretical basis for the construction and validation of tests in a number of contexts. At the same time, we need our definition to be precise enough to distinguish it from other areas of language ability.
Towards A Definition Of Grammatical Ability
Defining grammatical constructs
Although our basic underlying model of grammar will remain the same in all testing situations (i.e., grammatical form and meaning), what it means to ‘know’ grammar for different contexts will most likely change (see Chapelle, 1998). In other words, the type, range and scope of grammatical features required to communicate accurately and meaningfully will vary from one situation to another. For example, the type of grammatical knowledge needed to write a formal academic essay would be very different from that needed to make a train reservation. Given the many possible ways of interpreting what it means to ‘know’ grammar, it is important that we define what we mean by ‘grammatical knowledge’ for any given testing situation. A clear definition of what we believe it means to ‘know’ grammar for a particular testing context will then allow us to construct tests that measure grammatical ability.
Definition of key terms
Before continuing this discussion, it might be helpful if I clarified some of the key terms.
Knowledge of phonological or graphological form and meaning
Knowledge of phonological/graphological form enables us to understand and produce features of the sound or writing system (with the exception of meaning-based orthographies such as Chinese characters) as they are used to convey meaning in testing or language-use situations.



Knowledge of lexical form and meaning
Knowledge of lexical form enables us to understand and produce those features of words that encode grammar rather than those that reveal meaning. This includes words that mark gender (e.g., waitress), countability (e.g., people) or part of speech (e.g., relate, relation). For example, when the word think in English is followed by the preposition about before a noun, this is considered the grammatical dimension of lexis, representing a co-occurrence restriction with prepositions. One area of lexical form that poses a challenge to learners of some languages is word formation. This includes compounding in English with a noun + noun or a verb + particle pattern.
Knowledge of morphosyntactic form and meaning
Knowledge of morphosyntactic form permits us to understand and produce both the morphological and syntactic forms of the language. This includes the articles, prepositions, pronouns, affixes (e.g., -est), syntactic structures, word order, simple, compound and complex sentences, mood, voice and modality. A learner who knows the morphosyntactic form of the English conditionals would know that: (1) an if-clause sets up a condition and a result clause expresses the outcome; (2) both clauses can be in the sentence-initial position in English; (3) if can be deleted under certain conditions as long as the subject and operator are inverted; and (4) certain tense restrictions are imposed on if and result clauses.
Knowledge of cohesive form and meaning
Knowledge of cohesive form enables us to use the phonological, lexical and morphosyntactic features of the language in order to interpret and express cohesion on both the sentence and the discourse levels. Cohesive form is directly related to cohesive meaning through cohesive devices (e.g., she, this, here) which create links between cohesive forms and their referential meanings within the linguistic environment or the surrounding co-text. Halliday and Hasan (1976, 1989) list a number of grammatical forms for displaying cohesive meaning.
Knowledge of information management form and meaning
Knowledge of information management form allows us to use linguistic forms as a resource for interpreting and expressing the information structure of discourse. Some resources that help manage the presentation of information include, for example, prosody, word order, tense-aspect and parallel structures. These forms are used to create information management meaning.


Knowledge of interactional form and meaning
Knowledge of interactional form enables us to understand and use linguistic forms as a resource for understanding and managing talk-in-interaction. These forms include discourse markers and communication management strategies. Discourse markers consist of a set of adverbs, conjunctions and lexicalized expressions used to signal certain language functions.
Designing Test Tasks To Measure L2 Grammatical Ability
How does test development begin?
Every grammar-test development project begins with a desire to obtain (and often provide) information about how well a student knows grammar in order to convey meaning in some situation where the target language is used. The information obtained from this assessment then forms the basis for decision-making. Those situations in which we use the target language to communicate in real life or in which we use it for instruction or testing are referred to as the target language use (TLU) situations (Bachman and Palmer, 1996). Within these situations, the tasks or activities requiring language to achieve a communicative goal are called the target language use tasks.
What do we mean by ‘task’?
The notion of ‘task’ in language-learning contexts has been conceptualized in many different ways over the years. Traditionally, ‘task’ has referred to any activity that requires students to do something with the intended purpose of learning the target language. A task, then, is any activity (e.g., short answers, role-plays) as long as it involves a linguistic or non-linguistic (e.g., circle the answer) response to input. Traditional learning or teaching tasks are characterized as having an intended pedagogical purpose – which may or may not be made explicit; they have a set of instructions that control the kind of activity to be performed; they contain input (e.g., questions); and they elicit a response.
What are the characteristics of grammatical test tasks?
As the goal of grammar assessment is to provide as useful a measurement as possible of our students’ grammatical ability, we need to design test tasks in which the variability of our students’ scores is attributed to the differences in their grammatical ability, and not to uncontrolled or irrelevant variability resulting from the types of tasks or the quality of the tasks that we have put on our tests. As all language teachers know, the kinds of tasks we use in tests and their quality can greatly influence how students will perform.
The Bachman and Palmer framework
Bachman and Palmer’s (1996) framework of task characteristics represents the most recent thinking in language assessment of the potential relationships between task characteristics and test performance. In this framework, they outline five general aspects of tasks, each of which is characterized by a set of distinctive features. These five aspects describe characteristics of (1) the setting, (2) the test rubrics, (3) the input, (4) the expected response and (5) the relationship between the input and response.
Describing grammar test tasks
When language teachers consider tasks for grammar tests, they call to mind a large repertoire of task types that have been commonly used in teaching and testing contexts. We now know that these holistic task types constitute collections of task characteristics for eliciting performance and that these holistic task types can vary on a number of dimensions. We also need to remember that the tasks we include on tests should strive to match the types of language-use tasks found in real-life or language instructional domains.
Selected-response task types
Selected-response tasks present input in the form of an item, and test takers are expected to select the response. All other task characteristics can vary. For example, the form of the input can be language, non-language or both, and the length of the input can vary from a word to larger pieces of discourse. In terms of the response, selected-response tasks are intended to measure recognition or recall of grammatical form and/or meaning.
Limited-production task types
Limited-production tasks are intended to assess one or more areas of grammatical knowledge depending on the construct definition. Unlike selected-response items, which usually have only one possible answer, the range of possible answers for limited-production tasks can, at times, be large – even when the response involves a single word.

Developing Tests To Measure L2 Grammatical Ability
What makes a grammar test ‘useful’?
We concluded in the last chapter that the goal of every grammar test was to obtain (and provide) information on how well a student knows or can use grammar to convey meaning in some situation where the target language is used. The responses to the test items can then be used as a basis for assigning scores and for making inferences about the student’s underlying grammatical ability. We discussed these responses in terms of inferences because it is not possible to observe a person’s grammatical ability directly; rather, we must infer the underlying ability from responses to questions or from samples of actual performance.
The quality of reliability
Similarly, the scores from tests or components of tests can also be characterized as being reliable when the tests provide the same results every time we administer them, regardless of the conditions under which they are administered.
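This notion of consistency across administrations can be formalized in classical test theory, a standard psychometric framework (the formula below is a conventional sketch, not drawn from the summarized text itself): each observed score is modeled as a true score plus random measurement error, and reliability is the proportion of observed-score variance attributable to true scores:

```latex
X = T + E, \qquad
\rho_{XX'} = \frac{\sigma_{T}^{2}}{\sigma_{X}^{2}}
           = 1 - \frac{\sigma_{E}^{2}}{\sigma_{X}^{2}}
```

A reliability coefficient near 1 indicates that repeated administrations would rank test takers in nearly the same way, while a coefficient near 0 indicates that scores are dominated by measurement error.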

The quality of construct validity
The second quality that all ‘useful’ tests possess is construct validity. Bachman and Palmer (1996) define construct validity as ‘the extent to which we can interpret a given test score as an indicator of the ability(ies), or construct(s), we want to measure. Construct validity also has to do with the domain of generalization to which our score interpretations generalize’ (p. 21). In other words, construct validity not only refers to the meaningfulness and appropriateness of the interpretations we make based on test scores, but it also pertains to the degree to which the score-based interpretations can be extrapolated beyond the testing situation to a particular TLU domain (Messick, 1993).
The quality of authenticity
A third quality of test usefulness is authenticity, a notion much discussed in language testing since the late 1970s, when communicative approaches to language teaching were first taking root. Building on these discussions, Bachman and Palmer (1996) refer to ‘authenticity’ as the degree of correspondence between the test-task characteristics and the TLU task characteristics. Given the framework for test-task characteristics discussed in Chapter 5, they provide a systematic way of matching test tasks with TLU tasks in terms of the features of the test setting, rubrics, input, expected response and the relationship between the input and response.
The quality of interactiveness
A fourth quality of test usefulness outlined by Bachman and Palmer (1996) is interactiveness. This quality refers to the degree to which the aspects of the test-taker’s language ability we want to measure (e.g., grammatical knowledge, language knowledge) are engaged by the test-task characteristics (e.g., the input, the response and the relationship between the input and response) based on the test constructs. In other words, the task should engage the characteristics we want to measure (e.g., grammatical knowledge) given the test purpose, and nothing else (e.g., topical knowledge, affective schemata); otherwise, this may mask the very constructs we are trying to measure. In the case of grammar assessment, test tasks can be characterized as ‘interactive’ to the extent that they require individuals to draw on and manage their cognitive and metacognitive strategies (i.e., their strategic competence) in order to use grammatical knowledge accurately and meaningfully.
The quality of impact
Testing plays an important role in society. Tests serve as gate-keeping devices or doors to opportunity. They can be used to punish and to praise. It is, therefore, important to recognize that tests reflect and represent the social, cultural and political values of any given society, and in the evaluation of test usefulness, we must take into consideration the possible consequences that may ensue from the decision to use test results for decision-making. Bachman and Palmer (1996) refer to the degree to which testing and test score decisions influence all aspects of society and the individuals within that society as test impact.
The quality of practicality
Scores from a grammar test could be highly reliable and provide a basis for making valid inferences, but at the same time be completely lacking in practicality. Such a test may be completely beyond our means with respect to the available human, material or time resources. Test practicality is not a quality of a test itself, but is a function of the extent to which we are able to balance the costs associated with designing, developing, administering, and scoring a test in light of the available resources (Bachman, personal communication, 2002).
Overview of grammar-test construction
Each testing situation is specific unto itself, with a specific purpose, a specific audience and a specific set of parameters that will affect the test design and development process. As a result, there is no one ‘right’ way to develop a test; nor are there any recipes for ‘good’ tests that could generalize to all situations. There are, however, several frameworks of test development that have been proposed (e.g., Alderson, Clapham and Wall, 1995; Bachman and Palmer, 1996; Brown, 1996; Davidson and Lynch, 2002) which serve to guide the test-development process so that the qualities of test usefulness will not be ignored.



Illustrative Tests Of Grammatical Ability
The First Certificate in English Language Test (FCE)

Purpose
The First Certificate in English (FCE) exam was first developed by the University of Cambridge Local Examinations Syndicate (UCLES, now Cambridge ESOL) in 1939 and has been revised periodically ever since. This exam is the most widely taken Cambridge ESOL examination, with an annual candidature of over 270,000 (see http://www.cambridgeesol.org/exam/index.cfm). The purpose of the FCE (Cambridge ESOL, 2001a) is to assess the general English language proficiency of learners as measured by their abilities in reading, writing, speaking, listening, and knowledge of the lexical and grammatical systems of English (Cambridge ESOL, 1995, p. 4). More specifically, the FCE is a level-three exam in the Cambridge main suite of exams, and consists of five compulsory subtests or 'papers': reading, writing, use of English, listening and speaking (Cambridge ESOL, 1996, p. 8). Students who pass the FCE are assumed to have sufficient proficiency to handle routine office jobs (clerical, managerial) and to take courses given in English (Cambridge ESOL, 2001a, p. 6). Given that the FCE can be used as certification of English language proficiency for certain types of jobs, it is considered a high-stakes test.


The Comprehensive English Language Test (CELT)
Purpose
The Comprehensive English Language Test (CELT) (Harris and Palmer, 1970a, 1986) was designed to measure the English language ability of nonnative speakers of English. The authors claim in the technical manual (Harris and Palmer, 1970b) that this test is most appropriate for students at the intermediate or advanced levels of proficiency. English language proficiency is measured by means of a structure subtest, a vocabulary subtest and a listening subtest. According to the authors, these subtests can be used alone or in combination (p. 1). Scores from the CELT have been used to make decisions related to placement in a language program, acceptance into a university and achievement in a language course (Harris and Palmer, 1970b, p. 1), and for this reason, it may be considered a high-stakes test. One or more subtests of the CELT have also been used as a measure of English language proficiency in SLA research.

Learning-Oriented Assessments Of Grammatical Ability
What is learning-oriented assessment of grammar?
In reaction to conventional testing practices typified by large-scale, discrete-point, multiple-choice tests of language ability, several educators (e.g., Herman, Aschbacher and Winters, 1992; Short, 1993; Shohamy, 1995; Shepard, 2000) have advocated reforms so that assessment practices might better capture educational outcomes and might be more consistent with classroom goals, curricula and instruction.
Implementing learning-oriented assessment of grammar
Considerations from grammar-testing theory
The development procedures for constructing large-scale assessments of grammatical ability discussed in Chapter 6 are similar to those needed to develop learning-oriented assessments of grammar for classroom purposes with the exception that the decisions made from classroom assessments will be somewhat different due to the learning-oriented mandate of classroom assessment. Also, given the usual low-stakes nature of the decisions in classroom assessment, the amount of resources that needs to be expended is generally less than that required for large-scale assessment. In this section, without repeating what was discussed in Chapter 6, I will highlight some of the implications this mandate might have for test design and operationalization.

Considerations from L2 learning theory
Given that learning-oriented assessment involves the collection and interpretation of evidence about performance so that judgments can be made about further language development, learning-oriented assessment of grammar needs to be rooted not only in a theory of grammar testing or language proficiency, but also in a theory of L2 learning. What is striking in the literature is that models of language ability rarely refer to models of language learning, and models of language learning rarely make reference to models of language ability. In learning-oriented assessment, the consideration of both perspectives is critical.
Illustrative example of learning-oriented assessment
Let us now turn to an illustration of a learning-oriented achievement test of grammatical ability.

Making assessment learning-oriented
The On Target achievement tests were designed with a clear learning mandate. The content of the tests had to be strictly aligned with the content of the curriculum. This obviously had several implications for the test design and its operationalization. From a testing perspective, the primary purpose of the Unit 7 achievement test was to measure the students’ explicit as well as their implicit knowledge of grammatical form and meaning on both the sentence and discourse levels.
Challenges and new directions in assessing grammatical ability
The state of grammar assessment
In the last fifty years, language testers have dedicated a great deal of time to discussing the nature of language proficiency and the testing of the four skills, the qualities of test usefulness (e.g., reliability, authenticity), the relationships between test-taker or task characteristics and performance, and numerous statistical procedures for examining data and providing evidence of test validity.

Challenge 1: Defining grammatical ability
One major challenge revolves around how grammatical ability has been defined both theoretically and operationally in language testing. As we saw in Chapters 3 and 4, in the 1960s and 1970s language teaching and language testing maintained a strongly syntactocentric view of language rooted largely in linguistic structuralism. Moreover, models of language ability, such as those proposed by Lado (1961) and Carroll (1961), had a clear linguistic focus, and assessment concentrated on measuring language elements – defined in terms of morphosyntactic forms on the sentence level – while performing language skills.
Challenge 2: Scoring grammatical ability
A second challenge relates to scoring, as the specification of both form and meaning is likely to influence the ways in which grammar assessments are scored. As we discussed in Chapter 6, responses with multiple criteria for correctness may necessitate different scoring procedures. For example, the use of dichotomous scoring, even with certain selected response items, might need to give way to partial-credit scoring, since some wrong answers may reflect partial development either in form or meaning. As a result, language educators might need to adapt their scoring procedures to reflect the two dimensions of grammatical knowledge.
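The scoring contrast described above can be sketched numerically. The rubric below is hypothetical (the function name and point values are illustrative, not taken from the source): each response is judged separately for grammatical form and for meaning, so a response that conveys the right meaning in the wrong form earns partial credit rather than a flat zero.

```python
# Hypothetical partial-credit scoring across the two dimensions of
# grammatical knowledge: one point for correct form, one for correct meaning.

def score_response(form_ok, meaning_ok):
    """Return 0-2 points, one per criterion met (vs. all-or-nothing scoring)."""
    return int(form_ok) + int(meaning_ok)

responses = [
    {"form_ok": True,  "meaning_ok": True},   # fully correct
    {"form_ok": False, "meaning_ok": True},   # meaning conveyed, form wrong
    {"form_ok": False, "meaning_ok": False},  # incorrect on both criteria
]
total = sum(score_response(**r) for r in responses)
```

Under dichotomous scoring, only the first response would count as correct; partial-credit scoring distinguishes the second response from the third, reflecting partial development in meaning even where form is wrong.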
Challenge 3: Assessing meanings
The third challenge revolves around ‘meaning’ and how ‘meaning’ in a model of communicative language ability can be defined and assessed. The ‘communicative’ in communicative language teaching, communicative language testing, communicative language ability, or communicative competence refers to the conveyance of ideas, information, feelings, attitudes and other intangible meanings (e.g., social status) through language.
Challenge 4: Reconsidering grammar-test tasks
The fourth challenge relates to the design of test tasks that are capable of both measuring grammatical ability and providing authentic and engaging measures of grammatical performance. Since the early 1960s, language educators have associated grammar tests with discrete-point, multiple-choice tests of grammatical form. These and other ‘traditional’ test tasks (e.g., grammaticality judgments) have been severely criticized for lacking in authenticity, for not engaging test-takers in language use, and for promoting behaviors that are not readily consistent with communicative language teaching.
Challenge 5: Assessing the development of grammatical ability
The fifth challenge revolves around the argument, made by some researchers, that grammatical assessments should be constructed, scored and interpreted with developmental proficiency levels in mind. This notion stems from the work of several SLA researchers (e.g., Clahsen, 1985; Pienemann and Johnson, 1987; Ellis, 2001b) who maintain that the principal finding from years of SLA research is that structures appear to be acquired in a fixed order and a fixed developmental sequence. Furthermore, instruction on forms in non-contiguous stages appears to be ineffective. As a result, the acquisitional development of learners, they argue, should be a major consideration in L2 grammar testing.
Final remarks
Despite loud claims in the 1970s and 1980s by a few influential SLA researchers that instruction, and in particular explicit grammar instruction, had no effect on language learning, most language teachers around the world never really gave up grammar teaching. Furthermore, these claims have instigated an explosion of empirical research in SLA, the results of which have made a compelling case for the effectiveness of certain types of both explicit and implicit grammar instruction. This research has also highlighted the important role that meaning plays in learning grammatical forms.
Assessing Vocabulary
The Place Of Vocabulary In Language Assessment

Recent trends in language testing
However, scholars in the field of language testing have a rather different perspective on vocabulary-test items of the conventional kind. Such items fit neatly into what language testers call the discrete point approach to testing. This involves designing tests to assess whether learners have knowledge of particular structural elements of the language: word meanings, word forms, sentence patterns, sound contrasts and so on. In the last thirty years of the twentieth century, language testers progressively moved away from this approach, to the extent that such tests are now quite out of step with current thinking about how to design language tests, especially for proficiency assessment.
Three dimensions of vocabulary assessment
Up to this point, I have outlined two contrasting perspectives on the role of vocabulary in language assessment. One point of view is that it is perfectly sensible to write tests that measure whether learners know the meaning and usage of a set of words, taken as independent semantic units. The other view is that vocabulary must always be assessed in the context of a language-use task, where it interacts in a natural way with other components of language knowledge. To some extent, the two views are complementary in that they relate to different purposes of assessment.
Discrete - embedded
The first dimension focuses on the construct which underlies the assessment instrument. In language testing, the term construct refers to the mental attribute or ability that a test is designed to measure. In the case of a traditional vocabulary test, the construct can usually be labelled as 'vocabulary knowledge' of some kind. The practical significance of defining the construct is that it allows us to clarify the meaning of the test results. Normally we want to interpret the scores on a vocabulary test as a measure of some aspect of the learners' vocabulary knowledge, such as their progress in learning words from the last several units in the course book, their ability to supply derived forms of base words (like scientist and scientific, from science), or their skill at inferring the meaning of unknown words in a reading passage. Thus, a discrete test takes vocabulary knowledge as a distinct construct, separated from other components of language competence.
Selective - comprehensive
The second dimension concerns the range of vocabulary to be included in the assessment. A conventional vocabulary test is based on a set of target words selected by the test-writer, and the test-takers are assessed according to how well they demonstrate their knowledge of the meaning or use of those words. This is what I call a selective vocabulary measure. The target words may either be selected as individual words and then incorporated into separate test items, or alternatively the test-writer first chooses a suitable text and then uses certain words from it as the basis for the vocabulary assessment.
Context-independent - context-dependent
The role of context, which is an old issue in vocabulary testing, is the basis for the third dimension. Traditionally, contextualisation has meant that a word is presented to test-takers in a sentence rather than as an isolated element. From a contemporary perspective, it is necessary to broaden the notion of context to include whole texts and, more generally, discourse. In addition, we need to recognise that contextualisation is more than just a matter of the way in which vocabulary is presented. The key question is to what extent the test-takers are being assessed on the basis of their ability to engage with the context provided in the test.
An overview of the book
The three dimensions are not intended to form a comprehensive model of vocabulary assessment. Rather, they provide a basis for locating the variety of assessment procedures currently in use within a common framework and, in particular, they offer points of contact between tests which treat words as discrete units and ones that assess vocabulary more integratively in a task-based testing context. At various points through the book I refer to the dimensions and exemplify them. Since a large proportion of work on vocabulary assessment to date has involved instruments which are relatively discrete, selective and context-independent in nature, this approach may seem to be predominant in several of the following chapters. However, my aim is to present a balanced view of the subject, and I discuss measures that are more embedded, comprehensive and context-dependent wherever the opportunity arises, and especially in the last two chapters of the book.

References:
Purpura, James. 2004. Assessing Grammar. Cambridge: Cambridge University Press.
Read, John. 2000. Assessing Vocabulary. Cambridge: Cambridge University Press.

Summary of Assessing Reading and Assessing Writing from the book Language Assessment: Principles and Classroom Practices


Assessing Reading
TYPES (GENRES) OF READING
Each type or genre of written text has its own set of governing rules and conventions. A reader must be able to anticipate those conventions in order to process meaning efficiently. With an extraordinary number of genres present in any literate culture, the reader's ability to process texts must be very sophisticated. Consider the following abridged list of common genres, which ultimately form part of the specifications for assessments of reading ability.
Genres of reading
1.      Academic reading: general interest articles (in magazines, newspapers, etc.), technical reports (e.g., lab reports), professional journal articles, reference material (dictionaries, etc.), textbooks, theses, essays, papers, test directions, editorials and opinion writing
2.      Job-related reading: messages (e.g., phone messages), letters/emails, memos (e.g., interoffice), reports (e.g., job evaluations, project reports), schedules, labels, signs, announcements, forms, applications, questionnaires, financial documents (bills, invoices, etc.), directories (telephone, office, etc.), manuals, directions
3.      Personal reading: newspapers and magazines; letters, emails, greeting cards, invitations; messages, notes, lists; schedules (train, bus, plane, etc.); recipes, menus, maps, calendars; advertisements (commercials, want ads); novels, short stories, jokes, drama, poetry; financial documents (e.g., checks, tax forms, loan applications); forms, questionnaires, medical reports, immigration documents; comic strips, cartoons
MICROSKILLS, MACROSKILLS, AND STRATEGIES FOR READING
Aside from attending to genres of text, the skills and strategies for accomplishing reading emerge as a crucial consideration in the assessment of reading ability. The micro- and macroskills below represent the spectrum of possibilities for objectives in the assessment of reading comprehension.
Micro- and macroskills for reading comprehension
Ø  Microskills
1.      Discriminate among the distinctive graphemes and orthographic patterns of English.
2.      Retain chunks of language of different lengths in short-term memory.
3.      Process writing at an efficient rate of speed to suit the purpose.
4.      Recognize a core of words, and interpret word order patterns and their significance.
5.      Recognize grammatical word classes (nouns, verbs, etc.), systems (e.g., tense, agreement, pluralization), patterns, rules, and elliptical forms.
6.      Recognize that a particular meaning may be expressed in different grammatical forms.
7.      Recognize cohesive devices in written discourse and their role in signaling the relationship between and among clauses.
Ø  Macroskills
8.      Recognize the rhetorical forms of written discourse and their significance for interpretation.
9.      Recognize the communicative functions of written texts, according to form and purpose.
10.  Infer context that is not explicit by using background knowledge.
11.  From described events, ideas, etc., infer links and connections between events, deduce causes and effects, and detect such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
12.  Distinguish between literal and implied meanings.
13.  Detect culturally specific references and interpret them in a context of the appropriate cultural schemata.
14.  Develop and use a battery of reading strategies, such as scanning and skimming, detecting discourse markers, guessing the meaning of words from context, and activating schemata for the interpretation of texts.
TYPES OF READING
1.      Perceptive. In keeping with the set of categories specified for listening comprehension, similar specifications are offered here, except with some differing terminology to capture the uniqueness of reading. Perceptive reading tasks involve attending to the components of larger stretches of discourse: letters, words, punctuation, and other graphemic symbols. Bottom-up processing is implied.
2.      Selective. This category is largely an artifact of assessment formats. In order to ascertain one's reading recognition of lexical, grammatical, or discourse features of language within a very short stretch of language, certain typical tasks are used: picture-cued tasks, matching, true/false, multiple-choice, etc. Stimuli include sentences, brief paragraphs, and simple charts and graphs. Brief responses are intended as well. A combination of bottom-up and top-down processing may be used.
3.      Interactive. Included among interactive reading types are stretches of language of several paragraphs to one page or more in which the reader must, in a psycholinguistic sense, interact with the text. That is, reading is a process of negotiating meaning; the reader brings to the text a set of schemata for understanding it, and intake is the product of that interaction. Typical genres that lend themselves to interactive reading are anecdotes, short narratives and descriptions, excerpts from longer texts, questionnaires, memos, announcements, directions, recipes, and the like. The focus of an interactive task is to identify relevant features (lexical, symbolic, grammatical, and discourse) within texts of moderately short length with the objective of retaining the information that is processed. Top-down processing is typical of such tasks, although some instances of bottom-up performance may be necessary.
4.      Extensive. Extensive reading, as discussed in this book, applies to texts of more than a page, up to and including professional articles, essays, technical reports, short stories, and books.
DESIGNING ASSESSMENT TASKS: PERCEPTIVE READING
Reading Aloud
The test-taker sees separate letters, words, and/or short sentences and reads them aloud, one by one, in the presence of an administrator. Since the assessment is of reading comprehension, any recognizable oral approximation of the target response is considered correct.
Written Response
The same stimuli are presented, and the test-taker's task is to reproduce the probe in writing. Because of the transfer across different skills here, evaluation of the test-taker's response must be carefully treated. If an error occurs, make sure you determine its source: what might be assumed to be a writing error, for example, may actually be a reading error, and vice versa.
Multiple-Choice
Multiple-choice responses are not only a matter of choosing one of four or five possible answers. Other formats, some of which are especially useful at the low levels of reading, include same/different, circle the answer, true/false, choose the letter, and matching.
Picture-Cued Items
Test-takers are shown a picture along with a written text and are given one of a number of possible tasks to perform.
DESIGNING ASSESSMENT TASKS: SELECTIVE READING
Multiple-Choice (for Form-Focused Criteria)
By far the most popular method of testing a reading knowledge of vocabulary and grammar is the multiple-choice format, mainly for reasons of practicality: it is easy to administer and can be scored quickly. The most straightforward multiple-choice items may have little context, but might serve as a vocabulary or grammar check.
Matching Tasks
At this selective level of reading, the test-taker's task is simply to respond correctly, which makes matching an appropriate format. The most frequently appearing criterion in matching procedures is vocabulary.
Editing Tasks
Editing for grammatical or rhetorical errors is a widely used test method for assessing linguistic competence in reading. The TOEFL and many other tests employ this technique with the argument that it not only focuses on grammar but also introduces a simulation of the authentic task of editing, or discerning errors in written passages. Its authenticity may be supported if you consider proofreading a real-world skill that is being tested.
Picture-Cued Tasks
In the previous section we looked at picture-cued tasks for perceptive recognition of symbols and words. Pictures and photographs may be equally well utilized for examining ability at the selective level. Several types of picture-cued methods are commonly used.
1.      Test-takers read a sentence or passage and choose one of four pictures that is being described. The sentence (or sentences) at this level is more complex.
2.      Test-takers read a series of sentences or definitions, each describing a labeled part of a picture or diagram. Their task is to identify each labeled item. In such a diagram, test-takers do not necessarily know each term, but by reading the definition they are able to make an identification.
Gap-Filling Tasks
Many of the multiple-choice tasks described above can be converted into gap-filling, or "fill-in-the-blank," items in which the test-taker's response is to write a word or phrase. An extension of simple gap-filling tasks is to create sentence-completion items in which test-takers read part of a sentence and then complete it by writing a phrase.
DESIGNING ASSESSMENT TASKS: INTERACTIVE READING
Cloze Tasks
One of the most popular types of reading assessment task is the cloze procedure. The word cloze was coined by educational psychologists to capture the Gestalt psychological concept of "closure," that is, the ability to fill in gaps in an incomplete image (visual, auditory, or cognitive) and supply (from background schemata) omitted details. Cloze tests are usually a minimum of two paragraphs in length in order to account for discourse expectancies. They can be constructed relatively easily as long as the specifications for choosing deletions and for scoring are clearly defined. Typically every seventh word (plus or minus two) is deleted (known as fixed-ratio deletion), but many cloze test designers instead use a rational deletion procedure of choosing deletions according to the grammatical or discourse functions of the words.
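The fixed-ratio deletion procedure described above is mechanical enough to sketch in a few lines of code. This is a minimal illustration, assuming a seventh-word deletion rate and a short lead-in left intact; the function name and parameters are hypothetical, not a standard implementation.

```python
# A minimal fixed-ratio cloze builder: after a short intact lead-in,
# every seventh word is replaced by a blank and recorded as the answer key.

def make_cloze(text, ratio=7, lead_in=7):
    """Blank out every `ratio`-th word after the first `lead_in` words."""
    words = text.split()
    blanks = []                       # deleted words form the answer key
    out = []
    for i, word in enumerate(words):
        if i >= lead_in and (i - lead_in) % ratio == ratio - 1:
            blanks.append(word)       # record the deletion
            out.append("____")
        else:
            out.append(word)
    return " ".join(out), blanks

passage = ("The cloze procedure asks readers to restore words that have been "
           "systematically removed from a passage, drawing on grammatical "
           "knowledge and discourse expectancies to supply each missing item.")
cloze_text, key = make_cloze(passage)
```

A rational-deletion cloze would instead choose the gaps by hand, targeting words with particular grammatical or discourse functions rather than deleting at a fixed interval.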
Impromptu Reading Plus Comprehension Questions
If cloze testing is the most-researched procedure for assessing reading, the traditional "Read a passage and answer some questions" technique is undoubtedly the oldest and the most common. Virtually every proficiency test uses the format, and one would rarely consider assessing reading without some component of the assessment involving impromptu reading and responding to questions.
Short-Answer Tasks
Multiple-choice items are difficult to construct and validate, and classroom teachers rarely have time in their busy schedules to design such a test. A popular alternative to multiple-choice questions following reading passages is the age-old short-answer format. A reading passage is presented, and the test-taker reads questions that must be answered in a sentence or two. Questions might cover the same specifications indicated above for the TOEFL reading, but be worded in question form.
Editing (Longer Texts)
The previous section of this chapter (on selective reading) described editing tasks, but there the discussion was limited to a list of unrelated sentences, each presented with an error to be detected by the test-taker. The same technique has been applied successfully to longer passages of 200 to 300 words. Several advantages are gained in the longer format.
Scanning
Scanning is a strategy used by all readers to find relevant information in a text. Assessment of scanning is carried out by presenting test-takers with a text (prose or something in a chart or graph format) and requiring rapid identification of relevant bits of information. Possible stimuli include
a.       a one- to two-page news article,
b.      an essay,
c.       a chapter in a textbook,
d.      a technical report,
e.       a table or chart depicting some research findings,
f.       a menu, and
g.      an application form.
Ordering Tasks
Students always enjoy the activity of receiving little strips of paper, each with a sentence on it, and assembling them into a story, sometimes called the "strip story" technique. Variations on this can serve as an assessment of overall global understanding of a story and of the cohesive devices that signal the order of events or ideas.
Information Transfer: Reading Charts, Maps, Graphs, Diagrams
Every educated person must be able to comprehend charts, maps, graphs, calendars, diagrams, and the like. Converting such nonverbal input into comprehensible intake requires not only an understanding of the graphic and verbal conventions of the medium but also a linguistic ability to interpret that information to someone else. Reading a map implies understanding the conventions of map graphics, but it is often accompanied by telling someone where to turn, how far to go, etc. Scanning a menu requires an ability to understand the structure of most menus as well as the capacity to give an order when the time comes. Interpreting the numbers on a stock market report involves the interaction of understanding the numbers and of conveying that understanding to others.
DESIGNING ASSESSMENT TASKS: EXTENSIVE READING
Skimming Tasks
Skimming is the process of rapid coverage of reading matter to determine its gist or main idea. It is a prediction strategy used to give a reader a sense of the topic and purpose of a text, the organization of the text, the perspective or point of view of the writer, its ease or difficulty, and/or its usefulness to the reader. Of course skimming can apply to texts of less than one page, so it would be wise not to confine this type of task just to extensive texts.
Summarizing and Responding
As you can readily see, a strict adherence to the criterion of assessing reading, and reading only, implies consideration of only the first factor; the other three pertain to writing performance. The first criterion is nevertheless a crucial factor; otherwise the reader-writer could pass all three of the other criteria with virtually no understanding of the text itself. Evaluation of the reading comprehension criterion will of necessity remain somewhat subjective because the teacher will need to determine degrees of fulfillment of the objective (see below for more about scoring this task).
Note-Taking and Outlining
Finally, a reader's comprehension of extensive texts may be assessed through an evaluation of a process of note-taking and/or outlining. Because of the difficulty of controlling the conditions and time frame for both these techniques, they rest firmly in the category of informal assessment. Their utility is in the strategic training that learners gain in retaining information through marginal notes that highlight key information or organizational outlines that put supporting ideas into a visually manageable framework. A teacher, perhaps in one-on-one conferences with students, can use student notes/outlines as indicators of the presence or absence of effective reading strategies, and thereby point the learners in positive directions.

Assessing Writing
GENRES OF WRITTEN LANGUAGE
The same classification scheme is reformulated here to include the most common genres that a second language writer might produce, within and beyond the requirements of a curriculum. Even though this list is slightly shorter, you should be aware of the surprising multiplicity of options of written genres that second language learners need to acquire.
TYPES OF WRITING PERFORMANCE
Four categories of written performance that capture the range of written production are considered here. Each category resembles the categories defined for the other three skills, but these categories, as always, reflect the uniqueness of the skill area.
1.      Imitative. To produce written language, the learner must attain skills in the fundamental, basic tasks of writing letters, words, punctuation, and very brief sentences. This category includes the ability to spell correctly and to perceive phoneme-grapheme correspondences in the English spelling system. It is a level at which learners are trying to master the mechanics of writing. At this stage, form is the primary if not exclusive focus, while context and meaning are of secondary concern.
2.      Intensive (controlled). Beyond the fundamentals of imitative writing are skills in producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence. Meaning and context are of some importance in determining correctness and appropriateness, but most assessment tasks are more concerned with a focus on form, and are rather strictly controlled by the test design.
3.      Responsive. Here, assessment tasks require learners to perform at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs. Tasks respond to pedagogical directives, lists of criteria, outlines, and other guidelines.
4.      Extensive. Extensive writing implies successful management of all the processes and strategies of writing for all purposes, up to the length of an essay, a term paper, a major research project report, or even a thesis. Writers focus on achieving a purpose, organizing and developing ideas logically, using details to support or illustrate ideas, demonstrating syntactic and lexical variety, and in many cases, engaging in the process of multiple drafts to achieve a final product.
MICRO- AND MACROSKILLS OF WRITING
Ø  Microskills
1.                  Produce graphemes and orthographic patterns of English.
2.                  Produce writing at an efficient rate of speed to suit the purpose.
3.                  Produce an acceptable core of words and use appropriate word order patterns.
4.                  Use acceptable grammatical systems (e.g., tense, agreement, pluralization), patterns, and rules.
5.                  Express a particular meaning in different grammatical forms.
6.                  Use cohesive devices in written discourse.
Ø  Macroskills
7.                  Use the rhetorical forms and conventions of written discourse.
8.                  Appropriately accomplish the communicative functions of written texts according to form and purpose.
9.                  Convey links and connections between events, and communicate such relations as main idea, supporting idea, new information, given information, generalization, and exemplification.
10.              Distinguish between literal and implied meanings when writing.
11.              Correctly convey culturally specific references in the context of the written text.
12.              Develop and use a battery of writing strategies, such as accurately assessing the audience's interpretation, using prewriting devices, writing with fluency in the first drafts, using paraphrases and synonyms, soliciting peer and instructor feedback, and using feedback for revising and editing.
DESIGNING ASSESSMENT TASKS: IMITATIVE WRITING
Tasks in [Hand]Writing Letters, Words, and Punctuation
First, a comment should be made on the increasing use of personal and laptop computers and handheld instruments for creating written symbols. Handwriting has the potential of becoming a lost art as even very young children are more and more likely to use a keyboard to produce writing. Making the shapes of letters and other symbols is now more a question of learning typing skills than of training the muscles of the hands to use a pen or pencil. Nevertheless, for all practical purposes, handwriting remains a skill of paramount importance within the larger domain of language assessment.
Spelling Tasks and Detecting Phoneme-Grapheme Correspondences
1.      Spelling tests. In a traditional, old-fashioned spelling test, the teacher dictates a simple list of words, one word at a time, followed by the word in a sentence, repeated again, with a pause for test-takers to write the word. Scoring emphasizes correct spelling. You can help to control for listening errors by choosing words the students have encountered before, words that they have spoken or heard in their class.
2.      Picture-cued tasks. Pictures are displayed with the objective of focusing on familiar words whose spelling may be unpredictable. Items are chosen according to the objectives of the assessment, but this format is an opportunity to present some challenging words and word pairs: boot/book, read/reed, bit/bite, etc.
3.      Multiple-choice techniques. Presenting words and phrases in the form of a multiple-choice task risks crossing over into the domain of assessing reading, but if the items have a follow-up writing component, they can serve as formative reinforcement of spelling conventions.
4.      Matching phonetic symbols. If students have become familiar with the phonetic alphabet, they could be shown phonetic symbols and asked to write the correctly spelled word alphabetically.
DESIGNING ASSESSMENT TASKS: INTENSIVE (CONTROLLED) WRITING
Dictation and Dicto-Comp
Dictation is simply the rendition in writing of what one hears aurally, so it could be classified as an imitative type of writing, especially since a proportion of the test-taker's performance centers on correct spelling. Also, because the test-taker must listen to stretches of discourse and in the process insert punctuation, dictation of a paragraph or more can arguably be classified as a controlled or intensive form of writing.
Grammatical Transformation Tasks
In the heyday of structural paradigms of language teaching, with slot-filler techniques and slot substitution drills, the practice of making grammatical transformations—orally or in writing—was very popular. To this day, language teachers have used this technique as an assessment task, ostensibly to measure grammatical competence. Numerous versions of the task are possible:
1.      Change the tenses in a paragraph.
2.      Change full forms of verbs to reduced forms (contractions).
3.      Change statements to yes/no or wh-questions.
4.      Change questions into statements.
5.      Combine two sentences into one using a relative pronoun.
6.      Change direct speech to indirect speech.
7.      Change from active to passive voice.
Picture-Cued Tasks
A variety of picture-cued controlled tasks have been used in English classrooms around the world. The main advantage of this technique is in detaching the almost ubiquitous reading and writing connection and offering instead a nonverbal means to stimulate written responses.
Vocabulary Assessment Tasks
Most vocabulary study is carried out through reading. A number of assessment techniques for reading recognition of vocabulary were discussed in the previous chapter: multiple-choice techniques, matching, picture-cued identification, cloze techniques, guessing the meaning of a word in context, etc. The major techniques used to assess vocabulary are (a) defining and (b) using a word in a sentence. The latter is the more authentic, but even that task is constrained by a contrived situation in which the test-taker, usually in a matter of seconds, has to come up with an appropriate sentence, which may or may not indicate that the test-taker "knows" the word.
Ordering Tasks
One task at the sentence level may appeal to those who are fond of word games and puzzles: ordering (or reordering) a scrambled set of words into a correct sentence. While this somewhat inauthentic task generates writing performance and may be said to tap into grammatical word-ordering rules, it presents a challenge to test-takers whose learning styles do not dispose them to logical-mathematical problem solving. If sentences are kept very simple with perhaps no more than four or five words, if only one possible sentence can emerge, and if students have practiced the technique in class, then some justification emerges. But once again, as in so many writing techniques, this task involves as much, if not more, reading performance as writing.
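To make the mechanics of such an item concrete, here is a minimal sketch (not from the source; the sentence, function names, and exact-match scoring rule are illustrative assumptions) of generating a scrambled-word item and scoring a response:

```python
import random

def make_ordering_item(sentence, seed=0):
    """Scramble the words of a short sentence to create an ordering task."""
    words = sentence.split()
    rng = random.Random(seed)  # seeded so the item is reproducible
    scrambled = words[:]
    # Reshuffle until the order actually differs from the original sentence.
    while scrambled == words:
        rng.shuffle(scrambled)
    return scrambled

def score_ordering(response, sentence):
    """Credit the response only if it restores the original word order."""
    return 1 if response.split() == sentence.split() else 0

item = make_ordering_item("The cat chased the mouse")
print(item)  # a scrambled word list for the test-taker to reorder
print(score_ordering("The cat chased the mouse", "The cat chased the mouse"))  # → 1
```

Note that exact-match scoring presumes the "only one possible sentence can emerge" condition mentioned above; sentences with several grammatical orderings would need a list of acceptable answers instead.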
Short-Answer and Sentence Completion Tasks
Some types of short-answer tasks were discussed in Chapter 8 because of the heavy participation of reading performance in their completion. Such items range from very simple and predictable to somewhat more elaborate responses. The reading-writing connection is apparent in the first three item types but has less of an effect in the last three, where reading is necessary in order to understand the directions but is not crucial in creating sentences. Scoring on a 2-1-0 scale (as described above) may be the most appropriate way to avoid arguments about the appropriateness of a response.
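The arithmetic of a 2-1-0 scale is simple once the rater's judgments are in; the sketch below (hypothetical function name and rating labels, not from the source) totals per-item ratings and rejects anything outside the scale:

```python
def total_short_answer_score(ratings):
    """Sum 2-1-0 ratings: 2 = appropriate, 1 = partly appropriate, 0 = inappropriate."""
    for r in ratings:
        if r not in (0, 1, 2):
            raise ValueError(f"rating {r} is outside the 2-1-0 scale")
    return sum(ratings)

# Five short-answer items as rated by the teacher:
print(total_short_answer_score([2, 1, 2, 0, 2]))  # → 7 out of a possible 10
```

The appeal of the scale is exactly this mechanical quality: the rater's only judgment per item is a three-way choice, which keeps scoring fast and comparatively reliable.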
ISSUES IN ASSESSING RESPONSIVE AND EXTENSIVE WRITING
Responsive writing creates the opportunity for test-takers to offer an array of possible creative responses within a pedagogical or assessment framework: test-takers are "responding" to a prompt or assignment. Freed from the strict control of intensive writing, learners can exercise a number of options in choosing vocabulary, grammar, and discourse, but with some constraints and conditions. Criteria now begin to include the discourse and rhetorical conventions of paragraph structure and of connecting two or three such paragraphs in texts of limited length. The learner is responsible for accomplishing a purpose in writing, for developing a sequence of connected ideas, and for empathizing with an audience.
The genres of text that are typically addressed here are
a.       short reports (with structured formats and conventions);
b.      responses to the reading of an article or story;
c.       summaries of articles or stories;
d.      brief narratives or descriptions; and
e.       interpretations of graphs, tables, and charts.
DESIGNING ASSESSMENT TASKS: RESPONSIVE AND EXTENSIVE WRITING
Paraphrasing
One of the more difficult concepts for second language learners to grasp is paraphrasing. The initial step in teaching paraphrasing is to ensure that learners understand its purposes: to say something in one's own words, to avoid plagiarizing, and to offer some variety in expression. With those motivations and purposes in mind, the test designer needs to elicit a paraphrase of a sentence or paragraph, usually not more than that.
Guided Question and Answer
Another lower-order task in this type of writing, which has the pedagogical benefit of guiding a learner without dictating the form of the output, is a guided question-and-answer format in which the test administrator poses a series of questions that essentially serve as an outline of the emergent written text. In the writing of a narrative that the teacher has already covered in a class discussion, such questions might be posed to stimulate a sequence of sentences.
Paragraph Construction Tasks
The participation of reading performance is inevitable in writing effective paragraphs. To a great extent, writing is the art of emulating what one reads. You read an effective paragraph; you analyze the ingredients of its success; you emulate it. Assessment of paragraph development takes on a number of different forms:
1.      Topic sentence writing. There is no cardinal rule that says every paragraph must have a topic sentence, but stating a topic through the lead sentence (or a subsequent one) has remained a tried-and-true technique for teaching the concept of a paragraph. Assessment thereof consists of
                      specifying the writing of a topic sentence,
                      scoring points for its presence or absence, and
                      scoring and/or commenting on its effectiveness in stating the topic.
2.      Topic development within a paragraph. Because paragraphs are intended to provide a reader with "clusters" of meaningful, connected thoughts or ideas, another stage of assessment is development of an idea within a paragraph. Four criteria are commonly applied to assess the quality of a paragraph:
                      the clarity of expression of ideas
                      the logic of the sequence and connections
                      the cohesiveness or unity of the paragraph
                      the overall effectiveness or impact of the paragraph as a whole
3.      Development of main and supporting ideas across paragraphs. As writers string two or more paragraphs together in a longer text (and as we move up the continuum from responsive to extensive writing), the writer attempts to articulate a thesis or main idea with clearly stated supporting ideas. These elements can be considered in evaluating a multi-paragraph essay:
                      addressing the topic, main idea, or principal purpose
                      organizing and developing supporting ideas
                      using appropriate details to undergird supporting ideas
                      showing facility and fluency in the use of language
                      demonstrating syntactic variety
Strategic Options
Developing main and supporting ideas is the goal for the writer attempting to create an effective text, whether a short one- to two-paragraph one or an extensive one of several pages. A number of strategies are commonly taught to second language writers to accomplish their purposes. Aside from strategies of free writing, outlining, drafting, and revising, writers need to be aware of the task that has been demanded and to focus on the genre of writing and the expectations of that genre.
TEST OF WRITTEN ENGLISH (TWE)
The TWE is in the category of a timed impromptu test in that test-takers are under a 30-minute time limit and are not able to prepare ahead of time for the topic that will appear. Topics are prepared by a panel of experts following specifications for topics that represent commonly used discourse and thought patterns at the university level. Some sample topics are published on the TWE website.
SCORING METHODS FOR RESPONSIVE AND EXTENSIVE WRITING
Holistic Scoring
The TWE scoring scale above is a prime example of holistic scoring. In Chapter 7, a rubric for scoring oral production holistically was presented. Each point on a holistic scale is given a systematic set of descriptors, and the reader-evaluator matches an overall impression with the descriptors to arrive at a score. Descriptors usually (but not always) follow a prescribed pattern. For example, the first descriptor across all score categories may address the quality of task achievement, the second may deal with organization, the third with grammatical or rhetorical considerations, and so on. Scoring, however, is truly holistic in that those subsets are not quantitatively added up to yield a score.
Primary Trait Scoring
A second method of scoring, primary trait, focuses on "how well students can write within a narrowly defined range of discourse" (Weigle, 2002, p. 110). This type of scoring emphasizes the task at hand and assigns a score based on the effectiveness of the text in achieving that one goal. For example, if the purpose or function of an essay is to persuade the reader to do something, the score for the writing would rise or fall on the accomplishment of that function. If a learner is asked to exploit the imaginative function of language by expressing personal feelings, then the response would be evaluated on that feature alone.
Analytic Scoring
Primary trait scoring focuses on the principal function of the text and therefore offers some feedback potential, but no washback for any of the aspects of the written production that enhance the ultimate accomplishment of the purpose. Classroom evaluation of learning is best served through analytic scoring, in which as many as six major elements of writing are scored, thus enabling learners to home in on weaknesses and to capitalize on strengths. Analytic scoring may be more appropriately called analytic assessment in order to capture its closer association with classroom language instruction than with formal testing. Brown and Bailey (1984) designed an analytical scoring scale that specified five major categories and a description of five different levels in each category, ranging from "unacceptable" to "excellent".
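The arithmetic behind the two approaches differs: a holistic score is a single impression matched to descriptors, while an analytic score is a total across separately rated categories. The sketch below is illustrative only; the five category names follow the general shape of the Brown and Bailey (1984) scale, but the 1-5 levels, equal weighting, and code are assumptions, not the published instrument.

```python
# Illustrative analytic scoring: five categories, each rated on a 1-5 level
# (1 = "unacceptable" ... 5 = "excellent"). Equal weighting is an assumption.
CATEGORIES = ["organization", "content", "grammar", "mechanics", "style"]

def analytic_score(ratings):
    """Total a dict of category -> level (1-5); all five categories are required."""
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category, level in ratings.items():
        if not 1 <= level <= 5:
            raise ValueError(f"{category}: level {level} is outside 1-5")
    return sum(ratings[c] for c in CATEGORIES)

essay = {"organization": 4, "content": 5, "grammar": 3, "mechanics": 4, "style": 3}
print(analytic_score(essay))  # → 19 out of a possible 25
```

The per-category breakdown, not the total, is what gives analytic assessment its washback value: a learner who sees "grammar: 3" alongside "content: 5" knows exactly where to focus.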
BEYOND SCORING: RESPONDING TO EXTENSIVE WRITING
Formal testing carries with it the burden of designing a practical and reliable instrument that assesses its intended criterion accurately. To accomplish that mission, designers of writing tests are charged with the task of providing as "objective" a scoring procedure as possible, and one that in many cases can be easily interpreted by agents beyond the learner. Holistic, primary trait, and analytic scoring all satisfy those ends. Yet beyond mathematically calculated scores lies a rich domain of assessment in which a developing writer is coached from stage to stage in a process of building a storehouse of writing skills. Here in the classroom, in the tutored relationship of teacher and student, and in the community of peer learners, most of the hard work of assessing writing is carried out. Such assessment is informal, formative, and replete with washback.
Assessing Initial Stages of the Process of Composing
Following are some guidelines for assessing the initial stages (the first draft or two) of a written composition. These guidelines are generic for self, peer, and teacher responding. Each assessor will need to modify the list according to the level of the learner, the context, and the purpose in responding. The teacher-assessor's role is as a guide, a facilitator, and an ally; therefore, assessment at this stage of writing needs to be as positive as possible to encourage the writer. An early focus on overall structure and meaning will enable writers to clarify their purpose and plan and will set a framework for the writers' later refinement of lexical and grammatical issues.
Assessing Later Stages of the Process of Composing
Through all these stages it is assumed that peers and teacher are both responding to the writer through conferencing in person, electronic communication, or, at the very least, an exchange of papers. The impromptu timed tests and the methods of scoring discussed earlier may appear to be only distantly related to such an individualized process of creating a written text, but are they, in reality? All those developmental stages may be the preparation that learners need both to function in creative real-world writing tasks and to successfully demonstrate their competence on a timed impromptu test. And those holistic scores are, after all, generalizations of various components of effective writing. If the hard work of successfully progressing through a semester or two of a challenging course in academic writing ultimately means that writers are ready to function in their real-world contexts, and to get a 5 or 6 on the TWE, then all the effort was worthwhile.
This chapter completes the cycle of considering the assessment of all four skills: listening, speaking, reading, and writing. As you contemplate using some of the assessment techniques that have been suggested, you can now fully appreciate two significant overarching guidelines for designing an effective assessment procedure:
1. It is virtually impossible to isolate any one of the four skills without the involvement of at least one other mode of performance. Don't underestimate the power of the integration of skills in assessments designed to target a single skill area.
2. The variety of assessment techniques, item types, and tasks is virtually infinite in that there is always some possibility for creating a unique variation. Explore those alternatives, but with some caution, lest your overzealous urge to be innovative distract you from a central focus on achieving the intended purpose and rendering an appropriate evaluation of performance.

Reference:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.