A. Practicality
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.
B. Reliability
A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of the reliability of a test may best be addressed by considering a number of factors that may contribute to its unreliability. Consider the following possibilities: fluctuations in the student, in scoring, in test administration, and in the test itself.
· Student-related reliability
The most common learner-related issue in reliability is caused by temporary illness, fatigue, a “bad day,” anxiety, and other physical or psychological factors, which may make an “observed” score deviate from one’s “true” score. Also included in this category are such factors as a test taker’s “test-wiseness” or strategies for efficient test taking (Mousavi, 2002, p. 804).
· Rater reliability
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly for lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scoring plan for the dictations was found to be unreliable; that is, the two scorers were not applying the same standards.
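Although the passage describes inter-rater reliability only qualitatively, it is commonly estimated by correlating two raters’ scores on the same set of performances. As an illustrative sketch (not something the passage itself specifies), the Pearson correlation

r = Cov(R1, R2) / (σ_R1 · σ_R2)

between the score sets R1 and R2 of two raters approaches 1 when both apply the same standards; a low r, as with the two dictation scorers above, signals inter-rater unreliability.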
· Test administration reliability
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of test administration unreliability. Other sources of unreliability are found in photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs.
· Test reliability
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit. We all know people who know the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items may be a further source of test unreliability.
C. Validity
By far the most complex criterion of an effective test, and arguably the most important principle, is validity, “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, previous knowledge of a subject, or some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
D. Authenticity
A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as “the degree of correspondence of the characteristics of a given language test task to the features of a target language task,” and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
In a test, authenticity may be present in the following ways:
1. The language in the test is as natural as possible.
2. Items are contextualized rather than isolated.
3. Topics are meaningful (relevant, interesting) for the learner.
4. Some thematic organization to items is provided, such as through a story line or episode.
5. Tasks represent, or closely approximate, real-world tasks.
E. Washback
The effect of testing on teaching and learning is known among language testing specialists as washback. In large-scale assessment, washback generally refers to the effects the tests have on instruction in terms of how students prepare for the test. “Cram” courses and “teaching to the test” are examples of such washback. Another form of washback that occurs more in classroom assessment is information that “washes back” to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.