A. Practicality
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical: it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.
B. Reliability
A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of the reliability of a test may best be addressed by considering a number of factors that may contribute to its unreliability. Consider the following possibilities: fluctuations in the student, in scoring, in test administration, and in the test itself.
· Student-related reliability
The most common learner-related issue in reliability is caused by temporary illness, fatigue, a “bad day,” anxiety, and other physical or psychological factors, which may make an “observed” score deviate from one’s “true” score. Also included in this category are such factors as a test taker’s “test-wiseness” or strategies for efficient test taking (Mousavi, 2002, p. 804).
· Rater reliability
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly for lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scoring plan for the dictations was found to be unreliable; that is, the two scorers were not applying the same standards.
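Although the passage describes inter-rater reliability only qualitatively, it is commonly estimated by correlating two raters’ scores on the same set of performances. As an illustrative sketch (not something the passage itself specifies), the Pearson correlation

r = Cov(R1, R2) / (σ_R1 · σ_R2)

between the score sets R1 and R2 of two raters approaches 1 when both apply the same standards; a low r, as with the two dictation scorers above, signals inter-rater unreliability.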
· Test administration reliability
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of test administration unreliability. Other sources of unreliability are found in photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs.
· Test reliability
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit. We all know people who know the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items may be a further source of test unreliability.
C. Validity
By far the most complex criterion of an effective test, and arguably the most important principle, is validity, “the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment” (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, previous knowledge of a subject, or some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes, then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
D. Authenticity
A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as “the degree of correspondence of the characteristics of a given language test task to the features of a target language task,” and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
In a test, authenticity may be present in the following ways:
1. The language in the test is as natural as possible.
2. Items are contextualized rather than isolated.
3. Topics are meaningful (relevant, interesting) for the learner.
4. Some thematic organization to items is provided, such as through a story line or episode.
5. Tasks represent, or closely approximate, real-world tasks.
E. Washback
The effect of testing on teaching and learning is known among language testing specialists as washback. In large-scale assessment, washback generally refers to the effects the tests have on instruction in terms of how students prepare for the test. “Cram” courses and “teaching to the test” are examples of such washback. Another form of washback that occurs more in classroom assessment is information that “washes back” to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive a simple letter grade or a single overall numerical score.
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.