Early
in the decade of the 1990s, in a culture of rebellion against the notion that
all people and all skills could be measured by traditional tests, a novel
concept emerged that began to be labeled "alternative" assessment. As
teachers and students were becoming aware of the shortcomings of standardized
tests, "an alternative to standardized testing and all the problems found
with such testing" (Huerta-Macias, 1995, p. 8) was proposed. That proposal
was to assemble additional measures of students (portfolios, journals,
observations, self-assessments, peer-assessments, and the like) in an effort to
triangulate data about students. For some, such alternatives held "ethical
potential" (Lynch, 2001, p. 228) in their promotion of fairness and the
balance of power relationships in the classroom.
Why,
then, should we even refer to the notion of "alternative" when
assessment already encompasses such a range of possibilities? This was the
question to which Brown and Hudson (1998) responded in a TESOL Quarterly
article. They noted that to speak of alternative assessments is
counterproductive because the term implies something new and different that may
be "exempt from the requirements of responsible test construction"
(p. 657). So they proposed to refer to "alternatives" in assessment
instead. Their term is a perfect fit within a model that considers tests as a
subset of assessment. Throughout this book, you have been reminded that all
tests are assessments but, more important, that not all assessments are tests.
The defining characteristics of the various alternatives in assessment that have been
commonly used across the profession were aptly summed up by Brown and Hudson
(1998, pp. 654-655). Alternatives in assessment
1. require students to perform, create, produce, or do something;
2. use real-world contexts or simulations;
3. are nonintrusive in that they extend the day-to-day classroom activities;
4. allow students to be assessed on what they normally do in class every day;
5. use tasks that represent meaningful instructional activities;
6. focus on processes as well as products;
7. tap into higher-level thinking and problem-solving skills;
8. provide information about both the strengths and weaknesses of students;
9. are multiculturally sensitive when properly administered;
10. ensure that people, not machines, do the scoring, using human judgment;
11. encourage open disclosure of standards and rating criteria; and
12. call upon teachers to perform new instructional and assessment roles.
The Dilemma of Maximizing Both Practicality and Washback
The principal purpose of this chapter is to examine some of the alternatives
in assessment that are markedly different from formal tests. Tests, especially
large-scale standardized tests, tend to be one-shot performances that are
timed, multiple-choice, decontextualized, norm-referenced, and that foster
extrinsic motivation. On the other hand, tasks like portfolios, journals, and
self-assessment are
1) open-ended in their time orientation and format,
2) contextualized to a curriculum,
3) referenced to the criteria (objectives) of that curriculum, and
4) likely to build intrinsic motivation.
One way of looking at this contrast poses a challenge to you as a teacher and test
designer. Formal standardized tests are almost by definition highly practical,
reliable instruments. They are designed to minimize time and money on the part of
designer and test-taker, and to be painstakingly accurate in their scoring. Alternatives
such as portfolios, or conferencing with students on drafts of written work, or
observations of learners over time all require considerable time and effort on
the part of the teacher and the student. Even more time must be spent if the
teacher hopes to offer a reliable evaluation within students across time, as
well as across students (taking care not to favor one student or group of
students). But the alternative techniques also offer markedly greater washback,
are superior formative measures, and, because of their authenticity, usually
carry greater face validity.
A number of approaches to accomplishing this end are possible, many of which
have already been implicitly presented in this book:
1. building as much authenticity as possible into multiple-choice task types and items
2. designing classroom tests that have both objective-scoring sections and open-ended
response sections, varying the performance tasks
3. turning multiple-choice test results into diagnostic feedback on areas of needed
improvement
4. maximizing the preparation period before a test to elicit performance relevant to the
ultimate criteria of the test
5. teaching test-taking strategies
6. helping students to see beyond the test: don't "teach to the test"
7. triangulating information on a student before making a final assessment of competence.
The flip side of this challenge is to understand that the alternatives in
assessment are not doomed to be impractical and unreliable. As we look at alternatives
in assessment in this chapter, we must remember Brown and Hudson's (1998)
admonition to scrutinize the practicality, reliability, and validity of those
alternatives at the same time that we celebrate their face validity, washback
potential, and authenticity. It is easy to fly out of the cage of traditional
testing rubrics, but it is tempting in doing so to flap our wings aimlessly and to
accept virtually any classroom activity as a viable alternative. Assessments
proposed to serve as triangulating measures of competence imply a responsibility
to be rigorous in determining objectives, response modes, and criteria for
evaluation and interpretation.
A word about performance-based assessment is in order. There has been a great deal of
press in recent years about performance-based assessment, sometimes merely
called performance assessment (Shohamy, 1995; Norris et al., 1998). Is it
different from what is being called "alternative assessment"?
The push toward more performance-based assessment is part of the same general
educational reform movement that has raised strong objections to using standardized
test scores as the only measures of student competencies (see, for example, Valdez
Pierce & O'Malley, 1992; Shepard & Bliem, 1993). The argument, as you can
guess, was that standardized tests do not elicit actual performance on the part of
test-takers.
Performance-based
assessment implies productive, observable skills, such as speaking and writing,
of content-valid tasks. Such performance usually, but not always, brings with
it an air of authenticity—real-world tasks that students have had time to
develop.
O'Malley
and Valdez Pierce (1996) considered performance-based assessment to be a subset
of authentic assessment. In other words, not all authentic assessment is
performance-based. One could infer that reading, listening, and thinking
have many authentic manifestations, but
since they are not directly observable in and of themselves, they are not
performance-based. According to O'Malley and Valdez Pierce (p. 5), the following
are characteristics of performance assessment:
1) Students make a constructed response.
2) Students engage in higher-order thinking, with open-ended tasks.
3) Tasks are meaningful, engaging, and authentic.
4) Tasks call for the integration of language skills.
5) Both process and product are assessed.
6) Depth of a student's mastery is emphasized over breadth.
Performance-based
assessment needs to be approached with caution. It is tempting for teachers to
assume that if a student is doing something, then the process has fulfilled its
own goal and the evaluator needs only to make a mark in the grade book that
says "accomplished" next to a particular competency. In reality, performances
as assessment procedures need to be treated with the same rigor as traditional
tests.
To
sum up, performance assessment is not completely synonymous with the concept of
alternative assessment. Rather, it is best understood as one of the primary
traits of the many available alternatives in assessment.
• PORTFOLIOS
One
of the most popular alternatives in assessment, especially within a framework
of communicative language teaching, is portfolio development. According to
Genesee and Upshur (1996), a portfolio is "a purposeful collection of
students' work that demonstrates their
efforts, progress, and achievements in given areas" (p. 99). Portfolios
include materials such as
1) essays and compositions in draft and final forms;
2) reports, project outlines;
3) poetry and creative prose;
4) artwork, photos, newspaper or magazine clippings;
5) audio and/or video recordings of presentations, demonstrations, etc.;
6) journals, diaries, and other personal reflections;
7) tests, test scores, and written homework exercises;
8) notes on lectures; and
9) self- and peer-assessments (comments, evaluations, and checklists).
Gottlieb (1995)
suggested a developmental scheme for considering the nature and purpose of
portfolios, using the acronym CRADLE to designate six possible attributes of a
portfolio:
· Collecting
· Reflecting
· Assessing
· Documenting
· Linking
· Evaluating
• JOURNALS
A
journal is a log (or "account") of one's thoughts, feelings,
reactions, assessments, ideas, or progress toward goals, usually written with
little attention to structure, form, or correctness. Learners can articulate
their thoughts without the threat of those thoughts being judged later (usually
by the teacher). Sometimes journals are rambling sets of verbiage that
represent a stream of consciousness with no particular point, purpose, or
audience. Fortunately, models of journal use in educational practice
have sought to tighten up this style of journal in order to give it some
focus (Staton et al., 1987). The result is the emergence of a number of
overlapping categories or purposes in journal writing, such as the following:
1. Language learning logs
2. Grammar journals
3. Responses to readings
4. Strategies-based learning logs
5. Self-assessment reflections
6. Diaries of attitudes, feelings, and other affective factors
7. Acculturation logs
Most classroom-oriented journals are what have now come to be known as dialogue
journals. They imply an interaction between a reader and the student through
dialogues or responses. For the best results, those responses should be
dispersed across a course at regular intervals, perhaps weekly or biweekly. One
of the principal objectives in a student dialogue journal is to carry on a
conversation with the teacher. Through dialogue journals, teachers can become
better acquainted with their students' affective states, and thus become better equipped
to meet students' individual needs.
• CONFERENCES AND INTERVIEWS
Conferences
are not limited to drafts of written work. Including portfolios and journals
discussed above, the list of possible functions and subject matter for
conferencing is substantial:
Ø commenting on drafts of essays and reports
Ø reviewing portfolios
Ø responding to journals
Ø advising on a student's plan for an oral presentation
Ø assessing a proposal for a project
Ø giving feedback on the results of performance on a test
Ø clarifying understanding of a reading
Ø exploring strategies-based options for enhancement or compensation
Ø focusing on aspects of oral production
Ø checking a student's self-assessment of a performance
Ø setting personal goals for the near future
Ø assessing general progress in a course
Conferences must assume that the teacher plays the role of a facilitator and guide, not of
an administrator of a formal assessment. In this intrinsically motivating
atmosphere, students need to understand that the teacher is an ally who is
encouraging self-reflection and improvement. So that the student will be as
candid as possible in self-assessing, the teacher should not consider a
conference as something to be scored or graded. Conferences are by nature
formative, not summative, and their primary purpose is to offer positive
washback.
The term "interview" is intended to denote a context in which a teacher interviews a student
for a designated assessment purpose. (We are not talking about a student
conducting an interview of others in order to gather information on a topic.)
Interviews may have one or more of several possible goals, in which the teacher
Ø assesses the student's oral production,
Ø ascertains a student's needs before designing a course or curriculum,
Ø seeks to discover a student's learning styles and preferences,
Ø asks a student to assess his or her own performance, and
Ø requests an evaluation of a course.
• OBSERVATIONS
How
do all these chunks of information become stored in a teacher's brain cells?
Usually not through rating sheets and checklists and carefully completed
observation charts. Still, teachers' intuitions about students' performance are
not infallible, and certainly both the reliability and face validity of their
feedback to students can be increased with the help of empirical means of
observing their language performance. The value of systematic observation of
students has been extolled for decades (Flanders, 1970; Moskowitz, 1971; Spada
& Fröhlich, 1995), and its utilization greatly enhances a teacher's
intuitive impressions by offering tangible corroboration of conclusions.
Occasionally, intuitive information is disconfirmed by observation data.
We
will not be concerned in this section with the kind of observation that rates a
formal presentation or any other prepared, prearranged performance in which the
student is fully aware of some evaluative measure being applied, and in which
the teacher scores or comments on the performance. We are talking about
observation as a systematic, planned procedure for real-time, almost
surreptitious recording of student verbal and nonverbal behavior. One of the
objectives of such observation is to assess students without their awareness
(and possible consequent anxiety) of the observation so that the naturalness of
their linguistic performance is maximized.
• SELF- AND PEER-ASSESSMENTS
Self-assessment derives its theoretical justification from a number of well-established principles
of second language acquisition. The principle of autonomy stands out as one of
the primary foundation stones of successful learning. The ability to set one's own
goals both within and beyond the structure of a classroom curriculum, to pursue
them without the presence of an external prod, and to independently monitor
that pursuit are all keys to success. Developing intrinsic motivation that
comes from a self-propelled desire to excel is at the top of the list of
successful acquisition of any set of skills.
Peer-assessment appeals to similar principles, the most obvious of which is cooperative
learning. Many people go through a whole regimen of education from kindergarten
up through a graduate degree and never come to appreciate the value of
collaboration in learning: the benefit of a community of learners capable of
teaching each other something. Peer-assessment is simply one arm of a plethora
of tasks and procedures within the domain of learner-centered and collaborative
education.
Researchers (such as Brown & Hudson, 1998) agree that the above theoretical
underpinnings of self- and peer-assessment offer certain benefits: direct
involvement of students in their own destiny, the encouragement of autonomy,
and increased motivation because of their self-involvement. Of course, some
noteworthy drawbacks must also be taken into account. Subjectivity is a
primary obstacle to overcome. Students may be either too harsh on themselves
or too self-flattering, or they may not have the necessary tools to make an
accurate assessment. Also, especially in the case of direct assessments of
performance (see below), they may not be able to discern their own errors. In
contrast, Bailey (1998) conducted a study in which learners showed moderately
high correlations (between .58 and .64) between self-rated oral production
ability and scores on the OPI, which suggests that in the assessment of general
competence, learners' self-assessments may be more accurate than one might
suppose.
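For readers unfamiliar with how a correlation such as those reported above is obtained, the following is a minimal sketch in Python of the Pearson correlation coefficient. The two score lists are invented purely for illustration (they are not Bailey's data), and the function name pearson_r is an assumption of this sketch. Whether the study itself used this particular coefficient is not stated in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each list
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list has no variation")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: eight learners' self-ratings of oral ability (1-5 scale)
# and the ratings the same learners received in an oral interview (1-5 scale).
self_ratings = [2, 3, 3, 4, 5, 4, 2, 5]
interview_scores = [1, 3, 4, 3, 5, 5, 3, 4]

r = pearson_r(self_ratings, interview_scores)
print(f"r = {r:.2f}")
```

A value near 1 means learners who rate themselves highly also tend to be rated highly by the interviewer, a value near 0 means the two sets of ratings are unrelated, and values in the .58 to .64 range fall between the two.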
Types of Self- and Peer-Assessment
It is important to distinguish among several different types of self- and
peer-assessment and to apply them accordingly. I have borrowed from widely
accepted classifications of strategic options to create five categories of
self- and peer-assessment:
Ø direct assessment of performance,
Ø indirect assessment of competence,
Ø metacognitive assessment,
Ø assessment of socioaffective factors, and
Ø student self-generated tests.
1. Assessment of [a specific] performance.
In this category, a student typically monitors him- or herself, in either oral
or written production, and renders some kind of evaluation of performance. The
evaluation takes place immediately or very soon after the performance. Thus,
having made an oral presentation, the student (or a peer) fills out a checklist
that rates performance on a defined scale. Or perhaps the student views a
video-recorded lecture and completes a self-corrected comprehension quiz. A
journal may serve as a tool for such self-assessment. Peer editing is an
excellent example of direct assessment of a specific performance.
2. Indirect assessment of [general] competence.
Indirect self- or peer-assessment targets larger slices of time
with a view to rendering an evaluation of general ability, as opposed to one
specific, relatively time-constrained performance. The distinction between
direct and indirect assessments is the classic competence-performance
distinction. Self- and peer-assessments of performance are limited in time and
focus to a relatively short performance. Assessments of competence may
encompass a lesson over several days, a module, or even a whole term of course
work, and the objective is to ignore minor, nonrepeating performance flaws and
thus to evaluate general ability.
3. Metacognitive assessment (for setting goals).
Some kinds of evaluation are more strategic in nature, with the purpose
not just of viewing past performance or competence but of setting goals and
maintaining an eye on the process of their pursuit. Personal goal-setting has
the advantage of fostering intrinsic motivation and of providing learners with
that extra-special impetus from having set and accomplished one's own goals.
Strategic planning and self-monitoring can take the form of journal entries,
choices from a list of possibilities, questionnaires, or cooperative (oral)
pair or group planning.
Guidelines for Self- and Peer-Assessment
Self-
and peer-assessment are among the best possible formative types of assessment
and possibly the most rewarding, but they must be carefully designed and administered for them to reach their
potential. Four guidelines will help teachers bring this intrinsically
motivating task into the classroom successfully.
1. Tell students the purpose of the assessment.
Self-assessment is a process that many students, especially those in
traditional educational systems, will initially find quite uncomfortable. They
need to be sold on the concept. It is therefore essential that you carefully
analyze the needs that will be met in offering both self- and peer-assessment
opportunities, and then convey this information to students.
2. Define the task(s) clearly. Make sure
the students know exactly what they are supposed to do. If you are offering a
rating sheet or questionnaire, the task is not complex, but an open-ended
journal entry could leave students perplexed about what to write. Guidelines
and models will be of great help in clarifying the procedures.
3. Encourage impartial evaluation of
performance or ability. One of the greatest drawbacks to self-assessment is the
threat of subjectivity. By showing students the advantage of honest, objective
opinions, you can maximize the beneficial washback of self-assessments.
Peer-assessments, too, are vulnerable to unreliability as students apply
varying standards to their peers. Clear assessment criteria can go a long way
toward encouraging objectivity.
4. Ensure beneficial washback through
follow-up tasks. It is not enough to simply toss a self-checklist at students
and then walk away. Systematic follow-up can be accomplished through further
self-analysis, journal reflection, written feedback from the teacher,
conferencing with the teacher, purposeful goal-setting by the student, or any
combination of the above.
A Taxonomy of Self- and Peer-Assessment Tasks
An evaluation of self- and peer-assessment according to our classic principles of
assessment yields a pattern that is quite consistent with other alternatives in
assessment that have been analyzed in this chapter. Practicality can achieve a
moderate level with such procedures as checklists and questionnaires, while
reliability risks remaining at a low level, given the variation within and
across learners. Once students accept the notion that they can legitimately
assess themselves, then face validity can be raised from what might otherwise
be a low level. Adherence to course objectives will maintain a high degree of
content validity. Authenticity and washback both have very high potential
because students are centering on their
own linguistic needs and are receiving useful feedback.
Perhaps
it is now clear why "alternatives in assessment" is a more
appropriate phrase than "alternative assessment." To set traditional
testing and alternatives against each other is counterproductive. All kinds of
assessment, from formal conventional procedures to informal and possibly
unconventional tasks, are needed to assemble information on students. The
alternatives covered in this chapter may not be markedly different from some of
the tasks described in the preceding four chapters (assessing listening,
speaking, reading, and writing). When we put all of this together, we have at
our disposal an amazing array of possible assessment tasks for second language
learners of English. The alternatives presented in this chapter simply expand
that continuum of possibilities.
Reference:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices.
New York: Pearson Education.