Thursday, March 26, 2020

Summary of Chapter 3 from the book Language Assessment: Principles and Classroom Practices


CHAPTER 3: Designing Classroom Language Tests

TEST TYPES

The first task you will face in designing a test for your students is to determine the purpose of the test. Defining your purpose will help you choose the right kind of test, and it will also help you to focus on the specific objectives of the test. We will look first at two test types that you will probably not have many opportunities to create as a classroom teacher (language aptitude tests and language proficiency tests) and then at three types that you will almost certainly need to create (placement tests, diagnostic tests, and achievement tests).
1.      Language Aptitude Tests
One type of test, although admittedly not a very common one, predicts a person's success prior to exposure to the second language. A language aptitude test is designed to measure capacity or general ability to learn a foreign language and ultimate success in that undertaking. Language aptitude tests are ostensibly designed to apply to the classroom learning of any language. Two standardized aptitude tests have been used in the United States: the Modern Language Aptitude Test (MLAT) (Carroll & Sapon, 1958) and the Pimsleur Language Aptitude Battery (PLAB) (Pimsleur, 1966). Both are English-language tests and require students to perform a number of language-related tasks. The MLAT, for example, consists of five different tasks.

2.      Proficiency tests
A proficiency test is not limited to any one course, curriculum, or single skill in the language; rather, it tests overall ability. Proficiency tests have traditionally consisted of standardized multiple-choice items in grammar, vocabulary, reading comprehension, and aural comprehension. A typical example of a standardized proficiency test is the Test of English as a Foreign Language (TOEFL) produced by the Educational Testing Service. The TOEFL is used by more than a thousand institutions of higher education in the United States as an indicator of a prospective student's ability to undertake academic work in an English-speaking milieu. The TOEFL consists of sections on listening comprehension, structure (or grammatical accuracy), reading comprehension, and written expression.

3.      Placement tests
Certain proficiency tests can act in the role of placement tests, the purpose of which is to place a student into a particular level or section of a language curriculum or school. A placement test usually, but not always, includes a sampling of the material to be covered in the various courses in a curriculum; a student's performance on the test should indicate the point at which the student will find material neither too easy nor too difficult but appropriately challenging. Placement tests come in many varieties: assessing comprehension and production, responding through written and oral performance, open-ended and limited responses, selection (e.g., multiple-choice) and gap-filling formats.
4.      Diagnostic tests
A diagnostic test is designed to diagnose specified aspects of a language. A test in pronunciation, for example, might diagnose the phonological features of English that are difficult for learners and should therefore become part of a curriculum. Usually, such tests offer a checklist of features for the administrator (often the teacher) to use in pinpointing difficulties. A writing diagnostic would elicit a writing sample from students that would allow the teacher to identify those rhetorical and linguistic features on which the course needed to focus special attention.

5.      Achievement tests
An achievement test is related directly to classroom lessons, units, or even a total curriculum. Achievement tests are (or should be) limited to particular material addressed in a curriculum within a particular time frame and are offered after a course has focused on the objectives in question. Achievement tests can also serve the diagnostic role of indicating what a student needs to continue to work on in the future, but the primary role of an achievement test is to determine whether course objectives have been met, and appropriate knowledge and skills acquired, by the end of a period of instruction. Achievement tests are often summative because they are administered at the end of a unit or term of study. They also play an important formative role: an effective achievement test will offer washback about the quality of a learner's performance in subsets of the unit or course, and this washback contributes to the formative nature of such tests.

SOME PRACTICAL STEPS TO TEST CONSTRUCTION

1.      Assessing Clear, Unambiguous Objectives

In addition to knowing the purpose of the test you're creating, you need to know as specifically as possible what it is you want to test. Sometimes teachers give tests simply because it's Friday of the third week of the course, and after hasty glances at the chapter(s) covered during those three weeks, they dash off some test items so that students will have something to do during the class. This is no way to approach a test. Instead, begin by taking a careful look at everything that you think your students should "know" or be able to "do," based on the material that the students are responsible for. In other words, examine the objectives for the unit you are testing.

2.      Drawing Up Test Specifications

Test specifications for classroom use can be a simple and practical outline of your test: (a) a broad outline of the test, (b) what skills you will test, and (c) what the items will look like. Let's look at the first two in relation to the midterm unit assessment already referred to above.

(a) Outline of the test and (b) skills to be included. Because of the constraints of your curriculum, your unit test must take no more than 30 minutes. This is an integrated curriculum, so you need to test all four skills. Since you have the luxury of teaching a small class (only 12 students!), you decide to include an oral production component in the preceding period (taking students one by one into a separate room while the rest of the class reviews the unit individually and completes workbook exercises). You can therefore test oral production objectives directly at that time. You determine that the 30-minute test will be divided equally in time among listening, reading, and writing.

(c) Item types and tasks. The next and potentially more complex choices involve the item types and tasks to use in this test. There is a surprisingly limited number of modes of eliciting responses (that is, prompting) and of responding on tests of any kind. Consider the options: the test prompt can be oral (the student listens) or written (the student reads), and the student can respond orally or in writing.

3.      Devising Test Tasks

Ideally, you would try out all your tests on students not in your class before actually administering them, but in daily classroom teaching the tryout phase is almost impossible. Alternatively, you could enlist the aid of a colleague to look over your test. In any case, you must do what you can to bring to your students an instrument that is, to the best of your ability, practical and reliable.
In the final revision of your test, imagine that you are a student taking it. Go through each set of directions and all items slowly and deliberately. Time yourself. (Often we underestimate the time students will need to complete a test.) If the test should be shortened or lengthened, make the necessary adjustments. Make sure your test is neat and uncluttered on the page, reflecting all the care and precision you have put into its construction. If there is an audio component, as there is in our hypothetical test, make sure that the script is clear, that your voice and any other voices are clear, and that the audio equipment is in working order before starting the test.

4.      Designing Multiple-Choice Test Items

Multiple-choice items, which may appear to be the simplest kind of item to construct, are extremely difficult to design correctly. Language-testing experts caution against a number of weaknesses of multiple-choice items:
     The technique tests only recognition knowledge.
     Guessing may have a considerable effect on test scores.
     The technique severely restricts what can be tested.
     It is very difficult to write successful items.
     Washback may be harmful.
     Cheating may be facilitated.
Multiple-choice items offer overworked teachers the tempting possibility of an easy and consistent process of scoring and grading. But is the preparation phase worth the effort? Sometimes it is, but you might spend even more time designing such items than you save in grading the test. Of course, if your objective is to design a large-scale standardized test for repeated administrations, then a multiple-choice format does indeed become viable. First, a primer on terminology.
1)      Multiple-choice items are all receptive, or selective, response items in that the test-taker chooses from a set of responses rather than creating a response (the latter is commonly called a supply type of response). Other receptive item types include true-false questions and matching lists. (In the discussion here, the guidelines apply primarily to multiple-choice item types and not necessarily to other receptive types.)
2)      Every multiple-choice item has a stem, which presents a stimulus, and several (usually between three and five) options or alternatives to choose from.
3)      One of those options, the key, is the correct response, while the others serve as distractors.
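As an illustration only (this code is not from the book), the stem/options/key/distractor terminology above maps naturally onto a tiny objective-scoring routine. The items, answer key, and student responses below are all hypothetical.

```python
# A minimal sketch of objective scoring for multiple-choice items.
# Each item has a stem, a set of options, and one key; the remaining
# options are the distractors. All content here is invented.
items = {
    1: {"stem": "She ___ to school every day.",
        "options": ["go", "goes", "going", "gone"],  # one key, three distractors
        "key": "goes"},
    2: {"stem": "___ you like tea?",
        "options": ["Do", "Does", "Is", "Are"],
        "key": "Do"},
}

def score(responses, items):
    """Award one point per response that matches the item's key."""
    return sum(1 for n, answer in responses.items()
               if answer == items[n]["key"])

student = {1: "goes", 2: "Does"}
print(score(student, items))  # prints 1: only item 1 matches the key
```

Because every response is either the key or a distractor, scoring is purely mechanical, which is exactly the "easy and consistent" grading the text mentions.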


SCORING, GRADING, AND GIVING FEEDBACK

1.      Scoring

As you design a classroom test, you must consider how the test will be scored and graded. Your scoring plan reflects the relative weight that you place on each section and on the items in each section. The integrated-skills class that we have been using as an example focuses on listening and speaking skills, with some attention to reading and writing. Three of your nine objectives target reading and writing skills.
Because oral production is a driving force in your overall objectives, you decide to place more weight on the speaking (oral interview) section than on the other three sections. Five minutes is actually a long time to spend in a one-on-one situation with a student, and some significant information can be extracted from such a session. You therefore designate 40 percent of the grade to the oral interview. You consider the listening and reading sections to be equally important, but each of them, especially in this multiple-choice format, is of less consequence than the oral interview. So you give each of them a 20 percent weight. That leaves 20 percent for the writing section, which seems about right to you given the time and focus on writing in this unit of the course.
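The 40/20/20/20 weighting described above amounts to a simple weighted average. The weights below follow the text; the raw section scores are hypothetical, invented only to show the arithmetic.

```python
# Weights from the scoring plan described in the text:
# oral interview 40%; listening, reading, and writing 20% each.
weights = {"oral": 0.40, "listening": 0.20, "reading": 0.20, "writing": 0.20}

# Hypothetical raw section scores, each expressed as a percentage.
section_scores = {"oral": 85, "listening": 70, "reading": 90, "writing": 80}

# Composite score = sum of (weight * section score).
total = sum(weights[s] * section_scores[s] for s in weights)
print(total)  # 0.4*85 + 0.2*70 + 0.2*90 + 0.2*80 = 82.0
```

Note how the oral interview dominates: a strong or weak interview moves the composite twice as much as any other section, which is the point of the weighting.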
2.      Grading
Your first thought might be that assigning grades to student performance on this test would be easy: just give an "A" for 90-100 percent, a "B" for 80-89 percent, and so on. Not so fast! Grading is such a thorny issue that all of Chapter 11 is devoted to the topic. How you assign letter grades to this test is a product of
       the country, culture, and context of this English classroom,
       institutional expectations (most of them unwritten),
       explicit and implicit definitions of grades that you have set forth,
       the relationship you have established with this class, and
    student expectations that have been engendered in previous tests and quizzes in this class.
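The "not so fast" caveat notwithstanding, the simple percentage bands mentioned above (90-100 = A, 80-89 = B, and so on) can be sketched as a lookup. The cutoffs below merely extend that pattern downward; the book does not prescribe them.

```python
def letter_grade(percent):
    """Map a percentage score to a letter grade using fixed bands:
    90-100 -> A, 80-89 -> B, 70-79 -> C, 60-69 -> D, below 60 -> F.
    (The C/D/F cutoffs are assumed, continuing the A/B pattern.)"""
    bands = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]
    for cutoff, letter in bands:
        if percent >= cutoff:
            return letter
    return "F"

print(letter_grade(92))  # A
print(letter_grade(85))  # B
print(letter_grade(59))  # F
```

The text's real point is that such a mechanical mapping is only a starting point; the contextual factors listed above should shape the final grade.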
3.      Giving feedback
A section on scoring and grading would not be complete without some consideration of the forms in which you will offer feedback to your students, feedback that you want to become beneficial washback. In the example test that we have been referring to here, which is not unusual among possible formats for periodic classroom tests, feedback can take many forms, from a single letter grade or overall score to detailed comments on each section.

Reference:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

Thursday, March 19, 2020

Summary of Practicality, Reliability, Validity, Authenticity, and Washback



A.     Practicality
A test that is prohibitively expensive is impractical. A test of language proficiency that takes a student five hours to complete is impractical; it consumes more time (and money) than necessary to accomplish its objective. A test that requires individual one-on-one proctoring is impractical for a group of several hundred test-takers and only a handful of examiners. A test that takes a few minutes for a student to take and several hours for an examiner to evaluate is impractical for most classroom situations. A test that can be scored only by computer is impractical if the test takes place a thousand miles away from the nearest computer. The value and quality of a test sometimes hinge on such nitty-gritty, practical considerations.
B.     Reliability
A reliable test is consistent and dependable. If you give the same test to the same student or matched students on two different occasions, the test should yield similar results. The issue of the reliability of a test may best be addressed by considering a number of factors that may contribute to its unreliability. Consider the following possibilities: fluctuations in the student, in scoring, in test administration, and in the test itself.
·        Student-related reliability
The most common learner-related issues in reliability are caused by temporary illness, fatigue, a "bad day," anxiety, and other physical or psychological factors, which may make an "observed" score deviate from one's "true" score. Also included in this category are such factors as a test-taker's "test-wiseness," or strategies for efficient test taking (Mousavi, 2002, p. 804).
·        Rater reliability
Human error, subjectivity, and bias may enter into the scoring process. Inter-rater unreliability occurs when two or more scorers yield inconsistent scores on the same test, possibly for lack of attention to scoring criteria, inexperience, inattention, or even preconceived biases. In the story above about the placement test, the initial scoring plan for the dictations was found to be unreliable; that is, the two scorers were not applying the same standards.
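One common way to quantify rater consistency, not discussed in the summary itself, is a simple percent-agreement check between two raters. The scores below are invented for illustration; real studies often go further and compute statistics such as Cohen's kappa.

```python
# A minimal sketch: percent agreement between two raters who each
# scored the same ten dictation scripts. All scores are hypothetical.
rater_a = [3, 4, 2, 5, 3, 4, 4, 2, 5, 3]
rater_b = [3, 4, 3, 5, 3, 4, 5, 2, 5, 3]

# Count the scripts on which the two raters gave identical scores.
agreements = sum(a == b for a, b in zip(rater_a, rater_b))
percent_agreement = agreements / len(rater_a)
print(percent_agreement)  # 0.8: the raters agreed on 8 of 10 scripts
```

A low agreement figure, like the dictation-scoring story in the text, is a signal that the raters are not applying the same standards and that the scoring criteria need to be clarified.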
·        Test administration reliability
Unreliability may also result from the conditions in which the test is administered. I once witnessed the administration of a test of aural comprehension in which a tape recorder played items for comprehension, but because of street noise outside the building, students sitting next to windows could not hear the tape accurately. This was a clear case of test administration unreliability. Other sources of unreliability are found in photocopying variations, the amount of light in different parts of the room, variations in temperature, and even the condition of desks and chairs.
·        Test reliability
Sometimes the nature of the test itself can cause measurement errors. If a test is too long, test-takers may become fatigued by the time they reach the later items and hastily respond incorrectly. Timed tests may discriminate against students who do not perform well on a test with a time limit. We all know people who know the course material perfectly but who are adversely affected by the presence of a clock ticking away. Poorly written test items may be a further source of test unreliability.
C.     Validity
By far the most complex criterion of an effective test, and arguably the most important principle, is validity, "the extent to which inferences made from assessment results are appropriate, meaningful, and useful in terms of the purpose of the assessment" (Gronlund, 1998, p. 226). A valid test of reading ability actually measures reading ability, not 20/20 vision, previous knowledge of a subject, or some other variable of questionable relevance. To measure writing ability, one might ask students to write as many words as they can in 15 minutes and then simply count the words for the final score. Such a test would be easy to administer (practical), and the scoring quite dependable (reliable). But it would not constitute a valid test of writing ability without some consideration of comprehensibility, rhetorical discourse elements, and the organization of ideas, among other factors.
D.     Authenticity
A fourth major principle of language testing is authenticity, a concept that is a little slippery to define, especially within the art and science of evaluating and designing tests. Bachman and Palmer (1996, p. 23) define authenticity as "the degree of correspondence of the characteristics of a given language test task to the features of a target language task," and then suggest an agenda for identifying those target language tasks and for transforming them into valid test items.
In a test, authenticity may be present in the following ways:
1.      The language in the test is as natural as possible.
2.      Items are contextualized rather than isolated.
3.      Topics are meaningful (relevant, interesting) for the learner.
4.      Some thematic organization to items is provided, such as through a story line or episode.
5.      Tasks represent, or closely approximate, real-world tasks.  
E.      Washback
A final principle, known among language-testing specialists as washback, concerns the effect of a test on teaching and learning. In large-scale assessment, washback generally refers to the effects a test has on instruction in terms of how students prepare for the test. "Cram" courses and "teaching to the test" are examples of such washback. Another form of washback, which occurs more in classroom assessment, is information that "washes back" to students in the form of useful diagnoses of strengths and weaknesses. Washback also includes the effects of an assessment on teaching and learning prior to the assessment itself, that is, on preparation for the assessment. Informal performance assessment is by nature more likely to have built-in washback effects because the teacher is usually providing interactive feedback. Formal tests can also have positive washback, but they provide no washback if the students receive only a simple letter grade or a single overall numerical score.


Reference:
Brown, H. Douglas. 2004. Language Assessment: Principles and Classroom Practices. New York: Pearson Education.

The analysis of the 3 principles of language assessment (practicality, reliability, and validity)


THE ANALYSIS OF THE 3 PRINCIPLES OF LANGUAGE ASSESSMENT IN THE TEST PAPER (GRADE 7 FINAL SEMESTER EXAM, UAS)
These are 7th-grade final exam questions.

1. Teacher : Good morning everybody. Is anybody absent today?
Students : …. Miss. Nobody is absent, Miss.
a. Good evening.
b. Good night.
c. Good afternoon
d.  Good morning
2. Hanna : …. Rani?
Rani : I go to SMP Nusa Bangsa.
a. How do you go to school
b. How did you go to school
c. Where do you go to school
d. Where did you go to school
3. Deni : Hi, My name is Deni . What is your name?
Omi : Hi, I’m Omi. Are you … ?
Deni : Yes. I just moved from Bandar Lampung. I’m in I B, and you?
Omi : Me too.
a. new teacher
b. a teacher
c. new student
d. a librarian

4. Ronal : Cindy, this is my sister Nita. Nita, this is Cindy.
Cindy : How do you do?
Nita : How do you do?
What does Ronal do?
a. He introduces himself.
b. He introduces his sister.
c. He greets Nita
d. He meets Cindy.
5. bedroom (1) – clean (2) – the (3) – keep (4)
a. 4-3-2-1
b. 4-1-3-2
c. 4-3-1-2
d. 4-2-3-1
6. Ninda : Hello Farida, I heard that you've won the beauty contest.
Farida : That's right.
Ninda : " …………………………. "
Farida : Thank you very much.
a. You're very lucky
b. I'm surprised on your success
c. May you congratulate me
d. Congratulation on your success

7. SMP Patriot is quite big. It has many rooms. It also has a big yard with a flag pole in the middle. The school is very green. The teachers and students plant many trees there.
Where is the flag pole?
a. In the middle of the yard.
b. In the middle of school building.
c. In the middle of SMP Patriot.
d. In the middle of the trees.
Text for questions no 8-10
This is my school, SMPN 1 Pasir Sakti. It is on Hasyim Asy'ari street. It is far from my house. It has twenty one classrooms. Every classroom consists of thirty one students. There are a library, a laboratory, a computer room, a mosque, four toilets, and a meeting hall. My favourite place is the library. I like reading fairy stories, encyclopedias, and computer books.
8. What is on Hasyim Asy'ari street?
a. My school
b. SMPN 1 Pasir Sakti
c. The writer's house
d. My house
9. Where does the writer always read his favourite books?
a. The classroom
b. The laboratory
c. The library
d. The computer room
10. How many students are every classroom?

a. Twenty one
b. Thirty one
c. Twenty two
d. Thirty two
This is the analysis of the test paper:

a)      Practicality
Final exam questions differ from city to city. The difference is due to regional autonomy in the field of education, which leads each region to write its own final semester exam questions. The writing of the questions also cannot be separated from the core competencies, basic competencies, and other aspects of the 2013 curriculum, as well as the competency standards and basic competencies of the school-based (education unit level) curriculum.

b)      Reliability
This final test is fairly reliable because every student receives the same question paper. All of the questions and texts are clear and have only one correct answer; students are also given a set amount of time to answer the questions; and the scoring of the final exam is consistent across students' answers.

c)      Validity
This final exam is valid for several reasons: all of the questions and texts are in English; answering the questions requires students' skill in reading for the point targeted by each question and in rearranging jumbled sentences; and everything in the questions closely resembles what was learned beforehand.

Reference:
Soal UAS (ujian akhir semester) ganjil bahasa Inggris kelas 7 [Odd-semester grade 7 English final exam questions]. Retrieved from https://www.ilmubahasainggris.com/contoh-soal-uas-ujian-akhir-semester-ganjil-bahasa-inggris-kelas-7/