Sunday, June 8, 2008

Chapter 15 – Appropriately Evaluating Teaching and Grading Students

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - By W. James Popham

Chapter 15 addresses evaluating teaching and grading students. These two topics, while sometimes treated as interchangeable, are separate functions. Evaluation is an activity focused on determining the effectiveness of teachers. Grading is an activity focused on informing students how they are performing. Pre-instruction and post-instruction assessment practices are discussed, as well as the split-and-switch design for informing instruction. The use of standardized achievement tests for evaluating students and instruction is also weighed. Three schemes for grading are described, along with a more commonly used practice.
There are two types of evaluation used in appraising the instructional efforts of teachers. The first is formative evaluation: the appraisal of the teacher’s instructional program for the purpose of improving the program. The second is summative evaluation. Summative evaluation is not improvement focused; it is an appraisal of a teacher’s competencies used to make more permanent decisions about that teacher. These decisions are typically about continuation of employment or the awarding of tenure. Summative evaluation is usually carried out by an administrator or supervisor. Teachers typically do their own formative evaluation in order to better their own instruction. Summative data may be supplied to administrators or supervisors to show effectiveness of teaching.
Instructional impact can be gauged with pre-instruction and post-instruction assessment. Assessing students prior to instruction is pre-assessment; assessing them after instruction has occurred is post-assessment, and the difference between the two gives an indication of the learning that has occurred. This scenario, however, can be reactive. Reactivity occurs when the pre-assessment sensitizes students to what needs to be learned, so they perform well on the post-assessment as a result. An alternative that avoids this problem is the split-and-switch design. This data-gathering design works better with large groups of students than with small ones. In this model you split your class in half and administer one of two similar tests to each half. Mark these tests as pre-tests, instruct the whole class, and then switch the tests between the two halves and administer them as post-tests. Blind scoring should then occur. Blind scoring means that someone else, such as another teacher, a parent, or an assistant, grades the tests without knowing which are pre-tests and which are post-tests. The pre-test and post-test results are then pulled together for each test. Differential difficulty causes no problem in this design because each test’s pre-test results are compared with the same test’s post-test results, and students will not have previously seen their post-test, so reactive impact is not a concern. As a result, any instructional impact should be visible.
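The split-and-switch comparison can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, not a procedure taken from the book: the function names, the student names, and the scores are all made up, and the two tests are simply called Form A and Form B.

```python
import random
from statistics import mean

def split_class(students, seed=0):
    """Randomly split a roster into two halves (hypothetical helper)."""
    roster = list(students)
    random.Random(seed).shuffle(roster)
    half = len(roster) // 2
    return roster[:half], roster[half:]

def form_gains(pre_a, post_a, pre_b, post_b):
    """Compare pre- and post-test means on the SAME form.

    pre_a:  Form A scores from half 1, before instruction
    post_a: Form A scores from half 2, after instruction
    pre_b:  Form B scores from half 2, before instruction
    post_b: Form B scores from half 1, after instruction
    """
    return {
        "form_a_gain": mean(post_a) - mean(pre_a),
        "form_b_gain": mean(post_b) - mean(pre_b),
    }

# Made-up roster and scores, for illustration only.
names = ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay"]
half1, half2 = split_class(names)

gains = form_gains(
    pre_a=[40, 50, 60], post_a=[70, 80, 90],
    pre_b=[45, 50, 40], post_b=[70, 75, 65],
)
print(gains)
```

Because each gain compares one form with itself, it does not matter if Form A happens to be harder than Form B, and no student sees the same form twice.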
A common way of evaluating teaching has been through the performance of students on standardized achievement tests. For most achievement tests, this is a very inappropriate way to evaluate instructional quality. A standardized test is any exam administered and scored in a predetermined, standard manner. There are two major forms of standardized tests: aptitude tests and achievement tests. A school’s effectiveness is typically judged on standardized achievement tests. There are three types of standardized achievement tests. The first is the traditional nationally standardized achievement test. The second is a standards-based achievement test that is instructionally insensitive, and the third is a standards-based achievement test that is instructionally sensitive.
The purpose of nationally standardized achievement tests is to allow valid inferences to be made about the knowledge and skills a student possesses in a certain content area. These inferences are then compared with those for a norm group of students of the same age and grade. The dilemma is that there is so much that would need to be tested that only a small sampling is possible. A consequence is reliance on the assumption that the norm group genuinely represents the nation at large. Even if this is the case, these tests should not be used to evaluate the quality of education; that is not their purpose. First, there is a likelihood that the tests are not rigorously aligned with a state’s curricular aims. Second, items covering important content emphasized by the classroom teacher may be eliminated in a quest for score spread. The final reason nationally standardized achievement tests should not be used to evaluate teachers’ success is that many items are linked to students’ socioeconomic status (SES) or their inherited academic aptitude. In essence, they measure what students bring to school, not what they learn at school.
Standards-based tests sound like they would make much more sense. Two problems have occurred with instructionally insensitive standards-based tests: the large number of content standards that need to be addressed, and reporting of results that has limited instructional value. If properly designed, however, standards-based tests can be instructionally sensitive. Three attributes must be present for a standards-based test to be instructionally sensitive. They include the following: the skills and/or bodies of knowledge must be clearly described so that what counts as student mastery is very clear, and the test results must allow clear identification of each assessed skill or body of knowledge mastered by a student. A standards-based test that does not possess all three of these attributes is not instructionally sensitive and is therefore useless. Instructionally sensitive standards-based tests are the right kind of tests to use to evaluate schools.
Teachers also need to inform students of how well they are doing and how well they have done. A grade is a demonstration of what students have learned and the extent of their achievement. Serious thought should be given to identifying which factors to consider when grading and how much each factor will count. There are three common grade-giving approaches. The first is absolute grading. In this model a grade is given based on the teacher’s idea of what level of student performance is necessary to earn each grade. This method is similar to the criterion-referenced approach to assessment. The second is relative grading, in which a grade is based on how students perform in relation to one another. This type of grading requires flexibility from class to class because the make-up of each class changes. It is close to the norm-referenced approach. The third option is aptitude-based grading, in which a grade is assigned to each student based on how well the student performs in relation to that student’s potential. This form of grading tends to “level the playing field” by grading according to ability and encouraging students to reach their full potential. Given these three options, researchers have found that teachers really use a more “hodgepodge” form of grading, based loosely on a judgment of students’ assessed achievement, effort, attitude, in-class conduct, and growth. The result of this type of grading is that low performance in any of these areas leads to a low grade for a student. There are no scientific, quantitative models for clear-cut grades using the “hodgepodge” method. It is purely judgmental on most levels but is widely used and accepted by teachers and students.
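The contrast between absolute and relative grading can be made concrete with a small Python sketch. Everything specific here is an assumption for illustration and not something the chapter prescribes: the cutoff scores, the choice to divide the ranked class into equal-sized bands, and the student names and scores are all hypothetical.

```python
def absolute_grades(scores, cutoffs):
    """Grade each score against fixed cutoffs, listed highest first.

    The cutoffs stand in for the teacher's own idea of what
    performance earns each grade (assumed values).
    """
    return {
        name: next((g for g, c in cutoffs if score >= c), "F")
        for name, score in scores.items()
    }

def relative_grades(scores, letters=("A", "B", "C", "D")):
    """Grade by rank: split the ranked class into equal-sized bands.

    Equal bands are just one possible rule, assumed for this sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    band = -(-len(ranked) // len(letters))  # ceiling division
    return {name: letters[i // band] for i, name in enumerate(ranked)}

# A made-up class in which everyone scored high.
scores = {"Ana": 95, "Ben": 93, "Cy": 91, "Dee": 90}
cutoffs = [("A", 90), ("B", 80), ("C", 70), ("D", 60)]

absolute = absolute_grades(scores, cutoffs)
relative = relative_grades(scores)
print(absolute)
print(relative)
```

With these made-up scores every student earns an A under absolute grading, while relative grading spreads the same four students from A to D, which shows why relative grades depend so heavily on the make-up of the class.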
The final chapter has described the distinction between evaluating and grading: evaluating the quality of teachers’ instruction and grading students. Also discussed was the inappropriateness of using nationally standardized achievement tests to evaluate teachers. The difference between instructionally insensitive and instructionally sensitive standards-based achievement tests was shown. Grading was then discussed, along with the importance of developing criteria and weighting for grades ahead of actual grade dispensing. Three grading options were described, and the reality of “hodgepodge” grading was presented.
