Saturday, March 8, 2008

Chapter 2 – Reliability of Assessment

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - W. James Popham

Chapter 2 discusses the reliability of assessments and its relevance to the classroom teacher, as well as to high-stakes exams. There are three types of reliability evidence: stability, alternate-form, and internal consistency. The standard error of measurement can also be used to describe the consistency of individual scores as they relate to the group as a whole.
Teachers need to be aware of the importance of reliability as it relates to classroom assessments and high-stakes tests. Reliability refers to the consistency with which a test measures whatever it measures. Parents may want to know why a student who seems to be doing well academically does not score well on a high-stakes test; teachers may also need to explain what the scores mean and what they measure. In many cases the curriculum taught does not align with what the high-stakes assessments measure, and teachers need to be able to explain to parents why this is the case.
The first form of reliability evidence is stability reliability, which is closely tied to the idea that consistency equals reliability. Teachers generally want the results of their tests to be consistent when administered over time. That is to say, if a test is given on Monday and, because something happens to the test, it is re-administered on Wednesday, each student's score should stay about the same. This is also termed test-retest reliability. A correlation coefficient is computed between the two sets of scores, reflecting the degree of similarity between them. A coefficient near +1.0 indicates highly consistent scores and thus a reliable test; a coefficient near 0 indicates no relationship between the two administrations, and one near -1.0 indicates an inverse relationship, suggesting the testing conditions were not comparable. Stability evidence can also be expressed as classification consistency. If a teacher wants to determine who has grasped certain concepts and who needs further instruction, classification consistency can help establish this: students who score at or above the defined cut score on both occasions have mastered the topic and can move on to a new one, while students who score below it should continue to receive instruction on the topic being learned. The test shows classification consistency when students fall into the same category on both administrations. Stability reliability, then, is consistency over time for a single examination.
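As a quick sketch of the test-retest idea, the correlation coefficient for a handful of students could be computed like this (all scores below are invented for illustration; a coefficient near +1.0 suggests the two administrations agree):

```python
# Hypothetical scores for five students on the same test given twice
# (the Monday administration and the Wednesday re-administration).
monday = [72, 85, 90, 64, 78]
wednesday = [70, 88, 91, 60, 80]

def pearson_r(xs, ys):
    """Correlation coefficient between two score lists: near +1.0 means
    highly consistent results, 0 no relationship, -1.0 an inverse one."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sum((x - mean_x) ** 2 for x in xs) ** 0.5
    sd_y = sum((y - mean_y) ** 2 for y in ys) ** 0.5
    return cov / (sd_x * sd_y)

print(round(pearson_r(monday, wednesday), 3))  # close to +1.0: stable scores
```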
A second form of reliability is alternate-form reliability: whether two or more allegedly equivalent test forms are actually equivalent. To gather alternate-form evidence, the two forms are administered to the same individuals, ideally with little or no delay between the two administrations. Once both sets of scores are in hand, a correlation coefficient is computed to reflect the relationship between students' performances on the two forms. Alternate-form reliability, then, deals with the consistency, or equivalence, of two or more forms of the same examination.
The third form of reliability is internal consistency reliability. It differs from stability and alternate-form reliability in that it does not look at the consistency of students' scores across administrations; instead, it deals with the extent to which the items on an assessment function in a consistent manner. Unlike the other forms, which require two or more administrations, internal consistency can be determined from a single administration of a test. For multiple-choice assessments, internal consistency can be computed using the Kuder-Richardson procedures; for assessments built from essay items, Cronbach's coefficient alpha can be used to determine item consistency. The more items on an assessment, the more reliable it will tend to be.
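As a rough sketch, Cronbach's coefficient alpha can be computed from one administration like this (the item scores below are invented; KR-20 is the special case of the same formula when every item is scored 0 or 1):

```python
# Hypothetical scores for six students on a 4-item assessment
# (rows = students, columns = items); all numbers are invented.
scores = [
    [3, 2, 3, 4],
    [4, 4, 3, 4],
    [2, 1, 2, 2],
    [3, 3, 3, 3],
    [4, 3, 4, 4],
    [1, 2, 1, 2],
]

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(matrix):
    """Internal consistency from a single administration: items that
    'hang together' push alpha toward 1.0."""
    k = len(matrix[0])  # number of items
    item_vars = [variance([row[i] for row in matrix]) for i in range(k)]
    total_var = variance([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

print(round(cronbach_alpha(scores), 3))
```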
The standard error of measurement (SEM) is the index used in educational assessment to describe the consistency of an individual student's performance. The higher the reliability of an assessment, the smaller its standard error of measurement should be. An assessment's standard error of measurement depends on two factors. The first is the standard deviation of the scores: the more spread out the scores, the greater the standard deviation. The second is the coefficient that represents the assessment's reliability. The larger the standard deviation, the larger the standard error of measurement; and the smaller the reliability coefficient, the larger the standard error of measurement. When the standard error of measurement is small, a student's observed score more accurately reflects the student's actual performance level.
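The two-factor relationship can be sketched as SEM = standard deviation × √(1 − reliability); the numbers below are invented illustrative values, not data from the chapter:

```python
import math

def sem(standard_deviation, reliability):
    """Standard error of measurement: grows with score spread (standard
    deviation) and shrinks as the reliability coefficient rises."""
    return standard_deviation * math.sqrt(1 - reliability)

print(round(sem(10.0, 0.91), 2))  # prints 3.0: high reliability, small SEM
print(round(sem(10.0, 0.75), 2))  # prints 5.0: same spread, lower reliability
```

With the spread held constant, dropping the reliability coefficient from 0.91 to 0.75 makes the SEM larger, which is exactly the relationship described above.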
Classroom teachers need to know and understand what reliability is. It isn't necessary that they compute reliability for all of their own tests, but when they use pre-created tests or interpret high-stakes tests, knowing the types of reliability evidence and what each means will allow them to better understand how reliable those assessments are. The three types of reliability evidence (stability, alternate-form, and internal consistency) should not be used interchangeably. The standard error of measurement supplies an indicator of consistency for an individual student's score, estimating person-score consistency from evidence of group-score consistency.
