Saturday, March 29, 2008

Chapter 4 – Absence of Bias

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - W. James Popham

Chapter 4 discusses absence-of-bias, the last of the three essential criteria for evaluating educational assessments. Assessment bias is defined as qualities of an assessment instrument that offend or unfairly penalize a group of students because of the students’ gender, race, ethnicity, socioeconomic status, religion, or other such group-defining characteristic. Assessment bias occurs when elements of an assessment distort a subgroup’s performance on the assessment. Bias review panels should be created to review especially high-stakes tests. Students with disabilities and English language learners also experience assessment bias because of the nature of their needs. An understanding of assessment bias is therefore essential for classroom teachers.
Assessment bias occurs when elements of an assessment distort a student’s performance based on the student’s personal characteristics. Assessment bias also interferes with test validity: if it distorts student performance, then score-based inferences cannot be drawn accurately. There are two forms of assessment bias. The first is offensiveness, which occurs when negative stereotypes of certain subgroups are presented in an assessment. Offensive content can also act as a distractor, pulling these students’ focus off the question and causing them to respond incorrectly or incompletely. The second form is unfair penalization, which occurs when a student’s test performance is distorted by content that may not be offensive but that disadvantages the student’s subgroup. An example would be questions that assume a strong knowledge of football; girls may be less familiar with the terminology and therefore not do as well on those questions. Unfair penalization happens when it is not the student’s ability that leads to poor performance but the student’s characteristics or subgroup membership.
Disparate impact, however, does not by itself indicate assessment bias. If an assessment has a disparate impact on members of a certain ethnic, gender, or religious group, it should be scrutinized for bias, but often the cause is an educational factor that needs to be addressed. Content review of test items is essential, especially for high-stakes tests, and such reviews can help reveal disparate impact as well as assessment bias. One way to do this is with bias review panels. A bias review panel should consist of experts in the subject being reviewed as well as individuals from the subgroup that is, or may be, adversely impacted. Males and females should also be equally represented. Once the panel is formed, assessment bias needs to be clearly defined and explained to the panel, as does the purpose of the assessment. The next step for the bias review panel is a per-item absence-of-bias judgment. A question is developed that panelists ask of each item as they read through the assessment, answering yes or no. Once the responses are tallied, the percentage of no judgments per item is calculated, and a per-item absence-of-bias index can be computed for each item and then for the entire test. In addition to this scoring, when a panelist judges an item “no,” a written explanation is also provided; items are often discarded on the basis of these explanations. Individual item review should then be followed by an overall absence-of-bias review: an overall question is created and asked of the whole assessment, the same scoring process is applied, and flagged items can be modified and corrected rather quickly.
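To make the tallying arithmetic concrete, here is a minimal Python sketch of the per-item computation described above: each item’s index is the percentage of panelists who judge it free of bias. The function name, the yes/no data layout, and the sample votes are my assumptions for illustration, not Popham’s notation.

```python
# Illustrative sketch of a bias review panel tally. Each panelist answers
# True ("item appears free of bias") or False for every item on the test.

def per_item_absence_of_bias(judgments):
    """judgments: one list of True/False per panelist, one entry per item.
    Returns the percentage of 'bias-free' judgments for each item."""
    n_panelists = len(judgments)
    n_items = len(judgments[0])
    return [
        100.0 * sum(panelist[i] for panelist in judgments) / n_panelists
        for i in range(n_items)
    ]

# Four panelists judging a three-item assessment.
panel = [
    [True, True, False],
    [True, True, True],
    [True, False, False],
    [True, True, False],
]
indices = per_item_absence_of_bias(panel)
print(indices)                      # [100.0, 75.0, 25.0] -> item 3 needs scrutiny
print(sum(indices) / len(indices))  # overall index for the whole test (~66.7)
```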
Whole-test bias in high-stakes tests can likewise be detected through bias review panels. However, this is not practical for classroom assessments. In the classroom, bias can be prevented by becoming sensitive to the existence of assessment bias and the need to eliminate it. Teachers should think seriously about the backgrounds and experiences of the various students in their class, then review each item on every assessment, asking whether the item might offend or unfairly penalize any students. Items that may show bias should be eliminated. If you are unsure about an item and have access to other teachers with background knowledge of a particular subgroup, it can also help to have them review the test or the item in question.
Assessing students with disabilities can also lead to assessment bias. Federal laws have forced regular classroom teachers to look at how they teach and assess students with disabilities. In 1975, Public Law 94-142 was enacted, requiring states to properly educate students with disabilities in order to receive federal funds. This law also established the use of IEPs, or Individualized Education Programs, for students with disabilities. IEPs are federally prescribed documents, created by parents, teachers, and specialized service providers, describing how a child with disabilities should be educated. In 1997 the law was reauthorized and renamed the Individuals with Disabilities Education Act (IDEA). The reauthorization required states and districts to identify curricular expectations for special education students similar to the expectations for all students, and required that these students be included in assessment programs whose results are reported to the public. In January 2002, No Child Left Behind was enacted to provide consequences for states and districts that did not comply, and also to improve achievement levels of all students. These improvements must be demonstrated on state-chosen tests linked to each state’s curricular aims. Over twelve years, all schools must meet adequate yearly progress (AYP) targets, which increase in score levels from year to year. No Child Left Behind also requires subgroups of students to meet AYP targets based on a number chosen by each state. IEPs are used today to help students meet the content standards pursued by all students.
Assessment accommodations are used to address assessment bias for students with disabilities. Accommodations are procedures or practices that allow students with disabilities equitable access to instruction and assessment. Accommodations do not lower expectations for these students; rather, they provide a student with a setting similar to the one used during instruction. Accommodations cannot alter the nature of the skills or knowledge being assessed. There are four typical accommodation categories: presentation, response, setting, and timing/scheduling. Students should take part in choosing the accommodations that work best for them, to ensure their proper use.
English language learners are another diverse group. This group consists of students whose first language is not English and who know little if any English, students who are beginning to learn English but could benefit from school instruction in it, and students who are proficient in English but need additional assistance in academic or social contexts. Another subgroup is limited English proficient (LEP) students, who use a language other than English proficiently at home. If ELL or ESL students are to be assessed fairly, classifications need to be made consistent within and across states. Some areas also have sparse ELL populations, which needs to be accounted for statistically. There is also instability in this population over time: mobility is very common, so instructional instability can be a contributing factor as well. Lower baseline and cut scores should be considered, since language-based subjects tend to be more difficult for these students. The movement to isolate these subgroups makes sense; however, how the scores are analyzed needs to be reconsidered for students who have a poor understanding of English. Preventing test bias for these students should be a consideration whenever they take a test in English.
Classroom teachers need to be aware of assessment bias, and they need to take measures to prevent bias on their own assessments whenever possible. For high-stakes tests, review panels need to be formed for item-level as well as whole-assessment analysis. Students with disabilities should have IEPs written to accommodate their testing needs and ensure equity during assessments. Students whose first language is not English should be assessed as a subgroup and not penalized for low scores on tests written in English.

Saturday, March 15, 2008

Chapter 3 – Validity

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - W. James Popham

Chapter 3 discusses validity, the most significant concept in assessment. Tests themselves are not valid or invalid; rather, it is the score-based inferences drawn from a test that are checked for validity. There are three types of validity evidence available for evaluating a test.
The first type is content-related evidence of validity. This refers to the adequacy with which the content of a test represents the content of the curricular standard about which inferences are to be made. A curricular standard is content encompassing knowledge, skills, or attitudes. The more critical an assessment, the more developmental care should be given to it. Developmental activities are used to gather content-related evidence. First, a panel of national experts recommends the knowledge and skills that should be measured. The content is then systematically contrasted with topics from five leading texts. Next, a group of nationally recognized teachers in the assessment’s subject provides suggestions regarding key topics, and several internationally recognized college professors in the subject area offer recommendations for additions, deletions, and modifications. Finally, state and national associations in the subject area provide reviews of the proposed content to be measured.
A less formal version of content-related evidence applies when a teacher creates an end-of-unit assessment: the teacher should create an outline of the important skills and knowledge, identify the content of the curricular standards covering the instructional period, and then build the assessment from that identified content. A second form of content-related evidence for educational assessment procedures involves gathering judges to rate the content appropriateness of a given assessment as it relates to the curricular standard the assessment represents. For high-stakes assessments this process is very systematic, since it is used to evaluate student performance on a large scale. For general classroom assessments, a fellow teacher can be asked to review the assessment.
The second form of validity evidence is criterion-related evidence of validity. This form helps educators decide how much confidence can be placed in a score-based inference about a student’s status with regard to one or more curricular standards. It is used when trying to predict how well students will perform on a subsequent criterion variable. This type of evidence is typically collected by comparing scores on an aptitude test with the grades students subsequently earn. If predictor assessments work well, their results can be used to make educational decisions about students. These tests, however, are typically far from perfect, so this form of validity evidence should be used with caution.
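To make this concrete, a predictor’s usefulness is commonly summarized by the correlation between predictor scores and the later criterion. Below is a minimal Python sketch with invented aptitude scores and subsequent grades; the function name and the data are illustrative assumptions, not from Popham’s text.

```python
import statistics

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two paired score lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical aptitude-test scores and the course grades earned later.
aptitude = [52, 61, 70, 75, 83, 90]
grades   = [58, 55, 72, 80, 79, 92]
r = pearson_r(aptitude, grades)
print(round(r, 2))      # about 0.94: strong, but far from perfect
print(round(r * r, 2))  # about 0.89: share of grade variance accounted for
```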
The last type is construct-related evidence of validity, the most comprehensive of the three. Construct-related evidence of validity is the extent to which empirical evidence confirms that an inferred construct exists and that a given assessment procedure is measuring that inferred construct accurately. First, hypotheses are formed based on our understanding of how the hypothetical construct being measured works. Data are then gathered to show whether the hypothesis or hypotheses are confirmed. If the data confirm that the test is measuring what it is intended to measure, we are able to draw valid score-based inferences once students take the test and receive scores.
There are three strategies most commonly used in construct-related evidence studies. The first is an intervention study, in which it is hypothesized that after some type of intervention students will respond differently to an assessment than they did previously. The second is a differential-population study, in which, based on knowledge of the construct being measured, it is hypothesized that students representing distinctly different populations will score differently on the assessment procedure under consideration. The third is a related-measure study, in which it is hypothesized that a given kind of relationship will be present between students’ scores on the assessment device being scrutinized and their scores on a related (or even unrelated) assessment device. When these strategies show a positive relationship with related test scores, the result is known as convergent evidence. If the comparison measures are unrelated, the relationship provides discriminant evidence, and the results would be weak and not easily supported.
The most important thing to remember about validity is that it does not reside in the test itself; it is the score-based inference that is either accurate or inaccurate. It is important for teachers to have a working understanding of assessment validity. Content-related evidence is probably the most important of the three types for a teacher to have a good handle on, especially for high-stakes tests. A best practice for teachers is to have a colleague who understands the curricular standards or key topics being taught review their tests, to ensure that is what is being assessed.

Saturday, March 8, 2008

Chapter 2 – Reliability of Assessment

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - W. James Popham

Chapter 2 discusses the reliability of assessments and its relevance to the classroom teacher as well as to high-stakes exams. There are three types of reliability evidence: stability, alternate-form, and internal consistency. The standard error of measurement can also be used to describe the consistency of individual scores as they relate to the group as a whole.
Teachers need to be aware of the importance of reliability as it relates to assessments and high-stakes tests. Reliability refers to the consistency with which a test measures what it is intended to measure. Parents may want to know why a student who seems to be doing well academically does not score well on a high-stakes test; the scores themselves, what they measure, and why may also need to be explained. In many cases the curriculum taught does not align with what is assessed on high-stakes assessments, and teachers need to be able to explain to parents why this is the case.
The first form of reliability is stability reliability. This form is closely tied to the idea that consistency equals reliability. Teachers generally want the results of their tests to be consistent when administered over time. That is, if a test is given on Monday and, should something happen to the results, re-administered on Wednesday, the scores should not change much from student to student. This is also termed test-retest reliability. A correlation coefficient is computed between the two administrations; it reflects the degree of similarity between the scores on both tests. A coefficient near 1.0 indicates strong similarity and a reliable test; a coefficient near 0, or a negative one approaching -1.0, indicates weak similarity and suggests the testing conditions were not comparable. Another use of stability reliability calculations is classification consistency. If a teacher wants to determine who has grasped certain concepts and who needs further instruction, classification consistency can help establish this: students who score at or above the defined classification move on to a new topic, having mastered the previous one, while students who score below the cut, or only narrowly above or below it, should continue to receive instruction on the topic being learned. In short, stability reliability is consistency over time for a single examination.
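As a small illustration of classification consistency, the sketch below computes the fraction of students who land in the same mastery category on both administrations. The cut score, the function name, and the scores are invented for illustration; the correlation coefficient itself could be computed with the Pearson sketch shown in the Chapter 3 review above.

```python
# Illustrative sketch: how often are students classified the same way
# (mastered vs. not mastered) on two administrations of the same test?

def classification_consistency(first, second, cut):
    """Fraction of students whose mastery classification agrees across
    both administrations, given a mastery cut score."""
    same = sum((a >= cut) == (b >= cut) for a, b in zip(first, second))
    return same / len(first)

monday    = [48, 72, 85, 66, 90, 55]
wednesday = [52, 70, 88, 59, 93, 57]
print(classification_consistency(monday, wednesday, cut=60))  # 0.833: 5 of 6 agree
```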
A second form of reliability is alternate-form reliability, which asks whether two or more allegedly equivalent test forms are actually equivalent. To determine alternate-form consistency, the two forms are administered to the same individuals, ideally with little or no delay between the administrations. Once the scores from each form are in hand, a correlation coefficient is computed to reflect the relationship between students’ performances on the two forms. Alternate-form reliability deals with the consistency between two or more examinations and their equivalency.
The third form of reliability is internal consistency reliability. This form is very different from stability and alternate-form reliability in that it does not look at the consistency of student scores across administrations. Internal consistency deals with the extent to which the items on an assessment function in a consistent manner, and it can be determined from a single administration of a test, unlike the other forms, which require two or more. For multiple-choice assessments, internal consistency can be computed using the Kuder-Richardson procedures; for assessments with essay items, Cronbach’s coefficient alpha can be used to determine item consistency. The more items on an assessment, the more reliable it will tend to be.
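For the curious, here is a minimal Python sketch of Cronbach’s coefficient alpha; with items scored right/wrong (1/0), it reduces to the Kuder-Richardson 20 value. The response data are invented for illustration.

```python
import statistics

def cronbach_alpha(item_scores):
    """item_scores: one list per student, one score per item.
    For dichotomous (1/0) items this equals Kuder-Richardson 20."""
    n_items = len(item_scores[0])
    # Variance of each item across students (population variance).
    item_vars = [
        statistics.pvariance([student[i] for student in item_scores])
        for i in range(n_items)
    ]
    # Variance of students' total scores.
    total_var = statistics.pvariance([sum(student) for student in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong responses: five students, four items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))  # 0.8 for these made-up data
```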
The standard error of measurement (SEM) is the index used in educational assessment to describe the consistency of a student’s performance(s). The higher the reliability of an assessment, the smaller its standard error of measurement should be. An assessment’s standard error of measurement depends on two factors. The first is the standard deviation, which describes how spread out the scores are: the greater the spread, the greater the standard deviation. The second factor is the coefficient representing the assessment’s reliability. The larger the standard deviation, the larger the standard error of measurement; and the smaller the reliability coefficient, the larger the standard error of measurement. If the standard error of measurement is small, a student’s score more accurately reflects the student’s actual performance level.
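The two-factor relationship described here has a classic formula: the SEM equals the standard deviation times the square root of one minus the reliability coefficient. A minimal sketch with made-up numbers:

```python
import math

def standard_error_of_measurement(sd, reliability):
    """Classic formula: SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

# Invented example: a test with a standard deviation of 10 points and a
# reliability coefficient of 0.91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
print(round(sem, 2))                          # 3.0: high reliability -> small SEM
# A rough band around an observed score of 75:
print(round(75 - sem, 1), round(75 + sem, 1))  # about 72.0 to 78.0
```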
Classroom teachers need to know and understand what reliability is. It isn’t necessary for them to compute reliability for all their tests, but when they use pre-created tests or high-stakes tests, knowing the types of reliability and what they mean will better allow them to understand the assessments and how reliable they are. The three types of reliability (stability, alternate form, and internal consistency) should not be used interchangeably. The standard error of measurement supplies an indicator of consistency for an individual student’s scores, estimating person-score consistency from evidence of group-score consistency.

Saturday, March 1, 2008

Chapter 1 – Why Do Teachers Need to Know About Assessment?

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - W. James Popham


In chapter one, the author discusses the importance of teachers knowing and understanding assessment. Teachers need to be informed about assessments for a variety of reasons: they will be more effective instructionally if they know how to properly assess a student’s learning, federal laws impact the need to know about assessments and scores, and teachers need to be able to talk to parents about testing and scores as well as what the scores represent.
Testing is typically not a favorite pastime for teachers, probably because of the negative connotations around tests and testing. In many cases teachers do not understand how to test or how to write instruction-based tests; the topic is not always addressed in their formal education programs. However, since testing can greatly benefit instruction, teachers should at least be taught its importance. A teacher can greatly improve instruction with tests that are effective and measurable.
In 1965, legislation was introduced that contained significant test-based accountability provisions. Today we have the No Child Left Behind Act, signed by President George W. Bush on January 8, 2002. This act put a larger focus on each state’s accountability for test scores in reading and math. Schools must show “Adequate Yearly Progress” (AYP) in grades 3-8, meaning scores must go up from year to year. If the scores do not go up, the school is marked inadequate, and if the school fails two years consecutively, it can lose federal funding. In addition, such schools must face certain sanctions, which can prove devastating for the school and staff as well as the students. This law has caused a shift in teaching: the focus is sometimes more on the assessments and scores than on teaching and student learning.
In the past there were three kinds of tests, all pencil and paper: essay, multiple choice, and true/false. Today educators realize that paper-and-pencil tests do not measure all types of learning, and many teachers also test orally or hands-on in addition to paper and pencil. There are systematic ways to get a fix on a student’s status through formal assessment, or “testing” as it was called in the past. The author discusses assessment versus testing and notes that the two words are interchangeable. Educational assessment has accordingly been defined as “a formal attempt to determine students’ status with respect to educational variables of interest.” Students vary in how much they know about a subject, how skilled they are at performing a certain task, and how positive their feelings toward school are. Assessment is a formal way to get a fix on a student’s status in these respects.
In the past there were four commonly cited reasons to assess students. The first is to determine a student’s strengths and weaknesses: teachers should know each student’s prior accomplishments in order to put instructional energy toward weaknesses and avoid spending too much time on strengths already mastered. A teacher can do this with a pre-assessment. The second reason is to determine whether a student is making satisfactory progress. This is like a dipstick into the student’s learning, dipped every now and then to gauge progress, and it can be accomplished with formative and summative assessments. The third reason is to assign grades by collecting evidence of accomplishment. The more frequently and the more variedly a teacher assesses, the easier it is to assign grades that show evidence of gains or losses. The fourth reason has been to determine a teacher’s overall effectiveness: assessments give a teacher evidence of learning that can either back up the instructional approach or argue for a change in it.
Today the reasons to assess comprise the four traditional reasons already discussed, plus three newer ones. The first additional reason is influencing public perceptions of educational effectiveness. Because statewide educational test scores are now published and schools as well as districts are ranked, teachers must be able to explain how and why their students score and place as they do on high-stakes tests. The second newer reason is that students’ assessment performances are increasingly seen as part of the teacher evaluation process. In many districts teachers are required to assemble pre- and post-assessments that demonstrate learning as a result of their teaching, and teacher evaluations usually contain evidence of this in administrative reviews. Some teachers will point out that the make-up of a group of students can also affect learning, but having evidence from past years or other classes makes this easier to show. The third new reason to assess is clarification of instructional intentions. Assessment devices can improve instructional quality, and an assessment should never be an instructional afterthought: it should be prepared prior to instructional planning, so that the teacher understands what is expected of students and can incorporate that into the instructional activities. The better the teacher understands what to teach, the more effective the instruction can be.
Teachers must know and understand the importance of assessments. Instruction will be more effective when teachers understand why they are assessing, and students, teachers, and parents are all better able to understand students’ strengths and weaknesses. Accountability makes understanding how and why to assess essential in schools today.