Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - By W. James Popham
Educational Assessment – Mentor: Lorraine Lander
Final Project: Aligning Classroom Assessments to Content Standards
Assessments in the classroom today should have one main focus: they should be aligned with the content standards or curricular aims that students are intended to learn. Content standards, or curricular aims, are the knowledge or skills teachers and educators want students to learn. When teachers plan for instruction, they need to mindfully plan their assessments prior to instruction in order to properly align the curricular aims with the instruction. This allows instruction to be adjusted when curricular aims are not being mastered. The assessments also need to be valid and reliable so that the inferences made about students can be based on the results or outcomes of the assessments. Popham suggests that using various types of assessments will help teachers get a sampling of the knowledge or skills students have acquired.
One very important assessment used in New York State is the 4th Grade NYS Math Assessment. It is one of many grade-level exams used to comply with the No Child Left Behind legislation. Popham refers to this type of test as a high-stakes exam. The 4th Grade NYS Math Assessment is a standards-based assessment that, according to Popham’s definition, would be considered instructionally sensitive. It attempts to measure a manageable number of curricular aims. It also attempts to clearly describe the skills and/or knowledge being assessed, and it reports students’ performance in a way that benefits teachers and students instructionally, which according to Popham is essential. Popham would also agree that an instructionally sensitive assessment is a way to measure a teacher’s instructional effectiveness. Since the exam is not created in the classroom, however, it may need to be carefully evaluated for assessment bias as well as item bias. Many districts in New York State are beginning to take advantage of the instructional information the assessment reports. To align and inform their instruction, a group of teachers in New York State is using benchmark testing three times a year, along with unit assessments and other formative assessment measures, to get a handle on what their students are really learning.
The process being used by a group of New York State teachers is described as follows. The first step for the teachers was to align their instruction to the assessment they are creating. They start from the grade level currently being taught; for the purpose of this paper I will use Grade 5 as the base grade. The teachers use the scores as well as the Performance Indicators from the March 4th Grade NYS Assessment as their base assessment. If a new or untested student arrives in the fall, they use only the score from the September 5th Grade Benchmark exam for alignment purposes. A Fall Benchmark Assessment is then developed. To develop this benchmark assessment, the teachers gather the performance indicators from the NYS 4th Grade Assessment. Then, with the help of the fourth grade teachers, the current fifth grade teachers look at the key curricular aims in 4th grade as well as 5th grade. The key curricular aims that these students should have learned and mastered in 4th grade are taken from the NYS 4th Grade Assessment. The teachers then align these curricular aims with the aims expected in 5th grade to see where there are matching or similar curricular aims. Popham also recommends this type of alignment, but stresses not using too many content standards when trying to get a handle on content mastery. The teachers lay the curriculum maps of the two grade levels side by side and then select the curricular aims that appear across the grade levels. A form is used to collect this information and simplify the process. Below is a sample of this form.
Sample Form:
Targeted Grade Level: _________
Projected Benchmark Testing Dates: __________________________________
Test Format: __________________________________
Targeted Content Strand (Curricular Aim): ___Measurement____________________________
Performance Indicator: 4M1 – Select tools and units (customary and metric) appropriate for the length measured.
Performance Indicator: 5M1 – Use a ruler to measure to the nearest inch, ½, ¼, and ⅛ inch.
Sample Test Question: Measure the following line:
________________________
Performance Indicator: 6M1 – Measure capacity and calculate volume of a rectangular prism.
Bands within the Content Strands:
Measurement: Units of measurement
Tools and Methods
Units
Error and Magnitude
Estimation
Once they have decided on the appropriate curricular aims, the teachers begin to look at test questions that are aligned with the aims they have chosen. To do this they draw on the bank of questions from previous tests provided by NYS, or on software available to them that is aligned with the chosen curricular aims. This practice also aligns with sound assessment practices recommended by Popham. Because the teachers are pulling questions from multiple sources, they are practicing generalized test-taking preparation. This prepares students for many different types of test items, in keeping with the test preparation guidelines explained by Popham. Popham also notes that the more questions used, the better you are able to gauge students’ mastery of skills and knowledge. There are some key questions that the teachers use to ensure that they are designing this Benchmark Assessment accordingly. These key questions also align with many of Popham’s recommendations for classroom assessment creation.
1. What will be the format for your test questions? Popham recommends keeping test items in sequence and size aligned. He also suggests not having problems run onto another page or causing pages to be flipped when taking an assessment.
2. For each curricular aim selected, how many questions will we have? Popham would suggest an even distribution, for example two questions per curricular aim.
3. How many total questions will be on the test? Popham would recommend 50 questions or more for reliable results.
4. How many of each type of question will be included? Again, an even distribution is recommended by Popham.
5. How will we ensure that there is a range in the level of difficulty for the questions selected for each curricular aim? An item difficulty index might be used to determine this.
6. Will there be a time limit? If so, what will the time limit be?
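The item difficulty index mentioned in question 5 can be illustrated with a small sketch. It assumes the usual definition of the index as the proportion of students who answer an item correctly; the function name and the scores are hypothetical and are not data from the teachers' benchmark.

```python
def item_difficulty(responses):
    """Return, for each item, the proportion of students who answered it correctly.

    responses: one list per student, with 1 for a correct answer and 0 for a miss.
    """
    n_students = len(responses)
    n_items = len(responses[0])
    return [
        sum(student[i] for student in responses) / n_students
        for i in range(n_items)
    ]


# Hypothetical scores for four students on three items (illustration only).
scores = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [1, 0, 0],
]

p_values = item_difficulty(scores)
print(p_values)  # one value per item; higher means more students got it right
```

A value near 1.0 suggests an easy item and a value near 0 a hard one, so a set of items for one curricular aim with a spread of values would give the range of difficulty the teachers are asking about.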
For this assessment the teachers will incorporate multiple choice as well as extended response or constructed response questions. The NYS Math exam uses three question formats: Multiple Choice, Short Response and Extended Response. Generalized mastery should be promoted with whatever is being taught; therefore, varied assessment practices should also be employed whenever possible (Popham, 2008). Performance assessments and portfolio assessments, as well as affective assessments, can also be used throughout the year to assess where students are in relation to the curricular aims being assessed. By using these strategies the teachers are promoting educational defensibility as well as professional ethics. This practice is also teaching to the curricular aims represented by the assessment. While the test is important, it is not the single defining piece of learning for the teacher. The curricular aims addressed are also important to the grade-level curriculum; that is how the teachers are demonstrating educational defensibility. By using older exam questions aligned to the curricular aims they intend to measure, they are also ensuring professional ethics. If the teachers incorporate some of the other assessment techniques throughout the year, they will be using general assessment practices and preparing students for a variety of testing formats. Since the 4th Grade NYS Assessment is largely composed of Number Sense and Operations questions, the teachers have chosen many of the questions from that standard. A couple of questions from the Algebra and Statistics and Probability standards will also be included, but the fall instruction for 5th grade is based largely on the Number Sense and Operations curriculum. The test will have a total of 20 questions: 16 multiple choice and 4 constructed response. The students will be given an hour to complete the assessment.
To make the assessment more reliable, the teachers might have chosen more questions and used a couple of questions for each curricular aim being assessed.
Some final considerations suggested by the teachers before administering the assessment are described next. The teachers recommend the use of a cover page. The directions for each section of the test should be clearly written in understandable wording, a practice also outlined by Popham in his five general item-writing commandments (Popham, 2008). The teachers suggest that if you are modeling your assessment after the NYS Math assessment, the directions should be modeled after it as well. Popham would suggest a more generalized modeling in order to prepare students for the various test items they may encounter. Bubble sheets should be used for the multiple choice questions. Again, clearly labeled sheets and directions on how to bubble the answer sheet, as well as how to correct mis-bubbles or mistakes, are very important. Determine ahead of time the number of copies of the assessment and answer sheets needed (if this is a grade-wide assessment, determine who will be in charge of copying). Testing modifications and accommodations for Special Education students should be used. This practice ensures that these students are assessed in the same way that they are instructed, so that valid inferences can be made and assessment bias is minimized. Set the date and time that the grade level or class will be given the exam, so that all students take the same exam at the same time and test reactivity does not occur. The scoring scale and rubric should be established well before the assessment is administered and before instruction occurs. Having students participate in the rubric building is also recommended. Grading sheets should be prepared. Who will score the test, and when, should be determined ahead of time, as should how the data will be analyzed and reported.
The fifth grade students are then given the assessment in early September. The multiple choice portion is graded separately from the constructed response portion. A group of teachers meets to grade the constructed response portion during a mutual meeting time. A grid is created in Excel for all portions of the test. Down the left side of the grid are the students’ names; along the top are the curricular aims addressed and the corresponding test question numbers. Along the bottom of the grid is a tally of the number of students who missed each question. Each question that is missed gets a check under the question and curricular aim it corresponds to. All the items are tallied at the bottom, and off to the left side a total-correct tally is kept for each part of the assessment, along with a total score tally area. (See attached example for clarification purposes.)
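The teachers build this grid in Excel, but the same tallies can be sketched in a few lines of Python. This is only an illustration of the bookkeeping described above: the student names, answer key and answers are made up, and of the two curricular aim labels only 5M1 appears in the sample form (5N1 is a hypothetical placeholder).

```python
from collections import Counter


def analyze(answers, key, aims):
    """Tally misses per question and per curricular aim, plus each student's total correct."""
    missed_by_item = Counter()
    missed_by_aim = Counter()
    total_correct = {}
    for student, given in answers.items():
        correct = 0
        for number, (got, want) in enumerate(zip(given, key), start=1):
            if got == want:
                correct += 1
            else:
                # A "check" in the grid: this student missed this question.
                missed_by_item[number] += 1
                missed_by_aim[aims[number - 1]] += 1
        total_correct[student] = correct
    return missed_by_item, missed_by_aim, total_correct


# Hypothetical four-question multiple choice section (illustration only).
key = ["A", "C", "B", "D"]
aims = ["5N1", "5N1", "5M1", "5M1"]  # curricular aim for questions 1 to 4
answers = {
    "Student 1": ["A", "C", "B", "A"],
    "Student 2": ["A", "B", "C", "A"],
    "Student 3": ["A", "C", "B", "D"],
}

missed_by_item, missed_by_aim, total_correct = analyze(answers, key, aims)
for number in range(1, len(key) + 1):
    print(f"Question {number} ({aims[number - 1]}): missed by {missed_by_item[number]}")
print(dict(missed_by_aim))
print(total_correct)
```

The per-question counts correspond to the bottom row of the grid, and the per-student totals to the tally kept off to the side.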
The items that the most students missed are considered not to have been mastered by those students. Since there are a number of items the group as a whole is struggling with, the teacher will use these items as key targets in upcoming instruction. Looking at this analysis grid, it might seem that these items should be thrown out. However, since this assessment is being used to guide instruction and the teachers are using it to see what was and was not learned, they are valid questions. On the December benchmark these items would hopefully be less problematic and instead show the learning that has occurred. The same can be said for the items students scored well on. In some assessment situations these might be items to throw out; in this case, however, they are items that have been mastered and show learning that has occurred. The teachers in this group might want to create groups to target instruction for students in their areas of weakness and use formative assessments to show mastery along the way. For the students who performed well, newer curricular aims can be used to begin instruction, as they have shown mastery and are ready to move forward. A pre-assessment should be created to administer to them and then guide that instruction.
In December these teachers are going to create another assessment. This time the assessment will contain problems from the areas that were not mastered by the majority on the September assessment, as well as new curricular aims the students have since been introduced to. The results will show whether mastery has occurred for that learning and which areas need to be targeted for instruction. The teachers use grouping models for the math students and will also re-group students according to areas of mastery and weakness.
The process described above is very thorough and seems to cover the areas suggested by the author for creating assessments that are not only valid but also reliable. I was especially encouraged that these teachers get together over the summer and during the year to re-evaluate their assessments and continually inform their instruction. This is a practice that I feel all teachers should strive for and be encouraged to participate in. Many of the practices Popham describes for sound assessment building and administration were followed by these teachers, and in areas where they were not, the fact that the teachers were trying suggests that over time they will align more tightly with his recommendations. It was very exciting to attend this session and hear about the thoughtfulness that is occurring in testing practices within New York State. While some teachers may not agree with the approach these teachers used, they should strive for more collaboration and sharing. The promotion of shared learning and teaching will only benefit student learning and mastery. The forms and processes described by these teachers aligned very nicely with the text used for this study and were very beneficial in pulling it all together.
Friday, June 20, 2008
Sunday, June 8, 2008
Chapter 15 – Appropriately Evaluating Teaching and Grading Students
Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - By W. James Popham
Chapter 15 addresses evaluating teaching and grading students. These two topics, while sometimes used interchangeably, are separate functions. Evaluation is an activity focused on determining the effectiveness of teachers. Grading is an activity focused on informing students how they are performing. Pre-instruction and post-instruction assessment practices are discussed, as well as the split-and-switch design for informing instruction. The use of standardized achievement tests for evaluating students and instruction is also weighed. Three schemes for grading are described, along with a more commonly used practice.
There are two types of evaluation used in appraising the instructional efforts of teachers. The first is formative evaluation, the appraisal of the teacher’s instructional program for the purpose of improving the program. The second is summative evaluation. Summative evaluation is not improvement focused; it is an appraisal of a teacher’s competencies used to make more permanent decisions about that teacher. These decisions are typically about continuation of employment or awarding of tenure. Summative evaluation is usually made by an administrator or supervisor. Teachers will typically do their own formative evaluation in order to better their own instruction. Summative data may be supplied to administrators or supervisors to show effectiveness of teaching.
Instructional impact can be gauged by pre-instruction and post-instruction assessment. Assessing students prior to instruction is pre-assessment; assessing them after instruction has occurred is post-assessment, which gives an indication of the learning that has occurred. This scenario, however, can be reactive. Reactivity occurs when the pre-assessment sensitizes students to what needs to be learned, and they then perform well on the post-assessment as a result. An alternative that avoids this problem is the split-and-switch design. This data-gathering design works better with large groups of students than with small ones. In this model you split your class in half and administer one of two similar tests to each half. Mark the tests as pre-tests, instruct the group, and then switch the tests between the halves for the post-test. Blind scoring should then occur. Blind scoring is when someone else, such as another teacher, a parent or an assistant, grades the tests without knowing which are pre-tests and which are post-tests. The results are then pulled together for each test, so that pre-test and post-test results on the same test are compared. Differential difficulty causes no problem in this design, and because students will not have previously seen their post-test, reactive impact is not a concern. As a result, instructional impact should be seen.
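The comparison at the heart of the split-and-switch design can be sketched as follows. This is a minimal illustration, assuming the two tests are called Form A and Form B and that the gain on each form is summarized as a difference in mean scores; the function name and all scores are hypothetical.

```python
from statistics import mean


def split_and_switch_gains(half1_pre_a, half2_pre_b, half1_post_b, half2_post_a):
    """Compare pre-test and post-test means on the SAME form.

    Half 1 takes Form A before instruction and Form B after it;
    half 2 takes Form B before instruction and Form A after it.
    """
    return {
        "Form A": mean(half2_post_a) - mean(half1_pre_a),
        "Form B": mean(half1_post_b) - mean(half2_pre_b),
    }


# Hypothetical blind-scored results for two halves of three students each.
gains = split_and_switch_gains(
    half1_pre_a=[4, 5, 6],
    half2_pre_b=[3, 5, 4],
    half1_post_b=[7, 8, 9],
    half2_post_a=[8, 9, 10],
)
print(gains)
```

Because each form is compared only with itself, one form being harder than the other does not distort the gain, and no student sees the same form twice.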
A common way of evaluating teaching has been through the performance of students on standardized achievement tests. For most achievement tests, this is a very inappropriate way to evaluate instructional quality. A standardized test is any exam administered and scored in a predetermined, standard manner. There are two major forms of standardized tests: aptitude tests and achievement tests. A school’s effectiveness is typically judged by standardized achievement tests. There are three types of standardized achievement tests. The first is the traditional national standardized achievement test. The second is a standards-based achievement test that is instructionally insensitive, and the third is a standards-based achievement test that is instructionally sensitive.
The purpose of nationally standardized achievement tests is to allow valid inferences to be made about the knowledge and skills a student possesses in a certain content area. These inferences are then compared with a norm group of students of the same age and grade. The dilemma is that there is so much that would need to be tested that only a small sampling is possible. A further consequence is the assumption that the norm group is a genuine representation of the nation at large. These tests should therefore not be used to evaluate the quality of education; that is not their purpose. There is a likelihood that the tests are not aligned rigorously with a state’s curricular aims. Items covering important content emphasized by the classroom teacher may be eliminated in a quest for score spread. The final reason nationally standardized achievement tests should not be used to evaluate teachers’ success is that many items are linked to students’ socioeconomic status (SES) or their inherited academic aptitude. In essence, they measure what students bring to school, not what they learn at school.
Standards-based tests sound like they would make much more sense. Two problems that have occurred with instructionally insensitive standards-based tests are the large number of content standards needing to be addressed and reporting of results that has limited instructional value. If properly designed, however, standards-based tests can be instructionally sensitive. Three attributes must be present for a standards-based test to be instructionally sensitive: the test must measure a manageable number of curricular aims; the skills and/or bodies of knowledge must be clearly described so that what students are to master is very clear; and the test results must allow clear identification of each assessed skill or body of knowledge mastered by a student. A standards-based test not possessing all three of these attributes is not instructionally sensitive and is therefore useless for evaluating instruction. Instructionally sensitive standards-based tests are the right kind of tests to use to evaluate schools.
Teachers also need to inform students of how well they are doing and how well they have done. Grades are a demonstration of what students have learned and the extent of their achievement. Serious thought should be given to identifying the factors to consider when grading and how much those factors will count. There are three common grade-giving approaches. The first is absolute grading. In this model a grade is given based on the teacher’s idea of what level of student performance is necessary to earn each grade. This method is similar to the criterion-referenced approach to assessment. The second is relative grading, in which a grade is based on how students perform in relation to one another. This type of grading requires flexibility from class to class because the make-up of each class changes. It is close to the norm-referenced approach. The third option is aptitude-based grading, in which a grade is assigned to each student based on how well the student performs in relation to that student’s potential. This form of grading tends to “level the playing field” by grading according to ability and encouraging full potential. Given these three options, researchers have found that teachers really use a more “hodgepodge” form of grading, based loosely on a judgment of students’ assessed achievement, effort, attitude, in-class conduct, and growth. The result of this type of grading is that low performance in any of these areas produces a low grade for a student. There are no scientific, quantitative models for clear-cut grades using the “hodgepodge” method. It is purely judgmental on most levels but is widely used and accepted by teachers and students.
The final chapter has described the distinction between evaluating and grading: evaluating the quality of teachers’ instruction and grading students. Also discussed was the inappropriateness of using national standardized achievement tests to evaluate teachers. The difference between instructionally insensitive and instructionally sensitive standards-based achievement tests was shown. Grading was then discussed, along with the importance of developing criteria and weighting for grades ahead of actual grade dispensing. Three grading options were described, and the reality of “hodgepodge” grading was presented.
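The difference between absolute and relative grading is easiest to see when the same scores are graded both ways. The sketch below is only an illustration: the cutoffs, the three-band ranking rule and the student names are assumptions made for the example, not a scheme taken from Popham.

```python
def absolute_grade(score, cutoffs):
    """Absolute grading: compare a score with fixed cutoffs set by the teacher."""
    for grade, minimum in cutoffs:
        if score >= minimum:
            return grade
    return "F"


def relative_grades(scores, bands=("A", "B", "C")):
    """Relative grading: rank students against one another and split the ranking into equal bands."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    return {
        name: bands[min(i * len(bands) // n, len(bands) - 1)]
        for i, name in enumerate(ranked)
    }


# Hypothetical cutoffs and scores (illustration only).
cutoffs = [("A", 90), ("B", 80), ("C", 70), ("D", 60)]
scores = {"Ana": 95, "Ben": 91, "Cal": 90}

absolute = {name: absolute_grade(score, cutoffs) for name, score in scores.items()}
relative = relative_grades(scores)
print(absolute)
print(relative)
```

With these made-up numbers every student meets the fixed cutoff for the top grade, while ranking the same students against one another spreads them across three grades, which is why relative grading shifts with the make-up of each class.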