Wednesday, May 21, 2008

Chapter 13 – Making Sense Out of Standardized Test Scores

Educational Assessment - Review By Brenda Roof
Classroom Assessment – What Teachers Need to Know - By W. James Popham

Chapter 13 focuses on standardized test scores and how to make sense of them. There are a variety of ways to interpret standardized tests and their scores, and the right interpretation depends on what the scores are intended to measure. Teachers need to understand how the various tests they administer are scored and assessed. By understanding how these tests are scored, teachers can inform their instruction as well as make sense of students’ performance on these tests and what the scores mean.
There are two types of standardized tests: one is designed to yield norm-referenced inferences, the other criterion-referenced inferences. These tests are administered, scored and interpreted in a standard, predetermined manner. National standardized achievement tests are typically used to provide norm-referenced interpretations focused on the measurement of aptitude or achievement. State-developed standardized achievement tests are used in many states for accountability purposes. Some states use them to assess the basic skills of high school students and to decide whether a student receives a diploma, even if curriculum requirements are met. Results are often publicized and are taken as an indication of educators’ effectiveness. State standardized tests are intended to produce criterion-referenced interpretations.
Test scores can be interpreted individually or as a group. Group-focused interpretations are necessary for looking at all of your students or at groups of students. Three ways to do this are described. The first is by computing central tendency. A measure of central tendency is an index of the center of a group’s scores, such as the mean or median. A raw score is the number of items a student answered correctly. The median raw score is the midpoint of a set of scores. The mean raw score is the average of a set of scores. The mean and median show a center point for group scores. A second useful way to describe group scores is variability. Variability is how spread out the scores are. A simple measure of the variability of a set of students’ scores is the range. The range is calculated easily by subtracting the lowest score from the highest score. A third way to look at group scores is the standard deviation. The standard deviation can be thought of as the average difference between the individual scores in a group of scores and the mean of that set of scores. The larger the standard deviation, the more spread out the scores in the distribution. The formula for standard deviation is as follows:

$$SD = \sqrt{\frac{\sum (x - M)^2}{N}}$$

$\sum (x - M)^2$ = sum of the squared differences between each raw score ($x$) and the mean ($M$)
$N$ = number of scores in the distribution


The mean and standard deviation are the best ways to discuss and describe group scores. The median and range may be easier to compute, but they are less reliable descriptions of a set of scores, so the mean and standard deviation are preferred.
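
The group measures described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `describe_scores` and the sample scores are hypothetical and do not come from the chapter. The standard deviation follows the formula given above, dividing by N.

```python
import math
import statistics

def describe_scores(raw_scores):
    """Summarize a group of raw scores (hypothetical helper, not from the book)."""
    n = len(raw_scores)
    mean = sum(raw_scores) / n                      # M: average of the scores
    median = statistics.median(raw_scores)          # midpoint of the scores
    score_range = max(raw_scores) - min(raw_scores) # highest minus lowest
    # SD: square root of the mean squared deviation from M, dividing by N
    sd = math.sqrt(sum((x - mean) ** 2 for x in raw_scores) / n)
    return {"mean": mean, "median": median, "range": score_range, "sd": sd}

# Made-up raw scores for a class of eight students
scores = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe_scores(scores)
print(summary)
```

For these made-up scores the mean is 5, the median is 4.5, the range is 7 and the standard deviation is 2.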
Individual test interpretations are also necessary. Individual test scores can be interpreted in two ways: in absolute terms or in relative terms. An absolute inference is when we infer from a score what the student has or has not mastered of the skills and/or knowledge being assessed. A relative inference is when we infer from a score how the student stacks up against other students who are currently taking the test or have already taken it. The terms below average and above average are typically used in a relative inference.
There are three interpretive schemes to consider for relative score interpretations. The first scheme is percentiles, or percentile ranks. A percentile compares a student’s score with those of other students in a norm group. The percentile indicates the percent of students in the norm group that the student outperformed. The norm group consists of the students who took the test before it was published, both to establish norms and to help identify the test items that are appropriate. There are also different types of norm groups, such as national norm groups and local norm groups. The second interpretive scheme is called grade-equivalent scores. A grade-equivalent is an indicator of student test performance based on grade levels and months of the school year. The purpose of grade-equivalent scores is to convert scores on a standardized assessment to an index score reflecting a student’s grade-level progress in school. This score is also a developmental score and is indicated as follows:
Grade.month of school year
5.6
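
The percentile rank described above, the percent of students in the norm group that a student outperformed, can be sketched as follows. The function name and the norm-group scores are made up for illustration. Published tests may also handle tied scores differently; this sketch simply counts scores strictly below the student’s score.

```python
def percentile_rank(score, norm_group_scores):
    """Percent of the norm group scoring strictly below `score` (illustrative only)."""
    below = sum(1 for s in norm_group_scores if s < score)
    return 100.0 * below / len(norm_group_scores)

# Hypothetical norm group of ten raw scores
norm_group = [12, 15, 18, 20, 22, 25, 27, 30, 33, 35]
rank = percentile_rank(27, norm_group)
print(rank)
```

Here a raw score of 27 beats six of the ten norm-group scores, so the student is at the 60th percentile of this made-up group.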

Grade-equivalent scores are typically seen in reading and math. They are determined by administering the same test to several grade levels and establishing a trend line that reflects the raw score increases at each grade level. Estimates at points along the trend line then indicate what the grade-equivalent of a given raw score would be. This scoring scheme rests on many assumptions, which makes it rather questionable. It can also mislead parents about what a score really means. The appropriate interpretation is that a grade-equivalent score is an estimate of how the average student at a certain grade might score on the test. The third scheme is scale score interpretations. Scale scores are converted raw scores that use a new, arbitrarily chosen scale to represent levels of achievement or ability. The most popular scale score systems are based on item response theory (IRT). These differ from raw score reporting systems in that IRT scales take into consideration the difficulty and other technical properties of every single item on the test. There is a different average scale score for each grade level. Scale scores are used heavily to describe group test performance at the state, district, and school levels. They permit longitudinal tracking of students’ progress and direct comparisons of classes, schools and districts. It is very important to remember that not all scale scores are alike, so scores from exams that use different scales cannot be compared directly. Standardized tests also use the normal curve equivalent, or NCE, which attempts to use students’ raw scores to arrive at a percentile for a raw score. If the students’ scores were perfectly symmetrical, a bell curve would be formed; however, sometimes the normal curve does not form and the NCE concept evaporates.
Therefore, NCEs are not a solution for comparing different standardized tests. A stanine is like an NCE, but it divides a score distribution into nine segments that, though equal along the baseline of a set of scores, contain different proportions of the distribution’s scores. Stanines are approximate scale scores.
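
To make stanines and NCEs more concrete, here is a rough sketch that converts a percentile rank into each. The details are assumptions that the chapter review does not state: the conventional stanine split of 4, 7, 12, 17, 20, 17, 12, 7 and 4 percent of scores, the usual NCE definition of 50 + 21.06 × z, and the function names themselves. Both conversions assume normally distributed scores, which is exactly the assumption that sometimes fails.

```python
from bisect import bisect_right
from statistics import NormalDist

# Cumulative percent of scores at the top of stanines 1 through 8,
# assuming the conventional 4-7-12-17-20-17-12-7-4 split
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def stanine_from_percentile(percentile):
    """Map a percentile rank (0-100) to a stanine from 1 to 9."""
    return bisect_right(STANINE_CUTS, percentile) + 1

def nce_from_percentile(percentile):
    """Map a percentile rank (strictly between 0 and 100) to an NCE.

    Assumes a normal distribution: find the z-score for the percentile,
    then rescale so that percentiles 1, 50 and 99 land near NCEs 1, 50 and 99.
    """
    z = NormalDist().inv_cdf(percentile / 100.0)
    return 50.0 + 21.06 * z

print(stanine_from_percentile(50))  # middle segment of the nine
print(nce_from_percentile(50))
```

A student at the 50th percentile falls in stanine 5 and has an NCE of 50. Note how the middle stanine holds 20 percent of the scores while the two end stanines hold only 4 percent each.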
There are two tests used to predict a high school student’s academic success in college. The first test is known as the SAT, or Scholastic Aptitude Test. Its function was originally to assist admissions officials in a group of elite Northeastern universities in deciding whom to admit. The test was designed to compare inherited verbal, quantitative and spatial aptitudes. Today, however, it is divided into three sections: critical reading, writing and mathematics. The SAT uses a score range from 200 to 800 for each of the three sections. As of 2005, the highest score that can be earned is 2400, which is the total from all three sections. The test takes about three hours to administer. There is also a PSAT to help students prepare for the SAT. The second test is the ACT. The ACT is also an aptitude test. Unlike the SAT, the ACT, or American College Test, was created as a measure of a student’s educational development for the soldiers who were taking advantage of the GI Bill money being awarded to them for college. The SAT was sometimes too difficult or inappropriate for these students, so a new measure was needed. The ACT addresses four content areas: English, Mathematics, Reading, and Science. Like the SAT, the ACT takes about three hours to administer. The ACT is scored by giving 1 point for every correct answer, with no points subtracted for wrong answers. The four section scores are then averaged, unlike the SAT, where the section scores are added together. One very important aspect of the SAT and ACT is that only 25% of academic success in college is associated with a high school student’s performance on these tests. The other 75% has to do with non-test factors. Therefore, students who do not do well on these tests should not be discouraged from attending college.
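
The difference between adding and averaging section scores can be shown with a small sketch. The function names and sample scores are hypothetical. The sketch also assumes that the ACT composite is rounded to the nearest whole number with halves rounded up, a detail the review does not state.

```python
import math

def sat_total(critical_reading, writing, mathematics):
    """SAT total: the three section scores (200-800 each) added together."""
    sections = (critical_reading, writing, mathematics)
    for section_score in sections:
        if not 200 <= section_score <= 800:
            raise ValueError("each SAT section score must be between 200 and 800")
    return sum(sections)

def act_composite(english, mathematics, reading, science):
    """ACT composite: the average of the four section scores.

    Rounding halves up is an assumption made for this sketch.
    """
    mean = (english + mathematics + reading + science) / 4
    return math.floor(mean + 0.5)

print(sat_total(800, 800, 800))       # the maximum total of 2400
print(act_composite(24, 26, 25, 27))  # average of 25.5, rounded up
```

Three perfect SAT sections add up to the 2400 maximum mentioned above, while the four sample ACT section scores average out to a single composite score.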
Standardized tests are assessment instruments that are administered, scored and interpreted in a standard, predetermined manner. They are used to get a handle on students’ achievement and aptitude. Group test scores are described in two ways: by central tendency and by variability. Central tendency uses the mean and median, and variability uses the range and standard deviation, to describe what a set of scores means. The chapter also explains the strengths and weaknesses of interpreting results by percentiles, grade-equivalent scores, scale scores, stanines and normal curve equivalents. Finally, it describes and explains the SAT and ACT.
