Understanding Assessment Data

The whole point of special education assessment is to identify what is preventing a child from accessing his/her educational opportunities and to provide the data necessary to figure out how to overcome whatever is getting in the way. Over the years, I’ve seen all kinds of things happen with assessments.

 

I’ve seen very good assessments performed by both education agency personnel and outside assessors. I’ve also seen horrible assessments performed by both education agency personnel and outside assessors. 

 

I’ve seen assessments that yielded valid data to the degree that assessment was done, but that, overall, were substantively lacking because assessment was not conducted in all areas of need. I’ve also seen assessments that were outright fabrications, which just makes me ill because, regardless of the assessors’ motivations to misrepresent the facts, no child is served when inaccurate data is presented as though it legitimately documents the child’s needs and how they can be met.

 

There are two varieties of assessments that are most commonly used to collect data on children for the purposes of special education: criterion-referenced and norm-referenced assessments. Each has unique aspects that bring value to the assessment process.

 

Criterion-referenced tests ask whether a child can perform a specific task. For example, can the child properly punctuate and capitalize a sentence, yes or no? Criterion-referenced tests are simply looking at whether or not a child can produce a specific outcome.

 

Norm-referenced tests are more complex. Norm-referenced tests that measure cognition (IQ), academic achievement, visual-motor processing, vocabulary development, etc. all have in common the way their scores are presented and interpreted.

 

When a norm-referenced test is developed, the producers of the test recruit thousands of people from all walks of life who collectively represent the population to take their test. These recruits are collectively referred to as the “normed group” or “sample,” and it’s their job to establish what constitutes “average” on the test.

 

By having normed groups at each age and/or grade level take the test first and establish what is “normal” for a person of that age or grade to score on the test, the scores of people taking the assessment in the real world can be compared against something meaningful: what should be expected at a particular age or grade.

 

The scores on norm-referenced tests are mathematically designed so that even though different tests measure different things, the scores can be compared to each other. This is referred to as “standardizing” the scores, which is also why these tests are referred to as standardized assessments.

 

In order to standardize the scores from many different kinds of tests, the producers of the tests rely on a statistical tool called the normal distribution, which makes the scores comparable. Graphically, this is also known as a bell curve. I don’t want to make this overly technical for folks with little to no background in statistics, but I don’t want to over-simplify this so much that I’m not really telling you anything either, so I’m going to attempt to strike a balance here.

 

One way to standardize the scores is to use what is called, appropriately enough, Standard Scores. By converting the raw scores of a given assessment into Standard Scores, they can be compared against the Standard Scores of another type of assessment. This is very commonly done when looking at whether a child has a specific learning disability using the discrepancy model, which calls for a significant gap between academic achievement and cognitive ability on standardized tests that measure each.
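For readers who like to see the mechanics, here is a minimal sketch in Python of the idea behind Standard Scores. It is illustrative only: the mean of 100 and standard deviation of 15 are the scale most IQ and achievement composites use, but real tests convert raw scores through published norms tables rather than a one-line formula, and the numbers below are made up.

```python
def to_standard_score(raw_score, norm_mean, norm_sd):
    """Convert a raw score to a standard score using the norming sample's
    mean and standard deviation for the child's age or grade."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the average, in SDs
    return round(100 + 15 * z)             # rescale to a mean of 100 and SD of 15

# Hypothetical numbers: a raw score of 22 on a test where same-age peers
# averaged 30 with an SD of 6 converts to a standard score of 80.
print(to_standard_score(22, norm_mean=30, norm_sd=6))  # -> 80
```

Because every test is rescaled onto the same metric, a standard score of 80 means the same thing (about 1.3 standard deviations below average) no matter which test produced it, which is what makes the comparisons below possible.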

 

Let’s say that a student takes a standardized IQ test such as the WISC-IV and achieves the following scores:

 

Verbal Comprehension = 101

Perceptual Reasoning = 104

Working Memory = 99

Processing Speed = 96

—————————————-

Full Scale IQ = 100

 

Now, let’s say that the same student takes a standardized measure of academic achievement such as the WJ-III and, on the portions pertaining to reading, achieves the following scores:

 

Letter-Word Identification = 78

Reading Fluency = 70

Reading Comprehension = 72

 

First, given that the subtest scores on the WISC-IV are so close together, the Full Scale IQ can be presumed to be sufficiently representative of the student’s cognitive abilities. An IQ score of 100 is dead-center average; in other words, it’s perfectly normal.

 

Reading scores in the 70s, however, are not normal for someone with perfectly normal intelligence. If you look at your PDF handout that you should have already downloaded, you will see that there is a big gap between achievement (the WJ-III scores) and ability (the Full Scale IQ).
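For readers who want to see the arithmetic behind that gap, here is a rough sketch using the scores above. What counts as a “significant” discrepancy varies by state and by evaluation team, so the sketch simply reports the size of each gap in points and in standard deviations rather than assuming any particular cutoff.

```python
FULL_SCALE_IQ = 100  # ability score from the WISC-IV example above

reading_scores = {   # achievement scores from the WJ-III example above
    "Letter-Word Identification": 78,
    "Reading Fluency": 70,
    "Reading Comprehension": 72,
}

for subtest, score in reading_scores.items():
    gap = FULL_SCALE_IQ - score
    # 15 points = one standard deviation on this scale
    print(f"{subtest}: gap of {gap} points ({gap / 15:.1f} standard deviations)")
```

In this example, achievement lags ability by roughly one and a half to two standard deviations on every reading measure.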

This is one way to use standardized scores to understand the picture painted by the data. Another way, which many parents find a lot easier to understand, is Percentile Rankings. Percentile Rankings present the scores as comparisons against the normed sample; that is, they compare the child’s scores against the general population.

Because of the nature of normal distributions, reliance on Percentile Rankings is more appropriate with scores that are fairly close to the mean, whereas Standard Scores are more reliable at the extremes of the distribution. For this reason, when you look at the PDF handout we’ve provided, you’ll notice that the scores don’t all exactly align from one illustration to the next.
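Here is a minimal sketch of how that conversion works, assuming the familiar mean-100/SD-15 scale and a perfect bell curve. Published tests derive percentiles from their own norms tables, which is part of why the numbers in a real report (or in the handout) can differ by a point or two from this back-of-the-envelope approximation.

```python
from math import erf, sqrt

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Approximate percentage of the norming sample scoring below this score."""
    z = (standard_score - mean) / sd
    return 100 * 0.5 * (1 + erf(z / sqrt(2)))  # normal cumulative distribution

# The Standard Scores from the example above
for score in (101, 104, 99, 96, 78, 70, 72):
    print(f"Standard score {score}: approximate percentile rank {percentile_rank(score):.0f}")
```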

 

Taking our Standard Scores from before and converting them to Percentile Rankings results in the following:

 

Verbal Comprehension = 52nd %ile

Perceptual Reasoning = 62nd %ile

Working Memory = 48th %ile

Processing Speed = 38th %ile

—————————————-

Full Scale IQ = 50th %ile

 

Letter-Word Identification = 7th %ile

Reading Fluency = 2nd %ile

Reading Comprehension = 3rd %ile

 

You need to understand that Percentile Rankings are not the same thing as percentages of correct answers. For example, if your child got 50% on his/her science test, you’d know that was an “F” grade. With Percentile Rankings, 50th Percentile (or “50th %ile”) means that half of your child’s same age or grade peers (depending on which comparison is being made) scored beneath your child and the other half scored above your child. The very lowest scores fall at the 1st percentile and the very highest at the 99th percentile.

 

If you plotted the scores of all the people who took the test, most of those scores would cluster around the 50th percentile. A relatively small number of people gifted in whatever area the test measures will score at the high end of the distribution while another relatively small number of people will score at the low end. This accounts for the hump in the middle of the distribution and the skinny ends at the extremes. Most people’s scores will fall within the tall hump.
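If you want to check that claim about the hump, the same bell-curve approximation shows how much of the norming sample lands within one standard deviation of the mean (standard scores of roughly 85 to 115).

```python
from math import erf, sqrt

def cdf(standard_score, mean=100.0, sd=15.0):
    """Proportion of the norming sample expected to score at or below this score."""
    return 0.5 * (1 + erf((standard_score - mean) / (sd * sqrt(2))))

within_one_sd = cdf(115) - cdf(85)
print(f"Roughly {within_one_sd:.0%} of scores fall between 85 and 115")  # about 68%
```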

 

By looking at the Percentile Rankings, parents can know, for example, that if their child scored at the 62nd %ile in Perceptual Reasoning, then he/she outscored 62% of his/her peers and was only outscored by 38% of his/her peers.

 

If, however, the same child scored at the 3rd %ile in Reading Comprehension, that means that 97% of his/her peers outscored him/her on this measure and only 3% of his/her peers scored lower. This puts things into perspective.

 

If a child is outperforming 62% of his/her peers in Perceptual Reasoning, but is being outperformed by 97% of his/her peers on Reading Comprehension, there’s a problem. Clearly something is interfering with this child’s reading that can’t be accounted for by low cognition. This is when you begin to dig for processing disorders that might be responsible for a learning disability.

 

All too often, parents go into IEP meetings where report data is presented and it all flies right over their heads. Unfortunately, some school agency personnel take advantage of that fact and will either skimp on their actual data collection or misrepresent what the data means. They may present only broad cluster scores, which are just averages of the subtest scores, without presenting the subtest scores themselves. This is dangerous because if there is a lot of scatter among the subtest scores – that is, they aren’t all close in number and you have a wide variety of scores falling along the distribution – then the averages represented in the clusters don’t tell you anything.

 

For example, instead of the scores represented above on the WISC-IV, let’s say you have a student with the following scores:

 

Verbal Comprehension = 120 (PR = 91st %ile)

Perceptual Reasoning = 136 (PR = 99th %ile)

Working Memory = 93 (PR = 31st %ile)

Processing Speed = 82 (PR = 12th %ile)

—————————————-

Full Scale IQ = 108 (PR = 69th %ile)

You can see here that because of the diversity of the subtest scores, the overall average of the Full Scale IQ doesn’t really paint a clear picture of what is going on with this person. An IQ score of 108 is still a fairly middle-of-the-road average score. But this is a person who is scoring in the above-average to superior range when it comes to Verbal Comprehension and Perceptual Reasoning, yet well below average when it comes to Processing Speed.

 

Because of the disparity among the subtest scores, the Full Scale IQ is not considered to be reliably representative of the student’s intellectual abilities and the subtest scores have to be looked at individually. I’ve seen WISC-IV subtest scatter like this with kids who have learning disabilities and/or ADHD. 
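One simple way to express that scatter check is to compare the spread between the highest and lowest index scores against a rule of thumb. The 23-point limit below (about one and a half standard deviations) is one guideline sometimes cited for WISC-IV interpretation; treat it as an assumption here, since evaluators differ on where to draw the line.

```python
index_scores = {
    "Verbal Comprehension": 120,
    "Perceptual Reasoning": 136,
    "Working Memory": 93,
    "Processing Speed": 82,
}

SCATTER_LIMIT = 23  # assumed rule of thumb: about 1.5 SDs on a mean-100/SD-15 scale

spread = max(index_scores.values()) - min(index_scores.values())
if spread > SCATTER_LIMIT:
    print(f"Spread of {spread} points: interpret the index scores individually, "
          "not the Full Scale IQ.")
else:
    print(f"Spread of {spread} points: the composite is a reasonable summary.")
```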

 

The topic of assessment scores and data interpretation is extremely complex and multi-faceted. People get Master’s Degrees in school psychology just to be able to make sense of it all. There’s no way I can hit all the things you need to know in a blog posting.

 

But, understanding the scores well enough to read the assessment reports with any comprehension is critical for parents and educators alike. It’s been my unfortunate experience that some assessors either don’t understand their own data or even how to properly administer the assessments in the first place, which results in inaccurate data. I’ve seen reports in which the scores actually contradict the positions asserted by the reports’ authors. I’ve seen testing in which the assessor completely failed to adhere to the test instructions provided by the producers of the test, thereby rendering invalid scores.

 

The more parents understand about assessment scores, the less they are able to be misled by inaccurate and/or disingenuous representations of the data. The more teachers understand about assessment scores, the more able they are to put that data to constructive use in developing teaching strategies for their students with special needs.