When a child is referred for evaluation in the schools or as a new client in the clinic, one of the most important and frequently used tools is norm-referenced testing. Norm-referenced testing can also be useful when re-evaluating a child, since comparing previous and present scores gives a sense of their progress. However, if you and your child are new to speech, occupational, or physical therapy services, the norm-referenced scores on an evaluation report can be quite confusing. This post will explain how norm-referenced assessments are used, break down some of the jargon, and discuss some of the limitations of norm-referenced testing.
What is a norm-referenced test?
A norm-referenced test is a type of standardized assessment, meaning that all test items are administered and scored in the same manner for every child. Norm-referenced tests compare a child’s performance to a normative sample. What is a “normative sample”? During test design, the test creators administer the assessment to a large group of children (usually hundreds to thousands) from varying regions, socioeconomic backgrounds, races, and disability statuses. This group is intended to represent the general population of children for whom the test is designed. Each test taker’s performance is then ranked within their age group (and sometimes by gender) along the bell curve, pictured below. Individuals who fall in the middle of the curve are considered to be typical or “average” for that peer group. Children who fall to the right of center are considered above average, while children whose scores place them to the left are considered below average compared to peers. When your child takes a norm-referenced test, their score is compared to the normative sample to determine where they fit among their peers on the curve, which helps us see if their abilities are typical for their age group.
Terms you may see on an evaluation report and what they mean:
When a child completes a norm-referenced test, the clinician takes their raw score (the number of test items they answered correctly) and converts it into a standard score, which places the child’s performance on the bell curve. Standard scores between 85 and 115 are considered to fall in the average range. If a child obtains a standard score less than 85, this is considered below average and may indicate skill deficits or impairment compared to others their age.
Standard scores can be converted into a percentile rank. Percentile rank indicates the percentage of same-age peers who scored the same as or lower than the child being evaluated. For example, if a child scores in the 40th percentile, this means they scored as well as or better than 40% of same-age peers who took the test. It is important to keep in mind that percentile rank has nothing to do with the percentage of items answered correctly. Percentile ranks below 16 are considered to fall below average.
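For readers who like to see the arithmetic, here is a rough sketch in Python of how a standard score relates to a percentile rank. It assumes the scores follow a bell curve with a mean of 100 and a standard deviation of 15. The function name is made up for illustration, and in practice clinicians look up percentile ranks in the conversion tables published with each test rather than calculating them this way.

```python
from statistics import NormalDist

# Assumption: standard scores follow a bell curve with mean 100 and
# standard deviation 15. Real tests publish their own conversion tables.
BELL_CURVE = NormalDist(mu=100, sigma=15)

def approximate_percentile(standard_score):
    """Estimate the percentage of same-age peers scoring at or below this score."""
    return round(BELL_CURVE.cdf(standard_score) * 100)

for score in (85, 100, 115):
    print(score, approximate_percentile(score))
# prints:
# 85 16
# 100 50
# 115 84
```

Under these assumptions, a standard score of 85 lands at about the 16th percentile, which is why both numbers mark the lower edge of the average range.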
Because human behavior is variable, we can never be 100% certain whether a child’s performance reflects their actual skills or other factors (attention, motivation, memory, etc.). To account for this uncertainty, many clinicians report a confidence interval, which is a score range where we are highly certain the child’s true score lies. For example, if a child receives a standard score of 79 with a 90% confidence interval of plus or minus 5, then we can be 90% confident that the child’s true score lies between 74 and 84.
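The confidence interval example above is simple addition and subtraction, sketched here in Python. The function name is hypothetical, and the margin (plus or minus 5 here) is assumed to come from the test’s manual; it is not something this snippet calculates.

```python
def confidence_band(standard_score, margin):
    """Return the (low, high) score range for a reported margin.

    The margin is assumed to be supplied by the test manual for the
    chosen confidence level (for example, 90%).
    """
    return (standard_score - margin, standard_score + margin)

low, high = confidence_band(79, 5)
print(low, high)  # 74 84

# 85 is the lower edge of the average range for standard scores.
AVERAGE_RANGE_FLOOR = 85
print(high < AVERAGE_RANGE_FLOOR)  # True: the whole band sits below average
```

Checking the top of the band against 85 shows whether the entire range falls below the average range or overlaps with it.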
Standard deviation is a measure of how spread out scores are around the mean, and it gives us a unit for describing how far a standard score differs or “deviates” from the mean. On the bell curve, the mean standard score is 100, which falls in the exact center. The closer a child’s score is to 100, the fewer standard deviations it is from the mean. One standard deviation is 15 points, so one standard deviation below and above the mean gives us the 85 to 115 range discussed earlier. Clinically, a child who scores more than one standard deviation below the mean falls in the “below average” range compared to same-age peers, though some school districts require students to fall at least 1.5 or 1.75 standard deviations below the mean to be eligible for special services.
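As a rough illustration, the short Python sketch below turns a standard score into a number of standard deviations from the mean, and turns a district cutoff such as 1.5 standard deviations into a standard score. It uses the mean of 100 and standard deviation of 15 described above; the function names are made up for this example, and actual eligibility rules vary by district.

```python
MEAN = 100  # center of the bell curve for standard scores
SD = 15     # points in one standard deviation

def sd_from_mean(standard_score):
    """How many standard deviations a score is from the mean (negative = below)."""
    return (standard_score - MEAN) / SD

def cutoff_score(sd_below_mean):
    """Standard score that sits a given number of standard deviations below the mean."""
    return MEAN - sd_below_mean * SD

print(sd_from_mean(85))    # -1.0
print(cutoff_score(1.5))   # 77.5
print(cutoff_score(1.75))  # 73.75
```

So a district that requires 1.5 standard deviations below the mean is looking for a standard score at or below 77.5, and a 1.75 requirement corresponds to 73.75.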
When norm-referenced tests aren’t enough
Although norm-referenced tests are an important tool for helping clinicians diagnose disorders, they are not perfect. Some norm-referenced assessments have design flaws that affect how accurate they are or whether they actually test the skills we are interested in. In addition, normative samples don’t always reflect the background of the child being evaluated, creating an “unequal” comparison. There are also factors individual to the child that may come into play. Even when a norm-referenced test is appropriate for a child’s age, it may not capture the child’s true abilities if they have trouble engaging in structured tasks or have physical and/or cognitive barriers that make it difficult to complete the test. Because of these limitations, clinicians also rely on other assessment tools, such as observations, parent and/or teacher input, and schoolwork samples. This way we can ensure we have a complete picture of a child and can develop an intervention plan that will meet their specific needs.
-Olivia Edelman, M.A. CF-SLP