How to discover hidden racial profiling in McHenry County police data

McHenry County is located northwest of Chicago, has just over 300,000 residents, and is part of the six-county Chicago region. In recent years, the county has had a growing Hispanic population (2009 Census estimates put Hispanics at about 11% of the population), and there was data to suggest that Hispanics might have been racially profiled by local police. Here is how the Chicago Tribune describes the data between 2004 and 2009:

Racial profiling is difficult to prove. That’s why researchers push for data collection, to flag potential problems. In 2004, the first year data were collected, McHenry County’s indicators were high.

Statewide, minorities were 15 percent more likely to be stopped than what would have been expected based on their respective populations.

McHenry County’s disparity rate, however, was 65 percent, more than double that of the Chicago area’s five other sheriff’s departments.

The county’s rate, however, began dropping dramatically in 2007, and by 2009 was average for area sheriff’s departments.

On the surface, this data suggests the problem might have been solved: police were made aware of the issue and McHenry County’s numbers were back in line with regional figures within a few years.
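To make the "percent more likely than expected" figures concrete, here is a minimal sketch of one way such a disparity figure could be computed. The Tribune excerpt does not give the state's exact formula, so this assumes the rate is the minority share of stops divided by the minority share of the population. The function names and the example numbers are hypothetical and are not taken from the McHenry County data.

```python
def disparity_ratio(minority_stops, total_stops, minority_population_share):
    """Ratio of the minority share of stops to the minority share of the population.

    Assumption: this is an illustrative definition, not necessarily the
    formula used in the state's reports.
    """
    stop_share = minority_stops / total_stops
    return stop_share / minority_population_share


def percent_over_expected(ratio):
    """Convert a ratio such as 1.15 into '15 percent more likely than expected'."""
    return (ratio - 1.0) * 100.0


# Hypothetical numbers: 230 of 1,000 stops involve minority drivers,
# and minorities make up 20% of the relevant population.
ratio = disparity_ratio(230, 1000, 0.20)
print(round(percent_over_expected(ratio), 1))
```

Under this definition, mislabeling minority drivers as white lowers the numerator, which is why the reported rate can fall even if stop behavior does not change.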

But the Chicago Tribune goes on to say that a statistical analysis suggests it isn’t that racial profiling actually decreased; rather, McHenry County police simply marked Hispanics as white in their reports:

By 2009, the statistical analysis showed, 1 in 3 Hispanics cited by deputies likely were mislabeled as white or not included in department data reported to the state.

• If mislabeling and underreporting are taken into account, the department’s official rate of minority stops would have towered over its Chicago-area peers rather than appearing average.

• Department brass repeatedly missed warning signs of potential problems, even after a deputy complained that some peers targeted Hispanics.

So how exactly did the Chicago Tribune do this analysis? How does one read between the lines of traffic-stop and citation data to make a claim about racial profiling? In a sidebar in the print edition, and behind an extra link online, the Tribune describes its method:

Drivers’ names from the court and department data were compared with names in the census database to find each driver’s likelihood of Hispanic ethnicity. Mirroring methodology of similar research, drivers were deemed Hispanic only if their last names were 70 percent or more likely to be Hispanic.

The department data were used to analyze accuracy of labeling by deputies — comparing the rate of likely Hispanics with what each deputy logged. But the department database lacked records of all cited drivers, so the Tribune used the court data to determine the extent of mislabeling and incorrect logging departmentwide. The rate of likely Hispanics, as shown by the court data, was compared with the rate of Hispanics that the department told the state it cited.

In doing the departmentwide analysis, the Tribune counted only the labeling of likely Hispanics as white, because such mislabeling artificially improved the state’s rating of the department. Deputies at times also labeled likely Hispanics as other minorities, such as when a driver who looks like Sammy Sosa was labeled African-American. The analysis didn’t count that type of mislabeling because it didn’t affect the state’s rating.

Researchers say the census-based analysis is commonly used in studies but has limitations: It counts non-Hispanic women who marry Hispanics, and misses Hispanic women who marry non-Hispanics. It also misses Hispanics who have nontraditional surnames. With the limitations taken into account, it’s generally considered an undercount of Hispanics.

This is an interesting methodological process involving several moving parts. The analysis used and compared multiple sources of data. This triangulation means the analysis doesn’t rely only on the data that police report; such data can have issues, as the TV show The Wire illustrated. Surnames from the records were compared to US Census records to determine the likelihood that each name is Hispanic. This isn’t going to catch all cases, but the Tribune says researchers generally consider the method an undercount of Hispanics. If this is the case, the mislabeling in McHenry County may be even more extensive than the analysis shows. Also, which mislabelings were counted is tied to the state’s rating: only the labeling of likely Hispanics as white was tallied, because only that improved the department’s standing with the state.
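The surname step described above can be sketched roughly as follows. This is a minimal illustration, not the Tribune's actual code: the surname table, its probabilities, the record fields, and the sample citations are all made up for the example. Only the 70 percent cutoff and the rule of counting likely Hispanics logged as white come from the Tribune's description.

```python
# Cutoff from the Tribune's description: a driver is deemed Hispanic only if
# the surname is 70 percent or more likely to be Hispanic.
HISPANIC_THRESHOLD = 0.70

# Hypothetical surname table with made-up probabilities, standing in for the
# census surname data. These values are illustrative, not real figures.
SURNAME_PROB = {
    "GARCIA": 0.90,
    "MARTINEZ": 0.90,
    "SILVA": 0.55,
    "SMITH": 0.02,
}


def is_likely_hispanic(surname, table=SURNAME_PROB, threshold=HISPANIC_THRESHOLD):
    """Return True if the surname meets the likelihood cutoff; unknown names count as 0."""
    return table.get(surname.strip().upper(), 0.0) >= threshold


def mislabel_summary(citations):
    """Count likely-Hispanic drivers and how many of them were logged as white."""
    likely = [c for c in citations if is_likely_hispanic(c["surname"])]
    logged_white = [c for c in likely if c["logged_race"] == "white"]
    rate = len(logged_white) / len(likely) if likely else 0.0
    return {
        "likely_hispanic": len(likely),
        "logged_white": len(logged_white),
        "mislabel_rate": rate,
    }


# Made-up citation records for illustration only.
SAMPLE_CITATIONS = [
    {"surname": "Garcia", "logged_race": "hispanic"},
    {"surname": "Martinez", "logged_race": "white"},
    {"surname": "Garcia", "logged_race": "white"},
    {"surname": "Martinez", "logged_race": "hispanic"},
    {"surname": "Garcia", "logged_race": "black"},   # mislabeled, but not counted
    {"surname": "Silva", "logged_race": "white"},    # below the 70 percent cutoff
    {"surname": "Smith", "logged_race": "white"},
]

summary = mislabel_summary(SAMPLE_CITATIONS)
print(summary)
```

The strict cutoff is a conservative choice: a name like the hypothetical "Silva" entry falls below it and is never counted as mislabeled, which is one reason the approach tends to undercount rather than overcount.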

A few lessons could be learned from this:

1. “Official data,” such as the self-reported police records here, are not necessarily trustworthy.

2. There are often multiple sources of data one can use to describe or evaluate a situation. Relying only on one source of data gives a part of the story – in this case, the one the police wanted to tell, which is interesting in itself – but having multiple sources can give a more complete picture.

3. If the Chicago Tribune analysis is correct, it is a reminder that “hiding” or “disguising” data can be difficult to do if people are interested or determined enough to look into what the data actually means.
