Every 10 years, officials revise federal, state and local voting districts based on the decennial census.
But the methods used to keep census data anonymous might obscure the very facts meant to inform those decisions.
The U.S. Census Bureau balances privacy and accuracy by sprinkling its data with random errors while ensuring that, at the right scale, the figures retain their key demographic and political patterns.
"The idea is that, by adding noise, you can protect the privacy of individuals so nobody is able to match individuals in another data set with census data to figure out your private information," said statistician Kosuke Imai of Harvard University.
But a new redistricting simulation study by Imai and colleagues in the journal Science Advances finds that recent tweaks meant to better protect anonymity in the information age also tone down racial and political diversity in some precincts.
"There's a systematic overcounting and undercounting of those areas, in part because, when they're doing the postprocessing, they are focusing on the accuracy of the largest ethnic groups within each area," said Imai.
That could bias redistricting efforts, violate the "One Person, One Vote" standard that people should have equal representation in voting, and skew federal spending and social science research.
Federal law prohibits the U.S. Census Bureau from releasing personally identifiable information about an individual for 72 years after it was gathered, even to government or law enforcement agencies.
In the past, the agency met this standard through methods like averaging outliers or swapping people among areas with very small populations. In theory, such approaches maintained population counts while obscuring data that might identify individuals. In practice, attackers still managed to recover some individuals' protected information by cross-referencing the published figures with outside datasets.
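A toy version of the swapping idea might look like the following. It is purely illustrative: the household records, block names, and matching rule are invented, and the bureau's real swapping procedure was more elaborate. The point is that exchanging the recorded locations of paired households keeps totals intact while breaking the link between a published record and a real address.

```python
# Hypothetical household records: (household_id, block, household_size).
records = [
    ("h1", "block_A", 4), ("h2", "block_A", 2),
    ("h3", "block_B", 4), ("h4", "block_C", 2),
]

def swap_geographies(records, pairs):
    """Exchange the recorded locations of paired households.

    Population totals are unchanged, but a published record can no
    longer be tied to the place where the household actually lives.
    """
    by_id = {hid: [hid, block, size] for hid, block, size in records}
    for a, b in pairs:
        by_id[a][1], by_id[b][1] = by_id[b][1], by_id[a][1]
    return [tuple(v) for v in by_id.values()]

# Pair households of the same size drawn from different blocks (an assumed matching rule).
print(swap_geographies(records, pairs=[("h1", "h3"), ("h2", "h4")]))
```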
The growth of available data and interconnectedness in recent decades has further complicated the problem. Critics say that, in trying to counter that shift with a new disclosure avoidance system (DAS), the bureau may have overcorrected.
"There is a tradeoff between the two, and Census Bureau have to figure out how much noise to add, and what's the appropriate compromise," said Imai.
Prior to the paper's release, the bureau adjusted the DAS and its postprocessing to address criticisms that Imai and others were able to raise because the bureau had released test data. Imai said the new data showed a marked improvement in accuracy but sacrificed privacy protection in the bargain.
As the age of big data rolls on, striking the right balance between accuracy and anonymity will likely grow even more difficult.
"That issue of privacy protection is really important, but will become more and more important in the future. And I think it's important for us to understand that trade off and to get involved in that political process to make sure that these policy decisions are made in the interest of citizens," Imai said.