Note — Dec 12, 2021

Statistical Imaginaries

Essay version of danah boyd’s talk at the 2021 Microsoft Research Summit. A long read on the societal importance of data (the census, in this case), how uncertainty and margins of error are often left out of discussions because politicians want something perfect, why keeping them in matters, and political manoeuvring and “agnotology.”

boyd introduces her concept of statistical imaginaries, which form “when people collectively construct a vision of what data are and what they could be.” She argues that the “key to responsible data science is to keep statistical imaginaries in check.” In other words, we need to recognize where imaginaries come from, why they skew a certain way, and what they obfuscate, and to treat the related data accordingly. More importantly, data have to stop being seen as perfect sets, and the uncertainty that accompanies them needs to be made visible, all while guarding against political capture.

The moment that data matter, those data can never be neutral. The greater the stakes, the less objective those data can be. The very choices of what data to collect, how to categorize data, and how to present data reveal ideological, social, and political commitments. […]

The scientific community has developed a range of techniques to improve data quality in spite of data collection limitations, but embracing these requires that stakeholders understand data’s limitations and vulnerabilities. […]

Census data are the product of scientific work. They are also infrastructural in our society, core to countless policies and practices. Lives depend on that data. Economies depend on that data. Public health depends on that data. […]

Engaging with uncertainty is risky business. People are afraid to engage with uncertainty. They don’t know how to engage with uncertainty. And they worry about the politicization of uncertainty. But we’re hitting a tipping point. By not engaging with uncertainty, statistical imaginaries are increasingly disconnected from statistical practice, which is increasingly undermining statistical practice.