Data Void

There are many search terms for which the available relevant data is limited, nonexistent, or deeply problematic. …We call these low-quality data situations ‘data voids.’

Data Voids: Where Missing Data Can Easily Be Exploited

“Data voids are a security vulnerability that must be systematically, intentionally, and thoughtfully managed.” 

Michael Golebiewski of Microsoft coined the term “data void” in May 2018 to describe search engine queries that turn up little to no results, especially when the query is obscure or rarely searched.

In Data Voids: Where Missing Data Can Easily Be Exploited, Golebiewski teams up with danah boyd (Microsoft Research; Data & Society) to demonstrate how manipulators exploit data voids to expose people to problematic content, including falsehoods, misinformation, and disinformation.

Data & Society — Data Voids

Data voids are often difficult to detect. Most are harmless until something causes many people to search for the same term, such as a breaking news event or a reporter using an unfamiliar phrase. In some cases, manipulators work quickly to produce conspiratorial content to fill a void, whereas other data voids, such as those around outdated terms, are filled slowly over time. Data voids are compounded by the fraught pathways of search-adjacent recommendation systems such as auto-play, auto-fill, and trending topics, each of which is vulnerable to manipulation.

Data & Society — Data Voids

So what you get is what researchers call a “data void”: people who know anything about the history of Europe, immigration, etc. don’t talk about Kalergi, because he is insignificant, a figure most notable for the conspiracy theories built around him. But people using the conspiracy theory talk about Kalergi quite a lot. So when you search Kalergi Plan, almost all the information you get will be by white supremacist conspiracy theorists.

These bad actors then use the language of critical thinking to tell you to look at the evidence and “make up your own mind.”

Things used to be much worse until a few months ago, because if you watched one of these videos, YouTube would keep playing you conspiracy videos on the “Kalergi Plan” via a combination of autoplay, recommended videos, and personalization. It would then connect you to videos on other neo-Nazi theories, “race science”, and the like. People would Google a term once and suddenly find themselves permanently occupying a racist, conspiracy-driven corner of the internet. Fun stuff.

Due to some recent actions by YouTube, this follow-on effect has been substantially mitigated (though YouTube’s delay in taking action has led to the development of a racist-conspiracist bro culture on the platform that continues to radicalize youth). The tamping down of the recommended-video conspiracy vector isn’t perfect, but it is already having good effects. However, reducing the influence of this vector has probably increased the importance of Google This ploys on the net, since people are less likely to stumble onto these videos without direct encouragement.

What can we do as educators? What should we encourage our students to do?

1. Choose your search terms well

2. Search for yourself

3. Anticipate what sorts of sources might be in a good search — and notice if they don’t show up

Data Voids and the Google This Ploy: Kalergi Plan | Hapgood

Data voids are not unique to search engines; they occur on social media platforms, too, where search is typically limited to information hosted on that particular platform. Golebiewski and boyd emphasize that there is no “quick fix” for data voids. Instead, they urge search engines and content creators to work together to anticipate and identify risky data voids, and to fill them with quality content. “Data voids are a security vulnerability that must be systematically, intentionally, and thoughtfully managed.” Golebiewski and boyd first introduced data voids in the May 2018 version of this report.

Data & Society — Data Voids

The logic underpinning search engines is akin to a lesson from kindergarten: no question is a bad question. But what happens when innocuous questions produce very bad results for users? 

Data voids are one such way that search users can be led into disinformation or manipulated content. These voids occur when obscure search queries have few results associated with them, making them ripe for exploitation by media manipulators with ideological, economic, or political agendas. Search engines aren’t simply grappling with media manipulators using search engine optimization techniques to get their website ranked highly or to get their videos recommended; they’re also struggling with conspiracy theorists, white nationalists, and a range of other extremist groups who see search algorithms as a tool for exposing people to problematic content.

Data voids are difficult to detect. Generally speaking, data voids are not a liability until something happens that increases searches on a term. Some are created by media manipulators and escape notice for long periods of time. Others are the sudden products of a news spike, as millions are prompted to search names or terms for the first time, and misleading or hateful content is created to meet demand. Search-adjacent recommendation systems, like search bar auto-suggestions, further complicate the data voids problem by offering suggestions that can send people down deeply disturbing paths.

Search engine creators want to provide high quality, relevant, informative, and useful information to their users, but they face an arms race with media manipulators. In this report, we focus on five types of data voids that are currently being corrupted by those spreading conspiracies or hate: 

Data Voids: Where Missing Data Can Easily Be Exploited

If you know of neurodiversity and disability related data voids, help us fill them.

Further reading: