Publication Date: October 2019
Publisher: Data & Society
Research and Editorial Team: Michael Golebiewski, danah boyd

For many search queries, the available results are limited or deeply problematic. The paper calls these low-quality data situations "data voids". Data voids can arise both naturally and through manipulation. Unusual search terms can lead users to disinformation or manipulated content. When few results are associated with a query, media manipulators can exploit the void to spread problematic content. According to the authors, "problematic includes conspiratorial, extremist, hate-oriented, terroristic, graphic and illicit content".

The paper identifies five main types of data voids.

  1. Breaking news: problematic content is commonly spread during a breaking news situation. In such cases, many users start seeking information about a place or subject that was almost unknown before. Journalists need to produce content quickly, and that content must then be integrated into search engines. Manipulators have an opportunity to capture attention in the window between the first report and the arrival of a large volume of coverage. Their goal is twofold: first, they attempt to waste journalists' time, for example by using inauthentic accounts on Twitter to send them in the wrong direction; second, they seek to shape news coverage in order to influence public perception. This type of data void is cleaned up naturally, as high-quality content eventually outweighs the problematic content.
  2. Strategic new terms: media manipulators create new terms to divert discourse into areas full of disinformation. This kind of data void is particularly dangerous when combined with a breaking news situation. The paper gives the example of the 2012 shooting at Sandy Hook Elementary School: members of conspiracy forums coined the phrase "crisis actors" to refer to survivors and parents. Six years later, the term broke into national news coverage in the wake of another shooting.
  3. Outdated terms: when a term stops being regularly used, content creators stop producing content associated with it. This creates a data void that manipulators can exploit for a long period.
  4. Fragmented concepts: manipulators can break the connections between related ideas in order to create separate clusters of information. According to Golebiewski and boyd, users “can end up in an entirely different sphere of information than other who seek information on similar topics using different terms”.
  5. Problematic queries: search results for unpredictable queries are often problematic. These queries should be addressed by both content creators and search engines.

In conclusion, the authors argue that there is no fix for data voids. They are a security vulnerability that must be treated with the same seriousness as any other security issue. Search engines and content creators must work together to identify them. One countermeasure is the production of high-quality content that can fill the data voids.

Tags: Fake news and disinformation, Online news, Online media, Fact-checking

The content of this article can be used according to the terms of Creative Commons: Attribution-NonCommercial 4.0 International (CC BY-NC 4.0). To do so, use the wording "this article was originally published on the Resource Centre on Media Freedom in Europe", including a direct active link to the original article page.