Can Natural Language Processing Help Rebuild Trust in Fact-Based Discourse?

Robert Pierce Wall
4 min read · Apr 28, 2021

The weather’s getting warmer, the days are getting longer, and more and more of the people we know are receiving that long-awaited second dose of the COVID vaccine. With the light at the end of the tunnel growing brighter, it’s tempting to try and forget that 2020 ever happened. But the sad truth is that society’s full recovery from the loss, isolation, disillusionment, and betrayal we’ve experienced over the past year just isn’t that simple. Our hyperconnected life still seems intent on drowning us in information, leaving us too exhausted from simply trying to stay afloat to have any hope of discerning what we actually ought to believe anymore.

Scarred by the pervasive effects of inconsistent information and downright lies about everything from COVID-19 to the presidential election, many of us now feel that the future of democracy is on the line if we can’t regain a common understanding of what’s objectively true.

It’s not always easy to identify a “true” statement in isolation. But true statements reveal themselves when we look across a whole body of information: regardless of differences in word choice, objectively true statements about the same subject should bear a basic conceptual similarity to one another. False statements, by contrast, tend to diverge not only from the true ones but also from each other, even when they appear at the same time.

Viewed in real time, statements that bear little or no similarity to other narratives on the same subject start to stand out as outliers within a body of text. Some of these outliers might be new discoveries, while others could be unintentional falsehoods (misinformation) or deliberate deception and manipulation (disinformation).
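To make that idea concrete, here is a minimal sketch of what such a similarity check might look like, using sentence embeddings to compare statements on a single topic and flag the one that doesn’t fit. The sentence-transformers library, the model name, the example statements, and the 0.35 cutoff are all illustrative assumptions for this sketch, not a working fact-checking system.

```python
# Rough sketch: flag conceptual outliers among statements on one topic.
# Assumes the sentence-transformers and scikit-learn packages are installed;
# the model name and the 0.35 threshold are illustrative, not tuned values.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

statements = [
    "Masks reduce transmission of respiratory viruses in crowded indoor spaces.",
    "Health agencies recommend masks to slow the spread of COVID-19 indoors.",
    "Wearing a mask indoors lowers the chance of passing the virus to others.",
    "Masks cause oxygen deprivation and do nothing to stop the virus.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")   # compact pretrained embedding model
embeddings = model.encode(statements)             # one vector per statement

# Average similarity of each statement to every other statement on the topic.
sim = cosine_similarity(embeddings)
np.fill_diagonal(sim, np.nan)
avg_sim = np.nanmean(sim, axis=1)

for text, score in zip(statements, avg_sim):
    flag = "OUTLIER" if score < 0.35 else "consistent"
    print(f"{score:.2f}  {flag:10s}  {text}")
```

In a real system the cutoff would be learned from data and the comparison would run across thousands of statements, but the principle is the same: the statement that disagrees conceptually with the rest scores lowest.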

Truly new discoveries, misinformation, and disinformation are all factual outliers which present unique challenges to readers and fact checkers, who can only evaluate and compare information on a statement-by-statement basis. Thankfully, artificial intelligence (AI) has evolved to address these difficulties, helping us gain insights from the large masses of digital content we consume each day.

AI has the potential to solve problems of tremendous complexity that have traditionally relied on human analysis: forecasting the actions of our enemies, predicting stock market trends, and interpreting legal documents. However, helping separate factual outliers among digital content could be AI’s most significant contribution in fighting the forces tearing our society apart.

Written, audio, and video content can all be distilled into text, the specialty of a sub-field of AI called natural language processing, or NLP. NLP extracts salient information from masses of text across dialect and genre, providing an easily searchable, conceptual understanding of its content, a feat that would take humans many lifetimes to achieve. As a result, NLP can identify the degree of logical similarity among text inputs, allowing us to see the outliers in our information ecosystem.
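As a rough illustration of that searchable, conceptual understanding, the sketch below ranks a handful of passages against a plain-language question by meaning rather than keyword overlap. The passages, the query, and the choice of model are again assumptions made up for the example.

```python
# Sketch of a conceptual (semantic) search over a small corpus of passages.
# Again assumes sentence-transformers; the passages and query are made up.
from sentence_transformers import SentenceTransformer, util

passages = [
    "Communities that kept up distancing saw infections plateau within weeks.",
    "Stock futures rallied after the latest jobs report beat expectations.",
    "Face coverings, worn consistently, cut person-to-person transmission.",
    "The city council debated a new zoning ordinance late into the night.",
]

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus_emb = model.encode(passages, convert_to_tensor=True)

query = "Do masks and social distancing slow the spread of the virus?"
query_emb = model.encode(query, convert_to_tensor=True)

# Rank passages by conceptual similarity to the query, not keyword overlap.
hits = util.semantic_search(query_emb, corpus_emb, top_k=2)[0]
for hit in hits:
    print(f"{hit['score']:.2f}  {passages[hit['corpus_id']]}")
```

Note that the two relevant passages never use the words in the question; the ranking rests on meaning, which is exactly the property that lets NLP compare narratives phrased in very different ways.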

Like humans, NLP and other AI systems make sense of new data by referencing the information to which they have already been exposed. However, unlike intuitive and sometimes impressionable humans, AI is rigidly consistent in applying the logic with which it has been programmed — and it doesn’t forget what it has learned. Thus, in order to keep AI models free of bias, data scientists will remove outliers from the data used in model training. This avoids skewing results and enables the development of algorithms that are ideally suited to parse the intended dataset. The presumption underlying this practice is that AI systems trained on data cleansed of outliers will more accurately predict real world outcomes.
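For readers curious what that cleaning step looks like in practice, here is a small sketch on synthetic data. The isolation-forest detector and the 5% contamination rate are just one common convention among many, not a prescription, and the feature matrix here is randomly generated stand-in data.

```python
# Sketch of the standard practice described above: dropping outliers from
# training data before fitting a model. IsolationForest and the 5% rate
# are one common choice among many; X stands in for real training features.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))            # stand-in for real training features
X[:10] += 6.0                            # a handful of injected outliers

detector = IsolationForest(contamination=0.05, random_state=0)
labels = detector.fit_predict(X)         # +1 = inlier, -1 = outlier

X_clean = X[labels == 1]                 # training proceeds on the cleaned set
print(f"kept {len(X_clean)} of {len(X)} rows")
```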

NLP’s ability to identify outliers within text makes it a powerful tool for this moment, when we are faced with both reliable and unreliable information on a massive scale. Imagine this: if NLP digested the whole body of text generated about COVID across the globe over the last 15 months, we could quickly identify the very real correlation between consistent application of interventions like wearing masks or social distancing and slowed community spread. In retrospect, eliminating doubt about the effectiveness of such measures could have increased public buy-in at critical moments during the pandemic, focusing society on the actions that would have better contained the virus. Examining the underpinnings of divergent statements on a wider scale would slow the spread of falsehood and make the public less susceptible to manipulation.

As America works to regain its footing on factual ground, private citizens, corporations, and government must guard carefully against the factual outliers in the information we consume. In his inaugural address, President Biden warned against the persistent threat of “lies told for profit and power.” NLP’s inherent ability to perform logical comparisons at tremendous scale offers a powerful tool for first identifying those outliers, so that we have a fighting chance of restoring the common understanding upon which fact-based discourse depends.

