It’s not just Google’s algorithms that are manipulating search results

Mike Wacker
Medium

Google CEO Sundar Pichai was summoned to testify in front of the House Judiciary Committee on December 11, 2018. This hearing spanned many topics, including Google’s alleged bias against conservative content. In response to one such question about manipulation of search results, Sundar said, “We don’t manually intervene on any particular search result.” 

Sundar Pichai did not tell the truth when he made this statement. 

“Just Go Talk to the Engineers”

In the introductory chapter of How Google Works, “Introduction — Lessons Learned from the Front Row,” the very first lesson reads, “Just go talk to the engineers.” Few quotes embody the ethos of Google as much as this one. Google’s ethos eschewed traditional business plans and traditional corporate hierarchies in favor of a structure that granted more freedom and autonomy to its engineers, engineers who were not just technically gifted, but who also displayed intelligence and creativity in other domains. 

Back then, the tech industry was treated as one of the prize jewels of the American economy, and it enjoyed unwavering support from both political parties. However, by the time Sundar testified in front of Congress, those golden days were long gone. On top of that, Google’s problems have only intensified since that visit to Capitol Hill. Stories from multiple sources have since alleged manipulation of search results, manual blacklisting of sites, and even election interference. These stories have called into question the sacred reputation of Google’s search results. 

In light of all these events, journalists, members of Congress, and American citizens are all trying to figure out what is true, and what exactly is going on inside the company. At a time like this, it’s time to go back to that very first lesson: it’s time to go talk to the engineers. 

I happen to be an engineer who used to work for Google, and I have come here to explain the truth behind many of these stories, including the story of the three most consequential words I may have ever written at Google: “the smoking gun.” The account I will give will corroborate several of these stories from the perspective of an engineer who has firsthand knowledge of the blacklists and documents that I will discuss. 

A Tale of Two Questions

This story begins with a tale of two questions: one that occurred during the aforementioned Congressional hearing, and another one that took place a few days later outside of the Congressional spotlight. 

During that hearing, Rep. Zoe Lofgren (D-CA) asked Sundar Pichai a question about some less than flattering search results for Donald Trump.

Now, manipulation of search results. I think it’s important to talk about how search works. Right now, if you Google the word “idiot” under images, a picture of Donald Trump comes up. I just did that. How would that happen? How does search work so that that would occur?

Sundar then went on to explain how search works, referencing the 200 algorithmic signals that Google uses to objectively rank search results: “things like relevance, freshness, popularity, how other people are using it.” In other words, his answer was that an objective, automated process produced that image of Donald Trump as a result for the word “idiot.” Rep. Lofgren then asked a follow-up question to confirm this point.

So it’s not some little man sitting behind the curtain figuring out what we’re going to show the user? It’s basically a compilation of what users are generating and trying to sort through that information.

Sundar again confirmed the algorithm was deciding what to show the user, adding, “We don’t manually intervene on any particular search result.” Based on that response, Rep. Lofgren threw cold water on allegations of conservative bias and manipulation of search results:

In Santa Clara County, Donald Trump in the 2016 election got 20% of the vote. That’s how much of the vote he got. So it’s not a surprise that the engineers who live in Santa Clara County would reflect that general political outcome. That has nothing to do with the algorithms and the really automated process that is the search engine that serves us.

A few days later on Friday, December 14, April Glaser, a pro-choice writer for Slate, searched for “abortion” on YouTube, and she didn’t like the results she saw. Here is what she described in her article for Slate:

Before I raised the issue with YouTube late last week, the top search results for “abortion” on the site were almost all anti-abortion — and frequently misleading. One top result was a clip called “LIVE Abortion Video on Display,” which over the course of a gory two minutes shows images of a formed fetus’ tiny feet resting in a pool of blood. Several of the top results featured a doctor named Antony Levatino, including one in which he testified to the House Judiciary Committee that Planned Parenthood was aborting fetuses “the length of your hand plus several inches” in addition to several misleading animations that showed a fetus that looks like a sentient child in the uterus. The eighth result was a video from conservative pundit Ben Shapiro, just above a video of a woman self-narrating a blog titled, “Abortion: My Experience,” with text in the thumbnail that reads, “My Biggest Mistake.” Only two of the top 15 results struck me as not particularly political, and none of the top results focused on providing dispassionate, up-to-date medical information.

As a result, she sent an email to YouTube that same afternoon complaining about those results, asking “why anti-abortion videos saturated the search results for ‘abortion,’ and if the platform thought accurate, health-focused information had a place there.” When she checked again Monday morning, the search results for “abortion” seemed to be very different, and more to her liking. 

She would later receive a response from a YouTube spokesperson. However, this answer was different from the answer that Sundar Pichai delivered to Rep. Lofgren. Instead of explaining how search works and how these results were produced by an objective, automated process, the spokesperson “stressed that the company is working to provide more credible news content from its search and discovery algorithms.”

The Smoking Gun: YouTube’s Alternative Search Results

This Slate article then caught the attention of Alexandra DeSanctis, a pro-life writer for the National Review. In response, she wrote a quick article criticizing YouTube’s actions here:

YouTube has apparently changed the search results on its site for the term “abortion” after Slate writer April Glaser contacted the company last Friday to ask “why anti-abortion videos saturated the search results for ‘abortion,’ and if the platform thought accurate, health-focused information had a place there.” 

Glaser reports that, by this past week, “anti-abortion content meant to enrage or provoke viewers was no longer purely dominating the results” on the site. According to Glaser, YouTube did not tell her whether or how it tweaked the results for “abortion,” but “stressed that the company is working to provide more credible news content from its search and discovery algorithms.”

That article then caught my attention. On the one hand, this claim seemed almost too good to be true. Would YouTube really change its search results simply because one writer from Slate complained? On the other hand, the evidence for this claim also appeared to be fairly strong. 

So, not knowing what to believe, I decided to investigate the matter further. After trying a few different approaches, I eventually found the smoking gun: the exact change where Google had altered the search results for abortion. My initial reaction was a mix of excitement and shock. 

I then shared what I had found with other employees on an internal mailing list called industryinfo@. For context, industryinfo@ is a mailing list where employees post articles about the tech industry. The first part of my post was a copy of the original National Review article from Alexandra DeSanctis. The second part was a small bit of commentary of my own, including a link to that change, which I referred to as “the smoking gun.” 

From there, my email and other emails from that discussion would later be leaked to Breitbart, and those three words would become a part of one of the most famous headlines written by Breitbart’s tech section: “‘THE SMOKING GUN’: Google Manipulated YouTube Search Results for Abortion, Maxine Waters, David Hogg”

So, what exactly was this change?

To reference the infamous phrase “alternative facts,” the change essentially used an alternative algorithm that delivers alternative search results. A special file named youtube_controversial_query_blacklist could be used to manually trigger this alternative algorithm. For any search or query that matched an entry on that blacklist, YouTube would blacklist the normal search results, switching over to the alternative search results instead.

The smoking gun I had discovered was a change that added two entries to that blacklist: “abortion” and “abortions”. As a result of this change, searches for those terms displayed the alternative search results. The change had been made on Dec 14, 2018, at 3:17 PM PST, mere hours after April Glaser of Slate had emailed YouTube. 

How did this alternative algorithm work? I didn’t work in YouTube, and thus I don’t know the details of this algorithm, but the algorithm was referred to as “authoritative” ranking, and it seemed to favor more “authoritative” sources.

Of course, the million dollar question is exactly what an “authoritative” source is. Who wins or loses with these alternative search results, and how does Google decide when to use them? For example, does this change favor mainstream sources such as CNN while disfavoring grassroots sources such as Live Action? In the case of search results for abortion, the pro-choice side appeared to benefit from the alternative search results. 

What else was on this blacklist?

The first entries on this blacklist were related to the Las Vegas shooting, and that use of the blacklist is easily defensible. Misinformation can spread very quickly in the immediate aftermath of a shooting, and any journalist knows that any information they receive during those first few hours needs to be taken with a grain (or perhaps a heap) of salt. In fact, most of the entries on the blacklist covered events like mass shootings, natural disasters, and terrorist attacks. If the use cases for the blacklist ended there, this blacklist would have been a non-story. 

It didn’t end there. Herein lies the key point: it’s not about where the blacklist begins, but where the blacklist ends. In this case, that contrast could not be more stark: the first entry on the blacklist was added because of a mass shooting, and the last entry was added because a Slate writer complained about search results for abortion. 

Two other additions to the blacklist deserve additional scrutiny. The first one is related to a member of Congress. On December 14, 2017, “maxine waters” was added to the blacklist. This change had been made because a single employee had complained that search results for her were low quality. The potential motivations and biases of that employee are not known. Another employee then compared the normal search results and the alternative search results for “maxine waters” and decided to switch over to the alternative search results. The criteria used to determine which search results are better are also not known.

The consequences of this change would then carry over into the 2018 midterm election. During that election, users who searched for Maxine Waters on YouTube would have received the alternative search results, whereas users who searched for her opponent, Omar Navarro, would have received the normal search results. 

The second change touches on the referendum in Ireland to repeal the 8th Amendment and legalize abortion, which took place on May 25, 2018. A little over a week before this election took place, over 100 entries were added to the blacklist on May 17. These entries were related to both abortion and the referendum; Project Veritas would later obtain a document containing the list of those entries. YouTube had started serving users alternative search results in the middle of an election campaign. 

Special Blacklists for Google Search

When Breitbart wrote their article about this blacklist, they also obtained a rather interesting quote from one engineer in those leaked emails:

We have tons of white- and blacklists that humans manually curate. Hopefully this isn’t surprising or particularly controversial.

Well, at least he got the first part right. 

Since then, more stories have emerged exposing blacklists that are used for Google’s main search engine. One story from the Daily Caller even documented how Google blacklisted a Washington Post op-ed, “How Vladimir Putin Became The World’s Favorite Dictator,” so that it would not appear in a special type of search result known as a WebAnswer. 

Of these stories, the biggest one came from the Daily Caller: “EXCLUSIVE: Documents Detailing Google’s ‘News Blacklist’ Show Manual Manipulation Of Special Search Results.” I am familiar with this blacklist as well, and I have seen the documents described in that article firsthand. 

To explain this story, it helps to first explain how the Google search page has evolved. In the early days of Google, a search would yield a much simpler search page consisting of only ten blue links. These “ten blue links” are called the web search results or the organic search results. 

Since then, Google has added many new search features offering new types of search results. For example, if you ask Google, “what is the tallest tree”, you will likely see a special block at the top about Sequoia sempervirens. This type of result is a special search result, and these special search results are often located at or near the top of the page. 

(Section 12.8 of Google’s Search Quality Evaluator Guidelines also offers a useful reference to help understand both types of results. In those guidelines, the web search results are called “Web Search Result Blocks,” and the special search results are called “Special Content Results Blocks.” The guidelines also refer to both types of results as “search results.”) 

The particular blacklist that the Daily Caller unearthed, deceptive_news_blacklist_domains.txt, applied to entire sites, and it was intended to prevent blacklisted sites from showing up in any of the special search results: “The purpose of the blacklist will be to bar the sites from surfacing in any Search feature or news product.” However, this blacklist would not alter the web search results: “It will not cause a demotion in the organic search results or de-index them altogether.” 

As stated before, these special search results often appear at or near the top of the page, so this sort of blacklisting was quite consequential. 

So, what problem was this blacklist trying to solve? 

In theory, the blacklist was designed to enforce two of Google’s policies: the “misrepresentation policy” and the “good neighbor policy.” On its face, the policies as described in the policy document seemed to be fairly unobjectionable and uncontroversial. 

In practice, while these policies seemed facially valid, a look at the actual blacklist itself raised serious concerns with the policies as-applied. Many questionable entries existed on that blacklist, including the website for The American Spectator and Matt Walsh’s blog. It was not clear at all how these websites had violated either of the policies.

Another important part of this story was how that blacklist was generated. In short, it was not generated by an algorithm or an automated process. This blacklist was manually generated by Google’s Trust and Safety team; another document describing the enforcement details mentioned that the “Ares manual review tool” is used to complete reviews of websites.

Finally, the policy document for these two policies was approved at the highest levels of the company. One of the approvers for this policy document was Ben Gomes, a Senior VP who is in charge of Google Search and who also directly reports to Google CEO Sundar Pichai. 

The Final Verdict

Based on this information, it is clear that Google CEO Sundar Pichai did not tell the truth to Congress when he said, “We don’t manually intervene on any particular search result.” 

In reaching this verdict, I have considered not just this statement in isolation but also the relevant context, namely the part of Rep. Lofgren’s question where she asked, “So it’s not some little man sitting behind the curtain figuring out what we’re going to show the user?” It was not an algorithm that decided to switch over to the alternative search results for abortion; it was a human who manually made that decision behind the curtain. It was not an algorithm that blacklisted The American Spectator from the special search results; it was the Trust and Safety team who manually made that decision behind the curtain. 

Sundar’s answer misled Rep. Lofgren, who responded that the potential biases of Google’s employees “has nothing to do with the algorithms and the really automated process that is the search engine that serves us.” As it turns out, the question of “what we’re going to show the user” is not dictated solely by algorithms and automated processes, and the biases of Google’s employees could have something to do with those search results.

Any false statement before Congress is a problem in and of itself, but this false statement also had a significant practical consequence: it obstructed Congressional oversight of Google. 

As more and more stories have emerged, Google has shifted the goalposts more and more with each statement that it provides in response to the latest story. Here, though, we must come back to the original goalpost. Sundar did not say, “In limited cases, we may manually intervene, but Google has never manipulated or modified the search results or content in any of its products to promote a particular political ideology.” Sundar said, “We don’t manually intervene on any particular search result.” 

If Sundar had disclosed these forms of manual intervention, Congress would have been afforded the opportunity to ask additional questions about these manual processes. For example, it could have asked what safeguards exist to prevent the biases of Google’s employees from seeping into these manual processes. However, since Sundar said that Google does not manually intervene, he denied Congress the opportunity to ask these questions, questions that are of vital importance to both Congress and the general public. In that respect, he obstructed Congressional oversight of Google.

Back in September, the Wall Street Journal broke a story about how Google employees discussed tweaking search results to counteract President Trump’s travel ban, though a Google spokesperson said that none of these ideas were ever implemented. In response to that article, Sundar sent an email to all Google employees warning them to stay nonpartisan.

In that email, he wrote about the importance of trust: “The trust our users place in us is our greatest asset and we must always protect it. If any Googler ever undermines that trust, we will hold them accountable.” It is that theme of trust I would like to come back to here. 

We now know that Sundar’s statement to Congress was false, but we do not know the exact reason why. One possibility is that Sundar deliberately misled Congress. Another possibility — to borrow a term of art from the NCAA — is that there was a “lack of institutional control” at Google, and it was this lack of control that led to this false statement. 

(However, it is difficult for Sundar to plead ignorance here, considering that the deceptive news blacklist had been approved by Ben Gomes, who directly reports to Sundar.) 

Whatever the reason was, one thing remains clear: it created a clear violation of trust. Google CEO Sundar Pichai cannot be trusted to tell the truth to Congress. What will he do to hold himself accountable?