Algorithm nation

What are algorithms, how do they decide what we see and what we don’t see online?



What we see in our news feeds isn’t always the truth. In fact, it’s usually decided by a sequence of actions that for many of us remain a mystery: the algorithm. What are algorithms, how do they decide what we see and what we don’t see online, and can they be used to break, not build, our social media bubbles?

Interview by David Phillips

At a dinner party, if you were asked to describe how an algorithm works, what would your answer be?

I would explain a very simple algorithm that takes care of content selection on the web. Most simply, it would involve two pieces of news items. If I want to know which one to show to you, I would estimate, one way or another, which of those you are more likely to click on. That is often what companies try to optimise for. If I have no information about you or the articles whatsoever, a way to estimate which of the two you are most likely to click on is just to randomly show either article A or B to a bunch of people. After doing this for a while I look to see whether A was clicked more than B. If it was, then I’ll show A to the next person. That’s an algorithm.
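The procedure described in this answer can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `explore_then_commit`, the simulated click probabilities and the number of trial visitors are assumptions, not anything taken from the interview. It also compares click rates rather than raw click counts, since random assignment rarely shows both articles exactly the same number of times.

```python
import random

def explore_then_commit(click_prob, n_explore=1000, seed=0):
    """Show article A or B at random to n_explore visitors, then pick the better one.

    click_prob is a hypothetical dict of "true" click probabilities, used here
    only to simulate how visitors behave; a real system would observe clicks.
    """
    rng = random.Random(seed)
    shows = {"A": 0, "B": 0}
    clicks = {"A": 0, "B": 0}

    # Exploration phase: no information about people or content, so choose at random.
    for _ in range(n_explore):
        item = rng.choice(["A", "B"])
        shows[item] += 1
        if rng.random() < click_prob[item]:
            clicks[item] += 1

    # Compare observed click rates and commit to the higher one for the next visitor.
    rates = {k: (clicks[k] / shows[k] if shows[k] else 0.0) for k in shows}
    winner = max(rates, key=rates.get)
    return winner, rates

winner, rates = explore_then_commit({"A": 0.9, "B": 0.1})
print(winner, rates)
```
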

How easy is it to use this kind of algorithm to anticipate what kind of content will circulate best online?

The algorithm I just described is based on not knowing anything about you or anything about the content itself. But most content algorithms are more refined, in the sense that they try to predict the probability of you clicking by using information about you and the content. If we know you happen to click a lot on news about left-wing politics, and we know that some of the items we might select for you are left wing while others are right wing, then I’ll make sure that we show the left-wing ones more, because those are the ones you’ve clicked on historically, or the ones that people who resemble you in one way or another have clicked on.

These similarities might be as superficial as living in the same place, using the same laptop or visiting the same websites. All of these kinds of information streams are combined. The basic principles are not that hard. All of them are trying to make a better model to predict what things you would click on.
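A minimal sketch of this idea, assuming a hypothetical click log of `(user_segment, item_category, clicked)` records: the estimated click probability for a reader is simply the smoothed click rate of people in the same segment on items of the same category. The segment and category labels, the log format and the function names are all assumptions made for illustration; real systems combine many more signals.

```python
from collections import defaultdict

def fit_click_rates(log):
    """Build a click-rate estimator from (user_segment, item_category, clicked) records."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for segment, category, was_clicked in log:
        shown[(segment, category)] += 1
        clicked[(segment, category)] += int(was_clicked)

    def rate(segment, category):
        # Laplace smoothing: a pair with no history gets an estimate of 0.5.
        key = (segment, category)
        return (clicked[key] + 1) / (shown[key] + 2)

    return rate

def pick_item(rate, segment, candidates):
    """Return the (item_id, category) candidate with the highest estimated click rate."""
    return max(candidates, key=lambda item: rate(segment, item[1]))

# Hypothetical history: readers in segment "s1" mostly click left-wing items.
log = ([("s1", "left", True)] * 8 + [("s1", "left", False)] * 2
       + [("s1", "right", True)] * 1 + [("s1", "right", False)] * 9)
rate = fit_click_rates(log)
print(pick_item(rate, "s1", [("a1", "left"), ("a2", "right")]))
```

Note that nothing in this sketch inspects what an article actually says; it only uses who clicked on what.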

Are these predictions based on a detailed understanding of the content?

Often in solely algorithmic selection, the algorithms do not have a clue what the item is. There’s no conscious coding anywhere of what’s in the item. There are some features that describe maybe the length, the producer of the item or some basic categories. But primarily the coding is based on the type of person who is reading the item, and if it works for people like you, the algorithm will show it to more people like you, irrespective of what’s in it.

To what extent does the human element create a context for discriminatory algorithms? We’ve read a lot in the news recently about how in the US Presidential election people were targeted with fake news stories because the algorithms ‘decide’ that certain demographics respond well to a story that plays up to racist fears, for example. Is algorithmic fairness possible, or are they doomed to reflect the biases and prejudices of the people who put them into motion?

This is a very tricky question because it boils down to definitions of fairness. How can we ensure whatever a computer does is in some ways fair or understandable? What we currently believe to be a proper news collection would be a balanced view on a range of topics, news that is supposed to add nuance to whatever you believe and give you a broader perspective. On the other hand, if it’s a purely algorithmic selection, what is selected is often based on whether or not you click it and whether or not you read it – which is measured by the time spent on the page.

This is fair to the extent that the algorithm does what it does for everybody equally. The problem is that the criterion the algorithm tries to optimise does not seem to align with the criteria we would like to have for news selection. These algorithms are very good at selecting items that people will click on, even without knowing the content. But if those items happen to be news items, we can often be shocked by the type of news that we get. Apparently that selection does not satisfy what we want to see. This is the main problem with what’s happening right now around fake news.

Could you give an example?

Suppose there are two items. One is this nuanced view on a topic you already know something about. It basically extends your view. The other is a headline that is not what you were expecting. It’s a totally new idea but you are more likely to click on it. Content that is fake or untrue may be more surprising and is therefore going to be clicked on more often. Many of these algorithms only have the idea of maximising clicks as a criterion. These news items surface not because they are fake, but because they are surprising to us and hence we click on them. And because we click them, they tend to surface more often.
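The feedback loop in this example can be illustrated with a small deterministic simulation. It is a sketch under strong assumptions: each round, an item's share of impressions is set in proportion to the clicks it earned in the previous round, and the click probabilities (0.05 for the nuanced item, 0.10 for the surprising one) are invented numbers. The function name `simulate_feedback` is likewise hypothetical.

```python
def simulate_feedback(click_prob, rounds=10):
    """Track each item's share of impressions when exposure follows past clicks."""
    # Start by showing every item equally often.
    share = {item: 1 / len(click_prob) for item in click_prob}
    history = [dict(share)]
    for _ in range(rounds):
        # Expected clicks this round = share of impressions * click probability.
        clicks = {item: share[item] * click_prob[item] for item in share}
        total = sum(clicks.values())
        # Next round, items are shown in proportion to the clicks they just earned.
        share = {item: clicks[item] / total for item in clicks}
        history.append(dict(share))
    return history

history = simulate_feedback({"nuanced": 0.05, "surprising": 0.10}, rounds=5)
for step, shares in enumerate(history):
    print(step, round(shares["surprising"], 3))
```

Under these assumptions the surprising item's share grows from one half to two thirds after a single round and to 32/33 after five, without the simulation ever looking at whether the item is true.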

Last year was a watershed for Facebook, Twitter and Instagram. They all instituted similar changes to their timelines, which meant that the most recent story would no longer appear at the top; instead, algorithms would decide what to show based on what individual users click on more often. Was this a good idea?

The main aim, especially on Facebook, is maintaining user engagement levels. Its prime outcome is that you stay active on Facebook. It sounds plausible that the things you like are the things you will click and share, but the two do not necessarily correspond. What it’s optimised for is whatever makes you active, not necessarily what you want. That’s a source of many of the problems people have now with what they see on Facebook and the way it reflects activity. The measure of activity, or the advertising that they are displaying, apparently does not accurately reflect what we would like to get out of this kind of information.

Are the algorithms doing a good job then?

I would say they are because they are making people more active and that was the aim of this change. I would be inclined to say that Facebook has very little responsibility regarding this. They are open about the fact that they are a commercial company, they are publicly listed. There is a perception amongst the general public that Facebook should be responsible for providing you with accurate news. There are actually companies that do this. Those are the news agencies that check things and you have to pay for it.

But isn’t it true that many of us get most of our news stories from Facebook? Doesn’t Facebook have an ethical obligation to make sure the algorithms are used to help us see balanced news stories?

They have an ethical obligation to be clear as to whether or not they’re trying to be a publisher, and up until now they have been fairly clear that this was not their aim. In the past year there has been more and more public uproar, as the general audience perceives Facebook as more of a media or news company than it aimed to be. So now they seem to be moving in that direction because it is something their clients expect from them. I know that this is not a common stance but I would say, ethically, they have been fairly clear about their aims. You can disagree with what they do, but I would then say: don’t use it.

We've established that the general distrust of algorithms is due to a lack of understanding of what they actually are. But could algorithms be used to break the filter bubble that we all live in? For example, in the top news stories on my newsfeed I saw no pro-Brexit items. Is there any way algorithms can actually halt this echo chamber effect?

This filter bubble originates from the fact that the selection is made based on clicks. You’re seeing stuff that you click on, and apparently you are not clicking on the pro-Brexit stories, which means you are not seeing them. The algorithm has been instructed to gain clicks, nothing more. If you instruct it to give you a nuanced view of the world instead, there are no theoretical objections to this; it’s just a matter of changing the criterion that the algorithm tries to optimise. The big difference between a nuanced view of the world and a click is that clicks are very easy to measure. It’s quite hard to measure whether or not you have actually received a nuanced view of the world.

Suppose Facebook wants to provide a “proper” selection of news, whatever that means exactly. In this case, the main question is: how do we measure what is proper? If we know how to do this, then we can build algorithms that optimise this criterion. I have no doubt whatsoever that we can do this, as long as we properly define what it is that should be measured.

Journalistic fact checking is back on the public agenda. Journalists fact check statements put out by a spokesperson or fake news stories that appear in the media. Would it be equally easy, theoretically, to develop a fact-checking algorithm?

Fact checking is something that sounds very simple but is very hard. Drawing a clear line between what is fact and what is not fact is going to be difficult in several cases. That fewer people attended Trump’s inauguration was undeniable if you see the pictures. But from there on you get to facts that can be debated from all kinds of sides. It would be very hard to do a content fact check of all news items that have been generated. However, you could quite easily design an algorithm that suggests news items to you that are not read just by your closest friends but are read by people who don’t share certain properties with you or who are slightly more distant in your social network.

If we use a criterion like this, then even without checking the content we can make sure you are getting a slightly more nuanced view on the world. Because now you are seeing things that other people, who you might not normally discuss things with, are interested in. That kind of solution is to me far more promising than actual fact checking of the content, which is a very hard task.
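One way to picture such a criterion is the sketch below, which never looks at content. It assumes a hypothetical friendship graph stored as an adjacency dict and a hypothetical `reads` mapping from person to the set of items they have read; the distance band (two to three steps away) and the function names are illustrative choices, not a method described in the interview.

```python
from collections import Counter, deque

def distances_from(graph, start):
    """Breadth-first search: number of friendship steps from start to each reachable person."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def suggest_distant_reads(graph, reads, user, min_dist=2, max_dist=3, k=3):
    """Suggest up to k unread items that are popular among people beyond the user's direct friends."""
    dist = distances_from(graph, user)
    already_read = reads.get(user, set())
    counts = Counter()
    for person, d in dist.items():
        if min_dist <= d <= max_dist:
            for item in reads.get(person, ()):
                if item not in already_read:
                    counts[item] += 1
    return [item for item, _ in counts.most_common(k)]

# Hypothetical network: me - friend - fof - far
graph = {"me": ["friend"], "friend": ["me", "fof"],
         "fof": ["friend", "far"], "far": ["fof"]}
reads = {"me": {"x"}, "friend": {"y"}, "fof": {"z", "x"}, "far": {"z"}}
print(suggest_distant_reads(graph, reads, "me"))
```

In this toy network the direct friend's reading is ignored and only items read two or three steps away are counted, which is one simple reading of "slightly more distant in your social network".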

Maurits Kaptein

Dr. Maurits Kaptein is an assistant professor of statistics at Tilburg University. Previously, Maurits was a researcher at the University of Eindhoven and the Aalto School of Business, and a visiting scholar at Stanford University. Maurits explores statistical methods for content personalization. His work has been published in leading journals such as Behavior Research Methods, Bayesian Analysis, and the Journal of Interactive Marketing.