Knight Center

JOURNALISM IN THE AMERICAS Blog

Panama Papers: Mar Cabra, ICIJ's data editor, talks about challenges of analyzing 11 million documents



The Panama Papers, the biggest leak in journalism’s history, led to a global investigative effort joined by about 100 Latin American journalists, who were able to untangle how tax havens work. Under the leadership of Spanish journalist Mar Cabra, the global data team of the International Consortium of Investigative Journalists (ICIJ) was the brains behind an investigation that required analyzing 11.5 million documents, and the team showed that knowing how to handle big data has become essential to investigative journalism.

Mar Cabra, editor of the data team at ICIJ (Photo: Twitter)

During the first phase of the project, whose stories were published in early April 2016, Cabra and her team organized 2.6 terabytes of information leaked from the Panamanian law firm Mossack Fonseca and shared the data with journalists around the world. On May 9, ICIJ made part of the information public by launching the Offshore Leaks Database. During its first six days, the database received almost 35 million page views.

"We analyzed the data, pulled out our news stories, and then exposed the data to the public. Now people can search for their own stories, whether or not they are for the press. A new stage of the Panama Papers has started, when the citizen becomes an investigator," Mar Cabra emphasized.

In this interview with the Knight Center for Journalism in the Americas, the journalist talked about her team’s work at ICIJ, the opportunities that arose from collaboration between journalists and developers, and the role of data journalism in regaining public interest in long-form investigative news.

Knight Center: How did you start working with data journalism?

I went to Columbia University on a Fulbright scholarship to study investigative journalism. There I learned about data journalism and fell in love with it. I realized that it helped me tell stories in a more powerful, more systematic way, without relying so much on statements. I graduated in 2010, went back to Spain and was hired by ICIJ as an investigative and data journalist. In Spain, I helped create the first Master’s program in investigative journalism, data and visualization, at El Pais, and I kept organizing activities to promote the field. That is why in 2014, when the Consortium started thinking about creating a team dedicated specifically to data journalism, I was offered the chance to lead it.

KC: How does data journalism work inside ICIJ?

ICIJ has always conducted investigations that involve working with data. Since 2011, every ICIJ investigation I have worked on has involved data, but until 2014 there was no team specifically assigned to it. We started with two programmers and me. The team has grown, and today there are seven of us, four developers and three journalists, and four of us are women. We work on three fronts: analyzing data to deepen stories and discover new ones, helping build the interactive products for our investigations, and developing tools that promote collaboration among journalists from different countries, such as the platform we created to search more than 11 million Panama Papers documents.
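The interview does not describe how ICIJ’s search platform works internally. As a general illustration of how full-text search across millions of files is usually made fast, the Python sketch below builds a toy inverted index, a map from each word to the documents that contain it, and answers a query by intersecting those sets. The document IDs and texts are invented for the example, and a real system of this size would rely on a dedicated search engine rather than code like this.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of ASCII letters and digits as terms
    # (a toy rule: accented characters would split words here).
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document IDs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return IDs of documents containing every query term (AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    results = set(index.get(terms[0], set()))
    for term in terms[1:]:
        results &= index.get(term, set())
    return results


if __name__ == "__main__":
    # Hypothetical documents, for illustration only.
    docs = {
        "doc1": "Certificate of incorporation for Alpha Holdings Ltd",
        "doc2": "Letter to the shareholders of Alpha Holdings",
        "doc3": "Scanned passport copy",
    }
    index = build_index(docs)
    print(sorted(search(index, "alpha holdings")))  # ['doc1', 'doc2']
```

The index is built once, so each query only touches the lists for its own words instead of rereading every document.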

KC: Is it a big team considering ICIJ’s structure?

The data journalism team accounts for about half of the staff. It is essential to ICIJ and one of the organization’s main areas. We have been growing over the last two years because demand has increased. I remember my bosses asking, ‘If we hire two programmers, will we have enough work for them?’, and I said yes. Soon after, I told them we would need to hire one more. One of the programmers joined the staff to work exclusively on the Panama Papers documents. Working with that massive amount of data was a big challenge: the documents had to be prepared, and we needed extra help. The team grew organically, in response to new needs and to the important role that data and technology play in our daily lives.

KC: What was your first reaction when you came across the leaked documents?

When I received the documents, my first reaction was, ‘My God, how are we going to deal with this?’ It was an interesting challenge, but we had learned from previous leak-based investigations that we would have to spend time analyzing the data before starting any reporting. For the Panama Papers, we applied several lessons from earlier projects, and that helped us avoid repeating the same mistakes.

KC: How is this project different from the previous ones?

The amount of data. For Offshore Leaks, in 2013, we worked with 260 gigabytes of information; for the Panama Papers there were 2.6 terabytes, ten times as much! We also created a transnational team of more than 370 journalists, which required considerable effort in coordination and data management. Imagine having almost 400 people sending you questions! In terms of content, this leak allowed us to understand how tax havens operate like never before, because Mossack Fonseca is one of the world’s main firms creating offshore companies in tax havens, and it had high-profile clients. That also made the content more interesting journalistically.

KC: And how do you work with this amount of data?

We had already developed tools in previous projects, and this case allowed us to improve them. We created a communication forum, a kind of Facebook for reporters, where we shared everything we found, like a virtual newsroom. We also have a search engine that lets us search across the documents, like our own Google. Finally, we have a tool that shows the connections between individuals and offshore companies in tax havens, because the real owners don’t always appear directly. These three tools had been used in previous projects, especially Swiss Leaks, but this is the project in which we made the best use of them, and we improved them so they could handle so much data. The Consortium’s experience in working with cloud-based tools and doing collaborative journalism across borders is unique. Each of our projects gave us expertise that let us handle this massive amount of data more efficiently.
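To make the idea of a connections tool more concrete, here is a rough Python sketch, not ICIJ’s software, that treats people and companies as nodes in a graph and uses breadth-first search to find the shortest chain of relationships between two of them. This is how an indirect link, for example a person who controls a company through a foundation, can surface even when the person’s name never appears next to the company. All entities and relationships below are hypothetical.

```python
from collections import deque

# Hypothetical entities and relationships, for illustration only.
# The labels are kept for readability but ignored by the search.
EDGES = [
    ("Person A", "Foundation F", "beneficiary of"),
    ("Foundation F", "Company X", "shareholder of"),
    ("Nominee N", "Company X", "director of"),
    ("Person B", "Company Y", "shareholder of"),
]


def build_graph(edges):
    """Undirected adjacency list: each entity maps to its neighbours."""
    graph = {}
    for a, b, _label in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def find_path(graph, start, goal):
    """Breadth-first search: shortest chain of entities, or None if unlinked."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}  # remembers how each entity was reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through `previous` to rebuild the chain.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in sorted(graph[node]):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    graph = build_graph(EDGES)
    print(find_path(graph, "Person A", "Company X"))
    # ['Person A', 'Foundation F', 'Company X']
```

Because breadth-first search explores relationships one step at a time, the first chain it finds is also the shortest one.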

KC: Isn’t it risky for the journalist to work with sensitive data in the cloud?

Before we start any project, we analyze possible threats, and in the case of the Panama Papers we concluded that governments were not a threat. We were not dealing with NSA documents, as in Snowden’s case, which required more caution about using cloud tools. Once we had concluded that we could work remotely, we put several layers of security in place to protect the reporters. For example, our internal social network uses two-step authentication: besides entering a username and password, the reporter has to enter a code sent to his or her mobile phone. Two-step authentication is also required to access our document search engine and all the other tools hosted on our servers, which use encrypted connections. We also asked reporters to use PGP encryption to exchange messages.
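The interview does not say how ICIJ implemented its second step. As a rough illustration of the general idea behind a code sent to a phone, the Python sketch below issues a short-lived random code and checks it at login, after the password has already been verified. The five-minute lifetime, the function names and the record layout are all assumptions, and the actual delivery of the code (for example by SMS) is left out.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

CODE_TTL_SECONDS = 300  # assumption: codes expire after five minutes


def issue_code() -> tuple[str, dict]:
    """Create a random 6-digit code plus the record the server keeps.

    Only a hash of the code is stored; the plain code would be sent to the
    reporter's phone by a messaging service (not shown here).
    """
    code = f"{secrets.randbelow(10**6):06d}"
    record = {
        "code_hash": hashlib.sha256(code.encode()).hexdigest(),
        "expires_at": time.time() + CODE_TTL_SECONDS,
    }
    return code, record


def verify_code(submitted: str, record: dict, now: Optional[float] = None) -> bool:
    """Second step of login: accept only an unexpired, matching code."""
    now = time.time() if now is None else now
    if now > record["expires_at"]:
        return False
    submitted_hash = hashlib.sha256(submitted.encode()).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(submitted_hash, record["code_hash"])
```

A stolen password alone is then not enough: an attacker would also need the code that reached the reporter’s phone within the last few minutes.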

KC: Do you think there are newsrooms already capable of doing the data journalism work that your team did for the Panama Papers?

The truth is that the technical expertise of the team working on the Panama Papers far exceeds that of the team that worked on Swiss Leaks, the leak we published last year. Over the last year, I have seen data journalism teams growing in media outlets, but the difference is that inside newsrooms their main job is to take care of the stories their own outlet will publish. ICIJ’s value lies in being neutral territory, where we can make decisions at an international level and think about the story globally rather than in the interest of a specific outlet. Our mission is to support collaboration, improve the tools that make it possible, look for the global stories and help reporters get the most out of their local ones. And I believe this is unique in the world.

KC: What are the main challenges you face working with developers in your team?

They are different worlds, with different ways of working and organizing. Developers are much more organized; journalists are more chaotic. The biggest challenge is figuring out the best way to work together, and that requires establishing processes. I have been working with developers for over five years, and it is fascinating, because it gives you superpowers and lets you ask journalistic questions you couldn’t ask before. Without the developers on our data team we couldn’t have handled these terabytes of information so efficiently. Journalists need to become more technical: not only know how to work with Excel and databases, but also start learning to program and understand the different programming languages, so they can speak the same language as developers. When you work in a team, you need to know your partner, and here the partner has a different profession, so the more you know about that profession, the better you will work together.

KC: Initially you decided not to release the data from this investigation, but you recently released a database with information obtained from the leaked documents. How did you devise the strategy for releasing the data? Did the intention to release it come later?

From the beginning, the idea was to publish as much data as possible, but we knew there were ethical dilemmas and legal risks. We decided not to publish the 11.5 million documents we had access to in raw form, WikiLeaks-style, because we realized there were many personal documents, such as passports, identity documents and bank account details, and publishing them wholesale made no sense ethically, besides having legal implications. However, we saw that there was a portion of the documents we could publish: information about the shareholders of companies in tax havens. That information is considered secret, because secrecy is exactly what tax havens sell. Without the leak, it is very hard to find out who is behind these companies. We had to reconstruct Mossack Fonseca’s database to publish this part of the offshore registries and to bring some of the transparency that tax havens lack.

KC: What has been the impact of the Panama Papers?

The Offshore Leaks Database we published on May 9 has been a success. During the first six days after publication we had almost 5 million unique visitors and over 35 million page views. That was only the first six days, which gives you an idea of the interest in this data. And this is what data journalism allows: more interaction with users. We worked with the data, pulled out our stories, and then made the data public. Now individuals can search for their own stories, whether or not they are for the press. This is very interesting, because a new stage of the Panama Papers has started, in which the citizen becomes an investigator. Not only citizens, but also public officials: Colombia’s tax authorities used our data and found that over 60% of the Colombians with offshore accounts had not declared these companies. Previously, the journalist would publish the story and the work would end there. Today, data journalism allows more interaction with users, which we also see with interactive products. We created an interactive called "The Power Players", with over 60 stories linking politicians and their families to offshore companies in tax havens. It was one of the most viewed pages, showing that there is public interest in this kind of interactive product, which emerged with data journalism, proving its importance once again.
