Panama Papers – using eDiscovery power to process the data
5th April 2016
The recently published Panama Papers have been described as the biggest data leak in journalistic history. They are the result of an anonymous source leaking information to the German newspaper Süddeutsche Zeitung. The leaked data comprised approximately 11.5 million encrypted internal documents, totalling 2.6 terabytes, and detailed the activities of Mossack Fonseca, a Panamanian law firm that sells anonymous offshore companies around the world.
These offshore shell companies are legal within the jurisdictions in which they are registered. However, the subsequent investigation has revealed that entities set up by Mossack Fonseca have helped clients in a long list of unlawful activities, including money laundering, tax evasion, drug trafficking and fraud.
The 2.6 TB
Süddeutsche Zeitung analysed the data in cooperation with the International Consortium of Investigative Journalists (ICIJ), which had already been involved in Offshore Leaks, Luxembourg Leaks and Swiss Leaks. However, the 2.6 terabytes of data in the Panama Papers far exceeds the combined total of these earlier leaks, even with Wikileaks/Cablegate included:
- Wikileaks/Cablegate (2010) – 1.7 GB
- Offshore Leaks (2013) – 260 GB
- Luxembourg Leaks (2014) – 4 GB
- Swiss Leaks (2015) – 3.3 GB
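To put the gap in numbers, the short Python sketch below adds up the leak sizes listed above and compares the total with the Panama Papers. It assumes decimal units (1 TB = 1,000 GB), which the figures quoted here do not specify, and the function names are purely illustrative.

```python
# Leak sizes in gigabytes, as quoted in the list above.
LEAK_SIZES_GB = {
    "Wikileaks/Cablegate (2010)": 1.7,
    "Offshore Leaks (2013)": 260.0,
    "Luxembourg Leaks (2014)": 4.0,
    "Swiss Leaks (2015)": 3.3,
}

# Assumption: decimal units, so 1 TB = 1,000 GB.
PANAMA_PAPERS_GB = 2.6 * 1000


def combined_size_gb(sizes):
    """Return the total size of all leaks in the mapping, in GB."""
    return sum(sizes.values())


def size_ratio(target_gb, sizes):
    """Return how many times larger target_gb is than the combined leaks."""
    return target_gb / combined_size_gb(sizes)


if __name__ == "__main__":
    total = combined_size_gb(LEAK_SIZES_GB)
    ratio = size_ratio(PANAMA_PAPERS_GB, LEAK_SIZES_GB)
    print(f"Combined earlier leaks: {total:.1f} GB")
    print(f"Panama Papers is roughly {ratio:.1f}x larger")
```

On these figures the four earlier leaks add up to about 269 GB, so the Panama Papers are nearly ten times larger than all of them put together.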
The 11.5 million documents within the Panama Papers consist primarily of emails, databases, PDF files and image files covering a period of over 30 years.
To process this vast amount of information, Süddeutsche Zeitung and ICIJ used the eDiscovery analytics software Nuix. Nuix allowed the 2.6 terabytes of information to be processed and indexed at high speed, with optical character recognition (OCR) applied to transform the data into machine-readable, searchable text. The data was also de-duplicated, which is critical when dealing with high volumes of information and in this case removed approximately a third of the data.
By loading this information into Nuix, the journalists involved were able to apply specific search terms and create lists of the individuals named within the data, which has subsequently highlighted the involvement of politicians, FIFA officials, fraudsters, drug smugglers, celebrities and professional athletes.
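As an illustration of the de-duplication idea, here is a minimal Python sketch that drops exact duplicates by comparing content hashes. It is a generic technique and not a description of how Nuix works internally, which this article does not cover. The function names and sample file names are hypothetical.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a SHA-256 fingerprint of the raw document bytes."""
    return hashlib.sha256(data).hexdigest()


def deduplicate(documents):
    """Split (name, bytes) pairs into unique documents and exact duplicates.

    Returns a tuple (unique, duplicates) where `unique` keeps the first
    copy of each distinct content and `duplicates` lists
    (duplicate_name, name_of_first_copy) pairs.
    """
    first_seen = {}   # hash -> name of the first document with that content
    unique = []
    duplicates = []
    for name, data in documents:
        digest = content_hash(data)
        if digest in first_seen:
            duplicates.append((name, first_seen[digest]))
        else:
            first_seen[digest] = name
            unique.append((name, data))
    return unique, duplicates


if __name__ == "__main__":
    # Hypothetical sample data: two byte-identical emails and one PDF.
    sample = [
        ("mail_001.eml", b"Please find the contract attached."),
        ("mail_002.eml", b"Please find the contract attached."),
        ("scan_001.pdf", b"%PDF-1.4 example"),
    ]
    unique, duplicates = deduplicate(sample)
    print(f"{len(unique)} unique, {len(duplicates)} duplicate(s)")
```

Hashing only catches byte-for-byte copies. Near-duplicates, such as the same email with a different footer, would need fuzzier comparison, which is outside the scope of this sketch.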
Rise of the Machines
Before the advent of the digital age, journalists, lawyers and anyone else needing to process high volumes of data to find the information relevant to a specific matter faced a daunting task. Mountainous piles of paper were the norm, and going through them by hand was an exhausting and time-consuming process.
However, as technology accelerated over the years, eDiscovery was born. With it came powerful review platforms such as Nuix and Relativity, which enable users to process volumes of data that in the past would have been deemed impossible. The comparisons below put that into perspective:
- Byte of data – one grain of rice
- Kilobyte – cup of rice
- Megabyte – 8 bags of rice
- Gigabyte – 3 container lorries
- Terabyte – 2 container ships
- Petabyte – Covers Manhattan
- Exabyte – Covers the UK 3 times
- Zettabyte – Fills the Pacific Ocean
As a comparison with the use of Nuix to process large amounts of electronically stored information (ESI) for the Panama Papers, in 2014 CYFOR were instructed by the company of a Ukrainian industrialist and political figure involved in multi-million-pound litigation. The case was being fought in the English courts, and CYFOR were asked to conduct a large-scale and complex eDiscovery investigation involving 2.5 terabytes of data held on a variety of data sources and across multiple companies in the litigation.
The commercial dispute centred on agreements made, or not made, between the parties. The alleged breach of contract and breach of trust required CYFOR to identify ESI relevant to the case. A keyword analysis was then run using 450 concurrent keywords in three different languages, across a multitude of electronic documents.
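A keyword analysis of this kind can be sketched in a few lines of Python. The example below is a simplified illustration only, not the tooling used on the case: it runs a list of keywords against a set of documents in a single pass, matching whole words regardless of case. The keywords and documents are invented for the example.

```python
import re
from collections import defaultdict


def build_pattern(keywords):
    """Compile one case-insensitive, whole-word pattern for all keywords."""
    # Longest keywords first, so the longer alternative wins on overlap.
    escaped = sorted((re.escape(k) for k in keywords), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b", re.IGNORECASE)


def search_documents(documents, keywords):
    """Return {doc_id: set of matched keywords (lower-cased)} for responsive docs."""
    pattern = build_pattern(keywords)
    hits = defaultdict(set)
    for doc_id, text in documents.items():
        for match in pattern.finditer(text):
            hits[doc_id].add(match.group(0).lower())
    return dict(hits)


if __name__ == "__main__":
    # Hypothetical keywords and documents, for illustration only.
    keywords = ["contract", "договор"]
    documents = {
        "doc-1": "The Contract was signed in March.",
        "doc-2": "Договор подписан.",
        "doc-3": "Lunch menu for Friday.",
    }
    for doc_id, matched in search_documents(documents, keywords).items():
        print(doc_id, sorted(matched))
```

Whole-word matching misses inflected forms, which matters in languages such as Russian and Ukrainian where word endings change with grammatical case. A real review would typically add stemming or wildcard terms to handle this.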
CYFOR’s eDiscovery investigation team were dispatched to the client’s headquarters in the Ukrainian capital of Kiev. At the client’s request, the entire digital forensic investigation was to be conducted at its location, where the team set up a laboratory that conformed to the high standards required of a forensically sound process. Then, over 4 phases totalling 33 days, the investigation of the documentary evidence was carried out.
John Young, CYFOR expert and lead investigator on the case, commented:
“It is not unusual – as is the case here – that a client requires that the evidence remain on its premises. We could not bring the evidence back to our laboratories in the UK, so we were first challenged to create the environment we needed whilst on site. During the time my investigation team were involved in the project, Kiev was in political turmoil but we were able to fly our specialist kit over. Responding to this kind of request would drain the resources of many other eDiscovery providers, especially within a politically charged environment, something my experienced investigators are well used to handling.”
CYFOR engaged Nuix to process, search and index the 2.5 terabytes of data stored in emails, documents, SMS text messages, web logs and other electronic artefacts, taking 6 days in all for this initial phase.
A total of 1.7 million items were found to respond to the keyword search. By applying advanced searching and automated technologies, CYFOR eliminated irrelevant documents, enabling focus on a more manageable fraction of the ESI that was crucial to the matter. The final export totalled 40,000 relevant items.
John Young added:
“To add a layer of sophistication to the project, the documents we were tasked to analyse were in English, Ukrainian, Polish and Russian – a common issue facing international organisations. As anyone who knows Russian will tell you, translating into English is no easy task; there are 12 different word endings alone that are not present in the English language, Ukrainian has 16!”
John Young concluded:
“Invariably the most expensive part of the litigation process is eDiscovery, but it is also often hugely disruptive. Not only must digital forensics investigators have the resources to react to challenging client requests like maintaining confidentiality and privilege in cross-border and multi-jurisdictional matters, they provide a cost-effective solution when managing vast amounts of electronic documents.”