Exploring Technology Assisted Review
2nd August 2017
Following last year’s ruling in the UK High Court endorsing the use of Technology Assisted Review (TAR), the UK eDiscovery industry has moved to incorporate predictive coding software into document review.
What exactly is Technology Assisted Review?
Technology Assisted Review (or TAR for short) has become a crucial tool within eDiscovery, driven by the ever-increasing volumes of data generated and the disproportionate time and cost of reviewing electronic documents manually. TAR is software built on mathematical algorithms and statistical sampling to code documents automatically. The software is trained, using a seed set of documents coded by an expert, to determine what is a ‘relevant’ document and what is not.
At CYFOR, we use Relativity Assisted Review (RAR) as our predictive coding tool. RAR uses functionality called ‘Categorisation’ to arrange the documents into groups of ‘Relevant’ and ‘Not Relevant’ documents. Categorisation uses Relativity’s analytics engine to look at textual concepts within a document set. This is based on a type of textual analytics called Latent Semantic Indexing (LSI). The analytics engine will look at concepts within a document and identify other documents containing similar textual content. This allows us to teach the system about the types of documents we are interested in and then allow the analytics engine to categorise them accordingly.
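The core idea of LSI-based categorisation can be sketched in a few lines. The following is an illustrative toy model, not Relativity’s actual implementation: documents are projected into a low-dimensional ‘concept’ space via a truncated singular value decomposition, and each uncategorised document takes the label of the example document it is most similar to in that space. All document text and labels here are made up for the demonstration.

```python
import numpy as np

def categorise(examples, uncategorised, k=2):
    """Assign each uncategorised document the label of its most similar
    example document in the k-dimensional LSI concept space."""
    docs = list(examples) + list(uncategorised)
    vocab = sorted({w for d in docs for w in d.split()})
    # Term-document matrix: one row per document, one column per term.
    A = np.array([[d.split().count(t) for t in vocab] for d in docs], float)
    # Latent Semantic Indexing: truncated SVD keeps only the k strongest concepts.
    U, s, _ = np.linalg.svd(A, full_matrices=False)
    concept = U[:, :k] * s[:k]          # each row: a document in concept space
    labels = list(examples.values())
    result = {}
    for i, doc in enumerate(uncategorised, start=len(examples)):
        # Cosine similarity between this document and each example document.
        sims = [concept[i] @ concept[j]
                / (np.linalg.norm(concept[i]) * np.linalg.norm(concept[j]) + 1e-12)
                for j in range(len(examples))]
        result[doc] = labels[int(np.argmax(sims))]
    return result

# Hypothetical seed examples coded by a reviewer.
examples = {
    "merger agreement signed by both parties": "Relevant",
    "quarterly financial results and merger terms": "Relevant",
    "office party lunch menu next friday": "Not Relevant",
}
print(categorise(examples, ["draft merger terms for review",
                            "lunch order for friday"]))
```

Real analytics engines work on far richer representations than raw word counts, but the shape of the workflow is the same: examples define regions of the concept space, and similarity to those examples drives the categorisation.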
CYFOR works alongside clients in the early stages of litigation to determine whether assisted review is the best way forward for the project at hand. Once instructed by a client to carry out an assisted review project, the eDiscovery team guides the reviewers through the necessary steps to achieve the desired outcome:
Control Set Round
Relativity needs to be able to measure the accuracy of the assisted review project. Accuracy is measured using a control, or ‘truth’, set: a statistically significant random sample taken from the data set. These documents are batched out for manual review and coded simply as either ‘Relevant’ or ‘Not Relevant’. The results of this round provide the benchmark for the F1 measure, a calculation used to monitor the stability of the project.
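The F1 measure is the harmonic mean of precision and recall against the control set. A minimal sketch of the calculation, using made-up coding decisions rather than any real project data:

```python
def f1_score(truth, predicted, positive="Relevant"):
    """F1: harmonic mean of precision and recall for the positive category."""
    tp = sum(t == positive and p == positive for t, p in zip(truth, predicted))
    fp = sum(t != positive and p == positive for t, p in zip(truth, predicted))
    fn = sum(t == positive and p != positive for t, p in zip(truth, predicted))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# Control-set coding (manual review) vs the system's categorisation.
truth     = ["Relevant", "Relevant", "Not Relevant", "Relevant", "Not Relevant"]
predicted = ["Relevant", "Not Relevant", "Not Relevant", "Relevant", "Relevant"]
print(f1_score(truth, predicted))  # precision 2/3, recall 2/3 -> F1 = 2/3
```

Because F1 combines both error directions (relevant documents missed and irrelevant documents wrongly included), it gives a single stability figure to track from round to round.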
Pre-coded Seed Round
Training the system to code documents effectively can take time. Where a manual review has already been carried out on a set of documents, those documents can be used as pre-coded seeds. Coded in the same designation field created for the assisted review project, they serve as ready-made examples of ‘Relevant’ and ‘Not Relevant’ documents from which further documents in the assisted review project are categorised.
Training Rounds
Training rounds are carried out to teach the system how to categorise documents. During each training round, a sample of documents is batched out and manually reviewed, preferably by someone particularly familiar with the case. Documents are coded simply as either ‘Relevant’ or ‘Not Relevant’. A document can also be tagged as an example by ticking a ‘Use as Example’ box, in instances where it has a good amount of text and is deemed a good example of a ‘Relevant’ or ‘Not Relevant’ document. Alternatively, an extract of text can be copied from a document and pasted into a text box named ‘Use Text Excerpt’.
See below layout example:
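As a rough sketch, a reviewer’s coding decision in a training round carries the fields described above. The field names here are illustrative only, not Relativity’s actual schema:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrainingDecision:
    """One reviewer decision in a training round (hypothetical field names)."""
    doc_id: str
    designation: str                    # 'Relevant' or 'Not Relevant'
    use_as_example: bool = False        # ticked when the document has good text
    text_excerpt: Optional[str] = None  # optional pasted extract of key text

def example_seeds(decisions):
    """Pick out the decisions the engine should learn from: documents
    ticked as examples, plus any carrying a pasted text excerpt."""
    return [d for d in decisions if d.use_as_example or d.text_excerpt]

decisions = [
    TrainingDecision("DOC001", "Relevant", use_as_example=True),
    TrainingDecision("DOC002", "Not Relevant"),
    TrainingDecision("DOC003", "Relevant", text_excerpt="termination clause"),
]
print([d.doc_id for d in example_seeds(decisions)])  # ['DOC001', 'DOC003']
```

The point of the example flag is quality control on the training signal: only documents (or excerpts) with genuinely representative text feed the categorisation engine.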
Quality Control Round
On completion of the training rounds, a quality control round is executed on the categorised documents to test how accurately the system has grouped them. A sample of the documents categorised during the training rounds is batched out for manual review. The results show how many documents the system categorised correctly and how many have been ‘overturned’. An ‘overturn’ occurs where the system has categorised a document as, for example, ‘Relevant’ and the manual review has coded it as ‘Not Relevant’. The number of ‘overturns’ can be measured and analysed to identify and correct issues within the assisted review project.
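The overturn comparison itself is simple to sketch. The data below is illustrative; in practice Relativity produces these figures within its own reports:

```python
def overturn_rate(system, manual):
    """Share of QC-sampled documents where the manual reviewer
    disagreed with ('overturned') the system's categorisation."""
    overturns = sum(s != m for s, m in zip(system, manual))
    return overturns / len(system)

# System categorisation vs manual QC coding for a sampled batch.
system = ["Relevant", "Relevant", "Not Relevant", "Relevant"]
manual = ["Relevant", "Not Relevant", "Not Relevant", "Relevant"]
print(overturn_rate(system, manual))  # 1 disagreement out of 4 -> 0.25
```

A high overturn rate signals that the training examples need revisiting, often prompting a further training round before the project can be considered stable.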
The diagram below, presented by kCura, shows the review cycle of a Relativity Assisted Review project:
In summary, the aim of the training and QC rounds is to categorise as many documents as possible, as accurately as possible, in line with the F1 score (a measure of a test’s accuracy) agreed at the start of the project. The end result is a set of coded documents which can be used for a timely production, to prioritise review by pushing the ‘Relevant’ documents to the review team first, or to batch out all the ‘Relevant’ documents for review. Whatever the reason for your assisted review project, predictive coding will be utilised more and more as data sizes grow and manual document review costs soar. Technology Assisted Review is fast becoming an essential tool in litigation, prioritising documents for review whilst reducing time and overall costs.