Tips and Trends Industry Advice and Developments

DIY E-Discovery: Computer Files and Electronic Evidence 101

A big part of a lawyer’s job is to identify and present evidence, and most evidence in today’s world is digital — existing as a series of 1s and 0s on computers. 
Jeff Kerr

Unfortunately, many lawyers have no idea how to identify, preserve, collect, produce or present digital evidence. As a result, they fail to discover helpful evidence or fail to produce evidence to which the other side is entitled. In this guide, we’ll cover the basics that you need to know about computer files as they relate to e-discovery.

Other than witness testimony, nearly all the evidence worth discovering in litigation is electronic. More often than not, it’s emails, text messages, digital images and video, metadata, and Microsoft Office files that tell us the truth about what really happened.

In the context of software applications, a native file is one that is still in the file format of the application in which it was originally created. Notably, a file can be described as native only with reference to its origin.

For example, an Adobe Portable Document Format (PDF) file could be native in one scenario but non-native in another. If the PDF document was originally created in Microsoft Word but later saved as a PDF file (or virtually “printed” to a PDF file), then the PDF version of the Word file would not be a native file. On the other hand, a PDF file would be a native file if it was originally created in an Adobe application.

WHY ARE NATIVE FILES THE BEST OPTION?

One of the biggest obstacles for a lawyer trying to obtain electronically stored information (ESI) from the opposing party is the widespread practice of converting electronic evidence from its original, native form into some other form — and, in the process, degrading its ability to reveal the truth.

Instead of sending over the original files of spreadsheets and text documents, many firms will first convert those files to TIFF images or PDFs. This can result in profound differences in the information available in the file. In their new format, converted files are stripped of their original metadata, such as the creation time stamp, author, etc.

In addition to stripping away relevant case information, converting native files to a different form also provides a convenient way to hide key documents in an avalanche of unsearchable irrelevance. While native electronic files are generally text‑searchable, conversion frequently strips out searchable text, making it more difficult to get the information you need.

THE SIMPLE TRICK TO GETTING NATIVE FILES

Ask for them! It’s best to explain why you need native files at the earliest possible meeting with your adversary, to document your request for native files, and to include detailed instructions for producing native files in your document requests.

If your opponent refuses to comply, it’s often worth your time to take the issue to court. Discovery is about revealing the truth, and native documents serve this goal much better than degraded pseudo-copies in TIFF, PDF or paper form.

VERIFYING AUTHENTICITY

Computer files are ultimately nothing but a series of 1s and 0s. This series remains the same unless the file is modified. For example, if we opened a document in Microsoft Word and changed one of the sentences, we would expect the series of 1s and 0s to be changed. If we did not change the file but instead merely moved it to a different directory or emailed it to a coworker, we would expect the series of 1s and 0s to remain exactly the same.

The easiest way to tell whether a file has been changed, or whether two files are in fact identical, is to compare MD5 hashes. The MD5 algorithm computes a 32-digit hexadecimal number (128 bits), or checksum, from a file’s contents, and that number serves as a digital fingerprint for the file.
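As a rough illustration of how such a checksum can be computed, here is a minimal Python sketch built on the standard library's hashlib module. The function name md5_of_file and the file name contract.docx are hypothetical and chosen only for this example; dedicated e-discovery or forensic tools may compute and record hashes differently.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the MD5 checksum of a file as a 32-character hex string."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in 1 MB chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # "contract.docx" is a hypothetical file name used only for illustration.
    sample = Path("contract.docx")
    if sample.exists():
        print(sample.name, md5_of_file(sample))
```

The checksum depends only on the bytes inside the file, so renaming the file or moving it to another folder should not change the result.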

The MD5 function has the following properties:

  • The same file will always have the same MD5 checksum.
  • Different files will always* have different MD5 checksums.
  • A small change in the data that makes up a file, even a change to a single bit, will produce a completely different MD5 checksum.

Because of these properties, a file’s MD5 checksum can be usefully considered a fingerprint for the document.
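These properties can be checked with a few lines of Python. The sketch below is only an illustration: the sample sentence and the helper name md5_hex are made up for this example, and it hashes short byte strings in memory rather than real evidence files.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 checksum of a byte string as hex text."""
    return hashlib.md5(data).hexdigest()


# Hypothetical file contents, used only for illustration.
original = b"The parties agree to a payment of $10,000."

# An exact copy: the same bytes, as when a file is moved or emailed.
same_copy = bytes(original)

# Flip the lowest bit of the first byte to simulate a one-bit change.
one_bit_changed = bytes([original[0] ^ 0x01]) + original[1:]

print("original:       ", md5_hex(original))
print("exact copy:     ", md5_hex(same_copy))
print("one bit changed:", md5_hex(one_bit_changed))
```

The first two checksums match because the bytes are identical. The third is not a slightly altered version of the original checksum but a different number altogether, which is what makes the checksum useful for detecting changes.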

The fact that even a tiny change in the content of a native file will dramatically alter the file’s MD5 checksum is useful for many purposes. When a native file constitutes evidence in a legal proceeding, its MD5 checksum should be computed at the earliest opportunity. With the MD5 checksum in hand, proving that the file was not modified during the pendency of the case is as simple as recalculating its MD5 checksum and verifying that it is the same as it was at the beginning. 
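A minimal sketch of that record-then-verify workflow in Python follows. The function names current_md5 and is_unmodified are hypothetical, and the sketch assumes the checksum recorded at the start of the case was stored as ordinary hexadecimal text.

```python
import hashlib
from pathlib import Path


def current_md5(path):
    """Compute the file's MD5 checksum (reads the whole file into memory)."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def is_unmodified(path, recorded_checksum):
    """Return True if the file still matches the checksum recorded earlier."""
    # Normalize case and stray whitespace in the recorded value.
    return current_md5(path) == recorded_checksum.strip().lower()
```

In use, the value returned by current_md5 would be written down when the file is first collected. Later, is_unmodified is called with the same file and the recorded value, and a False result indicates that the contents have changed since collection.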

*So-called “collisions,” in which two different files have the same MD5 hash, can be engineered, but the odds of a natural collision are so microscopically small that you can safely rule them out.