A project closes in on a protocol to improve e-discovery results
ABA Journal (American Bar Association), 2009-04-02. Author: Jason Krause
Intro: Three years ago, a handful of lawyers and scientists started a quest: a project to save litigation from being buried in an avalanche of electronic documents. Since then, the Text Retrieval Conference Legal Track has been using different types of computer searches to wade through huge piles of digital information, hoping to get closer to a complete picture of what in a computer’s data stores is relevant to the issues in a case.
The good news: The TREC Legal Track team believes it is close to finding a protocol that can work. The bad: The project also found disturbing problems with the way lawyers work today.
And the harshest conclusion: Keyword searching—what most lawyers use to find litigation documents—misses the majority of relevant documents.
. . .
Later, as director of litigation for the U.S. National Archives and Records Administration, Baron was assigned a request to review documents pertaining to tobacco litigation in U.S. v. Philip Morris. . . .
“It was obvious to me that the volume of information was overwhelming us in litigation, and the technology we have to deal with it was just not sufficient,” Baron says.
He figured someone somewhere in the federal government must have done some research on the topic of information retrieval. In fact, he discovered that the U.S. Department of Commerce’s National Institute of Standards and Technology had been conducting a 15-year investigation on retrieval of text from large document collections.
When Baron approached the government scientists involved, they were thrilled to have a real-world problem to tackle as part of what had been a pure research project. TREC Legal Track, begun in 2006, is now co-sponsored by NIST and subagencies of the U.S. Office of the Director of National Intelligence. . . .
Here’s where the tobacco litigation archive comes in. Legal Track is using the nearly 7 million publicly available documents from the master settlement agreement database, a collection of tobacco documents produced in relation to several state lawsuits against the industry. That database was chosen because it contains a wide spectrum of types of documents.
At that target cache, TREC Legal Track is aiming 13 hypothetical legal complaints. Written like normal legal documents, they contain all the information included in real-world complaints for fictional tobacco-related lawsuits, such as campaign finance violations, class actions, antitrust investigations, securities litigation, patent infringement and wrongful death suits. The most important part is the search terms these hypotheticals lay out.