I was at the EMC writer’s conference this past Friday, speaking on Text Analytics and ECM. The idea behind the conference is very cool. EMC brings together writers and bloggers from all over the world to discuss topics relevant to content management. All of the sessions were great. We discussed Cloud, Web 2.0, SharePoint, Text Analytics, and e-Discovery.
I want to focus here on the e-Discovery discussion, since e-Discovery has been showing up on my top text analytics applications list for several years. A growing number of vendors are looking to address this problem (although not all of them may be making use of text analytics yet), including large companies like EMC, IBM, Iron Mountain Digital, and Microsoft, and smaller providers such as ZyLAB.
Ralph Losey gave the presentation. He is a defense lawyer, by training, but over the years has focused on e-Discovery. Losey has written a number of books on the topic and he writes a blog called e-Discovery Team. An interesting fellow!
His point was that “The failure of American business to adopt ECM is destroying the American system of justice.” Why? His argument went something like this:
- You can’t find the truth if you can’t find the evidence. As the amount of digital data explodes, it is harder to find the information companies need to defend themselves. This is because the events surrounding the case might have occurred a year or more in the past, and the data is buried in files or email. I don’t think anyone will argue with this fact.
- According to Losey, most trial lawyers are Luddites; that is, they don’t get technology. Lawyers aren’t trained in technology, so they are not going to push for ECM systems and might not even know what they are. And corporate America is putting off decisions to purchase ECM systems that could actually help organize some of the content and make it more findable.
- Meanwhile, the cost of litigation is skyrocketing. Since it is so expensive, many companies don’t go to court and they look to private arbitration. Why spend $2M in e-Discovery when you can settle for $3M? Losey pointed to one example, in the Fannie Mae securities litigation (2009), where it cost $6M (or 9% of the annual budget of the Office of Federal Housing Enterprise Oversight) to comply with ONE subpoena. This involved about 660,000 emails.
- According to Losey, it costs about $5 to process one computer file for e-Discovery. This is because the file needs to be reviewed for relevance, privilege, and confidentiality.
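To put the figures above side by side, here is a quick back-of-the-envelope sketch. It uses only the numbers quoted from Losey's talk; the function names are mine, and it assumes (perhaps wrongly) that the $6M covered little beyond those 660,000 emails, so treat the per-email figure as a rough upper bound rather than an audited cost.

```python
# Figures quoted in the post. Everything else here is illustrative.
FANNIE_MAE_COST_USD = 6_000_000   # cost to comply with one subpoena
FANNIE_MAE_EMAILS = 660_000       # emails involved
COST_PER_FILE_USD = 5             # Losey's rough per-file processing estimate


def cost_per_item(total_cost_usd: float, item_count: int) -> float:
    """Average cost per item, assuming the whole cost is spread over the items."""
    return total_cost_usd / item_count


def review_cost(file_count: int, cost_per_file_usd: float = COST_PER_FILE_USD) -> float:
    """Estimated cost to process a given number of files at a flat per-file rate."""
    return file_count * cost_per_file_usd


per_email = cost_per_item(FANNIE_MAE_COST_USD, FANNIE_MAE_EMAILS)
at_flat_rate = review_cost(FANNIE_MAE_EMAILS)

print(f"Implied cost per email in the Fannie Mae example: ${per_email:.2f}")
print(f"Cost of 660,000 files at $5 per file: ${at_flat_rate:,.0f}")
```

Under those assumptions the subpoena works out to roughly $9 per email, which is the same order of magnitude as the $5 per file estimate. Either way, the cost scales directly with the number of files a person has to look at.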
Can the American justice system be saved?
So, can e-Discovery tools be used to help save the justice system as we know it? Here are a few points to ponder:
- Losey seems to believe that the e-Discovery process may be hard to automate since it requires a skilled eye to determine whether an email (or any file for that matter) is admissible in court.
- I’m not even sure how much corporate email is actually being stored in content management systems – even when companies have content management systems. It’s a massive amount of data.
- And then there is the issue of how much email companies will want to save to begin with. Some will store it all because they want a record. Others seem to be moving away from email altogether. For example, one person in the group told us that his Bank of America financial advisor can no longer communicate with him via email! This opens up a whole different can of worms, which is not worth going into here.
- Then there is the issue of vocabularies that differ between departments within a company, and of people dropping certain phrases once those phrases get media attention, and so on.
Before jumping to any conclusions, let’s look at what vendors can do. According to EMC, the email overload problem can be addressed. The first thing to do is to de-duplicate emails that could be stored in a content management system. Think about it. You get an email and 20 people are copied on it. Or you forward someone an email and they don’t necessarily delete it. These emails pile up. De-duplicating them would go a long way toward reducing the amount of content in the ECM. Then there is the matter of classifying these emails. That could be done, and some of this classification would be straightforward. The system might also be trained to look for emails that might be privileged and classify them accordingly, but this would no doubt still require human intervention. Of course, terminology will change as well, and people will have to stay on top of this.
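For readers who want to see what de-duplication means in practice, here is a minimal sketch. It is not EMC's method (I don't know how their products do it); it simply assumes that two copies count as duplicates when sender, subject, and body match after light normalization. The `Email` class, its fields, and the function names are all hypothetical.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Email:
    """Hypothetical, simplified email record."""
    sender: str
    subject: str
    body: str
    mailbox: str  # whose mailbox this copy was found in


def fingerprint(email: Email) -> str:
    """Hash the normalized sender, subject, and body (mailbox is ignored)."""
    normalized = "\n".join([
        email.sender.strip().lower(),
        email.subject.strip().lower(),
        " ".join(email.body.split()).lower(),  # collapse whitespace
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(emails):
    """Keep one copy per fingerprint, remembering every mailbox that held it."""
    store = {}
    for email in emails:
        key = fingerprint(email)
        if key not in store:
            store[key] = (email, [email.mailbox])
        else:
            store[key][1].append(email.mailbox)
    return store


# One message copied to 20 people ends up as a single stored item.
copies = [
    Email("alice@example.com", "Q3 numbers", "See attached.", f"user{i}@example.com")
    for i in range(20)
]
store = deduplicate(copies)
print(len(copies), "copies ->", len(store), "stored item(s)")
```

Note that exact matching like this handles the "20 people copied" case but not forwards, since a forwarded message usually carries a changed subject line and added text. Catching those would need some kind of near-duplicate matching, which is where text analytics starts to earn its keep.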
The upshot is that there are certainly hurdles to overcome to put advanced classification and text analytics in place to help in e-Discovery. However, as the amount of digital information keeps piling up, something has to be done. In this case, the value certainly would seem to outweigh the cost of business as usual.
Interesting post Fern!
I never thought about all the digital data that is “piling up.” De-duplicating email is one solution but as a consultant I tend to look for the root of the problem and work to find solutions that prevent it from occurring in the first place. Has anyone thought about designing email systems that prevent or at least minimize duplication?
I know I have a lot of other digital junk on my PC that is my own doing. I have versions upon versions of the same document – all incremental saves to protect the data from a crash. I know I should clean it up but since storage is cheaper than my time, I don’t.
Regarding Bank of America’s email policy, I assume they now want their financial advisors to electronically communicate only via their system. Sounds like a good idea to me. Presumably their system will do a much better job of managing the communication, from a content perspective, than an email system. There are drawbacks but from a content and controls perspective, I like it.
Jim Catalano
Fern, do you have a link to the official page for the conference? It sounds great; I might attend next year.
Dan Gordon