Text Analytics Meets Enterprise Content Management

Late last summer, Hurwitz & Associates published a report on text analytics. As part of the report, we surveyed companies that had deployed the technology, were planning to deploy it, or had no plans to deploy it. We asked companies planning to deploy text analytics whether they planned to integrate it with their BI solution and whether they planned to implement it with their content management solution. It turns out that the majority planned to use text analytics with both their BI and content management systems. In fact, sixty-two percent stated that they planned to deploy the technology in conjunction with their content management systems.


A few definitions are in order. Enterprise Content Management (ECM) generally refers to a set of technologies used to acquire, manage, store, and serve up content. Content management software usually has some sort of categorization capability to help classify content, and search to help access it. But what about actually analyzing this unstructured content? That’s where text analytics comes in. Hurwitz & Associates defines text analytics as the process of analyzing and extracting relevant unstructured text and transforming it into structured information that can then be mined and analyzed in various ways.

Currently, the most popular method of deploying text analytics is as part of a business intelligence solution. This makes sense: it is a comfortable paradigm. So, how would deploying text analytics with a content management system work? Text analytics can be used as part of a content management solution in many ways, including:

·        To help feed the content repository.  Text analytics can help categorize or enrich content.  In life sciences as well as other industries, regulations are mandating that notes, etc. be put in electronic form.

·        To better categorize related documents.  Here, text analytics acts as a vertical application on top of a repository.  For example, in the area of legal compliance, if one document is tagged as sensitive in a legal case, then it is necessary to find all other documents that relate to it.

·        To better analyze information in the repository.  Here, text analytics is a means to extract information from the content repository and use it for analysis.  This might include, for example, extracting information from email complaints, merging it with information found in other systems, and analyzing the combined result.

·        As part of the workflow.  Here, as digital assets come into a content repository, information is extracted, merged with other enterprise information, and fed into the workflow process.  For instance, in the email example above, the idea would be to pull information out of the email, merge it with information about the customer (invoices, etc.) found in other systems, and feed the result to a customer care agent.

The text analytics software can sit on top of the content management repository.  It can access the content via a pre-built connector that acts as the gateway to the repository and retrieves the documents.
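To make the workflow scenario concrete, here is a minimal sketch in Python of the email example: a connector retrieves a document, a few toy rules pull out structured facts, and those facts are merged with customer data from another system. Everything in it is an assumption made for illustration. The names `InMemoryConnector`, `extract_facts`, and `build_agent_view` are hypothetical, the keyword and pattern rules stand in for real linguistic analysis, and none of it reflects any vendor's actual API.

```python
import re
from dataclasses import dataclass

# All names here are hypothetical; no vendor API is implied.


@dataclass
class Document:
    doc_id: str
    sender: str
    body: str


class InMemoryConnector:
    """Stand-in for a pre-built connector to a content repository."""

    def __init__(self, documents):
        self._documents = {d.doc_id: d for d in documents}

    def fetch(self, doc_id):
        return self._documents[doc_id]


# Toy rules standing in for real text analytics.
COMPLAINT_TERMS = ("late", "damaged", "refund", "overcharged")
INVOICE_PATTERN = re.compile(r"\binvoice\s+#?(\d+)", re.IGNORECASE)


def extract_facts(doc):
    """Turn free text into a small structured record."""
    text = doc.body.lower()
    return {
        "doc_id": doc.doc_id,
        "sender": doc.sender,
        "invoices": INVOICE_PATTERN.findall(doc.body),
        "complaint_terms": [
            term for term in COMPLAINT_TERMS if re.search(rf"\b{term}\b", text)
        ],
    }


def build_agent_view(facts, customers):
    """Merge extracted facts with customer data held in another system."""
    customer = customers.get(facts["sender"], {})
    return {
        **facts,
        "customer_name": customer.get("name"),
        "open_invoices": customer.get("open_invoices", []),
    }


if __name__ == "__main__":
    connector = InMemoryConnector([
        Document(
            "email-1",
            "pat@example.com",
            "Invoice #1042 arrived late and the box was damaged. I want a refund.",
        ),
    ])
    customers = {"pat@example.com": {"name": "Pat Lee", "open_invoices": ["1042", "1050"]}}
    facts = extract_facts(connector.fetch("email-1"))
    print(build_agent_view(facts, customers))
```

In a real deployment the connector, the extraction step, and the customer lookup would each be supplied by the content management, text analytics, and enterprise systems involved; the sketch only shows how the pieces hand data to one another.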

Once the documents are retrieved, information (terms, facts, etc.) can be extracted from them and then stored in the text analytics vendor’s repository, in another data store, or within the content management system itself.  Hurwitz & Associates is seeing a small but growing number of companies looking to implement this type of model.  So far, only a handful of content management vendors are providing this functionality.  Let’s look at some of the top players:

·        OpenText.  As part of its LiveLink ECM product, OpenText is building text analytics that will be delivered in a series of solutions.  These solutions will link ECM to other enterprise systems and provide algorithms to identify entities, relationships, etc. in the content and applications and use them, for example, as part of a workflow process.  The infrastructure will be built and in place in the next 12 to 18 months.

·        IBM.  Last summer, IBM announced the integration of its Omnifind Analytics Edition product with ECM solutions, including its FileNet products.  Omnifind Analytics Edition uses linguistic understanding and trend analysis to allow users to search, mine, and analyze the combined information from their unstructured content and structured data.  IBM provides the capability for exploration and mining of enterprise content, as well as services to add analytics-derived content insight to vertical content applications.

·        EMC Documentum.  EMC has a partnership with TEMIS, a provider of text analytics solutions that is particularly strong in the life sciences.  EMC’s Content Intelligence Services, an extension to its content management platform, automates the tagging and classification of documents by intelligently mining the pertinent information from each document.

I expect that in the next year, we will finally see some real action in text analytics on the content management front. 
