I recently had the chance to get an update on what SPSS is up to in text analytics. It was an interesting conversation for several reasons:
- First, it highlighted an important point about text analytics that we know but is worth repeating: in many scenarios, the analysis of unstructured data is more useful when accompanied by structured data.
- Second, it got me thinking more about social media/network analysis, which prompted the question in the “four questions about innovations in analysis” blog I recently posted.
A few words of background. SPSS’s goal is to help its customers analyze everything about the data associated with people (behavior, attitudes, and so on) so that an organization can understand anyone it interacts with. In fact, Olivier Jouve, VP of Corporate Development at SPSS, was quite clear that SPSS is not a BI company. Rather, SPSS software helps to enable what SPSS refers to as the “Predictive Enterprise.” The Predictive Enterprise makes use of analytics (not simply reports) to help manage multiple dimensions across the enterprise, including customer intimacy, product placement, and even operational issues such as fraud.
SPSS offers a suite of text-mining products that is based on 25 years of research in the application of natural language processing (NLP) technologies. In 2002, SPSS bought LexiQuest™, a linguistics-based text-mining company, intending to combine LexiQuest’s extraction capabilities with SPSS’s data-mining capabilities in order to strengthen the company’s position in predictive analytics. All of SPSS’s text-analytics products now share this same core linguistic functionality.
It’s not just about text
The market for text analytics has moved out of the early-adopter stage, but depending on the type of analysis you’re trying to accomplish, it is often not just about the text.
For example, consider the following scenario: A telecommunications company is concerned about churn. The company realizes that it has a wealth of information at its disposal to help predict churn. On the structured side, it has collected demographic, usage, trouble-ticket, and product information about each of its customers. On the unstructured side, it has collected call center notes, emails, and customer satisfaction surveys. The company decides to invest in text analytics software that can sift through its call center notes, emails, and survey notes. At the end of the exercise, the company has some great insight into customer complaints that it can certainly act on. However, it has not exactly gotten the information it needs to solve the churn problem. To do that, it is probably more useful to tie the unstructured information from the call center notes, surveys, and emails back to an actual customer and to all of the structured information about that customer. This way, using some predictive modeling, the company can train its system to zero in on those customers who are likely to drop its service and make the right decisions to help retain them.
According to SPSS, many of its customers have seen upwards of a 50% reduction in churn by combining data mining with text mining.
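To make the churn scenario a bit more concrete, here is a minimal Python sketch of the general idea: derive a simple feature from the text, join it to each customer's structured record by customer ID, and train a small model on the combined features. Everything in it is an assumption made for illustration. The customer records, the notes, the `COMPLAINT_TERMS` keyword list, and the function names are all invented, and the hand-rolled logistic regression is a stand-in for a real modeling tool. It is not how SPSS's products work, and a keyword count is far cruder than linguistics-based extraction.

```python
import math
from collections import Counter

# Hypothetical structured records keyed by customer ID (values invented for illustration).
CUSTOMERS = {
    "c1": {"monthly_minutes": 820, "trouble_tickets": 0, "churned": 0},
    "c2": {"monthly_minutes": 150, "trouble_tickets": 3, "churned": 1},
    "c3": {"monthly_minutes": 640, "trouble_tickets": 1, "churned": 0},
    "c4": {"monthly_minutes": 90, "trouble_tickets": 2, "churned": 1},
}

# Hypothetical unstructured notes, each tagged with the customer it refers to.
NOTES = [
    ("c1", "Asked about upgrading the data plan."),
    ("c2", "Customer angry about dropped calls, wants to cancel."),
    ("c2", "Billing dispute again, threatened to switch provider."),
    ("c3", "Reported a dropped call last week."),
    ("c4", "Says coverage is poor and may cancel."),
]

# Assumed keyword list standing in for real text-mining extraction.
COMPLAINT_TERMS = frozenset({"cancel", "switch", "angry", "dispute", "poor", "dropped"})


def complaint_counts(notes):
    """Count distinct complaint terms per note and sum them per customer."""
    counts = Counter()
    for customer_id, text in notes:
        words = {w.strip(".,!?").lower() for w in text.split()}
        counts[customer_id] += len(words & COMPLAINT_TERMS)
    return counts


def build_features(customers, notes):
    """Join the text-derived count to each customer's structured fields."""
    counts = complaint_counts(notes)
    return {
        cid: [rec["monthly_minutes"] / 1000.0,
              float(rec["trouble_tickets"]),
              float(counts[cid])]
        for cid, rec in customers.items()
    }


def predict(w, b, x):
    """Churn probability under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(b + sum(wj * xj for wj, xj in zip(w, x)))))


def train(features, labels, lr=0.1, steps=5000):
    """Fit logistic regression with plain batch gradient descent."""
    ids = list(features)
    n_feat = len(features[ids[0]])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for cid in ids:
            err = predict(w, b, features[cid]) - labels[cid]
            grad_b += err
            for j, xj in enumerate(features[cid]):
                grad_w[j] += err * xj
        w = [wj - lr * g / len(ids) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(ids)
    return w, b


features = build_features(CUSTOMERS, NOTES)
labels = {cid: rec["churned"] for cid, rec in CUSTOMERS.items()}
w, b = train(features, labels)
scores = {cid: predict(w, b, x) for cid, x in features.items()}

# Rank customers from most to least likely to churn.
for cid in sorted(scores, key=scores.get, reverse=True):
    print(cid, round(scores[cid], 2))
```

The point of the sketch is the join in `build_features`: the text only becomes useful for predicting churn once it is attached to a specific customer alongside that customer's structured data. With four made-up customers the model itself proves nothing, so treat the scores as a demonstration of the plumbing and not as a result.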
Social media is becoming an important source of information for companies
What about other forms of media, such as blogs and message threads? SPSS is also moving into social network/media analysis because, as Olivier said, “The number of people participating in Web 2.0 activities is growing rapidly across all age groups, and businesses are losing the direct influence they have traditionally had over customers’ decisions about their products. Peer to Peer networks are now a trusted source of insight and information.” This is quite true. Our recent Hurwitz & Associates survey confirmed that companies do plan to make use of the information found in various kinds of social networks, even if they don’t think of what they are doing as text analytics. One interesting point on this front is that blogs, message boards, and similar sources do provide a great deal of information about customer sentiment and opinions. The challenge will be mapping this kind of information back to the other information that a company keeps about its customers, and making sense of the behaviors. I look forward to hearing more about what SPSS is doing to help solve this problem.