I attended EMC’s User Conference last week in Las Vegas. The theme of the event was Big Data meets the Cloud. So, what’s going on with Big Data and EMC? Does this new strategy make sense?
EMC acquired Greenplum in 2010. At the time EMC described Greenplum as a “shared nothing, massively parallel processing (MPP) data warehousing system.” In other words, it could handle pedabytes of data. While the term data warehouse denotes a fairly static data store, at the user conference, EMC executives characterized big data as a high volume of disparate data, which is structured and unstructured, it is growing fast, and it may be processed in real time. Big data is becoming increasingly important to the enterprise not just because of the need to store this data but also because of the need to analyze it. Greenplum has some of its own analytical capabilities but recently the company formed a partnership with SAS to provide more oomph to its analytical arsenal. At the conference, EMC also announced that it has now included Hadoop as part of its Greenplum infrastructure to handle unstructured information.
Given EMC’s strength in data storage and content management, it is logical for EMC to move into the big data arena. However, I am left with some unanswered questions. These include questions related to how EMC will make storage, content management, data management, and data analysis all fit together.
• Data Management. How will data management issues be handled (i.e. quality, loading, etc.)? EMC has a partnership with Informatica and SAS has data management capabilities, but how will all of these components work together?
• Analytics. What analytics solutions will emerge from the partnership with SAS? This is important since EMC is not necessarily known for analytics. SAS is a leader in analytics and can make a great partner for EMC. But, its partnership with EMC is not exclusive. Additionally, EMC made a point of the fact that 90% most enterprises’ data is unstructured. EMC has incorporated Hadoop into Greenplum, ostensibly to deal with unstructured data. EMC executives mentioned that the open source community has even begun developing analytics around Hadoop. EMC Documentum also has some text analytics capabilities as part of Center Stage. SAS also has text analytics capabilities. How will all of these different components converge into a plan?
• Storage and content management. How do the storage and content management parts of the business fit into the big data roadmap? It was not clear from the discussions at the meeting how EMC plans to integrate its storage platforms into an overall big data analysis strategy. In the short term we may not see a cohesive strategy emerge.
EMC is taking on the right issues by focusing on customers’ needs to manage big data. However, it is a complicated area and I don’t expect EMC to have all of the answers today. The market is still nascent. Rather, it seems to me that EMC is putting its stake in the ground around big data. This will be an important stake for the future.