At TDWI, we’re seeing organizations deal with increasingly complex data environments. In our surveys, more than half of respondents are already managing tens or hundreds of terabytes of data. Of that group, 15% are already managing petabytes of data. That data includes legacy and primarily structured data, but also new data types such as unstructured data, machine data (perhaps from IoT), image data, and other kinds of data. Because of this, more than 45% of respondents in recent surveys have said that they need to expand their data strategy to deal with this data and, of course, to support new forms of analytics to derive insight, take action, and build applications. They are looking to modernize their environments. Two approaches to doing so are the cloud data stack and the data fabric.
The Cloud Stack
The cloud stack is a set of integrated services: a stack of technologies, or related cloud services, that are provisioned together. The cloud provider may offer these services with its partners. The idea is that you are centralizing or unifying the data in a cloud stack for use in analytics. In a 2022 survey, we asked respondents about the best way to unify their data storage environment. Thirty-three percent said that a cloud stack is the best way to do so.
There are numerous benefits to centralizing on a cloud stack that are worth considering. First, some organizations like the idea of centralizing all of their data for analytics in one place. That helps with enriching data and building out robust data sets for analytics. It also helps with governance, since some feel that a centralized platform is easier to govern. Organizations also like the fact that the cloud stack is scalable for compute-intensive analytics such as machine learning and can handle diverse data types like text or image data. Cloud providers have also done a good job of providing many services as part of the stack, including data infrastructure, security, analytics, and data governance services.
However, there are also cons. Migration to the cloud can be difficult and can take time. Some organizations do a lift and shift, but ideally you want to improve on whatever you’ve done in the past when you move to the cloud. Importantly, it is going to be hard to centralize all of your data, and many organizations won’t be able to centralize everything. Large organizations, for instance, will have legacy systems that are difficult to centralize. Additionally, the cloud requires a new skill set, which needs to be planned for. Finally, some companies are concerned about vendor lock-in with the cloud.
The Data Fabric
Another approach is the data fabric. At TDWI, we’re defining a data fabric as a way to bring together disparate data in an intelligent fashion. The data fabric maps and connects relevant application data stores, using metadata to describe data assets and their relationships. So basically, you’re trying to unify your architecture. The fabric is not necessarily one tool or process. It is an emerging architecture with an integrated set of technologies and services that delivers integrated and enriched data. The data fabric combines key data management technologies, such as data catalog, data governance, data integration, data pipelining, and data orchestration. Some organizations accomplish the fabric using a modern semantic layer or other approaches. The point is that, with the data fabric, you’re leaving the data where it is and creating a fabric across the whole data environment that enables you to bring data together for analytics.
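To make the metadata idea a bit more concrete, here is a minimal Python sketch of a catalog that describes data assets and their relationships while the data itself stays in its source systems. It is an illustration only, not how any particular fabric product works: the class names (DataAsset, FabricCatalog), the store names, the locations, and the join key are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataAsset:
    """Metadata about one data asset; the data itself is never copied here."""
    name: str        # business-friendly name used to find the asset
    store: str       # hypothetical identifier of the system that holds the data
    location: str    # hypothetical table or path inside that store
    tags: frozenset = frozenset()


@dataclass
class FabricCatalog:
    """A toy metadata layer: registers assets and the relationships between them."""
    assets: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def register(self, asset):
        self.assets[asset.name] = asset

    def relate(self, left, right, join_key):
        # Record that two registered assets can be combined on a shared key.
        if left not in self.assets or right not in self.assets:
            raise KeyError("both assets must be registered first")
        self.relationships.append((left, right, join_key))

    def find(self, tag):
        # Find assets by tag, regardless of which store holds them.
        return sorted(a.name for a in self.assets.values() if tag in a.tags)

    def related_to(self, name):
        # List the assets linked to `name`, with the key that links them.
        linked = []
        for left, right, key in self.relationships:
            if left == name:
                linked.append((right, key))
            elif right == name:
                linked.append((left, key))
        return linked


catalog = FabricCatalog()
catalog.register(DataAsset("customers", "on_prem_warehouse", "crm.customers",
                           frozenset({"customer"})))
catalog.register(DataAsset("support_tickets", "cloud_object_store", "tickets/",
                           frozenset({"customer", "text"})))
catalog.relate("customers", "support_tickets", "customer_id")

print(catalog.find("customer"))         # ['customers', 'support_tickets']
print(catalog.related_to("customers"))  # [('support_tickets', 'customer_id')]
```

In this sketch, an analyst searches by tag and discovers that two assets in different systems can be joined, without either data set being moved. A real fabric would layer governance, integration, and orchestration on top of metadata like this.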
Again, there are pros and cons with this approach. A big pro is that you’re able to enrich data sets for analytics from multiple silos. With the fabric and the fabric tools, the metadata provides a common language for finding and accessing data. Also, data is not replicated, which is a plus for governance. Fabric providers offer tools as part of their solutions to help with data governance and other aspects of the data and analytics life cycle.
The cons include the fact that the environment can be complex, and you’ll need the skills to implement it. Historically, some methods of implementing a fabric have also had performance issues, which may make the approach a poor fit for demanding workloads such as machine learning or other advanced analytics.
In our research, while 33% say that they want to use a cloud stack for unification, about 20% say they use or will use a data fabric. Not surprisingly, the rest say they will use a combination of approaches.
At the end of the day, it is going to be important to weigh the challenges against the benefits of each of these approaches. For instance, moving from a centralized model to a decentralized one may make sense for organizations with multiple business units, where the benefits might outweigh the costs and there are resources to staff each domain. It can be difficult for others. Likewise, if you just moved your data to the cloud and centralized it there, maybe it doesn’t make sense right now to consider another approach. You’ll also need to consider new roles and skill sets, whether that means understanding cloud platforms or knowing how to build a semantic layer. Finally, your company will need to be organized to execute. Perhaps for the cloud model, a centralized organizational model makes sense. Does it for a fabric approach? Maybe a hub-and-spoke model with a central data office works better for that. The jury is still out. Your organization will need to decide what is best given its particular set of circumstances.