Storage Systems and Data Management for Kubernetes: Power in Diversity
The last twelve months have been momentous for the growth of stateful applications on Kubernetes. There has been innovation in every corner of the ecosystem: higher-level abstractions such as the Container Storage Interface (CSI) and its widespread adoption by storage vendors, new cloud-native Software-Defined Storage (SDS) systems (e.g., Longhorn), vendor-specific support such as VMware vSphere’s First Class Disks (FCDs) and Cloud Native Storage (CNS), and the redesign of database systems around cloud-native architectural principles (e.g., Vitess).
However, this rapid ecosystem expansion along with the plethora of new infrastructure options has also created open questions in the eyes of the customer. In particular, folks wonder whether storage management (storage volume provisioning, mounting, unmounting, deletion, etc.) and data management (backup, recovery, disaster recovery, portability, etc.) should be combined into a single system or whether these management systems should remain separate.
While it might seem like combining storage and data management is the simpler option, we strongly believe that keeping them separate is the correct choice for customers. This article dives into the reasons behind our opinion and also highlights the risks of getting this decision wrong. We examine the above question in the light of three critical pillars:
- Leveraging Storage Innovation
- Benefits of Software-Defined Storage
- Separation of Concerns
I. Leveraging Storage and Data Management Innovation
While it might not always be visible from the outside, there is tremendous innovation happening in the design and implementation of storage systems. This innovation can be found at every layer of the stack. Looking at hardware, we see new physical storage media (NVMe and Intel’s Optane), new and faster storage fabrics, and hardware storage accelerators from startups such as Pensando and Fungible. Moving into the software stack, we are seeing large scale-out storage systems (file, block, and object) provided by cloud providers, as well as innovation in new ways of building storage systems (e.g., Longhorn, a distributed block storage system purpose-built for cloud-native platforms).
At the same time, with Kubernetes, storage systems implementations have been abstracted away by the Container Storage Interface (CSI). Through the use of a common and widely supported API, it has become significantly easier to use distinct storage systems without requiring any application or deployment changes.
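As a sketch of what this abstraction looks like in practice, the manifests below bind a class name to a CSI provisioner and then request storage by class name alone; swapping storage systems means swapping the StorageClass, not the application. The object names and the choice of the AWS EBS CSI driver here are illustrative assumptions, not part of the original article.

```yaml
# A StorageClass binds a name to a CSI provisioner; changing storage
# systems means changing this object, not the application manifests.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                    # illustrative name
provisioner: ebs.csi.aws.com   # e.g., the AWS EBS CSI driver; any CSI driver fits here
parameters:
  type: gp3
---
# The application references the class only by name, so the same claim
# works unchanged whether the class is backed by EBS, vSAN, or Longhorn.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data                # illustrative name
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
```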
Similarly, we have seen an explosion of innovation in the area of data management. The emergence of Kubernetes, a common infrastructure platform that can be found pretty much everywhere today, has allowed companies like Kasten to reimagine what it really means to do data management. Instead of focusing only on the data, we can now capture the entire application in a programmatic manner, deliver complex features such as Disaster Recovery in a radically simpler way, and allow for unparalleled ease of automation and operation.
To combine data management and storage systems would mean that customers would only be able to leverage the innovation that happens by one particular vendor or in a specific solution. Instead, we strongly believe that customers should have the freedom of choice to both pick the right storage architecture for the environment they are running in (e.g., EBS when running in AWS and VMware’s vSAN on-prem) as well as the right data management solution that can work across all these environments.
II. Benefits of Software-Defined Storage
We are very excited about the relevance of software-defined storage systems in cloud-native environments. These systems have the ability to better conform to an application’s requirements (look out for upcoming blog posts in this area), can work on a wide variety of hardware in your data center, and can be cost-effective for test/dev environments and workloads with dynamic footprint characteristics.
The arguments put forward for tightly pairing data management with a specific storage system might sound appealing on the surface (e.g., reduced complexity, better cost management), but a closer look reveals that this is like trying to fit a square peg into a round hole. Applications, composed under the hood of multiple microservices and various databases, are the natural unit of atomicity in a cloud-native environment. Every application has different business continuity requirements in terms of backup consistency, speed of backup and restore operations (RPO/RTO), recovery granularity, etc. This means a good data management solution needs to work at multiple layers of the application: the various flavors of storage (file, block, object) and a growing universe of databases (relational, NoSQL, time series, graph, etc.). Given this complexity, conflating storage and data management runs the risk of creating a solution that is sub-optimal for both.
Additionally, some storage systems claim to provide better data management by layering themselves on top of other storage systems (e.g., an architecture where the SDS system itself sits on another storage system such as EBS). In such designs, the cost, management complexity, and security surface can more than double, while IO performance suffers significantly because all data passes through two storage stacks.
III. Separation of Concerns
Finally, borrowing from Kubernetes’s philosophical underpinnings, one can make the core argument that separation of concerns is a basic tenet for keeping problems simple and making large problems tractable. The same reasoning argues for separating storage management from data management.
One of the strongest points in favor of this separation is that of fault isolation and, in particular, when applied to the core data management tasks of backup and recovery. Backup and recovery workflows are the last line of defense when it comes to safeguarding your data. However, as everyone that has scar tissue from traditional IT systems will tell you, placing backup and recovery in the hands of the storage system carries high risk.
Accidental (e.g., operator error) or malicious (e.g., ransomware) data corruption carries the risk of the error propagating to the entire system, including backups. There is an additional risk because these systems sometimes conflate storage replication for availability with backups for resiliency. While high availability might ensure your data is available when a particular data center or region goes down, it also allows corruptions to be quickly replicated across all sites. Similarly, unexpected downtime in the storage system and all its replicas (e.g., via “query of death” issues) will leave your application backups unavailable exactly when you need them the most.
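To make the fault-isolation point concrete: a CSI VolumeSnapshot, sketched below, is created and held by the same storage backend that holds the primary data, so it shares that backend’s fate. An independent data management layer would export such a snapshot’s contents to a separate target (e.g., object storage) to serve as a true last line of defense. The object and class names here are illustrative assumptions.

```yaml
# A CSI VolumeSnapshot lives inside the storage system it was taken
# from; if that system is lost or compromised, the snapshot goes with it.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: app-data-snap                    # illustrative name
spec:
  volumeSnapshotClassName: csi-snapclass # illustrative class name
  source:
    persistentVolumeClaimName: app-data  # illustrative claim name
```

A separate data management system can then copy the snapshot’s data to independent, ideally immutable, storage so that a failure or compromise of the primary storage system does not also destroy the backup.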
A true separation of concerns allows data and storage management systems to evolve independently and, as mentioned earlier, provides the feature depth and product focus needed to innovate in their respective areas. Indeed, K10, the leading data management platform for Kubernetes, might not have evolved to provide true multi-cloud independence, with a plethora of vendor choices, had it been deeply tied to a single storage system.
As the above sections have shown, we believe that storage and data management are extremely powerful when used together but show weakness when they are integrated into the same system. In fact, we have seen this exact pattern play out in the traditional Virtual Machine (VM) ecosystem where there are a number of large storage vendors (e.g., Dell EMC, Pure Storage, NetApp) but there is an equally vibrant community of data management vendors (e.g., Commvault, Veeam, Veritas, Rubrik). We expect the same separation to be powerful for customers and users in the cloud-native ecosystem too!
This article originally appeared in Storage Magazine.
Niraj Tolia is the CEO and Co-Founder at Kasten and is interested in all things Kubernetes. He has played multiple roles in the past, including the Senior Director of Engineering for Dell EMC's CloudBoost family of products and the VP of Engineering and Chief Architect at Maginatics (acquired by EMC). Niraj received his PhD, MS, and BS in Computer Engineering from Carnegie Mellon University.