Backup and Disaster Recovery for Rook+Ceph with Kasten K10

With the growth of stateful applications in Kubernetes, we have also seen rapid growth in the use of Ceph to provide storage to cloud-native applications. More often than not, these Ceph clusters are deployed via the Rook cloud-native storage orchestrator. In fact, even Red Hat, with its release of OpenShift 4, has moved its OpenShift Container Storage (OCS) platform to Rook+Ceph.

Given this shift, we have worked hard so that K10, our enterprise data management platform, not only integrates seamlessly with Rook+Ceph but is also extremely easy to use. If you are not familiar with K10, it is a secure, software-only product that has been purpose-built for Kubernetes and gives operations teams an easy-to-use, scalable, and secure system for backup/restore, disaster recovery, and mobility of Kubernetes applications. K10’s application-centric approach and deep integrations with relational and NoSQL databases, Kubernetes distributions, and all clouds give teams freedom of infrastructure choice without sacrificing operational simplicity.

To demonstrate the power of K10 with Rook+Ceph and the Container Storage Interface (CSI), this post walks through all the steps needed to get the combination of K10, Rook+Ceph, and your favorite applications up and running. Apart from a Kubernetes cluster, all you need is the Helm package manager and git (to check out the Rook code).

The rest of this post is broken up into three parts:

  1. Installing Rook+Ceph in Kubernetes
  2. Installing K10, An Enterprise-Grade Data Management Platform
  3. Backup and Restore Workflows With a Test Application

I. Installing Rook+Ceph, CSI-based Storage

We are going to use Rook to provision Ceph-based block storage in Kubernetes. While more details on the installation process are available elsewhere, we will walk through the “happy-path” steps here.

Setting up a Ceph RBD (Block) Cluster

First, we need to obtain the Rook source code to set Ceph up.

$ git clone -b v1.2.4
Cloning into 'rook'...

$ cd rook

Now, we will install Rook and then Ceph into the rook-ceph namespace.

$ cd cluster/examples/kubernetes/ceph

$ kubectl create -f common.yaml
namespace/rook-ceph created

$ kubectl create -f operator.yaml
deployment.apps/rook-ceph-operator created

At this point, we need to verify that the Rook Ceph operator pod is in the Running state. You can check this with a command such as

$ kubectl --namespace=rook-ceph --watch=true get pods
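If you would rather block until the operator is ready than watch the pod list by hand, something like the following sketch may help. It wraps `kubectl wait` in a small function. The label selector `app=rook-ceph-operator` and the 300-second default timeout are assumptions, so check them against the labels in your operator.yaml.

```shell
# Sketch: block until the Rook operator pod reports Ready.
# Assumption: the operator pod carries the label app=rook-ceph-operator.
# Arguments (both optional): namespace, timeout.
wait_for_rook_operator() {
    rook_ns="${1:-rook-ceph}"
    rook_timeout="${2:-300s}"
    kubectl --namespace="$rook_ns" wait pod \
        --selector=app=rook-ceph-operator \
        --for=condition=Ready \
        --timeout="$rook_timeout"
}

# Usage:
#   wait_for_rook_operator rook-ceph 300s
```

The function returns a non-zero exit status if the pod does not become Ready before the timeout, so it can gate the next installation step in a script.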

Once you have verified the operator is running, you can install a test Ceph cluster.

$ kubectl create -f cluster-test.yaml

If you would like to verify that Ceph is running correctly, simply run the following and ensure the cluster health is shown as HEALTH_OK.

$ kubectl create -f toolbox.yaml
$ kubectl --namespace=rook-ceph exec -it $(kubectl --namespace=rook-ceph \
    get pod -l "app=rook-ceph-tools" \
    -o jsonpath='{.items[0].metadata.name}') -- ceph status
    id:     <cluster id>
    health: HEALTH_OK
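A freshly created cluster can take a few minutes to settle, so a single status check may be premature. The sketch below polls `ceph status` through the toolbox pod until HEALTH_OK appears. The function names, retry count, and delay are illustrative assumptions, not part of Rook or Ceph.

```shell
# Sketch: poll `ceph status` from the toolbox pod until it reports HEALTH_OK.
# Assumptions: toolbox.yaml has been applied and its pod is labeled
# app=rook-ceph-tools; the retry count and delay are arbitrary defaults.
ceph_status() {
    tools_pod="$(kubectl --namespace=rook-ceph get pod \
        -l "app=rook-ceph-tools" \
        -o jsonpath='{.items[0].metadata.name}')"
    kubectl --namespace=rook-ceph exec "$tools_pod" -- ceph status
}

# Arguments (both optional): number of attempts, seconds between attempts.
wait_for_health_ok() {
    attempts="${1:-30}"
    delay="${2:-10}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if ceph_status | grep -q 'HEALTH_OK'; then
            echo "Ceph reports HEALTH_OK"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "Ceph did not reach HEALTH_OK after $attempts attempts" >&2
    return 1
}

# Usage:
#   wait_for_health_ok 30 10
```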

Setting Up Storage and Snapshot Provisioning

Once you have verified that all the above steps completed without errors and that all pods in the rook-ceph namespace are running without problems, we can turn on dynamic storage provisioning as well as volume snapshotting definitions.

First, install the StorageClass and VolumeSnapshotClass using

$ kubectl create -f csi/rbd/storageclass.yaml
$ kubectl create -f csi/rbd/snapshotclass.yaml

Finally, we need to switch the cluster’s default storage class to the new Ceph storage class. Assuming the old default was gp2 (if running in AWS), we can execute the following commands to make the switch.

$ kubectl patch storageclass gp2 \
    -p '{"metadata": {"annotations":{"":"false"}}}'
$ kubectl patch storageclass rook-ceph-block \
    -p '{"metadata": {"annotations":{"":"true"}}}'
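After the switch, it is worth confirming that exactly one storage class is marked as the default. A minimal sketch, relying on the fact that `kubectl get storageclass` prints `(default)` next to the name of a default class; the function names are hypothetical.

```shell
# Sketch: print the names of storage classes that kubectl marks "(default)".
default_storage_classes() {
    kubectl get storageclass --no-headers |
        awk '$2 == "(default)" { print $1 }'
}

# Succeeds only if rook-ceph-block is the one and only default class.
check_default_is_ceph() {
    defaults="$(default_storage_classes)"
    [ "$defaults" = "rook-ceph-block" ]
}

# Usage:
#   check_default_is_ceph && echo "rook-ceph-block is the default"
```

If two classes are both marked as default, the check fails, which catches the case where the old default was never unset.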

II. Installing K10, An Enterprise-Grade Data Management Platform

While the instructions above get you started with a CSI-compliant storage system on Kubernetes, that is only the beginning. To really leverage the power of your application’s data, we will now add data management to the mix. In particular, we will use the free and fully-featured edition of K10, a data management platform that is deeply integrated into Kubernetes. 

K10 will provide you an easy-to-use and secure system for backup/restore and mobility of your entire Kubernetes application. This includes use cases such as:

  • Simple backup/restore for your entire application stack to make it easy to “reset” your system to a known good state
  • Cloning namespaces within your development cluster for debugging
  • Copying data from production into your test environment for realistic testing

Installing K10 and Integrating with CSI-based Rook/Ceph

Installing K10 is simple, and you should be up and running in five minutes or less. The complete documentation is available if needed, but the following instructions are enough to get started.

$ helm repo add kasten
$ helm install --name=k10 --namespace=kasten-io kasten/k10
NAME:   k10
LAST DEPLOYED: Fri Feb 28 05:44:46 2020

Next, we will annotate the VolumeSnapshotClass we created as part of the Rook+Ceph install to indicate that K10 should use it. Note that, just as with a default StorageClass, only one VolumeSnapshotClass should carry this annotation. We will also set the deletion policy to Retain so that snapshots are kept.

$ kubectl annotate volumesnapshotclass csi-rbdplugin-snapclass \
$ kubectl patch volumesnapshotclass csi-rbdplugin-snapclass \
    -p '{"deletionPolicy":"Retain"}' --type=merge
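To confirm that the patch took effect, you can read the `deletionPolicy` field back from the VolumeSnapshotClass. The helper names below are hypothetical; the class name defaults to the one used above.

```shell
# Sketch: read back the deletion policy of a VolumeSnapshotClass.
# Argument (optional): class name, defaulting to csi-rbdplugin-snapclass.
snapshot_class_deletion_policy() {
    kubectl get volumesnapshotclass "${1:-csi-rbdplugin-snapclass}" \
        -o jsonpath='{.deletionPolicy}'
}

# Succeeds only if the class is set to retain snapshots.
check_snapshots_retained() {
    [ "$(snapshot_class_deletion_policy "$1")" = "Retain" ]
}

# Usage:
#   check_snapshots_retained csi-rbdplugin-snapclass && echo "Retain is set"
```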

III. Backup and Restore Workflows With a Test Application

Installing Redis as a Test Application

To demonstrate the power of this end-to-end storage and data management system, let’s also install Redis on your test cluster.

# The below instructions are for Helm 2.x and must be tweaked for Helm 3.x
$ helm install --name=redis stable/redis --namespace redis

Once Redis is installed, verify that it provisioned new Ceph storage by running kubectl get pv and checking the StorageClass of the Redis Persistent Volumes (PVs).
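This check can also be scripted. The sketch below lists every PV with its storage class and claim, then verifies that all PVs bound to claims in a given namespace use the expected class. The function names are hypothetical; the `redis` namespace and `rook-ceph-block` class come from the steps above.

```shell
# Sketch: print "<pv> <storage-class> <claim-namespace> <claim-name>" per PV.
list_pv_storage_classes() {
    kubectl get pv --no-headers \
        -o custom-columns='NAME:.metadata.name,CLASS:.spec.storageClassName,NS:.spec.claimRef.namespace,CLAIM:.spec.claimRef.name'
}

# Succeeds only if at least one PV is bound to a claim in the namespace
# and every such PV uses the expected storage class.
# Arguments: namespace, expected storage class.
check_namespace_pvs_use_class() {
    pv_ns="$1"
    pv_class="$2"
    list_pv_storage_classes | awk -v ns="$pv_ns" -v class="$pv_class" '
        $3 == ns { seen++; if ($2 != class) bad++ }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }'
}

# Usage:
#   check_namespace_pvs_use_class redis rook-ceph-block && echo "Redis is on Ceph"
```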

Backup and Recovery Workflows

First, to access the K10 dashboard (API and CLI access is also available if needed), run the following command

$ kubectl --namespace kasten-io port-forward service/gateway 8080:8000

and then access the dashboard at

As seen in the images below, you will then be presented with a dashboard that shows a card with all automatically discovered applications. Clicking on this card brings up a description of these applications, where you can either create a policy or, for experimentation, run a full manual backup. The main dashboard will then show in-progress and completed activity as this action is invoked and completes. Finally, after making changes to the application, you can restore it (or clone it into a new namespace) from the same applications page.

Converting Snapshots into Backups

The above snapshot workflow, while extremely useful for fast recovery, only creates snapshots that are stored on Ceph alongside the primary data. Because of this lack of fault isolation, these snapshots do not protect data against loss of the storage system itself. For that, you want to create true backups by exporting these snapshots to an independent object store.

The easiest way to do this is to create a policy for periodic backups and snapshots and associate the policy with a mobility profile that uses object storage to store deduplicated backups. K10 supports a wide variety of object storage systems, including all public cloud options as well as Ceph’s S3-compatible Object Gateway and MinIO.

To support various workflows, and as the screenshots below show, the snapshot and backup retention schedule can be independent. This allows you to save a few snapshots for quick undelete with a larger number of backups stored in object storage to deliver cost, performance, and scalability benefits.

Advanced Use Cases: Multi-Cloud Mobility

The walkthrough above is an example of a snapshot, backup, and restore workflow in a single cluster. However, this only scratches the surface of the power of K10. You can use the same system to export an entire application stack and its data from production clusters and bring it up in a geographically separate DR cluster. Alternatively, you could mask data, push it to an object storage system, and then read it in your local development cluster. Documentation for these use cases is available here and you can look at this video to get a quick overview.

IV. Summary

This article has shown how K10 can be easily integrated with Rook-based Ceph storage. However, K10 also integrates with a variety of storage systems that can be used with Kubernetes including storage provided by all the major public cloud providers (AWS, Azure, Google, IBM), NetApp (via their Trident provider), and more. K10 also includes application-level integrations with a range of relational databases (e.g., MySQL or PostgreSQL) and NoSQL systems (e.g., MongoDB or Elastic). Check out our datasheet for more information.

Finally, we would love to hear from you to see how K10 could be useful in your Kubernetes environment. Find us on Twitter, drop us an email, or swing by our website!

Niraj Tolia

Niraj Tolia is the General Manager and President of Kasten (acquired by Veeam), which he founded in order to solve the problem of Kubernetes backup and disaster recovery. With a strong technical background in distributed systems, storage, and data management, he has held multiple leadership roles in the past, including Senior Director of Engineering for Dell EMC's CloudBoost group and VP of Engineering and Chief Architect at Maginatics (acquired by EMC). Dr. Tolia received his PhD, MS, and BS in Computer Engineering from Carnegie Mellon University.

