Kubernetes – Storage Volumes with Ceph

In this blog I show how to set up a Kubernetes storage volume with Ceph. I assume that you have already installed a Kubernetes cluster with one master node and at least three worker nodes. On each worker node you need a free, unmounted block device used exclusively by Ceph. Within the Ceph cluster I set up a Ceph Filesystem (CephFS) that we can use as a storage volume for Kubernetes.

Ceph

Ceph is a scalable network filesystem that allows you to build a large, distributed storage solution on commodity hardware. You can connect a Ceph storage cluster to Kubernetes to abstract the volumes from your services. You can install Ceph on any node, including the Kubernetes worker nodes. The following guide explains how to install Ceph on Debian 9. You will find detailed information about the installation here. Also check out the blog from RisingStack.

Install ceph-deploy

First install the ceph-deploy tool on your master node. ceph-deploy will be used to install the Ceph software on your worker nodes. See details here.

Running Debian, I was only able to deploy the Ceph luminous release on Debian 9 (Stretch). At the time of writing, the necessary packages were not available for Debian 10 or for newer Ceph releases (e.g. nautilus).

Note: In the following example I will use ‘node1’, ‘node2’ and ‘node3’ as node names. Make sure to replace these names with the short names of your cluster nodes.

Setup the Repositories

First you need to set up the Debian package sources and install the ceph-deploy tool. On your Kubernetes master node run as non-root user:

$ wget -q -O- 'https://download.ceph.com/keys/release.asc' | sudo apt-key add -
$ echo deb https://download.ceph.com/debian-luminous/ $(lsb_release -sc) main | sudo tee /etc/apt/sources.list.d/ceph.list
$ sudo apt-get update && sudo apt-get install ceph-deploy ntp
$ sudo /etc/init.d/ntp restart
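
Ceph monitors are sensitive to clock skew, so it is worth checking that the clocks are actually being synchronized. With the classic ntp daemon installed above you can list the peers and their sync state:

$ ntpq -p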

Check the Ceph-Deploy version:

$ ceph-deploy --version

Create a Ceph Deploy User

The ceph-deploy utility must log in to a Ceph node as a user that has passwordless sudo privileges, because it needs to install software and configuration files without prompting for passwords.

If you do not have a non-privileged cluster user yet, follow the next steps and replace ‘{username}’ with the name of the cluster user you have chosen.

$ sudo useradd -m {username} -s /bin/bash
$ sudo passwd {username}
$ echo "{username} ALL = (root) NOPASSWD:ALL" | sudo tee /etc/sudoers.d/{username}
$ sudo chmod 0440 /etc/sudoers.d/{username}

Next create an SSH key on the master node and distribute the public key to each worker node. Leave the passphrase and filename empty:

# login with your cluster user
$ su {username}
$ ssh-keygen

The keys will be stored in the .ssh/ directory. Now you can copy the public key to all worker nodes:

$ ssh-copy-id {username}@node1
$ ssh-copy-id {username}@node2
$ ssh-copy-id {username}@node3

For background information see also here.
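
So that ceph-deploy can log in to the nodes as this user without passing --username every time, you can add the nodes to the ~/.ssh/config of your deploy user on the master node. The following is only a minimal sketch, assuming the node names and the {username} placeholder from above:

$ cat >> ~/.ssh/config <<EOF
Host node1
  User {username}
Host node2
  User {username}
Host node3
  User {username}
EOF
$ chmod 600 ~/.ssh/config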

Install a Ceph Cluster

Now that you have installed ceph-deploy and set up SSH access to your Ceph nodes, you can create a new Ceph cluster. You can find details about this procedure here.

First create a working directory to store your configuration files.

$ mkdir ceph-cluster
$ cd ceph-cluster

On your master node, from the directory you just created, start creating the cluster:

$ ceph-deploy new node1 node2 node3

You can verify the Ceph configuration in the file ‘ceph.conf’

$ cat ceph.conf

Next install Ceph packages:

$ ceph-deploy install --release luminous node1 node2 node3

The ceph-deploy utility will install Ceph on each node.

Note: The release must match the release you have installed on your master node!

Deploy the initial monitor(s) and gather the keys:

$ ceph-deploy mon create-initial

Use ceph-deploy to copy the configuration file and admin key to your admin node and your Ceph Nodes

$ ceph-deploy admin node1 node2 node3

Deploy a manager daemon. (Required only for luminous+ builds):

$ ceph-deploy mgr create node1

Next create metadata servers:

$ ceph-deploy mds create node1 node2 node3

At this point you can check your cluster status. SSH into one of the Ceph nodes and run:

$ sudo ceph status 

Create Object Store Daemons (OSDs)

Now you can create the OSDs on your ceph nodes. For the purposes of these instructions, I assume you have an unused disk in each node called /dev/vdb. Be sure that the device is not currently in use and does not contain any important data.

You can fetch a list of the available disks on a specific node:

$ ceph-deploy disk list node1

Note: the device does not have to be mounted in a directory.

You can run multiple OSDs on the same host, but using the same storage drive for multiple instances is a bad idea as the disk’s I/O speed might limit the OSD daemons’ performance.

So just create one OSD on each node:

$ ceph-deploy osd create --data {device} {ceph-node}

For example:

$ ceph-deploy osd create --data /dev/vdb node1
$ ceph-deploy osd create --data /dev/vdb node2
$ ceph-deploy osd create --data /dev/vdb node3

Check your cluster’s health.

$ ssh node1 sudo ceph health
HEALTH_OK
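
You can also check how the OSDs are distributed across your nodes:

$ ssh node1 sudo ceph osd tree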

Your Ceph cluster is now ready to use!

The Ceph Dashboard

Ceph comes with a nice web-based dashboard that allows you to control your cluster from a web browser. To enable the dashboard, connect to one of your cluster nodes and run:

$ sudo ceph mgr module enable dashboard

You can access the dashboard from your web browser:

http://node1:7000/
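
The dashboard is served by the active manager daemon (node1 in this setup). If you are unsure which address and port to use, you can ask the cluster for the endpoints of all enabled manager modules:

$ sudo ceph mgr services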

To disable the dashboard run:

$ sudo ceph mgr module disable dashboard

Starting Over if Something Went Wrong

It happened to me that my installation failed several times, and I then also had to wipe out the logical volumes created by the previous installation.

If at any point you run into trouble, you can delete everything and start over. Execute the following to purge the Ceph packages and erase all data and configuration:

$ sudo ceph-deploy purge node1 node2 node3
$ sudo ceph-deploy purgedata node1 node2 node3
$ sudo ceph-deploy forgetkeys
$ rm ceph.*

If you execute purge, you must re-install Ceph. The last rm command removes any files that were written out by ceph-deploy locally during a previous installation.

Remove Logical Volumes from a Ceph Node

If you need to remove a logical volume from a Ceph node, you can use the following procedure. (Note: this will erase all content from your device!)

List all logical volumes on a node:

$ sudo ceph-volume lvm list

Wipe out the logical device (replace [device-name] with your device, e.g. sdb):

$ sudo ceph-volume lvm zap /dev/[device-name] --destroy
# this should do it. If you still have problems, wipe the disk completely:
$ sudo dmsetup info -C
# copy the ceph dm name
$ sudo dmsetup remove [dm_map_name]
$ sudo wipefs -fa /dev/[device-name]
$ sudo dd if=/dev/zero of=/dev/[device-name] bs=1M count=1

CephFS for Kubernetes

To set up a CephFS to be used by your Kubernetes cluster you need to create two RADOS pools, one for the actual data and one for the metadata. Connect to one of your Ceph cluster nodes and create the pools:

$ sudo ceph osd pool create cephfs_data 64
$ sudo ceph osd pool create cephfs_metadata 64

The pools are automatically replicated across the cluster nodes.

Next create the filesystem on top of these pools:

$ sudo ceph fs new cephfs cephfs_metadata cephfs_data
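
You can verify that the pools and the filesystem were created correctly:

$ sudo ceph osd lspools
$ sudo ceph fs ls
$ sudo ceph mds stat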

The Ceph setup is now complete, so you can deploy a so-called CephFS provisioner and a corresponding storage class.

The CephFS Provisioner

Before you can deploy the CephFS provisioner into your Kubernetes cluster, you first need to create a secret. With the following command you can read the Ceph admin key from one of your Ceph nodes:

$ ssh node1 sudo ceph auth get-key client.admin
ABCyWw9dOUm/FhABBK0A9PXkPo6+OXpOj9N2ZQ==

Copy the key and create a Kubernetes secret named ‘ceph-secret-admin’:

$ kubectl create secret generic ceph-secret-admin \
    --from-literal=key='ABCyWw9dOUm/FhABBK0A9PXkPo6+OXpOj9N2ZQ==' \
    --namespace=kube-system
secret/ceph-secret-admin created
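
You can verify that the secret exists in the kube-system namespace:

$ kubectl get secret ceph-secret-admin -n kube-system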

I am running the CephFS provisioner in the Kubernetes system namespace ‘kube-system’, but you can also run it in a separate namespace.

So next create the cephfs-provisioner.yaml file holding the service account, roles, role bindings and the provisioner deployment:

kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cephfs-provisioner
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["kube-dns","coredns"]
    verbs: ["list", "get"]
    
    
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: cephfs-provisioner
subjects:
  - kind: ServiceAccount
    name: cephfs-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: cephfs-provisioner
  apiGroup: rbac.authorization.k8s.io
  
  
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: cephfs-provisioner
  namespace: kube-system
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create", "get", "delete"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
    
    
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: cephfs-provisioner
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: cephfs-provisioner
subjects:
- kind: ServiceAccount
  name: cephfs-provisioner
  namespace: kube-system
  
  
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: cephfs-provisioner
  namespace: kube-system
  
  
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cephfs-provisioner
  namespace: kube-system
spec:
  replicas: 1
  selector: 
    matchLabels:
      app: cephfs-provisioner  
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: cephfs-provisioner
    spec:
      containers:
      - name: cephfs-provisioner
        image: "quay.io/external_storage/cephfs-provisioner:latest"
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/cephfs
        - name: PROVISIONER_SECRET_NAMESPACE
          value: kube-system
        command:
        - "/usr/local/bin/cephfs-provisioner"
        args:
        - "-id=cephfs-provisioner-1"
      serviceAccount: cephfs-provisioner

Create the provisioner with kubectl:

$ kubectl create -n kube-system -f cephfs-provisioner.yaml

Next create the cephfs-storageclass.yaml file. The provisioner name must match the PROVISIONER_NAME environment variable from the deployment above (ceph.com/cephfs):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: cephfs
provisioner: ceph.com/cephfs
parameters:
  # replace {node-1}..{node-3} with the IP addresses of your ceph monitor nodes
  monitors: {node-1}:6789,{node-2}:6789,{node-3}:6789
  adminId: admin
  # replace the ceph-admin-secret with a kubernetes secret holding your ceph client.admin key
  adminSecretName: {ceph-admin-secret}
  adminSecretNamespace: kube-system
  claimRoot: /pvc-volumes

Within this file replace {node-1} to {node-3} with the IP addresses of your Ceph monitor nodes, and {ceph-admin-secret} with the name of the secret created above (ceph-secret-admin).

Create the storageclass with:

$ kubectl create -n kube-system -f cephfs-storageclass.yaml 
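
You can verify that the storage class was registered:

$ kubectl get storageclass cephfs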

To check your new deployment run:

$ kubectl get pods -l app=cephfs-provisioner -n kube-system
NAME                               READY   STATUS    RESTARTS   AGE
cephfs-provisioner-7ab15bd-rnxx8   1/1     Running   0          96s

The PersistentVolumeClaim

Now that you have created a CephFS with the corresponding StorageClass in your Kubernetes cluster, you are ready to define a so-called PersistentVolumeClaim (PVC). A PVC is referenced by your pods; the provisioner automatically creates a persistent volume and binds it to the claim.

The following is an example of how to create a volume claim for the CephFS storage class.

volumeclaim.yaml:

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mydata
#  namespace: cephfs
spec:
  storageClassName: cephfs
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi

Within a deployment you can then mount a volume based on this claim; see the deployment example at the end of this section.

When you create the volumeclaim with

$ kubectl create -f volumeclaim.yaml

… you can finally test the status of your persistent volume claim:

$ kubectl get pvc
NAME          STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
mydata        Bound     pvc-060e5142-f915-4385-8e86-4166fe2980f6   1Gi        RWX            cephfs         27m

In your pod deployment (e.g. a database server) you can now define the volume based on your volume claim.

apiVersion: apps/v1
kind: Deployment
.....
spec:
  ....
  template:
    ....
    spec:
      containers:
      .....
        volumeMounts:
        - mountPath: /var/lib/myapp/data
          name: mydata
      restartPolicy: Always
      volumes:
      - name: mydata
        persistentVolumeClaim:
          claimName: mydata
....
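
If you just want to verify the volume before wiring it into a real deployment, a minimal test pod can mount the claim directly. The following is only a sketch; the busybox image, the pod name cephfs-test and the mount path /data are arbitrary choices:

apiVersion: v1
kind: Pod
metadata:
  name: cephfs-test
spec:
  containers:
  - name: test
    image: busybox
    # keep the container running so you can exec into it and write to the volume
    command: ["sleep", "3600"]
    volumeMounts:
    - mountPath: /data
      name: mydata
  volumes:
  - name: mydata
    persistentVolumeClaim:
      claimName: mydata

Create the pod and write a test file into the mounted CephFS volume:

$ kubectl create -f cephfs-test.yaml
$ kubectl exec cephfs-test -- sh -c 'echo hello > /data/test.txt'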
