In this blog post I will explain how to install a distributed filesystem on a Kubernetes cluster. To run stateful Docker images (e.g. a database like PostgreSQL) you have two choices:
- run the service on a dedicated node – this avoids the loss of data if Kubernetes re-schedules your pod to another node
- use a distributed storage solution like Ceph or GlusterFS
Gluster is a scalable network filesystem. It allows you to create a large, distributed storage solution on commodity hardware. You can connect the Gluster storage to Kubernetes to abstract the volumes from your services.
You can install GlusterFS on any node, including the Kubernetes worker nodes. The following guide explains how to install GlusterFS on Debian 10 (Buster). You will find more information about installation here.
Repeat the following installation on each node you wish to join your Gluster network storage. Run the commands as root:
$ su
# Add the GPG key to apt:
$ wget -O - https://download.gluster.org/pub/gluster/glusterfs/7/rsa.pub | apt-key add -
# Add the source (s/amd64/arm64/ as necessary):
$ echo deb [arch=amd64] https://download.gluster.org/pub/gluster/glusterfs/7/LATEST/Debian/buster/amd64/apt buster main > /etc/apt/sources.list.d/gluster.list
# Install...
$ apt update
$ apt install -y glusterfs-server
Start and enable the GlusterFS Service on all the servers.
$ sudo systemctl start glusterd
$ sudo systemctl enable glusterd
To check the Gluster status, connect to one of your cluster nodes and run:
$ sudo service glusterd status
Setup Gluster Network
Now you can check from one of your Gluster nodes whether you can reach each of the other nodes:
$ gluster peer probe [gluster-node-ip]
where [gluster-node-ip] is the IP address or the DNS name of one of your Gluster nodes. Now you can check the peer status on each node:
$ gluster peer status
Uuid: vvvv-qqq-zzz-yyyyy-xxxxx
State: Peer in Cluster (Connected)
Other names:
[YOUR-GLUSTER-NODE-NAME]
Setup a Volume
Next you can set up a GlusterFS volume. For that, create a brick directory on all servers:
$ mkdir -p /data/glusterfs/brick1/gv0
From any single Gluster node run:
$ gluster volume create gv0 replica 2 [GLUSTER-NODE1]:/data/glusterfs/brick1/gv0 [GLUSTER-NODE2]:/data/glusterfs/brick1/gv0
volume create: gv0: success: please start the volume to access data
Replace [GLUSTER-NODE1] and [GLUSTER-NODE2] with the DNS name or IP address of the corresponding Gluster node.
Note: the brick directory must not be on the root partition. Also, you should provide at least 3 Gluster nodes, as a replicated volume with only two replicas is prone to split-brain situations.
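Since the brick directory should not live on the root partition, you can format a dedicated disk and mount it under /data/glusterfs before creating the brick directory. The following is only a sketch; the device name /dev/sdb is an assumption and will differ on your servers:

```
# Assumption: /dev/sdb is an empty disk reserved for the Gluster bricks
$ mkfs.xfs /dev/sdb
$ mkdir -p /data/glusterfs
$ mount /dev/sdb /data/glusterfs
# make the mount permanent across reboots
$ echo '/dev/sdb /data/glusterfs xfs defaults 0 0' >> /etc/fstab
```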
Now you can start your new volume ‘gv0’:
$ sudo gluster volume start gv0
volume start: gv0: success
With the following command you can check the status of the new volume:
$ sudo gluster volume info
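For a two-node replicated volume as created above, the output should look similar to the following (the volume ID and the brick hosts will of course differ on your setup):

```
Volume Name: gv0
Type: Replicate
Volume ID: xxxx-xxxx-xxxx-xxxx
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: [GLUSTER-NODE1]:/data/glusterfs/brick1/gv0
Brick2: [GLUSTER-NODE2]:/data/glusterfs/brick1/gv0
```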
Find more about the setup [here](https://docs.gluster.org/en/latest/Quick-Start-Guide/Quickstart/).
Now that your Gluster filesystem is up and running, it’s time to tell Kubernetes about the new storage.
First you need to install the glusterfs-client package on every node that should mount Gluster volumes (typically all worker nodes). The client is used by the kubelet to mount the Gluster volumes.
$ sudo apt install glusterfs-client
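To verify that the client can reach the Gluster volume, you can mount it manually for a quick test. This is just a sketch; the node name [GLUSTER-NODE1] and the mount point /mnt/gluster-test are placeholders:

```
$ sudo mkdir -p /mnt/gluster-test
$ sudo mount -t glusterfs [GLUSTER-NODE1]:/gv0 /mnt/gluster-test
# if the mount succeeds, the client setup works - unmount again
$ sudo umount /mnt/gluster-test
```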
Persistent Volume Example
The following is an example of how to use a GlusterFS volume within a pod. First you need to create a persistent volume (PV):
apiVersion: v1
kind: PersistentVolume
metadata:
  # The name of the PV, which is referenced in pod definitions or displayed in various kubectl commands.
  name: gluster-pv
spec:
  capacity:
    # The amount of storage allocated to this volume.
    storage: 1Gi
  accessModes:
    # Access modes are used to match a PV and a PVC. They currently do not define any form of access control.
    - ReadWriteMany
  # The glusterfs plug-in defines the volume type being used.
  glusterfs:
    endpoints: gluster-cluster
    # Gluster volume name, preceded by /
    path: /gv0
    readOnly: false
  # The volume reclaim policy indicates that the volume will be preserved after the pods accessing it terminate.
  # Accepted values include Retain, Delete, and Recycle.
  persistentVolumeReclaimPolicy: Retain
The Gluster volume must have been created beforehand.
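The endpoints name gluster-cluster referenced in the PV points to a Kubernetes Endpoints object listing the IP addresses of your Gluster nodes, which has to be created as well. The following is a sketch in which the IP addresses 10.0.0.1 and 10.0.0.2 are placeholders for your real Gluster node addresses:

```yaml
apiVersion: v1
kind: Endpoints
metadata:
  # must match the 'endpoints' name used in the PV definition
  name: gluster-cluster
subsets:
  - addresses:
      # placeholder IPs - replace with the addresses of your Gluster nodes
      - ip: 10.0.0.1
      - ip: 10.0.0.2
    ports:
      # the port value is ignored by the glusterfs plug-in but required by the API
      - port: 1
```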
Persistent Volume Claim Example
The persistent volume claim (PVC) specifies the desired access mode and storage capacity. Currently, based on only these two attributes, a PVC is bound to the PV created before. Once a PV is bound to a PVC, that PV is essentially tied to the PVC’s namespace and cannot be bound by another PVC. There is a one-to-one mapping of PVs and PVCs. However, multiple pods in the same namespace can use the same PVC.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
The claim name is referenced by the pod under its volumes section.
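Assuming you saved the two definitions in files named gluster-pv.yaml and gluster-claim.yaml (the file names are just examples), you can create both objects and verify the binding with kubectl:

```
$ kubectl apply -f gluster-pv.yaml
$ kubectl apply -f gluster-claim.yaml
# the claim should show the status 'Bound' after a short time
$ kubectl get pv,pvc
```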
Within a deployment you can then mount a volume based on this claim. See the following example:
apiVersion: apps/v1
kind: Deployment
.....
spec:
  ....
  template:
    ....
    spec:
      containers:
        .....
          image: postgres:9.6.1
          # run as root because of glusterfs
          securityContext:
            runAsUser: 0
            allowPrivilegeEscalation: false
          volumeMounts:
            - mountPath: /var/lib/postgresql/data
              name: dbdata
              readOnly: false
      restartPolicy: Always
      volumes:
        - name: dbdata
          persistentVolumeClaim:
            claimName: gluster-claim
....
Find also a more detailed example here.