Ceph Pacific running on Debian 11 (Bullseye)

In this tutorial I will explain how to set up a Ceph cluster on Debian 11. The Linux distribution is not as relevant as it sounds, but for the latest Ceph release, Pacific, I am also using the latest Debian release, Bullseye.

In contrast to my last tutorial on how to set up Ceph, I will focus a little more on the network. Understanding and configuring the Ceph network options helps to ensure good performance and reliability of the overall storage cluster. See also the latest configuration guide from Red Hat.


Kubernetes, Ceph and Static Volumes

Ceph is an open source distributed storage system that fits well into the concepts of Kubernetes. With the Ceph CSI plugin you can connect a Ceph cluster to your Kubernetes cluster in a clean way. In one of my last posts I gave a short tutorial on how to set up a Ceph cluster on Debian. Also take a look at the Imixs-Cloud project.

Static Persistent Volumes

When we talk about Kubernetes and Persistent Volumes you will often find examples working with a so-called storage class and Dynamic Persistent Volumes. In this concept a persistent volume is provisioned automatically by the Kubernetes CSI adapter and you do not need to think much about how this works. But this kind of persistent volume is not durable by default: it is usually created with the reclaim policy ‘Delete’, so the volume and all the data your container wrote so far are removed as soon as the corresponding claim is deleted. To avoid this, you need a so-called Static Persistent Volume. Such a persistent volume is marked with the flag ‘Retain’:

persistentVolumeReclaimPolicy: Retain

This means the volume and its data will not be deleted when the POD or its claim is removed or updated.

To set up a Static Persistent Volume in Ceph, two steps are necessary. First you need to create the Ceph image on your Ceph cluster. This can be done from the Ceph web admin interface or with the command line tool:

# rbd create test-image --size=1024 --pool=kubernetes --image-feature layering

Next you can define the corresponding Kubernetes Persistent Volume object referring to this RBD image:

---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: rbd-static-pv
spec:
  volumeMode: Filesystem
  storageClassName: ceph
  persistentVolumeReclaimPolicy: Retain
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  csi:
    driver: rbd.csi.ceph.com
    fsType: ext4
    nodeStageSecretRef:
      name: csi-rbd-secret
      namespace: ceph-system
    volumeAttributes:
      clusterID: "<clusterID>"
      pool: "kubernetes"
      staticVolume: "true"
      # The imageFeatures must match the created ceph image exactly!
      imageFeatures: "layering"
    volumeHandle: test-image 

Replace <clusterID> with the ID of your Ceph cluster. Note: a storage class is also needed here to identify the Ceph nodes. Find more details here.
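A POD does not mount the PersistentVolume directly but through a PersistentVolumeClaim that is bound to it. The following is a minimal sketch of how such a claim could be generated from the shell. The helper name make_static_pvc is hypothetical, and the claim name test-pg-dbdata is only an assumption chosen to match the claim used by the resize job further down. The volumeName field pins the claim to the static volume rbd-static-pv defined above.

```shell
#!/bin/sh
# Hypothetical helper: print a PVC manifest that binds to an existing static PV.
# $1 = claim name, $2 = name of the static PV, $3 = requested size
make_static_pvc() {
    cat <<EOF
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  # must match the storageClassName and accessModes of the PV
  storageClassName: ceph
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: $3
  # bind explicitly to the static volume
  volumeName: $2
EOF
}

# Print the manifest; pipe it into "kubectl apply -f -" to create the claim.
make_static_pvc test-pg-dbdata rbd-static-pv 1Gi
```

The requested size should not exceed the capacity declared in the PersistentVolume, otherwise the claim cannot bind.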

Resizing Static Persistent Volumes

So far everything is working fine using Ceph for static persistent volumes. But it becomes a little tricky if you need to resize an image. Imagine you are running a database and the storage you need exceeds the size you planned in the beginning.

In this case you first need to resize the Ceph image. This can be done easily from the Ceph web admin interface or with the command line tool.

# rbd resize --pool kubernetes --image test-image --size 2048

But the problem is that after you delete and redeploy your POD in Kubernetes it will still see the old disk size. This happens because the Ceph CSI plugin does not support automatic resizing of static volumes.

If you are using the fsType ext4 (as in my example) you can run the resize2fs command from within your POD to give your container the correct new size:

# resize2fs /dev/rbd[number]

You need to replace [number] with the number of the RBD device mounted in your POD. You can check the device name with the command df -h.
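If the POD mounts several volumes, picking the right device by eye is error-prone. The following sketch filters the output of df for RBD block devices. The function name rbd_devices is hypothetical, and it assumes the devices show up as /dev/rbd followed by a number, as in the example above.

```shell
#!/bin/sh
# Hypothetical helper: read "df" output on stdin and print every
# mounted RBD block device (/dev/rbd0, /dev/rbd1, ...), one per line.
rbd_devices() {
    awk '$1 ~ /^\/dev\/rbd[0-9]+$/ { print $1 }'
}

# Usage inside the POD: list the devices, then resize each filesystem.
# for dev in $(df | rbd_devices); do resize2fs "$dev"; done
```

Matching on the first column only avoids false hits when a mount path merely contains the string "rbd".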

Note: The command will only work if the resize2fs tool is installed in your container (which is, for example, the case for the official PostgreSQL image). It is also important that your POD runs with the securityContext privileged: true:

....         
          volumeMounts:
            - name: volume-to-resize
              mountPath: /var/lib/data
          securityContext:
            privileged: true

Using a Kubernetes Job

As an alternative to executing the resize2fs command manually you can also start a simple Kubernetes job to resize your RBD images automatically.

---
###################################################
# This job can be used to resize an ext4 filesystem
# aligned to the given size of the underlying RBD image.
###################################################
apiVersion: batch/v1
kind: Job
metadata:
  name: ext4-resize2fs
spec:
  template:
    spec:
      containers:
        - name: debian
          image: debian

          command: ["/bin/sh"]
          args:
            - -c
            - >-
                echo '******** start resizing block device  ********' &&
                echo ...find rbd mounts to be resized.... &&
                df | grep /rbd &&
                DEVICE=`df | grep /rbd | awk '{print $1}'` &&
                echo ...resizing device $DEVICE ... &&
                resize2fs $DEVICE &&
                echo '******** resize block device completed ********'

          volumeMounts:
            - name: volume-to-resize
              mountPath: /tmp/mount2resize
          securityContext:
            privileged: true
      volumes:
        - name: volume-to-resize
          persistentVolumeClaim:
            claimName: test-pg-dbdata
      restartPolicy: Never
  backoffLimit: 1

Make sure that the PV and PVC objects exist before you run the job. Replace the claimName with the name of the PVC to be resized.

$ kubectl apply -f resize2fs.yaml

If you have any comments please post them here.

Monitoring Web Servers Should Never be Complex

If you run several web servers in your organisation, or even public web servers on the internet, you need some kind of monitoring. If your servers go down for some reason, this is no fun for your colleagues, your customers or yourself. For that reason we use monitoring tools, and there are a lot of them available, providing all kinds of features and concepts. For example, you can monitor the behaviour of your applications, the hardware usage of your server nodes, or even the network traffic between servers. One prominent solution is the open source tool Nagios, which allows you to monitor hardware in every detail. In Kubernetes environments you may use the Prometheus/Grafana Operator, which integrates into the concept of Kubernetes and provides a lot of different export services to monitor a cluster in various ways. There is also a large market of monitoring solutions running in the cloud. The cloud solutions advertise that no complex installation is required. But personally I wonder if it is a good idea to send application and hardware metrics to a third-party service.


How to Draw a Server Network Diagram With Text Characters

In case you want to document a network diagram quickly without using a graphical tool, you can find the necessary box-drawing characters on this wiki page. With them you can draw boxes and connectors. See the following example.

                              Internet
---------------------------------------------------------------------------
 ❰PUBLIC-IP❱
      |
┌────────────┐     ┌────────────┐     ┌────────────┐     ┌────────────┐
│Master-Node │     │  Worker-1  │     │  Worker-2  │     │  Worker-3  │
└────────────┘     └────────────┘     └────────────┘     └────────────┘
      ├───────────────────┼─────────────────┼──────────────────┤
  ❰10.0.0.2❱         ❰10.0.0.3❱         ❰10.0.0.4❱         ❰10.0.0.5❱ 
---------------------------------------------------------------------------
                         Private Network 10.0.0.0/24

And here is another example:


     ╔═╧═╧═╧═╧═╧═╧══╗
     ║ Arduino Nano ║
     ╚═╤═╤═╤═╤═╤═╤══╝
         │   │          ╭────────╮ 
         │   ╰──────────┤Sensor-2│   
         │              ╰────────╯
    ╭────┴───╮     
    │Sensor-1│     
    ╰────────╯    

How to Measure Network Speed?

Even if most people use the ‘ping‘ command to test a network connection, this tool is not built to give a realistic indication of the network speed. This is because ping sends small ICMP echo packets and only measures the round-trip time, not the throughput. If you really want to know how fast your network connection is, e.g. between two servers, you should use the command line tool ‘iperf‘.

If you want to measure the network performance between two servers, e.g. server-a and server-b, first start the tool in server mode on one of the two servers:

server-a:$ iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size:  128 KByte (default)
------------------------------------------------------------

This command starts a server listening on port 5001 (you can change the port number with the -p option if it is blocked by firewall rules).

Now you can start a test with a client connection from server-b to server-a:

server-b:$ iperf -c server-a
------------------------------------------------------------
Client connecting to server-a, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 55622 connected with 10.0.0.3 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.09 GBytes   935 Mbits/sec

In this example, iperf sent about 1.09 GBytes from server-b to server-a within 10 seconds, which corresponds to a network speed of about 935 Mbits per second.
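The bandwidth value can be checked by hand. The sketch below assumes that iperf counts 1 GByte as 1024^3 bytes and reports the rate in units of 10^6 bits per second, which is consistent with the numbers shown above. The helper name mbits_per_sec is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: convert an iperf transfer into Mbits/sec.
# $1 = transferred GBytes (1 GByte = 1024^3 bytes), $2 = duration in seconds
mbits_per_sec() {
    awk -v gb="$1" -v sec="$2" \
        'BEGIN { printf "%.1f\n", gb * 1024 * 1024 * 1024 * 8 / sec / 1000000 }'
}

mbits_per_sec 1.09 10
```

The result is about 936 Mbits/sec. The small difference to the 935 Mbits/sec reported by iperf comes from the rounding of the displayed transfer size.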

Ceph Warning that Won’t Resolve

If Ceph has a temporary problem, e.g. a node goes down, it may happen that you constantly see a warning in the web UI or when you run

$ ceph status

In case the message is

.. daemons have recently crashed

but your Ceph cluster is up and running again and you cannot see any other concerning messages, you can archive the crash messages that force this kind of status. To do this, run the following from your Ceph console:

ceph crash ls
# lists all crash messages

ceph crash archive-all
# moves the messages into the archive

This will bring back the health status to HEALTH_OK.

Jakarta EE9 – Wildfly – Elytron – SecurityDomains

With version 11 Wildfly introduced a completely new security concept named ‘Elytron’. This security concept is a little confusing at first glance if you have worked with previous versions of Wildfly. To be honest, I personally first noticed the Elytron framework with Wildfly 24. Even though it is well documented, it took me a while to get things working. Initially I came across the configuration concept while migrating the Imixs-Workflow project from Jakarta EE8 to Jakarta EE9. As we are using Docker images to run our applications, we configure the Wildfly server through the standalone.xml file and not via the CLI provided by Wildfly. In the following I will show what is important to get a Jakarta EE9 application to work with Elytron.

The Elytron Subsystem

Wildfly is divided at its core into subsystems. Each subsystem has its own configuration section in the standalone.xml file. For the Elytron subsystem this is the section with the namespace urn:wildfly:elytron:14.0.

If you look into the subsystem configuration you can see that a security domain is now split into a domain section and a realm section. A simple file-based security realm with the realm name ‘imixsrealm’ looks like this:

        <subsystem xmlns="urn:wildfly:elytron:14.0" final-providers="combined-providers" disallowed-providers="OracleUcrypto">
            .....
            <security-domains>
                .....
                <!-- imixsrealm filerealm configuration -->
                <security-domain name="imixsrealm" default-realm="imixsrealm" permission-mapper="default-permission-mapper">
                    <realm name="imixsrealm"/>
                </security-domain>
            </security-domains>
            <security-realms>
                ....
                <!-- imixsrealm filerealm property files -->
                <properties-realm name="imixsrealm" groups-attribute="Roles">
                    <users-properties path="sampleapp-users.properties" relative-to="jboss.server.config.dir" digest-realm-name="Application Security" plain-text="true"/>
                    <groups-properties path="sampleapp-roles.properties" relative-to="jboss.server.config.dir"/>
                </properties-realm>
            </security-realms>
            .....
        </subsystem>

I added a security-domain with the name ‘imixsrealm’ and also a properties-realm section with the same name, where I define the users and roles property files. The attribute plain-text="true" indicates that the passwords are stored in plain text, which makes testing much easier. Place the files sampleapp-users.properties and sampleapp-roles.properties into the standalone/configuration/ directory, which is the location that jboss.server.config.dir points to. Do not modify the other sections of the Elytron subsystem!

The content of the sampleapp-users.properties looks like this (with plain text passwords):

admin=adminadmin
manfred=password
anna=password

In the file sampleapp-roles.properties you can assign users to application specific roles:

admin=MANAGERACCESS
manfred=MANAGERACCESS
anna=AUTHORACCESS

So far everything looks similar to the old security-domain configuration. But at this point your new security domain won't work yet. Additional steps are needed.

The EJB and Web Subsystems

To get the security domain working with your application you also need to add the security domain to the undertow web subsystem. In this subsystem you will find a section ‘application-security-domains’, and in this section you need to add your new security domain as well:

       <subsystem xmlns="urn:jboss:domain:undertow:12.0" default-server="default-server" default-virtual-host="default-host" default-servlet-container="default" default-security-domain="other" statistics-enabled="${wildfly.undertow.statistics-enabled:${wildfly.statistics-enabled:false}}">
....
            <application-security-domains>
                <application-security-domain name="imixsrealm" security-domain="imixsrealm"/>
                <application-security-domain name="other" security-domain="ApplicationDomain"/>                                            
            </application-security-domains>
        </subsystem>

There is also a subsystem for EJBs, “ejb3:9.0”, and it is important that you add your security domain there as well if you have EJBs with the annotations @RolesAllowed or @RunAs:

        <subsystem xmlns="urn:jboss:domain:ejb3:9.0">
...
            <default-security-domain value="other"/>
            <application-security-domains>
                <application-security-domain name="imixsrealm" security-domain="imixsrealm"/>
                <application-security-domain name="other" security-domain="ApplicationDomain"/>                
            </application-security-domains>
...
        </subsystem>

Now you have completed your configuration in the standalone.xml file.

The jboss-web.xml and jboss-ejb3.xml

There are still two application-specific files that need to be part of your web application.

In the jboss-web.xml you define your custom security domain:

<?xml version="1.0" encoding="UTF-8"?>
<jboss-web>
	<context-root>/</context-root>	
	<security-domain>imixsrealm</security-domain>
</jboss-web>

and in the jboss-ejb3.xml file:

<?xml version="1.1" encoding="UTF-8"?>
<jboss:ejb-jar xmlns:jboss="http://www.jboss.com/xml/ns/javaee"
	xmlns="http://java.sun.com/xml/ns/javaee" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xmlns:s="urn:security:1.1"
	xsi:schemaLocation="http://www.jboss.com/xml/ns/javaee http://www.jboss.org/j2ee/schema/jboss-ejb3-2_0.xsd http://java.sun.com/xml/ns/javaee http://java.sun.com/xml/ns/javaee/ejb-jar_3_1.xsd"
	version="3.1" impl-version="2.0">

	<assembly-descriptor>
		<s:security>
			<ejb-name>*</ejb-name>			
			<s:security-domain>imixsrealm</s:security-domain>
			<!-- This configuration is necessary to enable @runAs for the AdminPService  -->
			<s:missing-method-permissions-deny-access>false</s:missing-method-permissions-deny-access>
		</s:security>
	</assembly-descriptor>

</jboss:ejb-jar>

Finally, your Jakarta EE9 application should now deploy and run within Wildfly 24 using the new Elytron security framework.

Database Realm / jdbc-realm

It is also easy to use a jdbc-realm configuration. You can find general information about database realms here. The following example shows how to configure a jdbc-realm with two tables storing a hashed password and the user roles.

<jdbc-realm name="imixsrealm">
    <principal-query sql="select PASSWORD from USERID where ID=?" data-source="office">
        <simple-digest-mapper algorithm="simple-digest-sha-256" password-index="1" hash-encoding="hex"/>
    </principal-query>
    <principal-query sql="select GROUP_ID from USERID_USERGROUP where ID=?" data-source="office">
        <attribute-mapping>
            <attribute to="Roles" index="1"/>
        </attribute-mapping>
    </principal-query>
</jdbc-realm>

Note: I am using two queries here as my role definitions are stored in a separate table (USERID_USERGROUP). The password is stored as a SHA-256 hash in hex format.

How to Migrate From Java EE8 to Jakarta EE9

Once you have developed a project under Java EE8 or Jakarta EE8, sooner or later you will get to the point where you need to migrate to Jakarta EE9. The most important part is to replace the old Java package names javax.* with jakarta.*. The renaming is needed for all EE packages, but some other packages that belong to the JDK, like javax.xml.parsers.*, are still valid, so you need to be careful. With a shell script this works well, as you will see.

Change the Maven Dependency

First of all you should change the Maven dependencies in your project:

Set the Maven compiler plugin to source and target version 11 if not yet done:

...
	<plugin>
		<groupId>org.apache.maven.plugins</groupId>
		<artifactId>maven-compiler-plugin</artifactId>
		<version>3.8.1</version>
		<configuration>
			<source>11</source>
			<target>11</target>
		</configuration>
	</plugin>
...

Next make sure you have the new Jakarta EE9 dependency:

	<dependency>
	    <groupId>jakarta.platform</groupId>
	    <artifactId>jakarta.jakartaee-api</artifactId>
	    <version>9.0.0</version>
	    <scope>provided</scope>
	</dependency>

If you have removed the old JavaEE8 dependency and added the new Jakarta EE 9 dependency you should see a lot of compiler errors in your Java files because of the wrong import package names.

Replace javax.* with jakarta.*

You can run the following shell script against your Java code. This script will replace the Java package names automatically in all your Java files. Just place the script into the root of your project and run it from there. (The script is written for Linux, but I guess you can adapt it to Windows PowerShell if needed):

#!/bin/bash

# this script can be used to replace deprecated javax. package names from a 
# Java EE8 project with the new jakarta. package names in Jakarta 9
# Initial version from rsoika, 2021

echo "replacing:"
echo "	javax.annotation.  -> jakarta.annotation."
echo "	javax.ejb.         -> jakarta.ejb."
echo "	javax.enterprise.  -> jakarta.enterprise."
echo "	javax.faces.       -> jakarta.faces."
echo "	javax.inject.      -> jakarta.inject."
echo "	javax.persistence. -> jakarta.persistence."
echo "	javax.ws.          -> jakarta.ws."
echo "Replacing now..."

###################
## REPLACE LOGIC ##
###################

# replace package names...
# the dots are escaped so that they match a literal '.' only
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.annotation\./jakarta.annotation./g'
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.ejb\./jakarta.ejb./g'
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.enterprise\./jakarta.enterprise./g'
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.faces\./jakarta.faces./g'
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.inject\./jakarta.inject./g'
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.persistence\./jakarta.persistence./g'
find . -name '*.java' -print0 | xargs -0 perl -pi -e 's/javax\.ws\./jakarta.ws./g'

echo "DONE!"

That’s it! Now you should be able to compile and run your project with Jakarta EE9.
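Before compiling it can help to check which files still reference the old package names. The following sketch lists the remaining javax imports of the packages handled by the script above. The function name list_ee_imports is hypothetical, and the --include option assumes GNU grep.

```shell
#!/bin/sh
# Hypothetical helper: list all imports that still use one of the
# javax packages renamed by the script above. $1 = project directory
list_ee_imports() {
    grep -rnE --include='*.java' \
        'import (static )?javax\.(annotation|ejb|enterprise|faces|inject|persistence|ws)\.' "$1"
}

# Usage from the project root (prints nothing once the migration is complete):
# list_ee_imports .
```

Imports of JDK packages such as javax.xml.parsers are not reported, because they are not in the list of renamed packages.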

Kubernetes – PersistentVolume: MountVolume.SetUp failed

While testing Ceph and Kubernetes in combination with the ceph-csi plugin I ran into a problem with some of my deployments. For some reason the deployment of a POD failed with the following event log:

Events:
Type     Reason            Age                From                 Message
---------------------------------------------------------------------------------------------
Warning  FailedScheduling  29s                default-scheduler        0/4 nodes are available: 4 persistentvolumeclaim "index" not found. 
Warning  FailedScheduling  25s (x3 over 29s)  default-scheduler        0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims. 
Normal   Scheduled         11s                default-scheduler        Successfully assigned office-demo-internal/documents-7c6c86466b-sqbmt to worker-3
Warning  FailedMount       3s (x5 over 11s)   kubelet, worker-3  MountVolume.SetUp failed for volume "demo-internal-index" : rpc error: code = Internal desc = mount failed: exit status 32
Mounting command: mount  
Mounting arguments: -t ext4 -o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/globalmount/demo-internal /var/lib/kubelet/pods/af2f33e0-06da-4429-9f75-908981cb85c3/volumes/kubernetes.io~csi/demo-internal-index/mount
Output: mount: /var/lib/kubelet/pods/af2f33e0-06da-4429-9f75-9034535485c3/volumes/kubernetes.io~csi/demo-internal-index/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/globalmount/demo-internal does not exist. 

The csi-plugin logs messages like:

 csi-rbdplugin Mounting command: mount
 csi-rbdplugin Mounting arguments: -t ext4 -o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/globalmount/demo-internal-imixs /var/lib/kubelet/pods/af2f33e0-34535-4429-9f75-908981cb85c3/volumes/kubernetes.io~csi/demo-internal-index/mount
 csi-rbdplugin Output: mount: /var/lib/kubelet/pods/af2f33e0-06da-4429-35445-908981cb85c3/volumes/kubernetes.io~csi/demo-internal-index/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/globalmount/demo-internal does not exist.
 csi-rbdplugin E0613 15:56:55.814449   32379 utils.go:136] ID: 33 Req-ID: demo-internal-imixs GRPC error: rpc error: code = Internal desc = mount failed: exit status 32
 csi-rbdplugin Mounting command: mount
 csi-rbdplugin Mounting arguments: -t ext4 -o bind,_netdev /var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/globalmount/demo-internal /var/lib/kubelet/pods/af2f33e0-06da-5552-9f75-908981cb85c3/volumes/kubernetes.io~csi/demo-internal-index/mount
 csi-rbdplugin Output: mount: /var/lib/kubelet/pods/af2f33e0-06da-4429-9f75-908981cb85c3/volumes/kubernetes.io~csi/demo-internal-index/mount: special device /var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/globalmount/demo-internal does not exist.

After many hours of investigation, I figured out that something was wrong with the corresponding PV directory on the affected worker node:

/var/lib/kubelet/plugins/kubernetes.io/csi/pv/demo-internal-index/

After deleting this directory on the worker node, everything worked again. See also the discussion here.