In my last blog I explained how to run the CockroachDB in a local dev environment with the help from docker-compose. Now I want to show how to setup a CockroachDB cluster in Kubernetes.
The CockroachDB is a distributed SQL database with a build in replication mechanism. This means that the data is replicated over several nodes in a database cluster. This increases the scalability and resilience in the case that a single node fails. With its Automated-Repair feature the database also detects data inconsistency and automatically fixes faulty data on disks. The project is Open Source and hosted on Github.
CockroachDB fully supports distributed ACID transactions. This means guaranteed atomicity, isolation, consistency, and durability of data. This allows CockroachDB to be used in combination with Jakarta EE and JPA. Supporting the PostgreSQL wire protocol, CockroachDB can be used out of the box for the Java Enterprise Applications and Microservices using the standard PostgresSQL JDBC driver.
The project CockroachDB offers a completely new kind of database. The CockroachDB is a distributed database optimized for container based environments like Kubernetes. The database is Open Source and hosted on Github. CoackroachDB implements the standard PostgreSQL API so it should work with Java Persistence API (JPA). The database also fully supports transaction API with is important for Java Enterprise applications in combination with JPA. In addition CockroachDB has a build in replica mechanism. This allows to replicate the data over several nodes in your cluster. With is Automated-Repair feature the database detects data inconsistency on read and write and automatically and fixes faulty data.
So it seems worth to me to test it in combination with Imixs-Workflow which we typically run with PostgreSQL. In the following I will show how to setup the database with docker-compose and run it together with the Imixs-Process-Manager.
To run a test environment with Imixs-Workflow I use docker-compose to setup 3 database nodes and one instance of a Wildfly Application server running the Imixs-Process-Manager.
The root user is created by default for each cluster which is running in the ‘insecure’ mode. The root user is assigned to the admin role and has all privileges across the cluster. To connect to the database using the PostgreSQL JDBC driver, the user root with a dummy password can be provided. Note: this is for test and development only. For production mode you need to start the cluster is the ‘secure mode’. See details here.
The Web UI
CockroachDB comes with a impressive Web UI which I expose on port 8180. So you can access the Web UI form your browser:
Create a Database
The Web UI has no interface to create users or databases. So we need to do this using the PostgreSQL command line syntax. For that open a bash in one of the 3 database nodes
$ cockroach sql --insecure
# Welcome to the CockroachDB SQL shell.
# All statements must be terminated by a semicolon.
# To exit, type: \q.
# Server version: CockroachDB CCL v20.1.0 (x86_64-unknown-linux-gnu, built 2020/05/05 00:07:18, go1.13.9) (same version as client)
# Cluster ID: 90ece5f6-2bb7-40c6-9c1d-d758cc954509
# Enter \? for a brief introduction.
now you can create an empty database for the Imixs-Workflow system:
> CREATE DATABASE "workflow-db";
That’s it! When you restart you deployment, the Imixs-Workflow engine successfully connects to CoackroachDB using the PSQL JDBC Driver. In the future I will provide some additional posts about running CockRoach in a Kubernetes Cluster based on the Open Source environment Imixs-Cloud.
Monitoring a Kubernetes cluster seems not to be so difficult as you look at the hundreds of blogs and tutorials. But there is a problem – it is the dynamic and rapid development of Kubernetes. And so you will find many blog posts describing a setup that may not work properly for your environment anymore. This is not because the author has provided a bad tutorial, but only because the article is maybe older than one year. Many things have changed in Kubernetes and it is the area of metrics and monitoring that is affected often.
For example, you will find many articles describing how to setup the cadvisor service to get container metrics. But this technology has become part of kubelet in the meantime so an additional installation should not be necessary anymore and can lead to incorrect metrics in the worst case. Also the many Grafana boards to display metrics have also evolved. Older boards are usually no longer suitable to be used in a new Kubernetes environment.
Therefore in this tutorial, I would like to show how to set up a monitoring correctly in the current version of Kubernetes 1.19.3. And of course also this blog post will be outdated after some time. So be warned 😉
In this tutorial I will show how you can combine different data queries in one Datatable. The scenario I came up to this requirement was a Kubernetes Dashboard where I wanted to combine the CPU and Memory Used of each Node with the OsVersion and the Docker Version. These metrics came form different sources the CPU und Memory the corresponding node_cpu_ and node_memory_ metrics provided by the Node Exporter and the OsVersion for example is provided by the cadvisor_version_info metric. Its a little bit tricky to come to the following output:
No – I don’t want to start a new flame war in which I put one framework above the other. Both, Spring Boot and Jakarta EE are great frameworks to build great modern Java applications. Some developers prefer this, others prefer that. Why is that? I think it’s often just because the one developer has collected more experience with Spring Boot, the other one with Java EE. These technologies are developing very fast and it is difficult to learn and be able to apply everything correctly. Basically it is a kind of protectionism that you put one over the other so that you don’t appear stupid and ignorant. But there is a certain noise around Spring Boot that gives the impression that Spring Boot would be the far better system.
Today, when we go into our beautiful Internet world, we have to admit that we have forgotten how the technology behind works. This is bad because it makes one of the most important inventions of modern times useless. Why is that? And what is the missing link?
In the beginning of the Internet and the World-Wide-Web (WWW) there was a problem to be solved. For universities, it was very difficult to organize the ever-increasing number of studies and publications in a way that this information could be found. How could a student be able to find out if there is an answer to his question in another university?
This was the moment as the Word-Wide-Web was invented by Tim Berners-Lee. The fact that the Internet already existed during this time and that university servers were connected to each other, it was obvious to use this technology also for publishing knowledge and not just for communication. After all universities published their publications on public self-hosted servers, it was possible to access the information from any point only by knowing the IP address or the name of the university. This was very simple and very efficient. And each university continued to have the control over its own information.
The idea was not limited to universities only and could be applied by any organization, any company and any individual with a public server. So everyone was now able to publish information. But what we did then was to publish information more and more only on a few centrally managed servers. At least most people today believe the internet consists only of this points of information. This severs are known as Facebook, Twitter or Tiktok. And so we have lost control, leading to all the unpleasant excesses that we see in society today.
What can you do about it? Very simple – look for answers to your questions from the person who knows it. not someone who may have found a part of the answer. Even if that is sometimes more time intensive.
We Did it Again!
Ok, that was the general part of my thinking. But since I am a software architect, I like to look at these things from the technical side. Even if we think we know the Internet technology, we are begin using it in the wrong way again.
Microservices are the latest excesses of this development. The basic idea of the microservice is again comparable to Tim Berners-Lee’s invention. Manage different kind of Data on separate (micro)servers. Connect those servers with each other and you can gain more flexibility and faster solutions. James Lewis and Martin Fowler explain this idea in very detail in their definition of Microservices on martinFowler.com.
But the most upsetting thing is – just as we have reduced the WWW to a few social networks by concentrating the distribution of information, we are now starting to do the same with microservice technology. If you read current blogs about microservices, you’ll find that most posts recommend to run your services on only a view central platforms such as AWS, Azure or Google Cloud. This is absolutely terrifying.
I myself recently applied this concept of centralizing data access in one of my open source projects (Imixs-SAGA) and developed a central registry service. Although it may sometimes seem useful to centralize things to facilitate access or data management, we should always consider what the basics of a technology are. In the case of the Internet technology, this is the decentralization of services and the usage and publication of known access points. We should apply these basics also to our microservice solutions. Only in this way can we say that we understand the Internet.
In one of my last blogs I explained how you can setup a cassandra cluster in a docker-swarm. The advantage of a container environment like docker-swarm or kubernetes is that you can run Cassandra with its default settings and without additional security setup. This is because the cluster nodes running within a container environment can connect securely to each other via the kubernetes or docker-swarm virtual network and need not publish any ports to the outer world. This kind of a setup for a Cassandra cluster can be fine for many cases. But what if you want to setup a Cassandra cluster in a more open network? For example in a public cloud so you can access the cluster form different services or your client? In this case it is necessary to secure your Cassandra cluster.
Kubernetes is definitely the most widely used solution when it comes to container platforms. Everyone knows it by now and it is not only used successfully in large projects. But I think it is also true that many projects do not run Kubernetes themselves but use one of the major platform operators. But why is it like that? It is not always a good idea to give up control. And certainly not when it comes to your own data.
Since version 5.1.0 of Imixs-Workflow we introduced Apache Solr as an alternative to our Apache Lucene Core search engine. The goal is to use Apache Solr as a cloud service and control the search engine completely via the Rest API.
Running a Apache Cassandra Cluster with Docker-Swarm is quite easy using the official Docker Image. Docker-Swarm allows you to setup several docker worker nodes running on different hardware or virtual servers. Take a look at my example docker-compose.yml file:
I am running each cassandra service on a specific host within my docker-swarm. We can not use the build-in scaling feature of docker-swarm because we need to define a separate data volume for each service. See the section ‘volumes’.
The other important part are the two environment variables ‘CASSANDRA_BROADCAST_ADDRESS’ and ‘CASSANDRA_SEEDS’.
‘CASSANDRA_BROADCAST_ADDRESS’ defines a container name for each cassandra node within the cassandra cluster. This name matches the service name. As both services run in the same network ‘cluster_net’ the both cassandara nodes find each user via the service name.
The second environment ‘CASSANDRA_SEEDS’ defines the seed node which need to be defined for the second service only. This is necessary even if a cassandra cluster is ‘master-less’.