Cassandra and Docker-Swarm

Running an Apache Cassandra cluster with Docker-Swarm is quite easy using the official Docker image. Docker-Swarm allows you to set up several Docker worker nodes running on different hardware or virtual servers. Take a look at my example docker-compose.yml file:

version: "3.2"

networks:
  cluster_net:
    external:
      name: cassandra-net  
  
services:  

  ################################################################
  # The Cassandra cluster
  #   - cassandra-001
  ################################################################        
  cassandra-001:
    image: cassandra:3.11
    environment:
      CASSANDRA_BROADCAST_ADDRESS: "cassandra-001"
    deploy:
      restart_policy:
        condition: on-failure
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.hostname == node-001
    volumes:
      - /mnt/cassandra:/var/lib/cassandra
    networks:
      - cluster_net

  ################################################################
  # The Cassandra cluster
  #   - cassandra-002
  ################################################################        
  cassandra-002:
    image: cassandra:3.11
    environment:
      CASSANDRA_BROADCAST_ADDRESS: "cassandra-002"
      CASSANDRA_SEEDS: "cassandra-001"
    deploy:
      restart_policy:
        condition: on-failure
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.hostname == node-002
    volumes:
      - /mnt/cassandra:/var/lib/cassandra
    networks:
      - cluster_net

I am running each Cassandra service on a specific host within my Docker-Swarm. We cannot use the built-in scaling feature of Docker-Swarm because each service needs its own separate data volume. See the ‘volumes’ section of each service.
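
Because scaling is not available here, adding a node means adding another service block by hand. A minimal sketch of a third node following the same pattern could look like this; the service name ‘cassandra-003’ and the host name ‘node-003’ are assumptions for illustration and would have to match a real worker node in your swarm.

```yaml
  # Hypothetical third node, placed under the existing 'services:' key.
  # 'cassandra-003' and 'node-003' are illustrative names, not part of the setup above.
  cassandra-003:
    image: cassandra:3.11
    environment:
      CASSANDRA_BROADCAST_ADDRESS: "cassandra-003"
      CASSANDRA_SEEDS: "cassandra-001"
    deploy:
      restart_policy:
        condition: on-failure
        max_attempts: 3
        window: 120s
      placement:
        constraints:
          - node.hostname == node-003
    volumes:
      - /mnt/cassandra:/var/lib/cassandra
    networks:
      - cluster_net
```

The bind mount uses the same path as the other services, but since each service is pinned to a different host, every node still writes to its own local directory.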

The other important parts are the two environment variables ‘CASSANDRA_BROADCAST_ADDRESS’ and ‘CASSANDRA_SEEDS’.

‘CASSANDRA_BROADCAST_ADDRESS’ sets the address that each Cassandra node advertises to the other nodes in the cluster. Here it is set to the service name. As both services run in the same network ‘cluster_net’, the two Cassandra nodes find each other via the service name.
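
Note that a network marked as external is not created by the stack. It has to exist before deployment, typically as an overlay network so that containers on different hosts can reach each other. As an alternative sketch (an assumption, not the setup used above), the network could be declared in the compose file itself and created together with the stack:

```yaml
# Alternative to the external network: let the stack create the overlay network.
# Docker prefixes the created network's name with the stack name in this case.
networks:
  cluster_net:
    driver: overlay
```

The services keep referring to the network as ‘cluster_net’, so the rest of the file would stay unchanged.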

The second environment variable ‘CASSANDRA_SEEDS’ names the seed node, which a new node contacts in order to join the cluster. It needs to be defined for the second service only. A seed is necessary even though a Cassandra cluster is ‘master-less’.
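
The official image describes ‘CASSANDRA_SEEDS’ as a comma-separated list of addresses, so a node can be given more than one seed. A sketch, assuming a hypothetical additional service named ‘cassandra-003’ that uses both existing nodes as seeds:

```yaml
  # Fragment of a hypothetical 'cassandra-003' service (the name is an assumption).
  # With two seeds listed, the node can still join if one of them is unreachable.
  cassandra-003:
    image: cassandra:3.11
    environment:
      CASSANDRA_BROADCAST_ADDRESS: "cassandra-003"
      CASSANDRA_SEEDS: "cassandra-001,cassandra-002"
```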

That’s it!

Manage Big Data With Apache Cassandra

In this article, I will share my experience with Cassandra and how you can manage big data in an effective way. Apache Cassandra is a high-performance, extremely scalable, fault-tolerant (i.e., no single point of failure), distributed non-relational database solution. But Cassandra differs from SQL and RDBMS in some important aspects. If, like me, you come from the world of SQL databases, Cassandra’s data concept is hard to understand at first. It took me several weeks to do so. So let’s look at the differences.