The instructions below provide a step-by-step guide to deploying Cassandra with Portworx on Kubernetes.

Kubernetes manages stateful workloads using StatefulSets. Cassandra is a distributed database; in this guide we deploy a containerized Cassandra cluster that uses Portworx for volume management at the backend.

Prerequisites

This guide assumes that you already have a running Kubernetes cluster with Portworx installed and running on its nodes.

Install

A StatefulSet in Kubernetes requires a headless service to provide network identity to the pods it creates. The following spec, applied with kubectl, creates a headless service for your Cassandra installation.

Create a cassandra-headless-service.yml with the following content.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: cassandra
  name: cassandra
spec:
  clusterIP: None
  ports:
    - port: 9042
  selector:
    app: cassandra

Apply the configuration.

kubectl apply -f cassandra-headless-service.yml
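
As a quick check, you can confirm that the service was created as headless; the CLUSTER-IP column should show None.

kubectl get svc cassandra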

This example uses the StorageClass API resource to dynamically provision Portworx volumes for the PersistentVolumeClaims created by the StatefulSet.

Create a px-storageclass.yml with the following content.

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: px-storageclass
provisioner: kubernetes.io/portworx-volume
parameters:
  repl: "2"
  priority_io: "high"
  group: "cassandra_vg"
  fg: "true"

Apply the configuration.

kubectl apply -f px-storageclass.yml
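
Optionally, describe the StorageClass to confirm that the Portworx provisioner and its parameters (replication factor, IO priority, and volume group) were registered.

kubectl describe storageclass px-storageclass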

Create the StatefulSet for Cassandra with 3 replicas. The PodSpec in the StatefulSet specifies the Cassandra container image. The StatefulSet gives each pod a sticky, unique identity based on its ordinal index. It also uses the Stork scheduler so that pods are placed close to where their data is located.

Create a cassandra-statefulset.yml with the following content.

apiVersion: "apps/v1beta1"
kind: StatefulSet
metadata:
  name: cassandra
spec:
  serviceName: cassandra
  replicas: 3
  template:
    metadata:
      labels:
        app: cassandra
    spec:
      # Use the stork scheduler to enable more efficient placement of the pods
      schedulerName: stork
      containers:
      - name: cassandra
        image: gcr.io/google-samples/cassandra:v12
        imagePullPolicy: Always
        ports:
        - containerPort: 7000
          name: intra-node
        - containerPort: 7001
          name: tls-intra-node
        - containerPort: 7199
          name: jmx
        - containerPort: 9042
          name: cql
        resources:
          limits:
            cpu: "500m"
            memory: 1Gi
          requests:
            cpu: "500m"
            memory: 1Gi
        securityContext:
          capabilities:
            add:
              - IPC_LOCK
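        # The preStop hook below stops the Cassandra JVM and waits for the
        # process to exit, so the container shuts down cleanly on termination.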
        lifecycle:
          preStop:
            exec:
              command: ["/bin/sh", "-c", "PID=$(pidof java) && kill $PID && while ps -p $PID > /dev/null; do sleep 1; done"]
        env:
          - name: MAX_HEAP_SIZE
            value: 512M
          - name: HEAP_NEWSIZE
            value: 100M
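          # cassandra-0, reached through the headless service, acts as the
          # seed node that new replicas contact when joining the ring.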
          - name: CASSANDRA_SEEDS
            value: "cassandra-0.cassandra.default.svc.cluster.local"
          - name: CASSANDRA_CLUSTER_NAME
            value: "K8Demo"
          - name: CASSANDRA_DC
            value: "DC1-K8Demo"
          - name: CASSANDRA_RACK
            value: "Rack1-K8Demo"
          - name: CASSANDRA_AUTO_BOOTSTRAP
            value: "false"
          - name: POD_IP
            valueFrom:
              fieldRef:
                fieldPath: status.podIP
          - name: POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
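        # /ready-probe.sh ships with the sample image and reports readiness
        # once this Cassandra node shows as up in nodetool status.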
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - /ready-probe.sh
          initialDelaySeconds: 15
          timeoutSeconds: 5
        # These volume mounts are persistent. They are like inline claims,
        # but not exactly because the names need to match exactly one of
        # the stateful pod volumes.
        volumeMounts:
        - name: cassandra-data
          mountPath: /var/lib/cassandra
  # These are converted to volume claims by the controller
  # and mounted at the paths mentioned above.
  volumeClaimTemplates:
  - metadata:
      name: cassandra-data
      annotations:
        volume.beta.kubernetes.io/storage-class: px-storageclass
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

Apply the configuration.

kubectl apply -f cassandra-statefulset.yml
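
The StatefulSet creates the pods one at a time in ordinal order (cassandra-0, then cassandra-1, then cassandra-2). You can track the rollout with the following commands before checking the volumes.

kubectl get statefulset cassandra
kubectl get pods -l app=cassandra -w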

Post-install status

Verify that the PVCs are bound to volumes using the storage class.

kubectl get pvc
NAME                         STATUS    VOLUME                                     CAPACITY   ACCESSMODES   STORAGECLASS      AGE
cassandra-data-cassandra-0   Bound     pvc-e6924b73-72f9-11e7-9d23-42010a8e0002   1Gi        RWO           px-storageclass   2m
cassandra-data-cassandra-1   Bound     pvc-49e8caf6-735d-11e7-9d23-42010a8e0002   1Gi        RWO           px-storageclass   2m
cassandra-data-cassandra-2   Bound     pvc-603d4f95-735d-11e7-9d23-42010a8e0002   1Gi        RWO           px-storageclass   1m

Verify that the Cassandra cluster has formed.

kubectl exec cassandra-0 -- nodetool status

Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.160.2  164.39 KiB  32           62.3%             ce3b48b8-1655-48a2-b167-08d03ca6bc41  Rack1-K8Demo
UN  10.0.64.2   190.76 KiB  32           64.1%             ba31128d-49fa-4696-865e-656d4d45238e  Rack1-K8Demo
UN  10.0.192.3  104.55 KiB  32           73.6%             c778d78d-c6bc-4768-a3ec-0d51ba066dcb  Rack1-K8Demo

Verify that the StorageClass was created.

kubectl get storageclass
NAME                 TYPE
px-storageclass      kubernetes.io/portworx-volume

Check the status of the Cassandra pods.

kubectl get pods
NAME          READY     STATUS    RESTARTS   AGE
cassandra-0   1/1       Running   0          1m
cassandra-1   1/1       Running   0          1m
cassandra-2   0/1       Running   0          47s

List the Portworx volumes backing the PVCs by running pxctl on one of the nodes.

/opt/pwx/bin/pxctl v l

ID                      NAME                                            SIZE    HA      SHARED  ENCRYPTED       IO_PRIORITY     SCALE   STATUS
651254593135168442      pvc-49e8caf6-735d-11e7-9d23-42010a8e0002        1 GiB   2       no      no              LOW             0       up - attached on 10.142.0.3
136016794033281980      pvc-603d4f95-735d-11e7-9d23-42010a8e0002        1 GiB   2       no      no              LOW             0       up - attached on 10.142.0.4
752567898197695962      pvc-e6924b73-72f9-11e7-9d23-42010a8e0002        1 GiB   2       no      no              LOW             0       up - attached on 10.142.0.5

List the pods and the hosts on which they are scheduled.

kubectl get pods -l app=cassandra -o json | jq '.items[] | {"name": .metadata.name,"hostname": .spec.nodeName, "hostIP": .status.hostIP, "PodIP": .status.podIP}'
{
  "name": "cassandra-0",
  "hostname": "k8s-2",
  "hostIP": "10.142.0.5",
  "PodIP": "10.0.160.2"
}  
{
  "name": "cassandra-1",
  "hostname": "k8s-0",
  "hostIP": "10.142.0.3",
  "PodIP": "10.0.64.2"
}  
{
  "name": "cassandra-2",
  "hostname": "k8s-1",
  "hostIP": "10.142.0.4",
  "PodIP": "10.0.192.3"
}

Verify that the Portworx volume has 2 replicas.

/opt/pwx/bin/pxctl v i 651254593135168442 (This volume is up and attached to k8s-0)
Volume  :  651254593135168442
        Name                     :  pvc-49e8caf6-735d-11e7-9d23-42010a8e0002
        Size                     :  1.0 GiB
        Format                   :  ext4
        HA                       :  2
        IO Priority              :  LOW
        Creation time            :  Jul 28 06:23:36 UTC 2017
        Shared                   :  no
        Status                   :  up
        State                    :  Attached: k8s-0
        Device Path              :  /dev/pxd/pxd651254593135168442
        Labels                   :  pvc=cassandra-data-cassandra-1
        Reads                    :  37
        Reads MS                 :  72
        Bytes Read               :  372736
        Writes                   :  1816
        Writes MS                :  17648
        Bytes Written            :  38424576
        IOs in progress          :  0
        Bytes used               :  33 MiB
        Replica sets on nodes:
                Set  0
                        Node     :  10.142.0.4
                        Node     :  10.142.0.3

Scaling

Portworx runs as a DaemonSet in Kubernetes, so when you add a node or a worker to your Kubernetes cluster, you do not need to explicitly install Portworx on it; the DaemonSet starts Portworx on the new node automatically.

If you used the Terraform scripts to create your Kubernetes cluster, update the worker (minion) count and apply the changes via Terraform to add a new node.
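
For example, if your Terraform configuration exposes the worker count as a variable, the change could look like the following; the variable name minion_count is only illustrative and depends on your scripts.

terraform apply -var 'minion_count=6'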

Observe the Portworx cluster once you add a new node by executing the following commands.

kubectl get ds -n kube-system
NAME         DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE-SELECTOR   AGE
kube-proxy   6         6         6         6            6           <none>          5h
portworx     6         5         5         5            5           <none>          4h
weave-net    6         6         6         6            6           <none>          5h

kubectl get pods -n kube-system
NAME                                 READY     STATUS    RESTARTS   AGE
etcd-k8s-master                      1/1       Running   0          5h
kube-apiserver-k8s-master            1/1       Running   0          5h
kube-controller-manager-k8s-master   1/1       Running   0          5h
kube-dns-2425271678-p8620            3/3       Running   0          5h
kube-proxy-3t2c9                     1/1       Running   0          5h
kube-proxy-94j40                     1/1       Running   0          5h
kube-proxy-h75gd                     1/1       Running   0          5h
kube-proxy-nvl7m                     1/1       Running   0          5h
kube-proxy-xh3gr                     1/1       Running   0          2m
kube-proxy-zlxmn                     1/1       Running   0          4h
kube-scheduler-k8s-master            1/1       Running   0          5h
portworx-14g3z                       1/1       Running   0          4h
portworx-ggzvz                       0/1       Running   0          2m
portworx-hhg0m                       1/1       Running   0          4h
portworx-rkdp6                       1/1       Running   0          4h
portworx-stvlt                       1/1       Running   0          4h
portworx-vxqxh                       1/1       Running   0          4h
weave-net-0cdb7                      2/2       Running   1          5h
weave-net-2d6hb                      2/2       Running   0          5h
weave-net-95l8z                      2/2       Running   0          5h
weave-net-tlvkz                      2/2       Running   1          4h
weave-net-tmbxh                      2/2       Running   0          2m
weave-net-w4xgw                      2/2       Running   0          5h

The Portworx cluster automatically scales as you scale your Kubernetes cluster.

/opt/pwx/bin/pxctl status

Status: PX is operational
License: Trial (expires in 30 days)
Node ID: k8s-master
        IP: 10.140.0.2
        Local Storage Pool: 1 pool
        POOL    IO_PRIORITY     RAID_LEVEL      USABLE  USED    STATUS  ZONE    REGION
        0       MEDIUM          raid0           10 GiB  471 MiB Online  default default
        Local Storage Devices: 1 device
        Device  Path            Media Type              Size            Last-Scan
        0:1     /dev/sdb        STORAGE_MEDIUM_SSD      10 GiB          31 Jul 17 12:59 UTC
        total                   -                       10 GiB
Cluster Summary
        Cluster ID: px-cluster
        Cluster UUID: d2ebd5cf-9652-47d7-ac95-d4ccbd416a6a
        IP              ID              Used    Capacity        Status
        10.140.0.7      k8s-4           266 MiB 10 GiB          Online
        10.140.0.2      k8s-master      471 MiB 10 GiB          Online (This node)
        10.140.0.4      k8s-2           471 MiB 10 GiB          Online
        10.140.0.3      k8s-0           461 MiB 10 GiB          Online
        10.140.0.5      k8s-1           369 MiB 10 GiB          Online
        10.140.0.6      k8s-3           369 MiB 10 GiB          Online
Global Storage Pool
        Total Used      :  2.3 GiB
        Total Capacity  :  60 GiB

Scale your Cassandra StatefulSet

kubectl get sts cassandra
NAME        DESIRED   CURRENT   AGE
cassandra   4         4         4h

kubectl scale sts cassandra --replicas=5
statefulset "cassandra" scaled

kubectl get pods -l "app=cassandra" -w
NAME          READY     STATUS    RESTARTS   AGE
cassandra-0   1/1       Running   0          5h
cassandra-1   1/1       Running   0          4h
cassandra-2   1/1       Running   0          4h
cassandra-3   1/1       Running   0          3h
cassandra-4   1/1       Running   0          57s

kubectl exec -it cassandra-0 -- nodetool status
Datacenter: DC1-K8Demo
======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns (effective)  Host ID                               Rack
UN  10.0.128.1  84.75 KiB   32           41.4%             1c14f7dc-44f7-4174-b43a-308370c9139e  Rack1-K8Demo
UN  10.0.240.1  130.81 KiB  32           45.2%             60ebbe70-f7bc-48b0-9374-710752e8876d  Rack1-K8Demo
UN  10.0.192.2  156.84 KiB  32           41.1%             915f33ff-d105-4501-997f-7d44fb007911  Rack1-K8Demo
UN  10.0.160.2  125.1 KiB   32           45.3%             a56a6f70-d2e3-449a-8a33-08b8efb25000  Rack1-K8Demo
UN  10.0.64.3   159.94 KiB  32           26.9%             ae7e3624-175b-4676-9ac3-6e3ad4edd461  Rack1-K8Demo

Failover

Pod Failover

Verify that a 5-node Cassandra cluster is running on your Kubernetes cluster.

kubectl get pods -l "app=cassandra"
NAME          READY     STATUS    RESTARTS   AGE
cassandra-0   1/1       Running   0          1h
cassandra-1   1/1       Running   0          10m
cassandra-2   1/1       Running   0          18h
cassandra-3   1/1       Running   0          17h
cassandra-4   1/1       Running   0          13h

Create data in your Cassandra DB

kubectl exec -it cassandra-2 -- bash
root@cassandra-2:/# cqlsh

Connected to K8Demo at 127.0.0.1:9042.
[cqlsh 5.0.1 | Cassandra 3.9 | CQL spec 3.4.2 | Native protocol v4]
Use HELP for help.

cqlsh> CREATE KEYSPACE demodb WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };
cqlsh> use demodb;
cqlsh:demodb> CREATE TABLE emp(emp_id int PRIMARY KEY, emp_name text, emp_city text, emp_sal varint,emp_phone varint);
cqlsh:demodb> INSERT INTO emp (emp_id, emp_name, emp_city, emp_phone, emp_sal) VALUES(123423445,'Steve', 'Denver', 5910234452, 50000);
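
You can read the row back immediately to confirm the write; this is just a quick sanity check and assumes the cqlsh session from the step above is still open.

cqlsh:demodb> SELECT * FROM emp WHERE emp_id = 123423445;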

Look at which nodes host the data in your Cassandra ring, based on the partition key.

root@cassandra-2:/# nodetool getendpoints demodb emp 123423445
10.0.112.1
10.0.160.1

Cross-reference the above pod IPs with the nodes. (Node k8s-0 hosts one of the pods that serves the data.)

kubectl get pods -l app=cassandra -o json | jq '.items[] | {"name": .metadata.name,"hostname": .spec.nodeName, "hostIP": .status.hostIP, "PodIP": .status.podIP}'
{
  "name": "cassandra-0",
  "hostname": "k8s-5",
  "hostIP": "10.140.0.8",
  "PodIP": "10.0.112.1"
}
{
  "name": "cassandra-1",
  "hostname": "k8s-0",
  "hostIP": "10.140.0.3",
  "PodIP": "10.0.160.1"
}
{
  "name": "cassandra-2",
  "hostname": "k8s-1",
  "hostIP": "10.140.0.5",
  "PodIP": "10.0.64.3"
}
{
  "name": "cassandra-3",
  "hostname": "k8s-3",
  "hostIP": "10.140.0.6",
  "PodIP": "10.0.240.1"
}
{
  "name": "cassandra-4",
  "hostname": "k8s-4",
  "hostIP": "10.140.0.7",
  "PodIP": "10.0.128.1"
}

Cordon the node where one of the replicas of the data resides, then delete the Cassandra pod running on it. This forces the StatefulSet to reschedule the pod on another node.

kubectl cordon k8s-0
node "k8s-0" cordoned

kubectl delete pods cassandra-1
pod "cassandra-1" deleted

The StatefulSet schedules a new Cassandra pod on another host. (The pod gets scheduled on node k8s-2 this time.)

kubectl get pods -w
NAME          READY     STATUS              RESTARTS   AGE
cassandra-0   1/1       Running             0          1h
cassandra-1   0/1       ContainerCreating   0          1s
cassandra-2   1/1       Running             0          19h
cassandra-3   1/1       Running             0          17h
cassandra-4   1/1       Running             0          14h
cassandra-1   0/1       Running   0         4s
cassandra-1   1/1       Running   0         28s

kubectl get pods -l app=cassandra -o json | jq '.items[] | {"name": .metadata.name,"hostname": .spec.nodeName, "hostIP": .status.hostIP, "PodIP": .status.podIP}'
{
  "name": "cassandra-0",
  "hostname": "k8s-5",
  "hostIP": "10.140.0.8",
  "PodIP": "10.0.112.1"
}
{
  "name": "cassandra-1",
  "hostname": "k8s-2",
  "hostIP": "10.140.0.4",
  "PodIP": "10.0.192.2"
}
{
  "name": "cassandra-2",
  "hostname": "k8s-1",
  "hostIP": "10.140.0.5",
  "PodIP": "10.0.64.3"
}
{
  "name": "cassandra-3",
  "hostname": "k8s-3",
  "hostIP": "10.140.0.6",
  "PodIP": "10.0.240.1"
}
{
  "name": "cassandra-4",
  "hostname": "k8s-4",
  "hostIP": "10.140.0.7",
  "PodIP": "10.0.128.1"
}

Query for the data that was inserted earlier.

kubectl exec cassandra-1 -- cqlsh -e 'select * from demodb.emp'
 emp_id    | emp_city | emp_name | emp_phone  | emp_sal
-----------+----------+----------+------------+---------
 123423445 |   Denver |    Steve | 5910234452 |   50000

(1 rows)

Node Failover

Decommissioning a Kubernetes node deletes the node object from the API server. Before doing that, decommission the corresponding Portworx node from the cluster by following the steps in Decommission a Portworx node. Once done, delete the Kubernetes node if it needs to be removed permanently.
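
Optionally, you can drain the node first so that its pods are evicted gracefully before the node object is removed; DaemonSet pods such as Portworx are skipped. This sketch assumes k8s-1 is the node being retired.

kubectl drain k8s-1 --ignore-daemonsets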

kubectl delete node k8s-1

The Kubernetes StatefulSet reschedules the pod that was running on the deleted node to another node in order to satisfy the replica count.

Cluster State Before k8s-1 was deleted:

kubectl get pods -l app=cassandra -o json | jq '.items[] | {"name": .metadata.name,"hostname": .spec.nodeName, "hostIP": .status.hostIP, "PodIP": .status.podIP}'
{
  "name": "cassandra-0",
  "hostname": "k8s-5",
  "hostIP": "10.140.0.8",
  "PodIP": "10.0.112.1"
}
{
  "name": "cassandra-1",
  "hostname": "k8s-2",
  "hostIP": "10.140.0.4",
  "PodIP": "10.0.192.2"
}
{
  "name": "cassandra-2",
  "hostname": "k8s-1",
  "hostIP": "10.140.0.5",
  "PodIP": "10.0.64.3"
}
{
  "name": "cassandra-3",
  "hostname": "k8s-3",
  "hostIP": "10.140.0.6",
  "PodIP": "10.0.240.1"
}
{
  "name": "cassandra-4",
  "hostname": "k8s-4",
  "hostIP": "10.140.0.7",
  "PodIP": "10.0.128.1"
}

kubectl get nodes --show-labels (Some of the labels and columns are removed for brevity)
k8s-0        Ready         cassandra-data-cassandra-1=true,cassandra-data-cassandra-3=true
k8s-1        Ready         cassandra-data-cassandra-1=true,cassandra-data-cassandra-4=true
k8s-2        Ready         cassandra-data-cassandra-0=true,cassandra-data-cassandra-2=true
k8s-3        Ready         cassandra-data-cassandra-3=true
k8s-4        Ready         cassandra-data-cassandra-4=true
k8s-5        Ready         
k8s-master   Ready         cassandra-data-cassandra-0=true,cassandra-data-cassandra-2=true

Cluster State After k8s-1 was deleted:

kubectl get pods -l app=cassandra -o json | jq '.items[] | {"name": .metadata.name,"hostname": .spec.nodeName, "hostIP": .status.hostIP, "PodIP": .status.podIP}'
{
  "name": "cassandra-0",
  "hostname": "k8s-5",
  "hostIP": "10.140.0.8",
  "PodIP": "10.0.112.1"
}
{
  "name": "cassandra-1",
  "hostname": "k8s-2",
  "hostIP": "10.140.0.4",
  "PodIP": "10.0.192.2"
}
{
  "name": "cassandra-2",
  "hostname": "k8s-0",
  "hostIP": "10.140.0.3",
  "PodIP": "10.0.160.2"
}
{
  "name": "cassandra-3",
  "hostname": "k8s-3",
  "hostIP": "10.140.0.6",
  "PodIP": "10.0.240.1"
}
{
  "name": "cassandra-4",
  "hostname": "k8s-4",
  "hostIP": "10.140.0.7",
  "PodIP": "10.0.128.1"
}

kubectl get nodes --show-labels (Some of the labels and columns are removed for brevity)
k8s-0        Ready         cassandra-data-cassandra-1=true,cassandra-data-cassandra-3=true
k8s-2        Ready         cassandra-data-cassandra-0=true,cassandra-data-cassandra-2=true
k8s-3        Ready         cassandra-data-cassandra-3=true
k8s-4        Ready         cassandra-data-cassandra-4=true
k8s-5        Ready               
k8s-master   Ready         cassandra-data-cassandra-0=true,cassandra-data-cassandra-2=true
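
As a final check after the node failover, you can re-run nodetool status from one of the surviving pods and confirm that all Cassandra nodes report UN (Up/Normal); the addresses will differ from the earlier output.

kubectl exec cassandra-0 -- nodetool status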

See Also

For further reading on Cassandra, refer to the official Apache Cassandra documentation.