Cassandra – Installation on CentOS

Disclaimer: The following content is my personal notes, taken while learning Cassandra. I don’t claim to be an expert on this subject at all.

This post is about installing Apache Cassandra on virtual machines running CentOS, using the RPM packages from DataStax.

The aim is to set up a 3-node cluster to play with, configured as follows:

  • cassandra-1 => 192.168.56.191 dc1/rac1 (seed)
  • cassandra-2 => 192.168.56.192 dc1/rac1 (seed)
  • cassandra-3 => 192.168.56.193 dc1/rac1

VirtualBox Configuration

I am using VirtualBox to create virtual machines to run a Cassandra cluster on my laptop.

Step 1 – Get CentOS

Grab an ISO from the CentOS website. For this post I downloaded the minimal image of version 6.5, the latest version at the time of writing.

Step 2 – VirtualBox Network Setup

The setup that I am going for is:
– A host-only network between VMs
– External access for each VM via NAT to host

Here is how I set up the host-only network:

  1. Go to VirtualBox Preferences.
  2. Click Network tab.
  3. Switch to Host-only Networks tab.
  4. Click the little “Add a host-only network” button on the right-hand side. An interface called “vboxnet0” will be created.
  5. Select the interface and click the little “Edit host-only network” button on the right-hand side.
  6. My settings are:

    IPv4 Address: 192.168.56.1
    IPv4 Network Mask: 255.255.255.0

This IP address and the network mask will be important when defining IP addresses for the virtual machines.
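To double-check an addressing plan against these settings, here is a minimal Python sketch using the standard `ipaddress` module. It assumes the vboxnet0 values above and the three node addresses from the cluster layout; `check_plan` is a hypothetical helper, not part of any tool.

```python
import ipaddress

# vboxnet0 settings from the "Edit host-only network" dialog above.
HOST_IP = ipaddress.ip_address("192.168.56.1")
NETWORK = ipaddress.ip_network("192.168.56.1/255.255.255.0", strict=False)

# Planned static addresses for the three Cassandra VMs.
NODES = {
    "cassandra-1": "192.168.56.191",
    "cassandra-2": "192.168.56.192",
    "cassandra-3": "192.168.56.193",
}


def check_plan(nodes):
    """Return a list of problems found in a node-to-IP plan (empty if none)."""
    problems = []
    seen = set()
    for name, raw in nodes.items():
        ip = ipaddress.ip_address(raw)
        if ip not in NETWORK:
            problems.append(f"{name}: {ip} is outside {NETWORK}")
        # The host's own address and the network/broadcast addresses are taken.
        if ip in (HOST_IP, NETWORK.network_address, NETWORK.broadcast_address):
            problems.append(f"{name}: {ip} is reserved")
        if ip in seen:
            problems.append(f"{name}: {ip} is used twice")
        seen.add(ip)
    return problems


print(NETWORK)            # 192.168.56.0/24
print(check_plan(NODES))  # []
```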

Step 3 – Create the First VM

Proceed with creating the first VM.

Setting   Value
Name      cassandra-1.localdomain
Memory    1 GB
Disk      8 GB

Step 3.1 – Attach the CentOS ISO to the VM

  1. Go into the settings of the new VM.
  2. Click Storage tab.
  3. Click “Add CD/DVD Device” button next to Controller:IDE.
  4. Click “Choose Disk” and browse to the ISO image downloaded.

Step 3.2 – VM Network Adapter Configuration

  1. Go into the settings of the new VM.
  2. Click “Network” tab.
  3. “Adapter 1” will be host-only (this will correspond to eth0 on the VM)
    1. Make sure it is enabled.
    2. Attached to: Host-only Adapter
    3. Name: vboxnet0
  4. “Adapter 2” will be NAT (this will correspond to eth1 on the VM)
    1. Make sure it is enabled.
    2. Attached to: NAT

Step 3.3 – Start VM

Start the VM and install CentOS as usual.

Step 3.4 – VM Network Configuration

Configuring eth0

This is the host-only network interface:

vi /etc/sysconfig/network-scripts/ifcfg-eth0:

DEVICE=eth0
HWADDR=08:00:27:95:FB:4D
TYPE=Ethernet
UUID=4f24d86d-fd58-41ad-a5f7-51610a635edf
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=none
IPADDR=192.168.56.191
BROADCAST=192.168.56.255
NETWORK=192.168.56.0
NETMASK=255.255.255.0
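NETWORK and BROADCAST follow directly from IPADDR and NETMASK. As a minimal sketch, assuming you would rather generate this file for each node than edit it by hand, the hypothetical helper below renders the static part of ifcfg-eth0. HWADDR and UUID are deliberately left out because they are specific to each VM.

```python
import ipaddress


def render_ifcfg_eth0(ip, netmask="255.255.255.0"):
    """Render a static ifcfg-eth0 body for the host-only interface (sketch)."""
    iface = ipaddress.ip_interface(f"{ip}/{netmask}")
    fields = [
        ("DEVICE", "eth0"),
        ("TYPE", "Ethernet"),
        ("ONBOOT", "yes"),
        ("NM_CONTROLLED", "yes"),
        ("BOOTPROTO", "none"),
        ("IPADDR", str(iface.ip)),
        # Derived from the address and mask instead of being typed by hand.
        ("BROADCAST", str(iface.network.broadcast_address)),
        ("NETWORK", str(iface.network.network_address)),
        ("NETMASK", str(iface.network.netmask)),
    ]
    return "\n".join(f"{key}={value}" for key, value in fields) + "\n"


print(render_ifcfg_eth0("192.168.56.191"), end="")
```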

Configuring eth1

This is the NAT interface – configure with DHCP:

vi /etc/sysconfig/network-scripts/ifcfg-eth1:

DEVICE=eth1
HWADDR=08:00:27:42:6F:CC
TYPE=Ethernet
UUID=d9394e20-0077-45df-a60f-dca5c301863a
ONBOOT=yes
NM_CONTROLLED=yes
BOOTPROTO=dhcp

Configure hostname

vi /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=cassandra-1.localdomain

vi /etc/hosts:

...
192.168.56.191  cassandra-1.localdomain

Restart networking

service network restart

Test configuration

Try to SSH into this VM from the host:

> ssh root@192.168.56.191
The authenticity of host '192.168.56.191 (192.168.56.191)' can't be established.
RSA key fingerprint is 66:29:3a:d9:28:71:89:dd:9b:bb:d5:2a:d6:4e:23:0b.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.56.191' (RSA) to the list of known hosts.
root@192.168.56.191's password:
Last login: Tue Dec 17 13:36:58 2013
[root@cassandra-1 ~]# ifconfig | grep addr
eth0      Link encap:Ethernet  HWaddr 08:00:27:95:FB:4D
          inet addr:192.168.56.191  Bcast:192.168.56.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe95:fb4d/64 Scope:Link
eth1      Link encap:Ethernet  HWaddr 08:00:27:42:6F:CC
          inet addr:10.0.3.15  Bcast:10.0.3.255  Mask:255.255.255.0
          inet6 addr: fe80::a00:27ff:fe42:6fcc/64 Scope:Link
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
[root@cassandra-1 ~]# hostname
cassandra-1.localdomain
[root@cassandra-1 ~]# hostname -i
192.168.56.191
[root@cassandra-1 ~]# ping www.google.com
PING www.google.com (173.194.34.82) 56(84) bytes of data.
64 bytes from lhr14s19-in-f18.1e100.net (173.194.34.82): icmp_seq=1 ttl=63 time=15.1 ms
64 bytes from lhr14s19-in-f18.1e100.net (173.194.34.82): icmp_seq=2 ttl=63 time=17.2 ms
64 bytes from lhr14s19-in-f18.1e100.net (173.194.34.82): icmp_seq=3 ttl=63 time=18.2 ms
64 bytes from lhr14s19-in-f18.1e100.net (173.194.34.82): icmp_seq=4 ttl=63 time=18.2 ms

Create Other VMs – Clone VM

You can either repeat the steps above to install the other two VMs, or clone the first VM to create them. You can also install Java and Cassandra on the first VM and then clone it.

  1. Stop the running VM (shutdown -h now).
  2. Right-click the VM in VirtualBox Manager and select “Clone”.
  3. Pick a name for the VM and check “Reinitialise the MAC address of all network cards” option.
  4. Choose “Full Clone”.
  5. Once clone operation is complete, start the VM and login as root.

Fixing Network Configuration

Both ifcfg-eth0 and ifcfg-eth1 still contain the hardware (MAC) addresses and UUIDs of the VM you cloned from, and ifcfg-eth0 still has that VM’s static IP address.

As a result, the IP address and host name are wrong and none of the network interfaces work. Even if you (re)start them, you will see the following error, because the MAC addresses in the configuration files no longer match the new network cards:

Device eth0 does not seem to be present, delaying initialisation

In order to fix the IP addresses, host names and MAC address issues, follow these steps:

  1. Remove the udev persistent network rules file; it will be regenerated on reboot:
    rm -f /etc/udev/rules.d/70-persistent-net.rules
  2. Open /etc/sysconfig/network-scripts/ifcfg-eth0:
    1. Correct the IP address, e.g. if this is cassandra-2, then IP address is 192.168.56.192.
    2. Remove (or comment out) HWADDR line
    3. Remove (or comment out) UUID line
  3. Open /etc/sysconfig/network-scripts/ifcfg-eth1:
    1. Remove (or comment out) HWADDR line
    2. Remove (or comment out) UUID line
  4. Reboot
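The edits in steps 2 and 3 are purely mechanical. Here is a minimal sketch of them as a text transformation, assuming you feed it the contents of a cloned ifcfg file; `fix_cloned_ifcfg` is a hypothetical helper that removes the lines rather than commenting them out, and it writes nothing to disk.

```python
def fix_cloned_ifcfg(text, new_ip=None):
    """Drop HWADDR/UUID lines and optionally rewrite IPADDR (sketch)."""
    fixed = []
    for line in text.splitlines():
        key = line.split("=", 1)[0].strip()
        if key in ("HWADDR", "UUID"):
            continue  # stale values copied from the original VM
        if key == "IPADDR" and new_ip is not None:
            line = f"IPADDR={new_ip}"
        fixed.append(line)
    return "\n".join(fixed) + "\n"


# ifcfg-eth0 as cloned from cassandra-1 (abbreviated).
cloned = """DEVICE=eth0
HWADDR=08:00:27:95:FB:4D
TYPE=Ethernet
UUID=4f24d86d-fd58-41ad-a5f7-51610a635edf
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.56.191
NETMASK=255.255.255.0
"""
print(fix_cloned_ifcfg(cloned, new_ip="192.168.56.192"), end="")
```

For ifcfg-eth1 (DHCP), call it without `new_ip` so only HWADDR and UUID are removed.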

Configure hostname

vi /etc/sysconfig/network:

NETWORKING=yes
HOSTNAME=cassandra-2.localdomain

vi /etc/hosts:

...
192.168.56.192  cassandra-2.localdomain
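These two files differ between nodes only in the node number, so they can be generated rather than typed. A minimal sketch, assuming the cassandra-N.localdomain names and 192.168.56.190+N addresses used throughout this post; `hostname_files` is a hypothetical helper that only builds the text.

```python
def hostname_files(n):
    """Return the /etc/sysconfig/network body and the /etc/hosts line for node n."""
    if not 1 <= n <= 3:
        raise ValueError("this post only plans nodes 1 to 3")
    hostname = f"cassandra-{n}.localdomain"
    ip = f"192.168.56.{190 + n}"
    network = f"NETWORKING=yes\nHOSTNAME={hostname}\n"
    hosts_line = f"{ip}  {hostname}"
    return network, hosts_line


network, hosts_line = hostname_files(2)
print(network, end="")
print(hosts_line)
```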

Verify the whole setup

At this point I have 3 VMs with CentOS 6.5 installed.

Can they see each other?

Ping them!

For example, can cassandra-1 see cassandra-2 and cassandra-3?

[root@cassandra-1 ~]# ping 192.168.56.192
PING 192.168.56.192 (192.168.56.192) 56(84) bytes of data.
64 bytes from 192.168.56.192: icmp_seq=1 ttl=64 time=0.284 ms
64 bytes from 192.168.56.192: icmp_seq=2 ttl=64 time=0.410 ms
64 bytes from 192.168.56.192: icmp_seq=3 ttl=64 time=0.503 ms
...
[root@cassandra-1 ~]# ping 192.168.56.193
PING 192.168.56.193 (192.168.56.193) 56(84) bytes of data.
64 bytes from 192.168.56.193: icmp_seq=1 ttl=64 time=0.335 ms
64 bytes from 192.168.56.193: icmp_seq=2 ttl=64 time=0.530 ms
64 bytes from 192.168.56.193: icmp_seq=3 ttl=64 time=0.453 ms
...

Cassandra Installation

Note [Mac users]: Instead of performing the steps below on each VM one by one, you can SSH into all 3 VMs from iTerm and broadcast input to all sessions (Cmd + Alt + I).

Install Java

On the host, download the Java Runtime Environment (JRE) from Oracle (http://java.com/en/download/index.jsp).

Upload to VM:

> scp jre-7u45-linux-x64.gz root@192.168.56.191:/root
root@192.168.56.191's password:
jre-7u45-linux-x64.gz           100%   45MB  22.3MB/s   00:02

SSH into the VM and extract the archive into /usr/java:

[root@cassandra-1 ~]# mkdir -p /usr/java && tar -xf jre-7u45-linux-x64.gz -C /usr/java/

Install Cassandra

I am installing Cassandra from the DataStax Community Edition packages.

Configure yum repository:

vi /etc/yum.repos.d/datastax.repo

[datastax] 
name = DataStax Repo for Apache Cassandra
baseurl = http://rpm.datastax.com/community
enabled = 1
gpgcheck = 0

Install Cassandra:

[root@cassandra-1 ~]# yum install dsc20

Fixing Java

The DataStax Community RPM pulls in the OpenJDK Java Runtime Environment, so once the Cassandra installation is complete, OpenJDK is the active Java on the system:

[root@cassandra-1 ~]# java -version
java version "1.6.0_28"
OpenJDK Runtime Environment (IcedTea6 1.13.0pre) (rhel-1.66.1.13.0.el6-x86_64)
OpenJDK 64-Bit Server VM (build 23.25-b01, mixed mode)

The Oracle JRE is already installed in /usr/java/jre1.7.0_45. To make the system use it, register it with alternatives using a high priority (20000):

[root@cassandra-1 ~]# alternatives --install /usr/bin/java java /usr/java/jre1.7.0_45/bin/java 20000

If you now run the following command, you will see that two Java runtimes are installed and that the one under /usr/java/jre1.7.0_45 is the default:

[root@cassandra-1 ~]# alternatives --config java

There is 2 program that provides 'java'.

  Selection    Command
-----------------------------------------------
   1           /usr/lib/jvm/jre-1.6.0-openjdk.x86_64/bin/java
*+ 2           /usr/java/jre1.7.0_45/bin/java

And a final check:

[root@cassandra-1 ~]# which java
/usr/bin/java
[root@cassandra-1 ~]# java -version
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
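With several nodes, checking this by eye gets tedious. Here is a minimal sketch that extracts the version and vendor from `java -version` output; note that the command writes this banner to stderr, not stdout. The helper names are hypothetical, and the check for Oracle simply looks for the “Java(TM) SE Runtime Environment” line seen above.

```python
import re
import subprocess

VERSION_RE = re.compile(r'version "([^"]+)"')


def java_version(banner):
    """Extract the quoted version string, e.g. 1.7.0_45, from `java -version` output."""
    match = VERSION_RE.search(banner)
    if match is None:
        raise ValueError("no version string found")
    return match.group(1)


def is_oracle_jre(banner):
    """Oracle's runtime identifies itself as 'Java(TM) SE Runtime Environment'."""
    return "Java(TM) SE Runtime Environment" in banner


def local_banner():
    """Run `java -version` on this machine; the banner is printed to stderr."""
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return result.stderr


# On a node where the alternatives switch has been made:
# banner = local_banner()
# print(java_version(banner), is_oracle_jre(banner))
```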

Repeat this setup on all remaining VMs (unless you have been broadcasting input to all of them).

Cassandra Configuration

Here is a list of the important Cassandra directories, as laid out by the DataStax package:

Directory                  Description
/var/lib/cassandra         Data directories
/var/log/cassandra         Log directory
/var/run/cassandra         Runtime files
/usr/share/cassandra       Environment settings
/usr/share/cassandra/lib   JAR files
/usr/bin                   Binary files
/etc/cassandra             Configuration files
/etc/init.d                Service startup script

Configuration files for Cassandra live under /etc/cassandra. The main configuration file is /etc/cassandra/conf/cassandra.yaml.

For each Cassandra node, the plan is to:

  1. Set the cluster name – all nodes should have the same value.
  2. Use virtual nodes (vnodes).
  3. Set the seed nodes – in my case, seed nodes are 192.168.56.191 and 192.168.56.192.
  4. Set the listen address – the address the node binds to and tells other Cassandra nodes to connect to.
  5. Set the RPC address – for client connections
  6. Set the snitch – I am setting this to PropertyFileSnitch.
  7. Configure the cluster topology (for PropertyFileSnitch).
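To illustrate what vnodes buy you: with num_tokens: 256 each node owns many small slices of the token ring instead of one large one, so ownership evens out without hand-picked initial_token values. The sketch below is a simplified model, not Cassandra’s code. It assumes each node picks its tokens uniformly at random from the Murmur3Partitioner range [-2^63, 2^63), ignores replication, and reports only each node’s primary share of the ring; `primary_ownership` is a hypothetical helper.

```python
import random
from collections import defaultdict

MIN_TOKEN = -2**63
RING_SIZE = 2**64  # Murmur3Partitioner tokens span [-2**63, 2**63 - 1]


def primary_ownership(nodes, num_tokens, seed=0):
    """Share of the ring each node owns as primary replica (simplified model)."""
    rng = random.Random(seed)
    # Each node draws num_tokens random tokens; the ring is the sorted union.
    ring = sorted(
        (rng.randrange(MIN_TOKEN, MIN_TOKEN + RING_SIZE), node)
        for node in nodes
        for _ in range(num_tokens)
    )
    owned = defaultdict(int)
    for i, (token, node) in enumerate(ring):
        prev_token = ring[i - 1][0]  # for i == 0 this wraps to the last token
        # A token owns the range (previous token, token], wrapping at the end.
        owned[node] += (token - prev_token) % RING_SIZE
    return {node: owned[node] / RING_SIZE for node in nodes}


nodes = ["192.168.56.191", "192.168.56.192", "192.168.56.193"]
for num_tokens in (1, 256):
    shares = primary_ownership(nodes, num_tokens)
    print(num_tokens, {n: f"{s:.1%}" for n, s in shares.items()})
```

With a single token per node the three shares usually differ noticeably; with 256 tokens they typically land close to one third each.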

cassandra.yaml

vi /etc/cassandra/conf/cassandra.yaml

Here are the cassandra.yaml changes for the cassandra-1 node (192.168.56.191):

# The name of the cluster. This is mainly used to prevent machines in
# one logical cluster from joining another.
cluster_name: 'seymen'
…

num_tokens: 256
…

partitioner: org.apache.cassandra.dht.Murmur3Partitioner
…

- seeds: "192.168.56.191,192.168.56.192"
…

# TCP port, for commands and data
storage_port: 7000
…

listen_address: 192.168.56.191
…

rpc_address: 192.168.56.191
# port for Thrift to listen for clients on
rpc_port: 9160
…

endpoint_snitch: PropertyFileSnitch
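Only listen_address and rpc_address differ between the three nodes; cluster_name, the seeds list and the snitch must be identical everywhere. As a minimal sketch, assuming you keep a copy of the stock cassandra.yaml and want to produce each node’s version from it, the hypothetical `apply_overrides` below rewrites those lines with plain text matching (PyYAML would be more robust but is not in the standard library). The `default` sample is a trimmed-down stand-in for the shipped file, not a verbatim copy.

```python
import re

SEEDS = "192.168.56.191,192.168.56.192"


def node_overrides(ip):
    """Top-level cassandra.yaml settings for one node (sketch)."""
    return {
        "cluster_name": "'seymen'",  # must be identical on every node
        "num_tokens": "256",
        "listen_address": ip,
        "rpc_address": ip,
        "endpoint_snitch": "PropertyFileSnitch",
    }


def apply_overrides(yaml_text, ip):
    """Rewrite matching top-level `key: value` lines and the nested seeds entry."""
    overrides = node_overrides(ip)
    out = []
    for line in yaml_text.splitlines():
        match = re.match(r"^([a-z_]+):", line)
        if match and match.group(1) in overrides:
            key = match.group(1)
            line = f"{key}: {overrides[key]}"
        elif re.match(r"^\s*- seeds:", line):
            indent = line[: len(line) - len(line.lstrip())]  # keep YAML nesting
            line = f'{indent}- seeds: "{SEEDS}"'
        out.append(line)
    return "\n".join(out) + "\n"


default = """cluster_name: 'Test Cluster'
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
listen_address: localhost
rpc_address: localhost
rpc_port: 9160
endpoint_snitch: SimpleSnitch
"""
print(apply_overrides(default, "192.168.56.191"), end="")
```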

cassandra-topology.properties

vi /etc/cassandra/conf/cassandra-topology.properties

# Cassandra Node IP=Data Center:Rack
192.168.56.191=dc1:rac1
192.168.56.192=dc1:rac1
192.168.56.193=dc1:rac1

# default for unknown nodes
default=dc1:rac1
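PropertyFileSnitch looks up each node’s data center and rack in this file by IP address and falls back to the default entry for nodes that are not listed. The sketch below approximates that lookup in Python, for sanity-checking the file before copying it to every node; it is not the snitch’s actual implementation, and `parse_topology` and `locate` are hypothetical names.

```python
def parse_topology(text):
    """Parse cassandra-topology.properties into ({ip: (dc, rack)}, default)."""
    mapping, default = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, value = (part.strip() for part in line.split("=", 1))
        dc, rack = value.split(":", 1)
        if key == "default":
            default = (dc, rack)
        else:
            mapping[key] = (dc, rack)
    return mapping, default


def locate(ip, mapping, default):
    """Exact IP match first, otherwise the default entry."""
    return mapping.get(ip, default)


topology = """# Cassandra Node IP=Data Center:Rack
192.168.56.191=dc1:rac1
192.168.56.192=dc1:rac1
192.168.56.193=dc1:rac1

# default for unknown nodes
default=dc1:rac1
"""
mapping, default = parse_topology(topology)
print(locate("192.168.56.192", mapping, default))
```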

Port Configuration

The following ports are used for client and internode communication:

Port   Description
7000   Internode cluster communication
7199   JMX; after the initial handshake, the JMX protocol requires the client to reconnect on a randomly chosen port (1024+)
9160   Client communication (RPC/Thrift)

For the full list, see Configuring firewall port access.

We need to make sure that these ports are open. On a default CentOS installation they are blocked by iptables:

[root@cassandra-2 ~]# telnet 192.168.56.191 7000
Trying 192.168.56.191...
telnet: connect to address 192.168.56.191: No route to host

If port 7000 is closed, the nodes cannot talk to each other: each node forms its own single-node cluster instead of joining the existing one.

Configuring iptables

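Here I am only configuring iptables to accept incoming connections on ports 7000, 7199 and 9160 from all sources. If you would like to restrict access to a source IP range or MAC address, refer to the CentOS IPTables HowTo.

On CentOS 6 the rules live in /etc/sysconfig/iptables. Add the following lines before the final REJECT rule of the INPUT chain, otherwise they are never reached:

vi /etc/sysconfig/iptables: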

…
-A INPUT -p tcp --dport 7000 -j ACCEPT
-A INPUT -p tcp --dport 7199 -j ACCEPT
-A INPUT -p tcp --dport 9160 -j ACCEPT
…
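Rule order is the usual pitfall here. Below is a minimal sketch that inserts the three rules just above the INPUT chain’s catch-all REJECT line, assuming the stock CentOS 6 layout of /etc/sysconfig/iptables (the `stock` sample approximates that default file). `add_cassandra_rules` is a hypothetical helper; it only transforms text and skips rules that are already present.

```python
CASSANDRA_PORTS = (7000, 7199, 9160)


def add_cassandra_rules(iptables_text, ports=CASSANDRA_PORTS):
    """Insert ACCEPT rules just before the INPUT chain's REJECT rule."""
    lines = iptables_text.splitlines()
    rules = [f"-A INPUT -p tcp --dport {port} -j ACCEPT" for port in ports]
    missing = [rule for rule in rules if rule not in lines]  # idempotent
    for i, line in enumerate(lines):
        if line.startswith("-A INPUT -j REJECT"):
            lines[i:i] = missing  # rules after the REJECT would never match
            break
    else:
        raise ValueError("no INPUT REJECT rule found; add the rules manually")
    return "\n".join(lines) + "\n"


stock = """*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT
"""
print(add_cassandra_rules(stock), end="")
```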

Restart iptables:

service iptables restart

Test port setup:

From cassandra-2 (192.168.56.192) to port 7000 on cassandra-1 (192.168.56.191):

[root@cassandra-2 ~]# telnet 192.168.56.191 7000
Trying 192.168.56.191...
Connected to 192.168.56.191.
Escape character is '^]'.
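To repeat the telnet test for every node and port in one go, here is a minimal Python sketch using only the standard `socket` module. A port only reports open when the firewall allows it and Cassandra is actually listening on it; `port_open` and `check_node` are hypothetical helpers.

```python
import socket

CASSANDRA_PORTS = {7000: "internode", 7199: "JMX", 9160: "Thrift"}


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds (like the telnet test)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, "No route to host", or timed out


def check_node(host):
    """Map each Cassandra port to whether it accepted a connection."""
    return {port: port_open(host, port) for port in CASSANDRA_PORTS}


# From cassandra-2, for example:
# print(check_node("192.168.56.191"))
```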

Start Cluster

Once this configuration is applied to all nodes, we can start the cluster. Seed nodes need to be started first.

In order to start Cassandra on any node:

service cassandra start

In order to see what is going on:

tail -f /var/log/cassandra/cassandra.log

Once a node has started, check the cluster status:

[root@cassandra-1 ~]# nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID Rack
UN  192.168.56.191  40.86 KB   256     100.0%            …       rac1

Start all three Cassandra nodes and then check the status again:

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID  Rack
UN  192.168.56.193  71.55 KB   256     62.5%             …        rac1
UN  192.168.56.191  40.86 KB   256     66.5%             …        rac1
UN  192.168.56.192  40.82 KB   256     71.0%             …        rac1
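“Owns (effective)” takes replication into account, so across all nodes the column adds up to the replication factor times 100% rather than to 100%; the three values above add up to 200%. To check cluster health from a script, the minimal sketch below parses the text layout of `nodetool status` as shown in this post. That layout is not a stable interface and may differ between Cassandra versions, and the helper names are hypothetical.

```python
# Status (Up/Down) combined with state (Normal/Leaving/Joining/Moving).
CODES = {s + t for s in "UD" for t in "NLJM"}


def parse_status(output):
    """Map address -> (status code, effective ownership in % or None)."""
    nodes = {}
    for line in output.splitlines():
        fields = line.split()
        if len(fields) > 1 and fields[0] in CODES:
            owns = next((f for f in fields if f.endswith("%")), None)
            nodes[fields[1]] = (fields[0], float(owns[:-1]) if owns else None)
    return nodes


def all_up_normal(nodes):
    return bool(nodes) and all(code == "UN" for code, _ in nodes.values())


sample = """Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address         Load       Tokens  Owns (effective)  Host ID  Rack
UN  192.168.56.193  71.55 KB   256     62.5%             …        rac1
UN  192.168.56.191  40.86 KB   256     66.5%             …        rac1
UN  192.168.56.192  40.82 KB   256     71.0%             …        rac1
"""
nodes = parse_status(sample)
print(all_up_normal(nodes), sum(owns for _, owns in nodes.values()))
```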

CQL Access

Start a CQL session to Cassandra (from any node):

[root@cassandra-1 ~]# cqlsh 192.168.56.191 9160
Connected to seymen at 192.168.56.191:9160.
[cqlsh 4.1.0 | Cassandra 2.0.3 | CQL spec 3.1.1 | Thrift protocol 19.38.0]
Use HELP for help.
cqlsh>

Listing my peers

The following CQL query lists the peers, i.e. the other nodes in this cluster. It is executed on 192.168.56.191, so it lists the other two nodes as its peers:

cqlsh> select peer, data_center, host_id, preferred_ip, rack, release_version, rpc_address, schema_version from system.peers;

 peer           | data_center | host_id                              | preferred_ip | rack | release_version | rpc_address    | schema_version
----------------+-------------+--------------------------------------+--------------+------+-----------------+----------------+--------------------------------------
 192.168.56.193 |         dc1 | 56893f9c-a605-4273-93f4-e2f53f6f8b89 |         null | rac1 |           2.0.3 | 192.168.56.193 | 463377a8-9b5c-3e7d-a87b-2adfe4181f26
 192.168.56.192 |         dc1 | 25240ba2-6c3b-4e7f-b90e-a12704d330ee |         null | rac1 |           2.0.3 | 192.168.56.192 | 463377a8-9b5c-3e7d-a87b-2adfe4181f26

(2 rows)
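Both peers report the same schema_version, which indicates that they agree on the schema. A minimal sketch for checking this from a script, assuming you capture the cqlsh table output as text; the parsing is tied to cqlsh’s pipe-separated layout, which is not a stable interface, and the helper names are hypothetical. The sample is an abbreviated copy of the output above with some columns dropped. The local node’s own schema version lives in system.local and is not compared here.

```python
def parse_cqlsh_table(output):
    """Turn cqlsh's pipe-separated result table into a list of dicts."""
    rows, header = [], None
    for line in output.splitlines():
        if "|" not in line:
            continue  # skip blank lines, the ---+--- rule and "(N rows)"
        cells = [cell.strip() for cell in line.split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def schema_versions(rows):
    """Distinct schema versions reported by the peers (one means agreement)."""
    return {row["schema_version"] for row in rows}


sample = """
 peer           | data_center | schema_version
----------------+-------------+--------------------------------------
 192.168.56.193 |         dc1 | 463377a8-9b5c-3e7d-a87b-2adfe4181f26
 192.168.56.192 |         dc1 | 463377a8-9b5c-3e7d-a87b-2adfe4181f26

(2 rows)
"""
rows = parse_cqlsh_table(sample)
print(len(schema_versions(rows)) == 1)
```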