Saturday, 27 July 2019

Exploring Cassandra: Part 1

Cassandra is an open source, column-family NoSQL database that scales to handle massive volumes of data stored across commodity nodes.

Why Cassandra?

Consider a scenario where we need to store large amounts of log data. Millions of log entries will be written every day, and the service must run with zero downtime.

Challenges with RDBMS


  • Cannot efficiently handle huge volumes of data
  • Difficult to serve users worldwide with the centralized single-node model
  • Difficult to guarantee zero downtime

Using Cassandra


  • It is highly scalable and hence can handle large amounts of data
  • Most appropriate for write-heavy workloads
  • Can handle millions of user requests per day
  • Can continue working even when nodes are down
  • Supports wide rows with a very flexible schema, where rows need not all have the same number of columns
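
To make the last point concrete, here is a minimal sketch of the wide-row idea using plain Python dictionaries. It is a toy in-memory model, not Cassandra's storage engine or driver API; the names logs, write and columns, and the sample log lines, are made up for illustration.

```python
from collections import defaultdict

# Toy in-memory model of a column family (NOT Cassandra's real storage engine).
# Each row key maps to its own dictionary of column name -> value.
logs = defaultdict(dict)

def write(row_key, column, value):
    """Store one column in the row identified by row_key."""
    logs[row_key][column] = value

def columns(row_key):
    """Return the row's (column, value) pairs sorted by column name."""
    return sorted(logs[row_key].items())

# Two rows in the same column family with a different number of columns.
write("web-01", "2019-07-27T10:00:05", "GET /missing 404")
write("web-01", "2019-07-27T10:00:00", "GET /index.html 200")
write("web-01", "2019-07-27T10:00:09", "POST /login 302")
write("db-01", "2019-07-27T10:00:01", "slow query: 2.4s")

for key in list(logs):
    print(key, "has", len(logs[key]), "columns")
```

Here the row for web-01 holds three columns and the row for db-01 holds one, yet both live in the same structure. Using the timestamp as the column name is just one way to picture how a row can keep growing wider as new log entries arrive.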

Cassandra Vs RDBMS



Cassandra Architecture


  • Cassandra follows a peer-to-peer, masterless architecture, so all the nodes in the cluster are considered equal.
  • Data is replicated on multiple nodes to ensure fault tolerance and high availability.
  • The node that receives a client request is called the coordinator. The coordinator forwards the request to the appropriate node responsible for the given row key.
  • Data Center: Collection of related nodes
  • Node: Place where the data is stored
  • Cluster: Contains one or more data centers
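
The routing and replication described above can be sketched roughly as a token ring in Python. This is a simplified illustration under several assumptions: the Ring class, the node names, the replication factor of 3 and the use of MD5 as the hash are all chosen for the example and are not taken from Cassandra's own code or configuration.

```python
import bisect
import hashlib

class Ring:
    """Toy token ring: every node is a peer and any of them can coordinate."""

    def __init__(self, nodes, replication_factor=3):
        self.replication_factor = replication_factor
        # Place each node on the ring at a position derived from its name.
        self.ring = sorted((self._token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]
        self.down = set()  # nodes currently unreachable

    @staticmethod
    def _token(value):
        # MD5 is only a stand-in hash to spread keys around the ring.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, row_key):
        """Walk clockwise from the key's token and collect the replica nodes."""
        start = bisect.bisect_right(self.tokens, self._token(row_key))
        count = min(self.replication_factor, len(self.ring))
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]

    def coordinate(self, row_key):
        """Return the replicas the coordinator can still reach for this key."""
        return [n for n in self.replicas(row_key) if n not in self.down]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
ring.down.add(owners[0])  # simulate one replica going down
print("replicas:", owners)
print("still reachable:", ring.coordinate("user:42"))
```

Because the row key is stored on three nodes in this sketch, losing one of them still leaves two replicas the coordinator can forward the request to, which is the intuition behind "can continue working even when nodes are down".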

Applications of Cassandra


  • Suitable for high-velocity data from sensors
  • Useful to store time series data
  • Social media networking sites use Cassandra for analysis and recommendation of products to their customers
  • Preferred by companies providing messaging services for managing massive amounts of data


