Big Big Things in my Little Little World: July 2019

Cassandra is an open-source, column-family NoSQL database that scales out to handle massive volumes of data stored across commodity nodes.

Why Cassandra?

Consider a scenario where we need to store large amounts of log data: millions of log entries are written every day, and the server must run with zero downtime.

Challenges with RDBMS

Cannot efficiently handle huge volumes of data
Difficult to serve users worldwide with a centralized, single-node model
Hard to keep a single server running with zero downtime

Using Cassandra

It is highly scalable and hence can handle large amounts of data
Most appropriate for write-heavy workloads
Can handle millions of user requests per day
Can continue working even when nodes are down
Supports wide rows with a very flexible schema, in which not every row needs to have the same columns
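To make the wide-row idea concrete, here is a toy in-memory Python sketch. It assumes a simplified model in which rows are grouped by a partition key, kept sorted by a clustering key, and store only the columns that were actually written. The class `WideRowTable` and the sample log data are hypothetical illustrations, not Cassandra's API or on-disk format. (In CQL the table's columns are declared up front, but a row does not have to set all of them.)

```python
from bisect import insort
from typing import Any


class WideRowTable:
    """Toy in-memory model of a wide-row table (not Cassandra's storage engine).

    Rows are grouped by partition key and kept sorted by clustering key.
    Each row stores only the columns that were actually written, so rows
    in the same table can have different sets of columns.
    """

    def __init__(self) -> None:
        # partition key -> clustering keys of that partition, kept sorted
        self._keys: dict[str, list[Any]] = {}
        # (partition key, clustering key) -> columns written for that row
        self._rows: dict[tuple[str, Any], dict[str, Any]] = {}

    def upsert(self, partition: str, clustering: Any, **columns: Any) -> None:
        # Writing to an existing row merges the new columns into it.
        key = (partition, clustering)
        if key not in self._rows:
            insort(self._keys.setdefault(partition, []), clustering)
            self._rows[key] = {}
        self._rows[key].update(columns)

    def partition(self, partition: str) -> list[tuple[Any, dict[str, Any]]]:
        # Return every row of one partition in clustering-key order.
        return [(c, self._rows[(partition, c)]) for c in self._keys.get(partition, [])]


# Hypothetical log entries for one host; each row sets a different set of columns.
logs = WideRowTable()
logs.upsert("web-01", 3, level="ERROR", message="disk full")
logs.upsert("web-01", 1, level="INFO")
logs.upsert("web-01", 2, level="WARN", message="slow request", latency_ms=950)

for ts, row in logs.partition("web-01"):
    print(ts, row)
```

The rows come back ordered by their clustering key (1, 2, 3) regardless of write order, and only row 2 carries a `latency_ms` column.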

Cassandra Vs RDBMS

Cassandra Architecture

Cassandra follows a peer-to-peer, masterless architecture, so all nodes in the cluster are treated as equals.
Data is replicated on multiple nodes to ensure fault tolerance and high availability.
The node that receives a client request is called the coordinator. The coordinator forwards the request to the nodes responsible for the given row key.
Data Center: A collection of related nodes
Node: The place where data is stored
Cluster: Contains one or more data centers, each made up of one or more nodes
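The coordinator and replication behavior described above can be sketched with a simple token ring. This is a toy model under stated assumptions: each node owns a single token, the row key is hashed with MD5 as a stand-in for Cassandra's default Murmur3 partitioner, and the replicas are the next `replication_factor` nodes clockwise, roughly like SimpleStrategy (no virtual nodes, racks, or data centers). `Ring`, `token`, and the node names are hypothetical.

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    # Stand-in hash: first 8 bytes of an MD5 digest as an unsigned integer.
    # Cassandra's default partitioner uses Murmur3; MD5 is used here only
    # because it ships with Python's standard library.
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class Ring:
    """Toy masterless ring: each node owns one token and all nodes are peers."""

    def __init__(self, nodes: list[str], replication_factor: int) -> None:
        if not 1 <= replication_factor <= len(nodes):
            raise ValueError("replication factor must be between 1 and the node count")
        self.rf = replication_factor
        # Order nodes by token to form the ring.
        self.ring = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, row_key: str) -> list[str]:
        # The first node whose token is >= the key's token owns the key
        # (wrapping around past the highest token); the next rf - 1 nodes
        # clockwise hold the extra copies.
        start = bisect_left(self.tokens, token(row_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

    def coordinate(self, coordinator: str, row_key: str) -> dict[str, object]:
        # Whichever node receives the request acts as coordinator and
        # forwards it to the replicas responsible for the row key.
        if coordinator not in {n for _, n in self.ring}:
            raise ValueError(f"unknown node: {coordinator}")
        return {"coordinator": coordinator, "forward_to": self.replicas(row_key)}


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
print(ring.coordinate("node-a", "web-01:2019-07-27"))
print(ring.coordinate("node-c", "web-01:2019-07-27"))  # same replicas, different coordinator
```

Because every node can compute the same replica list for a key, any node can serve as coordinator, which is what lets a masterless design work without a central router.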

Applications of Cassandra

Suitable for high velocity data from sensors
Useful for storing time-series data
Social networking sites use Cassandra for analytics and for recommending products to their users
Preferred by companies providing messaging services for managing massive amounts of data
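For time-series data such as sensor readings, a common Cassandra modeling pattern is to bucket each series by time, for example one partition per sensor per day, so that no partition grows without bound. The Python sketch below only simulates that grouping in memory; the `(sensor_id, day)` key, the daily bucket size, the helper names, and the sample readings are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

Reading = tuple[str, datetime, float]


def partition_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    # Bucket by sensor and UTC day so one sensor's partition cannot grow forever.
    # Expects timezone-aware timestamps.
    return (sensor_id, ts.astimezone(timezone.utc).strftime("%Y-%m-%d"))


def bucket(readings: list[Reading]) -> dict[tuple[str, str], list[tuple[datetime, float]]]:
    # Group readings into partitions, each sorted by timestamp
    # (the timestamp plays the role of the clustering column).
    partitions: defaultdict = defaultdict(list)
    for sensor_id, ts, value in readings:
        partitions[partition_key(sensor_id, ts)].append((ts, value))
    for rows in partitions.values():
        rows.sort()
    return dict(partitions)


readings = [
    ("s1", datetime(2019, 7, 27, 23, 59, tzinfo=timezone.utc), 21.5),
    ("s1", datetime(2019, 7, 27, 8, 0, tzinfo=timezone.utc), 19.0),
    ("s1", datetime(2019, 7, 28, 0, 1, tzinfo=timezone.utc), 21.0),
    ("s2", datetime(2019, 7, 27, 12, 0, tzinfo=timezone.utc), 30.2),
]
parts = bucket(readings)
for key, rows in sorted(parts.items()):
    print(key, [v for _, v in rows])
# ('s1', '2019-07-27') [19.0, 21.5]
# ('s1', '2019-07-28') [21.0]
# ('s2', '2019-07-27') [30.2]
```

A reading for the latest day of one sensor then lives in a single partition, which keeps both writes and "last N readings" queries local to one set of replicas.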



Saturday, 27 July 2019

Exploring Cassandra: Part 1