Cassandra is an open source, column-family NoSQL database that scales to handle massive volumes of data stored across commodity nodes.
Why Cassandra?
Consider a scenario where we need to store large amounts of log data. Millions of log entries will be written every day, and the server must have zero downtime.
Challenges with RDBMS
- Cannot efficiently handle huge volumes of data
- Difficult to serve users worldwide with the centralized single-node model
- Difficult to guarantee a server with zero downtime
Using Cassandra
- It is highly scalable and hence can handle large amounts of data
- Most appropriate for write-heavy workloads
- Can handle millions of user requests per day
- Can continue working even when nodes are down
- Supports wide rows with a very flexible schema wherein all rows need not have the same number of columns
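The wide-row idea in the last point can be sketched with plain Python dictionaries. This is only an in-memory illustration, not the Cassandra API: the names `log_table` and `write` and the sample log entries are hypothetical. Each row key maps to its own set of columns, so two rows can hold different numbers of columns.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a column family:
# row key -> {column name -> value}. This is not the Cassandra API.
log_table = defaultdict(dict)


def write(row_key, column, value):
    """Insert or overwrite a single column in the given row."""
    log_table[row_key][column] = value


# "server-1" gets three columns, "server-2" only one.
write("server-1", "2024-01-01T10:00:00", "started")
write("server-1", "2024-01-01T10:00:05", "connection accepted")
write("server-1", "2024-01-01T10:00:09", "request served")
write("server-2", "2024-01-01T10:00:01", "started")

for row_key, columns in log_table.items():
    print(row_key, len(columns), "columns")
```

No schema change is needed when a row gains a new column; the row simply grows wider.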
Cassandra Vs RDBMS
Cassandra Architecture
- Cassandra follows a peer-to-peer, masterless architecture, so all the nodes in the cluster are considered equal.
- Data is replicated on multiple nodes to ensure fault tolerance and high availability.
- The node that receives a client request is called the coordinator. The coordinator forwards the request to the node or nodes responsible for the given row key.
- Data Center: Collection of related nodes
- Node: Place where the data is stored
- Cluster: Contains one or more data centers, and therefore one or more nodes
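The architecture points above (equal peers, replication, and a coordinator that routes by row key) can be sketched as a toy token ring. This is a simplified illustration under stated assumptions, not Cassandra's actual partitioner or replication strategy: the `Ring` class, its method names, the use of SHA-256 as the hash, and the one-token-per-node layout are all hypothetical choices made for the example.

```python
import bisect
import hashlib


class Ring:
    """Toy token ring with one token per node (an assumption for this sketch)."""

    def __init__(self, nodes, replication_factor=3):
        # Never ask for more replicas than there are nodes.
        self.replication_factor = min(replication_factor, len(nodes))
        # Every node is a peer: each is placed on the ring by its hashed name.
        self.tokens = sorted((self._token(n), n) for n in nodes)

    @staticmethod
    def _token(value):
        # SHA-256 is a stand-in hash, not what Cassandra uses.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def replicas(self, row_key):
        """Nodes holding the row: the first node at or after the key's token,
        followed by its clockwise successors."""
        keys = [token for token, _ in self.tokens]
        start = bisect.bisect_left(keys, self._token(row_key)) % len(self.tokens)
        return [
            self.tokens[(start + i) % len(self.tokens)][1]
            for i in range(self.replication_factor)
        ]

    def coordinate(self, row_key, down=()):
        """Act as coordinator: forward the request only to replicas that are up."""
        return [node for node in self.replicas(row_key) if node not in down]


ring = Ring(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
# Simulate the first replica being down; the other two can still serve the row.
live = ring.coordinate("user:42", down={owners[0]})
print("replicas:", owners)
print("reachable:", live)
```

Because any node can compute `replicas` for a key, any node can act as coordinator, and losing one replica still leaves other copies of the row reachable.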
Applications of Cassandra
- Suitable for high velocity data from sensors
- Useful for storing time series data
- Social networking sites use Cassandra for analysis and for recommending products to their customers
- Preferred by companies providing messaging services for managing massive amounts of data
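To illustrate the sensor and time series use cases listed above, here is a minimal in-memory sketch of one common way to lay out such data: readings are grouped into partitions by sensor and day and kept ordered by timestamp within each partition. The names `partitions`, `record`, and `readings`, the daily bucket size, and the sample values are assumptions for illustration, not part of any Cassandra API.

```python
import bisect
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical layout: (sensor_id, day) -> list of (timestamp, value),
# kept sorted by timestamp inside each partition.
partitions = defaultdict(list)


def record(sensor_id, ts, value):
    """Store one reading in the partition for its sensor and day."""
    bucket = (sensor_id, ts.strftime("%Y-%m-%d"))
    bisect.insort(partitions[bucket], (ts, value))


def readings(sensor_id, day):
    """Return one day's readings for a sensor, oldest first."""
    return list(partitions.get((sensor_id, day), []))


# Readings may arrive out of order; each partition still stays sorted.
record("sensor-1", datetime(2024, 1, 1, 10, 5, tzinfo=timezone.utc), 21.5)
record("sensor-1", datetime(2024, 1, 1, 10, 0, tzinfo=timezone.utc), 21.0)
record("sensor-1", datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), 19.8)

for ts, value in readings("sensor-1", "2024-01-01"):
    print(ts.isoformat(), value)
```

Bucketing by day keeps any single partition from growing without bound, and a query for one sensor and one day touches only one partition.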