Monday, 9 March 2020

DataOps

Way to DataOps

DataOps focuses on the end-to-end delivery of data. In the digital era, companies need to harness their data to derive competitive advantage. In addition, companies across all industries need to comply with new data privacy regulations. The need for DataOps can be summarised as follows:
  • More data is available than ever before
  • More users want access to more data in more combinations

DataOps

When development and operations don’t work in concert, it becomes hard to ship and maintain quality software at speed. This led to the need for DevOps.
DataOps is similar to DevOps, but centered on the strategic use of data rather than on shipping software. DataOps is an automated, process-oriented methodology used by Big Data teams to improve the quality and reduce the cycle time of Data Analytics. It applies to the entire data life cycle, from data preparation to reporting.
It includes automating the different stages of the workflow, including BI, Data Science and Analytics. DataOps also speeds up the production of applications running on Big Data processing frameworks.

Components 

DataOps includes the following components:

  • Data Engineering
  • Data Integration
  • Data security
  • Data Quality

DevOps and DataOps

  • DevOps is the collaboration between Developers, Operations and QA Engineers across the entire Application Delivery pipeline, from Design and Coding to Testing and Production Support. DataOps, by contrast, is a Data Management method that emphasizes communication, collaboration, integration and automation of processes between Data Engineers, Data Scientists and other Data professionals.
  • The mission of DevOps is to enable Developers and Managers to handle modern web-based Application development and deployment. DataOps enables data professionals to optimize modern web-based data storage and analytics.
  • DevOps focuses on continuous delivery by leveraging on-demand IT resources and by automating Testing and Deployment. DataOps tries to bring the same improvements to Data Analytics.

Steps to implement DataOps

The following are the 7 steps to implement DataOps:
  1. Add Data and logic tests
  2. Use a version control system 
  3. Branch and Merge Codebase
  4. Use multiple environments 
  5. Reuse and containerize
  6. Parameterize processing 
  7. Orchestrate data pipelines 
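
As a rough illustration of steps 1, 6 and 7, the sketch below wires a parameterized processing step and a data test into a tiny ordered pipeline in plain Python. All names (clean_orders, check_orders, build_steps, run_pipeline, the min_amount parameter and the sample records) are hypothetical and chosen only for this example. A real team would more likely rely on a dedicated orchestrator and testing framework.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
Step = Callable[[List[Record]], List[Record]]


def clean_orders(rows: List[Record], min_amount: float) -> List[Record]:
    """Parameterized processing: keep orders at or above min_amount."""
    return [row for row in rows if row["amount"] >= min_amount]


def check_orders(rows: List[Record]) -> List[Record]:
    """Data test: fail fast if the data breaks basic expectations."""
    if not rows:
        raise ValueError("data test failed: no rows left after cleaning")
    if any(row["amount"] < 0 for row in rows):
        raise ValueError("data test failed: negative amount found")
    return rows


def build_steps(min_amount: float) -> List[Step]:
    """Bind the parameter once so the same code can serve several environments."""
    return [lambda rows: clean_orders(rows, min_amount), check_orders]


def run_pipeline(rows: List[Record], steps: List[Step]) -> List[Record]:
    """Orchestration in its simplest form: run each step in order."""
    for step in steps:
        rows = step(rows)
    return rows


if __name__ == "__main__":
    # Made-up sample data for the illustration.
    sample = [{"amount": 5.0}, {"amount": 120.0}, {"amount": 40.0}]
    print(run_pipeline(sample, build_steps(min_amount=10.0)))
```

Because the threshold is passed in as a parameter, the same pipeline could run with different settings in development and production, and the data test stops the run before bad data reaches a report.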

The above details are based on the learnings gathered from different Internet sources. 



Keep going, because you didn't come this far to come only this far.

Sunday, 1 March 2020

DevOps

DevOps is a Software Engineering practice that aims at unifying Software Development and Operations. As the name implies, it is a combination of Development and Operations. The main phases in each of these can be described as follows:
  • Dev: Plan --> Create --> Verify --> Package
  • Ops: Release --> Configure --> Monitor

DevOps Culture 

DevOps is often described as a culture. Hence it consists of different aspects such as:
  • Engineer Empowerment: It gives engineers more responsibility over the typical application life cycle, covering Development, Testing, Deployment, Monitoring and being On Call.
  • Test Driven Development: It is the practice of writing Tests before writing code. This helps increase the quality of service and gives developers more confidence to release code faster and more frequently.
  • Automation: It involves automating everything that can be automated. This includes Test Automation, Infrastructure Automation, Deployment Automation etc.
  • Monitoring: This is the process of building monitoring alerts and monitoring the applications.
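
To make Test Driven Development a little more concrete, here is a minimal sketch using Python's standard unittest module. The function apply_discount and its rules are hypothetical and exist only for this illustration. The point is the order: the tests describe the wanted behaviour first, and the code is then written to make them pass.

```python
import unittest


class ApplyDiscountTest(unittest.TestCase):
    # Written first: these tests describe the behaviour we want.
    def test_ten_percent_discount(self):
        self.assertAlmostEqual(apply_discount(200.0, 10), 180.0)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


# Written second: the simplest code that makes the tests pass.
def apply_discount(price: float, percent: float) -> float:
    """Return the price reduced by the given percentage (0 to 100)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return price * (1 - percent / 100)


if __name__ == "__main__":
    unittest.main()
```

Running the file executes both tests. In a DevOps setup, a suite like this would typically be run automatically on every code change.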

Challenges in DevOps

Each challenge is listed with its DevOps solution.

Dev Challenges
  • Challenge: Waiting time for code deployment. DevOps solution: Continuous Integration ensures quick deployment of code, faster testing and speedy feedback.
  • Challenge: Pressure of work on old code. DevOps solution: Since there is no waiting time to deploy the code, the developer focuses on building the current code.

Ops Challenges
  • Challenge: Difficult to maintain uptime of the production environment. DevOps solution: Containerization or Virtualization provides a simulated environment to run the software containers and offers great reliability for application uptime.
  • Challenge: Tools to automate infrastructure management are not effective. DevOps solution: Configuration management helps to organize and execute configuration plans, consistently provision the system and proactively manage the infrastructure.
  • Challenge: The number of servers to be monitored increases, so it is difficult to diagnose issues. DevOps solution: A continuous monitoring and feedback system is established through DevOps, which supports effective administration.
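
The monitoring challenge above can be sketched as a simple threshold check across many servers. The metric names, thresholds and host names below are made up for illustration, and real monitoring tools collect such values automatically. This is only meant to show the idea of turning raw metrics into a short list of alerts.

```python
from typing import Dict, List

# Hypothetical alert thresholds, in percent.
THRESHOLDS = {"cpu": 90.0, "memory": 85.0, "disk": 95.0}


def find_alerts(metrics: Dict[str, Dict[str, float]]) -> List[str]:
    """Return one alert message per metric that exceeds its threshold."""
    alerts = []
    for host in sorted(metrics):  # sorted for a stable, readable order
        for name, limit in THRESHOLDS.items():
            value = metrics[host].get(name)
            if value is not None and value > limit:
                alerts.append(f"{host}: {name} at {value}% exceeds {limit}%")
    return alerts


if __name__ == "__main__":
    # Made-up readings for two servers.
    readings = {
        "web-1": {"cpu": 95.0, "memory": 40.0},
        "db-1": {"cpu": 10.0, "disk": 50.0},
    }
    for alert in find_alerts(readings):
        print(alert)
```

With hundreds of servers, the same function still returns only the hosts that need attention, which is the kind of feedback continuous monitoring is meant to provide.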

Periodic Table of DevOps Tools



Popular DevOps Tools

Some of the most popular DevOps tools are:
  • Git: Git is an open source, distributed version control system and the most popular one in use. Every developer works on a full local copy of the repository, and many developers can clone and pull code from a shared main repository at the same time.
  • Maven: Maven is a build automation tool. It automates the software build process and dependency resolution. A Maven project is configured using a Project Object Model, stored in a pom.xml file.
  • Ansible: Ansible is an open source application which is used for automated software provisioning, configuration management and application deployment. Ansible helps in controlling an automated cluster environment consisting of many machines.
  • Puppet: Puppet is an open source tool for software configuration management and automated provisioning. It is an alternative to Ansible and is often described as giving tighter control over client machines. Puppet also offers a GUI, which can make it easier to use than Ansible.
  • Docker: Docker is a containerization technology. A container packages an application together with all of its dependencies. These containers can be deployed on any machine that runs Docker, without caring about the underlying host details.
  • Jenkins: Jenkins is an open source automation server written in Java. Jenkins is used in creating continuous delivery pipelines.
  • Nagios: Nagios is used for continuous monitoring of infrastructure. Nagios helps in monitoring servers, applications and networks. It provides a web interface to check various details like memory utilisation, fan speed, routing tables of switches, or the state of an SQL server.
  • Selenium: This is an open source automation testing framework used for automating the testing of web applications. Selenium is not a single tool but a suite of tools. There are four components of Selenium: Selenium IDE, RC, WebDriver, and Grid. Selenium is used to repeatedly execute test cases for applications without manual intervention and to generate reports.
  • Chef: Chef is a configuration management tool. Chef is used to manage configuration such as creating or removing a user, adding an SSH key to a user present on multiple nodes, installing or removing a service, etc.
  • Kubernetes: Kubernetes is an open source container orchestration tool. It was originally developed by Google. It is used for continuous deployment and auto scaling of container clusters. It improves fault tolerance and load balancing in a container cluster.
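
Configuration management tools such as Chef, Puppet and Ansible are generally built around describing a desired state and applying only the changes needed to reach it. The sketch below illustrates that idea in plain Python for the "creating or removing a user" case. It does not use any real tool's API: plan_user_changes and the user names are hypothetical.

```python
from typing import List, Set


def plan_user_changes(current: Set[str], desired: Set[str]) -> List[str]:
    """Compare current and desired users and list only the needed actions."""
    actions = [f"create user {name}" for name in sorted(desired - current)]
    actions += [f"remove user {name}" for name in sorted(current - desired)]
    return actions


if __name__ == "__main__":
    # Made-up state of one node and the state we want it to have.
    on_node = {"alice", "bob"}
    wanted = {"bob", "carol"}
    for action in plan_user_changes(on_node, wanted):
        print(action)
```

If the node already matches the desired state, the plan is empty, so running it again changes nothing. That property is what makes it safe to apply the same configuration repeatedly across many machines.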

References 

https://xebialabs.com/periodic-table-of-devops-tools/

Can DevOps be incorporated in Data Management?
More details regarding the same will be discussed in the next post.

Also, thank you to one of my former colleagues for inspiring me to write a post on this hot topic.



She woke up every morning with the option of being anyone she wished, how beautiful it was that she always chose herself...