Saturday 2 March 2019

Google Cloud Platform (GCP): Part 1

Cloud Computing


Cloud computing is a way of using IT that has these five important traits:
  • Get computing resources on-demand and with self-service.
  • Access these resources over the internet from anywhere we want.
  • The provider of these resources has a big pool of them and allocates them to users out of that pool.
  • Resources are elastic: they scale up and down with demand.
  • Pay only for what we use.

GCP Architectures

Virtualized data centers brought us Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings.
IaaS offerings provide raw compute, storage, and network, organized in ways that are familiar from data centers.
PaaS offerings, on the other hand, bind the application code we write to libraries that give access to the infrastructure our application needs.
In the IaaS model, we pay for what we allocate. In the PaaS model, we pay for what we use. Google's popular applications like Search, Gmail, Docs and Drive are Software as a Service (SaaS) applications.

Google Network

It's designed to give its users the highest possible throughput and the lowest possible latencies for their applications. When an Internet user sends traffic to a Google resource, Google responds to the user's request from the edge network location that provides the lowest latency. Google's edge-caching network places content close to end users to minimize latency.

GCP Regions and Zones

A zone is a deployment area for Google Cloud Platform resources. Zones are grouped into regions, independent geographic areas, and we can choose which regions our GCP resources are in. All the zones within a region have fast network connectivity among them. Locations within regions usually have round-trip network latencies of under five milliseconds. A zone is a single failure domain within a region. As part of building a fault-tolerant application, we can spread resources across multiple zones in a region, which helps protect against unexpected failures. We can run resources in different regions too. Lots of GCP customers do that, both to bring their applications closer to users around the world, and also to protect against the loss of an entire region, say, due to a natural disaster.
A few Google Cloud Platform services support placing resources in what we call a multi-region. For example, Google Cloud Storage lets us place data within the Europe multi-region. That means it will be stored redundantly in at least two geographic locations, separated by at least 160 kilometers, within Europe.

Pricing

Google was the first major cloud provider to bill by the second, rather than rounding up to bigger units of time, for its virtual-machines-as-a-service offering. Charges from rounding can really add up for customers who are creating and running lots of virtual machines. Compute Engine also offers automatically applied sustained use discounts, which we get for running a virtual machine for a significant portion of the billing month. When we run an instance for more than 25 percent of a month, Compute Engine automatically gives us a discount for every incremental minute we use it.

Open APIs

Google helps its customers avoid feeling locked in. GCP services are compatible with open-source products. For example, Bigtable uses the interface of the open-source database Apache HBase, which gives customers the benefit of code portability. As another example, Cloud Dataproc offers the open-source big data environment Hadoop as a managed service. Google publishes key elements of its technology under open-source licenses to create ecosystems that provide customers with options other than Google. For example, TensorFlow is an open-source software library for machine learning developed inside Google.
Many GCP technologies also provide interoperability. Kubernetes gives customers the ability to mix and match microservices running across different clouds, and Google Stackdriver lets customers monitor workloads across multiple cloud providers.

Why GCP

Google Cloud Platform lets us choose from computing, storage, big data, machine learning and application services for web, mobile, analytics and backend solutions. It's global, it's cost-effective, it's open-source friendly and it's designed for security. Google Cloud Platform's products and services can be broadly categorized as compute, storage, big data, machine learning, networking, and operations and tools.

We are born not to be Average, 
We are born to be Awesome.. 

Sunday 5 August 2018

Data Analysis with Python

Python is an open-source, object-oriented programming language developed by Guido van Rossum in the late 1980s. While implementing Python, van Rossum was also reading the published scripts from "Monty Python's Flying Circus", a BBC comedy series. He thought he needed a name that was short and slightly mysterious, so he decided to call the language Python. Python is an interpreted language: compiled languages are translated ahead of time into executable machine code, whereas interpreted languages are translated to CPU machine instructions at run time. At Google, Python is one of the three "official languages" alongside C++ and Java. They even have a developer portal devoted to Python, with free classes offered including exercises and lecture videos (https://developers.google.com/edu/python/). Python is also used as the configuration language for Tupperware, Facebook's container deployment system.

Installation

Two options:

  • Directly download and install Python
  • Download and install Anaconda

Development environment


  • Terminal or shell
  • IDLE (Integrated Development and Learning Environment)
  • IPython Notebook

Data structures


  • Lists: lists of comma-separated values in square brackets. Items need not be of the same type, and lists are mutable.
  • Strings: they are immutable. Enclosed in single ('), double (") or triple (''') quotes. Strings enclosed in triple quotes can span multiple lines.
  • Tuples: a number of values separated by commas, surrounded by parentheses. They are immutable, and this immutability makes them faster to process than lists.
  • Dictionaries: sets of key:value pairs enclosed in curly braces. Keys are unique. (A short sketch of all four structures follows this list.)
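
A minimal sketch of the four structures above (the values are invented purely for illustration):

# Lists: square brackets, mutable, items may be of mixed types
items = ["python", 3.14, True]
items.append("pandas")            # lists grow in place

# Strings: immutable; methods return new strings
s = "data analysis"
t = s.upper()                     # s itself is unchanged

# Tuples: parentheses, immutable
point = (3, 4)

# Dictionaries: curly braces, unique keys mapped to values
counts = {"movie": 4, "acting": 1}
counts["good"] = 2                # insert or update a key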

Python libraries for data analysis


  • Numpy: Numerical Python. It provides the n-dimensional array feature. It also includes basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with low-level languages like C, C++ and Fortran. (Numpy and Pandas are sketched together just after this list.)
  • Scipy: Scientific Python. Library for discrete Fourier transforms, linear algebra, optimization and sparse matrices. It is a module for science built on Numpy.
  • Matplotlib: It is used for plotting graphs.
  • Pandas: It is used for structured data operations and manipulations. The following data structures are included in Pandas:
- Series: These are one dimensional labelled arrays.
- Dataframes: These are two dimensional data structures. It has column names and row indices.
  • Scikit Learn: It is a library for Machine Learning built on Numpy, Scipy and Matplotlib. It includes tools for Classification, Regression, Clustering and Dimensionality Reduction.
  • NLTK: Natural Language Processing Tool Kit. It is a library for Natural Language Processing.
  • Stats Models: It is a library used for statistical modeling.
  • Seaborn: It is a library based on Matplotlib for statistical data visualization.
  • Bokeh: It is used for creating interactive plots and dashboards in web browsers. It can visualize large and streaming datasets.
  • Blaze: It is built on Numpy and Pandas for streaming and distributed datasets. It has connectors to Apache Spark, MongoDB etc.
  • Scrapy: It is used for web crawling. It can be used to extract information from all pages of a website.
  • Sympy: It is a library for symbolic computation. It has the capability of formatting the result of the computations as LaTeX code.
  • Astropy: It is a package for Astronomy in Python.
  • Biopython: It is a set of tools for biological computation. 
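
A minimal sketch of the two workhorse libraries, Numpy and Pandas (the arrays, column names and values are invented for illustration):

import numpy as np
import pandas as pd

# Numpy: n-dimensional arrays with vectorized math
a = np.array([[1.0, 2.0], [3.0, 4.0]])
print(a.mean(axis=0))             # column means: [2. 3.]

# Pandas Series: a one-dimensional labelled array
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Pandas DataFrame: two-dimensional, with column names and row indices
df = pd.DataFrame({"price": [100, 150], "qty": [3, 1]})
print(df["price"].sum())          # 250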


Friendship is born at that moment when one person says to another: 'What! You too? I thought I was the only one' 😊


Sunday 18 March 2018

Smart Digital Store- DigiShopiGo

Tom is planning to buy a smartphone. He searches online but finds different brands with the same specifications. He even puts up a Facebook post asking for suggestions. By chance he notices the 'DigiShopiGo' web store. 'DigiShopiGo' is a world-famous retail chain offering a wide variety of products. Tom decides to search for the smartphone in the web store.

The first step to search for a product in 'DigiShopiGo' is to register an email ID. After registering, Tom sees a variety of models with the same specifications, including the ones his friends had bought. He can also see reviews of each product. Tom purchases the product and gets intimations regarding its delivery. To his surprise, a drone delivers the purchased mobile in 30 minutes. Tom receives the product, is very satisfied with his purchasing experience, and shares his opinion on social media.

'DigiShopiGo' has an advanced analytics wing that keeps track of the social media activities of its customers. Identifying Tom's opinion as positive feedback, it starts sending Tom intimations regarding the sale and availability of mobile accessories.

Accidentally Tom loses his mobile charger. He searches for an electronics shop online and offline. Suddenly he gets an intimation about a 'DigiShopiGo' outlet just nearby. 'DigiShopiGo' uses the shopper's location data to showcase the nearest retail location. On entering the store, Tom is astonished to see a map showing the exact location of the item he needs to buy. Tom also gets instant notifications regarding the estimated wait time at the store for billing. As Tom walks through the store he can also see smart digital shelves that give him a personalized experience, highlighting products of his interest. He instantly shares the experience on social media. So being digital means being social and accessible.

Be where the world is going!

Sunday 28 January 2018

FAIR releases Detectron

Facebook's AI Research team (FAIR) has been working on the problem of object detection, using deep learning to give computers the ability to reach conclusions about what objects are present in a scene. The company's object detection system is called Detectron. The Detectron project was started in July 2016 with the goal of creating a fast and flexible object detection system. It implements state-of-the-art object detection algorithms and is written in Python, powered by the Caffe2 deep learning framework. The algorithms examine video input and are able to make guesses about what discrete objects comprise the scene.

At FAIR, Detectron has enabled numerous research projects, including: 

  • Feature Pyramid Networks for Object Detection: Feature pyramids are a basic component in recognition systems for detecting objects at different scales, but recent deep learning detectors had avoided pyramid representations because they are compute and memory intensive. FPN shows how to construct feature pyramids inside a deep network at marginal extra cost.
  • Mask R-CNN: It is a general framework for object instance segmentation. In object instance segmentation, given an image, the goal is to label each pixel according to its object class as well as its object instance. Instance segmentation is closely related to two important tasks in computer vision, namely semantic segmentation and object detection. The goal of semantic segmentation is to label each pixel according to its object class; however, it does not differentiate between two different object instances of the same class. For example, if there are two persons in an image, semantic segmentation will assign the same label to pixels belonging to either of them. The goal of object detection is to predict the bounding box and the object class of each object instance in the image; however, it does not provide per-pixel labeling of the instance. Compared with semantic segmentation and object detection, object instance segmentation is strictly more challenging, since it aims to identify each object instance as well as provide per-pixel labeling of each instance.
  • Detecting and Recognizing Human-Object Interactions: To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Human-object interactions in photos are detected and represented as triplets <human, verb, object>, e.g. <person, reads, book>.
  • Focal Loss for Dense Object Detection: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors. The focal loss addresses the extreme class imbalance encountered during dense training by down-weighting well-classified examples; a one-stage detector named RetinaNet was designed to evaluate its effectiveness (a small sketch of the loss follows this list). RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.
  • Non-local Neural Networks: Non-local means is an algorithm in image processing for image denoising. Unlike "local mean" filters, which take the mean value of a group of pixels surrounding a target pixel to smooth the image, non-local means filtering takes a mean of all pixels in the image, weighted by how similar these pixels are to the target pixel. This results in much greater post-filtering clarity, and less loss of detail in the image compared with local mean algorithms. Inspired by the classical non-local means method in computer vision, the non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures.
  • Learning to Segment Every Thing: Existing methods for object instance segmentation require all training instances to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. A new partially supervised training paradigm is proposed, together with a novel weight transfer function, that enables training instance segmentation models over a large set of categories for which all have box annotations, but only a small fraction have mask annotations.
  • Data Distillation: Omni-supervised learning is a special area of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Data distillation is a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, to automatically generate new training annotations.
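
To make the focal loss item above concrete, here is a minimal NumPy sketch of the binary form of the published formula, FL(pt) = -(1 - pt)^gamma * log(pt). The code is only illustrative and is not taken from Detectron itself:

import numpy as np

def focal_loss(p, y, gamma=2.0):
    # p: predicted probability of the positive class; y: true label (0 or 1)
    pt = np.where(y == 1, p, 1.0 - p)      # probability assigned to the true class
    return -((1.0 - pt) ** gamma) * np.log(pt)

# A well-classified example (pt = 0.9) is down-weighted far more than a hard one (pt = 0.1)
print(focal_loss(np.array([0.9, 0.1]), np.array([1, 1])))   # ~[0.001, 1.865]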

The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research. Detectron includes implementations of the following object detection algorithms:

  • Mask R-CNN
  • RetinaNet
  • Faster R-CNN
  • RPN
  • Fast R-CNN
  • R-FCN


From augmented reality to various computer vision tasks, Detectron has a wide variety of uses. One of the many things that this new platform can do is object masking. Object masking takes object detection a step further: instead of just drawing a bounding box around the object, it can actually draw a complex polygon. Detectron is available under the Apache 2.0 license on GitHub. The company says it is also releasing extensive performance baselines for more than 70 pre-trained models that are available to download from its model zoo on GitHub. Once a model is trained, it can be deployed in the cloud and even on mobile devices.

References


  1. https://www.techleer.com/articles/469-facebook-announces-open-sourcing-of-detectron-a-real-time-object-detection/
  2. https://github.com/facebookresearch/Detectron
  3. https://arxiv.org/


Success is walking from failure to failure with no loss of enthusiasm..


Sunday 10 December 2017

K-means clustering

It is the process of grouping documents such that documents in a cluster are similar and documents in different clusters are dissimilar. The vector space model is used to represent each document as a vector.
Algorithm

  1. Choose the value of k.
  2. Randomly choose k objects to form the centroids of the k clusters.
  3. Repeat until there is no change in the location of the centroids, or in the objects assigned to the clusters (see the sketch below):
- Find the distance of each object to each cluster center and assign it to the one with the minimum distance.
- Calculate the mean of each cluster group to compute the new cluster centers.
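
A minimal sketch of this loop, using invented 2-D vectors and seed indices in place of the real document vectors:

import numpy as np

def kmeans(X, k, seed_idx, max_iter=100):
    # X: (n, d) array of document vectors; seed_idx: indices of the initial centroids
    centroids = X[seed_idx].astype(float)
    for _ in range(max_iter):
        # distance of each object to each cluster center; assign to the nearest
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # mean of each cluster group gives the new cluster centers
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centroids, centroids):   # converged: no centroid moved
            break
        centroids = new_centroids
    return labels, centroids

X = np.array([[1.0, 1.0], [1.0, 2.0], [8.0, 8.0], [9.0, 8.0],
              [5.0, 1.0], [1.0, 1.5], [8.0, 9.0], [5.0, 2.0]])
labels, centroids = kmeans(X, k=3, seed_idx=[1, 4, 6])
print(labels)    # e.g. [0 0 2 2 1 0 2 1]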

Let k = 3 and the initial cluster seeds be d2, d5 and d7.
The Euclidean distance between each document vector and each seed (for example, between d1 and d2) is calculated, and each document is assigned to its nearest seed.

The clusters are:
d2, d1, d6, d9, d10
d5, d8
d7, d3, d4


If you want something and you get something else, never be afraid. One day things will surely work your way!!

Sunday 3 December 2017

Developing Devices that can See

Google AIY Vision Kit

Google introduced the AIY Voice Kit back in May, and now the company has launched the AIY Vision Kit, which has on-device neural network acceleration for the Raspberry Pi. Unlike the Voice Kit, the Vision Kit is designed to run all the machine learning locally on the device rather than talk to the cloud. While it was possible to run TensorFlow locally on the Raspberry Pi with the Voice Kit, that kit was far better suited to using Google's Assistant API or Cloud Speech API to do voice recognition. The Vision Kit, by contrast, is designed from the ground up to do all its image processing locally. The kit includes a new circuit board and computer vision software that can be paired with a Raspberry Pi computer and camera.
In addition to the Vision Kit, users will need a Raspberry Pi Zero W, a Raspberry Pi Camera, an SD card and a power supply, all of which must be purchased separately. The kit itself includes a cardboard outer shell, the VisionBonnet circuit board, an RGB arcade-style button, a piezo speaker, a macro/wide lens kit, a tripod mounting nut and other connecting components.
The main component of the Vision Kit is the VisionBonnet, which features the Intel Movidius MA2450, a low-power vision processing unit capable of running neural network models on-device. The software includes three models:

  • Model to recognize common objects
  • Model to recognize faces and their expressions
  • Person, cat and dog detector

Google has also included a tool to compile models for the Vision Kit, and users can train their own models with Google's TensorFlow machine learning software. The AIY Vision Kit costs $44.99 and will ship from December 31st through Micro Center. This first batch is a limited run of just 2,000 units and is available in the US only.

AWS Deeplens

Amazon's DeepLens device, introduced at AWS re:Invent, is aimed at software developers and data scientists using machine learning, and Amazon has packed a lot of power into it: a 4-megapixel camera that can capture 1080p video, a 2D microphone array, and even an Intel Atom processor. It is intended to stay connected to mains power and be used as a platform and a tool. It's a finished product, priced at $250, and will ship only after April 2018. DeepLens uses Intel-optimized deep learning software tools and libraries (including the Intel Compute Library for Deep Neural Networks, Intel clDNN) to run real-time computer vision models directly on the device for reduced cost and real-time responsiveness. It supports major machine learning frameworks like Google's TensorFlow, Facebook's Caffe2, PyTorch, and Apache MXNet. DeepLens will be tightly integrated with other cloud and AI services sold by AWS.

The Amazon kit is aimed at developers looking to build and train deep learning models in the real world. The Google kit is aimed at makers looking to build projects, or even products. The introduction of TensorFlow Lite and the Google AIY Vision Kit can be regarded as part of a recent trend of moving computation to the device rather than the cloud.

References

  1. https://aiyprojects.withgoogle.com/
  2. https://aws.amazon.com/deeplens/
  3. https://aws.amazon.com/blogs/aws/deeplens/


Adopting the right attitude can convert a negative stress into a positive one

Sunday 5 November 2017

Naive Bayes Text classification


Doc -> {+, -}
Documents are represented as a vector (array) of words.
Conditional independence assumption: given the class, the words are assumed to be independent of each other.
The probability of a review being positive is proportional to the prior probability of the positive class multiplied by the probability of each word given the positive class, taken over the entire length of the document:
P(+|doc) ∝ P(+) * P(w1|+) * P(w2|+) * ... * P(wn|+)

The worked example below uses five short training reviews (reconstructed here to match the word counts used in the calculations):
1. "I loved the movie" (+)
2. "I hated the movie" (-)
3. "a great movie. good movie" (+)
4. "poor acting" (-)
5. "great acting. a good movie" (+)

Unique words: I, loved, the, movie, hated, a, great, poor, acting, good [10 unique words]
The classification involves 3 steps:
1. Convert docs to feature sets
2. Find probabilities of outcomes
3. Classifying new sentences

 Convert docs to feature sets

Attributes: all possible words
Values: the number of times each word occurs in the doc

Find probabilities of outcomes

P(+)=3/5=0.6
Number of words in the + docs: n=14
nk = number of times word k occurs in the + docs
P(wk|+)=(nk+1)/(n+|vocabulary|)
P(I|+)=(1+1)/(14+10)=0.0833
P(loved|+)=(1+1)/(14+10)=0.0833
P(the|+)=(1+1)/(14+10)=0.0833
P(movie|+)=(4+1)/(14+10)=0.2083
P(hated|+)=(0+1)/(14+10)=0.0417
P(a|+)=(2+1)/(14+10)=0.125
P(great|+)=(2+1)/(14+10)=0.125
P(poor|+)=(0+1)/(14+10)=0.0417
P(acting|+)=(1+1)/(14+10)=0.0833
P(good|+)=(2+1)/(14+10)=0.125
Docs with -ve outcomes (number of words: n=6):
P(-)=2/5=0.4
P(I|-)=(1+1)/(6+10)=0.125
P(loved|-)=(0+1)/(6+10)=0.0625
P(the|-)=(1+1)/(6+10)=0.125
P(movie|-)=(1+1)/(6+10)=0.125
P(hated|-)=(1+1)/(6+10)=0.125
P(a|-)=(0+1)/(6+10)=0.0625
P(great|-)=(0+1)/(6+10)=0.0625
P(poor|-)=(1+1)/(6+10)=0.125
P(acting|-)=(1+1)/(6+10)=0.125
P(good|-)=(0+1)/(6+10)=0.0625

Classifying a new sentence

E.g.: "I hated the poor acting"
Probability of sentence being positive,
P(+).P(I|+).P(hated|+).P(the|+).P(poor|+).P(acting|+)
0.6*0.0833*0.0417*0.0833*0.0417*0.0833 = 6.0*10^-7
Probability of sentence being negative,
P(-).P(I|-).P(hated|-).P(the|-).P(poor|-).P(acting|-)
0.4*0.125*0.125*0.125*0.125*0.125 = 1.22*10^-5
So the sentence is classified as negative.
If a word in the new sentence is not present in the vocabulary, a very tiny probability is assigned to it (the sketch below reproduces the full computation).
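
A minimal sketch in plain Python that reproduces the computation above, using the five reconstructed training reviews listed earlier:

from collections import Counter

train = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("a great movie. good movie", "+"),
    ("poor acting", "-"),
    ("great acting. a good movie", "+"),
]

def tokenize(text):
    return text.lower().replace(".", "").split()

vocab = {w for doc, _ in train for w in tokenize(doc)}      # 10 unique words

# class priors and Laplace-smoothed likelihoods: P(wk|c) = (nk + 1) / (n + |vocab|)
priors, likelihood = {}, {}
for c in ("+", "-"):
    docs = [tokenize(d) for d, y in train if y == c]
    priors[c] = len(docs) / len(train)                      # 0.6 and 0.4
    words = [w for d in docs for w in d]
    n, counts = len(words), Counter(words)                  # n = 14 (+) and 6 (-)
    likelihood[c] = {w: (counts[w] + 1) / (n + len(vocab)) for w in vocab}

def classify(sentence):
    scores = {}
    for c in priors:
        p = priors[c]
        for w in tokenize(sentence):
            if w in vocab:   # words outside the vocabulary are skipped here;
                             # the post notes a tiny probability could be assigned instead
                p *= likelihood[c][w]
        scores[c] = p
    return max(scores, key=scores.get), scores

print(classify("I hated the poor acting"))
# ('-', {'+': ~6.0e-07, '-': ~1.22e-05})  -> classified as negative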

A calm and modest life brings more happiness than the pursuit of success combined with constant restlessness.