Sunday, 5 August 2018

Data Analysis with Python

Python is an open-source, object-oriented programming language created by Guido van Rossum in the late 1980s. While implementing Python, van Rossum was also reading the published scripts from “Monty Python's Flying Circus”, a BBC comedy series. He wanted a name that was short and slightly mysterious, so he decided to call the language Python. Python is an interpreted language. A compiled language is translated ahead of time into machine code that the CPU can execute directly, whereas an interpreted language must be translated into CPU machine instructions at run time. At Google, Python is one of the three "official languages", alongside C++ and Java. Google even has a developer portal devoted to Python, with free classes that include exercises and lecture videos (https://developers.google.com/edu/python/). Python is also used as the configuration language for Tupperware, Facebook's container deployment system.

Installation

There are two options:

  • Directly download and install Python
  • Download and install Anaconda

Development environment


  • Terminal or shell
  • IDLE (Integrated Development and Learning Environment)
  • IPython Notebook (now Jupyter Notebook)

Data structures


  • Lists: Comma-separated values in square brackets. Lists are mutable, and the items need not be of the same type.
  • Strings: They are immutable. Enclosed in single ('), double (") or triple (''' or """) quotes. Strings enclosed in triple quotes can span multiple lines.
  • Tuples: A number of values separated by commas, usually surrounded by parentheses. They are immutable. Tuples are faster to process than lists because of their immutable nature.
  • Dictionary: Set of Key:Value pairs enclosed in curly braces. Keys are unique.
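The four data structures above can be tried out with a few lines of Python. This is only a small sketch; the variable names and sample values are made up for illustration.

```python
# List: mutable, written in square brackets; a list may mix types.
items = [1, "two", 3.0]
items.append(4)                  # lists can grow in place

# String: immutable; triple quotes allow a string to span several lines.
name = "Python"
multi_line = """first line
second line"""
try:
    name[0] = "J"                # item assignment is not allowed on strings
    string_is_immutable = False
except TypeError:
    string_is_immutable = True

# Tuple: immutable sequence, usually written in parentheses.
point = (3, 4)
x, y = point                     # tuples can be unpacked into variables

# Dictionary: key:value pairs in curly braces; keys are unique.
ages = {"Ann": 30, "Bob": 25}
ages["Ann"] = 31                 # assigning to an existing key overwrites its value

print(items, string_is_immutable, point, ages)
```

Trying to change a string or a tuple in place raises a TypeError, which is what the try/except block demonstrates for strings.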

Python libraries for data analysis


  • NumPy: Numerical Python. It provides the n-dimensional array feature. It also includes basic linear algebra functions, Fourier transforms, advanced random number capabilities and tools for integration with low-level languages like C, C++ and Fortran.
  • SciPy: Scientific Python. Library for discrete Fourier transforms, linear algebra, optimization and sparse matrices. It is a module for science built on NumPy.
  • Matplotlib: It is used for plotting graphs.
  • Pandas: It is used for structured data operations and manipulations. The following data structures are included in Pandas:
- Series: A one-dimensional labelled array.
- DataFrame: A two-dimensional data structure with column names and row indices.
  • Scikit-learn: It is a library for Machine Learning built on NumPy, SciPy and Matplotlib. It includes tools for Classification, Regression, Clustering and Dimensionality Reduction.
  • NLTK: Natural Language Processing Tool Kit. It is a library for Natural Language Processing.
  • Statsmodels: It is a library used for statistical modeling.
  • Seaborn: It is a library based on Matplotlib for statistical data visualization.
  • Bokeh: It is used for creating interactive plots and dashboards in web browsers. It can visualize large and streaming datasets.
  • Blaze: It is built on NumPy and Pandas for streaming and distributed datasets. It has connectors to Apache Spark, MongoDB, etc.
  • Scrapy: It is used for web crawling. It can be used to extract information from all pages of a website.
  • Sympy: It is a library for symbolic computation. It has the capability of formatting the result of the computations as LaTeX code.
  • Astropy: It is a package for Astronomy in Python.
  • Biopython: It is a set of tools for biological computation. 
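As a small illustration of the first entries in the list above, the sketch below builds a NumPy array, a Pandas Series and a Pandas DataFrame. It assumes NumPy and Pandas are installed (both ship with Anaconda); the names and numbers are invented sample data.

```python
import numpy as np
import pandas as pd

# NumPy: an n-dimensional array (here 2 x 2) and a column-wise mean.
arr = np.array([[1.0, 2.0], [3.0, 4.0]])
col_means = arr.mean(axis=0)

# Pandas Series: a one-dimensional array with labels.
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

# Pandas DataFrame: a two-dimensional table with column names and row indices.
df = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "score": [82, 91, 77]})

# Structured-data operation: keep only the rows with a score above 80.
top = df[df["score"] > 80]

print(col_means)
print(s["b"])
print(top)
```

The filter in the last step is a typical example of the structured data operations Pandas is used for: the condition is evaluated for every row, and only the matching rows are kept.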


Friendship is born at that moment when one person says to another: 'What! You too? I thought I was the only one' 😊


Sunday, 18 March 2018

Smart Digital Store - DigiShopiGo

Tom is planning to buy a smartphone. He searches online but finds different brands with the same specifications. He even puts up a Facebook post asking for suggestions. By chance he notices the 'DigiShopiGo' web store. 'DigiShopiGo' is a world-famous retail chain offering a wide variety of products. Tom decides to search for the smartphone in the web store.

The first step in searching for a product in 'DigiShopiGo' is to register an email ID. After registering, Tom sees a variety of models with the same specifications, including ones his friends have bought. He can also see reviews of each product. Tom purchases the product and gets notifications about its delivery. To his surprise, a drone delivers the mobile within 30 minutes. Tom is very satisfied with the purchase experience and shares his opinion on social media.

'DigiShopiGo' has an advanced analytics wing that keeps track of the social media activities of its customers. Identifying Tom's opinion as positive feedback, it starts sending Tom notifications about the sale and availability of mobile accessories.

One day Tom loses his mobile charger. He searches for an electronics shop online and offline. Suddenly he gets a notification about a 'DigiShopiGo' outlet nearby. 'DigiShopiGo' uses shoppers' location data to showcase the nearest retail location. On entering the store, Tom is astonished to see a map showing the exact location of the item he needs to buy. Tom also gets instant notifications about the estimated wait time for billing at the store. As Tom walks, he also sees smart digital shelves that give him a personalized experience by highlighting products of interest to him. He instantly shares the experience on social media. So being digital means being social and accessible.

Be where the world is going!

Sunday, 28 January 2018

FAIR releases Detectron

Facebook’s AI Research team (FAIR) has been working on the problem of object detection, using deep learning to give computers the ability to reach conclusions about what objects are present in a scene. The company’s object detection system is called Detectron. The Detectron project was started in July 2016 with the goal of creating a fast and flexible object detection system. It implements state-of-the-art object detection algorithms, is written in Python and is powered by the Caffe2 deep learning framework. The algorithms examine video input and are able to make guesses about what discrete objects comprise the scene.

At FAIR, Detectron has enabled numerous research projects, including: 

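  • Feature Pyramid Networks for Object Detection: Feature pyramids are a basic component in recognition systems for detecting objects at different scales. However, recent deep learning object detectors have avoided pyramid representations, in part because they are compute and memory intensive. Feature Pyramid Networks build feature pyramids from the multi-scale hierarchy that a deep convolutional network already computes, at marginal extra cost.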
  • Mask R-CNN: It is a general framework for object instance segmentation. In object instance segmentation, given an image, the goal is to label each pixel according to its object class as well as its object instance. Instance segmentation is closely related to two important tasks in computer vision, namely semantic segmentation and object detection. The goal of semantic segmentation is to label each pixel according to its object class. However, semantic segmentation does not differentiate between two different object instances of the same class. For example, if there are two persons in an image, semantic segmentation will assign the same label to pixels belonging to either of these two persons. The goal of object detection is to predict the bounding box and the object class of each object instance in the image. However, object detection does not provide per-pixel labeling of the object instance. Compared with semantic segmentation and object detection, object instance segmentation is strictly more challenging, since it aims to identify object instance as well as provide per-pixel labeling of each object instance.
  • Detecting and Recognizing Human-Object Interactions: To understand the visual world, a machine must not only recognize individual object instances but also how they interact. Human-object interactions in photos are detected and represented as triplets <human, verb, object>, e.g. <person, reads, book>.
  • Focal Loss for Dense Object Detection: The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors. To close this gap, the focal loss reshapes the standard cross-entropy loss so that well-classified (easy) examples are down-weighted and training focuses on hard examples. A simple one-stage detector named RetinaNet was designed to evaluate the loss. When trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors.
  • Non-local Neural Networks: Non-local means is an algorithm in image processing for image denoising. Unlike "local mean" filters, which take the mean value of a group of pixels surrounding a target pixel to smooth the image, non-local means filtering takes a mean of all pixels in the image, weighted by how similar these pixels are to the target pixel. This results in much greater post-filtering clarity, and less loss of detail in the image compared with local mean algorithms. Inspired by the classical non-local means method in computer vision, the non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures.
  • Learning to Segment Every Thing: Existing methods for object instance segmentation require all training instances to be labeled with segmentation masks. This requirement makes it expensive to annotate new categories and has restricted instance segmentation models to ~100 well-annotated classes. A new partially supervised training paradigm is proposed, together with a novel weight transfer function, that enables training instance segmentation models over a large set of categories, all of which have box annotations but only a small fraction of which have mask annotations.
  • Data Distillation: Omni-supervised learning is a special area of semi-supervised learning in which the learner exploits all available labeled data plus internet-scale sources of unlabeled data. Data distillation is a method that ensembles predictions from multiple transformations of unlabeled data, using a single model, to automatically generate new training annotations.
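The focal loss named in the list above can be sketched in a few lines of plain Python. The formula FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t) and the defaults gamma = 2, alpha = 0.25 are taken from the Focal Loss paper, not from this post, and the function below is only an illustration for a single binary prediction, not Detectron's implementation.

```python
import math

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Focal loss for one binary prediction (illustrative sketch).

    p: predicted probability of the positive class, strictly between 0 and 1
    y: ground-truth label, 1 (object) or 0 (background)
    """
    p_t = p if y == 1 else 1.0 - p               # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t) ** gamma shrinks the loss of well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

easy = focal_loss(0.9, 1)   # confident and correct: very small loss
hard = focal_loss(0.1, 1)   # confident and wrong: much larger loss
print(easy, hard)
```

With gamma set to 0 the modulating factor disappears and the function reduces to an alpha-weighted cross-entropy, which is a handy sanity check.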

The goal of Detectron is to provide a high-quality, high-performance codebase for object detection research. It is designed to be flexible in order to support rapid implementation and evaluation of novel research. Detectron includes implementations of the following object detection algorithms:

  • Mask R-CNN
  • RetinaNet
  • Faster R-CNN
  • RPN
  • Fast R-CNN
  • R-FCN


From augmented reality to various computer vision tasks, Detectron has a wide variety of uses. One of the many things that this new platform can do is object masking. Object masking takes object detection a step further: instead of just drawing a bounding box around the object, it can draw a complex polygon that follows the object's outline. Detectron is available under the Apache 2.0 licence on GitHub. The company says it is also releasing extensive performance baselines for more than 70 pre-trained models that are available to download from its model zoo on GitHub. Once a model is trained, it can be deployed on the cloud and even on mobile devices.
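The difference between a bounding box and an object mask can be seen on a toy example. The sketch below is plain Python and has nothing to do with Detectron's own code; the 4 x 5 grid and the helper name bounding_box are made up for illustration.

```python
def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the pixels set to 1 in a binary mask."""
    rows = [y for y, row in enumerate(mask) if any(row)]
    cols = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    return (min(cols), min(rows), max(cols), max(rows))

# Hypothetical mask of one object: 1 marks a pixel that belongs to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]

box = bounding_box(mask)
box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
mask_area = sum(sum(row) for row in mask)

print(box, box_area, mask_area)
```

The box covers nine pixels, but only five of them belong to the object; the mask keeps exactly that per-pixel information, which the box alone throws away.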

References


  1. https://www.techleer.com/articles/469-facebook-announces-open-sourcing-of-detectron-a-real-time-object-detection/
  2. https://github.com/facebookresearch/Detectron
  3. https://arxiv.org/


Success is walking from failure to failure with no loss of enthusiasm.