Wednesday, 22 March 2017

Block chain in bits & pieces


A blockchain system is a package which contains a normal database plus some software that adds new rows, validates that new rows conform to pre-agreed rules, and listens for and broadcasts new rows to its peers across a network, ensuring that all peers have the same data in their databases. The Bitcoin Blockchain ecosystem is actually quite a complex system due to its dual aims: that anyone should be able to write to the Bitcoin Blockchain; and that there shouldn’t be any centralised power or control.

Bitcoin

A 2008 whitepaper entitled "Bitcoin: A Peer-to-Peer Electronic Cash System", written by Satoshi Nakamoto, introduced the concept of bitcoin. It is described as "a purely peer-to-peer version of electronic cash [which] would allow online payments to be sent directly from one party to another without going through a financial institution". It can be thought of as an international currency which can be used for transactions over the internet. As an electronic asset, we can buy bitcoins, own them, and send them to someone else. Transactions of bitcoins from account to account are recognised globally in a matter of seconds, and can usually be considered securely settled within an hour.  They have a price, and the price is set by normal supply and demand market forces in marketplaces where traders come to trade.
So, there is the concept of electronic cash: cash being a bearer asset, like the cash in our pocket which we can spend at will without asking permission from a third party.

How does it work?

A network of computers validates and keeps track of bitcoin payments, and ensures that they are recorded by being added to an ever-growing list of all the bitcoin payments that have been made.
When we make a bitcoin payment, a payment instruction is sent to the network.  The computers on the network validate the instruction and relay it to the other computers.  After some time has passed, the payment gets included in one of the block updates, and is added to The Bitcoin Blockchain file on all the computers across the network.
Peer-to-peer:  Peer-to-peer is like a gossip network where everyone tells a few other people the news (about new transactions and new blocks), and eventually the message gets to everyone in the network. One benefit of peer-to-peer (p2p) over client-server is that with p2p, the network doesn’t rely on one central point of control which can fail.
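The gossip idea can be sketched in a few lines of Python. This is only a toy model, not how Bitcoin nodes are actually implemented: the `peers` map and the node names are made up for illustration, and every node simply relays the news to all the peers it knows.

```python
from collections import deque

def gossip(peers, start):
    """Relay a message from `start` until no new node can be informed.

    `peers` maps each node name to the list of nodes it talks to
    (a hypothetical topology, chosen only for this example).
    """
    informed = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in peers[node]:
            if other not in informed:  # each node only needs to hear the news once
                informed.add(other)
                queue.append(other)
    return informed

# No node knows everyone, yet the message still reaches all five nodes.
peers = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
reached = gossip(peers, "A")
```

Note that there is no central server in the loop: removing any single node other than D still leaves the rest connected, which is the resilience benefit mentioned above.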

How are bitcoins stored?

Bitcoin ownership is tracked on The Bitcoin Blockchain, and bitcoins are associated with “bitcoin addresses”.  Bitcoins themselves are not stored; but rather the keys or passwords needed to make payments are stored, in “wallets” which are apps that manage the addresses, keys, balances, and payments. 
Bitcoin addresses: A bitcoin address is similar to a bank account number.
Bitcoin wallets: Bitcoin wallets are apps that display all of our bitcoin addresses, display balances and make it easy to send and receive payments. For a wallet to provide accurate information, it needs to be online or connected to a Bitcoin Blockchain file, which it uses as its source of information.  The wallet will read the Bitcoin Blockchain file and calculate the balances in each address.
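As a rough illustration of how a wallet could work out balances from the list of past payments, here is a minimal Python sketch. It assumes a simplified ledger of (sender, recipient, amount) rows with made-up addresses; the real Bitcoin Blockchain tracks unspent transaction outputs rather than account rows, so this shows the idea and not the actual format.

```python
from collections import defaultdict

def balances(transactions):
    """Replay every payment in order and total up each address."""
    totals = defaultdict(int)
    for sender, recipient, amount in transactions:
        if sender is not None:  # None marks newly created coins (assumption of this sketch)
            totals[sender] -= amount
        totals[recipient] += amount
    return dict(totals)

# Hypothetical ledger: addresses and amounts are invented for the example.
ledger = [
    (None, "addr_alice", 50),
    ("addr_alice", "addr_bob", 20),
    ("addr_bob", "addr_carol", 5),
]
result = balances(ledger)
```

The point is that the balance is never stored anywhere; it is derived by reading the whole history.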

How are bitcoins sent?

Payments, or bitcoin transactions
Each bitcoin address has its own private key, which is needed to send payments from that address. Whoever knows this private key can make payments from the address. Wallet software is used to generate and hold this private key.
Private key: Because we cannot change the private key to something more memorable, it can be a pain to remember.  Most wallet apps will encrypt the key with a password that we choose.  Later, when we want to make a payment, we just need to remember our password. Bitcoin wallets don’t store bitcoins but store the keys that let us transfer or ‘spend’ them.

What happens when I make a bitcoin payment?

A payment is an instruction to unlink some bitcoins from an address we control, and move them to the control of another address (our recipient). Our payment instruction includes:

  • which bitcoins we’re sending
  • which address we’re sending them from
  • which address we’re sending them to

Digital cryptographic signatures:  The instruction is then digitally signed with the private key of the address which currently holds the bitcoins.  This digital signature demonstrates that we are the owner of the address in question.
Validators:  When the first computer receives the instruction, it checks some technical details, and some business logic details. The same tests are done on all computers in the network. Eventually all computers on the network know about this payment, and it appears on screens everywhere in the world as an “unconfirmed transaction”.  It is unconfirmed because although the payment has been verified and passed around, it isn’t entered into the ledger yet.
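The kind of checks a validating computer runs can be sketched as below. This is a simplified assumption of what "technical details" and "business logic details" might look like: the field names are hypothetical, balances are passed in as a plain dictionary, and the cryptographic signature check is deliberately left out because it needs a real elliptic-curve library.

```python
def validate_payment(payment, known_balances):
    """Return (ok, reason) for a payment instruction.

    Signature verification is NOT performed here; a real node would
    also verify the signature against the sending address.
    """
    required = ("sender", "recipient", "amount", "signature")
    if any(field not in payment for field in required):
        return False, "missing field"          # technical check
    if payment["amount"] <= 0:
        return False, "amount must be positive"
    if known_balances.get(payment["sender"], 0) < payment["amount"]:
        return False, "insufficient funds"     # business logic check
    return True, "ok"

# Hypothetical data for illustration only.
known = {"addr_alice": 30}
good = {"sender": "addr_alice", "recipient": "addr_bob", "amount": 10, "signature": "..."}
too_much = {"sender": "addr_alice", "recipient": "addr_bob", "amount": 99, "signature": "..."}
```

Because every computer applies the same rules to the same data, they all reach the same verdict without needing a central referee.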

How are bitcoins tracked?

Specialised nodes in the network work to add the bitcoin transactions, in blocks, to the blockchain.  This is known as bitcoin mining. Mining is a guessing game where your chance of winning is related to how quickly your machine can perform calculations compared to how quickly other miners are performing similar calculations.  Whoever guesses the right number first wins the right to add a new block of transactions to everyone’s blockchains, and does this by publishing the block to the other computers on the network.  Each computer performs a quick validation of the block, and if they agree that the block and transactions conform to the rules, they add the block to their own blockchain.
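The guessing game can be illustrated with a small proof-of-work sketch in Python. It is a simplification under stated assumptions: the block is just a string, the "right number" is any nonce whose SHA-256 hash starts with a few zeros, and the difficulty is kept tiny so it finishes quickly. Real Bitcoin mining hashes a binary block header twice and compares against a numeric target.

```python
import hashlib

def block_hash(block_data, nonce):
    """Hash the block contents together with a candidate nonce."""
    return hashlib.sha256(f"{block_data}{nonce}".encode()).hexdigest()

def mine(block_data, difficulty=3):
    """Try nonces one by one until the hash starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while not block_hash(block_data, nonce).startswith(prefix):
        nonce += 1  # guessing is slow: no shortcut is known
    return nonce

def is_valid(block_data, nonce, difficulty=3):
    """Checking a guess takes a single hash, so validation is quick."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)

winning_nonce = mine("alice pays bob 10")
```

This shows the asymmetry described above: the miner has to try thousands of guesses, while every other computer verifies the winning guess with one calculation.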

Bitcoin security

Making payments: Bitcoin private keys are used to make payments.
Block control: There are two parts to this.  Firstly there is block-creation (“mining”), performed by some specialised nodes; secondly there is block validation, which is performed by all nodes.

References
https://bitsonblocks.net/2015/09/09/a-gentle-introduction-to-blockchain-technology/
https://bitsonblocks.net/2015/09/01/a-gentle-introduction-to-bitcoin/
https://bitcoin.org/bitcoin.pdf

It only takes a split second to smile, yet to someone that needed it, it can last a lifetime.

Saturday, 7 January 2017

Useful Queries in Neo4j

1. Create nodes with label Employee and Department
CREATE (emp:Employee{id:"1001",name:"Tom",dob:"01/10/1982"})
CREATE (dept:Department { deptno:10,dname:"Accounting",location:"Hyderabad" })

2. Create node and relationship
CREATE (emp:Employee{id:"1001"})-[:WORKS_AT{since:"2 days"}]->(dept:Department { deptno:10})

3. Selection
MATCH (n) RETURN n : returns all nodes and relationships
MATCH (emp:Employee{id:"1001"}) RETURN emp : returns the nodes with id as 1001.
MATCH (emp:Employee) WHERE emp.id="1001" RETURN emp : returns the same result as the previous query; the inline form above is more compact, and any performance difference is best checked with PROFILE rather than assumed

4. Limiting results
MATCH (n) RETURN n LIMIT 10

5. Deletion
MATCH(n) DELETE(n)

6. Delete nodes and relationships (a node that has relationships can only be deleted once all of its relationships are deleted; DETACH DELETE does both in one step)
MATCH (n) DETACH DELETE n

7. Return attribute names of a node
MATCH (n) RETURN keys(n)

8. Get attributes along with values
MATCH (n) RETURN properties(n)

9. Regular expressions
MATCH (emp:Employee) 
WHERE emp.id=~'\\d+'
RETURN (emp)

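10. Get the current timestamp (timestamp() returns the current time in milliseconds since the epoch, not the time a node was created; to keep a creation time, store it as a property when the node is created)
MATCH (n) RETURN timestamp()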

11. Find all nodes that have a relationship of the designated type, and return selected properties of the nodes and the relationships
MATCH (n)-[r:WORKS_AT]->()
RETURN n,r.since

12. Return distinct in neo4j
MATCH (n1)-[r:WORKS_AT]->(n2)
RETURN DISTINCT r.since

13. Aggregation
MATCH (n1)-[r:WORKS_AT]->(n2)
RETURN r.since,COUNT(*)

14. Create-Update-Delete properties
MERGE (emp:Employee{id:"1001"}) ON CREATE SET emp.city = "Kochi", emp.dob = "01/10/1982" ON MATCH SET emp.name = "Michael" REMOVE emp.id

15. Return nodes that have any relationship, in either direction
MATCH (n)--(m) RETURN n

16. Find count of relationship in both directions
MATCH (n)
RETURN n.name,size((n)-->()) as outcount, size((n)<--()) as incount



The best preparation for tomorrow is doing best today

Monday, 31 October 2016

Spark on MongoDB- Part: 1

MongoDB
MongoDB is a document-oriented database. Its development was started in 2007 by MongoDB Inc. (formerly 10gen). It has APIs in various programming languages such as JavaScript, Python, Ruby, Perl, Java, Scala, C#, C++, Haskell, Erlang etc. It supports built-in horizontal scaling through sharding, which divides the dataset and distributes it over multiple servers. MongoDB supports embedded documents, which reduces the need for complex joins. It also provides full index support and high availability through replication.
Nexus Architecture
MongoDB’s design philosophy is focused on combining the critical capabilities of relational databases with the innovations of NoSQL technologies. It adopts the following features from relational databases:

  • Expressive query language & secondary Indexes. Users should be able to access and manipulate their data in sophisticated ways to support both operational and analytical applications. Indexes play a critical role in providing efficient access to data, supported natively by the database rather than maintained in application code.
  • Strong consistency. Applications should be able to immediately read what has been written to the database.
  • Enterprise Management and Integrations. Databases are just one piece of application infrastructure, and need to fit seamlessly into the enterprise IT stack. Organizations need a database that can be secured, monitored, automated, and integrated with their existing technology infrastructure, processes, and staff, including operations teams, DBAs, and data analysts.
However, modern applications impose requirements not addressed by relational databases, and this has driven the development of NoSQL databases which offer: 
  • Flexible Data Model. Whether document, graph, key-value, or wide-column, all of them offer a flexible data model, making it easy to store and combine data of any structure and allow dynamic modification of the schema without downtime or performance impact. 
  • Scalability and Performance. NoSQL databases were all built with a focus on scalability, so they all include some form of sharding or partitioning. This allows the database to scale out on commodity hardware deployed on-premises or in the cloud, enabling almost unlimited growth with higher throughput and lower latency than relational databases. 
  • Always-On Global Deployments. NoSQL databases are designed for highly available systems that provide a consistent, high quality experience for users all over the world. They are designed to run across many nodes, including replication to automatically synchronize data across servers, racks, and data centers.

With its Nexus Architecture, MongoDB is the only database that harnesses the innovations of NoSQL while maintaining the foundation of relational databases.
Data Model
MongoDB stores data as documents in a binary representation called BSON (Binary JSON). The BSON encoding extends the popular JSON (JavaScript Object Notation) representation to include additional types such as int, long, date, and floating point. BSON documents contain one or more fields, and each field contains a value of a specific data type, including arrays, binary data and sub-documents.
For example, consider the data model for a blogging application. In a relational database the data model would comprise multiple tables. To simplify the example, assume there are tables for Categories, Tags, Users, Comments and Articles. In MongoDB the data could be modeled as two collections, one for users, and the other for articles. In each blog document there might be multiple comments, multiple tags, and multiple categories, each expressed as an embedded array.
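To make the blogging example concrete, here is what one article document might look like, written as a Python dictionary (which maps naturally onto a JSON/BSON document). All field names and values are hypothetical; MongoDB does not prescribe them.

```python
import json

# One document in a hypothetical "articles" collection.
# Comments, tags and categories live inside the article as embedded arrays,
# instead of in separate tables joined by foreign keys.
article = {
    "title": "Spark on MongoDB",
    "author": "user_42",  # would refer to a document in the "users" collection
    "tags": ["spark", "mongodb"],
    "categories": ["big data"],
    "comments": [
        {"user": "user_7", "text": "Nice overview."},
        {"user": "user_9", "text": "Waiting for part 2."},
    ],
}

# The whole record serialises as a single unit.
as_json = json.dumps(article)
```

Reading this one document returns the article together with its comments, tags and categories in a single lookup.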
MongoDB documents tend to have all data for a given record in a single document, whereas in a relational database information for a given record is usually spread across many tables. With the MongoDB document model, data is more localized, which significantly reduces the need to JOIN separate tables. The result can be higher performance and scalability across commodity hardware, as a single read to the database can retrieve the entire document containing all related data. Unlike with many NoSQL databases, users don’t need to give up JOINs entirely. For additional analytics flexibility, MongoDB preserves left-outer JOIN semantics with the $lookup operator, enabling users to get the best of both relational and non-relational data modeling.
Fields can vary from document to document; there is no need to declare the structure of documents to the system – documents are self-describing. If a new field needs to be added to a document then the field can be created without affecting all other documents in the system, without updating a central system catalog, and without taking the system offline. The data model is aligned to the structure of objects in the programming language. This makes it simpler and faster for developers to model how data in the application will map to data stored in the database.
Unlike many NoSQL databases, MongoDB is not limited to simple key-value operations. A key element of this flexibility is MongoDB's support for many types of queries. A query may return a document, a subset of specific fields within the document or complex aggregations against many documents: 
  • Key-value queries return results based on any field in the document, often the primary key. 
  • Range queries return results based on values defined as inequalities (e.g., greater than, less than or equal to, between).
  • Geospatial queries return results based on proximity criteria, intersection and inclusion as specified by a point, line, circle or polygon. 
  • Text Search queries return results in relevance order based on text arguments using Boolean operators (e.g., AND, OR, NOT). 
  • Aggregation Framework queries return aggregations of values returned by the query (e.g., count, min, max, average, similar to a SQL GROUP BY statement). 
  • MapReduce queries execute complex data processing that is expressed in JavaScript and executed across data in the database.
Typical MongoDB Deployment
Applications issue queries to a query router that dispatches the query to the appropriate shards. For key-value queries that are based on the shard key, the query router will dispatch the query to the shard that manages the document with the requested key. When using range-based sharding, queries that specify ranges on the shard key are only dispatched to shards that contain documents with values within the range. For queries that don’t use the shard key, the query router will broadcast the query to all shards, aggregating and sorting the results as appropriate. Multiple query routers can be used with a MongoDB system, and the appropriate number is determined based on performance and availability requirements of the application.
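The routing rules above can be sketched with a toy router in Python. This is not MongoDB's implementation: the class name, the shard names and the split points are invented, and range-based sharding on a single numeric shard key is assumed.

```python
import bisect

class QueryRouter:
    """Toy router for range-based sharding on one numeric shard key."""

    def __init__(self, boundaries, shards):
        # boundaries are sorted split points; shard i holds keys below boundaries[i],
        # and the last shard holds everything from the last boundary upwards.
        assert len(shards) == len(boundaries) + 1
        self.boundaries = boundaries
        self.shards = shards

    def route_key(self, key):
        """Key-value query on the shard key: exactly one shard is contacted."""
        return [self.shards[bisect.bisect_right(self.boundaries, key)]]

    def route_range(self, low, high):
        """Range query on the shard key: only the overlapping shards are contacted."""
        first = bisect.bisect_right(self.boundaries, low)
        last = bisect.bisect_right(self.boundaries, high)
        return self.shards[first:last + 1]

    def route_other(self):
        """Query that does not use the shard key: broadcast to every shard."""
        return list(self.shards)

# shard0: key < 100, shard1: 100 <= key < 200, shard2: key >= 200
router = QueryRouter([100, 200], ["shard0", "shard1", "shard2"])
```

For example, `router.route_key(150)` touches only `shard1`, while a query on a field other than the shard key goes to all three shards, whose results the router would then merge.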

Reference: https://docs.mongodb.com/

You might be wondering why the title of this post is 'Spark on MongoDB' when so far we have discussed only MongoDB. We'll see more features of MongoDB, and why to use Spark on MongoDB, in the next post.

Every Day is a new Beginning!

Sunday, 25 September 2016

Chatbots

Just as people use language for human communication, people want to use their language to communicate with computers. This led to the development of chatbots. Chatbots are computer programs that interact with users using natural language. They are also known as: machine conversation system, virtual agent, dialogue system or chatterbot. Chatbot architecture integrates a language model and computational algorithms. Let's have a look at the different chatbot systems:
  • ALICE chatbot system (Artificial Linguistic Internet Computer Entity): ALICE’s knowledge about English conversation patterns is stored in AIML (Artificial Intelligence Markup Language) files. AIML consists of data objects called AIML objects. AIML objects are made up of units called topics and categories. The topic is an optional top-level element. It has a name and a set of categories related to that topic. Categories are the basic unit of knowledge in AIML. ALICE does not save the history of conversation.
  • Pandorabot chatbots: Pandorabots is a web service for building and deploying chatbots. Earlier in the development phase Pandorabot chatbots were text-only. But, now some Pandorabot chatbots incorporate speech synthesis.
  • ELIZA: ELIZA was the first attempt to build a chatbot as a tool of entertainment. Here the responses are mainly generated from the user input.
  • Sofia: This chatbot was used in the Harvard Mathematics Department to assist in teaching Mathematics.
  • YPA: Yellow pages contain advertisements, with the advertiser name, and contact information. YPA is a natural language dialogue system that allows users to retrieve information from British Telecom’s Yellow pages.
  • Virtual Patient bot (VPbot): VPbot simulates a patient that medical students can interview. It was successfully tested in Harvard Medical School’s virtual patient program.
  • Happy Assistant: It helps users access e-commerce sites to find relevant information about products and services.
  • Sanelma: It is a fictional person to talk with in a museum, which provides background information concerning a certain piece of art.
  • RITA (Real time Internet Technical Assistant): It is a graphical avatar used in ABN AMRO Bank to help customers with financial tasks.
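The way AIML categories work in ALICE (a pattern to match against the user input, and a template for the reply) can be imitated with a tiny Python sketch. This is not real AIML and not ALICE's matching algorithm; the three categories and the fallback reply are made up, and only a trailing `*` wildcard is supported.

```python
# Each category pairs an input pattern with a response template,
# loosely mirroring the pattern/template pair of an AIML category.
CATEGORIES = [
    ("HELLO", "Hi there!"),
    ("WHAT IS YOUR NAME", "My name is Bot."),
    ("MY NAME IS *", "Nice to meet you, {star}."),
]

def normalise(text):
    """Upper-case the input and drop punctuation, as pattern matchers usually do."""
    return "".join(ch for ch in text.upper() if ch.isalnum() or ch.isspace())

def respond(user_input):
    words = normalise(user_input).split()
    for pattern, template in CATEGORIES:
        parts = pattern.split()
        if parts[-1] == "*":
            head = parts[:-1]
            # The wildcard must capture at least one word.
            if words[:len(head)] == head and len(words) > len(head):
                star = " ".join(words[len(head):]).title()
                return template.format(star=star)
        elif words == parts:
            return template
    return "I do not understand."
```

Like ALICE, this sketch keeps no history: each reply depends only on the current input.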



Live to learn and you will really learn to live!

Friday, 16 September 2016

Prisma & Machine Learning

Hello... You might be wondering how I switched the topic from Natural language API to deep learning! So let me make that clear. Whatever concepts I read every day become my new thought for that day. That's it. Over the last couple of months I could see that most of my social media friends have a Prisma tag in their images. So I just thought of investigating the internals of the app. Prisma is a Russian app which makes use of neural networks to turn images into paintings. It is similar to Google's Deep Dream image recognition software. When we upload an image to the app, the image is transferred to its servers in Moscow. It uses Artificial Intelligence and neural networks to process the image and the result is returned to the phone.
Deep learning is a branch of Machine Learning. It consists of a set of algorithms to model high-level abstractions in the data. Some of the deep learning architectures include: deep neural networks, convolutional neural networks, deep belief networks and recurrent neural networks. Neural networks are used to perform tasks that are easy for humans but difficult for machines. Neural networks acquire knowledge by learning, and this knowledge is used to model outputs for future inputs.
The different learning strategies can be divided into three types, namely:
  • Supervised learning: This involves providing set of predefined inputs and outputs for learning. Eg: Face recognition
  • Unsupervised learning: It is used when we don't have an example dataset with known answers. Eg: Clustering
  • Reinforcement learning: This is a strategy built on observation. Eg: Robotics
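Supervised learning can be illustrated with the simplest possible neural unit, a single perceptron. The sketch below is only an illustration and has nothing to do with Prisma's actual networks: it learns the logical AND function from four labelled examples, using integer weights and a learning rate of 1 as simplifying assumptions.

```python
def predict(weights, bias, inputs):
    """Fire (1) if the weighted sum of the inputs is above zero, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0

def train_perceptron(samples, epochs=20):
    """Adjust weights whenever the prediction disagrees with the known answer."""
    weights, bias = [0, 0], 0
    for _ in range(epochs):
        for inputs, target in samples:
            error = target - predict(weights, bias, inputs)
            weights = [w + error * x for w, x in zip(weights, inputs)]
            bias += error
    return weights, bias

# Predefined inputs with their known outputs: the logical AND function.
and_samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train_perceptron(and_samples)
```

After training, `predict(weights, bias, (1, 1))` gives 1 and the other three inputs give 0, so the unit has learned the mapping from the examples rather than being programmed with it.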
Neural networks are used in the following fields:
  • Pattern recognition: The most common example is facial recognition
  • Time series prediction: Popular example is predicting the ups and downs of stock markets
  • Control: It involves the design of self driving cars
  • Signal Processing: One of the best examples is the design of cochlear implants

Every time we are being redirected to something better!

Monday, 12 September 2016

Google Cloud Natural Language Processing API

A couple of months ago I got a mail from the Google Cloud team regarding their new product launch. Out of curiosity I started Googling to find out what it is. So let me share my thoughts on the same.
Google is consistently making advancements in the machine learning field. Last year it open sourced its software library for machine learning, named TensorFlow. Then earlier this year it introduced SyntaxNet, which is a neural-network Natural Language Processing framework for TensorFlow. Now comes the Cloud Natural Language Processing API.
This REST API reveals the structure and meaning of the text. Initially it supports the following Natural Language Processing tasks:
  • Entity Recognition: Identify the different entity types such as Person, Location, Organisation, Events etc. from the text.
  • Sentiment Analysis: Understand the overall sentiment of the given text.
  • Syntax Analysis: Identify Parts of Speech and create Dependency parse tree for the input sentence.
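Since it is a REST API, a request is just a JSON document sent over HTTPS. The sketch below only builds a sentiment analysis request body in Python and does not send it. The endpoint and field names reflect my understanding of the v1 API and should be checked against the official reference; the sample text is made up, and a real call also needs credentials, which are not shown.

```python
import json

# Assumed endpoint for sentiment analysis (verify against the official docs).
API_ENDPOINT = "https://language.googleapis.com/v1/documents:analyzeSentiment"

def build_sentiment_request(text):
    """Build the JSON body describing the text to be analysed."""
    return {
        "document": {
            "type": "PLAIN_TEXT",
            "content": text,
        },
        "encodingType": "UTF8",
    }

body = json.dumps(build_sentiment_request("I really enjoyed reading this post."))
```

The same document structure is, as far as I know, reused for the entity and syntax tasks, with only the method at the end of the URL changing.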
The primary languages supported by the API are English, Spanish and Japanese. It has client libraries for Java, Python and Node.js. One of the major alpha customers for this API is Ocado Technology, the technology arm of the popular British online retailer Ocado.
If we specifically want to use the Google stack for analytics, then natural language processing can be done using the Google Cloud Natural Language API, the processing results can be kept in a BigQuery table (BigQuery is a data storage and analytics service from Google, accessible through a RESTful web service) and the visualization can be done using Google Data Studio. Please note that Google Data Studio is currently available only in the U.S.
Happy reading!


Dream up to the Stars so that you can end up in the Clouds.

Thursday, 1 September 2016

Selfie Mining

Nowadays it's common for personal stories to be told through social images. We might think that the pictures we snap of ourselves and post on social media sites are just for our friends on those platforms. But it's high time to correct this misbelief. Only the data we mark as private is actually guarded by privacy laws. The rest is public. Marketers are grabbing our images for research. This process is called selfie mining.
When we take a picture of ourselves we do so without a specific product promotion in mind. But that is not the case with marketers. They might be interested in our clothing, the products we use, the emotions on our face etc. There are companies that mine for selfies. They use APIs to access the images, and the most interesting aspect of it is that the owners are unaware of this. Intentionally or unintentionally, selfies promote whatever we are wearing, sitting near or using. Many digital marketing companies have built technology to scan and process photos to identify particular interests or hobbies. This in turn helps advertisers to target better.
Two of these companies are Ditto Labs and Piqora.
Ditto Labs: It scans photos on different sites like Instagram to generate insights for customers. Ditto Labs places users into categories, such as “sports fans” and “foodies” based on the context of their images. Advertisers such as Kraft Foods Group Inc. pay Ditto Labs to find their products’ logos in photos on social media. The following aspects are taken into consideration:

  • Products- Users who post images of food items and beverages are flagged for these interests.
  • Clothing- Ditto classifies objects. It also detects fabrics or patterns in clothing.
  • Faces- The emotions in the face help advertisers to understand sentiment.
  • Logos- Advertisers can search for photos featuring brands to steal customers.
  • Scenes- Analysing the background of images helps the advertisers to find where and how customers use their products.

Piqora: They store images for months on their own servers to show marketers what is trending in popularity. Piqora mainly analyses images in Pinterest. It was recently acquired by Olapic which analyses images on Instagram.
Well, this indicates that some of the best digital marketing trends are on the way. Let's hope that the best is yet to come in the near future.

Source:
http://programmaticadvertising.org/2014/10/20/selfie-mining-whats-really-going-on/
http://www.wsj.com/articles/smile-marketing-firms-are-mining-your-selfies-1412882222
We anyways have to think, why not think big?