Wednesday, 5 August 2015

Tools used for Sentiment Analysis

1.1      AlchemyAPI

AlchemyAPI combines linguistic and statistical analysis, and its sentiment model was built using tweet data. The linguistic analysis consists of identifying phrases and how these phrases combine to form sentences; the statistical analysis applies mathematical techniques to the text. AlchemyAPI has more than 30,000 users. Its Sentiment Analysis APIs are capable of computing document-level sentiment, user-specified sentiment targeting, entity-level sentiment, emoticon sentiment and keyword-level sentiment. AlchemyAPI can be used easily from any major programming language, including Java, C/C++, C#, Perl, PHP, Python, Ruby, JavaScript and Android OS, and its text-analysis algorithms are accessed through a REST interface. It can process content as plain text or HTML, and it accepts URLs for web-accessible content or raw HTML for private web documents. Most of the functions work with eight languages: English, German, French, Italian, Portuguese, Russian, Spanish and Swedish. AlchemyAPI is a paid service, but it also offers a free API key to get started, limited to 1,000 calls per day.
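As a concrete illustration of the REST interface, here is a minimal Python sketch of a document-level sentiment request. The endpoint path, parameter names (apikey, text, outputMode) and response fields are assumptions based on AlchemyAPI's public documentation and may have changed, so treat them as placeholders rather than a definitive client.

```python
# Minimal sketch of a document-level sentiment call over AlchemyAPI's REST interface.
# The endpoint URL, parameter names and response fields are assumptions; substitute
# your own API key (a free key allows roughly 1,000 calls per day).
import requests

ALCHEMY_ENDPOINT = "http://access.alchemyapi.com/calls/text/TextGetTextSentiment"  # assumed URL
API_KEY = "YOUR_ALCHEMYAPI_KEY"

def get_document_sentiment(text):
    """Send plain text and return the document-level sentiment block."""
    response = requests.post(ALCHEMY_ENDPOINT, data={
        "apikey": API_KEY,
        "text": text,
        "outputMode": "json",  # ask for JSON output
    })
    response.raise_for_status()
    result = response.json()
    # Assumed response shape: {"status": "OK", "docSentiment": {"type": "positive", "score": "0.43"}}
    return result.get("docSentiment", {})

if __name__ == "__main__":
    print(get_document_sentiment("I really enjoyed this movie; the acting was superb."))
```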

1.2      SentiWordNet

SentiWordNet was created by automatically annotating all of the synsets in WordNet with sentiment scores. Four versions are available: SentiWordNet 1.0, 1.1, 2.0 and 3.0. SentiWordNet 1.0 was based on the bag-of-words concept, while SentiWordNet 3.0 is the most widely used. It is freely distributed for non-commercial use, and licenses are available for commercial applications. In SentiWordNet the degree of positivity or negativity of a synset ranges from 0 to 1, and the synsets are ranked within each part of speech (PoS). The parts of speech represented in SentiWordNet are adjective, noun, adverb and verb, written as 'a', 'n', 'r' and 'v' respectively. The database has five columns: the part of speech, the offset, the positive score, the negative score and the synset terms, i.e. all terms belonging to a particular synset. The offset is a numerical ID that, combined with a part of speech, identifies a synset. The SentiWordNet lexical database was formulated based on the movie review dataset.

Field         Description
POS           Part of speech linked with the synset. It can take four values: a (adjective), v (verb), n (noun), r (adverb).
Offset        Numerical ID which, together with the part of speech, uniquely identifies a synset in the database.
PosScore      Positive score for the synset; a numerical value ranging from 0 to 1.
NegScore      Negative score for the synset; a numerical value ranging from 0 to 1.
SynsetTerms   List of all terms included in the synset.

Table 2: SentiWordNet database structure

POS    Offset     PosScore   NegScore   SynsetTerms
a      1740       0.125      0          able#1
a      2098       0          0.75       unable#1
n      388959     0          0          divarication#1
n      389043     0          0          fibrillation#2
r      76948      0.625      0          brazenly#1
r      77042      0.125      0.5        brilliantly#2
v      1827745    0          0          slobber_over#1
v      1827858    0.625      0.125      look_up_to#1

Table 3: Sentiment scores associated with SentiWordNet entries
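The scores in Tables 2 and 3 can be read programmatically. Below is a minimal sketch using NLTK's bundled SentiWordNet corpus reader; it assumes NLTK is installed and the wordnet and sentiwordnet corpora have been downloaded. The synset name 'able.a.01' is the WordNet name of the able#1 adjective entry shown in Table 3.

```python
# Minimal sketch: reading SentiWordNet scores through NLTK's corpus reader.
# Assumes: pip install nltk, then nltk.download('wordnet') and nltk.download('sentiwordnet').
from nltk.corpus import sentiwordnet as swn

# Look up the adjective synset able#1 from Table 3 ('able.a.01' is its WordNet name).
able = swn.senti_synset("able.a.01")
print(able.pos_score(), able.neg_score(), able.obj_score())
# PosScore and NegScore lie in [0, 1]; the objectivity score is 1 - (PosScore + NegScore).

# A word can belong to several synsets, each with its own scores.
for sense in swn.senti_synsets("brilliantly", "r"):  # 'r' = adverb, as in Table 2
    print(sense.synset.name(), sense.pos_score(), sense.neg_score())
```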

1.3      Stanford NLP

Stanford NLP is a suite of Java NLP tools developed by Stanford University. It consists of a stack of products including Stanford CoreNLP, the Stanford Parser, the Stanford POS Tagger, the Stanford Named Entity Recognizer, the Stanford Word Segmenter and others. The movie review dataset was used to train the sentiment model in Stanford NLP. In Stanford NLP the raw text is put into an Annotation object, and a sequence of Annotators then adds information in an analysis pipeline. The resulting Annotation, containing all the analysis information added by the Annotators, can be output in XML or plain-text form. The results can be accessed in two ways: the first is to convert the Annotation object to XML and write it to a file; the second is to write code that pulls a particular type of information out of an Annotation. Stanford NLP can also be accessed easily from many other languages, including Python, Ruby, Perl, Scala, Clojure, JavaScript (Node.js), and .NET.
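Both access methods can be tried from Python by talking to a locally running CoreNLP server. The sketch below shows the first method, writing the whole Annotation to a file as XML; it assumes a CoreNLP server has already been started on localhost:9000 (for example with java -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000) and that the requests package is installed.

```python
# Minimal sketch: send raw text to a locally running Stanford CoreNLP server and
# save the resulting Annotation as XML (the first access method described above).
# Assumes the server is listening on localhost:9000.
import json
import requests

text = "Stanford NLP puts raw text into an Annotation object."

properties = {"annotators": "tokenize,ssplit,pos", "outputFormat": "xml"}
response = requests.post(
    "http://localhost:9000/",
    params={"properties": json.dumps(properties)},
    data=text.encode("utf-8"),
)
response.raise_for_status()

with open("annotation.xml", "w", encoding="utf-8") as out:
    out.write(response.text)  # the Annotation serialized as XML
```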
The execution flow of Stanford NLP consists of the following phases (a second sketch, requesting several of these annotators and reading their results, follows the list):
  •          Tokenization: the process of chopping a sequence of characters into pieces called tokens.
  •          Sentence Splitting: the ssplit annotator splits a sequence of tokens into sentences.
  •          Part-of-speech Tagging: the pos annotator labels tokens with their POS tags.
  •          Morphological Analysis: the process of deriving grammatical information about a word from its form, for example its suffix. The smallest unit in morphological analysis is the morpheme.
  •          Named Entity Recognition: the ner annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, PERCENT) and temporal (DATE, TIME, DURATION, SET) entities in a given text.
  •          Syntactic Parsing: deals with the grammatical structure of sentences, identifying phrases and the subject or object of a verb.
  •          Coreference Resolution: coreference means that multiple expressions in a sentence or document refer to the same thing. For example, in "John drove to Judy's house. He made her dinner.", both "John" and "He" refer to the same entity (John), and "Judy" and "her" refer to the same entity (Judy).
  •          Annotators: the backbone of the CoreNLP package is formed by two classes, Annotation and Annotator. Annotations are the data structures that hold the results of Annotators; they are essentially maps from keys to pieces of the analysis, such as the parse, the part-of-speech tags or the named-entity tags. Annotators do the work of tokenizing, parsing or NER-tagging sentences. Annotators and Annotations are integrated by AnnotationPipelines, which create sequences of generic Annotators; Stanford CoreNLP inherits from the AnnotationPipeline class and is customized with NLP Annotators.
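Here is the second sketch, illustrating the other access method: instead of writing out the whole Annotation, it requests several of the annotators listed above and pulls specific annotations (part-of-speech tags and named-entity labels) out of the JSON result. Again, it assumes a CoreNLP server is running on localhost:9000.

```python
# Minimal sketch: request the tokenize, ssplit, pos, lemma and ner annotators and
# read individual token annotations out of the JSON result (the second access method).
# Assumes a Stanford CoreNLP server is running on localhost:9000.
import json
import requests

text = "John drove to Judy's house. He made her dinner."

properties = {
    "annotators": "tokenize,ssplit,pos,lemma,ner",  # phases from the list above
    "outputFormat": "json",
}
response = requests.post(
    "http://localhost:9000/",
    params={"properties": json.dumps(properties)},
    data=text.encode("utf-8"),
)
doc = response.json()

for sentence in doc["sentences"]:
    for token in sentence["tokens"]:
        # Each token carries the annotations added by the pos and ner annotators.
        print(token["word"], token["pos"], token["ner"])
```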

1.4      viralheat API

The viralheat API is used to infer the sentiment of a given piece of text. A free viralheat account can handle 1,000 requests per day and accepts at most 360 characters per request.
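For completeness, a rough sketch of how such a request might look from Python is shown below. The endpoint URL, parameter names and response fields here are purely illustrative assumptions rather than the documented viralheat API, so consult the official documentation for the real values.

```python
# Illustrative sketch only: the endpoint, parameter names and response fields are
# assumptions, not the documented viralheat API. A free account allows roughly
# 1,000 requests per day and at most 360 characters of text per request.
import requests

API_KEY = "YOUR_VIRALHEAT_KEY"  # hypothetical placeholder
ENDPOINT = "https://www.viralheat.com/api/sentiment/review.json"  # assumed URL

def get_sentiment(text):
    """Send text (truncated to the 360-character limit) and return the parsed reply."""
    response = requests.get(ENDPOINT, params={"api_key": API_KEY, "text": text[:360]})
    response.raise_for_status()
    return response.json()  # assumed to contain a mood label and a probability

if __name__ == "__main__":
    print(get_sentiment("The new phone is fantastic, but the battery life could be better."))
```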


Just wait for more updates in the next post…


To succeed in your mission, you must have single-minded devotion to your goal.
