1.1 AlchemyAPI
AlchemyAPI combines linguistic and statistical analysis, and its models were developed using tweet data. The linguistic analysis identifies phrases and how these phrases combine to form sentences, while the statistical analysis applies mathematical techniques to the text. AlchemyAPI has more than 30,000 users. Its Sentiment Analysis APIs can compute document-level sentiment, user-specified sentiment targeting, entity-level sentiment, emoticon sentiment and keyword-level sentiment. AlchemyAPI can be used easily from any major programming language: Java, C/C++, C#, Perl, PHP, Python, Ruby, JavaScript and Android OS. It exposes its text-analysis algorithms through a REST interface and can process content as plain text or HTML; you can pass URLs for web-accessible content or raw HTML for private web documents. Most of the functions work with eight languages: English, German, French, Italian, Portuguese, Russian, Spanish and Swedish. AlchemyAPI is a paid service, but it also offers a free API key with 1000 calls per day to get started.
1.2 SentiWordNet
SentiWordNet is the result of automatically annotating all synsets in WordNet with sentiment scores. Four versions are available: SentiWordNet 1.0, 1.1, 2.0 and 3.0. SentiWordNet 1.0 was based on a bag-of-words approach, while SentiWordNet 3.0 is the most widely used version. It is freely distributed for noncommercial use, and licenses are available for commercial applications. In SentiWordNet the degree of positivity or negativity ranges from 0 to 1. SentiWordNet was developed by ranking the synsets according to their part of speech (PoS). The parts of speech represented in SentiWordNet are adjective, noun, adverb and verb, denoted 'a', 'n', 'r' and 'v' respectively. The database has five columns: the part of speech, the offset, the positive score, the negative score and the synset terms, which list all terms belonging to a particular synset. The offset is a numerical ID that, combined with a part of speech, uniquely identifies a synset. The SentiWordNet lexical database was formulated using a movie review dataset.
Field | Description
POS | Part of speech linked with the synset. This can take four possible values: a (adjective), v (verb), n (noun), r (adverb).
Offset | Numerical ID which, together with the part of speech, uniquely identifies a synset in the database.
PosScore | Positive score for this synset; a numerical value ranging from 0 to 1.
NegScore | Negative score for this synset; a numerical value ranging from 0 to 1.
SynsetTerms | List of all terms included in this synset.
Table 2: SentiWordNet database structure
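To make the structure in Table 2 concrete, the sketch below reads a local copy of the SentiWordNet file and indexes each sense by term and part of speech. The file name, the tab-separated layout and the '#' comment lines are assumptions based on the SentiWordNet 3.0 distribution, so adjust them to your copy.

```python
# Minimal sketch: index a local SentiWordNet file using the columns of Table 2
# (POS, offset, PosScore, NegScore, SynsetTerms). The file name, tab-separated
# layout and '#' comment lines are assumptions based on the SentiWordNet 3.0
# distribution.
def load_sentiwordnet(path="SentiWordNet_3.0.0.txt"):
    scores = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            if not line.strip() or line.startswith("#"):
                continue                      # skip header and comment lines
            pos, offset, pos_score, neg_score, terms = line.split("\t")[:5]
            for term in terms.split():        # terms look like "able#1"
                scores[(term, pos)] = (float(pos_score), float(neg_score))
    return scores

if __name__ == "__main__":
    swn = load_sentiwordnet()
    p, n = swn.get(("able#1", "a"), (0.0, 0.0))
    # Objectivity is the remainder of the two scores: 1 - (positive + negative).
    print(f"able#1: pos={p}, neg={n}, obj={1 - p - n}")
```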
POS | Offset | PosScore | NegScore | SynsetTerms
a | 1740 | 0.125 | 0 | able#1
a | 2098 | 0 | 0.75 | unable#1
n | 388959 | 0 | 0 | divarication#1
n | 389043 | 0 | 0 | fibrillation#2
r | 76948 | 0.625 | 0 | brazenly#1
r | 77042 | 0.125 | 0.5 | brilliantly#2
v | 1827745 | 0 | 0 | slobber_over#1
v | 1827858 | 0.625 | 0.125 | look_up_to#1
Table 3: Sentiment scores associated with SentiWordNet entries
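The entries in Table 3 can also be looked up programmatically. A small sketch using NLTK's SentiWordNet corpus reader is shown below; the sense name 'able.a.01' assumes the first adjective sense of "able" in WordNet, which corresponds to the able#1 row above.

```python
# Looking up a Table 3 entry through NLTK's SentiWordNet reader.
# Requires: pip install nltk, then nltk.download('wordnet') and
# nltk.download('sentiwordnet') once.
from nltk.corpus import sentiwordnet as swn

# 'able.a.01' assumes the first adjective sense of "able" in WordNet.
entry = swn.senti_synset("able.a.01")
print(entry.pos_score(), entry.neg_score(), entry.obj_score())
# Expected to mirror Table 3: PosScore 0.125, NegScore 0, objectivity 0.875.
```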
1.3 Stanford NLP
Stanford NLP is a suite of Java NLP tools developed by Stanford University. It consists of a stack of products including Stanford CoreNLP, the Stanford Parser, the Stanford POS Tagger, the Stanford Named Entity Recognizer, the Stanford Word Segmenter and others. The movie review dataset was used to train the sentiment model in Stanford NLP. In Stanford NLP the raw text is put into an Annotation object, and a sequence of Annotators then adds information in an analysis pipeline. The resulting Annotation, containing all the analysis information added by the Annotators, can be output in XML or plain text form. The results of Stanford NLP can be accessed in two ways: the first is to convert the Annotation object to XML and write it to a file; the second is to write code that extracts a particular type of information from an Annotation. Stanford NLP can be accessed easily from many languages, including Python, Ruby, Perl, Scala, Clojure, JavaScript (Node.js), and .NET.
The execution flow of Stanford NLP consists of the following phases (a minimal usage sketch follows the list):
- Tokenization: the process of chopping a sequence of characters into pieces called tokens.
- Sentence Splitting: the ssplit annotator splits a sequence of tokens into sentences.
- Part-of-speech Tagging: the pos annotator labels tokens with their POS tags.
- Morphological Analysis: the process of deriving grammatical information about a word from its form, for example its suffix. The smallest unit in morphological analysis is the morpheme.
- Named Entity Recognition: the ner annotator recognizes named (PERSON, LOCATION, ORGANIZATION, MISC), numerical (MONEY, NUMBER, PERCENT) and temporal (DATE, TIME, DURATION, SET) entities in a given text.
- Syntactic Parsing: deals with the grammatical structure of sentences, identifying phrases and the subject or object of a verb.
- Coreference Resolution: coreference means that multiple expressions in a sentence or document refer to the same entity. For example, in "John drove to Judy's house. He made her dinner.", both "John" and "He" refer to the same entity (John), and "Judy" and "her" refer to the same entity (Judy).
- Annotators: the backbone of the CoreNLP package is formed by two classes, Annotation and Annotator. Annotations are the data structures that hold the results of Annotators; they are essentially maps from keys to pieces of the annotation, such as the parse, the part-of-speech tags or the named entity tags. Annotators are the operations that tokenize, parse or NER-tag sentences. Annotators and Annotations are tied together by AnnotationPipelines, which create sequences of generic Annotators. Stanford CoreNLP inherits from the AnnotationPipeline class and is customized with NLP Annotators.
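As a minimal usage sketch of the pipeline above, the snippet below sends the coreference example sentence to a locally running Stanford CoreNLP server and reads back the per-token POS and NER tags. The server URL, port and JSON field names are assumptions based on recent CoreNLP releases; the Java Annotation/Annotator API described above remains the canonical interface.

```python
# Minimal sketch: run the phases above against a local Stanford CoreNLP server.
# Assumes the server has been started first, e.g.:
#   java -mx4g -cp "*" edu.stanford.nlp.pipeline.StanfordCoreNLPServer -port 9000
import json
import requests

properties = {
    "annotators": "tokenize,ssplit,pos,lemma,ner,parse,coref",
    "outputFormat": "json",
}
text = "John drove to Judy's house. He made her dinner."

response = requests.post(
    "http://localhost:9000/",
    params={"properties": json.dumps(properties)},
    data=text.encode("utf-8"),
)
annotation = response.json()

# Each sentence carries the per-annotator results: tokens with POS and NER
# tags plus the constituency parse; coreference chains sit at document level.
for sentence in annotation["sentences"]:
    print([(tok["word"], tok["pos"], tok["ner"]) for tok in sentence["tokens"]])
print("coreference chains:", len(annotation.get("corefs", {})))
```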
1.4 viralheat API
The viralheat API is used to infer the sentiment of a given piece of text. A free viralheat API account can handle 1000 requests per day and accepts only 360 characters per request.
Just wait for more updates in the next post…