Tuesday, 14 July 2015

Natural Language Processing


It's been a long time since I wrote a post. So here comes anjusthoughts with a bang... This post is inspired by one of my colleagues.

As we all know, language is a means of communication. Languages can be broadly classified into two kinds:
  • Natural languages are the languages that people speak, such as English, Spanish, and French. These languages were not designed; they evolved naturally.
  • Formal languages are languages that are designed by people for specific applications.


Natural Language Processing




Natural Language Processing (NLP) consists of the set of tasks that computers perform to understand and generate natural language. The computer is used for the interpretation and analysis of natural language.
Natural Language Generation (NLG)
NLG is when a computer writes text of the same quality as that of a human being. It can also be termed as Text Generation.
Natural Language Understanding (NLU)
NLU attempts to understand the meaning behind a written text. NLU faces the challenge of interpreting a text without ambiguity while following the rules of the language used. So two issues must be addressed:
  • What to say - what we are going to talk about
  • How to say it - formulating grammatically correct sentences

Stages of Natural Language Processing

Natural Language Processing can be divided into three stages namely:
  1. Syntactic Analysis
  2. Semantic Analysis
  3. Contextual Representation
Now let’s look into each of these stages in detail:
  1. Syntactic Analysis
In this phase the input is checked to ensure that its syntax is correct. This is done based on a grammar. The following are two simple methods used:
  1. Context-Free Grammars (CFG)
Consider the following sentence:
“The cat eats rice.”
The parse tree for the above sentence is as follows:

The rules used to construct the tree are:
S -> NP VP
NP -> DET N | DET ADJ N
VP -> V NP
The words (terminals) are introduced by rules such as:
DET -> the
ADJ -> big | fat
  2. Top-Down Parser
The parser starts with the symbol S and attempts to rewrite it into a sequence of terminals that matches the sentence. Each CFG rule consists of:
  • LHS - a single non-terminal symbol, which can be expanded further.
  • RHS - a sequence of terminals and/or non-terminals. Terminals cannot be expanded further.
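To make the top-down idea concrete, here is a minimal sketch of a recursive-descent recognizer for the toy grammar above. It is only an illustration under some assumptions: the class name TopDownParser is made up, the N and V words (cat, rice, eats) are taken from the example sentence, and since "rice" appears without a determiner the sketch also assumes a bare-noun rule NP -> N, which is not in the rules listed above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Minimal recursive-descent (top-down) recognizer for the toy grammar.
class TopDownParser {
    // DET and ADJ come from the post; the N and V entries are assumptions.
    private static final Map<String, List<String>> LEXICON = Map.of(
            "DET", List.of("the"),
            "ADJ", List.of("big", "fat"),
            "N", List.of("cat", "rice"),
            "V", List.of("eats"));

    private final List<String> tokens;
    private int pos = 0;

    private TopDownParser(List<String> tokens) {
        this.tokens = tokens;
    }

    // Returns true if the whole sentence can be derived from S.
    static boolean accepts(String sentence) {
        String cleaned = sentence.toLowerCase().replaceAll("[^a-z ]", " ").trim();
        if (cleaned.isEmpty()) {
            return false;
        }
        TopDownParser parser = new TopDownParser(Arrays.asList(cleaned.split("\\s+")));
        return parser.sentence() && parser.pos == parser.tokens.size();
    }

    // S -> NP VP
    private boolean sentence() {
        return nounPhrase() && verbPhrase();
    }

    // NP -> DET ADJ N | DET N | N   (the last alternative is assumed)
    private boolean nounPhrase() {
        int start = pos;
        if (word("DET")) {
            if (word("ADJ") && word("N")) {
                return true;
            }
            pos = start + 1; // backtrack to just after the determiner
            if (word("N")) {
                return true;
            }
        }
        pos = start; // backtrack completely and try a bare noun
        return word("N");
    }

    // VP -> V NP
    private boolean verbPhrase() {
        return word("V") && nounPhrase();
    }

    // Consumes the current token if it belongs to the given category.
    private boolean word(String category) {
        if (pos < tokens.size() && LEXICON.get(category).contains(tokens.get(pos))) {
            pos++;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(accepts("The cat eats rice."));
        System.out.println(accepts("Eats the cat"));
    }
}
```

Each method corresponds to one non-terminal, so the call chain sentence() -> nounPhrase() -> word() mirrors how S is rewritten step by step until only terminals remain.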
  2. Semantic Analysis
It involves the formulation of a logical representation of the sentence. The meaning of the sentence must be extracted for such a representation.
  3. Contextual Representation
As its name implies, the sentence is analysed based on its context. The logical representation is converted into a knowledge representation.

More updates about Natural Language Processing in the Next Post....


I am Thankful to all those who said NO. Because of them I did it myself.


Friday, 26 June 2015

Pentaho Data Integration

Pentaho Data Integration or Kettle, consists of a core data integration (ETL) engine, and GUI applications that allow the user to define data integration jobs and transformations.
The name Kettle evolved from "KDE ETTL Environment" to "Kettle ETTL Environment" after the plan of developing the software on top of KDE (K Desktop Environment) was dropped. This tool was open sourced in December 2005 and acquired by Pentaho early in 2006. Matt Casters is the lead developer of Kettle.
ETTL stands for:
  • Data extraction from source databases
  • Transport of the data
  • Data transformation
  • Loading of data into a data warehouse

Kettle

Kettle is a set of tools and applications that allow data manipulation across multiple sources. The main components of Pentaho Data Integration are:
  • Spoon - a graphical tool that makes ETTL transformations easy to design.
  • Pan - an application dedicated to running data transformations designed in Spoon.
  • Chef - a tool to create jobs that automate the database update process.
  • Kitchen - an application that executes jobs in batch mode, usually on a schedule, which makes it easy to start and control the ETL processing.
  • Carte - a web server that allows remote monitoring of running Pentaho Data Integration ETL processes through a web browser.

Downloading Pentaho Data Integration


Steps for installation (in Windows)

  1. Unzip the downloaded archive.
  2. A folder named data-integration is created. In that folder, open the spoon.bat file (just double click it).

Steps for installation (in Linux)

    Run the spoon.sh file from the data-integration folder.

Connecting to Progress database using Pentaho

  1. Add the jars base.jar, openedge.jar, pool.jar, spy.jar and util.jar to the \Pentaho-Kettle\data-integration\libext\JDBC folder.
  2. Double click Transformations in the View tab. A new transformation is created.
  3. To change the name of the transformation, right click the newly created transformation, i.e. “Transformation 1”. Select Settings and edit the transformation name.
  4. To connect to the Progress database, right click Database connections and select New.
  5. Select General and add a connection name (e.g. test). Select Generic database as the connection type.
  6. Custom connection URL: jdbc:datadirect:openedge://hostName:50590;databaseName=dbName;defaultSchema=PUB
  7. Custom driver class name: com.ddtek.jdbc.openedge.OpenEdgeDriver
  8. Username: userName
  9. Password: passWord
  10. Click Test. It should report that the connection was successful. Click OK.
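The same connection settings can be tried outside Spoon with plain JDBC. The following is only a sketch: the class name OpenEdgeConnectionSketch and the helper buildUrl are made up for illustration, hostName, dbName, userName and passWord are the same placeholders as in the steps above, and the driver jars from step 1 are assumed to be on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

class OpenEdgeConnectionSketch {
    // Builds a URL in the same shape as the "Custom connection URL" above.
    static String buildUrl(String host, int port, String database, String schema) {
        return "jdbc:datadirect:openedge://" + host + ":" + port
                + ";databaseName=" + database + ";defaultSchema=" + schema;
    }

    public static void main(String[] args) {
        // Placeholder values; replace them with the real host, database and credentials.
        String url = buildUrl("hostName", 50590, "dbName", "PUB");
        try {
            // Same driver class as in the "Custom driver class name" field.
            Class.forName("com.ddtek.jdbc.openedge.OpenEdgeDriver");
            try (Connection connection = DriverManager.getConnection(url, "userName", "passWord")) {
                System.out.println("Connection successful: " + !connection.isClosed());
            }
        } catch (ClassNotFoundException | SQLException e) {
            System.out.println("Connection failed: " + e.getMessage());
        }
    }
}
```

If this small program cannot connect, the Test button in Spoon will most likely fail for the same reason, so it is a quick way to check the URL and the jars.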

Exporting tables as csv files

  1. Change to the Design tab.
  2. Select Table input under Input (drag and drop Table input onto the Transformation 1 window).
  3. Select Text file output under Output (drag and drop Text file output onto the Transformation 1 window).
  4. Right click the Text file output icon in the window and select “Edit step”. In the window that opens, either type a file name or browse for a location to save the file in the “File” tab. In the “Content” tab, specify the separator; to export as a CSV file, set the separator to , (comma). Click OK.
  5. Click Table input, hold Shift, and drag with the left mouse button to Text file output. This creates a hop.
  6. Right click Table input and select Edit. Specify the SQL query to be executed. Click OK.
  7. Click Run (the green triangle). We can see the execution.
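To see what the separator setting in step 4 means for the output file, here is a small sketch of how one row of query results could be turned into one line of a CSV file. This is not Kettle's own code; the class CsvLineSketch and the quoting rule (wrap a field in double quotes when it contains the separator, a quote or a line break) are assumptions made for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class CsvLineSketch {
    // Joins the fields of one row using the given separator.
    static String toCsvLine(List<String> fields, String separator) {
        return fields.stream()
                .map(field -> quoteIfNeeded(field, separator))
                .collect(Collectors.joining(separator));
    }

    // Assumed quoting rule: quote fields that would otherwise break the row apart.
    private static String quoteIfNeeded(String field, String separator) {
        boolean needsQuotes = field.contains(separator)
                || field.contains("\"")
                || field.contains("\n");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        System.out.println(toCsvLine(List.of("1", "Tom", "Weekend Trip"), ","));
        System.out.println(toCsvLine(List.of("2", "Jerry", "Lunch, then movie"), ","));
    }
}
```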
The importance of water is not known until the stream runs dry!!

Wednesday, 24 June 2015

Tips and Tricks- 7

1. Problem: Apache Tomcat not showing in Eclipse server runtime environments

Solution: 
1. Help > Install New Software...
2. Select "Eclipse Web Tools Platform Repository (http://download.eclipse.org/webtools/updates)" from the "Work with" drop-down.
3. Select "Web Tools Platform (WTP)" and "Project Provided Components".
Complete all the installation steps and restart Eclipse. 
Source: http://stackoverflow.com/questions/2000078/apache-tomcat-not-showing-in-eclipse-server-runtime-environments



2. Problem: Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/juli/logging/LogFactory. Server Tomcat v6.0 Server at localhost failed to start

Solution: In Eclipse,
1. Open the "Server" tab.
2. Double click on the "Tomcat6" entry to see the configuration.
3. Then click on the "Open launch configuration" link in the "General information" block.
4. In the dialog, select the "Classpath" tab.
5. Click the "Add external jar" button.
6. Select the file "/usr/share/tomcat6/bin/tomcat-juli.jar"
7. Close the dialog.
8. Start tomcat 6 from Eclipse.
Source: http://stackoverflow.com/questions/1392383/server-tomcat-v6-0-server-at-localhost-failed-to-start


3. Change Port of Tomcat

Open tomcat/conf/server.xml and modify the following ports:
<!--Shut down Port-->
<Server port="8005" shutdown="SHUTDOWN">
<!--Java HTTP Connector:-->
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" />
<!--Java AJP Connector-->
<Connector port="8009" protocol="AJP/1.3" redirectPort="8443" />
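
Before picking new port numbers, it can help to check which ports are already taken on the machine. The following is a minimal sketch in Java; the class name PortCheckSketch and the method isPortFree are made up for illustration, and the check simply tries to open a server socket on the port.

```java
import java.io.IOException;
import java.net.ServerSocket;

class PortCheckSketch {
    // A port is treated as free if we can bind a server socket to it.
    static boolean isPortFree(int port) {
        try (ServerSocket socket = new ServerSocket(port)) {
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Default Tomcat ports from server.xml: shutdown, HTTP and AJP.
        int[] ports = {8005, 8080, 8009};
        for (int port : ports) {
            System.out.println(port + " free: " + isPortFree(port));
        }
    }
}
```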


4. Searching for the process ID of Eclipse in Linux

sudo ps -ax | grep eclipse



Strong Walls Shake, But Never Collapse!!

Friday, 19 June 2015

More About Apache Tika


It's been a long time since I wrote. Let's see Apache Tika in detail. You might be wondering about the applications of this. Apache Tika is what Elasticsearch's attachment plugin uses to extract text from documents.

Program to parse the Google Web page

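import java.io.InputStream;
import java.net.URL;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.html.HtmlParser;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.sax.LinkContentHandler;
import org.apache.tika.sax.TeeContentHandler;
import org.apache.tika.sax.ToHTMLContentHandler;
import org.xml.sax.ContentHandler;

public class ParseWebPage {
    public static void main(String[] args) throws Exception {
        URL url = new URL("https://www.google.co.in");
        // One handler each for links, plain text and HTML; the tee handler feeds all three
        LinkContentHandler linkHandler = new LinkContentHandler();
        ContentHandler textHandler = new BodyContentHandler();
        ToHTMLContentHandler toHTMLHandler = new ToHTMLContentHandler();
        TeeContentHandler teeHandler = new TeeContentHandler(linkHandler, textHandler, toHTMLHandler);
        Metadata metadata = new Metadata();
        ParseContext parseContext = new ParseContext();
        // Close the stream once parsing is done
        try (InputStream input = url.openStream()) {
            new HtmlParser().parse(input, teeHandler, metadata, parseContext);
        }
        System.out.println("TITLE:\n" + metadata.get("title").replaceAll("\\s+", " ").trim());
        //System.out.println("LINKS:\n" + linkHandler.getLinks());
        System.out.println("TEXT:\n" + textHandler.toString().replaceAll("\\s+", " ").trim());
        //System.out.println("HTML:\n" + toHTMLHandler.toString().replaceAll("\\s+", " ").trim());
    }
}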

Result:

TITLE:
Google
TEXT:
Search Images Maps Play YouTube News Gmail Drive More » Web History | Settings | Sign in × A faster way to browse the web Install Google Chrome India   Advanced searchLanguage tools Google.co.in offered in: हिन्दी বাংলা తెలుగు मराठी தமிழ் ગુજરાતી ಕನ್ನಡ മലയാളം ਪੰਜਾਬੀ Advertising ProgramsBusiness Solutions+GoogleAbout GoogleGoogle.com © 2015 - Privacy - Terms

Program to parse an XML file

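import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.parser.xml.XMLParser;
import org.apache.tika.sax.BodyContentHandler;
import org.xml.sax.SAXException;
public class ParseXmlFile {
    public static void main(String[] args) throws IOException, SAXException, TikaException {
        // Handler that collects the text content of the document
        BodyContentHandler handler = new BodyContentHandler();
        Metadata metadata = new Metadata();
        ParseContext pcontext = new ParseContext();
        try (FileInputStream inputstream = new FileInputStream(new File("sample.xml"))) {
            new XMLParser().parse(inputstream, handler, metadata, pcontext);
        }
        System.out.println("Contents of the document:" + handler.toString());
    }
}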

Sample XML file:

<note>
<to>Tom</to>
<from>Jerry</from>
<heading>Reminder</heading>
<body>Weekend Trip..</body>
</note>

Result:

Contents of the document:
Tom
Jerry
Reminder
Weekend Trip..


Apache Tika Server


Download the tika-server.jar from the Tika project site. Start the server using

java -jar tika-server-x.x.jar -h 0.0.0.0
The -h 0.0.0.0 (host) option makes the server listen on all network interfaces; without it, the server only accepts requests from localhost. You can also add the -p option to change the port, which defaults to 9998. Once the server has started, we can simply access it using a browser. It will list all available endpoints.
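
Once the server is running, a document can be sent to it over HTTP. Below is a minimal sketch of a Java client, assuming Java 11 or later (for java.net.http), a server on localhost with the default port 9998, and the /tika endpoint accepting a PUT with the document as the request body. The class TikaServerClientSketch and the method buildExtractRequest are made-up names, and the endpoint list shown in the browser is the place to confirm what your server version offers.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

class TikaServerClientSketch {
    // Builds a PUT request to the /tika endpoint that asks for plain text back.
    static HttpRequest buildExtractRequest(String baseUrl, byte[] document) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/tika"))
                .header("Accept", "text/plain")
                .PUT(HttpRequest.BodyPublishers.ofByteArray(document))
                .build();
    }

    public static void main(String[] args) {
        // A tiny XML document used as sample input.
        byte[] document = "<note><to>Tom</to><from>Jerry</from></note>"
                .getBytes(StandardCharsets.UTF_8);
        HttpRequest request = buildExtractRequest("http://localhost:9998", document);
        try {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("Status: " + response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            System.out.println("Is the Tika server running? " + e.getMessage());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```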


Tika Server Versions

The Apache Tika Server is available in two versions, namely:

  • tika-app.jar 
It has the --server --port 9998 options to start a simple server. It provides text extraction and returns the content as HTML.
  • tika-server.jar 
It is a separate component using JAX-RS. It acts as a RESTful service.

That's all about Tika. I'll come up with another interesting technology next time....



Silence is True Wisdom's Best Reply!!








Tuesday, 9 June 2015

Tips and Tricks - 6

1. Display all segments (tables, indexes, etc.) of the current user in Oracle

select * from user_segments;

2. Display the total size (in GB) of all segments of the current user in Oracle

select sum(bytes)/(1024*1024*1024) from user_segments;

3. Change String to lowercase in Eclipse

Ctrl+Shift+Y

4. Change String to uppercase in Eclipse

Ctrl+Shift+X

5. pg_restore: [archiver] unsupported version (1.12) in file header - error in PostgreSQL while importing a dump file

Solution: 
Using the newer pg_restore, convert the binary dump into a plain SQL file via the -f parameter.

$ pg_restore --version 
pg_restore (PostgreSQL) 9.0.3

$ pg_restore -Fc dump.backup -f dump.sql

Copy dump.sql back to the 8.3 server and load it with psql:

$ psql database -f dump.sql 

6. List all databases in PostgreSQL

\list

7. List tables in PostgreSQL

\dt+

8. Grant all privileges on a particular database to a user in PostgreSQL

GRANT ALL ON DATABASE databaseName TO userName;

9. Import a schema in PostgreSQL

psql -h hostName -U userName -d databaseName -f backupFile.sql

10. Query to see active processes in PostgreSQL

select * from pg_stat_activity;

11. Find all tables with a specific column name in PostgreSQL

select table_name from information_schema.columns where column_name = 'column_name';

12. Create a user in PostgreSQL

CREATE USER userName WITH PASSWORD 'Password';

13. Append content to MANIFEST.MF in Eclipse

1) Add the following to the pom.xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-jar-plugin</artifactId>
    <version>2.3.2</version>
    <configuration>
        <archive>
            <manifestFile>src/MANIFEST.MF</manifestFile>
        </archive>
    </configuration>
</plugin>
2) Create a file named MANIFEST.MF in src and add the content we need to append to MANIFEST.MF
3) Save the pom and check the project/target/classes/META-INF/MANIFEST.MF to see the changed content.



Think Twice Before Reserving a Space in the Heart for people who do not Wish to Stay!!

Saturday, 6 June 2015

Immutable Classes in Java

Good examples of immutable classes are String, Boolean, Byte, Short, Integer, Long, Float and Double. We can create an immutable class by declaring the class final and making all of its data members final.

Example:

public final class Student {
    // final: assigned once in the constructor and never changed
    private final int studentId;

    public Student(int studentId) {
        this.studentId = studentId;
    }

    public int getStudentId() {
        return studentId;
    }
}
The above class is immutable because:
  • The instance variable of the class is final.
  • The class is final, so we cannot create a subclass.
  • There are no setter methods, i.e. we have no option to change the value of the instance variable.
Why immutable classes?
  1. Simplicity - each object is in one state only
  2. Thread safe - because the state cannot be changed, no synchronization is required
  3. Robustness - writing in an immutable style can lead to more robust code
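Final fields alone are not enough when a field refers to a mutable object such as a list. Here is a minimal sketch of that case; the class StudentRecord and its courses field are made up for illustration and are not part of the example above. The constructor takes a defensive copy so that later changes to the caller's list do not leak into the object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class StudentRecord {
    private final int studentId;
    private final List<String> courses;

    StudentRecord(int studentId, List<String> courses) {
        this.studentId = studentId;
        // Defensive copy wrapped in a read-only view: the caller keeps no way in.
        this.courses = Collections.unmodifiableList(new ArrayList<>(courses));
    }

    int getStudentId() {
        return studentId;
    }

    List<String> getCourses() {
        return courses; // read-only, so it is safe to hand out
    }

    public static void main(String[] args) {
        List<String> input = new ArrayList<>(List.of("Maths"));
        StudentRecord student = new StudentRecord(1, input);
        input.add("Physics"); // changes only the caller's list
        System.out.println(student.getCourses()); // prints [Maths]
    }
}
```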
Source:
http://www.javatpoint.com/how-to-create-immutable-class

Every Accomplishment starts with a decision to Try.....


Thursday, 4 June 2015

Tips and Tricks - 5

1. Ant (Another Neat Tool) installation location

eclipse/plugins/org.apache.ant_1.9.2.v201404171502

2. Using for loop in ant

Copy the ant-contrib.jar to eclipse/plugins/org.apache.ant_1.9.2.v201404171502/lib
Add the same jar under Window -> Preferences -> Ant -> Runtime -> Ant Home Entries -> Add External JARs (specify the location of ant-contrib.jar)
<project name="projectName" xmlns:ac="antlib:net.sf.antcontrib">
  <target name="forLoopExample">
    <ac:for list="1,2,3,4" param="val">
      <sequential>
        <echo message="val = @{val}"/>
      </sequential>
    </ac:for>
  </target>
</project>

Result:
Buildfile: build.xml
    [echo] val = 1
    [echo] val = 2
    [echo] val = 3
    [echo] val = 4

 BUILD SUCCESSFUL
 Total time: 2 seconds

3. Right click in Eclipse is not working

Restart Eclipse with the "-clean" option.
On Linux: open a terminal, go to the Eclipse installation location and type
./eclipse -clean

On Windows: open a command prompt (click Start, Run... and enter "cmd"), go to the directory where you have Eclipse installed with "cd", and then run "eclipse -clean".

4. Hibernate tools Plugin for Eclipse Installation

For Eclipse 3.6, the URL is “http://download.jboss.org/jbosstools/updates/stable/helios/”.
In the Eclipse IDE menu bar, select “Help” >> “Install New Software…” and enter the Eclipse update site URL.
Type “hibernate” in the filter box to list the necessary components for Hibernate Tools. Select all the “Hibernate Tools” components and click Next to download.
After the download is completed, restart Eclipse for the change to take effect.
If Hibernate Tools is installed properly, you will be able to see the “Hibernate Perspective” in “Window” >> “Open Perspective” >> “Other…”.

5. Hibernate Tools

1. Hibernate Perspective
Open your “Hibernate Perspective”. In the Eclipse IDE, select “Window” >> “Open Perspective” >> “Other…” and choose “Hibernate”.
2. New Hibernate Configuration
In Hibernate Perspective, right click and select “Add Configuration…”

3. Set Classpath of Hibernate

You need to add the database driver to your classpath

1) Download the driver for your database

2) Point to it by clicking 'Add external jars' and selecting it from the place you downloaded it to

4. Generating Hibernate configuration file
In “Edit Configuration” dialog box,

In “Project” box, click on the “Browse..” button to select your project.
In “Database Connection” box, click “New..” button to create your database settings.
We need to specify the driver and connection details there
In “Configuration File” box, click “Setup” button to create a new or use existing “Hibernate configuration file”, hibernate.cfg.xml.
Source: http://www.mkyong.com/hibernate/how-to-install-hibernate-tools-in-eclipse-ide/

5. Creating a Hibernate Mapping File
Hibernate mapping files are used to specify how your objects relate to database tables.
To create basic mappings for properties and associations, i.e. to generate .hbm.xml files, Hibernate Tools provides a basic wizard which you can display by selecting File → New → Hibernate XML mapping file.
At first you will be asked to select a package or multiple individual classes to map. It is also possible to create an empty file: do not select any packages or classes and an empty .hbm file will be created in the specified location.
Using the depth control option you can define the dependency depth used when choosing classes.
The next wizard page lists the mappings to be generated. 
The wizard page after that displays a preview of the generated .hbm files.
Clicking the Finish button creates the files.
Source: https://access.redhat.com/documentation/en-US/Red_Hat_JBoss_Developer_Studio/7.0/html/Hibernate_Tools_Reference_Guide/map_file_wizard.html

6. Hibernate Configuration files

The basic structure of a Java application using Hibernate consists of the following files:

  • hibernate.cfg.xml - contains the database connection details
  • modelClass.hbm.xml - describes the mapping between the POJO and the corresponding table in the database
  • hibernate.reveng.xml - the reverse engineering file for Hibernate

7. Hibernate types

A Hibernate type is a bridge between an SQL type and a Java primitive/Object type.

These are the types Hibernate supports by default:

Hibernate type (org.hibernate.type) | JDBC type | Java type
StringType | VARCHAR | String
MaterializedClobType | CLOB | String
TextType | LONGVARCHAR | String
CharacterType | CHAR | char or Character
BooleanType | BIT | boolean or Boolean
NumericBooleanType | INTEGER (e.g. 0 = false and 1 = true) | boolean or Boolean
YesNoType | CHAR (e.g. ‘N’ or ‘n’ = false and ‘Y’ or ‘y’ = true) | boolean or Boolean
TrueFalseType | CHAR (e.g. ‘F’ or ‘f’ = false and ‘T’ or ‘t’ = true) | boolean or Boolean
ByteType | TINYINT | byte or Byte
ShortType | SMALLINT | short or Short
IntegerType | INTEGER | int or Integer
LongType | BIGINT | long or Long
FloatType | FLOAT | float or Float
DoubleType | DOUBLE | double or Double
BigIntegerType | NUMERIC | BigInteger
BigDecimalType | NUMERIC | BigDecimal
TimestampType | TIMESTAMP | java.sql.Timestamp or java.util.Date
TimeType | TIME | java.sql.Time
DateType | DATE | java.sql.Date
CalendarType | TIMESTAMP | java.util.Calendar or java.util.GregorianCalendar
CalendarDateType | DATE | java.util.Calendar or java.util.GregorianCalendar
CurrencyType | VARCHAR | java.util.Currency
LocaleType | VARCHAR | java.util.Locale
TimeZoneType | VARCHAR | java.util.TimeZone
UrlType | VARCHAR | java.net.URL
ClassType | VARCHAR | java.lang.Class
BlobType | BLOB | java.sql.Blob
ClobType | CLOB | java.sql.Clob
BinaryType | VARBINARY | byte[] or Byte[]
MaterializedBlobType | BLOB | byte[] or Byte[]
ImageType | LONGVARBINARY | byte[] or Byte[]
CharArrayType | VARCHAR | char[] or Character[]
UUIDBinaryType | BINARY | java.util.UUID
UUIDCharType | CHAR or VARCHAR | java.util.UUID
PostgresUUIDType | PostgreSQL UUID | java.util.UUID
SerializableType | VARBINARY | Serializable
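
To make the "bridge" idea concrete, here is a small sketch of the conversions that the three boolean-style rows in the list describe. It is plain Java written for illustration only, not Hibernate's own implementation; the class BooleanColumnSketch and its method names are made up.

```java
class BooleanColumnSketch {
    // YesNoType style: 'Y' or 'y' = true, 'N' or 'n' = false
    static boolean fromYesNo(char value) {
        return decode(value, 'Y', 'N');
    }

    // TrueFalseType style: 'T' or 't' = true, 'F' or 'f' = false
    static boolean fromTrueFalse(char value) {
        return decode(value, 'T', 'F');
    }

    // NumericBooleanType style: 1 = true, 0 = false
    static boolean fromNumeric(int value) {
        if (value == 1) {
            return true;
        }
        if (value == 0) {
            return false;
        }
        throw new IllegalArgumentException("Expected 0 or 1 but got " + value);
    }

    // Shared case-insensitive decoding for the two CHAR based variants.
    private static boolean decode(char value, char trueChar, char falseChar) {
        char upper = Character.toUpperCase(value);
        if (upper == trueChar) {
            return true;
        }
        if (upper == falseChar) {
            return false;
        }
        throw new IllegalArgumentException("Unexpected value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(fromYesNo('y'));
        System.out.println(fromTrueFalse('F'));
        System.out.println(fromNumeric(1));
    }
}
```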


8. org.hibernate.engine.jndi.JndiException: Error parsing JNDI name

Solution: Remove the name attribute from <session-factory> in hibernate.cfg.xml

9. Creating Jar using Ant

<target name="CreateJar" description="Create Jar file">
  <jar jarfile="jarName.jar" basedir="src/"/>
</target>



You don’t write because you want to say something, you write because you have something to say.