Tuesday, 19 May 2015

Language Detection API




Recently I came across a requirement to identify the language of a given text. I started with the Language Detection API, so let's look at it in detail.
The Language Detection API is a web service that accepts text and returns the detected language code along with a confidence score. It currently detects 160 languages.

Available plans

Plan name    Requests/day    Data usage/day    Price
Free         5,000           1 MB              Free
Basic        100,000         20 MB             $5/month
Plus         1M              200 MB            $15/month
Premium      10M             2 GB              $40/month
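
To make the table easier to apply, here is a minimal sketch that picks the cheapest plan covering an expected daily load. The class and method names (PlanPicker, cheapestFor) are made up for illustration, and the byte limits assume 1 MB = 1,048,576 bytes and 2 GB = 2,048 MB, which the plan table itself does not spell out.

```java
import java.util.Optional;

public class PlanPicker {

    // Plans copied from the table above, in ascending price order.
    // Assumption: 1 MB = 1,048,576 bytes and 2 GB = 2,048 MB.
    enum Plan {
        FREE(5_000L, 1L * 1_048_576L, 0),
        BASIC(100_000L, 20L * 1_048_576L, 5),
        PLUS(1_000_000L, 200L * 1_048_576L, 15),
        PREMIUM(10_000_000L, 2_048L * 1_048_576L, 40);

        final long requestsPerDay;
        final long bytesPerDay;
        final int usdPerMonth;

        Plan(long requestsPerDay, long bytesPerDay, int usdPerMonth) {
            this.requestsPerDay = requestsPerDay;
            this.bytesPerDay = bytesPerDay;
            this.usdPerMonth = usdPerMonth;
        }
    }

    // Returns the first (cheapest) plan whose daily limits cover the expected load, if any.
    static Optional<Plan> cheapestFor(long requestsPerDay, long bytesPerDay) {
        for (Plan plan : Plan.values()) {
            if (requestsPerDay <= plan.requestsPerDay && bytesPerDay <= plan.bytesPerDay) {
                return Optional.of(plan);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // 50,000 requests and about 5 MB per day exceed Free but fit Basic.
        System.out.println(cheapestFor(50_000, 5_000_000)); // prints Optional[BASIC]
    }
}
```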

API Key

To use the Language Detection API we need an API key, which can be obtained from:
https://detectlanguage.com/users/sign_up

API Clients

The Language Detection web service provides API clients for the following programming languages:
  • Ruby
  • Java
  • Python
  • PHP
  • C# (.NET)


JSON API Usage

  1. Basic detection
Submit an HTTP request to http://ws.detectlanguage.com/0.2/detect with the following parameters:
q - your text, mandatory
key - your API key, mandatory
The response is:
{"data":{"detections":[{"language":"es","isReliable":true,"confidence":10.24}]}}
Interpretation of results:
Confidence depends on how much text we pass and how well it is identified. The more text we pass, the higher the confidence value will be. It is not bounded to a fixed range and can be higher than 100.
Reliability is not directly linked to confidence. If the text contains words in different languages, isReliable: true means that the first detected language is significantly more probable than the second one. When only one language is detected, isReliable: false means that the confidence is very low.
Language is the code of the detected language. The API returns the code 'xxx' for an unknown language.
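
The interpretation rules above can be sketched in a few lines of Java. This is only an illustration: the class and method names (DetectionSummary, summarize) are my own, and the regular expression is a stand-in that handles just the flat response shape shown above. A real JSON library would be the safer choice in production code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetectionSummary {

    // Matches one detection object exactly as it appears in the sample response.
    private static final Pattern DETECTION = Pattern.compile(
            "\\{\"language\":\"([^\"]+)\",\"isReliable\":(true|false),\"confidence\":([0-9.]+)\\}");

    // Returns a one-line reading of the first detection, or a note if none is found.
    static String summarize(String json) {
        Matcher m = DETECTION.matcher(json);
        if (!m.find()) {
            return "no detections";
        }
        String language = m.group(1);
        boolean reliable = Boolean.parseBoolean(m.group(2));
        double confidence = Double.parseDouble(m.group(3));
        // The API uses the code 'xxx' when it cannot identify the language.
        if ("xxx".equals(language)) {
            return "unknown language";
        }
        // Confidence is reported as-is because it has no fixed upper bound.
        return language + (reliable ? " (reliable" : " (not reliable")
                + ", confidence " + confidence + ")";
    }

    public static void main(String[] args) {
        String sample = "{\"data\":{\"detections\":[{\"language\":\"es\","
                + "\"isReliable\":true,\"confidence\":10.24}]}}";
        System.out.println(summarize(sample)); // prints es (reliable, confidence 10.24)
    }
}
```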
  2. Batch Requests
It is possible to detect the language of several texts using one query. This saves network bandwidth and increases performance. Batch request detections are counted as separate requests, i.e. if 3 texts are passed they will be counted as 3 separate requests.
Example response:
{"data":{"detections":[[{"language":"es","isReliable":true,"confidence":10.24}],[{"language":"en","isReliable":true,"confidence":11.94}]]}}
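
As a rough sketch of how such a batch query could be built, the snippet below puts every text into its own q[] parameter. Note the assumptions: the q[] parameter name is how I remember the batch form and should be checked against the official documentation, the two sample texts and the "YOUR_API_KEY" placeholder are only illustrative, and BatchRequest and buildUrl are names I made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class BatchRequest {

    private static final String ENDPOINT = "http://ws.detectlanguage.com/0.2/detect";

    // URL-encodes a value as UTF-8; UTF-8 is always supported, so failure is unexpected.
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Builds one request URL carrying each text as a separate q[] parameter
    // (assumed parameter name, see the note above).
    static String buildUrl(List<String> texts, String apiKey) {
        StringBuilder url = new StringBuilder(ENDPOINT).append('?');
        for (String text : texts) {
            url.append("q[]=").append(encode(text)).append('&');
        }
        return url.append("key=").append(encode(apiKey)).toString();
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("buenos dias senor", "hello world");
        System.out.println(buildUrl(texts, "YOUR_API_KEY"));
        // Each text in the batch counts against the daily quota as its own request.
        System.out.println("Requests counted: " + texts.size());
    }
}
```

The detections in the response come back as one inner array per submitted text, as in the sample response above.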
  3. Accessing plan details
User request and data counters can be accessed at http://ws.detectlanguage.com/0.2/user/status
Example response:
{"date":"2015-05-19","requests":0,"bytes":0,"plan":"FREE","plan_expires":null,"daily_requests_limit":5000,"daily_bytes_limit":1048576,"status":"ACTIVE"}
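
One practical use of this response is checking how much of the daily quota is left before sending more text. The sketch below does that with the fields shown above. QuotaCheck and its methods are hypothetical names, and the regular expression is a simple stand-in for a proper JSON parser that only works because the status object is flat.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotaCheck {

    // Reads a non-negative integer field such as "requests":0 from the flat status JSON.
    static long numberField(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\":(\\d+)").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("field not found: " + name);
        }
        return Long.parseLong(m.group(1));
    }

    // Requests still available today: daily limit minus requests already used.
    static long remainingRequests(String json) {
        return numberField(json, "daily_requests_limit") - numberField(json, "requests");
    }

    // Bytes still available today: daily limit minus bytes already used.
    static long remainingBytes(String json) {
        return numberField(json, "daily_bytes_limit") - numberField(json, "bytes");
    }

    public static void main(String[] args) {
        String status = "{\"date\":\"2015-05-19\",\"requests\":0,\"bytes\":0,\"plan\":\"FREE\","
                + "\"plan_expires\":null,\"daily_requests_limit\":5000,"
                + "\"daily_bytes_limit\":1048576,\"status\":\"ACTIVE\"}";
        System.out.println(remainingRequests(status)); // prints 5000
        System.out.println(remainingBytes(status));    // prints 1048576
    }
}
```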
  4. Language Support
A list of all supported languages is available on the service's website, https://detectlanguage.com/.
  5. Secure Mode (SSL)
Texts submitted to the API are used by the language detection engine only. Texts are not stored or used in any other way. If you are passing sensitive information to the API, you can use the HTTPS protocol to ensure secure network transfer.

Source: https://detectlanguage.com/

Sample Code:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class LanguageDetection {

    private static final String ENDPOINT = "http://ws.detectlanguage.com/0.2/detect";
    // Replace with your own API key.
    private static final String API_KEY = "your API key";

    // Sends the text to the detection endpoint and prints the raw JSON response.
    public void detectLanguage(String message) throws IOException {
        // URL-encode both parameters so spaces and special characters survive the query string.
        String query = "q=" + URLEncoder.encode(message, "UTF-8")
                + "&key=" + URLEncoder.encode(API_KEY, "UTF-8");
        URL url = new URL(ENDPOINT + "?" + query);
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();

        StringBuilder response = new StringBuilder();
        // try-with-resources closes the reader even if reading fails.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line);
            }
        } finally {
            connection.disconnect();
        }
        System.out.println(response);
    }

    public static void main(String[] args) throws IOException {
        LanguageDetection langDetect = new LanguageDetection();
        langDetect.detectLanguage("suprabhatham");
    }
}

Result:

{"data":{"detections":[{"language":"sa","isReliable":true,"confidence":15.75}]}}
The language was identified as Sanskrit.


A great attitude makes a great life!!
