Recently I came across a requirement to identify the language of a given text. I started with the Language Detection API. Let's look at it in detail:
The Language Detection API is a web service for language detection. It accepts text and returns the detected language code together with a score. It currently detects 160 languages.
Available plans
Plan name | No. of requests/day | Data usage/day | Price |
Free | 5,000 requests | 1 MB | Free |
Basic | 100,000 requests | 20 MB | $5/month |
Plus | 1M requests | 200 MB | $15/month |
Premium | 10M requests | 2 GB | $40/month |
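To see which plan fits a given daily load, the limits in the table can be checked in order of price. This is only a sketch: the class PlanPicker and the method pick are made-up names, and treating 2 GB as 2,048 MB is an assumption.

```java
public class PlanPicker {
    // Plan names and daily limits copied from the table above.
    // Assumption: 2 GB is treated as 2,048 MB.
    static final String[] NAMES = {"Free", "Basic", "Plus", "Premium"};
    static final long[] MAX_REQUESTS = {5_000L, 100_000L, 1_000_000L, 10_000_000L};
    static final long[] MAX_MEGABYTES = {1L, 20L, 200L, 2_048L};

    // Hypothetical helper: returns the first (cheapest) plan whose limits cover the load.
    static String pick(long requestsPerDay, double megabytesPerDay) {
        for (int i = 0; i < NAMES.length; i++) {
            if (requestsPerDay <= MAX_REQUESTS[i] && megabytesPerDay <= MAX_MEGABYTES[i]) {
                return NAMES[i];
            }
        }
        return "no listed plan";
    }

    public static void main(String[] args) {
        System.out.println(pick(3_000, 0.5));  // Free
        System.out.println(pick(50_000, 5));   // Basic
        System.out.println(pick(3_000, 50));   // Plus
    }
}
```

Both limits have to be satisfied at the same time, so a small number of large texts can still push us to a bigger plan.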
API Key
To use the Language Detection API we need an API key, which can be obtained from:
API Clients
The Language Detection web service provides API clients for the following programming languages:
- Ruby
- Java
- Python
- PHP
- C# (.NET)
JSON API Usage
- Basic detection
Submit an HTTP request to
http://ws.detectlanguage.com/0.2/detect
with the following parameters:
q - your text, mandatory
key - your API key, mandatory
The response is:
{"data":{"detections":[{"language":"es","isReliable":true,"confidence":10.24}]}}
Interpretation of results:
Confidence: the value depends on how much text we pass and how well it is identified. The more text we pass, the higher the confidence value will be. It is not limited to a fixed range; it can be higher than 100.
Reliability: isReliable is not directly linked to the confidence. If our text contains words in different languages, isReliable: true means that the first detected language is significantly more probable than the second one. When only one language is detected, isReliable: false means that the confidence is very low.
Language: the code of the identified language. The API returns the code 'xxx' for an unknown language.
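The rules above can be turned into a small reader for the response. This is a minimal sketch, not the way the official clients work: DetectionReader and summarize are hypothetical names, and the regular expression assumes the fields always appear in the order shown in the sample response. A proper JSON library would be the safer choice in real code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetectionReader {
    // Matches one detection object in the field order of the sample response.
    // Assumption: the order and formatting of the fields never vary.
    private static final Pattern DETECTION = Pattern.compile(
        "\\{\"language\":\"([^\"]+)\",\"isReliable\":(true|false),\"confidence\":([0-9.]+)\\}");

    // Hypothetical helper: one summary line per detection found in the response.
    static List<String> summarize(String json) {
        List<String> out = new ArrayList<>();
        Matcher m = DETECTION.matcher(json);
        while (m.find()) {
            String language = m.group(1);
            boolean reliable = Boolean.parseBoolean(m.group(2));
            double confidence = Double.parseDouble(m.group(3));
            if ("xxx".equals(language)) {
                out.add("unknown language");           // 'xxx' marks an unknown language
            } else if (reliable) {
                out.add(language + " (reliable, confidence " + confidence + ")");
            } else {
                out.add(language + " (not reliable, confidence " + confidence + ")");
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String response = "{\"data\":{\"detections\":[{\"language\":\"es\","
            + "\"isReliable\":true,\"confidence\":10.24}]}}";
        System.out.println(summarize(response));
    }
}
```

Note that the sketch does not compare the confidence against any threshold, since the value has no upper bound and grows with the length of the text.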
- Batch Requests
It is possible to detect the language of several texts in one query. This saves network bandwidth and increases performance. Batch detections are still counted as separate requests, i.e. if 3 texts are passed, they are counted as 3 separate requests.
Eg:
Response:
{"data":{"detections":[[{"language":"es","isReliable":true,"confidence":10.24}],[{"language":"en","isReliable":true,"confidence":11.94}]]}}
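A batch query string could be assembled as in the sketch below. It assumes that each text is passed as its own q[] parameter, which is not confirmed by the text above, so please check it against the API documentation. BatchQueryBuilder and buildBatchQuery are made-up names, and YOUR_API_KEY is a placeholder.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Arrays;
import java.util.List;

public class BatchQueryBuilder {
    // Hypothetical helper: builds the query string for several texts at once.
    static String buildBatchQuery(List<String> texts, String apiKey) {
        try {
            StringBuilder query = new StringBuilder();
            for (String text : texts) {
                // Assumption: every text travels as a separate "q[]" parameter.
                query.append(URLEncoder.encode("q[]", "UTF-8"))
                     .append('=')
                     .append(URLEncoder.encode(text, "UTF-8"))
                     .append('&');
            }
            return query.append("key=").append(URLEncoder.encode(apiKey, "UTF-8")).toString();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList("buenos dias", "hello world");
        System.out.println(buildBatchQuery(texts, "YOUR_API_KEY"));
    }
}
```

With two texts in the list, this single query is counted as 2 requests against the daily limit.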
- Accessing plan details
The request and data counters of the user can be accessed at http://ws.detectlanguage.com/0.2/user/status
Eg:
Response:
{"date":"2015-05-19","requests":0,"bytes":0,"plan":"FREE","plan_expires":null,"daily_requests_limit":5000,"daily_bytes_limit":1048576,"status":"ACTIVE"}
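The counters in this response are enough to work out how much of the daily quota is left. The sketch below does that with a simple regular expression; QuotaCheck, field, remainingRequests and remainingBytes are hypothetical names, and it assumes the numeric fields are plain unquoted integers as in the sample response.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotaCheck {
    // Hypothetical helper: reads one integer field from the flat status JSON.
    static long field(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\":(\\d+)").matcher(json);
        if (!m.find()) {
            throw new IllegalArgumentException("missing field: " + name);
        }
        return Long.parseLong(m.group(1));
    }

    // Requests still available today.
    static long remainingRequests(String json) {
        return field(json, "daily_requests_limit") - field(json, "requests");
    }

    // Bytes still available today.
    static long remainingBytes(String json) {
        return field(json, "daily_bytes_limit") - field(json, "bytes");
    }

    public static void main(String[] args) {
        String status = "{\"date\":\"2015-05-19\",\"requests\":0,\"bytes\":0,\"plan\":\"FREE\","
            + "\"plan_expires\":null,\"daily_requests_limit\":5000,"
            + "\"daily_bytes_limit\":1048576,\"status\":\"ACTIVE\"}";
        System.out.println(remainingRequests(status) + " requests left");
        System.out.println(remainingBytes(status) + " bytes left");
    }
}
```

Checking these two numbers before sending a large batch helps to avoid running into the daily limit halfway through.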
- Language Support
The list of all supported languages is available at:
- Secure Mode(SSL)
Sample Code:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class LanguageDetection {

    public void detectLanguage(String message) throws IOException {
        String head = "http://ws.detectlanguage.com/0.2/detect?q=";
        String apiKey = "&key=your API key"; // placeholder: put your own API key here
        // Percent-encode the whole text, not only the whitespace
        String text = URLEncoder.encode(message, "UTF-8");
        try {
            URL url = new URL(head + text + apiKey);
            HttpURLConnection connection = (HttpURLConnection) url.openConnection();
            StringBuilder response = new StringBuilder();
            // try-with-resources closes the reader even if reading fails
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
                String current;
                while ((current = in.readLine()) != null) {
                    response.append(current);
                }
            }
            // Print the raw JSON returned by the service
            System.out.println(response);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) throws IOException {
        LanguageDetection langDetect = new LanguageDetection();
        langDetect.detectLanguage("suprabhatham");
    }
}
Result:
{"data":{"detections":[{"language":"sa","isReliable":true,"confidence":15.75}]}}
The language was identified as Sanskrit.