Language Recognition

I am looking for a utility that can guess the language and encoding of plain-text documents.

It should work like the 'Auto-detect' feature found in some web browsers. I've heard about N-gram-based methods, but other approaches may also be available.
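One well-known N-gram technique is rank-order profile comparison in the style of Cavnar and Trenkle's "out-of-place" measure. Each language is summarised by its most frequent character N-grams, and a document is assigned to the language whose ranking is closest. The sketch below is a minimal illustration that assumes character 1- to 3-grams and a 300-entry profile. The class and method names (`NGramLanguageGuesser`, `learn`, `guess`) are hypothetical. Raw counts are kept per language, so calling `learn` with a new language name is one simple way to add languages at runtime.

```java
import java.util.*;
import java.util.stream.Collectors;

/** Hypothetical sketch of an N-gram language guesser using an "out-of-place" rank distance. */
public class NGramLanguageGuesser {
    private static final int MAX_N = 3;          // assumption: character 1- to 3-grams
    private static final int PROFILE_SIZE = 300; // assumption: keep the 300 most frequent N-grams

    // Raw N-gram counts per language, kept so that later samples can be merged in.
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();

    /** Adds a training sample for a new or already known language. */
    public void learn(String language, String sample) {
        Map<String, Integer> c = counts.computeIfAbsent(language, k -> new HashMap<>());
        ngrams(sample).forEach((g, n) -> c.merge(g, n, Integer::sum));
    }

    /** Returns the language whose profile is closest to the text, or null if nothing was learned. */
    public String guess(String text) {
        List<String> doc = rank(ngrams(text));
        String best = null;
        long bestDistance = Long.MAX_VALUE;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            List<String> profile = rank(e.getValue());
            Map<String, Integer> pos = new HashMap<>();
            for (int i = 0; i < profile.size(); i++) pos.put(profile.get(i), i);
            long distance = 0;
            for (int i = 0; i < doc.size(); i++) {
                Integer p = pos.get(doc.get(i));
                // Out-of-place penalty: rank difference, or the maximum if the N-gram is absent.
                distance += (p == null) ? PROFILE_SIZE : Math.abs(p - i);
            }
            if (distance < bestDistance) {
                bestDistance = distance;
                best = e.getKey();
            }
        }
        return best;
    }

    // Counts character N-grams (n = 1..MAX_N) of each word, padded with '_' to mark word edges.
    private static Map<String, Integer> ngrams(String text) {
        Map<String, Integer> c = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (word.isEmpty()) continue;
            String padded = "_" + word + "_";
            for (int n = 1; n <= MAX_N; n++)
                for (int i = 0; i + n <= padded.length(); i++)
                    c.merge(padded.substring(i, i + n), 1, Integer::sum);
        }
        return c;
    }

    // Sorts N-grams by descending frequency (ties alphabetically) and keeps the top PROFILE_SIZE.
    private static List<String> rank(Map<String, Integer> c) {
        return c.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .limit(PROFILE_SIZE)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }
}
```

In practice each language needs a substantial training corpus. Very short inputs, such as a single word, give noisy rankings with any N-gram method.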

The utility has to accept a file or a string as an argument and return the language and the encoding. If the document contains two or more languages, it should return the most heavily used one, e.g. 'Mostly English' or 'Mostly Russian'.
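For the encoding half and the 'Mostly X' rule, one possible approach is strict trial decoding. This is an assumption, not something the request prescribes. The idea is to try each candidate charset with a decoder that reports malformed or unmappable input and keep the first one that succeeds. The decoded text is then classified line by line, with each line weighted by its letter count. The class name, the candidate list and the pluggable `classifier` function are hypothetical. The decoder calls are the standard `java.nio.charset` API.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: guesses the encoding by strict trial decoding, then reports the dominant language. */
public class EncodingAndMajority {

    /**
     * Returns the first supported candidate charset that decodes the bytes without malformed or
     * unmappable input, or null if none does. Order matters: UTF-8 rejects most non-UTF-8 input,
     * while single-byte charsets such as ISO-8859-1 accept any byte sequence.
     */
    public static Charset guessEncoding(byte[] data, List<String> candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) continue;
            CharsetDecoder decoder = Charset.forName(name).newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(data));
                return decoder.charset();
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return null;
    }

    /**
     * Classifies each line that contains letters, weights it by its letter count, and returns
     * "Mostly X" if more than one language occurs, otherwise just "X" (null if there are no letters).
     */
    public static String dominantLanguage(String text, Function<String, String> classifier) {
        Map<String, Integer> weight = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            int letters = (int) line.codePoints().filter(Character::isLetter).count();
            if (letters > 0) weight.merge(classifier.apply(line), letters, Integer::sum);
        }
        String best = null;
        for (Map.Entry<String, Integer> e : weight.entrySet())
            if (best == null || e.getValue() > weight.get(best)) best = e.getKey();
        if (best == null) return null;
        return weight.size() > 1 ? "Mostly " + best : best;
    }
}
```

First-match-wins is only a rough heuristic, because a permissive single-byte charset will accept almost anything. A more robust design would score every successful decoding with the language model and keep the best-scoring one.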

It has to be able to 'learn' new languages and encodings.

It must be written in Java and encapsulated as a separate class, so that it can easily be plugged into any Java program. Detailed JavaDoc is required.
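A separate class with detailed JavaDoc might expose an API along the following lines. This is a hypothetical interface sketch and requires Java 16+ for the nested `record`. Every name in it is illustrative, and returning a `null` encoding for input that is already a `String` is an assumption.

```java
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Path;

/**
 * Guesses the natural language and character encoding of plain-text documents.
 *
 * <p>Hypothetical API sketch: names and signatures are illustrative only.
 */
public interface LanguageDetector {

    /**
     * Result of a detection.
     *
     * @param language the most heavily used language, e.g. {@code "English"}
     * @param encoding the guessed character encoding, or {@code null} if the input was already a {@code String}
     * @param share    fraction of the text (0 to 1) written in {@code language}
     */
    record Detection(String language, Charset encoding, double share) {
        /** Returns e.g. {@code "English"}, or {@code "Mostly English"} if other languages also occur. */
        public String label() {
            return share < 1.0 ? "Mostly " + language : language;
        }
    }

    /**
     * Detects the dominant language of an already decoded string.
     *
     * @param text the text to analyse
     * @return the detection result; its encoding is {@code null}
     */
    Detection detect(String text);

    /**
     * Detects the encoding and the dominant language of a file.
     *
     * @param file path to a plain-text file
     * @return the detection result
     * @throws IOException if the file cannot be read
     */
    Detection detect(Path file) throws IOException;

    /**
     * Trains the detector on a sample of a new or already known language.
     *
     * @param language name of the language, e.g. {@code "Russian"}
     * @param sample   representative text in that language
     */
    void learn(String language, String sample);
}
```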

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Exclusive and complete copyrights to all work purchased. (No GPL, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site).

## Platform


Skills: Engineering, Java, MySQL, PHP, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing


About the employer:
(7 reviews) Bulgaria

Project ID: #3012812

Awarded to:


See private message.

USD in 30 days
(2 reviews)

2 freelancers are bidding an average of $138 for this job


See private message.

USD in 30 days
(2 reviews)