Text Processing / Research tool

I need a system for research and translation of documents.

Translation involves translating a source text using a translation memory.

I am looking for someone with knowledge of XML and of organizing data logically when bookmarking sites and reference locations.

Knowledge of search behavior (how engines treat spaces, autocorrect queries, or ignore capitalization) and of regular expressions is required.

The general process will be:

1. Analysis of text (simple types such as English, plus others including CJK) using segmentation rules and database/memory. This produces an extracted term list.

--- Analyze text for individual terms. These can span multiple words and multiple character types (e.g. Japanese with symbols; English with numerals and letters). A corresponding English example would be "4-stroke engine". ---

2. After initial resolution of terms by the system, some are corrected manually and the database/memory is updated.

3. Research

--- Search for new terms. ---

--- Take the list of new terms to the browser. Click on them one by one, i.e. pass each term to searches on different sites. Once the translation for a term has been found, it is stored with the source data in a database. ---


4. Finally, the text is batch translated.
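
To make the four steps concrete, here is a minimal Python sketch of how I picture them fitting together. It is only an illustration, not a specification: the function names, the in-memory dictionary standing in for the database/memory, and the example.com search URLs are all placeholders of my own. It assumes memory keys are stored in lowercase and uses longest-match lookup, so "4-stroke engine" is preferred over "engine".

```python
import re
from urllib.parse import quote_plus

# Hypothetical search URL templates; example.com is a placeholder, not a real service.
SEARCH_SITES = {
    "dictionary": "https://example.com/dict?q={query}",
    "glossary": "https://example.com/glossary?term={query}",
}

# A candidate token: letters/digits of any script, with optional inner hyphens.
TOKEN = re.compile(r"[^\W_]+(?:-[^\W_]+)*")


def build_pattern(memory):
    """Compile one regex matching any memory term, longest terms first."""
    if not memory:
        return re.compile(r"(?!)")  # never matches
    terms = sorted(memory, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in terms)
    # ASCII-only boundaries: avoids matching "engine" inside "engineer",
    # while still allowing matches inside unspaced CJK text.
    return re.compile(
        r"(?<![A-Za-z0-9])(?:" + body + r")(?![A-Za-z0-9])", re.IGNORECASE
    )


def extract_terms(text, memory):
    """Step 1: return (known, unknown) terms in order of first appearance."""
    pattern = build_pattern(memory)
    known = []
    for match in pattern.finditer(text):
        term = match.group(0).lower()
        if term not in known:
            known.append(term)
    # Blank out known terms; leftover tokens are candidates for manual review.
    rest = pattern.sub(" ", text)
    unknown = []
    for token in TOKEN.findall(rest):
        term = token.lower()
        if term not in unknown:
            unknown.append(term)
    return known, unknown


def search_urls(term):
    """Step 3: build one search URL per configured site for a new term."""
    return {
        name: template.format(query=quote_plus(term))
        for name, template in SEARCH_SITES.items()
    }


def batch_translate(text, memory):
    """Step 4: replace every known term with its stored translation."""
    pattern = build_pattern(memory)
    return pattern.sub(lambda m: memory[m.group(0).lower()], text)


if __name__ == "__main__":
    # Keys are lowercase by assumption; values are illustrative translations.
    memory = {"4-stroke engine": "4ストロークエンジン", "engine": "エンジン"}
    text = "The 4-stroke engine uses a camshaft."
    known, unknown = extract_terms(text, memory)
    print(known, unknown)
    for term in unknown:
        print(term, search_urls(term))
    print(batch_translate(text, memory))
```

The unknown list is deliberately noisy (it includes ordinary words), which is where the manual correction in step 2 comes in. Unspaced CJK text would also need proper segmentation rules for new terms; the sketch only finds CJK terms that are already in the memory.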

If you have any experience with this type of project, please give your recommendations on the best scripts/technologies and the best browser, say whether you would build it for a server or the desktop, and include your quote. I am open to suggestions; the most important thing is that you understand the FOSS tools currently available for this type of work.

Skills: HTML5, Linux, XML


About the employer:
(1 review) Kawasaki, Japan

Project ID: #1027170