PDF, RTF, TXT to HTML Converter

In progress. Posted on Apr 21, 2004. Payment on delivery.

We need a Perl or Python module that converts PDF, RTF, and TXT files to HTML.

Images and pictures (WMF, etc.) can be ignored. We are only interested in the text itself and its logical layout: paragraphs, bulleted lists, tables, etc.
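Because only the logical layout matters, one way to keep the code simple is to have each format-specific parser (TXT, RTF, PDF) emit the same small list of blocks and render HTML in one place. The block kinds (`"p"`, `"ul"`, `"table"`) and the function name `blocks_to_html` are assumptions for illustration, not part of the requirements.

```python
from html import escape

# Hypothetical intermediate representation: a list of (kind, content) pairs.
#   ("p", "text")                 -> paragraph
#   ("ul", ["item", ...])         -> bulleted list
#   ("table", [["cell", ...], ...]) -> table rows
def blocks_to_html(blocks, title="Converted document"):
    out = ['<html><head><meta charset="utf-8"><title>%s</title></head><body>'
           % escape(title)]
    for kind, content in blocks:
        if kind == "p":
            out.append("<p>%s</p>" % escape(content))
        elif kind == "ul":
            items = "".join("<li>%s</li>" % escape(item) for item in content)
            out.append("<ul>%s</ul>" % items)
        elif kind == "table":
            rows = "".join(
                "<tr>%s</tr>" % "".join("<td>%s</td>" % escape(cell) for cell in row)
                for row in content)
            out.append("<table>%s</table>" % rows)
        else:
            raise ValueError("unknown block kind: %r" % kind)
    out.append("</body></html>")
    return "\n".join(out)
```

With this split, adding a new input format only means writing a parser that produces blocks; the HTML writer stays untouched.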

TXT files are translated into paragraphs only: \n means

\n followed by one or more empty lines means
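The HTML that each case should map to is missing from the posting. The sketch below assumes a single `\n` becomes a `<br>` line break and a `\n` followed by one or more empty lines starts a new `<p>` paragraph; `txt_to_html_body` is a hypothetical name.

```python
import re
from html import escape

def txt_to_html_body(text):
    # Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Assumption: \n followed by one or more empty (or whitespace-only) lines
    # separates paragraphs.
    blocks = re.split(r"\n(?:[ \t]*\n)+", text)
    paragraphs = []
    for block in blocks:
        block = block.strip("\n")
        if not block.strip():
            continue
        # Assumption: a single \n inside a paragraph becomes <br>.
        lines = [escape(line) for line in block.split("\n")]
        paragraphs.append("<p>" + "<br>\n".join(lines) + "</p>")
    return "\n".join(paragraphs)
```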

Do not use Word's object model; the same goes for Adobe Acrobat.
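Without Word's object model, RTF has to be parsed directly. A minimal sketch of that idea, assuming an ANSI file whose `\'hh` escapes use code page 1252, is a tokenizer that skips metadata groups and splits on `\par`. It handles only a small subset of RTF (no `\uN` Unicode escapes, no tables or lists), and the function name and skipped-destination list are assumptions.

```python
import re

# Groups whose contents are metadata rather than body text
# (a partial list chosen for this sketch).
SKIP_DESTINATIONS = {"fonttbl", "colortbl", "stylesheet", "info",
                     "pict", "header", "footer"}

TOKEN = re.compile(
    r"\\([a-z]+)(-?\d+)? ?"     # control word with optional numeric argument
    r"|\\'([0-9a-fA-F]{2})"     # hex-escaped byte, e.g. \'e9
    r"|\\([^a-z])"              # control symbol, e.g. \\ \{ \} \*
    r"|([{}])"                  # group start / end
    r"|([^\\{}\r\n]+)"          # run of plain text
    r"|[\r\n]"                  # raw line breaks are not significant in RTF
)

def rtf_to_paragraphs(rtf, codepage="cp1252"):
    paragraphs, current = [], []
    stack, skip = [], False

    def flush():
        text = "".join(current).strip()
        if text:
            paragraphs.append(text)
        current.clear()

    for word, _arg, hexcode, symbol, brace, text in TOKEN.findall(rtf):
        if brace == "{":
            stack.append(skip)              # remember the enclosing group's state
        elif brace == "}":
            skip = stack.pop() if stack else False
        elif skip:
            continue                        # inside an ignored group
        elif word in SKIP_DESTINATIONS or symbol == "*":
            skip = True                     # ignore the rest of this group
        elif word == "par":
            flush()
        elif word == "tab":
            current.append("\t")
        elif hexcode:
            current.append(bytes([int(hexcode, 16)]).decode(codepage, errors="replace"))
        elif symbol in ("\\", "{", "}"):
            current.append(symbol)          # escaped literal characters
        elif text:
            current.append(text)
    flush()
    return paragraphs
```

The resulting paragraph strings could then be passed to any HTML writer; a real converter would also need `\uN` handling and list and table detection.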

The module has a simple interface: a Convert method that takes the filename and the directory and returns the filename of the generated .htm file.

Example of how it will be used (if you use Perl):

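use strict;
use warnings;
use ModuleName;    # placeholder name for the delivered module

my $convertor = ModuleName->new;
# Convert(filename, directory) returns the .htm filename, or 0 on failure
my $file = $convertor->Convert('[url removed, login to view]', 'c:\\documents');
if (!$file) {
    print "*** Error: " . $convertor->GetLastError() . "\n";
}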

where $file will be '[url removed, login to view]' if everything is OK.

If the conversion fails, the return value will be 0 and the error string should be returned by the GetLastError() function.
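For a Python implementation, the same contract (return the .htm filename on success, 0 on failure, and keep the last error message) could look like the sketch below. The class and method names mirror `Convert` and `GetLastError` from the Perl example but are hypothetical; only TXT is wired up, and the paragraph rendering is a placeholder.

```python
import os
from html import escape

class Converter:
    """Hypothetical Python equivalent of the Perl interface in the posting."""

    def __init__(self):
        self._last_error = ""

    def get_last_error(self):
        return self._last_error

    def convert(self, filename, directory):
        """Convert directory/filename; return the .htm filename, or 0 on failure."""
        base, ext = os.path.splitext(filename)
        target = base + ".htm"
        try:
            if ext.lower() != ".txt":
                # Only TXT is handled in this sketch; RTF and PDF parsers would plug in here.
                raise ValueError("unsupported file type: %r" % ext)
            with open(os.path.join(directory, filename), encoding="utf-8") as src:
                text = src.read()
            # Placeholder rendering: blank-line-separated blocks become <p> elements.
            body = "".join("<p>%s</p>" % escape(block.strip())
                           for block in text.split("\n\n") if block.strip())
            with open(os.path.join(directory, target), "w", encoding="utf-8") as out:
                out.write("<html><body>%s</body></html>" % body)
        except (OSError, ValueError) as exc:  # UnicodeDecodeError is a ValueError
            self._last_error = str(exc)
            return 0
        self._last_error = ""
        return target
```

Usage follows the Perl example: `conv = Converter()`, then `result = conv.convert("notes.txt", r"c:\documents")`, and if `result == 0`, print `conv.get_last_error()`.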

The module should handle UTF-8 as well as 8-bit encodings (UTF-16 support is a bonus if you offer it).
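One simple way to meet the encoding requirement for text input is to check for a UTF-16 byte-order mark, then try UTF-8, and fall back to an 8-bit code page. The choice of cp1252 as the fallback and the name `decode_text` are assumptions; UTF-16 without a BOM is not detected by this sketch.

```python
import codecs

def decode_text(data, fallback="cp1252"):
    # UTF-16 (the optional "bonus" encoding) is recognised only by its BOM;
    # the "utf-16" codec reads the BOM to pick the byte order and strips it.
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return data.decode("utf-16")
    try:
        # "utf-8-sig" also removes a UTF-8 BOM if one is present.
        return data.decode("utf-8-sig")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume an 8-bit Windows code page.
        return data.decode(fallback, errors="replace")
```

Trying UTF-8 before the 8-bit fallback works because non-ASCII text in an 8-bit encoding is rarely also valid UTF-8.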

The code should run unattended and write all errors to a log file.
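For unattended runs, a sketch using Python's standard `logging` module: every failure is written to a log file and the batch continues with the next file. The function names, logger name, and message format are assumptions.

```python
import logging

def make_error_logger(log_path):
    """Return a logger that appends messages to log_path (hypothetical helper)."""
    logger = logging.getLogger("html_converter." + log_path)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger

def convert_all(paths, convert_one, logger):
    """Run convert_one on every path; log failures and keep going."""
    results = {}
    for path in paths:
        try:
            results[path] = convert_one(path)
        except Exception as exc:  # unattended run: never stop on one bad file
            logger.error("conversion failed for %s: %s", path, exc)
            results[path] = 0
    return results
```

Returning 0 for failed files keeps the batch results consistent with the Convert return convention described above.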

We should receive all the source code, fully documented, along with all copyrights, so that we can do whatever we want with it, including changing it, reselling it, or eating it :-)

The module and all of its dependencies must be compatible with MS Windows. It must be based only on open-source code: no modules that cost money or limit our ability to distribute the code are allowed!

We want simple code that is easy to maintain.

We need several other modules, so if you do well on this one, you may get others too.

Skills: Perl, Python

Project ID: #1146

About the project

12 proposals, remote project, active Apr 21, 2004