
Open
Posted
•
Ends in 6 days
Paid on delivery
I have a collection of PDFs whose tables combine normal text with embedded images, so simple copy-and-paste will not do. Your task is to pull every table out of each file, run OCR where the cells are only images, and populate a custom-designed SQLite database. We will agree on the schema up front so the final .sqlite file lines up with my downstream reporting tools. Accuracy is key: numbers must keep their formatting, column order must stay intact, and any footnotes that belong to a row need to be captured in a linked field rather than discarded. Whether you prefer Python with Camelot, Tabula-py, pdfplumber, Tesseract, or another extraction stack is up to you—as long as the workflow is reproducible. Once all conversions are complete, send me the finished database along with the script or notebook you used so I can rerun the process when new PDFs arrive.
Project ID: 40423283
Open for bidding
Remote project
Active 56 yrs ago
Set your budget and timeframe
Get paid for your work
Outline your proposal
It's free to sign up and bid on jobs

Delhi, India
Member since Apr 30, 2025
$15-25 CAD / hour
$10-90 USD
₹750-1250 INR / hour
$30-250 USD
$30-250 AUD
₹600-1500 INR
$15-25 USD / hour
₹1500-12500 INR
₹250000-500000 INR
$10-30 USD
₹750-1250 INR / hour
$30-250 CAD
$30-250 USD
$30-250 USD
₹750-1250 INR / hour
₹750-1250 INR / hour
₹1500-12500 INR
$15-25 USD / hour
₹600-1500 INR
₹1500-12500 INR