
Open • Ends in 6 days • Paid on delivery
I have a working Python pipeline that already pulls clean text from scanned voter lists in Tamil and English by combining custom image pre-processing, a light AI layer, and Tesseract OCR. The next milestone is to make the same code read Telugu with comparable performance. My target is 99% character-level accuracy across the entire page, not just names or voter IDs.

Once Telugu is solid, we will roll the same approach out to the rest of the major Indian scripts (Hindi, Bengali, Marathi, Malayalam, Kannada, Assamese, Gujarati, Punjabi and Odia), but this job is strictly about nailing Telugu first.

What you'll work with
• Current codebase (Python, OpenCV, pytesseract, a few custom TensorFlow helpers)
• A curated set of high-resolution scanned PDFs and images of Telangana and Andhra Pradesh voter rolls for training and validation
• My existing language-agnostic pre- and post-processing modules, which you are free to tweak

Key responsibilities
1. Train or fine-tune a Tesseract language data set (or an alternative open-source OCR engine if it yields better accuracy) for printed Telugu voter-list fonts.
2. Integrate the new language file into the existing code, keeping the same API and CLI behaviours.
3. Validate against my test suite and push accuracy to ≥99% on a per-character basis; document any edge-case failures and patches.
4. Hand over updated code, trained data files, and a concise technical note explaining the changes and the steps for scaling to future languages.

Acceptance criteria
• ≥99% per-character accuracy on the provided blind test batch
• Same or faster processing speed than the current Tamil run
• The Telugu code will be a separate version; the same code need not also read Tamil
• House-number accuracy is extremely important

I will prioritise freelancers who can point me to prior OCR/Tesseract projects in Indian scripts and explain, in a few lines, how they usually drive accuracy past the 95% mark.
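To make the ≥99% criterion concrete, here is a minimal sketch of one common way to score per-character accuracy: one minus the Levenshtein edit distance divided by the length of the reference text. The posting does not say how the existing test suite computes the figure, so treat this as an assumption. The function names `edit_distance` and `char_accuracy` are hypothetical, and counting Unicode code points after NFC normalization (rather than grapheme clusters) is also an assumption that matters for Telugu, where vowel signs are separate code points.

```python
import unicodedata


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance over Unicode code points (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # character missing from the OCR output
                curr[j - 1] + 1,     # extra character in the OCR output
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def char_accuracy(ref: str, hyp: str) -> float:
    """Return 1 - (edit distance / reference length), clamped at 0.

    Both strings are NFC-normalized first so that equivalent Unicode
    encodings of the same text are not counted as errors (assumption:
    the ground truth and OCR output may differ in normalization form).
    """
    ref = unicodedata.normalize("NFC", ref)
    hyp = unicodedata.normalize("NFC", hyp)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


if __name__ == "__main__":
    # Illustrative strings only, not taken from the real voter rolls.
    print(char_accuracy("H.No 12-3-45", "H.No 12-8-45"))
```

Because house numbers are short, a single wrong digit barely moves a page-level score, so it may be worth reporting the same metric separately on the house-number field as well as on the full page.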
Project ID: 40187457
2 proposals
Open for bidding
Remote project
2 freelancers are bidding on average ₹6,750 INR for this job

Hi there, Your "Extend OCR for Telugu Voter Lists" job looks interesting and matches the kind of work I usually do with Python, Machine Learning (ML), OCR, Artificial Intelligence, Image Processing, OpenCV, Natural Language Processing, Text Recognition. I can help you get a clean result and keep you updated at each step. You can see similar projects here: https://www.freelancer.com/u/msaadarshadkhan When would you like to start and do you have any examples of styles you like?
₹1,500 INR in 2 days
2.9

Chennai, India
Member since Jan 23, 2026