
In Progress
Posted
Paid on delivery
I already have a rock-solid script that converts Tamil and English voter-list PDFs (scanned images) straight into neatly structured Excel sheets with perfect accuracy. Now I need that same reliability extended to Punjabi. The requirement is tough but clear: deliver an OCR module that handles scanned electoral rolls in Punjabi and pushes the data into Excel with better than 99% accuracy.

The current pipeline is built in Python, relying largely on classical OCR techniques and some OpenCV preprocessing; it does not depend on heavy machine-learning models, and I want to keep it that way as much as possible. If you must introduce lightweight AI tricks to hit the accuracy target, document them so they can be toggled on or off.

Deliverables
• A standalone Python script (or module) dedicated to the Punjabi language and alphanumeric house numbers
• Clear setup instructions plus any custom-trained OCR data files you create
• A sample run on at least one full Punjabi voter-list PDF showing an Excel output whose character-level accuracy exceeds 99%

Acceptance will be based on that verified accuracy score, speed, and little or no dependency on paid AI. Once Punjabi is nailed, I’ll commission similar modules for Bengali, Oriya, Assamese, Malayalam, Kannada, Marathi and Gujarati, so think modular and reusable from the start.
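The brief does not say how character-level accuracy is to be measured. One common reading, sketched below as an assumption rather than the client's actual method, is one minus the edit distance between the OCR output and a hand-checked ground truth, divided by the ground-truth length. The function names are illustrative, and lengths are counted in Unicode code points after NFC normalization, so a Gurmukhi letter plus its vowel sign counts as two characters.

```python
import unicodedata


def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Return 1 - (edit distance / reference length), clamped at 0."""
    # Normalize so visually identical strings compare equal code point by code point.
    reference = unicodedata.normalize("NFC", reference)
    hypothesis = unicodedata.normalize("NFC", hypothesis)
    if not reference:
        return 1.0 if not hypothesis else 0.0
    errors = edit_distance(reference, hypothesis)
    return max(0.0, 1.0 - errors / len(reference))
```

Under this definition, a 10,000-character roll passes the 99% bar only if the OCR output is within 99 single-character edits of the ground truth.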
Project ID: 40195731
18 proposals
Remote project
Active 2 mos ago

I have reviewed the voter list screenshot and your existing Python/OpenCV pipeline requirements. Achieving >99% accuracy in Punjabi (Gurmukhi) requires more than just standard OCR; it necessitates a custom Gurmukhi-aware preprocessing layer to handle the 'Shirorekha' (headline) and 'Matras' (vowels) that often cause character merging in scanned PDFs.

My Technical Strategy:
- Grid-Based Segmentation: I will use OpenCV Contours to isolate each voter card, ensuring house numbers and names are cropped precisely to prevent noise.
- Advanced Deskewing: Using Hough Line Transform to correct the tilt of each card independently, which is vital for character-level precision.
- Gurmukhi-Tuned Tesseract: I will implement Pytesseract with a custom configuration focused on the pan language pack, supplemented by a Post-OCR Regex Validator to ensure house numbers and ages meet 100% logical consistency.
- Modular Architecture: The script will be built as a plug-and-play module. You can easily swap the language config for your upcoming Bengali, Oriya, and Marathi projects.
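The post-OCR regex validator mentioned in the strategy above could look roughly like the sketch below. Everything in it is an assumption for illustration: the house-number pattern (alphanumeric groups joined by single "/" or "-"), the age bounds (18 as the voting age, 120 as an arbitrary ceiling), and the function names. It also maps Gurmukhi digits (U+0A66 to U+0A6F) to ASCII before checking.

```python
import re

# Map Gurmukhi digits U+0A66..U+0A6F to ASCII "0".."9".
GURMUKHI_DIGITS = {0x0A66 + i: str(i) for i in range(10)}

# Hypothetical house-number shape: "12", "12-A", "45/2B"; no doubled separators.
HOUSE_NUMBER_RE = re.compile(r"[0-9A-Za-z]+(?:[/-][0-9A-Za-z]+)*")


def normalize_digits(text: str) -> str:
    """Replace Gurmukhi digits with their ASCII equivalents."""
    return text.translate(GURMUKHI_DIGITS)


def is_valid_age(raw: str, minimum: int = 18, maximum: int = 120) -> bool:
    """True if the field is a plain integer within the plausible voter-age range."""
    value = normalize_digits(raw.strip())
    return value.isascii() and value.isdigit() and minimum <= int(value) <= maximum


def is_valid_house_number(raw: str) -> bool:
    """True if the field matches the assumed alphanumeric house-number pattern."""
    return HOUSE_NUMBER_RE.fullmatch(normalize_digits(raw.strip())) is not None
```

A field that fails either check would be flagged for re-OCR or manual review rather than silently corrected, since a regex can only detect implausible values, not recover the right one.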
₹9,450 INR in 3 days
0.0
18 freelancers are bidding on average ₹7,752 INR for this job

Hi, I can extend your existing Python OCR pipeline to handle Punjabi voter-list PDFs with the same reliability you already have for Tamil/English. I’ll focus on classical OCR + OpenCV preprocessing, tuning segmentation, denoising, and language-specific post-processing to achieve >99% character-level accuracy, with only optional lightweight AI tricks (clearly documented and toggleable). You’ll get a standalone Punjabi module, clean setup instructions, and a verified sample run producing accurate Excel output. I’ll keep the design modular, so it’s easy to replicate later for Bengali, Oriya, Assamese, etc. Ready to start immediately.
₹7,000 INR in 3 days
5.5

I can extend your current OCR pipeline to Punjabi (Gurmukhi) using optimized OpenCV preprocessing and custom Tesseract language data, keeping the solution lightweight and independent of heavy AI. Any optional AI-based post-correction for reaching >99% accuracy will be documented and switchable. You’ll receive a standalone Python module, setup instructions with trained data files, and a verified sample run producing structured Excel output from a full Punjabi voter-list PDF. The system will be modular, making it easy to replicate later for other Indian languages. Ready to begin with a sample file.
₹15,000 INR in 1 day
5.2

Hello, I can deliver the Punjabi OCR module for your voter-list pipeline, ensuring it integrates seamlessly with your existing Python and OpenCV workflow. I have experience as a full-stack developer with specific expertise in developing automation and web scraping projects. I am available to start immediately and will provide a modular solution ready for your upcoming regional language expansions. My plan is to implement a high-accuracy pipeline using OpenCV for deskewing, noise reduction, and adaptive thresholding to optimise the scanned images. I will utilise a fine-tuned Tesseract 5.0 engine with Gurmukhi-specific configuration files to handle the tabular layout of electoral rolls. I will provide the standalone script and a sample Excel output demonstrating the verified character-level accuracy. Please reach out if you would like to discuss the specific layout of your Punjabi voter lists or the toggleable AI features. Excited to hear from you, Nehal
₹8,500 INR in 7 days
3.4

Hi there, Yes, this is feasible. I can extend your existing Python OCR pipeline to support Punjabi (Gurmukhi) voter-list PDFs while keeping the approach lightweight and accurate.

=> Build a standalone Python OCR module for Punjabi text + alphanumeric house numbers
=> Use classical OCR with OpenCV preprocessing (deskewing, binarization, noise removal)
=> Fine-tune Punjabi OCR data for high accuracy without heavy ML dependency
=> Optional lightweight AI enhancements (clearly documented and toggleable)
=> Export clean, structured Excel output matching existing pipeline format
=> Provide setup guide and any custom OCR data files

Acceptance-ready delivery
=> Run on a full Punjabi voter-list PDF
=> Verified character-level accuracy > 99%
=> Fast execution and no reliance on paid AI services

Code will be modular so the same framework can be reused later for Telugu, Bengali, Oriya, Assamese, Malayalam, Kannada, Marathi, and Gujarati. Ready to start immediately. Pavan Kumar A
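One way the reuse across languages could be organized is a small per-language profile registry, sketched here under assumptions. The `LanguageProfile` class, `PROFILES` table, and `get_profile` helper are hypothetical names; the three-letter codes are the standard Tesseract language codes, and joining codes with "+" is how Tesseract accepts multiple languages in its language argument.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageProfile:
    """Per-language settings; everything else in the pipeline stays shared."""
    name: str
    tesseract_code: str
    include_english: bool = True  # rolls mix native script with Latin letters and digits

    @property
    def lang_argument(self) -> str:
        """Language string in the form Tesseract accepts, e.g. 'pan+eng'."""
        if self.include_english:
            return f"{self.tesseract_code}+eng"
        return self.tesseract_code


PROFILES = {
    profile.name: profile
    for profile in (
        LanguageProfile("punjabi", "pan"),
        LanguageProfile("bengali", "ben"),
        LanguageProfile("oriya", "ori"),
        LanguageProfile("assamese", "asm"),
        LanguageProfile("malayalam", "mal"),
        LanguageProfile("kannada", "kan"),
        LanguageProfile("marathi", "mar"),
        LanguageProfile("gujarati", "guj"),
    )
}


def get_profile(name: str) -> LanguageProfile:
    """Look up a profile by case-insensitive language name."""
    try:
        return PROFILES[name.lower()]
    except KeyError:
        raise ValueError(f"No OCR profile registered for {name!r}") from None
```

Adding a language then means registering one more profile (plus any script-specific preprocessing), instead of forking the whole script.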
₹12,500 INR in 30 days
2.3

I can extend your existing Tamil + English voter-list OCR pipeline to **Punjabi** with the same Excel-ready accuracy. I’ve already built **multiple Indian-language voter-roll OCR scripts** using classical OCR + OpenCV (no heavy ML), delivering structured Excel outputs with strict accuracy targets. Punjabi will be added as a **standalone, modular Python module**, fully compatible with your current architecture.

What you’ll get:
* Dedicated Punjabi (Gurmukhi + English) OCR script
* Robust handling of alphanumeric house numbers
* OpenCV-based preprocessing + multi-pass OCR for >99% character accuracy
* Optional lightweight AI fallback (toggleable, documented)
* Clear setup instructions + any custom OCR data files
* Sample run on a full Punjabi voter-list PDF with verified Excel output

Built modularly so the same framework can later support Bengali, Oriya, Assamese, Malayalam, Kannada, Marathi, and Gujarati. Let’s connect in chat; I’ll show you my work as a demo.
₹7,000 INR in 1 day
1.8

Hello, good afternoon! I am a skilled mobile and computer programmer with skills including Data Extraction, OCR, Python, OpenCV, Image Processing and Large Language Models (LLMs). Please contact me to discuss this project further. Thanks and regards
₹4,584 INR in 3 days
0.0

Hello, I have carefully reviewed your project and fully understand the requirements. I am an experienced professional in extracting data from PDFs and images (JPG, JPEG, PNG) using OCR, as well as web scraping. I will extend your existing classical OCR pipeline using optimized preprocessing, Punjabi-specific traineddata, and rule-based post-correction, keeping the solution lightweight, fast and modular. I can accurately convert the extracted data into clean, well-structured text or CSV format and deliver error-free, organized data. You can confidently accept my proposal. I am ready to start immediately. Best regards, Rahul
₹2,500 INR in 5 days
0.0

I can extend your existing Tamil + English voter-list OCR pipeline to Punjabi while preserving your core design goals: classical OCR first, minimal AI, fully documented and modular.

How I’ll approach Punjabi OCR
• Script-specific OpenCV preprocessing (binarization, skew correction, line/character isolation tuned for Gurmukhi)
• Tesseract OCR with custom Punjabi (Gurmukhi) training data, optimized for voter-list layouts
• Rule-based post-processing and dictionary validation to push accuracy beyond 99%
• Optional lightweight AI assists (e.g., character confidence filtering or error correction) implemented as toggleable modules, fully documented
• Strict character-level accuracy validation against ground truth

Deliverables
• Standalone Python module for Punjabi OCR, compatible with your current pipeline
• Any custom trained OCR data files + clear setup instructions
• Sample run on a full Punjabi voter-list PDF with Excel output exceeding 99% character accuracy
• Modular structure designed for easy extension to Bengali, Oriya, Assamese, Malayalam, Kannada, Marathi, and Gujarati

What you can expect
• Zero paid AI dependencies
• Fast execution suitable for bulk rolls
• Clean, readable, reusable code
• Accuracy verified and documented

If this aligns, I can review a sample Punjabi PDF and your current pipeline structure and start immediately. This can be the template module for the remaining languages you plan to onboard.
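The toggleable confidence filtering mentioned above might be wired up along the following lines. This is a minimal sketch, assuming the OCR step already yields (text, confidence) pairs on a 0 to 100 scale; the `PostProcessConfig` class, its field names, and the default threshold of 80 are all illustrative choices, not values from the proposal.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class PostProcessConfig:
    """Switches for optional post-processing steps; all off by default."""
    use_confidence_filter: bool = False
    min_confidence: float = 80.0  # assumed 0-100 confidence scale


def flag_low_confidence(
    words: List[Tuple[str, float]], config: PostProcessConfig
) -> List[Tuple[str, bool]]:
    """Return (text, needs_review) pairs; text itself is never altered."""
    if not config.use_confidence_filter:
        # Toggle off: pass everything through unflagged.
        return [(text, False) for text, _ in words]
    return [(text, confidence < config.min_confidence) for text, confidence in words]
```

Because the filter only flags and never rewrites text, switching it off leaves the classical OCR output exactly as it was, which keeps the optional step easy to audit.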
₹10,000 INR in 7 days
0.0

I reviewed your project requirements. I understand you need to adapt your existing Python OCR pipeline (currently for Tamil/English) to handle Punjabi voter lists with high accuracy (>99%), while keeping the solution lightweight and modular.

My Approach: I specialize in Python and Artificial Intelligence. I can deliver a standalone script that utilizes:
• OpenCV for image preprocessing (noise reduction, binarization) to clean up the scanned electoral rolls.
• Tesseract (with Punjabi/Gurmukhi training data) or a lightweight optimized OCR model to extract the text and alphanumeric house numbers accurately.
• Excel integration to format the output exactly as required.

I am ready to document the setup clearly so you can easily toggle any AI features.

Why me: I have strong experience with Python, Data Structures, and Computer Vision. As a new freelancer on this platform, I am eager to prove my skills and will ensure this is delivered within your 5-day timeline. Best regards, Himanshu Sharma
₹7,000 INR in 5 days
0.0

I can extend your existing Python-based OCR pipeline to Punjabi voter-list PDFs while preserving the same classical, high-accuracy approach you already trust for Tamil and English. The focus will be on Gurmukhi script + alphanumeric house numbers, using OpenCV-driven preprocessing (deskewing, denoising, binarisation, layout/card segmentation) and custom-trained Tesseract OCR data for Punjabi, avoiding heavy ML models wherever possible.
₹7,000 INR in 7 days
0.0

Hello there, I’m available to start immediately and can plug into your existing Python OCR pipeline without disruption. I’ll adapt preprocessing for Gurmukhi scripts, tune Tesseract Punjabi data for electoral rolls, handle alphanumeric house numbers, and export clean Excel with verifiable >99% character accuracy. Any lightweight AI aids will be optional, documented, and toggleable. The module will stay fast, modular, and reusable. I’ve delivered large-scale SOP-driven OCR systems across languages, where accuracy, repeatability, and zero paid dependencies were non-negotiable. I design once and scale cleanly across scripts. Let’s review a sample PDF and lock accuracy targets. Regards, Md Laden Islam
₹9,000 INR in 7 days
0.0

I am a reliable freelancer with good English typing skills. I can do data entry, web research, copy-paste tasks and virtual assistant work. I am available full time and always deliver work on time.
₹7,000 INR in 7 days
0.0

Hello, I can adapt your existing Python OCR pipeline to handle Punjabi voter-list PDFs with >99% character-level accuracy, keeping the solution lightweight and free of paid AI dependencies. I have experience with OpenCV preprocessing, classical OCR techniques, and modular Python scripting, ensuring the output Excel is accurate and ready to use. I will provide a standalone script, setup instructions, and a sample run for verification. The solution will be designed for easy extension to other languages like Bengali, Oriya, Assamese, and more. Looking forward to delivering a precise and reliable Punjabi OCR module. Best regards, Mushtaque
₹7,000 INR in 2 days
0.0

Chennai, India
Member since Jan 23, 2026