AI to Extract Rate Formula from Text Description in PDF

Hello. This is a unique problem. Please provide a detailed proposal. Vague applications will be ignored. Speak to the problem. Looking for people with creative ideas.

The task is to extract a rate formula from a textual description in a PDF file.

In Texas, the electricity market is deregulated. Rates are defined by a document called an Electricity Facts Label (EFL). Several examples of EFLs are attached. Each of these PDFs describes, in words, a math formula.

There are thousands of these EFLs.

The Rate Formulas PDF file (attached) gives several examples of different descriptions, and a graph of the formulas that result.

Rates are a function of kWh, i.e. R(x), where x = kilowatt-hours.

EFLs include a spot pricing table at 500, 1000, and 2000 kWh. This shows the rate value at those precise points, i.e. R(500), R(1000), and R(2000). This is useful for testing whether an accurate rate formula has been found.
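
To make the testing idea concrete, here is a minimal sketch (Python only for brevity; the final version would be C#). It assumes, purely for illustration, that R(x) is an average price in cents per kWh and that the table values are rounded to one decimal place. The function name `check_spot_prices`, the sample plan ($9.95 base charge plus 10.0 cents per kWh), and the table values are all made up and do not come from a real EFL.

```python
from typing import Callable, Dict, Tuple


def check_spot_prices(
    rate: Callable[[float], float],
    spot_table: Dict[int, float],
    tolerance: float = 0.05,
) -> Dict[int, Tuple[float, float]]:
    """Compare a candidate R(x) against the EFL spot pricing table.

    Returns {kwh: (computed, published)} for every point that disagrees
    by more than `tolerance`. An empty dict means the candidate fits.
    """
    mismatches = {}
    for kwh, published in spot_table.items():
        computed = rate(kwh)
        if abs(computed - published) > tolerance:
            mismatches[kwh] = (computed, published)
    return mismatches


def candidate(kwh: float) -> float:
    # Hypothetical plan: $9.95 base charge (995 cents) + 10.0 cents per kWh,
    # expressed as an average price in cents per kWh.
    return (995.0 + 10.0 * kwh) / kwh


# Hypothetical spot pricing table: kWh -> published value.
efl_table = {500: 12.0, 1000: 11.0, 2000: 10.5}

print(check_spot_prices(candidate, efl_table))
```

A candidate that fits all three points returns no mismatches. One that misses any point can be rejected right away, and the returned values show how far off it was.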

C# source code has been attached. There are two console applications.

1) PowerToChooseScraper. This program will download all the EFLs currently in the market. Just give it a target folder and it will download the PDFs there. This program may have some little bugs, but should work for you.

2) PTC. This is old code. It is a first-draft attempt at a program to parse the PDFs and extract the rate formulas. The code hasn't been touched for many years. At the time it was written, it was looking good: not 100%, but it was getting ~65% accuracy.

I do not care if the existing PTC code is used or not. I also don't care if your work is in C# or something else, but whatever the solution, the final working version will end up in C#. If you want to use a language other than C# for developing the initial logic, I'll ask why. If using ML techniques, that could be a good reason.

This is a unique problem because it could be approached in a lot of ways. It could maybe be solved using ML techniques, or maybe with string similarity algorithms like Jaro-Winkler. The PTC code works by trying multiple approaches. It runs in a loop, stepping through methods, until it finds a solution. The approaches attempted are all fairly rudimentary. No learning algorithms have been attempted.

I also do not expect 100% accuracy, just as close as possible: ~95%. It's possible some EFLs have human errors in them, where the numbers are actually wrong and don't make sense. In that case the goal is to discover it. If a solution can't be found, we want to flag the EFL so a human can review it and determine what is going on. Over time we can improve the accuracy.
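
As a rough sketch of the "try methods in a loop, flag what can't be solved" idea (again Python only for brevity, and not how the PTC code actually does it): each strategy turns EFL text into a candidate formula, the first candidate that reproduces the spot pricing table wins, and anything else gets an error code. The strategy functions, the regular expressions, the sample text, and the `NEEDS_HUMAN_REVIEW` code are all hypothetical, and R(x) is assumed to be an average price in cents per kWh.

```python
import re
from typing import Callable, Dict, Optional, Tuple

RateFormula = Callable[[float], float]
NUM = r"(\d+(?:\.\d+)?)"  # plain decimal number


def energy_only(text: str) -> Optional[RateFormula]:
    """Hypothetical strategy: assume a flat energy charge and nothing else."""
    m = re.search(r"energy charge\D*" + NUM + r"\s*(?:¢|cents)", text, re.I)
    if not m:
        return None
    cents = float(m.group(1))
    return lambda kwh: cents


def base_plus_energy(text: str) -> Optional[RateFormula]:
    """Hypothetical strategy: monthly base charge (dollars) + energy charge."""
    e = re.search(r"energy charge\D*" + NUM + r"\s*(?:¢|cents)", text, re.I)
    b = re.search(r"base charge\D*" + NUM, text, re.I)
    if not (e and b):
        return None
    cents, dollars = float(e.group(1)), float(b.group(1))
    return lambda kwh: (dollars * 100 + cents * kwh) / kwh


STRATEGIES = [energy_only, base_plus_energy]


def fits(formula: RateFormula, spot_table: Dict[int, float], tol: float = 0.05) -> bool:
    # A candidate is accepted only if it reproduces every published point.
    return all(abs(formula(k) - p) <= tol for k, p in spot_table.items())


def solve(text: str, spot_table: Dict[int, float]) -> Tuple[Optional[RateFormula], str]:
    """Return (formula, strategy name), or (None, error code) if nothing fits."""
    for strategy in STRATEGIES:
        formula = strategy(text)
        if formula is not None and fits(formula, spot_table):
            return formula, strategy.__name__
    return None, "NEEDS_HUMAN_REVIEW"


text = "Base Charge: $9.95 per month. Energy Charge: 10.0 cents per kWh."
formula, how = solve(text, {500: 12.0, 1000: 11.0, 2000: 10.5})
print(how)
```

With this layout, a table that no strategy can reproduce (for example, a typo in the published numbers) falls through to the review code instead of producing a wrong formula, and new strategies, including learned ones, can be added to the list over time.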

I'm looking for the discrete logic that processes a single PDF and outputs the rate formula, or an error code if it can't be determined. The larger infrastructure to download and process these files, store the results in a database, etc., is a separate thing outside the scope of this project.

I will be working with you directly on this. I am an expert in C#, ML, and well versed in these EFLs. I can help guide your approach.

Skills: Data Extraction, Data Mining, Machine Learning (ML), C# Programming, PDF

About the employer:
(0 reviews) New York, United States

Project ID: #31562780

5 freelancers are bidding an average of $705 for this job

(88 Reviews)

Nice to meet you. I have checked your job descriptions and can do it perfectly. My work will include these steps - OCR: detect the text from pdf files - Extract the energy and rate info from OCR result - Estimate the f…

USD in 7 days
(12 Reviews)

I have read project requirements. I am managing director of software company and I have team for development so we can complete it perfectly. I am from India GMT +5:30 and I am available from 8:00 AM to 11:00 PM. We…

USD in 10 days
(13 Reviews)

Hi, I'm Hoss. I have a PhD in engineering and 15 years of professional programming experience in different languages. I read your full description (thank you for the full explanation). The project is a challenge and I was…

USD in 7 days
(2 Reviews)

My Background is Electrical Engineer only. This is my daily job, working on Electrical Bills. Preparing data from Electrical Bills is my work. I know if I get this work it will be a long time relationship. Thanks and r…

USD in 7 days
(0 Reviews)