I need a Python 3 script that will extract specific parts of text from a PDF file and generate an .rtf file with those text elements isolated, and some extra text added.
The input file is a typical "Written Discovery" request used in litigation (when people are suing/being sued). The first few pages have the case information and preliminary definitions. Then there are a series of numbered requests that all have the same title, but with a different number. Following the title, there is the text of the discovery request.
In the example I am using (see file: [login to view URL]), each discovery request is titled "SAMPLE DISCOVERY REQUEST" but in "real world" documents they could be a number of different titles (eg SPECIAL INTERROGATORY, REQUEST FOR PRODUCTION, REQUEST FOR ADMISSION), although they will all be the same in a single document.
I would like a script that uses OCR to extract all of the discovery requests and put them into an .rtf file with limited formatting (bold and underlined). Each request should be followed by the following text: RESPONSE TO [TITLE OF DISCOVERY REQUEST] (see file: [login to view URL])
What makes this tricky is that these documents always appear on "Pleading Paper" where each line is numbered on the left hand side of the page. This causes the text in OCR output to be interrupted by numbers (see file: OCR [login to view URL]).
The script will need to determine when the requests begin, what they are called, list all of them (in the example, there are 9, but 30-50 is more typical) and add in the "RESPONSE TO" language.
This is the first step in what could be a much larger project for the right developer.
Please let me know if you can handle this project in a short timeframe, and if you have any questions.
6 freelance font une offre moyenne de $214 pour ce travail
Hello Sir, I am the expert freelancer here. I am on the 6th position through out the world to deliver the quality job. I have deliver here more than 395 + projects with 100% client satisfaction. I can help you Plus
Dear client, Thanks a lot for taking your precious time to read my message. After browsing your job description, I am very interested in your project and I believe I’m qualified for the task. Regarding OCR, I Plus
Hi, there - My name is Phong. I have read your job description and I am very interested in this project because I have good experience with OCR and python programming. I would love to speak with you further about tak Plus
Hello!\nI am a python developer.\nI looked at your project and it seems interesting.\nI have all necessary skills required for this project.\nPing me to discuss in detail.
As an: 1- Bachelor of Applied Mathematics 2- Engineer in Statistics and Applied Economics 3- Microsoft Office Technician (Microsoft Graduate: MOS Diploma), I am able, available and ready to do this kind of work in Plus
hello,how are you. i read your bid carefully. i am OCR expert and have full experience for 7 years. python, c++, tesseract is my top skill and i can complete your project by using that skills. I can provide most qua Plus