I require a Java command-line program that automatically extracts specific highlighted text from PDFs.
The program should use either Apache PDFBox or Apache Tika.
I have created a test dataset of 10 PDFs. In each PDF I have highlighted sections of text, using a different highlight color (and name) for each type of section. The section types are named Objective, MethodStats and Limitations. The test dataset can be found at: [login to view URL]
The program should read in a PDF and then print to the screen ([login to view URL]) the highlighted text belonging to the section requested on the command line.
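One step in this that is easy to get wrong is turning a highlight's position into a region that text can be extracted from. The sketch below is only an illustration, under these assumptions: the highlights are standard PDF text-markup annotations, their QuadPoints array (which PDFBox exposes through PDAnnotationTextMarkup.getQuadPoints()) holds 8 numbers per highlighted quad in bottom-left-origin page coordinates, and the text is then pulled out with PDFBox's PDFTextStripperByArea, whose regions are measured from the top-left of the page. The class and method names (QuadRegions, toRegions) are hypothetical, and the code uses only the Java standard library so it can be tried without PDFBox. How the section name is stored on each annotation (color, contents, author or subject) is not specified above and would need to be checked against the test PDFs.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: converts highlight QuadPoints into extraction regions. */
public class QuadRegions {

    /**
     * Assumes quadPoints holds 8 floats per quad (x1 y1 x2 y2 x3 y3 x4 y4)
     * in PDF user space, where y grows upward from the bottom of the page.
     * Returns one bounding rectangle per quad with y measured from the top
     * of the page, so pageHeight is needed to flip the axis.
     */
    public static List<Rectangle2D.Float> toRegions(float[] quadPoints, float pageHeight) {
        List<Rectangle2D.Float> regions = new ArrayList<>();
        if (quadPoints == null) {
            return regions;
        }
        // Step through complete groups of 8 values; ignore any trailing partial quad.
        for (int i = 0; i + 7 < quadPoints.length; i += 8) {
            float minX = Float.MAX_VALUE;
            float minY = Float.MAX_VALUE;
            float maxX = -Float.MAX_VALUE;
            float maxY = -Float.MAX_VALUE;
            for (int j = 0; j < 8; j += 2) {
                float x = quadPoints[i + j];
                float y = quadPoints[i + j + 1];
                minX = Math.min(minX, x);
                maxX = Math.max(maxX, x);
                minY = Math.min(minY, y);
                maxY = Math.max(maxY, y);
            }
            // Flip the y axis: the top edge of the box becomes pageHeight - maxY.
            regions.add(new Rectangle2D.Float(
                    minX, pageHeight - maxY, maxX - minX, maxY - minY));
        }
        return regions;
    }
}
```

With PDFBox, each returned rectangle could be registered via PDFTextStripperByArea.addRegion and read back with getTextForRegion after calling extractRegions on the page. A multi-line highlight usually has one quad per line, so the per-quad texts would be joined in order.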
Example:
java -jar extractHighlight.jar PDFname Objective
java -jar extractHighlight.jar PDFname Limitations
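To make the expected command-line contract concrete, here is a minimal sketch of how the two arguments might be validated before any PDF work starts. The class name HighlightArgs, the usage message and the case-insensitive matching of section names are assumptions for illustration, not requirements; only the three section names come from the description above.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical holder for the two command-line arguments. */
public class HighlightArgs {

    // The three section names used in the test dataset.
    static final List<String> SECTIONS =
            Arrays.asList("Objective", "MethodStats", "Limitations");

    final String pdfPath;
    final String section;

    private HighlightArgs(String pdfPath, String section) {
        this.pdfPath = pdfPath;
        this.section = section;
    }

    /**
     * Expects exactly two arguments: the PDF path and a section name.
     * Matching ignores case (an assumption) and returns the canonical
     * spelling; anything else raises IllegalArgumentException.
     */
    static HighlightArgs parse(String[] args) {
        if (args == null || args.length != 2) {
            throw new IllegalArgumentException(
                    "Usage: java -jar extractHighlight.jar <PDF> <Section>");
        }
        for (String known : SECTIONS) {
            if (known.equalsIgnoreCase(args[1])) {
                return new HighlightArgs(args[0], known);
            }
        }
        throw new IllegalArgumentException(
                "Unknown section '" + args[1] + "', expected one of " + SECTIONS);
    }
}
```

A main method could call HighlightArgs.parse(args), catch IllegalArgumentException to print the message and exit with a non-zero status, and otherwise pass pdfPath and section on to the extraction code.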
Deliverables include the following:
a) Source code with documentation
b) Jar file