PDF Content Extraction

KuJaX

Is there some type of tool, whether command-line based or an [open source] web app, where you can input a PDF file and have it spit out specific information from the PDF that matches criteria you set up beforehand?

I get a lot of PDFs that have the same kind of information in the exact same spot (with different values each time), such as billing name, address, phone number, etc. The fields are in the same spot every time because the PDFs are generated from a template.

Anyway, I hate having to look at each one and copy and paste junk. It would be ideal to take one of these template-based PDFs, drop it into a web app or run a command-line command against it, and get a CSV or something with the data points.

:)
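For what it's worth, the "same spot every time" idea can be sketched in a few lines of Python. This is only a rough sketch under assumptions, not a finished tool: the field names and box coordinates in TEMPLATE are made up and would need to be measured once on a sample PDF, and the word dictionaries are assumed to have the shape returned by the third-party pdfplumber library's `extract_words()` (keys `text`, `x0`, `x1`, `top`, `bottom`). The helper names are hypothetical.

```python
import csv
import io

# Hypothetical template: field name -> (x0, top, x1, bottom) in PDF points.
# These coordinates are invented for illustration; measure real ones once
# on a sample PDF produced from your template.
TEMPLATE = {
    "billing_name": (50, 100, 300, 115),
    "address": (50, 120, 300, 150),
    "phone": (350, 100, 500, 115),
}


def words_in_box(words, box):
    """Join the words whose centre point falls inside the given box."""
    x0, top, x1, bottom = box
    hits = []
    for w in words:
        cx = (w["x0"] + w["x1"]) / 2
        cy = (w["top"] + w["bottom"]) / 2
        if x0 <= cx <= x1 and top <= cy <= bottom:
            hits.append(w)
    # Rough reading order: by line (rounded top), then left to right.
    hits.sort(key=lambda w: (round(w["top"]), w["x0"]))
    return " ".join(w["text"] for w in hits)


def extract_fields(words, template=TEMPLATE):
    """Map every template field to the text found inside its box."""
    return {name: words_in_box(words, box) for name, box in template.items()}


def to_csv(rows, fieldnames):
    """Render a list of field dictionaries as CSV text."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def words_from_pdf(path, page_number=0):
    """Read positioned words from one page (requires: pip install pdfplumber)."""
    import pdfplumber  # third-party, imported lazily so the rest runs without it

    with pdfplumber.open(path) as pdf:
        return pdf.pages[page_number].extract_words()
```

Usage would be something like `to_csv([extract_fields(words_from_pdf(p)) for p in paths], list(TEMPLATE))`. This only works for PDFs with a real text layer; scanned images would need OCR first.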
 
If the form was created and saved a certain way within Acrobat, you can use the "Distribute PDF" functionality to send it out and then gain the information back. I've never done it this way so I'm not sure what type of info comes back or how it can be manipulated from that point, but I know it does send you back an info file (basically) containing nothing but the information entered by the end user/customer.

More info: Creating and distributing PDF forms in Adobe Acrobat DC (not limited to DC by any means; this has been in Acrobat since at least IX and probably earlier)
Collecting and managing PDF form data, Adobe Acrobat (says it can be exported as a CSV)
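If the PDFs really are fillable forms (that is an assumption; a template-generated PDF may just be flat text), the field values can also be read straight out of the file without Acrobat. A minimal sketch, assuming the third-party pypdf library is installed; `form_fields` and `write_rows` are hypothetical helper names, and this is not what Acrobat's own export does internally:

```python
import csv


def form_fields(path):
    """Return {field name: value} for the text fields of a fillable PDF."""
    from pypdf import PdfReader  # third-party (pip install pypdf), imported lazily

    reader = PdfReader(path)
    # Fall back to an empty dict if the PDF has no form fields at all.
    return reader.get_form_text_fields() or {}


def write_rows(rows, out):
    """Write one CSV line per PDF, using the union of all field names as columns."""
    names = sorted({key for row in rows for key in row})
    writer = csv.DictWriter(out, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # Unfilled fields come back as None; write them as empty cells.
        writer.writerow({k: "" if row.get(k) is None else row[k] for k in names})
```

Usage would be along the lines of `write_rows([form_fields(p) for p in sys.argv[1:]], sys.stdout)` in a small script, which gives one CSV row per PDF passed on the command line.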
 