Converting .pdf based document portal to HTML

Baredor

Greetings all,

My company implemented an Adobe-based document portal a year or so ago, and while it might not have been the ideal solution, it was fine for what it did. We simply created a PowerPoint slide, printed it to .pdf, and then threw in the links. The problem is that it has grown immensely and is now a rather large pain for our department and the users. As you can imagine, it's massive in size and is hogging bandwidth and disk space.

What would be the best solution to convert everything to HTML? Is there any program that can do this, or am I looking at rewriting the entire thing? :( I'm really not familiar with web publishing at all, but if I can get someone to point me in the direction I need to go, I can start researching the process.

Many thanks.
 
What is it that you want to convert to HTML? The PDF files? You can't, without a loss of fidelity.
 
Yeah, that's basically what I'm trying to determine - whether there's any way to do it or if there are many nights of Dreamweaver in my future. And forgive the question, but what does fidelity refer to? The link structure?
 
"Fidelity" means accuracy, particularly in the details.

HTML gives you coarse two-dimensional layout control; you can only use a few fonts, and so on. There are things that simply can't be represented.

PDF files are based on PostScript, and that language offers incredible control over layout, formatting, constructs on the page, and so on. You can even write programs to compute stuff, then show the results, or use those calculations to draw graphics.

How will you convert such things to HTML, when they simply don't exist in HTML?

There are PDF to HTML converters; you can find 'em if you search the web. They'll vary in quality. If these PDF files are very regular, don't do anything fancy, and so on, you'll have an easy go of it. But even then, I'll bet you have to write a tool to post-process the generated HTML files and shape them up.
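As a rough idea of what such a post-processing tool might start with, here is a minimal Python sketch that pulls every link target out of a converter's HTML output so you can compare it against what the page was supposed to link to. It uses only the standard library's html.parser; the names LinkCollector and collect_links are made up for this example, and it assumes the converter emits ordinary <a href="..."> tags, which may not hold for every converter.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(html_text):
    """Return the link targets found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    # Assumed sample of converter output; real output will look different.
    sample = '<p><a href="forms/expense.pdf">Expense form</a></p>'
    print(collect_links(sample))
```

This only lists the links; it does nothing about where the converter positioned them on the page, so any layout cleanup would be a separate step.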

You might want to write your own converter; Adobe makes an SDK that will let you pick apart PDF files. It's not free, and if you've got limited programming skills it will be a challenge to use.
 
Good luck with it, if that's what you decide to do.

But I'm not sure you're on the right track. You might instead invest in tools to make the PDF files more manageable. Disk space isn't so expensive, even if you remember to count the cost of backing it up and keeping it highly available. Bandwidth can get expensive, but bandwidth problems might mean you have a problem with your PDFs. For instance, when you create a PDF you can set lots of options to reduce its size -- knock down the resolution, don't embed fonts or graphics in it, and so on.

Perhaps, if you think the project through very carefully, you'll find that you're really treating a symptom rather than the problem itself.
 
Well, I probably gave a bad impression of what the worst issues are. Bandwidth isn't really a problem, though saving any we can would be well and good. We actually knock the .pdf down to 144dpi, and so the average page is around 60-80KB. We would like to have the portal a little bit faster and easier to use than it is now, but the major thing for us is our own time. I spent 4 hours last night updating new documents on it. And anytime I make a change to a page, I have to go back and relink everything in it. That might not be normal, and I don't really know enough to explain the technical "why" that's the case.

The pages are very simple, graphically. It's basically just a little background with some links on it. We would not convert the actual .pdf / other files that the users are attempting to get to, just the stuff on the way to it. I downloaded a demo converter program last night and gave it a try. Things came out decently, except that the location of the links is being thrown off:

The yellow shade is the link. The red line is where it was originally and still should be.
demo.jpg


Someone else said that the same thing happens when they try to convert with Publisher. /shrug

So basically, we're trying to set something up that is overall better, but most importantly requires less regular lengthy maintenance. It's ok if it takes a while to get going, as long as once it's done, it's easy to maintain. I'm going to just keep looking for options and trying stuff until something clicks.
 
PDFs on the web suck. It's one thing if you need to present printed material (manuals, forms, etc...) but in most cases, you just need to beat the idiots responsible until they realize that the web is -not- the place to go for perfect document presentation - it's far more important that the information be correct, up to date and accessible... looking nice is just a bonus. PDFs are great for printing, but painful to navigate, and they break the flow of navigating the web (they're paginated, links are funny, text and images are often scaled completely wrong, every time you hit one, you have to load up the PDF viewer...)

Going to an HTML-based site is definitely a step in the right direction.

Sounds like it's time for you to look into setting up some sort of CMS.
 
ameoba said:
PDFs on the web suck... PDFs are great for printing, but painful to navigate, break the flow of navigating the web (they're paginated, links are funny, text and images are often scaled completely wrong, every time you hit one, you have to load up the PDF viewer...)

Exactly what I was trying to say. Any suggestions on an intensely dumbed down CMS? You see what kind of rudimentary graphics we're doing - simplicity and ease of use (since I would have to train 3 other people on it) is all that we would require.
 
There's no automated process I'm aware of that will do what you're looking for. You should be able to replicate the PDFs fairly quickly with a decent understanding of HTML and CSS, though. I don't know what your skills are, but depending on how large and difficult the task is, it might be cheaper for your company to outsource it to somebody who can do it in a snap, rather than paying (and/or frustrating) you.

Also, for the CMS thing, look into LucidCMS. It might be up your alley.
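
On replicating the link pages in HTML: since they're described as just a background with some links, one low-tech option short of a full CMS is to generate each page from a plain list of links, so adding a document means editing one list instead of relinking a whole page by hand. Below is a minimal sketch in Python, assuming that approach; build_link_page and the file names are hypothetical, and the markup is deliberately bare (no styling or background).

```python
from html import escape


def build_link_page(title, links):
    """Return a complete HTML page listing the given (label, href) pairs."""
    items = "\n".join(
        f'    <li><a href="{escape(href, quote=True)}">{escape(label)}</a></li>'
        for label, href in links
    )
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "<head>\n"
        f"  <title>{escape(title)}</title>\n"
        "</head>\n"
        "<body>\n"
        f"  <h1>{escape(title)}</h1>\n"
        "  <ul>\n"
        f"{items}\n"
        "  </ul>\n"
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical documents; replace with the portal's real titles and paths.
    documents = [
        ("Vacation Policy", "docs/vacation-policy.pdf"),
        ("Expense Form", "docs/expense-form.pdf"),
    ]
    with open("index.html", "w", encoding="utf-8") as out:
        out.write(build_link_page("HR Documents", documents))
```

Re-running the script after changing the list rewrites the whole page, which is the point: the links live in one place and nothing has to be relinked manually.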
 