Do PDFs have to be so frustrating?

Did you know the PDF format is almost as old as the web itself? Despite its age, it is still the only true cross-platform “paper-like” experience available on the web. Unfortunately, it is also one of the most frustrating file formats on the web.

The primary reason for this frustration is rooted in the format itself. Its maintainers, like most maintainers of old technology still in use, have made a half-hearted attempt to keep pace with the internet as it evolved. Those efforts are typically rooted in the philosophy of wanting to have your cake and eat it too. Specifically, they have tried to build a mega-format that retains 20+ years of backward compatibility while bolting more features on.

This combination has resulted in a very complicated format with more options than you can shake a stick at. Many of these features are either not documented or, worse, documented incorrectly. ISO has been making an effort to address the documentation issue, and the latest documents are much better, but the fact remains that PDF is both old and complicated.

Generally, this means application developers who need to support PDF have to choose between two options: define some minimal profile of the standard that fits their needs and build their own libraries, or license a more complete library with onerous terms and fees attached.

To complicate things further, these commercial libraries are usually written in languages like Java, C and PHP, which don’t exactly represent the most modern development platforms for the web.

As a result, almost all solutions that work with PDFs, where the PDF itself is not the “product”, do the natural thing: create an image and wrap it in a PDF file. They call this approach “flattening” the document; I can only assume this is to make it sound less hacky.

While there are legitimate cases for flattening a document in this way, it causes a few problems. For example:
– Accessibility tools like screen readers no longer work,
– You can no longer select text,
– You can no longer extract the field data entered into forms,
– The document can no longer reflow to be readable on smaller screens,
– And more…

An interesting observation is that nearly every document signing solution I have seen flattens the document as well. From an engineering standpoint I understand why they make this decision: it’s much easier to do, and there is an argument to be made that, long term, images are easier to handle. With that said, the downsides of this approach are significant.

At Microsoft we used to talk about being “authentically digital”, the idea being that you want to embrace the good things about the physical world’s way of doing things, but you also need to be true to the technology.

The technical baggage of the PDF format and the lack of freely available SDKs basically put developers in a no-win situation, forcing them to give up the best parts of the digital medium if they want to work with these documents.

Are these non-flattened documents a replacement for the more modern file formats? No, they are not. But equally, at least when you consider cross-platform needs, neither are those formats a replacement for PDF.

In short, I think PDFs do not need to be so frustrating! However, if we are going to keep using the format, we need to go the extra mile to retain their digital goodness.
