31 Jan The Nightmare of Translating PDF Files
As we all know, the PDF file format is widely used in everyday digital dealings when exchanging information, technical data, general documentation, marketing materials or any other kind of documents between all sorts of counterparts, associates, customers and audiences.
This is not by chance. PDF files are lightweight, display consistently on most devices, and are very user-friendly, with built-in features for highlighting and adding notes and comments, while simultaneously preventing unauthorized editing and tampering. It seems only natural that companies and individuals seeking translation services would provide this file type for translation: they consider it the most reliable and common format and probably aren’t even aware that the document may have started out in a different one. Most documents are created in another program or format, such as Microsoft Word, PowerPoint, InDesign, FrameMaker or HTML, to name a few, and then converted into PDF for publishing and sharing.
However… PDF files can give real hell to translation companies.
Regardless of where the PDF file came from, one thing is for sure: if the PDF is not configured in the right way, its contents will not be accessible or editable. This poses a problem even before the translation project can get off the ground. If you require a quote at a per-word rate, the translation company cannot provide one right away, as it has no way of determining the word count instantly. There are three ways out of this stalemate: counting the words manually (which can obviously take a very long time), using separate software to perform a word count on the PDF, or converting the file into an editable format. The last option will probably be the most reasonable choice and will become a necessary part of the translation project. Which brings us to the next topic.
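To make the quoting problem concrete, here is a minimal sketch of a per-word quote, assuming the text has already been extracted from the PDF into a plain string. The function names and the rate of 0.10 per word are hypothetical and only there for illustration; real LSPs may count words differently.

```python
import re


def count_words(text: str) -> int:
    """Count whitespace-separated tokens that contain at least one letter or digit."""
    return sum(1 for token in text.split() if re.search(r"\w", token))


def estimate_quote(text: str, rate_per_word: float) -> float:
    """Multiply the word count by a per-word rate (the rate is an assumed value)."""
    return round(count_words(text) * rate_per_word, 2)


sample = "PDF files can give real hell to translation companies."
print(count_words(sample))           # number of words found in the sample
print(estimate_quote(sample, 0.10))  # quote at the assumed rate
```

The point is simply that neither number can be computed until the text is available as text, which is exactly what a locked or scanned PDF prevents.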
Time to convert the file into an editable format! So… How do we do that?
You can simply open the PDF file with a word processor, such as Microsoft Word, or you can use OCR (Optical Character Recognition) software. But what if the PDF was created from a poorly scanned image? Moreover, what if it was created in desktop publishing software? A word processor will not recognize scanned text as text, but rather as an image, and OCR software does not fare well at interpreting and transferring complex layouts. If the characters are set in a non-standard font or layout, or if the document was not scanned properly, the software will not be able to recognize the characters and, in most cases, will insert gibberish into the resulting editable file.
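One rough way to spot this kind of gibberish before it reaches a translator is to measure how many tokens in the converted text do not look like ordinary words. The sketch below is only a heuristic under stated assumptions: the pattern, the function name `suspicious_ratio` and the sample strings are all hypothetical, and this is not how any particular OCR tool works.

```python
import re

# A "clean" token: letters or digits, optionally joined by an inner apostrophe,
# period, comma or hyphen, with optional brackets and trailing punctuation.
CLEAN_TOKEN = re.compile(r"^\(?[^\W_]+(?:['’.,-][^\W_]+)*[).,;:!?]*$")


def suspicious_ratio(text: str) -> float:
    """Return the share of tokens that do not match the clean-token pattern."""
    tokens = text.split()
    if not tokens:
        return 0.0
    bad = sum(1 for token in tokens if not CLEAN_TOKEN.match(token))
    return bad / len(tokens)


print(suspicious_ratio("The quick brown fox."))  # clean text scores low
print(suspicious_ratio("Th3 qu!ck br@wn f#x"))   # garbled text scores high
```

A high ratio suggests the conversion needs manual clean-up; a low ratio does not prove the text is correct, since OCR can also produce plausible-looking but wrong words.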
If, sadly, the result of the conversion is less than desirable, we end up with a very messy document with unintelligible text and chaotic formatting: full of bizarre characters, misleading look-alike characters, jumbled punctuation, and complex layers of styles arranged in different text blocks. DTP stands for Desktop Publishing and, in this context, could be described very simply as the task of fixing these formatting and layout issues and rearranging text blocks to produce a file where the text flows as naturally as in the original. A skilled DTP professional can identify which editing tools and software are best suited to each case and has a confident mastery of them. DTP is usually very time-consuming and needs to be carried out by someone with the appropriate training. These two factors run up significant costs for Language Service Providers (known as LSPs). However, only after this task is correctly completed do we end up with a trustworthy editable file to work on, which also provides an accurate word count for quoting purposes! It is finally time for the actual Translation.
After the Translation is complete, there follows another round of DTP to make sure the translated file looks the same as the source file. As one might expect, because the content is now in a different language, the formatting is sometimes not even close to that of the source file. For example, if you translate a file from English into Portuguese, you can end up with a Translation containing up to 30% more words than the source text. So, where do we squeeze in 30% more content? How do we make it look the same as the source file? The answer is, once again, found in the unsung heroes who provide DTP services. Their expertise and skill will ensure both files are as identical as possible, and that the intended audience does not notice any glitches.
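The arithmetic behind that squeeze can be sketched in a few lines. The 30% figure is the upper bound mentioned above for English into Portuguese; the function names and the idea of a fixed word capacity for a layout are hypothetical simplifications, since real layouts are constrained by space rather than by word count.

```python
def expanded_word_count(source_words: int, expansion: float = 0.30) -> int:
    """Estimate the translated word count for a given expansion rate."""
    return round(source_words * (1 + expansion))


def overflow_words(source_words: int, capacity: int, expansion: float = 0.30) -> int:
    """Estimate how many words would not fit in a layout holding `capacity` words."""
    return max(0, expanded_word_count(source_words, expansion) - capacity)


print(expanded_word_count(1000))   # estimated words after translation
print(overflow_words(1000, 1100))  # words the DTP stage has to make room for
```

Whatever does not fit has to be absorbed by the DTP stage, for instance by adjusting font sizes, spacing or text boxes.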
As with all written material, there is always room for improvement and slight oversights tend to creep in where you least expect. Very seldom do we get a spotless, finished document with impeccable spelling and perfect typesetting right off the Translation and Editing stages! That is why proofreading exists! This last stage concerns itself solely with the Final Document as it is intended to be seen by the end user. This task plays out as a sort of “tennis match” between the proofreader and the DTP professional. The proofreader spots language, context and readability issues stemming from the typesetting and serves the ball back to the DTP Wizard for the necessary amendments and readjustments.
Eventually, the file is finalized. A new version of the PDF file is created in the target language and everyone is happy with a job well done!
It is now clear that translating PDF files can be a very tricky process. When you need to procure translation services, this might be a factor you will want to keep in mind. This way you can expect the very best results you want and deserve, and also keep costs in check while having a more productive and supportive relationship between you and your Translator / LSP.
Having all the content in an editable format beforehand will cost you less money, the assignment will involve far fewer problems, and the process will be much quicker and more reliable. Sometimes all it takes is a quick question to your documentation specialist to get your hands on an editable file and save time, money and hassle.