PDFs - Experiences and Importers
Learn about the benefits and challenges associated with using Portable Document Format files, as well as tips on importing them.
PDF (Portable Document Format) is a file format that presents documents consistently, regardless of the device or software used to view them. We support several types of PDF experience.
We now render PDFs by displaying the actual PDF page rather than an image of it (True PDF). True PDFs are much higher quality and have been enabled automatically everywhere since 07/04/22. PDFs uploaded before that date only use True PDF if they were explicitly set to do so.
Enhanced PDFs are True PDF pages with tappable hotspots that link to HTML versions of the content. These are usually curated by a third-party company called XCago, which sends them back to us via SFTP to be uploaded automatically to the CMS.
Note: Enhanced PDF editions are curated by the NITF manifest (see NITF Manifest - Enhanced PDFs and articles). Manually added articles cannot be curated into any other position in these editions, so they will only appear at the top of the edition.
Double Page Spreads
Many magazines are designed with double page spreads in mind: articles that span both the left and right pages of the physical magazine, with the content often running across both. Pugpig can render double page spreads in landscape mode. When it does, the pages fit the height of the screen so you can see an overview of the entire spread, and you can zoom in to read the detail.
The first page (normally the cover) will sit on its own on the right of the screen, and the last page will sit on its own on the left.
Double Page Spreads can be used in conjunction with Enhanced PDF so you can click between the HTML and PDF views.
Saving a double page spread currently saves the left and right pages as separate items.
Sharing a double page spread shares the URL of the first page, so that a user who opens the link in the app in portrait mode sees the correct content.
Double page spreads are enabled in the "Pugpig Full Page Image Service" settings in Express, by changing the setting from None to either First page on left or First page on right. On Left displays the cover of the edition next to the first inside page. On Right displays the cover on its own, followed by the first two inside pages together.
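To make the difference between the two settings concrete, here is a minimal sketch of how page numbers could be grouped into landscape spreads. It is an illustration only, not Pugpig's implementation: the function name `build_spreads` and the `"left"`/`"right"` values are hypothetical stand-ins for the Express setting, and it assumes any unpaired final page is shown on its own.

```python
from typing import List


def build_spreads(page_count: int, first_page: str) -> List[List[int]]:
    """Group 1-based page numbers into landscape spreads (illustrative sketch).

    first_page is a hypothetical stand-in for the Express setting:
      "left"  -> cover sits next to the first inside page: [1, 2], [3, 4], ...
      "right" -> cover sits on its own, then inside pages pair up: [1], [2, 3], ...
    An unpaired final page is shown on its own.
    """
    if page_count < 1:
        return []
    if first_page not in ("left", "right"):
        raise ValueError("first_page must be 'left' or 'right'")

    spreads: List[List[int]] = []
    page = 1
    if first_page == "right":
        spreads.append([1])  # cover alone on the right-hand side
        page = 2
    while page <= page_count:
        # Pair this page with the next one if it exists, otherwise show it alone.
        spreads.append(list(range(page, min(page + 2, page_count + 1))))
        page += 2
    return spreads


print(build_spreads(6, "right"))  # [[1], [2, 3], [4, 5], [6]]
print(build_spreads(6, "left"))   # [[1, 2], [3, 4], [5, 6]]
```

With First page on right and an even page count, the last page ends up on its own, which matches the layout described above. The first element of each spread is also the page whose URL would be shared.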
Some users wish to download a full PDF file for sharing or printing. If you wish to enable this, read this.
(Deprecated) Image PDF
This method creates a JPEG image of each PDF page and displays it in the app. However, the quality is lower, and we no longer use it anywhere for Bolt apps.
PDF Page Importer
The PDF Page Importer is how we upload most PDFs to Express, either as one whole PDF file or as multiple single-page PDFs.
We need to know in advance the filename format we will be receiving, because we use the filename of a PDF to derive all of its information: name, user-facing date, where in the app it will appear, etc. This format cannot change without prior agreement; otherwise uploads will either fail or carry the wrong information. Filename conventions work best when files are sent to the SFTP automatically from a consistent system.
Some things to bear in mind when we agree the convention:
- Find out whether the edition key matters for any other systems. In particular, it does if the client uses issue-based auth or sells single-issue IAPs. If the key is not the same across these systems, the filename convention becomes much more awkward.
- Find out whether we need filter groups, and if so, make sure something in the filename tells us this.
- If the name is a date (e.g. Jan 2035) or an issue number (e.g. Issue 234), we can build it from fragments of the filename.
- If the name is a free-form string (e.g. King's Coronation Special Edition), we suggest the client changes it in our CMS afterwards.
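As a rough illustration of building edition details from filename fragments, the sketch below parses one possible convention. The pattern `<PUB>_<YYYY>-<MM>_<ISSUE>[_<FILTER>].pdf`, the key format and the `EditionInfo` fields are all hypothetical; the real convention is whatever is agreed with the client.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical convention, for illustration only:
#   <PUB>_<YYYY>-<MM>_<ISSUE>[_<FILTER>].pdf   e.g. MAG_2035-01_234_UK.pdf
FILENAME_RE = re.compile(
    r"^(?P<pub>[A-Z0-9]+)_(?P<year>\d{4})-(?P<month>\d{2})_(?P<issue>\d+)"
    r"(?:_(?P<filter>[A-Z]+))?\.pdf$"
)
# Fixed English abbreviations, so the result does not depend on the system locale.
MONTH_ABBR = ("Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")


@dataclass
class EditionInfo:
    key: str                     # hypothetical key; must match any auth / IAP systems
    name: str                    # user-facing name built from filename fragments
    edition_date: date           # user-facing date
    filter_group: Optional[str]  # None when the filename has no filter fragment


def parse_filename(filename: str) -> EditionInfo:
    match = FILENAME_RE.match(filename)
    if match is None:
        # Reject unexpected filenames rather than guessing at their meaning.
        raise ValueError(f"filename does not match the agreed convention: {filename}")
    year, month, issue = int(match["year"]), int(match["month"]), int(match["issue"])
    edition_date = date(year, month, 1)  # raises ValueError for an invalid month
    return EditionInfo(
        key=f"{match['pub']}-{issue}",
        name=f"Issue {issue} ({MONTH_ABBR[month - 1]} {year})",
        edition_date=edition_date,
        filter_group=match["filter"],
    )


print(parse_filename("MAG_2035-01_234_UK.pdf").name)  # Issue 234 (Jan 2035)
```

A free-form title such as King's Coronation Special Edition cannot be recovered from a pattern like this, which is why it is better edited in the CMS afterwards.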
If you prefer, you can also upload editions manually.
To do so, open the PDF Importer in the Tools section of the CMS, then drag and drop your PDF file into the box provided. Within a few minutes it will appear in the Editions section.
It uploads without a name or date, but you can fill in all of this information by clicking into the edition.
To automate the PDF flow to Express, we use an SFTP server, which needs to be set up on our end. For this we need the following information:
- Whether PDFs will be sent as one whole, or each page individually
- Who will need access to the SFTP, and their SSH public keys
- The Express site code
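Knowing whether PDFs arrive whole or page by page matters because single-page files have to be reassembled into editions. The sketch below shows one way that could work, assuming a hypothetical `<EDITIONKEY>_p<NNN>.pdf` naming scheme; it is not the importer's actual logic.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Hypothetical per-page convention, for illustration only: <EDITIONKEY>_p<NNN>.pdf
PAGE_RE = re.compile(r"^(?P<key>.+)_p(?P<page>\d+)\.pdf$")


def group_pages(filenames: List[str]) -> Dict[str, List[str]]:
    """Group single-page PDFs by edition key, in page order.

    Raises ValueError if a filename does not match the convention or an
    edition has a gap in its page numbers, since an incomplete edition
    should not be imported.
    """
    pages: Dict[str, Dict[int, str]] = defaultdict(dict)
    for name in filenames:
        match = PAGE_RE.match(name)
        if match is None:
            raise ValueError(f"unexpected filename: {name}")
        # Store by integer page number so p10 sorts after p9.
        pages[match["key"]][int(match["page"])] = name

    editions: Dict[str, List[str]] = {}
    for key, by_number in pages.items():
        numbers = sorted(by_number)
        if numbers != list(range(1, len(numbers) + 1)):
            raise ValueError(f"edition {key} has missing pages: got {numbers}")
        editions[key] = [by_number[n] for n in numbers]
    return editions


print(group_pages(["MAG-234_p2.pdf", "MAG-234_p1.pdf"]))
# {'MAG-234': ['MAG-234_p1.pdf', 'MAG-234_p2.pdf']}
```

Sorting on the parsed integer rather than the raw filename avoids the common pitfall where `p10` sorts before `p2` when page numbers are not zero-padded.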
XCago and SFTP
HTML-converted and Enhanced PDF editions come from a third-party company called XCago. In this case XCago, rather than us, sets up the SFTP. You upload PDFs to their SFTP (or provide them another way), and XCago then returns the output to our CMS via SFTP. If you use the automated SFTP flow, it runs on XCago's server, not ours.
The NITF importer is used for both XCago-converted HTML editions and Enhanced PDF editions.
Repub (.epub) is another format we can use for HTML conversions and Enhanced PDF. The Repub format gives us more metadata and more inline information. Note that this format does not map to the header group meta fields in Express; instead, it places all article content into the body.