Pugpig Bolt RSS Overview
What is RSS and when should I use it?
RSS is an XML-based feed format that allows users and applications to access updates to web content in a standardised way. If you wish to send content from an existing CMS to Pugpig automatically, it is the simplest approach to use. When we talk about RSS, we also mean Atom feeds, a more modern feed format that serves the same purpose.
The main advantage of using RSS over JSON is that RSS has standard elements, so there is less custom work to do as most of the importing should already be supported by our platform.
You can use RSS to:
- Import articles, audio and other content types into your Pugpig CMS
- Specify the order of articles in an edition or curated timeline
- Provide advanced layout or formatting information for a timeline
- Import metadata about editions into the CMS
More detail about each of these is given below.
We also have a plugin that allows you to monitor your RSS feeds.
Article RSS Feeds
This section discusses what is needed to import articles into the CMS via RSS.
Standard RSS Elements
RSS feeds produced by most well-known vendors (for example WordPress or Arc XP) will have a structure that just works. There will be some metadata or taxonomies that you need to supply. The standard elements we always use include:
- The unique identifier of your article in the <rss:guid>. This should never change once we've seen an article, or it will create a duplicate.
- The title of the article in the <rss:title>
- The summary/excerpt of the article in the <rss:description> element
- The full content of the article, ideally in the <content:encoded> element. If that element is absent, we use the description as the full body. This can contain many inline elements, and standard inline HTML will work.
- The publish date of the article, usually in the <rss:pubDate> element, although the <dc:date> and <rss:issued> elements also work
- The section the article should appear in. This is normally the standard <rss:category> element, of which there can be multiple.
Certain more advanced inline elements will need specific mark-up (for example images, galleries, videos) to perform well in the app.
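As an illustration of the standard elements listed above, here is a minimal sketch of a feed item and a reader for it, using only the Python standard library. The sample feed, the field names in the returned dictionary and the `read_items` function are all hypothetical. Using the description as the body when <content:encoded> is missing is an assumption about the fallback, not a statement of how the importer is implemented.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# Namespace of the RSS content module, which defines <content:encoded>.
NS = {"content": "http://purl.org/rss/1.0/modules/content/"}

# Hypothetical sample feed containing one item with the standard elements.
SAMPLE_FEED = """<rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>Example Publication</title>
    <link>https://example.com/</link>
    <description>Hypothetical sample feed</description>
    <item>
      <guid isPermaLink="false">article-1001</guid>
      <title>Council approves new cycle lanes</title>
      <description>A short summary of the story.</description>
      <content:encoded><![CDATA[<p>Full body with <em>inline</em> HTML.</p>]]></content:encoded>
      <pubDate>Mon, 06 Jan 2025 09:30:00 GMT</pubDate>
      <category>News</category>
      <category>Transport</category>
    </item>
  </channel>
</rss>
"""


def read_items(feed_xml):
    """Return one dictionary per <item>, keyed by hypothetical field names."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        summary = item.findtext("description", default="")
        # Assumption: fall back to the summary when there is no full body.
        body = item.findtext("content:encoded", namespaces=NS) or summary
        items.append({
            "guid": item.findtext("guid"),
            "title": item.findtext("title"),
            "summary": summary,
            "body": body,
            "published": parsedate_to_datetime(item.findtext("pubDate")),
            "sections": [c.text for c in item.findall("category")],
        })
    return items


items = read_items(SAMPLE_FEED)
print(items[0]["guid"], items[0]["sections"])
```

Running your own feed through a reader like this is a quick way to confirm that every item carries a stable guid, a publish date and at least one category.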
Pugpig-Specific Extra RSS Elements
Extra Pugpig-specific metadata may be included. These are normally custom elements which we then map to fields in the CMS. Common examples of these include:
- Which collections (edition or timeline) this article should appear in. This is normally a custom element - we recommend <collections>, <editions> or <timelines>, all of which can contain multiple entries. If it is not possible to provide this, we map an entire feed to a specific collection or timeline. Note that this mapping approach does not work if an article can appear in multiple RSS feeds.
- The ordering of an article if it appears in an edition
- Flags to specify if the content is free or paid for, or which paywall meter it is part of
We will of course display all inline images in the content, but it is often useful to get more structured images. For example, the feed can specify which is the main image for the timeline view, or supply author headshots for the article pages. These can simply be supplied as a link in a field, for example:
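A minimal sketch of an item carrying this kind of extra metadata, built with the Python standard library. Only <collections> is a name recommended in this article. The child element <collection> and the <order>, <access> and <mainImage> elements are hypothetical names chosen for illustration, since the actual names are whatever is agreed and mapped to CMS fields for your feed.

```python
import xml.etree.ElementTree as ET


def build_item(guid, title, collections, order, is_free, main_image_url):
    """Build an <item> with illustrative custom metadata elements."""
    item = ET.Element("item")
    ET.SubElement(item, "guid", isPermaLink="false").text = guid
    ET.SubElement(item, "title").text = title

    # <collections> is a recommended name; <collection> is hypothetical.
    wrapper = ET.SubElement(item, "collections")
    for name in collections:
        ET.SubElement(wrapper, "collection").text = name

    # Hypothetical custom elements for ordering, access and the main image.
    ET.SubElement(item, "order").text = str(order)
    ET.SubElement(item, "access").text = "free" if is_free else "paid"
    ET.SubElement(item, "mainImage").text = main_image_url
    return item


item = build_item(
    guid="article-1001",
    title="Council approves new cycle lanes",
    collections=["edition-2025-01-06", "home-timeline"],
    order=3,
    is_free=False,
    main_image_url="https://example.com/img/1001.jpg",
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```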
Specifying Advanced Placement or Layout
In more advanced cases, you may wish to specify in your feed how an article will appear in the app. You can request particular Bolt layouts and Timeline Cards in the feed. There is much more detail on how to achieve this in the Bolt Timeline and Article Specific Features article.
Feed Structure and Updating Logic
Checking for updates
Ideally, your feed will tell us how often to check. There are standard RSS elements that allow you to specify this per feed (<rss:ttl>, <sy:updatePeriod>, <sy:updateFrequency>). For example, we should check a breaking news feed far more frequently than a feed of long-form opinion pieces. The downside of checking too often is the extra load we place on your servers. If your feed doesn't specify the polling interval, then we can configure this on our side.
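To show how these elements translate into a polling interval, here is a minimal sketch based on their standard meanings: <ttl> is a number of minutes, and <sy:updateFrequency> is the number of updates per <sy:updatePeriod>. The `polling_interval_seconds` function, the 15-minute default and the choice to prefer <ttl> over the syndication elements are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the RSS syndication module (sy:updatePeriod, sy:updateFrequency).
SY = "http://purl.org/rss/1.0/modules/syndication/"

PERIOD_SECONDS = {
    "hourly": 3600,
    "daily": 86400,
    "weekly": 604800,
    "monthly": 2592000,   # approximated as 30 days
    "yearly": 31536000,   # approximated as 365 days
}


def polling_interval_seconds(feed_xml, default=900):
    """Derive a polling interval from the channel, or return the default."""
    channel = ET.fromstring(feed_xml).find("channel")

    # Assumption: <ttl> (in minutes) takes precedence when present.
    ttl = channel.findtext("ttl")
    if ttl and ttl.strip().isdigit() and int(ttl) > 0:
        return int(ttl) * 60

    period = (channel.findtext(f"{{{SY}}}updatePeriod") or "").strip().lower()
    if period in PERIOD_SECONDS:
        freq_text = channel.findtext(f"{{{SY}}}updateFrequency", default="1")
        freq = int(freq_text) if freq_text.strip().isdigit() else 1
        return PERIOD_SECONDS[period] // max(freq, 1)

    return default


BREAKING = """<rss version="2.0"><channel>
<title>Breaking news</title><ttl>5</ttl>
</channel></rss>"""

OPINION = """<rss version="2.0" xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"><channel>
<title>Opinion</title>
<sy:updatePeriod>daily</sy:updatePeriod>
<sy:updateFrequency>2</sy:updateFrequency>
</channel></rss>"""

print(polling_interval_seconds(BREAKING), polling_interval_seconds(OPINION))
```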
Detecting modifications in an article
By default, we use standard RSS logic for this. Each item in an RSS feed can carry a last-modified date. We store this date, and only update an article if it changes. We will use any of <dc:modified>, <dcterms:modified>, <rss:modified>, <rss:updated> or <atom:updated>.
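The comparison described above can be sketched as follows. This illustrates the logic only and is not the importer's actual code; the `needs_update` function and the in-memory dictionary standing in for stored state are hypothetical.

```python
def needs_update(stored_modified, guid, modified):
    """Return True when the item is new or its modified date has changed."""
    if stored_modified.get(guid) == modified:
        return False
    # Remember the latest value so the next poll compares against it.
    stored_modified[guid] = modified
    return True


stored = {}
first = needs_update(stored, "article-1001", "2025-01-06T09:30:00Z")
second = needs_update(stored, "article-1001", "2025-01-06T09:30:00Z")
third = needs_update(stored, "article-1001", "2025-01-06T11:00:00Z")
print(first, second, third)
```

The practical consequence for feed producers is that an edit which leaves the modified date untouched will not be picked up.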
Sometimes it may be difficult for you to supply an accurate modified date. For example, layout information might change more often than the article has changed. In this case, we can also use a hash (provided as <hash>) of the content in the RSS item to detect if there has been a change. If the hash is used, then we require the modified date of all articles to be the current time. For example:
<updated>[current time in correct format]</updated>
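A minimal sketch of how a feed generator might produce the two values. The source does not specify a hash algorithm or a timestamp format, so SHA-256 and an RFC 3339 UTC timestamp are assumptions here, as are the `change_fields` function and its choice of inputs. The important property is that the hash covers everything whose change should trigger an update, including layout information.

```python
import hashlib
from datetime import datetime, timezone


def change_fields(title, body, layout):
    """Return (hash, updated) values for one feed item."""
    digest = hashlib.sha256()
    for part in (title, body, layout):
        digest.update(part.encode("utf-8"))
        # Separator so that ("ab", "c") and ("a", "bc") hash differently.
        digest.update(b"\x00")
    # Assumption: current time in UTC, formatted as RFC 3339.
    updated = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return digest.hexdigest(), updated


hash_a, updated_a = change_fields("Title", "<p>Body</p>", "layout-standard")
hash_b, _ = change_fields("Title", "<p>Body</p>", "layout-feature")
print(hash_a != hash_b, updated_a)
```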
Having multiple RSS feeds
You can supply as many Content RSS feeds as you would like. However, there are some important considerations:
- Content RSS feeds should always be ordered by article modified date
- Having too many feeds can slow down the time for content to arrive as we poll your feeds one at a time. A single large feed is much faster
- Your feed should be long enough to ensure we never miss an update. For example, if the maximum number of articles you would change in a 10-minute period is 50, and we are polling every 10 minutes, then set your feed length to about 100 to be safe
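The feed-length guidance above amounts to simple arithmetic: scale the peak number of changes to the polling interval, then apply a safety margin. A minimal sketch follows, where the function names and the safety factor of 2 (taken from the 50 to 100 example) are illustrative. Sorting newest first is an assumption about the direction of the ordering.

```python
import math


def recommended_feed_length(max_changes, window_minutes, poll_minutes,
                            safety_factor=2.0):
    """Estimate how many items the feed should hold."""
    changes_per_poll = max_changes * (poll_minutes / window_minutes)
    return math.ceil(changes_per_poll * safety_factor)


def order_for_feed(articles):
    """Sort articles by modified date, newest first (assumed direction)."""
    return sorted(articles, key=lambda a: a["modified"], reverse=True)


length = recommended_feed_length(max_changes=50, window_minutes=10,
                                 poll_minutes=10)
ordered = order_for_feed([
    {"guid": "a", "modified": "2025-01-06T09:00:00Z"},
    {"guid": "b", "modified": "2025-01-06T11:00:00Z"},
])
print(length, [a["guid"] for a in ordered])
```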
Manually updating an article
Changes made manually in your Pugpig CMS will be overwritten by any updates that may come in your RSS feed later. However, we do provide a per-article setting in the CMS which allows you to ignore further RSS updates and make manual edits.
Deleting an article
If an article is removed from an RSS feed, we will not delete this article from the CMS. It is usual for articles to drop off the bottom of an RSS feed, so it is difficult to know what should be deleted.
We do support the <at:deleted-entry> Atom tombstone element for automatic deletion, although this is rarely seen.
In most cases, if you need to delete an article from the Pugpig CMS, you'll first need to delete it from your CMS to ensure it doesn't return, and then delete it manually from the Pugpig CMS.
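For feeds that do use the tombstone element mentioned above, this is a minimal sketch of what one looks like in an Atom feed and how it can be read. The element, its namespace and its `ref` and `when` attributes come from the Atom tombstones specification (RFC 6721); the sample feed and the `deleted_entries` function are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "at": "http://purl.org/atompub/tombstones/1.0",
}

# Hypothetical Atom feed with one tombstone and one live entry.
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:at="http://purl.org/atompub/tombstones/1.0">
  <title>Example Publication</title>
  <id>https://example.com/feed</id>
  <updated>2025-01-06T10:00:00Z</updated>
  <at:deleted-entry ref="https://example.com/articles/1001"
                    when="2025-01-06T09:55:00Z"/>
  <entry>
    <id>https://example.com/articles/1002</id>
    <title>Still published</title>
    <updated>2025-01-06T09:00:00Z</updated>
  </entry>
</feed>
"""


def deleted_entries(feed_xml):
    """Return (ref, when) pairs for every tombstone in the feed."""
    root = ET.fromstring(feed_xml)
    return [(e.get("ref"), e.get("when"))
            for e in root.findall("at:deleted-entry", NS)]


tombstones = deleted_entries(SAMPLE)
print(tombstones)
```

The `ref` value must match the id of the entry being removed, which is another reason to keep article identifiers stable.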
Restricting Access to a Feed
If you wish to ensure the public cannot access your full-fat RSS feed, we support:
- Sending an HTTP header, including Basic Authentication, which your server can validate requests against
- Locking down by IP address - check here for the details of our CMS IP Ranges.
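On the server side, validating a Basic Authentication header can be sketched as follows. The function names and credentials are hypothetical, and a real deployment would normally rely on the web server's or framework's built-in authentication rather than hand-written checks.

```python
import base64
import hmac


def expected_basic_header(username, password):
    """Build the Authorization header value for the given credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def is_authorised(request_headers, username, password):
    """Check the request's Authorization header in constant time."""
    supplied = request_headers.get("Authorization", "")
    expected = expected_basic_header(username, password)
    return hmac.compare_digest(supplied.encode("utf-8"),
                               expected.encode("utf-8"))


headers = {"Authorization": expected_basic_header("user", "pass")}
print(is_authorised(headers, "user", "pass"))
```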
If your RSS feed is not valid, the system will not be able to process it, and updates from the feed will stop. The Syndication area of the CMS will highlight if any feeds are invalid, and when they broke. To test your feed, we recommend the W3C validation tool at https://validator.w3.org/feed/
The most common problems we see with feeds are:
- The feed is not valid XML, normally because it is not properly XML escaped (usually an & somewhere!)
- Our systems cannot access the feed, normally because it is behind a firewall and our IP address or user agent is blocked.
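The unescaped ampersand problem is easy to reproduce and to catch before publishing. A minimal sketch using the Python standard library follows; the `xml_error` helper is hypothetical, and it only checks that the document is well-formed XML, which is a narrower test than full feed validation.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def xml_error(feed_xml):
    """Return the parser's error message, or None if the XML is well formed."""
    try:
        ET.fromstring(feed_xml)
    except ET.ParseError as exc:
        return str(exc)
    return None


# A raw ampersand in text content makes the whole document unparseable.
broken = "<rss><channel><title>Food & Drink</title></channel></rss>"

# Escaping the text before inserting it produces &amp; and fixes the feed.
fixed = f"<rss><channel><title>{escape('Food & Drink')}</title></channel></rss>"

print(xml_error(broken))
print(xml_error(fixed))
```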
Curated RSS Feeds
In some cases, you may have RSS feeds that represent the curated ordering of articles on, for example, your home page or section landing pages. We can use these feeds to order a timeline in your app. If we are doing this, we will need one feed per curated timeline. Unlike normal RSS feeds, these feeds will not be ordered by article update date, and they can have a variable length. With this approach, if an article drops off your feed, it will be removed from the timeline.
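The difference from a content feed can be sketched in a few lines: the timeline simply mirrors the feed, so anything absent from the latest feed is dropped. The `apply_curated_feed` function and the guid values are hypothetical, and this illustrates the behaviour described above rather than the importer's code.

```python
def apply_curated_feed(current_timeline, feed_guids):
    """Return the new timeline order and the guids that were dropped."""
    new_timeline = list(feed_guids)
    removed = [g for g in current_timeline if g not in feed_guids]
    return new_timeline, removed


timeline, removed = apply_curated_feed(
    current_timeline=["a", "b", "c"],
    feed_guids=["c", "a", "d"],
)
print(timeline, removed)
```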
Edition RSS Feeds
While it is most common to use an RSS feed for articles, you can also use one to create new editions or update the metadata of existing ones. You can read more detail about this feed in the Pugpig RSS Edition & Timeline Feed Specification article.
Feeds for non-article content
We often take RSS feeds from Podcast Providers and Video Providers and map them directly into the relevant content types in the Pugpig CMS. Providers like iTunes and ACast work extremely well for this.
Feed Importing Technology
We use the open source FeedWordPress plugin - our thanks to the creators.
We also have our own FeedWordPress fork where we submit fixes and improvements.