Content retention and expiration
Sometimes, for compliance or rights reasons, you might need to ensure content, either an entire article or the images within an article, is no longer available after a certain amount of time.
You'll need to send us the content's expiration date via your content importer (RSS, JSON, etc.) and specify whether it applies to the article itself or only to the images in the article.
Once an article or its images expire, our Distribution service will exclude the content from timelines and search results.
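As a rough illustration of what a feed entry might carry, the sketch below builds Atom-style category elements using the scheme URIs listed under "How it works behind the scenes". The helper name expiration_category and the choice of an Atom feed are assumptions made for illustration only; your importer may expect the date in a different field.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Scheme URIs as listed in this article.
HTML_EXPIRATION = "http://schema.pugpig.com/html_expiration"
IMAGE_EXPIRATION = "http://schema.pugpig.com/image_expiration"


def expiration_category(scheme: str, expires_at: datetime) -> str:
    """Return a <category> element for an expiry date (hypothetical helper)."""
    if expires_at.tzinfo is None:
        raise ValueError("expires_at must be timezone-aware")
    # Match the timestamp shape shown in this article: 2024-05-21T03:59:39+0000
    term = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S%z")
    element = ET.Element("category", {"scheme": scheme, "term": term})
    return ET.tostring(element, encoding="unicode")


if __name__ == "__main__":
    when = datetime(2024, 5, 21, 3, 59, 39, tzinfo=timezone.utc)
    print(expiration_category(HTML_EXPIRATION, when))   # whole article
    print(expiration_category(IMAGE_EXPIRATION, when))  # images only
```

Converting to UTC before formatting keeps the term unambiguous regardless of the timezone the CMS works in.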
Saved Timeline and Content Expiry
If you are expiring a lot of content, be aware that this might make the saved article timeline less useful for users. Expired articles will disappear from their timeline, assuming we knew the expiry date at the time they saved it. Consider not using Saved Timelines in this case.
How it works behind the scenes
- The origin CMS sends categories that determine when the article should expire and when images inside the article should expire:
<category scheme="http://schema.pugpig.com/content_html_expiration" term="2024-05-21T03:59:39+0000"/> (expires 2 S3 files: /content.html and redirect_stub)
<category scheme="http://schema.pugpig.com/html_expiration" term="2024-05-21T03:59:39+0000"/> (expires everything: all image files, the /content.html file and the redirect_stub file; also removes the article from search and stops it rendering in timelines.json)
<category scheme="http://schema.pugpig.com/image_expiration" term="2024-05-21T03:59:39+0000"/> (expires the images in content.html)
- Distribution processes these categories and does the following:
- Sets the x-amz-meta-pugpig-expiration metadata header on the S3 objects, which appears in the HTTP response as: x-pugpig-internal-expiration: 1703918600
- The Content Delivery logic (VCL) looks for these headers, and if the date is in the past it responds with an HTTP 410 (Gone) status code
- The HTML page search index will include the expiration_date facet, and the search results will suppress any results with a value in the past
- This removes the article from dynamic timelines, which are search driven
- If an item has an expiry date in the future, you can see it in the edition/timeline JSON feed or the search result feed. If it has already expired, you will not see it in the feed or search results.
- You can curl the article or image and inspect the response headers to confirm they are what you expect
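The manual header check can also be scripted. The sketch below assumes the x-pugpig-internal-expiration value is a Unix timestamp in seconds (the example value above looks like one, but this article does not state it) and that expired content returns HTTP 410 as described. The function names are hypothetical and not part of any Pugpig tooling.

```python
from datetime import datetime, timezone
from typing import Mapping, Optional
import urllib.error
import urllib.request

EXPIRATION_HEADER = "x-pugpig-internal-expiration"


def parse_expiration(headers: Mapping[str, str]) -> Optional[datetime]:
    """Return the expiry time from response headers, or None if absent.

    Assumption: the header value is a Unix timestamp in seconds.
    """
    for name, value in headers.items():
        if name.lower() == EXPIRATION_HEADER:
            return datetime.fromtimestamp(int(value), tz=timezone.utc)
    return None


def describe(status: int, headers: Mapping[str, str], now: datetime) -> str:
    """Summarise what a response says about expiry."""
    expires_at = parse_expiration(headers)
    if status == 410:
        return "gone (expired)"
    if expires_at is None:
        return "no expiration set"
    if expires_at <= now:
        return f"expired at {expires_at.isoformat()} but still served"
    return f"expires at {expires_at.isoformat()}"


def check_url(url: str) -> str:
    """Send a HEAD request and describe the result (needs network access)."""
    request = urllib.request.Request(url, method="HEAD")
    now = datetime.now(timezone.utc)
    try:
        with urllib.request.urlopen(request) as response:
            return describe(response.status, dict(response.headers.items()), now)
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, including the 410 returned for expired content.
        return describe(error.code, dict(error.headers.items()), now)
```

Calling check_url with the URL of an article's content.html or one of its images returns a one-line summary. A result of "expired at ... but still served" would suggest the delivery layer has not yet picked up the expiry.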