Welcome to Search in Bolt
A good search is an incredibly powerful way for users to discover content, and is a large part of the Pugpig solution. When content is ingested into the Pugpig Distribution Service, it is automatically added to a search index, powered by Amazon CloudSearch. The title and the body of each article are always indexed, and all categories are available for faceted searches.
What is the search index used for?
The search index is used for many aspects of the app. These are:
- Powering the explicit full text search experience in the app
- Powering dynamic timelines, which return results with a specific category or author
- Powering the "lookup service" - this is important when we deep link into an article. We use the search to find the article and determine in which context it should be shown
- If you wish to have personalised timelines, these are also search driven. The inputs to this search will be the favourite categories of the user
Search Result Format
Note that the system is designed so that the format of a timeline is exactly the same as the format of a search result, which gives us great flexibility.
What is in the index?
The search index will contain all the content from all the editions in your app. It will also, by default, contain all content that has ever been in one of the timelines in your app, even if that content is no longer visible in a timeline. If you wish items to be removed from the index once they drop off a timeline, you can use the Delete Orphans setting.
The index knows if an item has been published or not. Your live app will only return published content, while preview mode should search everything.
You can also manually remove a single item from the index, for example for legal reasons.
You can explore what is in your search index from the Pugpig Distribution Service, and run test searches.
Customising your search results
You are able to customise the search results in the following ways:
- Exclude certain content types (for example, exclude all adverts, PDF pages or HTML pages) from the full text results
- Exclude certain timelines or editions from the results.
- Change the relative weightings of the headline and the content
- Provide standalone pieces of text which can "boost" the results
- Mark parts of the HTML pages not to be indexed by configuring CSS selectors.
- Add synonyms specific to your content
In rare cases, you might need to deduplicate search results after the search has happened. This is not a preferred approach, but is sometimes unavoidable. To achieve this, a special category needs to be added to each result that groups items together. For example, if two posts both had the value A, only the first would be shown. You can also use a comma separated list of values. For example, if one post had A,B and another had just A, only the first would be shown, because the two share the value A.
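The grouping rule described above can be sketched as follows. This is a minimal illustration, not the Pugpig implementation: the field name `dedupe_groups` and the result structure are hypothetical, and it assumes a result is hidden whenever it shares at least one group value with an earlier result that was shown.

```python
def dedupe_results(results, group_key="dedupe_groups"):
    """Keep a result only if none of its group values has already been seen.

    `group_key` is a hypothetical field holding the comma separated values.
    """
    seen = set()
    kept = []
    for result in results:
        raw = result.get(group_key, "")
        # Split "A,B" into {"A", "B"}, ignoring blanks and stray whitespace.
        values = {v.strip() for v in raw.split(",") if v.strip()}
        if values & seen:
            continue  # shares a group with an earlier result, so hide it
        seen |= values
        kept.append(result)
    return kept


results = [
    {"title": "First post", "dedupe_groups": "A,B"},
    {"title": "Second post", "dedupe_groups": "A"},
    {"title": "Third post", "dedupe_groups": ""},
]
kept_titles = [r["title"] for r in dedupe_results(results)]
```

Here the second post is dropped because it shares the value A with the first, while the third post has no grouping value and is always kept.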
Searching Protected Content
If you have protected content, you can force users to be logged in before they can use the search by using the Require Search Token setting, which also controls what comes back in the results. It can be applied independently per auth type.
If some content is protected, you can either:
- show a Lock icon on results the user will not have access to. Users will be able to click through to see the paywall. If you have a meter in your app, users may be able to read the article and use one of their meter views
- hide the items completely from the search results
You can see the search token in the user attributes for a user via the Pugpig Distribution Auth Test forms, including an explanation of what it means. We have two versions (one is legacy and is formatted to be substituted directly into a query string, the other is more semantic).
<category scheme="http://schema.pugpig.com/attribute/collection_access_token" term="eyJzdGF0dXMiOiJhY3RpdmUifQ=="/>
<category scheme="http://schema.pugpig.com/attribute/user_search_token" term="search_token=eyJzdGF0dXMiOiJhY3RpdmUifQ%3D%3D"/>
This value is sent with search queries in the x-pugpig-user-search-token header.
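To see how the two sample values above relate, the short sketch below URL-decodes the legacy query string form and compares it with the other term. The sample term happens to be base64-encoded JSON; whether real tokens always have that shape is an assumption, and the helper name `decode_token` is purely illustrative.

```python
import base64
import json
from urllib.parse import parse_qs

# Sample values copied from the two category elements above.
semantic_token = "eyJzdGF0dXMiOiJhY3RpdmUifQ=="
legacy_token = "search_token=eyJzdGF0dXMiOiJhY3RpdmUifQ%3D%3D"

# The legacy form is a ready-made query string fragment; parse_qs
# percent-decodes it, turning %3D back into "=".
legacy_value = parse_qs(legacy_token)["search_token"][0]


def decode_token(token):
    """Decode a token, assuming it is base64-encoded JSON like the sample."""
    return json.loads(base64.b64decode(token))


decoded = decode_token(semantic_token)
```

For these samples, both forms carry the same underlying value, which decodes to a small JSON object with a status field.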
Wildcard Access to Editions in Issue Based Auth
If you are sending a list of issues in the verify auth response, the results will be restricted to these. By default, we also allow access to:
- all dynamic timelines (by sending d.* in the response)
- all archive timelines (by sending archive-* in the response)
These defaults are controlled by the endpoint setting auto_verify_timeline_ids, which you can also use to add access to timelines or editions not mentioned in the verify response.
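As a rough illustration of how the wildcard entries widen access, the sketch below checks an ID against the issues from the verify response plus the default patterns d.* and archive-*. It assumes glob-style matching, which this page does not confirm, and the function name and example IDs are hypothetical.

```python
from fnmatch import fnmatchcase

# Default wildcard entries described above.
AUTO_PATTERNS = ["d.*", "archive-*"]


def is_accessible(item_id, verified_issues, patterns=AUTO_PATTERNS):
    """Return True if the ID is an entitled issue or matches a wildcard.

    Glob-style matching is an assumption made for this sketch.
    """
    if item_id in verified_issues:
        return True
    return any(fnmatchcase(item_id, pattern) for pattern in patterns)


# Hypothetical IDs, for illustration only.
verified_issues = {"issue-2024-01"}
candidates = ["issue-2024-01", "issue-2024-02", "d.sport", "archive-2019"]
accessible = [c for c in candidates if is_accessible(c, verified_issues)]
```

In this example the user keeps access to the one verified issue, the dynamic timeline and the archive timeline, but not to the second issue.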
Search Experience In App
The Bolt Search Experience in the app functions as a standalone tab or page. It can include sorting options, filtering options and different styling options.
Updating the Search Experience
Bolt Search is released as a standalone product, which does not require an app update for your app to get the improvements. We upgrade all apps to the latest approved version. If you're interested, take a look at our search release notes.
Multi-language Stemming
By default, all new clients are placed into our multi-tenanted, optimised index. However, this index uses mainly English stemming rules. Get in touch if your Pugpig App is in a different language and the results seem wrong.