WordPress REST API Importer Documentation
Written by Frank Lockett
Updated at May 27th, 2026
Pugpig WordPress REST API Importer
What this plugin does
The Pugpig WordPress REST API Importer is a WordPress plugin that imports posts from a WordPress REST API posts endpoint. It does this by reading the source site's public REST API (the standard /wp-json/wp/v2/posts-style endpoint that every modern WordPress install exposes by default) and turning each remote post into a real, native post on the destination site.
Think of it as a one-way bridge: the source site doesn't need to know it's being imported from, and the destination site treats the imported posts exactly the same as posts that were authored directly in its own admin. They appear in the post list, they're queryable, they show up in feeds and templates, and they can be edited locally if you want.
Crucially, this plugin doesn't do the actual writing to the database itself. It is a thin orchestration layer that sits on top of another Pugpig plugin called the Pugpig JSON Importer. The split of responsibilities is:
- This plugin: knows how to fetch JSON from a remote URL, how to page through a multi-page response, and how to shape each remote post into the format the JSON Importer expects.
- Pugpig JSON Importer: takes that shaped data and does the actual wp_insert_post()/wp_update_post() calls, downloads images, creates taxonomy terms, and writes post meta.
The reason for that split matters when you're customising the importer: anything that is about the shape of the data (which fields go where, what extra metadata to attach, which images to pull in) is configured through filters on this plugin. Anything that is about how WordPress stores it (deduplication rules, image fetch policies, taxonomy registration) is the JSON Importer's job.
What you get out of the box
Once the plugin is installed and you've set a source endpoint URL, running an import will create or update posts on the destination site with:
- The post title, slug, content (HTML body), and excerpt.
- The post type and status copied across from the source (so a draft stays a draft, a CPT stays as that CPT).
- All four date fields (published date, GMT, modified date, modified GMT).
- A baseline set of metadata that links each imported post back to its source: remote ID, remote URL, remote GUID, a content hash, and the full original JSON payload preserved in meta for reference.
- Two convenience meta keys (headline and article-standfirst) that Pugpig theme templates traditionally read from.
What you don't get automatically:
- No categories or tags — the plugin doesn't make any assumptions about which taxonomies exist on the destination site, or how the source site exposes its taxonomy data (it might be IDs, slugs, embedded objects, ACF fields, or a custom REST route).
- No author records — authors are referenced by remote ID only unless you wire up the contributor filter.
- No featured image or inline images — the plugin won't download any media until you tell it to via the media filter.
This is a deliberate design choice. Two sites that look superficially similar can have wildly different REST payloads depending on which plugins, custom post types and meta fields are in play. Rather than guess and get it wrong, this plugin ships with a minimal baseline and exposes filters so you can wire up the rest in a small companion plugin or in your theme's functions.php.
Contents
- Plugin dependencies
- Configuration
- Admin Tools page
- How the importer works (the import pipeline)
- What the importer produces by default
- Filter reference
- Worked examples
- Constants and meta keys
Plugin dependencies
This plugin doesn't try to be self-sufficient. It expects to live in a Pugpig-flavoured WordPress install alongside several other plugins. If any of them are missing, the plugin will refuse to bootstrap and will instead show a red admin notice listing what's missing, so you'll spot the problem the next time you load an admin page.
The required plugins are:
- pugpig/pugpig.php (Pugpig App). Provides the Pugpig-specific helper functions, like pugpig_get_taxonomy_to_order_by(), which determines which taxonomy categories get stored under.
- meta-box/meta-box.php (Meta Box). The framework used to render the Settings page fields.
- mb-settings-page/mb-settings-page.php (MB Settings Page). The Meta Box add-on that lets us register a stand-alone settings page under WordPress's Settings menu.
- pugpig_pugpigjsonimporter/pugpigjsonimporter.php (Pugpig JSON Importer). The plugin that actually performs the inserts. It also provides two helper classes we use for the admin UI: LogHtml (renders the per-line import log on the Tools page) and UiUtils (renders the collapsible box around it).
The dependency check happens at the very top of src/index.php. If it fails, the rest of the plugin code is not loaded at all.
Configuration
The settings live at Settings → Pugpig WordPress REST API Importer Settings in the WordPress admin sidebar. Anyone with the publish_pages capability can edit them — this is a deliberately broader capability than manage_options so that editorial staff (not just full administrators) can change which endpoint is being imported from.
| Field | What it does |
|---|---|
| Post endpoint URL | The full URL to the remote site's posts endpoint, e.g. https://example.com/wp-json/wp/v2/posts. This is the only required setting — without it, the importer has nowhere to fetch from. You can append query parameters here yourself if you need to (for example, ?_embed to get embedded author/featured-media/term objects in each post, which the worked examples below rely on). |
| Posts per request | An optional positive integer. When supplied, the importer appends ?per_page=N to the endpoint URL on every fetch. Useful if the default of 10 isn't right for you — raising it reduces the number of HTTP calls during a large import, lowering it helps avoid memory/timeout issues on slow sites. If you leave this blank, the source site's default applies. |
| Cron interval | How often the scheduled import should run. Options are hourly, twicedaily, daily, or custom. This setting is only consulted when you click Set Cron on the Tools page — it doesn't take effect automatically when changed. |
| Custom cron interval (minutes) | Used only when Cron interval is set to custom. Supply a positive integer number of minutes. The plugin registers a new WP-Cron schedule called every_{N}_minutes dynamically, so any value works (e.g. 15 for every quarter of an hour, 5 for every five minutes). |
The settings are stored as a single option row in the wp_options table under the name pugpig-wordpress-rest-api-importer-settings.
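The dynamically registered every_{N}_minutes schedule described for the custom interval could be built on WordPress's core cron_schedules filter. The following is a minimal sketch of that idea, not the plugin's actual code: the function name example_add_minutes_schedule, the display label and the hard-coded 15 are assumptions made for illustration.

```php
<?php
// Hypothetical helper: adds an "every_{N}_minutes" entry to a WP-Cron schedules array.
function example_add_minutes_schedule(array $schedules, int $minutes): array
{
    if ($minutes < 1) {
        return $schedules; // Ignore non-positive values.
    }
    $schedules["every_{$minutes}_minutes"] = [
        'interval' => $minutes * 60, // WP-Cron intervals are expressed in seconds.
        'display'  => sprintf('Every %d minutes', $minutes), // Assumed label.
    ];
    return $schedules;
}

// Inside WordPress, the helper would be attached to the core cron_schedules filter.
if (function_exists('add_filter')) {
    add_filter('cron_schedules', function (array $schedules): array {
        return example_add_minutes_schedule($schedules, 15);
    });
}
```

The guard around add_filter() only exists so the sketch can also be run outside WordPress.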
Admin Tools page
Found at Tools → Pugpig WordPress REST API Importer Tool. Unlike the Settings page, this one is restricted to users with manage_options — running an import or scheduling cron is treated as a more sensitive operation than just configuring URLs.
There are three buttons:
- Import Posts: runs an import immediately, synchronously, while you wait. It fetches the endpoint, processes every page (following next_page_url in each response), and renders an inline log showing which URLs were fetched and what happened to each item. This is the right button to use when testing a configuration change or doing a one-off backfill.
- Set Cron: schedules a recurring WP-Cron event called pugpig_rest_api_importer_cron_event using the interval configured in Settings, passing the endpoint URL as the event argument. From that point on, WordPress will run an import on every cron tick. Clicking this again replaces any previously scheduled event, so it's safe to use as an "apply my settings change" button.
- Delete Cron: unschedules the cron event entirely. Imports stop happening automatically until you click Set Cron again.
Note that WP-Cron in WordPress is technically "poor man's cron" — it only runs when someone visits the site. On a low-traffic site you may want to wire up a real OS-level cron job hitting wp-cron.php instead. That setup is outside the scope of this plugin.
How the importer works (the import pipeline)
Every import — whether triggered manually from the Tools page or fired by WP-Cron — goes through the same six-step pipeline. Understanding the order matters because it tells you which filter to hook for which job.
Step 1: Fetch the page
The importer calls wp_remote_get() against the current URL (initially the endpoint configured in Settings, then on subsequent iterations whatever next_page_url was returned). It pulls back the response body, JSON-decodes it into an associative array, and passes that array to the next step. If the request fails or the body isn't valid JSON, an error is logged and the loop terminates — no partial state is left behind, but no retry happens either.
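As a rough illustration of this step, the fetch-and-decode logic might look like the sketch below. It is not the plugin's source: the function names example_decode_body and example_fetch_page and the 30-second timeout are assumptions. The WordPress calls used (wp_remote_get, is_wp_error, wp_remote_retrieve_body) are the standard HTTP API functions.

```php
<?php
// Hypothetical helper: decode a response body into an associative array,
// or return null when the body is not valid JSON (or not a JSON array/object).
function example_decode_body(string $body): ?array
{
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : null;
}

// Hypothetical wrapper around the WordPress HTTP API. Only callable inside WordPress.
function example_fetch_page(string $url): ?array
{
    $response = wp_remote_get($url, ['timeout' => 30]); // Timeout value is an assumption.
    if (is_wp_error($response)) {
        return null; // Request failed: the caller logs the error and stops, with no retry.
    }
    return example_decode_body(wp_remote_retrieve_body($response));
}
```

Returning null for both failure modes keeps the calling loop simple: any null result means "log and terminate".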
Step 2: Extract the list of posts
The plugin tries three locations, in order, to find the array of posts inside the decoded response:
- A top-level data key (i.e. { "data": […] }).
- A top-level posts key.
- The root array itself, on the assumption that the response is the list of posts.
The first one that contains a list of arrays wins. This flexibility means the importer can handle both bare REST responses ([ {…}, {…} ], which is what WordPress core returns) and wrapped envelopes ({ "data": […], "next_page_url": "…" }, which some custom endpoints use).
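The lookup order above can be sketched in a few lines of plain PHP. This is an illustration under stated assumptions rather than the plugin's implementation: the function names are hypothetical, and treating an empty array as "not a list of posts" is a guess about edge-case behaviour.

```php
<?php
// True when $value is a non-empty, sequentially indexed array whose entries are all arrays.
function example_is_list_of_arrays($value): bool
{
    if (!is_array($value) || $value === []) {
        return false; // Assumption: an empty array does not count as a match.
    }
    if (array_keys($value) !== range(0, count($value) - 1)) {
        return false; // Associative arrays (envelopes) are not lists.
    }
    foreach ($value as $entry) {
        if (!is_array($entry)) {
            return false;
        }
    }
    return true;
}

// Try "data", then "posts", then the root array itself; the first match wins.
function example_extract_posts(array $decoded): array
{
    $candidates = [$decoded['data'] ?? null, $decoded['posts'] ?? null, $decoded];
    foreach ($candidates as $candidate) {
        if (example_is_list_of_arrays($candidate)) {
            return $candidate;
        }
    }
    return [];
}
```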
Step 3: Paginate
After processing the current page, the importer looks for a next_page_url key at the top level of the response. If present and non-empty, it loops back to Step 1 with that URL. If absent, the import ends.
Important caveat: this is not standard WordPress REST API pagination. WordPress core paginates via the Link: rel="next" HTTP header and the ?page=N query parameter; this plugin ignores both of those and instead expects the source endpoint to embed a next_page_url field in the JSON body itself. If you're importing from a stock WordPress site, you have a few options:
- Set Posts per request high enough that everything fits on one page (works for small sites).
- Build a tiny custom REST route on the source side that wraps wp/v2/posts and adds a next_page_url to the body.
- Add a pre_http_request or similar WordPress filter that munges the response on the destination side.
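For the custom-route option, a source-side wrapper might look roughly like the sketch below. Everything named example_* (including the example/v1 namespace and the /posts route) is hypothetical, and the open permission callback is an assumption that suits public posts only. The WordPress calls (register_rest_route, rest_do_request, rest_get_server, rest_url) are core REST API functions, and X-WP-TotalPages is the header core sets on paginated collections.

```php
<?php
// Hypothetical helper: wrap one page of posts in an envelope, adding
// next_page_url only while further pages remain.
function example_wrap_page(array $posts, int $page, int $total_pages, string $route_url): array
{
    $envelope = ['data' => $posts];
    if ($page < $total_pages) {
        // Use "&" when the route URL already carries a query string.
        $separator = strpos($route_url, '?') === false ? '?' : '&';
        $envelope['next_page_url'] = $route_url . $separator . 'page=' . ($page + 1);
    }
    return $envelope;
}

// Wiring on the source site (only runs inside WordPress).
if (function_exists('add_action')) {
    add_action('rest_api_init', function (): void {
        register_rest_route('example/v1', '/posts', [
            'methods'             => 'GET',
            'permission_callback' => '__return_true', // Assumption: public data only.
            'callback'            => function (WP_REST_Request $request) {
                $page  = max(1, (int) $request->get_param('page'));
                $inner = new WP_REST_Request('GET', '/wp/v2/posts');
                $inner->set_query_params(['page' => $page]);
                $response = rest_do_request($inner);
                $headers  = $response->get_headers();
                $total    = (int) ($headers['X-WP-TotalPages'] ?? 1);
                $posts    = rest_get_server()->response_to_data($response, false);
                return example_wrap_page($posts, $page, $total, rest_url('example/v1/posts'));
            },
        ]);
    });
}
```

With a route like this in place, the destination's Post endpoint URL would point at the wrapper route instead of wp/v2/posts.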
Step 4: Shape each post
This is where most of the interesting work happens, and where almost all of the customisation filters live. For each post in the extracted list, the function buildPostImportItem() transforms the remote JSON object into the structure that the JSON Importer expects.
The filters fire in this order during the shaping step:
- pugpig_rest_api_importer_post: lets you rewrite the entire raw remote post first. Every subsequent step sees the rewritten version.
- pugpig_rest_api_importer_post_content: rewrite the HTML body that will be stored as post_content.
- pugpig_rest_api_importer_meta_fields: choose which top-level fields of the remote post to copy verbatim into post_meta.
- pugpig_rest_api_importer_additional_meta: add any other meta values you want.
- pugpig_rest_api_importer_meta: final review of the complete meta array before it's attached.
- pugpig_rest_api_importer_categories_source: provide the list of categories to turn into taxonomy terms.
- pugpig_rest_api_importer_contributors_source: provide the list of contributors to turn into terms in the contributor taxonomy.
- pugpig_rest_api_importer_term: modify each assembled term (categories and contributors) one by one.
- pugpig_rest_api_importer_media_items: provide the list of attachments (featured images, inline images) to download.
The output of this step is an "import item" array — see What the importer produces by default for the exact shape.
Step 5: Hand off to the JSON Importer
Once all the posts on the current page have been shaped, the entire batch is passed in a single call to \Pugpig\JsonImporter\Importer::createOrUpdateItems($items, $logger). The JSON Importer then:
- For each item, looks up whether a post already exists locally with the same uid_field_name meta key (which we set to the remote post ID).
- If a match is found, it updates the existing post; otherwise it inserts a new one.
- Downloads any attachments listed under items, respecting their fetch_policy (e.g. don't re-download if we already have a local copy).
- Creates any taxonomy terms listed under taxonomy_terms and attaches them to the post.
- Writes all the post_meta entries.
It also writes per-item log lines back to the same LogHtml logger that this plugin instantiated, so the admin UI shows a unified log covering both the fetch/shape phase and the write phase.
Step 6: Repeat or finish
If the response from Step 1 contained a next_page_url, the loop restarts with that URL. Otherwise, the import is complete.
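Taken together, the six steps amount to a simple loop. The sketch below shows that control flow only, with the fetch and the shape/hand-off stages passed in as callables so the example stays self-contained. The function name example_run_import and both callable signatures are assumptions, not the plugin's API.

```php
<?php
// Hypothetical driver for the fetch, shape and hand-off loop.
// $fetch_page:   fn(string $url): ?array  returns the decoded response, or null on failure.
// $process_page: fn(array $response): void  shapes the posts and hands them to the JSON Importer.
// Returns the number of pages processed.
function example_run_import(string $start_url, callable $fetch_page, callable $process_page): int
{
    $url   = $start_url;
    $pages = 0;
    while ($url !== '') {
        $response = $fetch_page($url);
        if ($response === null) {
            break; // Fetch or decode failed: stop without retrying.
        }
        $process_page($response);
        $pages++;
        // Follow next_page_url if present and non-empty; otherwise finish.
        $next = $response['next_page_url'] ?? '';
        $url  = is_string($next) ? $next : '';
    }
    return $pages;
}
```

Note that a source endpoint whose next_page_url pointed back at an earlier page would make a loop like this run forever, so a real implementation may want a page cap.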
What the importer produces by default
For each remote post, the shape step produces an array roughly like this (simplified for readability):
    // $cleaned_title, $post_date, etc. are placeholder names for values computed earlier.
    [
        'uid_field_name'    => 'pugpig_rest_api_importer_remote_post_id',
        'post_type'         => $post['type'] ?? 'post',
        'post_status'       => $post['status'] ?? 'draft',
        'post_title'        => $cleaned_title,
        'post_name'         => ($post['slug'] ?? '') ?: sanitize_title($cleaned_title),
        'post_content'      => apply_filters('pugpig_rest_api_importer_post_content', $rendered_html, $post),
        'post_excerpt'      => $cleaned_excerpt,
        'post_date'         => $post_date,         // 'Y-m-d H:i:s', formatted from $post['date']
        'post_date_gmt'     => $post_date_gmt,     // formatted from $post['date_gmt']
        'post_modified'     => $post_modified,     // formatted from $post['modified']
        'post_modified_gmt' => $post_modified_gmt, // formatted from $post['modified_gmt']
        'post_meta'         => [ /* see table below */ ],
        'taxonomy_terms'    => [ /* from the categories + contributors filters */ ],
        'items'             => [ /* media items, from the media_items filter */ ],
    ]
A few notes on the cleaning that happens here:
- Title and excerpt have their HTML tags stripped and their HTML entities decoded. So "It&#8217;s <em>here</em>!" becomes "It’s here!". This matches what most templates expect.
- Content is not cleaned: the rendered HTML is kept intact. The only modification is whatever your pugpig_rest_api_importer_post_content filter does (if any).
- Slug is taken from the remote slug field if present; if not, it's generated from the cleaned title via sanitize_title().
- Dates are parsed via DateTimeImmutable and reformatted to MySQL's Y-m-d H:i:s. Invalid or missing dates are skipped (the local post will be assigned WordPress's current time when inserted).
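The title/excerpt cleaning and the date handling described above can be approximated with standard PHP functions. This is a sketch of the technique, not the plugin's code: the function names are hypothetical, and the trim() call is an extra assumption (rendered excerpts usually end in a newline).

```php
<?php
// Hypothetical helper: strip tags, then decode HTML entities, as described for titles and excerpts.
function example_clean_text(string $html): string
{
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
    return trim($text); // trim() is an assumption, not something the documentation states.
}

// Hypothetical helper: reformat a date string to MySQL's Y-m-d H:i:s.
// Returns null for missing or unparseable input, so the caller can skip the field.
function example_format_date(?string $raw): ?string
{
    if ($raw === null || $raw === '') {
        return null;
    }
    try {
        return (new DateTimeImmutable($raw))->format('Y-m-d H:i:s');
    } catch (Exception $e) {
        return null;
    }
}
```

For example, example_format_date('2024-03-01T09:30:00') gives '2024-03-01 09:30:00', which is the format WordPress stores in its date columns.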
Default post_meta contents
Before any meta filters run, the following keys are seeded automatically:
| Meta key | Value |
|---|---|
| pugpig_rest_api_importer_remote_post_id | The remote post ID, as a string. This is also declared as the uid_field_name on the import item, which is how the JSON Importer matches re-imports back to existing local posts. |
| pugpig_rest_api_importer_remote_post_link | The link field from the remote post (the original permalink). |
| syndication_permalink | Identical to the above. Stored under this conventional key so that feed readers and syndication-aware themes that already understand syndication_permalink work without further wiring. |
| pugpig_rest_api_importer_remote_post_guid | The guid.rendered field from the remote post. |
| pugpig_rest_api_importer_remote_post_source | The literal string wordpress_rest_api. Useful when multiple importers feed the same destination site and you want to identify where a post came from. |
| pugpig_rest_api_importer_remote_post_raw | The complete remote post payload, JSON-encoded and wp_slashed for safe storage. You can decode this at runtime to access fields that weren't promoted into proper meta keys. |
| pugpig_rest_api_importer_hash | An MD5 hash of the remote post, calculated after removing the volatile _links key and recursively sorting all other keys alphabetically. Designed to be a stable signature of the meaningful content, so downstream code can answer "has this post actually changed since last import?". |
| headline | The cleaned title. Pugpig themes traditionally read from this. |
| article-standfirst | The cleaned excerpt. Pugpig themes traditionally read from this for the standfirst / lede. |
| post_published_date | The Y-m-d portion of the published date, as a separate searchable key. |
| post_published_time | The H:i:s portion of the published date. |
After this baseline is set, the plugin also:
- Copies any top-level remote fields named in pugpig_rest_api_importer_meta_fields (default list: template, format, comment_status, ping_status, sticky, featured_media, author) into post_meta.
- Merges in everything under the remote post's meta key (WordPress's REST API places registered post meta here).
- Runs pugpig_rest_api_importer_additional_meta for any free-form additions.
- Runs pugpig_rest_api_importer_meta for a final review and cleanup.
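To make the pugpig_rest_api_importer_hash entry in the table above more concrete, here is one way such a hash could be computed. It is a sketch only: the function names are hypothetical, and serialising with json_encode() before hashing is an assumption, since the documentation does not say how the sorted array is turned into a string.

```php
<?php
// Recursively sort keys so that key order in the payload does not affect the hash.
function example_sort_keys_recursive(array $data): array
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = example_sort_keys_recursive($value);
        }
    }
    ksort($data);
    return $data;
}

// Hypothetical helper: MD5 signature of a remote post, ignoring the volatile _links key.
function example_post_hash(array $post): string
{
    unset($post['_links']);
    // Assumption: the sorted array is JSON-encoded before hashing.
    return md5((string) json_encode(example_sort_keys_recursive($post)));
}
```

Downstream code could compare a freshly computed hash with the stored meta value to decide whether a post has changed since the last import.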
Filter reference
This section is the canonical reference for every filter the importer exposes. Each entry includes:
- A description of what the filter is for, and when you'd typically use it.
- A parameters table listing every argument passed to your callback, its PHP type, and what's actually in it.
- A return type with notes on how unexpected values are handled.
- A small example showing how to register a callback.
pugpig_rest_api_importer_post
Purpose: rewrite the raw remote post object before any of the rest of the importer sees it.
This filter fires as the very first thing inside the shape step (immediately after the importer has checked that the remote object actually has an id field). Whatever your callback returns replaces the original $post variable for the entire rest of that item's processing.
That means: every subsequent filter (_post_content, _meta_fields, _additional_meta, _meta, _categories_source, _contributors_source, _term, _media_items) will see the rewritten version, as will the internal extraction of title, excerpt, content, dates and seeded meta.
This is the right filter to use when:
- The remote payload is wrapped in an envelope (e.g. { "post": {…}, "extra": {…} }) and you want to unwrap it once.
- You need to copy a deeply nested field up to a shallower location so multiple downstream filters can read it (e.g. lifting _embedded.author[0].name to $post['author_name']).
- You're providing default values for fields the remote site occasionally omits.
- You want to do one source-site-specific normalisation pass in a single place, rather than handling the same quirk in three different downstream filters.
Parameters
| Name | Type | Description |
|---|---|---|
| $post | array<string, mixed> | The raw post object, exactly as decoded from the source endpoint's JSON. Keys are whatever the source site emits; for a stock WordPress endpoint that means id, date, date_gmt, modified, modified_gmt, slug, status, type, link, title, content, excerpt, author, featured_media, categories, tags, meta, _links, and (with ?_embed) _embedded. |
Return value
Type: array<string, mixed>
Return the (possibly modified) post array. If your callback returns anything that isn't an array — null, false, a string, etc. — the importer ignores the return value and keeps the original $post untouched.
Example
    add_filter('pugpig_rest_api_importer_post', function (array $post): array {
        // Some endpoints wrap the post in { "data": { ...the post... } }.
        if (isset($post['data']) && is_array($post['data'])) {
            return $post['data'];
        }
        return $post;
    });
pugpig_rest_api_importer_post_content
Purpose: rewrite the rendered HTML body of the post before it is stored as post_content.
By default, the importer takes $post['content']['rendered'] verbatim and stores it as the post content. That's usually fine, but the rendered HTML can contain things that aren't appropriate for the destination site — shortcodes that don't exist locally, image URLs that need to be rewritten, Gutenberg block comments that you want to strip, or markup that needs to be wrapped for Pugpig's article templates.
This filter is the place to do all of that. You're given the rendered HTML and the raw remote post (in case you want to vary your behaviour by post type, category, author, etc.), and you return the rewritten HTML.
Parameters
| Name | Type | Description |
|---|---|---|
| $content | string | The rendered HTML body of the post, taken from $post['content']['rendered']. May contain shortcodes, Gutenberg block comments (<!-- wp:… -->), inline images with absolute URLs pointing at the source site, and so on. |
| $post | array<string, mixed> | The complete remote post object (after any pugpig_rest_api_importer_post filter has run). Provided so you can make context-aware decisions, e.g. "only run my shortcode stripper on posts of type news". |
Return value
Type: string
Return the rewritten HTML. If your callback returns a non-string value (e.g. null from a preg_replace error, or accidentally an array), the importer coerces it to an empty string before storing — so a misbehaving callback can wipe the post body entirely. Always make sure you're returning a string.
Example
    add_filter('pugpig_rest_api_importer_post_content', function (string $content, array $post): string {
        // Remove Gutenberg block markers but leave the actual HTML inside them.
        $stripped = preg_replace('/<!--\s*\/?wp:[^>]*-->/', '', $content);
        return is_string($stripped) ? $stripped : $content;
    }, 10, 2);
pugpig_rest_api_importer_meta_fields
Purpose: choose which top-level fields of the remote post should be copied verbatim into the destination post's post_meta.
The remote REST response includes a number of fields that are about the post (the author ID, the featured-media ID, the comment status, the format, and so on) but aren't part of the post's content or title. These don't map onto WordPress columns — they're meta values. This filter lets you control exactly which of those fields are promoted.
The default list is sensible for a typical WordPress-to-Pugpig sync: template, format, comment_status, ping_status, sticky, featured_media, author. You can extend it, replace it, or drop fields you don't want.
The filter supports two array styles, and you can mix them in the same return value:
- Numeric list: ['featured_media', 'sticky']. The string is used as both the remote field name and the local meta key.
- Associative map: ['remote_author_id' => 'author']. The value is the remote field name; the key is the meta key it should be stored under locally. This is how you rename fields on the way in.
Values pulled from the remote post are passed through an internal normalizeMetaValue() helper, which:
- Converts booleans to '1' or '0' (because WordPress meta storage doesn't natively support booleans).
- Unwraps single-key ['rendered' => '…'] structures to just the string. (WordPress REST API uses this shape for "rich" fields like titles.)
- Passes scalars and null through unchanged.
- JSON-encodes anything else (arrays, objects), so complex values still round-trip safely.
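The two array styles and the normalisation rules above could be combined along the following lines. This is a sketch of the described behaviour, not the plugin's normalizeMetaValue() source: both function names are hypothetical, and skipping fields that are absent from the remote post is an assumption.

```php
<?php
// Hypothetical re-creation of the normalisation rules listed above.
function example_normalize_meta_value($value)
{
    if (is_bool($value)) {
        return $value ? '1' : '0';
    }
    if (is_array($value) && count($value) === 1 && is_string($value['rendered'] ?? null)) {
        return $value['rendered']; // Unwrap ['rendered' => '...'] to the bare string.
    }
    if ($value === null || is_scalar($value)) {
        return $value;
    }
    return json_encode($value); // Arrays and objects are stored as JSON.
}

// Resolve a mixed numeric/associative field list into [local meta key => normalised value].
function example_copy_meta_fields(array $fields, array $post): array
{
    $meta = [];
    foreach ($fields as $key => $remote_field) {
        if (!is_string($remote_field)) {
            continue; // Entries whose value isn't a string are skipped.
        }
        // Numeric entries reuse the field name; associative entries rename it.
        $meta_key = is_int($key) ? $remote_field : $key;
        if (array_key_exists($remote_field, $post)) { // Assumption: absent fields are skipped.
            $meta[$meta_key] = example_normalize_meta_value($post[$remote_field]);
        }
    }
    return $meta;
}
```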
Parameters
| Name | Type | Description |
|---|---|---|
| $default_fields | array<int\|string, string> | The default list of remote fields to copy. As shipped: ['template', 'format', 'comment_status', 'ping_status', 'sticky', 'featured_media', 'author']. |
| $meta | array<string, mixed> | The meta array as assembled so far (i.e. the baseline meta keys listed in the previous section). Provided for context, in case you want to base your decision on what's already there. |
| $post | array<string, mixed> | The complete remote post object. Useful if you want to make per-post decisions about which fields to copy. |
Return value
Type: array<int|string, string>
Return the list of fields to copy, using either numeric or associative entries as described above. If you return something that isn't an array, the filter is treated as a no-op and the default fields are used. Entries whose value isn't a string are silently skipped.
Example
    add_filter('pugpig_rest_api_importer_meta_fields', function (array $defaults, array $meta, array $post): array {
        // Don't store comment status or pingback status; rename author -> remote_author_id.
        $kept = array_values(array_diff($defaults, ['comment_status', 'ping_status', 'author']));
        $kept['remote_author_id'] = 'author';
        // Also pull `parent` across (for hierarchical CPTs).
        $kept[] = 'parent';
        return $kept;
    }, 10, 3);
pugpig_rest_api_importer_additional_meta
Purpose: add arbitrary extra meta entries to the post, including computed values that don't correspond 1:1 with any remote field.
Where pugpig_rest_api_importer_meta_fields is for "copy this field straight across", this filter is for "compute a new value and store it as meta". Use it when:
- The value needs to be derived — concatenating first/last names, parsing a URL, formatting a date differently.
- The data lives in a non-top-level part of the remote post (e.g. inside _embedded, or inside an ACF payload nested under acf).
- You want to inject a value that has no source-side equivalent at all, like an "imported at" timestamp or a source-site identifier.
Keys returned here are merged into the meta array. Values pass through the same normalizeMetaValue() helper described above, so you can return booleans, scalars, arrays, etc., and they'll be normalised before storage.
Parameters
| Name | Type | Description |
|---|---|---|
| $extra | array<string, mixed> | The accumulated extras so far. Starts as an empty array. If multiple callbacks are registered against this filter, later ones receive the merged result of the earlier ones. |
| $meta | array<string, mixed> | The meta array as assembled so far, including everything from pugpig_rest_api_importer_meta_fields and the remote post's own meta key. Useful when your additions need to reference existing values. |
| $post | array<string, mixed> | The complete remote post object. |
Return value
Type: array<string, mixed>
Return the (modified) extras array. If you return something that isn't an array, the filter is treated as a no-op and no extras are added. Non-string keys are coerced to strings before merging.
Example
    add_filter('pugpig_rest_api_importer_additional_meta', function (array $extra, array $meta, array $post): array {
        $extra['source_site'] = parse_url($post['link'] ?? '', PHP_URL_HOST) ?: '';
        $extra['imported_at'] = current_time('mysql');
        // Pull a value out of an ACF payload, if present.
        if (isset($post['acf']['subtitle'])) {
            $extra['subtitle'] = $post['acf']['subtitle'];
        }
        return $extra;
    }, 10, 3);
pugpig_rest_api_importer_meta
Purpose: final review of the entire assembled meta array before it's attached to the import item.
This is the last meta-related filter in the pipeline. By the time it fires, the meta array contains everything: the baseline keys, the fields promoted by _meta_fields, the remote post's own meta values, and the additions from _additional_meta. Whatever you return is the meta that gets stored.
Use this filter for:
- Cross-field invariants, such as "if headline ended up empty, fall back to the slug".
- Removing keys you don't want persisted, for example dropping pugpig_rest_api_importer_remote_post_raw if you don't need the full original JSON kept around.
- Last-mile sanitisation that's easier to do in one place once everything is assembled.
Because this filter sees the final shape of the meta array, it's also a good place to do cheap validation: write a log entry, set a flag, or null out values that obviously don't make sense.
Parameters
| Name | Type | Description |
|---|---|---|
| $meta | array<string, mixed> | The fully-assembled meta array, with all earlier filters already applied. |
| $post | array<string, mixed> | The complete remote post object, for context. |
Return value
Type: array<string, mixed>
Return the final meta array. If you return something that isn't an array, the filter is treated as a no-op and the input meta is kept.
Example
    add_filter('pugpig_rest_api_importer_meta', function (array $meta, array $post): array {
        if (empty($meta['headline'])) {
            $meta['headline'] = $meta['article-standfirst'] ?? 'Untitled';
        }
        // Don't bloat the database with the full raw payload.
        unset($meta['pugpig_rest_api_importer_remote_post_raw']);
        return $meta;
    }, 10, 2);
pugpig_rest_api_importer_categories_source
Purpose: provide the list of categories that should be created as taxonomy terms and attached to the imported post.
The plugin doesn't make any assumption about how the source site exposes categories. They might be a flat array of IDs at $post['categories'] (which you'd then need to look up against another endpoint), they might be embedded objects under _embedded['wp:term'] (available when you request ?_embed), they might be in a custom field, or they might not exist at all.
Because of that variability, the default list is empty — nothing happens until you register a callback. Your callback's job is to return a list of category-shaped records, each of which the importer will convert into a taxonomy term.
The taxonomy used is determined dynamically by calling pugpig_get_taxonomy_to_order_by() (a helper from the Pugpig App plugin). On a default Pugpig setup this resolves to category; on sites that have re-pointed Pugpig at a custom taxonomy, it'll resolve to whatever's configured there.
Expected return shape
Return a list (numerically indexed array) where each entry is an associative array with these keys:
| Key | Type | Description |
|---|---|---|
| slug | string | The term slug. Required: if omitted or empty, the entry is skipped. If you only have a name, derive a slug with sanitize_title(). |
| name | string | The human-readable term name. If omitted, the slug is humanised (hyphens to spaces, words capitalised) to derive a name. |
| meta | array<string, mixed> | Optional. Term meta to set on the created term, as key→value. Values are normalised the same way as post meta. |
Parameters
| Name | Type | Description |
|---|---|---|
| $categories | array<int, array<string, mixed>> | The categories accumulated so far (starts as an empty array; if multiple callbacks register against this filter, each one sees the previous ones' output). |
| $post | array<string, mixed> | The complete remote post object, which is where you'll typically read the category data from. |
Return value
Type: array<int, array<string, mixed>>
Return a list of category records as described above. Entries that aren't arrays, or that lack a usable slug, are silently skipped. Duplicate slugs within the same post are deduplicated automatically.
Example
    // Assumes the source endpoint is called with ?_embed so categories are inlined.
    add_filter('pugpig_rest_api_importer_categories_source', function (array $sources, array $post): array {
        $term_groups = $post['_embedded']['wp:term'] ?? [];
        foreach ($term_groups as $group) {
            foreach ($group as $term) {
                if (($term['taxonomy'] ?? '') !== 'category') {
                    continue;
                }
                $sources[] = [
                    'name' => $term['name'] ?? '',
                    'slug' => $term['slug'] ?? '',
                ];
            }
        }
        return $sources;
    }, 10, 2);
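The validation rules described for category records (skip entries without a slug, humanise the slug when no name is given, drop duplicate slugs) can be pictured with the following sketch. It is an approximation for illustration: the function names are hypothetical, and keeping the first occurrence of a duplicate slug and treating an empty name as omitted are both assumptions.

```php
<?php
// Hypothetical helper: "world-news" becomes "World News".
function example_humanise_slug(string $slug): string
{
    return ucwords(str_replace('-', ' ', $slug));
}

// Hypothetical helper: apply the documented skip, fallback and dedupe rules.
function example_normalise_categories(array $records): array
{
    $by_slug = [];
    foreach ($records as $record) {
        if (!is_array($record)) {
            continue; // Non-array entries are silently skipped.
        }
        $slug = is_string($record['slug'] ?? null) ? trim($record['slug']) : '';
        if ($slug === '' || isset($by_slug[$slug])) {
            continue; // No usable slug, or a duplicate (first occurrence wins here).
        }
        $name = is_string($record['name'] ?? null) && $record['name'] !== ''
            ? $record['name']
            : example_humanise_slug($slug);
        $by_slug[$slug] = ['slug' => $slug, 'name' => $name];
    }
    return array_values($by_slug);
}
```

The optional meta key is left out of the sketch to keep it short.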
pugpig_rest_api_importer_contributors_source
Purpose: provide the list of contributors (authors, co-authors, photographers, etc.) that should be created as terms in the contributor taxonomy.
This filter mirrors pugpig_rest_api_importer_categories_source, but produces terms in the hard-coded contributor taxonomy (which Pugpig themes know about and use to render bylines, contributor pages, etc.). Like categories, the default is empty — you need to register a callback for any contributors to be imported.
The contributor record shape is the same as for categories, with one extra option: each contributor may also carry an items array, which the JSON Importer treats as nested import items attached to the term. This is the mechanism for things like contributor avatars: you supply the avatar as a media item under items, and it ends up linked to the contributor term in the database.
Expected return shape
| Key | Type | Description |
|---|---|---|
| name | string | The contributor's display name. Required: unlike categories, an empty name causes the entry to be skipped. |
| slug | string | The term slug. If omitted, it's derived from the name via sanitize_title(). |
| meta | array<string, mixed> | Optional. Term meta; this is where you'd put things like role, biography, social handles, external profile URL, etc. |
| items | array<int, array<string, mixed>> | Optional. Nested import items attached to the term, typically media items built with ImporterUtils::getMediaImportItem() for the contributor's avatar/headshot. |
Parameters
| Name | Type | Description |
|---|---|---|
| $contributors | array<int, array<string, mixed>> | The contributors accumulated so far. Starts empty. |
| $post | array<string, mixed> | The complete remote post object, which is where you'll read contributor data from. |
Return value
Type: array<int, array<string, mixed>>
Return the list of contributor records. Entries without a usable name are silently skipped. Duplicate slugs are deduplicated.
Example
add_filter('pugpig_rest_api_importer_contributors_source', function (array $sources, array $post): array {
    // Assumes ?_embed and the standard WordPress author embed shape.
    $authors = $post['_embedded']['author'] ?? [];
    foreach ($authors as $author) {
        if (empty($author['name'])) {
            continue;
        }
        $sources[] = [
            'name' => (string) $author['name'],
            'slug' => (string) ($author['slug'] ?? ''),
            'meta' => [
                'description' => (string) ($author['description'] ?? ''),
                'profile_url' => (string) ($author['link'] ?? ''),
            ],
        ];
    }
    return $sources;
}, 10, 2);
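The items option described above can be sketched as follows for contributor avatars. This is an illustration under assumptions, not the plugin's own code: it assumes the embedded author object carries the avatar_urls map that WordPress core exposes on user objects, and example_contributor_from_author() and its $makeMediaItem parameter are hypothetical names introduced here so the record-building logic can run without the plugin loaded.

```php
/**
 * Build one contributor record from an embedded WordPress author object.
 *
 * $makeMediaItem is a hypothetical injection point. In a real callback it
 * wraps ImporterUtils::getMediaImportItem(); passing it in keeps this
 * function runnable on its own.
 */
function example_contributor_from_author(array $author, callable $makeMediaItem): ?array
{
    $name = trim((string) ($author['name'] ?? ''));
    if ($name === '') {
        return null; // entries without a usable name would be skipped anyway
    }

    $record = [
        'name' => $name,
        'slug' => (string) ($author['slug'] ?? ''),
    ];

    // Assumption: avatar_urls is keyed by pixel size (e.g. 24, 48, 96).
    $avatars = $author['avatar_urls'] ?? [];
    if (is_array($avatars) && $avatars !== []) {
        ksort($avatars, SORT_NUMERIC);      // smallest size first
        $largest = (string) end($avatars);  // so the last entry is the largest
        if ($largest !== '') {
            $id = 'avatar-' . ($author['id'] ?? $record['slug']);
            $record['items'] = [$makeMediaItem($id, $name, $largest)];
        }
    }

    return $record;
}

// Hypothetical wiring; only runs inside WordPress with the plugin active.
if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_contributors_source', function (array $contributors, array $post): array {
        $makeItem = static function (string $id, string $title, string $url): array {
            return \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
                $id, $title, $url, null, 'keep_existing', null, '', '', ''
            );
        };
        foreach ($post['_embedded']['author'] ?? [] as $author) {
            $record = is_array($author) ? example_contributor_from_author($author, $makeItem) : null;
            if ($record !== null) {
                $contributors[] = $record;
            }
        }
        return $contributors;
    }, 10, 2);
}
```

The 'avatar-' ID prefix is an arbitrary choice to keep avatar IDs from colliding with post attachment IDs. Passing null as the parent field reflects that the avatar is attached to a term rather than written into a post meta key.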
pugpig_rest_api_importer_term
Purpose: modify each assembled taxonomy term (both categories and contributors) just before it's added to the import item.
This filter fires once per term, immediately after the categories or contributors callback has produced it and before it's appended to the post's taxonomy_terms array. It's the right place for cross-cutting modifications that apply to all terms regardless of which source filter produced them — for example, "stamp every term with a meta key recording which source site it came from", or "normalise term names to title case".
You can tell categories and contributors apart by checking the term's taxonomy key.
Parameters
| Name | Type | Description |
|---|---|---|
| $term | array<string, mixed> | The fully-assembled term, ready to be appended. Keys include taxonomy, slug, name, optional term_meta, and (for contributors) optional items. |
| $post | array<string, mixed> | The complete remote post object that this term is being created for. |
Return value
Type: array<string, mixed>
Return the (modified) term. Whatever you return is appended as-is to the post's taxonomy term list, so if you return something nonsensical, the JSON Importer will likely error out on it.
Example
add_filter('pugpig_rest_api_importer_term', function (array $term, array $post): array {
    // Stamp every term with the source post's permalink, so we can later
    // tell which terms were created during which import.
    $term['term_meta'] = $term['term_meta'] ?? [];
    $term['term_meta']['source_post_link'] = $post['link'] ?? '';
    return $term;
}, 10, 2);
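Because the same filter sees both kinds of term, a callback can branch on the taxonomy key. The sketch below is one possible shape. The function name and the source_site meta key are hypothetical. The 'contributor' slug comes from the description above, while 'category' is an assumption about what the plugin assigns to category terms.

```php
/**
 * Apply different tweaks to category and contributor terms.
 * Terms from any other taxonomy pass through unchanged.
 */
function example_adjust_term_by_taxonomy(array $term, array $post): array
{
    $taxonomy = (string) ($term['taxonomy'] ?? '');

    if ($taxonomy === 'contributor') {
        // Contributors only: record the host of the post they arrived with.
        $host = parse_url((string) ($post['link'] ?? ''), PHP_URL_HOST);
        $term['term_meta'] = $term['term_meta'] ?? [];
        $term['term_meta']['source_site'] = is_string($host) ? $host : '';
    } elseif ($taxonomy === 'category') {
        // Categories only: normalise the display name to title case.
        // ucwords()/strtolower() are byte-based, so this suits ASCII names.
        $term['name'] = ucwords(strtolower((string) ($term['name'] ?? '')));
    }

    return $term;
}

// Only registers when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_term', 'example_adjust_term_by_taxonomy', 10, 2);
}
```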
pugpig_rest_api_importer_media_items
Purpose: provide the list of attachments (featured images, inline images, downloadable assets) that should be downloaded and attached to the imported post.
This is the only way to attach any media. By default, no images come across at all — not the featured image, not inline body images. The plugin's hands-off approach is deliberate because there are several legitimate strategies (download everything locally, keep using remote URLs, only download images on demand) and the right choice depends on the destination site's storage and bandwidth constraints.
When you register a callback for this filter, you return a list of "media import items" — structured arrays that tell the JSON Importer what to download, how to identify the resulting attachment, and where (if anywhere) to link the local attachment ID back into the parent post. Don't build these by hand; use the helper:
$item = ImporterUtils::getMediaImportItem(
    $id,                   // string: remote attachment ID (used to dedupe and as filename prefix)
    $title,                // string: attachment title
    $src_url,              // string: URL to download from
    $parent_field_for_id,  // string|null: parent post meta key to write the local attachment ID to
    $fetch_policy,         // string|null: 'keep_existing' / 'replace' / 'force_download'
    $contents,             // string|null: raw bytes if you want to inline-encode instead of fetching by URL
    $caption,              // string: caption to store
    $credit,               // string: credit line
    $copyright_caption,    // string: copyright line
    $mime_base_type,       // string: e.g. 'image', 'video' (used when inferring mimetype)
    $mime_type             // string|null: explicit mimetype; if null, inferred from $src_url's extension
);
A few key parameters explained in more depth:
- $parent_field_for_id: when set, the JSON Importer writes the resulting local attachment ID back into the parent post under this meta key. The canonical use is '_thumbnail_id', which is how WordPress stores featured images; passing this turns your media item into the post's featured image automatically.
- $fetch_policy: controls re-download behaviour. 'keep_existing' means "if we already have this attachment locally, don't fetch it again", which is what you want most of the time. 'replace' forces a re-download.
- $contents: rarely used. It covers the case where you have the raw bytes of an image in PHP already (e.g. you generated it) and want to import it without an HTTP fetch. The URL becomes a data: URI internally.
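For orientation on the $contents case, a data: URI is a MIME type followed by the base64-encoded bytes. The helper below is a hypothetical illustration of that general encoding (the RFC 2397 form). It does not reproduce the plugin's internal code, and the exact string the plugin builds is not documented here.

```php
/**
 * Encode raw bytes as a base64 data: URI.
 * Illustrative only; example_bytes_to_data_uri() is not part of the plugin.
 */
function example_bytes_to_data_uri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// 'hi' encodes to 'aGk=' in base64.
$uri = example_bytes_to_data_uri('hi', 'text/plain');
// $uri === 'data:text/plain;base64,aGk='
```

In practice you pass the raw bytes as $contents and let the helper do the encoding; this only shows what "inline-encoded" means.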
Parameters
| Name | Type | Description |
|---|---|---|
| $items | array<int, array<string, mixed>> | The media items accumulated so far. Starts empty; multiple callbacks can each contribute items. |
| $post | array<string, mixed> | The complete remote post object, which is where you'll find the featured-media reference and any inline image references. |
Return value
Type: array<int, array<string, mixed>>
Return the list of media items. Entries that aren't arrays are silently dropped. The returned list is also re-indexed numerically before being attached, so you don't need to worry about preserving keys.
Example
add_filter('pugpig_rest_api_importer_media_items', function (array $items, array $post): array {
    // Attach the featured image, assuming ?_embed.
    $featured = $post['_embedded']['wp:featuredmedia'][0] ?? null;
    if (!is_array($featured) || empty($featured['source_url'])) {
        return $items;
    }
    $items[] = \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
        (string) ($featured['id'] ?? ''),
        (string) ($featured['title']['rendered'] ?? 'Featured image'),
        (string) $featured['source_url'],
        '_thumbnail_id', // makes this the local post's featured image
        'keep_existing',
        null,
        (string) ($featured['caption']['rendered'] ?? ''),
        (string) ($featured['credit'] ?? ''),
        ''
    );
    return $items;
}, 10, 2);
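The same filter can also queue inline body images. The sketch below is a minimal illustration under stated assumptions: the body HTML is read from the standard content.rendered field, example_inline_image_urls() is a hypothetical helper using a deliberately simple regex, and hashing the URL to get a stable remote ID is a choice made for this example, not something the plugin prescribes.

```php
/**
 * Collect absolute http(s) image URLs from <img src="..."> attributes.
 * Regex-based and deliberately simple: it ignores srcset, lazy-load
 * attributes such as data-src, and relative URLs.
 */
function example_inline_image_urls(string $html): array
{
    $found = preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/is', $html, $matches);
    if ($found === false || $found === 0) {
        return [];
    }
    // Attribute values are HTML-escaped (&amp; etc.), so decode them first.
    $urls = array_map('html_entity_decode', $matches[2]);
    $urls = array_filter($urls, static function (string $url): bool {
        return preg_match('#^https?://#i', $url) === 1;
    });
    return array_values(array_unique($urls));
}

// Hypothetical wiring; only runs inside WordPress with the plugin active.
if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_media_items', function (array $items, array $post): array {
        $html = (string) ($post['content']['rendered'] ?? '');
        foreach (example_inline_image_urls($html) as $url) {
            $items[] = \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
                md5($url),                                        // assumption: URL hash as a stable remote ID
                basename((string) parse_url($url, PHP_URL_PATH)), // file name as the title
                $url,
                null,                                             // not the featured image
                'keep_existing',
                null,
                '',
                '',
                ''
            );
        }
        return $items;
    }, 10, 2);
}
```

This only queues the downloads. Whether and how the img URLs in the imported body are then pointed at the local copies is outside the scope of this sketch.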
Worked examples
The snippets below assume they live in a small companion plugin, an mu-plugin, or your active theme's functions.php. They're presented as standalone recipes — you can combine them freely.
1. Strip Gutenberg block markers from imported content
The block editor surrounds every block with HTML comments like <!-- wp:paragraph --> … <!-- /wp:paragraph -->. These don't render visibly but they bloat the HTML and can confuse anything downstream that parses the content. This callback removes them while leaving the actual HTML intact.
add_filter('pugpig_rest_api_importer_post_content', function (string $content, array $post): string {
    $stripped = preg_replace('/<!--\s*\/?wp:[^>]*-->/', '', $content);
    return is_string($stripped) ? $stripped : $content;
}, 10, 2);
2. Map embedded categories into the destination taxonomy
Call your source endpoint with ?_embed to get the category objects inlined under _embedded['wp:term'], and then pluck them out:
add_filter('pugpig_rest_api_importer_categories_source', function (array $sources, array $post): array {
    foreach ($post['_embedded']['wp:term'] ?? [] as $group) {
        foreach ($group as $term) {
            if (($term['taxonomy'] ?? '') !== 'category') {
                continue;
            }
            $sources[] = [
                'name' => (string) ($term['name'] ?? ''),
                'slug' => (string) ($term['slug'] ?? ''),
            ];
        }
    }
    return $sources;
}, 10, 2);
3. Attach the featured image
add_filter('pugpig_rest_api_importer_media_items', function (array $items, array $post): array {
    $featured = $post['_embedded']['wp:featuredmedia'][0] ?? null;
    if (!is_array($featured) || empty($featured['source_url'])) {
        return $items;
    }
    $items[] = \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
        (string) ($featured['id'] ?? ''),
        (string) ($featured['title']['rendered'] ?? 'Featured image'),
        (string) $featured['source_url'],
        '_thumbnail_id',
        'keep_existing',
        null,
        (string) ($featured['caption']['rendered'] ?? ''),
        '',
        ''
    );
    return $items;
}, 10, 2);
4. Stamp custom meta on every imported post
add_filter('pugpig_rest_api_importer_additional_meta', function (array $extra, array $meta, array $post): array {
    $extra['source_site'] = parse_url($post['link'] ?? '', PHP_URL_HOST) ?: '';
    $extra['imported_at'] = current_time('mysql');
    return $extra;
}, 10, 3);
5. Customise the list of fields promoted into post meta
add_filter('pugpig_rest_api_importer_meta_fields', function (array $defaults, array $meta, array $post): array {
    // Drop two of the defaults, rename `author` -> `remote_author_id`.
    $kept = array_values(array_diff($defaults, ['comment_status', 'ping_status', 'author']));
    $kept['remote_author_id'] = 'author';
    return $kept;
}, 10, 3);
6. Final cleanup of meta before write
add_filter('pugpig_rest_api_importer_meta', function (array $meta, array $post): array {
    if (empty($meta['headline'])) {
        $meta['headline'] = $meta['article-standfirst'] ?? 'Untitled';
    }
    // We don't need the full raw payload in the database.
    unset($meta['pugpig_rest_api_importer_remote_post_raw']);
    return $meta;
}, 10, 2);
Constants and meta keys
All of the meta keys the plugin writes are defined as class constants in src/Constants.php, so you can reference them from your own code without retyping the strings.
| Constant | Value | Used for |
|---|---|---|
| META_KEY_HASH | pugpig_rest_api_importer_hash | MD5 of the remote post body (with _links removed and keys sorted). A stable signature you can use to detect "has anything actually changed?". |
| META_KEY_REMOTE_POST_ID | pugpig_rest_api_importer_remote_post_id | Remote post ID. Also used as uid_field_name on every import item, which is how the JSON Importer matches re-imports back to existing local posts. |
| META_KEY_REMOTE_POST_GUID | pugpig_rest_api_importer_remote_post_guid | The guid.rendered field from the remote post. |
| META_KEY_REMOTE_POST_LINK | pugpig_rest_api_importer_remote_post_link | The original permalink (link field) from the remote post. |
| META_KEY_REMOTE_POST_SOURCE | pugpig_rest_api_importer_remote_post_source | Always wordpress_rest_api. Useful when multiple importers feed the same destination site. |
| META_KEY_REMOTE_POST_RAW | pugpig_rest_api_importer_remote_post_raw | The full original remote post payload, JSON-encoded and slashed. Decode at runtime to access fields you didn't promote into proper meta. |
| META_KEY_MEDIA_ID | pugpig_rest_api_importer_media_id | Remote attachment ID. Used as uid_field_name for media items so the JSON Importer can dedupe attachments across imports. |
| META_KEY_MEDIA_SRC_URL | pugpig_rest_api_importer_src_url | The original source URL of the attachment, stored on the local attachment record. |
| META_KEY_MEDIA_ALT_TEXT | _wp_attachment_image_alt | WordPress core's alt-text meta key. Used as-is so the alt text is read by standard WordPress functions. |
| META_KEY_MEDIA_CAPTION_ORIGINAL | pugpig_rest_api_importer_caption | Caption preserved exactly as supplied (separate from post_excerpt, which is what WordPress normally uses for attachment captions). |
| META_KEY_MEDIA_CREDIT_ORIGINAL | pugpig_rest_api_importer_credit | Credit line. |
| META_KEY_MEDIA_COPYRIGHT_ORIGINAL | pugpig_rest_api_importer_copyright | Copyright line. |
| META_KEY_CRON_EVENT | pugpig_rest_api_importer_cron_event | The WP-Cron hook name used by the Tools page's Set Cron button. |
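The hash row in the table above describes a signature computed with _links removed and keys sorted. The sketch below shows how a signature of that kind can be built, for example to compare two remote payloads in your own tooling. Both function names are hypothetical, and the recursive sort and the use of json_encode() are assumptions: the plugin's exact serialisation is not documented here, so values from this sketch should not be expected to match the stored pugpig_rest_api_importer_hash.

```php
/**
 * Sort an array by key at every nesting level so that key order
 * does not influence the serialised form.
 */
function example_sort_keys_recursive(array $value): array
{
    foreach ($value as $key => $child) {
        if (is_array($child)) {
            $value[$key] = example_sort_keys_recursive($child);
        }
    }
    ksort($value);
    return $value;
}

/**
 * MD5 signature of a remote post, ignoring _links and key order.
 */
function example_content_signature(array $remotePost): string
{
    unset($remotePost['_links']); // hypermedia links change without the content changing
    return md5((string) json_encode(example_sort_keys_recursive($remotePost)));
}

// Same content, different key order and different _links: same signature.
$a = ['id' => 1, 'title' => ['rendered' => 'T', 'raw' => 'T'], '_links' => ['self' => 'x']];
$b = ['title' => ['raw' => 'T', 'rendered' => 'T'], 'id' => 1, '_links' => ['self' => 'y']];
$same = example_content_signature($a) === example_content_signature($b); // true
```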
Quick reference: filter signatures
For copy-paste convenience, here are all the filter signatures in one place:
apply_filters('pugpig_rest_api_importer_post', array $post); // returns array<string,mixed>
apply_filters('pugpig_rest_api_importer_post_content', string $content, array $post); // returns string
apply_filters('pugpig_rest_api_importer_meta_fields', array $default_fields, array $meta, array $post); // returns array<int|string,string>
apply_filters('pugpig_rest_api_importer_additional_meta', array $extra, array $meta, array $post); // returns array<string,mixed>
apply_filters('pugpig_rest_api_importer_meta', array $meta, array $post); // returns array<string,mixed>
apply_filters('pugpig_rest_api_importer_categories_source', array $categories, array $post); // returns array<int,array<string,mixed>>
apply_filters('pugpig_rest_api_importer_contributors_source', array $contributors, array $post); // returns array<int,array<string,mixed>>
apply_filters('pugpig_rest_api_importer_term', array $term, array $post); // returns array<string,mixed>
apply_filters('pugpig_rest_api_importer_media_items', array $items, array $post); // returns array<int,array<string,mixed>>


