    WordPress REST API Importer Documentation

    Written by Frank Lockett

    Updated at May 27th, 2026

              Pugpig WordPress REST API Importer

              What this plugin does

              The Pugpig WordPress REST API Importer is a WordPress plugin that imports posts from another WordPress site's REST API posts endpoint. It does this by reading the source site's public REST API (the standard /wp-json/wp/v2/posts-style endpoint that every modern WordPress install exposes by default) and turning each remote post into a real, native post on the destination site.

              Think of it as a one-way bridge: the source site doesn't need to know it's being imported from, and the destination site treats the imported posts exactly the same as posts that were authored directly in its own admin. They appear in the post list, they're queryable, they show up in feeds and templates, and they can be edited locally if you want.

              Crucially, this plugin doesn't do the actual writing to the database itself. It is a thin orchestration layer that sits on top of another Pugpig plugin called the Pugpig JSON Importer. The split of responsibilities is:

              • This plugin — knows how to fetch JSON from a remote URL, how to page through a multi-page response, and how to shape each remote post into the format the JSON Importer expects.
              • Pugpig JSON Importer — takes that shaped data and does the actual wp_insert_post() / wp_update_post() calls, downloads images, creates taxonomy terms, and writes post meta.

              The reason for that split matters when you're customising the importer: anything that is about the shape of the data (which fields go where, what extra metadata to attach, which images to pull in) is configured through filters on this plugin. Anything that is about how WordPress stores it (deduplication rules, image fetch policies, taxonomy registration) is the JSON Importer's job.

              What you get out of the box

              Once the plugin is installed and you've set a source endpoint URL, running an import will create or update posts on the destination site with:

              • The post title, slug, content (HTML body), and excerpt.
              • The post type and status copied across from the source (so a draft stays a draft, a CPT stays as that CPT).
              • All four date fields (published date, GMT, modified date, modified GMT).
              • A baseline set of metadata that links each imported post back to its source: remote ID, remote URL, remote GUID, a content hash, and the full original JSON payload preserved in meta for reference.
              • Two convenience meta keys (headline and article-standfirst) that Pugpig theme templates traditionally read from.

              What you don't get automatically:

              • No categories or tags — the plugin doesn't make any assumptions about which taxonomies exist on the destination site, or how the source site exposes its taxonomy data (it might be IDs, slugs, embedded objects, ACF fields, or a custom REST route).
              • No author records — authors are referenced by remote ID only unless you wire up the contributor filter.
              • No featured image or inline images — the plugin won't download any media until you tell it to via the media filter.

              This is a deliberate design choice. Two sites that look superficially similar can have wildly different REST payloads depending on which plugins, custom post types and meta fields are in play. Rather than guess and get it wrong, this plugin ships with a minimal baseline and exposes filters so you can wire up the rest in a small companion plugin or in your theme's functions.php.


              Contents

              1. Plugin dependencies
              2. Configuration
              3. Admin Tools page
              4. How the importer works (the import pipeline)
              5. What the importer produces by default
              6. Filter reference
              7. Worked examples
              8. Constants and meta keys

              Plugin dependencies

              This plugin doesn't try to be self-sufficient. It expects to live in a Pugpig-flavoured WordPress install alongside several other plugins. If any of them are missing, the plugin will refuse to bootstrap and will instead show a red admin notice listing what's missing, so you'll spot the problem the next time you load an admin page.

              The required plugins are:

              • pugpig/pugpig.php — Pugpig App. Provides the Pugpig-specific helper functions, like pugpig_get_taxonomy_to_order_by() which determines which taxonomy categories get stored under.
              • meta-box/meta-box.php — Meta Box. The framework used to render the Settings page fields.
              • mb-settings-page/mb-settings-page.php — MB Settings Page. The Meta Box add-on that lets us register a stand-alone settings page under WordPress's Settings menu.
              • pugpig_pugpigjsonimporter/pugpigjsonimporter.php — Pugpig JSON Importer. The plugin that actually performs the inserts. It also provides two helper classes we use for the admin UI: LogHtml (renders the per-line import log on the Tools page) and UiUtils (renders the collapsible box around it).

              The dependency check happens at the very top of src/index.php. If it fails, the rest of the plugin code is not loaded at all.
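As a rough illustration of the check described above (not the plugin's actual source), the sketch below compares the required plugin files against a list of active ones. The constant and function names are hypothetical, and it assumes the active list is supplied as an array; inside WordPress that list would typically come from get_option('active_plugins').

```php
<?php
// Plugin files the importer requires, as listed in this section.
const SKETCH_REQUIRED_PLUGINS = [
    'pugpig/pugpig.php',
    'meta-box/meta-box.php',
    'mb-settings-page/mb-settings-page.php',
    'pugpig_pugpigjsonimporter/pugpigjsonimporter.php',
];

/**
 * Hypothetical helper: return the required plugin files that are not active.
 *
 * @param string[] $required Plugin files that must be present.
 * @param string[] $active   Plugin files that are currently active.
 * @return string[]          Missing plugin files, re-indexed from 0.
 */
function sketch_missing_plugins(array $required, array $active): array
{
    return array_values(array_diff($required, $active));
}

// Assumed usage inside WordPress: bail out early when anything is missing.
// $missing = sketch_missing_plugins(SKETCH_REQUIRED_PLUGINS, (array) get_option('active_plugins', []));
// if ($missing !== []) { /* show an admin notice and skip loading the rest */ }
```

An empty result means every dependency is present. On multisite installs, network-activated plugins are stored separately, so a real check would need to account for those too.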


              Configuration

              The settings live at Settings → Pugpig WordPress REST API Importer Settings in the WordPress admin sidebar. Anyone with the publish_pages capability can edit them — this is a deliberately broader capability than manage_options so that editorial staff (not just full administrators) can change which endpoint is being imported from.

              Field | What it does
              Post endpoint URL | The full URL to the remote site's posts endpoint, e.g. https://example.com/wp-json/wp/v2/posts. This is the only required setting; without it, the importer has nowhere to fetch from. You can append query parameters here yourself if you need to (for example, ?_embed to get embedded author/featured-media/term objects in each post, which the worked examples below rely on).
              Posts per request | An optional positive integer. When supplied, the importer appends ?per_page=N to the endpoint URL on every fetch. Useful if the default of 10 isn't right for you: raising it reduces the number of HTTP calls during a large import, lowering it helps avoid memory/timeout issues on slow sites. If you leave this blank, the source site's default applies.
              Cron interval | How often the scheduled import should run. Options are hourly, twicedaily, daily, or custom. This setting is only consulted when you click Set Cron on the Tools page; it doesn't take effect automatically when changed.
              Custom cron interval (minutes) | Used only when Cron interval is set to custom. Supply a positive integer number of minutes. The plugin registers a new WP-Cron schedule called every_{N}_minutes dynamically, so any value works (e.g. 15 for every quarter of an hour, 5 for every five minutes).

              The settings are stored as a single option row in the wp_options table under the name pugpig-wordpress-rest-api-importer-settings.
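To make the custom interval concrete, here is a minimal sketch of how an every_{N}_minutes schedule could be added through WordPress's cron_schedules filter. The helper name is hypothetical and the display label is an assumption; the plugin's own registration code may differ.

```php
<?php
/**
 * Hypothetical helper: add an every_{N}_minutes entry to a WP-Cron
 * schedules array. WP-Cron intervals are expressed in seconds.
 */
function sketch_add_minutes_schedule(array $schedules, int $minutes): array
{
    if ($minutes < 1) {
        return $schedules; // Ignore non-positive values.
    }

    $schedules["every_{$minutes}_minutes"] = [
        'interval' => $minutes * 60,
        'display'  => "Every {$minutes} minutes", // Label text is an assumption.
    ];

    return $schedules;
}

// Only register the filter when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('cron_schedules', function (array $schedules): array {
        return sketch_add_minutes_schedule($schedules, 15);
    });
}
```

With a value of 15, the resulting schedule is named every_15_minutes and has an interval of 900 seconds.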


              Admin Tools page

              Found at Tools → Pugpig WordPress REST API Importer Tool. Unlike the Settings page, this one is restricted to users with manage_options — running an import or scheduling cron is treated as a more sensitive operation than just configuring URLs.

              There are three buttons:

              • Import Posts — runs an import immediately, synchronously, while you wait. It fetches the endpoint, processes every page (following next_page_url in each response), and renders an inline log showing which URLs were fetched and what happened to each item. This is the right button to use when testing a configuration change or doing a one-off backfill.
              • Set Cron — schedules a recurring WP-Cron event called pugpig_rest_api_importer_cron_event using the interval configured in Settings, passing the endpoint URL as the event argument. From that point on, WordPress will run an import on every cron tick. Clicking this again replaces any previously scheduled event, so it's safe to use as an "apply my settings change" button.
              • Delete Cron — unschedules the cron event entirely. Imports stop happening automatically until you click Set Cron again.

              Note that WP-Cron in WordPress is technically "poor man's cron" — it only runs when someone visits the site. On a low-traffic site you may want to wire up a real OS-level cron job hitting wp-cron.php instead. That setup is outside the scope of this plugin.


              How the importer works (the import pipeline)

              Every import — whether triggered manually from the Tools page or fired by WP-Cron — goes through the same six-step pipeline. Understanding the order matters because it tells you which filter to hook for which job.

              Step 1: Fetch the page

              The importer calls wp_remote_get() against the current URL (initially the endpoint configured in Settings, then on subsequent iterations whatever next_page_url was returned). It pulls back the response body, JSON-decodes it into an associative array, and passes that array to the next step. If the request fails or the body isn't valid JSON, an error is logged and the loop terminates — no partial state is left behind, but no retry happens either.

              Step 2: Extract the list of posts

              The plugin tries three locations, in order, to find the array of posts inside the decoded response:

              1. A top-level data key (i.e. { "data": […] }).
              2. A top-level posts key.
              3. The root array itself, on the assumption that the response is the list of posts.

              The first one that contains a list of arrays wins. This flexibility means the importer can handle both bare REST responses ([ {…}, {…} ], which is what WordPress core returns) and wrapped envelopes ({ "data": […], "next_page_url": "…" }, which some custom endpoints use).
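The lookup order above can be sketched as follows. This is an illustration of the described behaviour rather than the plugin's code; the function names are hypothetical, and treating an empty list as "no match" is an assumption.

```php
<?php
/**
 * True when $value is a non-empty, zero-indexed list whose entries are all arrays.
 */
function sketch_is_list_of_arrays($value): bool
{
    if (!is_array($value) || $value === []) {
        return false;
    }
    if (array_keys($value) !== range(0, count($value) - 1)) {
        return false; // Associative array, so not a list.
    }
    foreach ($value as $entry) {
        if (!is_array($entry)) {
            return false;
        }
    }
    return true;
}

/**
 * Try "data", then "posts", then the root array; the first list of arrays wins.
 */
function sketch_extract_posts(array $response): array
{
    $candidates = [$response['data'] ?? null, $response['posts'] ?? null, $response];

    foreach ($candidates as $candidate) {
        if (sketch_is_list_of_arrays($candidate)) {
            return $candidate;
        }
    }

    return []; // Nothing that looks like a list of posts.
}
```

A bare WordPress core response falls through to the third candidate, while a wrapped envelope matches on its data or posts key.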

              Step 3: Paginate

              After processing the current page, the importer looks for a next_page_url key at the top level of the response. If present and non-empty, it loops back to Step 1 with that URL. If absent, the import ends.

              Important caveat: this is not standard WordPress REST API pagination. WordPress core paginates via the Link: rel="next" HTTP header and the ?page=N query parameter; this plugin ignores both of those and instead expects the source endpoint to embed a next_page_url field in the JSON body itself. If you're importing from a stock WordPress site, you have a few options:

              • Set Posts per request high enough that everything fits on one page (works for small sites only, since WordPress core caps per_page at 100).
              • Build a tiny custom REST route on the source side that wraps wp/v2/posts and adds a next_page_url to the body.
              • Add a pre_http_request or similar WordPress filter that munges the response on the destination side.
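For the second option, the envelope the importer expects could be built along these lines. This is a sketch only: the helper name and the example route URL are hypothetical, and it assumes the source-side route callback already knows the current page number and the total number of pages (WordPress core reports the latter in the X-WP-TotalPages response header).

```php
<?php
/**
 * Hypothetical helper for a custom route on the source site: wrap one page
 * of posts in an envelope that carries next_page_url in the JSON body.
 */
function sketch_wrap_page(array $posts, int $page, int $totalPages, string $routeUrl): array
{
    $next = '';

    if ($page < $totalPages) {
        // Respect any query string already present on the route URL.
        $separator = strpos($routeUrl, '?') === false ? '?' : '&';
        $next = $routeUrl . $separator . 'page=' . ($page + 1);
    }

    return ['data' => $posts, 'next_page_url' => $next];
}

// Assumed usage inside a register_rest_route() callback on the source site:
// return sketch_wrap_page($posts, $page, $totalPages, 'https://example.com/wp-json/example/v1/posts');
```

On the last page next_page_url is an empty string, which the importer treats as the end of the import.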

              Step 4: Shape each post

              This is where most of the interesting work happens, and where almost all of the customisation filters live. For each post in the extracted list, the function buildPostImportItem() transforms the remote JSON object into the structure that the JSON Importer expects.

              The filters fire in this order during the shaping step:

              1. pugpig_rest_api_importer_post — lets you rewrite the entire raw remote post first. Every subsequent step sees the rewritten version.
              2. pugpig_rest_api_importer_post_content — rewrite the HTML body that will be stored as post_content.
              3. pugpig_rest_api_importer_meta_fields — choose which top-level fields of the remote post to copy verbatim into post_meta.
              4. pugpig_rest_api_importer_additional_meta — add any other meta values you want.
              5. pugpig_rest_api_importer_meta — final review of the complete meta array before it's attached.
              6. pugpig_rest_api_importer_categories_source — provide the list of categories to turn into taxonomy terms.
              7. pugpig_rest_api_importer_contributors_source — provide the list of contributors to turn into terms in the contributor taxonomy.
              8. pugpig_rest_api_importer_term — modify each assembled term (categories and contributors) one by one.
              9. pugpig_rest_api_importer_media_items — provide the list of attachments (featured images, inline images) to download.

              The output of this step is an "import item" array — see What the importer produces by default for the exact shape.

              Step 5: Hand off to the JSON Importer

              Once all the posts on the current page have been shaped, the entire batch is passed in a single call to \Pugpig\JsonImporter\Importer::createOrUpdateItems($items, $logger). The JSON Importer then:

              • For each item, looks up whether a local post already exists whose meta key named by uid_field_name (here pugpig_rest_api_importer_remote_post_id) holds the same remote post ID.
              • If a match is found, it updates the existing post; otherwise it inserts a new one.
              • Downloads any attachments listed under items, respecting their fetch_policy (e.g. don't re-download if we already have a local copy).
              • Creates any taxonomy terms listed under taxonomy_terms and attaches them to the post.
              • Writes all the post_meta entries.

              It also writes per-item log lines back to the same LogHtml logger that this plugin instantiated, so the admin UI shows a unified log covering both the fetch/shape phase and the write phase.

              Step 6: Repeat or finish

              If the response from Step 1 contained a next_page_url, the loop restarts with that URL. Otherwise, the import is complete.
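Put together, the loop over pages might look roughly like the sketch below. It is not the plugin's implementation: the function name is hypothetical, the fetch and batch-handling steps are injected as callables so the example runs outside WordPress, the post extraction is simplified, and the $maxPages safety cap is an addition for illustration.

```php
<?php
/**
 * Hypothetical sketch of the page loop.
 *
 * @param string   $url         First URL to fetch.
 * @param callable $fetch       fn(string $url): ?string, returning the response body.
 *                              In WordPress this would wrap wp_remote_get().
 * @param callable $handleBatch fn(array $posts): void, standing in for Steps 4 and 5.
 * @param int      $maxPages    Safety cap (an assumption, not a plugin setting).
 * @return int Number of pages processed.
 */
function sketch_run_import(string $url, callable $fetch, callable $handleBatch, int $maxPages = 1000): int
{
    $pages = 0;

    while ($url !== '' && $pages < $maxPages) {
        // Step 1: fetch and decode.
        $body = $fetch($url);
        $response = is_string($body) ? json_decode($body, true) : null;
        if (!is_array($response)) {
            break; // Failed request or invalid JSON: stop, no retry.
        }

        // Step 2 (simplified): data, then posts, then the root array.
        $posts = $response['data'] ?? $response['posts'] ?? $response;

        // Steps 4 and 5: shape the batch and hand it off.
        $handleBatch(is_array($posts) ? $posts : []);
        $pages++;

        // Steps 3 and 6: follow next_page_url, or finish.
        $next = $response['next_page_url'] ?? '';
        $url = is_string($next) ? $next : '';
    }

    return $pages;
}
```

Injecting the fetcher also makes it easy to exercise the loop against canned responses when testing a configuration.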


              What the importer produces by default

              For each remote post, the shape step produces an array roughly like this (simplified for readability):

              [
                  'uid_field_name' => 'pugpig_rest_api_importer_remote_post_id',
                  'post_type'      => $post['type']   ?? 'post',
                  'post_status'    => $post['status'] ?? 'draft',
                  'post_title'     => $cleaned_title,
                  'post_name'      => ($post['slug'] ?? '') ?: sanitize_title($cleaned_title),
                  'post_content'   => apply_filters('pugpig_rest_api_importer_post_content', $rendered_html, $post),
                  'post_excerpt'   => $cleaned_excerpt,
                  'post_date'         => $post_date,         // 'Y-m-d H:i:s', formatted from $post['date']
                  'post_date_gmt'     => $post_date_gmt,     // formatted from $post['date_gmt']
                  'post_modified'     => $post_modified,     // formatted from $post['modified']
                  'post_modified_gmt' => $post_modified_gmt, // formatted from $post['modified_gmt']
                  'post_meta'      => [ /* see table below */ ],
                  'taxonomy_terms' => [ /* from the categories + contributors filters */ ],
                  'items'          => [ /* media items, from the media_items filter */ ],
              ]

              A few notes on the cleaning that happens here:

              • Title and excerpt have their HTML tags stripped and their HTML entities decoded. So "It&rsquo;s <em>here</em>!" becomes "It’s here!". This matches what most templates expect.
              • Content is not cleaned — the rendered HTML is kept intact. The only modification is whatever your pugpig_rest_api_importer_post_content filter does (if any).
              • Slug is taken from the remote slug field if present; if not, it's generated from the cleaned title via sanitize_title().
              • Dates are parsed via DateTimeImmutable and reformatted to MySQL's Y-m-d H:i:s. Invalid or missing dates are skipped (the local post will be assigned WordPress's current time when inserted).
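The title/excerpt cleaning and the date handling described in these notes can be approximated in plain PHP as shown below. Both function names are hypothetical, and the plugin may well use WordPress helpers instead; this is only a sketch of the described behaviour.

```php
<?php
/**
 * Strip HTML tags, then decode HTML entities.
 */
function sketch_clean_text(string $html): string
{
    return html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

/**
 * Reformat a remote date to MySQL's Y-m-d H:i:s, or return null when the
 * value is missing or cannot be parsed (so the caller can skip the field).
 */
function sketch_mysql_date($value): ?string
{
    if (!is_string($value) || $value === '') {
        return null;
    }

    try {
        return (new DateTimeImmutable($value))->format('Y-m-d H:i:s');
    } catch (Exception $e) {
        return null; // Invalid date string.
    }
}
```

For example, sketch_mysql_date('2024-05-01T10:30:00') yields '2024-05-01 10:30:00', while an unparseable value yields null.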

              Default post_meta contents

              Before any meta filters run, the following keys are seeded automatically:

              Meta key | Value
              pugpig_rest_api_importer_remote_post_id | The remote post ID, as a string. This is also declared as the uid_field_name on the import item, which is how the JSON Importer matches re-imports back to existing local posts.
              pugpig_rest_api_importer_remote_post_link | The link field from the remote post, i.e. the original permalink.
              syndication_permalink | Identical to the above. Stored under this conventional key so that feed readers and syndication-aware themes that already understand syndication_permalink work without further wiring.
              pugpig_rest_api_importer_remote_post_guid | The guid.rendered field from the remote post.
              pugpig_rest_api_importer_remote_post_source | The literal string wordpress_rest_api. Useful when multiple importers feed the same destination site and you want to identify where a post came from.
              pugpig_rest_api_importer_remote_post_raw | The complete remote post payload, JSON-encoded and wp_slashed for safe storage. You can decode this at runtime to access fields that weren't promoted into proper meta keys.
              pugpig_rest_api_importer_hash | An MD5 hash of the remote post, calculated after removing the volatile _links key and recursively sorting all other keys alphabetically. Designed to be a stable signature of the meaningful content, so downstream code can answer "has this post actually changed since last import?".
              headline | The cleaned title. Pugpig themes traditionally read from this.
              article-standfirst | The cleaned excerpt. Pugpig themes traditionally read from this for the standfirst / lede.
              post_published_date | The Y-m-d portion of the published date, as a separate searchable key.
              post_published_time | The H:i:s portion of the published date.
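The content hash described in the table could be computed along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, only the top-level _links key is removed, and the sorted array is serialised with json_encode() before hashing (the document does not say which serialisation the plugin uses).

```php
<?php
/**
 * Recursively sort an array by key so that key order never affects the hash.
 */
function sketch_ksort_recursive(array $data): array
{
    foreach ($data as $key => $value) {
        if (is_array($value)) {
            $data[$key] = sketch_ksort_recursive($value);
        }
    }
    ksort($data);

    return $data;
}

/**
 * Hypothetical stable signature of a remote post: drop the volatile _links
 * key, sort everything else, then MD5 the JSON serialisation (assumed).
 */
function sketch_post_hash(array $post): string
{
    unset($post['_links']);

    return md5((string) json_encode(sketch_ksort_recursive($post)));
}
```

Two payloads that differ only in key order or in their _links block produce the same hash, so comparing the stored value against a fresh one is a cheap way to detect real changes.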

              After this baseline is set, the plugin also:

              1. Copies any top-level remote fields named in pugpig_rest_api_importer_meta_fields (default list: template, format, comment_status, ping_status, sticky, featured_media, author) into post_meta.
              2. Merges in everything under the remote post's meta key (WordPress's REST API places registered post meta here).
              3. Runs pugpig_rest_api_importer_additional_meta for any free-form additions.
              4. Runs pugpig_rest_api_importer_meta for a final review and cleanup.

              Filter reference

              This section is the canonical reference for every filter the importer exposes. Each entry includes:

              • A description of what the filter is for, and when you'd typically use it.
              • A parameters table listing every argument passed to your callback, its PHP type, and what's actually in it.
              • A return type with notes on how unexpected values are handled.
              • A small example showing how to register a callback.

              pugpig_rest_api_importer_post

              Purpose: rewrite the raw remote post object before any of the rest of the importer sees it.

              This filter fires as the very first thing inside the shape step (immediately after the importer has checked that the remote object actually has an id field). Whatever your callback returns replaces the original $post variable for the entire rest of that item's processing.

              That means: every subsequent filter (_post_content, _meta_fields, _additional_meta, _meta, _categories_source, _contributors_source, _term, _media_items) will see the rewritten version, as will the internal extraction of title, excerpt, content, dates and seeded meta.

              This is the right filter to use when:

              • The remote payload is wrapped in an envelope (e.g. { "post": {…}, "extra": {…} }) and you want to unwrap it once.
              • You need to copy a deeply nested field up to a shallower location so multiple downstream filters can read it (e.g. lifting _embedded.author[0].name to $post['author_name']).
              • You're providing default values for fields the remote site occasionally omits.
              • You want to do one source-site-specific normalisation pass in a single place, rather than handling the same quirk in three different downstream filters.

              Parameters

              Name | Type | Description
              $post | array<string, mixed> | The raw post object, exactly as decoded from the source endpoint's JSON. Keys are whatever the source site emits; for a stock WordPress endpoint that means id, date, date_gmt, modified, modified_gmt, slug, status, type, link, title, content, excerpt, author, featured_media, categories, tags, meta, _links, and (with ?_embed) _embedded.

              Return value

              Type: array<string, mixed>

              Return the (possibly modified) post array. If your callback returns anything that isn't an array — null, false, a string, etc. — the importer ignores the return value and keeps the original $post untouched.

              Example

              add_filter('pugpig_rest_api_importer_post', function (array $post): array {
                  // Some endpoints wrap the post in { "data": { ...the post... } }.
                  if (isset($post['data']) && is_array($post['data'])) {
                      return $post['data'];
                  }
                  return $post;
              });
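A second sketch covers two of the other use cases listed above: lifting a nested field to a shallow key and supplying a default for a field the source sometimes omits. The callback name and the author_name key are illustrative, and it assumes the endpoint was requested with ?_embed so that _embedded.author is present.

```php
<?php
/**
 * Hypothetical normalisation pass for pugpig_rest_api_importer_post.
 */
function sketch_normalise_remote_post(array $post): array
{
    // Lift the embedded author's name to a shallow key for later filters.
    $name = $post['_embedded']['author'][0]['name'] ?? null;
    if (is_string($name) && !isset($post['author_name'])) {
        $post['author_name'] = $name;
    }

    // Provide a default for a field the remote site occasionally omits.
    if (!isset($post['excerpt'])) {
        $post['excerpt'] = ['rendered' => ''];
    }

    return $post;
}

// Only register the filter when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_post', 'sketch_normalise_remote_post');
}
```

Doing this once here means downstream callbacks can read $post['author_name'] directly instead of each digging through _embedded.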

              pugpig_rest_api_importer_post_content

              Purpose: rewrite the rendered HTML body of the post before it is stored as post_content.

              By default, the importer takes $post['content']['rendered'] verbatim and stores it as the post content. That's usually fine, but the rendered HTML can contain things that aren't appropriate for the destination site — shortcodes that don't exist locally, image URLs that need to be rewritten, Gutenberg block comments that you want to strip, or markup that needs to be wrapped for Pugpig's article templates.

              This filter is the place to do all of that. You're given the rendered HTML and the raw remote post (in case you want to vary your behaviour by post type, category, author, etc.), and you return the rewritten HTML.

              Parameters

              Name | Type | Description
              $content | string | The rendered HTML body of the post, taken from $post['content']['rendered']. May contain shortcodes, Gutenberg block comments (<!-- wp:… -->), inline images with absolute URLs pointing at the source site, and so on.
              $post | array<string, mixed> | The complete remote post object (after any pugpig_rest_api_importer_post filter has run). Provided so you can make context-aware decisions, e.g. "only run my shortcode stripper on posts of type news".

              Return value

              Type: string

              Return the rewritten HTML. If your callback returns a non-string value (e.g. null from a preg_replace error, or accidentally an array), the importer coerces it to an empty string before storing — so a misbehaving callback can wipe the post body entirely. Always make sure you're returning a string.

              Example

              add_filter('pugpig_rest_api_importer_post_content', function (string $content, array $post): string {
                  // Remove Gutenberg block markers but leave the actual HTML inside them.
                  $stripped = preg_replace('/<!--\s*\/?wp:[^>]*-->/', '', $content);
                  return is_string($stripped) ? $stripped : $content;
              }, 10, 2);

              pugpig_rest_api_importer_meta_fields

              Purpose: choose which top-level fields of the remote post should be copied verbatim into the destination post's post_meta.

              The remote REST response includes a number of fields that are about the post (the author ID, the featured-media ID, the comment status, the format, and so on) but aren't part of the post's content or title. These don't map onto WordPress columns — they're meta values. This filter lets you control exactly which of those fields are promoted.

              The default list is sensible for a typical WordPress-to-Pugpig sync: template, format, comment_status, ping_status, sticky, featured_media, author. You can extend it, replace it, or drop fields you don't want.

              The filter supports two array styles, and you can mix them in the same return value:

              • Numeric list — ['featured_media', 'sticky']. The string is used as both the remote field name and the local meta key.
              • Associative map — ['remote_author_id' => 'author']. The value is the remote field name; the key is the meta key it should be stored under locally. This is how you rename fields on the way in.
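To illustrate how a mixed list like this can be interpreted, the sketch below resolves each entry to a (meta key, remote field) pair and copies the value across. The function name is hypothetical, value normalisation is left out, and skipping fields that are absent from the remote post is an assumption.

```php
<?php
/**
 * Hypothetical resolver for a mixed numeric/associative field list.
 *
 * Numeric entries use the same string for the remote field and the meta key;
 * associative entries map meta key => remote field.
 */
function sketch_resolve_meta_fields(array $fields, array $post): array
{
    $meta = [];

    foreach ($fields as $key => $remoteField) {
        if (!is_string($remoteField)) {
            continue; // Entries whose value isn't a string are skipped.
        }

        $metaKey = is_int($key) ? $remoteField : $key;

        // Assumption: fields missing from the remote post are left out.
        if (array_key_exists($remoteField, $post)) {
            $meta[$metaKey] = $post[$remoteField];
        }
    }

    return $meta;
}
```

Given ['featured_media', 'remote_author_id' => 'author'], the remote author value ends up under remote_author_id while featured_media keeps its name.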

              Values pulled from the remote post are passed through an internal normalizeMetaValue() helper, which:

              • Converts booleans to '1' or '0' (because WordPress meta storage doesn't natively support booleans).
              • Unwraps single-key ['rendered' => '…'] structures to just the string. (WordPress REST API uses this shape for "rich" fields like titles.)
              • Passes scalars and null through unchanged.
              • JSON-encodes anything else (arrays, objects), so complex values still round-trip safely.
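The four rules above translate into something like the following. This is a reconstruction from the description, not the plugin's normalizeMetaValue() source, and the function name is deliberately different to make that clear.

```php
<?php
/**
 * Hypothetical re-creation of the normalisation rules described above.
 *
 * @param mixed $value Raw value taken from the remote post.
 * @return mixed       Value in a form that is safe to store as post meta.
 */
function sketch_normalize_meta_value($value)
{
    // Booleans become '1' or '0'.
    if (is_bool($value)) {
        return $value ? '1' : '0';
    }

    // A single-key ['rendered' => '...'] structure is unwrapped.
    if (is_array($value) && array_keys($value) === ['rendered']) {
        return $value['rendered'];
    }

    // Scalars and null pass through unchanged.
    if ($value === null || is_scalar($value)) {
        return $value;
    }

    // Everything else (arrays, objects) is JSON-encoded.
    return json_encode($value);
}
```

Note that an array with keys besides rendered (for example rendered plus protected) is not unwrapped under these rules and would be JSON-encoded instead.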

              Parameters

              Name | Type | Description
              $default_fields | array<int|string, string> | The default list of remote fields to copy. As shipped: ['template', 'format', 'comment_status', 'ping_status', 'sticky', 'featured_media', 'author'].
              $meta | array<string, mixed> | The meta array as assembled so far (i.e. the baseline meta keys listed in the previous section). Provided for context, in case you want to base your decision on what's already there.
              $post | array<string, mixed> | The complete remote post object. Useful if you want to make per-post decisions about which fields to copy.

              Return value

              Type: array<int|string, string>

              Return the list of fields to copy, using either numeric or associative entries as described above. If you return something that isn't an array, the filter is treated as a no-op and the default fields are used. Entries whose value isn't a string are silently skipped.

              Example

              add_filter('pugpig_rest_api_importer_meta_fields', function (array $defaults, array $meta, array $post): array {
                  // Don't store comment status or pingback status; rename author -> remote_author_id.
                  $kept = array_values(array_diff($defaults, ['comment_status', 'ping_status', 'author']));
                  $kept['remote_author_id'] = 'author';
              
                  // Also pull `parent` across (for hierarchical CPTs).
                  $kept[] = 'parent';
              
                  return $kept;
              }, 10, 3);

              pugpig_rest_api_importer_additional_meta

              Purpose: add arbitrary extra meta entries to the post, including computed values that don't correspond 1:1 with any remote field.

              Where pugpig_rest_api_importer_meta_fields is for "copy this field straight across", this filter is for "compute a new value and store it as meta". Use it when:

              • The value needs to be derived — concatenating first/last names, parsing a URL, formatting a date differently.
              • The data lives in a non-top-level part of the remote post (e.g. inside _embedded, or inside an ACF payload nested under acf).
              • You want to inject a value that has no source-side equivalent at all, such as an "imported at" timestamp or a source-site identifier.

              Keys returned here are merged into the meta array. Values pass through the same normalizeMetaValue() helper described above, so you can return booleans, scalars, arrays, etc., and they'll be normalised before storage.

              Parameters

              Name | Type | Description
              $extra | array<string, mixed> | The accumulated extras so far. Starts as an empty array. If multiple callbacks are registered against this filter, later ones receive the merged result of the earlier ones.
              $meta | array<string, mixed> | The meta array as assembled so far, including everything from pugpig_rest_api_importer_meta_fields and the remote post's own meta key. Useful when your additions need to reference existing values.
              $post | array<string, mixed> | The complete remote post object.

              Return value

              Type: array<string, mixed>

              Return the (modified) extras array. If you return something that isn't an array, the filter is treated as a no-op and no extras are added. Non-string keys are coerced to strings before merging.

              Example

              add_filter('pugpig_rest_api_importer_additional_meta', function (array $extra, array $meta, array $post): array {
                  $extra['source_site']  = parse_url($post['link'] ?? '', PHP_URL_HOST) ?: '';
                  $extra['imported_at']  = current_time('mysql');
              
                  // Pull a value out of an ACF payload, if present.
                  if (isset($post['acf']['subtitle'])) {
                      $extra['subtitle'] = $post['acf']['subtitle'];
                  }
              
                  return $extra;
              }, 10, 3);

              pugpig_rest_api_importer_meta

              Purpose: final review of the entire assembled meta array before it's attached to the import item.

              This is the last meta-related filter in the pipeline. By the time it fires, the meta array contains everything: the baseline keys, the fields promoted by _meta_fields, the remote post's own meta values, and the additions from _additional_meta. Whatever you return is the meta that gets stored.

              Use this filter for:

              • Cross-field invariants — "if headline ended up empty, fall back to the slug".
              • Removing keys you don't want persisted — for example, dropping pugpig_rest_api_importer_remote_post_raw if you don't need the full original JSON kept around.
              • Last-mile sanitisation that's easier to do in one place once everything is assembled.

              Because this filter sees the final shape of the meta array, it's also a good place to do cheap validation: write a log entry, set a flag, or null out values that obviously don't make sense.

              Parameters

              Name | Type | Description
              $meta | array<string, mixed> | The fully-assembled meta array, with all earlier filters already applied.
              $post | array<string, mixed> | The complete remote post object, for context.

              Return value

              Type: array<string, mixed>

              Return the final meta array. If you return something that isn't an array, the filter is treated as a no-op and the input meta is kept.

              Example

              add_filter('pugpig_rest_api_importer_meta', function (array $meta, array $post): array {
                  if (empty($meta['headline'])) {
                      $meta['headline'] = $meta['article-standfirst'] ?? 'Untitled';
                  }
              
                  // Don't bloat the database with the full raw payload.
                  unset($meta['pugpig_rest_api_importer_remote_post_raw']);
              
                  return $meta;
              }, 10, 2);
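The "cheap validation" idea described above can be sketched as a small pure function that the filter callback delegates to. This is only an illustration under stated assumptions: the function name, the `word_count` key and the `import_needs_review` flag are hypothetical and are not defined by the plugin.

```php
/**
 * Hypothetical helper: checks the assembled meta and records problems
 * instead of aborting the import. `word_count` and `import_needs_review`
 * are illustrative key names, not keys the plugin defines.
 */
function my_importer_validate_meta(array $meta): array
{
    $problems = [];

    // Cross-field check: a headline should exist by this point.
    if (empty($meta['headline'])) {
        $problems[] = 'missing headline';
    }

    // Null out a value that obviously doesn't make sense.
    if (isset($meta['word_count']) && (!is_numeric($meta['word_count']) || $meta['word_count'] < 0)) {
        $problems[] = 'invalid word_count';
        $meta['word_count'] = null;
    }

    if ($problems !== []) {
        // Set a flag so suspect imports can be found later, and log the reason.
        $meta['import_needs_review'] = '1';
        error_log('Importer meta check: ' . implode(', ', $problems));
    }

    return $meta;
}

// Register only when running inside WordPress.
if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_meta', function (array $meta, array $post): array {
        return my_importer_validate_meta($meta);
    }, 20, 2);
}
```

Keeping the checks in a plain function means they can be exercised without a WordPress install. Priority 20 is chosen here so the checks run after callbacks registered at the default priority of 10.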

              pugpig_rest_api_importer_categories_source

              Purpose: provide the list of categories that should be created as taxonomy terms and attached to the imported post.

              The plugin doesn't make any assumption about how the source site exposes categories. They might be a flat array of IDs at $post['categories'] (which you'd then need to look up against another endpoint), they might be embedded objects under _embedded['wp:term'] (available when you request ?_embed), they might be in a custom field, or they might not exist at all.

              Because of that variability, the default list is empty — nothing happens until you register a callback. Your callback's job is to return a list of category-shaped records, each of which the importer will convert into a taxonomy term.

              The taxonomy used is determined dynamically by calling pugpig_get_taxonomy_to_order_by() (a helper from the Pugpig App plugin). On a default Pugpig setup this resolves to category; on sites that have re-pointed Pugpig at a custom taxonomy, it'll resolve to whatever's configured there.

              Expected return shape

              Return a list (numerically indexed array) where each entry is an associative array with these keys:

              Key | Type | Description
              slug | string | The term slug. Required: if omitted or empty, the entry is skipped. If you only have a name, derive a slug with sanitize_title().
              name | string | The human-readable term name. If omitted, the slug is humanised (hyphens to spaces, words capitalised) to derive a name.
              meta | array<string, mixed> | Optional. Term meta to set on the created term, as key→value. Values are normalised the same way as post meta.

              Parameters

              Name | Type | Description
              $categories | array<int, array<string, mixed>> | The categories accumulated so far (starts as an empty array; if multiple callbacks register against this filter, each one sees the previous ones' output).
              $post | array<string, mixed> | The complete remote post object, which is where you'll typically read the category data from.

              Return value

              Type: array<int, array<string, mixed>>

              Return a list of category records as described above. Entries that aren't arrays, or that lack a usable slug, are silently skipped. Duplicate slugs within the same post are deduplicated automatically.

              Example

              // Assumes the source endpoint is called with ?_embed so categories are inlined.
              add_filter('pugpig_rest_api_importer_categories_source', function (array $sources, array $post): array {
                  $term_groups = $post['_embedded']['wp:term'] ?? [];
                  foreach ($term_groups as $group) {
                      foreach ($group as $term) {
                          if (($term['taxonomy'] ?? '') !== 'category') {
                              continue;
                          }
                          $sources[] = [
                              'name' => $term['name'] ?? '',
                              'slug' => $term['slug'] ?? '',
                          ];
                      }
                  }
                  return $sources;
              }, 10, 2);
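When the source only exposes category names, the slug has to be derived, as the return-shape table suggests. The following is a minimal sketch under assumptions: the `section_names` field on the remote post and the helper name are hypothetical, and the fallback slug logic is only a rough stand-in for sanitize_title() so the helper can run outside WordPress.

```php
/**
 * Turns a list of plain category names into category records.
 * Uses WordPress's sanitize_title() when available; the fallback is a
 * rough approximation used only outside WordPress.
 */
function my_importer_categories_from_names(array $names): array
{
    $records = [];
    foreach ($names as $name) {
        if (!is_string($name) || trim($name) === '') {
            continue;
        }
        $name = trim($name);
        $slug = function_exists('sanitize_title')
            ? sanitize_title($name)
            : trim((string) preg_replace('/[^a-z0-9]+/', '-', strtolower($name)), '-');
        if ($slug === '') {
            continue; // entries without a usable slug would be skipped anyway
        }
        $records[] = ['slug' => $slug, 'name' => $name];
    }
    return $records;
}

if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_categories_source', function (array $categories, array $post): array {
        // `section_names` is an assumed custom field on the remote post.
        $names = $post['section_names'] ?? [];
        return array_merge($categories, my_importer_categories_from_names(is_array($names) ? $names : []));
    }, 10, 2);
}
```

Merging into `$categories` rather than replacing it keeps the output of any callbacks that ran earlier on the same filter.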

              pugpig_rest_api_importer_contributors_source

              Purpose: provide the list of contributors (authors, co-authors, photographers, etc.) that should be created as terms in the contributor taxonomy.

              This filter mirrors pugpig_rest_api_importer_categories_source, but produces terms in the hard-coded contributor taxonomy (which Pugpig themes know about and use to render bylines, contributor pages, etc.). Like categories, the default is empty — you need to register a callback for any contributors to be imported.

              The contributor record shape is the same as for categories, with one extra option: each contributor may also carry an items array, which the JSON Importer treats as nested import items attached to the term. This is the mechanism for things like contributor avatars: you supply the avatar as a media item under items, and it ends up linked to the contributor term in the database.

              Expected return shape

              Key | Type | Description
              name | string | The contributor's display name. Required: unlike categories, an empty name causes the entry to be skipped.
              slug | string | The term slug. If omitted, it's derived from the name via sanitize_title().
              meta | array<string, mixed> | Optional. Term meta, which is where you'd put things like role, biography, social handles, external profile URL, etc.
              items | array<int, array<string, mixed>> | Optional. Nested import items attached to the term, typically media items built with ImporterUtils::getMediaImportItem() for the contributor's avatar/headshot.

              Parameters

              Name | Type | Description
              $contributors | array<int, array<string, mixed>> | The contributors accumulated so far. Starts empty.
              $post | array<string, mixed> | The complete remote post object, which is where you'll read contributor data from.

              Return value

              Type: array<int, array<string, mixed>>

              Return the list of contributor records. Entries without a usable name are silently skipped. Duplicate slugs are deduplicated.

              Example

              add_filter('pugpig_rest_api_importer_contributors_source', function (array $sources, array $post): array {
                  // Assumes ?_embed and the standard WordPress author embed shape.
                  $authors = $post['_embedded']['author'] ?? [];
                  foreach ($authors as $author) {
                      if (empty($author['name'])) {
                          continue;
                      }
                      $sources[] = [
                          'name' => (string) $author['name'],
                          'slug' => (string) ($author['slug'] ?? ''),
                          'meta' => [
                              'description' => (string) ($author['description'] ?? ''),
                              'profile_url' => (string) ($author['link'] ?? ''),
                          ],
                      ];
                  }
                  return $sources;
              }, 10, 2);
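The items key is not used in the example above, so here is a hedged sketch of attaching an avatar. It assumes the source was requested with ?_embed and that the embedded author carries WordPress core's avatar_urls field. The helper name and the 'avatar-…' ID scheme are hypothetical, and passing null for the parent field is an assumption, since the behaviour of that argument for term-level items is not described here. The media builder is injected as a callable so the function can be exercised without the plugin loaded.

```php
/**
 * Builds one contributor record from an embedded WordPress author object.
 * $buildMediaItem is any callable taking the same argument order as
 * ImporterUtils::getMediaImportItem(); passing null skips the avatar.
 */
function my_importer_contributor_from_author(array $author, ?callable $buildMediaItem = null): ?array
{
    $name = trim((string) ($author['name'] ?? ''));
    if ($name === '') {
        return null; // contributors without a name are skipped by the importer
    }

    $record = ['name' => $name];
    if (!empty($author['slug'])) {
        $record['slug'] = (string) $author['slug'];
    }

    // Core WordPress exposes avatars as avatar_urls keyed by pixel size.
    $avatar = (string) ($author['avatar_urls']['96'] ?? '');
    if ($buildMediaItem !== null && $avatar !== '') {
        $record['items'] = [
            $buildMediaItem(
                'avatar-' . (string) ($author['id'] ?? $record['slug'] ?? md5($name)), // assumed ID scheme
                $name,
                $avatar,
                null,            // assumption: no parent post meta key applies to a term-level item
                'keep_existing',
                null,
                '',
                '',
                ''
            ),
        ];
    }

    return $record;
}

if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_contributors_source', function (array $contributors, array $post): array {
        $builder = class_exists('Pugpig\RestAPIImporter\ImporterUtils')
            ? ['Pugpig\RestAPIImporter\ImporterUtils', 'getMediaImportItem']
            : null;
        foreach ($post['_embedded']['author'] ?? [] as $author) {
            $record = is_array($author) ? my_importer_contributor_from_author($author, $builder) : null;
            if ($record !== null) {
                $contributors[] = $record;
            }
        }
        return $contributors;
    }, 10, 2);
}
```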

              pugpig_rest_api_importer_term

              Purpose: modify each assembled taxonomy term (both categories and contributors) just before it's added to the import item.

              This filter fires once per term, immediately after the categories or contributors callback has produced it and before it's appended to the post's taxonomy_terms array. It's the right place for cross-cutting modifications that apply to all terms regardless of which source filter produced them — for example, "stamp every term with a meta key recording which source site it came from", or "normalise term names to title case".

              You can tell categories and contributors apart by checking the term's taxonomy key.

              Parameters

              Name | Type | Description
              $term | array<string, mixed> | The fully assembled term, ready to be appended. Keys include taxonomy, slug, name, optional term_meta, and (for contributors) optional items.
              $post | array<string, mixed> | The complete remote post object that this term is being created for.

              Return value

              Type: array<string, mixed>

              Return the (modified) term. Whatever you return is appended as-is to the post's taxonomy term list, so if you return something nonsensical, the JSON Importer will likely error out on it.

              Example

              add_filter('pugpig_rest_api_importer_term', function (array $term, array $post): array {
                  // Stamp every term with the source post's permalink, so we can later
                  // tell which terms were created during which import.
                  $term['term_meta'] = $term['term_meta'] ?? [];
                  $term['term_meta']['source_post_link'] = $post['link'] ?? '';
                  return $term;
              }, 10, 2);
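The taxonomy check mentioned earlier in this section can be combined with the "normalise term names" use case. The sketch below title-cases category names but leaves contributor names untouched. The helper name is hypothetical, and ucwords() only handles single-byte text reliably, so treat it as a starting point rather than a complete solution.

```php
/**
 * Title-cases term names for every taxonomy except `contributor`.
 * Personal names such as "jan van der Berg" are left exactly as supplied.
 */
function my_importer_normalise_term(array $term): array
{
    if (($term['taxonomy'] ?? '') === 'contributor') {
        return $term;
    }
    if (isset($term['name']) && is_string($term['name'])) {
        $term['name'] = ucwords(strtolower(trim($term['name'])));
    }
    return $term;
}

if (function_exists('add_filter')) {
    add_filter('pugpig_rest_api_importer_term', function (array $term, array $post): array {
        return my_importer_normalise_term($term);
    }, 10, 2);
}
```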

              pugpig_rest_api_importer_media_items

              Purpose: provide the list of attachments (featured images, inline images, downloadable assets) that should be downloaded and attached to the imported post.

              This is the only way to attach any media. By default, no images come across at all — not the featured image, not inline body images. The plugin's hands-off approach is deliberate because there are several legitimate strategies (download everything locally, keep using remote URLs, only download images on demand) and the right choice depends on the destination site's storage and bandwidth constraints.

              When you register a callback for this filter, you return a list of "media import items" — structured arrays that tell the JSON Importer what to download, how to identify the resulting attachment, and where (if anywhere) to link the local attachment ID back into the parent post. Don't build these by hand; use the helper:

              $item = ImporterUtils::getMediaImportItem(
                  $id,                  // string: remote attachment ID (used to dedupe and as filename prefix)
                  $title,               // string: attachment title
                  $src_url,             // string: URL to download from
                  $parent_field_for_id, // string|null: parent post meta key to write the local attachment ID to
                  $fetch_policy,        // string|null: 'keep_existing' / 'replace' / 'force_download'
                  $contents,            // string|null: raw bytes if you want to inline-encode instead of fetching by URL
                  $caption,             // string: caption to store
                  $credit,              // string: credit line
                  $copyright_caption,   // string: copyright line
                  $mime_base_type,      // string: e.g. 'image', 'video' — used when inferring mimetype
                  $mime_type            // string|null: explicit mimetype; if null, inferred from $src_url's extension
              );

              A few key parameters explained in more depth:

              • $parent_field_for_id: when set, the JSON Importer writes the resulting local attachment ID back into the parent post under this meta key. The canonical use is '_thumbnail_id', which is how WordPress stores featured images — passing this turns your media item into the post's featured image automatically.
              • $fetch_policy: controls re-download behaviour. 'keep_existing' means "if we already have this attachment locally, don't fetch it again", which is what you want most of the time. 'replace' forces a re-download.
              • $contents: rarely used — for the case where you have the raw bytes of an image in PHP already (e.g. you generated it) and want to import it without an HTTP fetch. The URL becomes a data: URI internally.

              Parameters

              Name | Type | Description
              $items | array<int, array<string, mixed>> | The media items accumulated so far. Starts empty; multiple callbacks can each contribute items.
              $post | array<string, mixed> | The complete remote post object, which is where you'll find the featured-media reference and any inline image references.

              Return value

              Type: array<int, array<string, mixed>>

              Return the list of media items. Entries that aren't arrays are silently dropped. The returned list is also re-indexed numerically before being attached, so you don't need to worry about preserving keys.

              Example

              add_filter('pugpig_rest_api_importer_media_items', function (array $items, array $post): array {
                  // Attach the featured image, assuming ?_embed.
                  $featured = $post['_embedded']['wp:featuredmedia'][0] ?? null;
                  if (!is_array($featured) || empty($featured['source_url'])) {
                      return $items;
                  }
              
                  $items[] = \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
                      (string) ($featured['id'] ?? ''),
                      (string) ($featured['title']['rendered'] ?? 'Featured image'),
                      (string) $featured['source_url'],
                      '_thumbnail_id',          // makes this the local post's featured image
                      'keep_existing',
                      null,
                      (string) ($featured['caption']['rendered'] ?? ''),
                      (string) ($featured['credit'] ?? ''),
                      ''
                  );
              
                  return $items;
              }, 10, 2);
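Inline body images need a second callback, because the example above only covers the featured image. What follows is a rough sketch, assuming the image tags live in content.rendered and carry absolute http(s) URLs. The helper name and the md5-based ID scheme are hypothetical (no remote attachment ID is available for inline images), and a regular expression is only an approximation of real HTML parsing.

```php
/**
 * Extracts unique absolute image URLs from an HTML fragment.
 * Requires whitespace before `src` so attributes like data-src are ignored.
 */
function my_importer_inline_image_urls(string $html): array
{
    preg_match_all('/<img\b[^>]*?\ssrc\s*=\s*(["\'])(.*?)\1/i', $html, $matches);
    $urls = [];
    foreach ($matches[2] as $src) {
        $src = html_entity_decode($src, ENT_QUOTES);
        if (preg_match('#^https?://#i', $src)) {
            $urls[$src] = true; // array keys dedupe repeated images
        }
    }
    return array_keys($urls);
}

if (function_exists('add_filter') && class_exists('Pugpig\RestAPIImporter\ImporterUtils')) {
    add_filter('pugpig_rest_api_importer_media_items', function (array $items, array $post): array {
        $html = (string) ($post['content']['rendered'] ?? '');
        foreach (my_importer_inline_image_urls($html) as $url) {
            $items[] = \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
                md5($url),                                         // assumed ID scheme
                basename((string) parse_url($url, PHP_URL_PATH)),  // file name as the title
                $url,
                null,                                              // not a featured image, nothing to write back
                'keep_existing',
                null,
                '',
                '',
                ''
            );
        }
        return $items;
    }, 10, 2);
}
```

Note that this only queues the downloads. Whether image URLs inside the post body are rewritten to point at the local copies is not covered here, so check that separately before relying on it.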

              Worked examples

              The snippets below assume they live in a small companion plugin, an mu-plugin, or your active theme's functions.php. They're presented as standalone recipes — you can combine them freely.

              1. Strip Gutenberg block markers from imported content

              The block editor surrounds every block with HTML comments like <!-- wp:paragraph --> … <!-- /wp:paragraph -->. These don't render visibly but they bloat the HTML and can confuse anything downstream that parses the content. This callback removes them while leaving the actual HTML intact.

              add_filter('pugpig_rest_api_importer_post_content', function (string $content, array $post): string {
                  $stripped = preg_replace('/<!--\s*\/?wp:[^>]*-->/', '', $content);
                  return is_string($stripped) ? $stripped : $content;
              }, 10, 2);

              2. Map embedded categories into the destination taxonomy

              Call your source endpoint with ?_embed to get the category objects inlined under _embedded['wp:term'], and then pluck them out:

              add_filter('pugpig_rest_api_importer_categories_source', function (array $sources, array $post): array {
                  foreach ($post['_embedded']['wp:term'] ?? [] as $group) {
                      foreach ($group as $term) {
                          if (($term['taxonomy'] ?? '') !== 'category') {
                              continue;
                          }
                          $sources[] = [
                              'name' => (string) ($term['name'] ?? ''),
                              'slug' => (string) ($term['slug'] ?? ''),
                          ];
                      }
                  }
                  return $sources;
              }, 10, 2);

              3. Attach the featured image

              add_filter('pugpig_rest_api_importer_media_items', function (array $items, array $post): array {
                  $featured = $post['_embedded']['wp:featuredmedia'][0] ?? null;
                  if (!is_array($featured) || empty($featured['source_url'])) {
                      return $items;
                  }
              
                  $items[] = \Pugpig\RestAPIImporter\ImporterUtils::getMediaImportItem(
                      (string) ($featured['id'] ?? ''),
                      (string) ($featured['title']['rendered'] ?? 'Featured image'),
                      (string) $featured['source_url'],
                      '_thumbnail_id',
                      'keep_existing',
                      null,
                      (string) ($featured['caption']['rendered'] ?? ''),
                      '',
                      ''
                  );
              
                  return $items;
              }, 10, 2);

              4. Stamp custom meta on every imported post

              add_filter('pugpig_rest_api_importer_additional_meta', function (array $extra, array $meta, array $post): array {
                  $extra['source_site'] = parse_url($post['link'] ?? '', PHP_URL_HOST) ?: '';
                  $extra['imported_at'] = current_time('mysql');
                  return $extra;
              }, 10, 3);

              5. Customise the list of fields promoted into post meta

              add_filter('pugpig_rest_api_importer_meta_fields', function (array $defaults, array $meta, array $post): array {
                  // Drop `comment_status` and `ping_status`, and remove the plain `author`
                  // entry so it can be re-added below under a different meta key.
                  $kept = array_values(array_diff($defaults, ['comment_status', 'ping_status', 'author']));
                  // A string key renames the field: the remote `author` value is stored as `remote_author_id`.
                  $kept['remote_author_id'] = 'author';
              
                  return $kept;
              }, 10, 3);

              6. Final cleanup of meta before write

              add_filter('pugpig_rest_api_importer_meta', function (array $meta, array $post): array {
                  if (empty($meta['headline'])) {
                      // `??` alone would accept an empty standfirst; `?:` also falls through on ''.
                      $meta['headline'] = ($meta['article-standfirst'] ?? '') ?: 'Untitled';
                  }
                  // We don't need the full raw payload in the database.
                  unset($meta['pugpig_rest_api_importer_remote_post_raw']);
                  return $meta;
              }, 10, 2);

              Constants and meta keys

              All of the meta keys the plugin writes are defined as class constants in src/Constants.php, so you can reference them from your own code without retyping the strings.

              Constant | Value | Used for
              META_KEY_HASH | pugpig_rest_api_importer_hash | MD5 of the remote post body (with _links removed and keys sorted). A stable signature you can use to detect "has anything actually changed?".
              META_KEY_REMOTE_POST_ID | pugpig_rest_api_importer_remote_post_id | Remote post ID. Also used as uid_field_name on every import item, which is how the JSON Importer matches re-imports back to existing local posts.
              META_KEY_REMOTE_POST_GUID | pugpig_rest_api_importer_remote_post_guid | The guid.rendered field from the remote post.
              META_KEY_REMOTE_POST_LINK | pugpig_rest_api_importer_remote_post_link | The original permalink (link field) from the remote post.
              META_KEY_REMOTE_POST_SOURCE | pugpig_rest_api_importer_remote_post_source | Always wordpress_rest_api. Useful when multiple importers feed the same destination site.
              META_KEY_REMOTE_POST_RAW | pugpig_rest_api_importer_remote_post_raw | The full original remote post payload, JSON-encoded and slashed. Decode at runtime to access fields you didn't promote into proper meta.
              META_KEY_MEDIA_ID | pugpig_rest_api_importer_media_id | Remote attachment ID. Used as uid_field_name for media items so the JSON Importer can dedupe attachments across imports.
              META_KEY_MEDIA_SRC_URL | pugpig_rest_api_importer_src_url | The original source URL of the attachment, stored on the local attachment record.
              META_KEY_MEDIA_ALT_TEXT | _wp_attachment_image_alt | WordPress core's alt-text meta key. Used as-is so the alt text is read by standard WordPress functions.
              META_KEY_MEDIA_CAPTION_ORIGINAL | pugpig_rest_api_importer_caption | Caption preserved exactly as supplied (separate from post_excerpt, which is what WordPress normally uses for attachment captions).
              META_KEY_MEDIA_CREDIT_ORIGINAL | pugpig_rest_api_importer_credit | Credit line.
              META_KEY_MEDIA_COPYRIGHT_ORIGINAL | pugpig_rest_api_importer_copyright | Copyright line.
              META_KEY_CRON_EVENT | pugpig_rest_api_importer_cron_event | The WP-Cron hook name used by the Tools page's Set Cron button.
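To make the description of META_KEY_HASH concrete, here is a sketch of a signature built the way the table describes: drop _links, sort keys, then take an MD5. The function names are hypothetical, and the plugin's exact serialisation is not documented here, so this is not guaranteed to reproduce the stored hash value. It only illustrates why such a signature is insensitive to key order and to link changes.

```php
/** Recursively sorts associative keys so key order cannot affect the hash. */
function my_importer_sort_keys_recursive(array $value): array
{
    foreach ($value as $key => $child) {
        if (is_array($child)) {
            $value[$key] = my_importer_sort_keys_recursive($child);
        }
    }
    ksort($value);
    return $value;
}

/** Illustrative signature: MD5 of the post with `_links` removed and keys sorted. */
function my_importer_post_signature(array $post): string
{
    unset($post['_links']);
    return md5((string) json_encode(my_importer_sort_keys_recursive($post)));
}
```

In practice you would more likely read the stored value with get_post_meta($post_id, 'pugpig_rest_api_importer_hash', true) before and after an import and compare the two, rather than recomputing it yourself.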

              Quick reference: filter signatures

              For copy-paste convenience, here are all the filter signatures in one place:

              apply_filters('pugpig_rest_api_importer_post',                array  $post);                                         // returns array<string,mixed>
              apply_filters('pugpig_rest_api_importer_post_content',        string $content,        array $post);                  // returns string
              apply_filters('pugpig_rest_api_importer_meta_fields',         array  $default_fields, array $meta, array $post);     // returns array<int|string,string>
              apply_filters('pugpig_rest_api_importer_additional_meta',     array  $extra,          array $meta, array $post);     // returns array<string,mixed>
              apply_filters('pugpig_rest_api_importer_meta',                array  $meta,           array $post);                  // returns array<string,mixed>
              apply_filters('pugpig_rest_api_importer_categories_source',   array  $categories,     array $post);                  // returns array<int,array<string,mixed>>
              apply_filters('pugpig_rest_api_importer_contributors_source', array  $contributors,   array $post);                  // returns array<int,array<string,mixed>>
              apply_filters('pugpig_rest_api_importer_term',                array  $term,           array $post);                  // returns array<string,mixed>
              apply_filters('pugpig_rest_api_importer_media_items',         array  $items,          array $post);                  // returns array<int,array<string,mixed>>

