Building a custom migration in Drupal 8, Part 4: Files and Content
 
In the last post, we finally wrote and executed our first migrations. We performed a dependency mapping to determine we needed to first migrate roles, then our users. We created new migrations in *.yml directly by searching our Drupal core directory for useful migration_templates. We're four parts in, and we have yet to migrate any nodes! Argh! Can we just start migrating nodes already!?

File migrations

It's really tempting at this point to jump in and start writing a migration for some of your simpler node types. Typically these are the Basic Page or Blog Post types. Often, however, you'll discover a hidden dependency: Files. Even if you've avoided Media module and relied on the File or Image field types alone, files still exist as a separate entity type within Drupal 7. Before we can migrate nodes, we have to migrate the files too.

Drupal 7 actually makes migrating files easier than if you needed to migrate from a Drupal 6 site. File management was a lot more standardized, so there's really only one migration template we need, core/modules/file/migration_templates/d7_file.yml. Like the user and role migrations we created earlier, we can leave most of the template as it is. Start by copying the template into your editor, changing the ID and label, and adding the group:

<code><code>id: yoursite_file
migration_tags:
 - 'Drupal 7'
migration_group: yoursite
label: 'yoursite files'</code></code>

The File entity in Drupal 7 records who originally uploaded the file. While we don't strictly need the file owner, it's certainly something we would like to preserve if we can. Just as our user migration depended on the role migration, our file migration depends on our user migration. Specify the dependency by adding it to the bottom of the file on a new line:

<code><code>migration_dependencies:
 required:
   - yoursite_users</code></code>

The last required thing we need to do is to tell the migration where to find our Drupal 7 file directory. The template expects us to have this on the same file system as our Drupal 8 site. This is done by specifying the path to the file directory in source_base_path. While this can be any directory on the system hosting the Drupal 8 site, I found it easiest to put it in a subdirectory of my Drupal 8 files directory:

<code><code>source:
 plugin: d7_file
 constants:
   source_base_path: /var/www/html/files/migrate_files/</code></code>

For reasons I still can't quite figure out, I needed to download the entire Drupal 7 files directory -- and not just the directory's contents -- to the directory I specified in source_base_path:

<code><code>path/to/yoursite
└── sites
   └── default
       └── files
           └── migrate_files
               └── files</code></code>

Once we have that set up, save it to a file in your sync directory named migrate_plus.migration.yoursite_file.yml where yoursite_file is the id of your file migration. Use drush to import the configuration and check the migration status:

<code><code>$ drush cim -y
...
$ drush ms

Group: YourSite group (yoursite)    Status  Total  Imported  Unprocessed  Last imported       
yoursite_role                      Idle    4      0         4            N/A
yoursite_file                      Idle    507    0         507          N/A</code></code>

If the migrate-status (ms) command shows a set number of files to import, we know that we've set everything up properly. Now we can run the migration with the migrate-import (mi) command:

<code><code>$ drush mi yoursite_file</code></code>

The file migration can take quite some time to run, even though the files are local. This is because the migration copies every file from our source_base_path to the expected place in our Drupal 8 file directory. So when running the file migration, be sure you have plenty of disk space! When you've finished the file migration, you can actually delete the directory specified in source_base_path. We will no longer need it unless we intend to rerun the file migration.

Simple node migrations

With the file migration finished, we can -- finally! -- get to a simple node migration. This can be a number of different content types depending on your site configuration. Ideally, the content type you choose to migrate first should only depend on the user migration for the author, and the file migration for any attached files or images. The target content type on your Drupal 8 site should only be using Drupal core's provided fields, and not any custom fields or reference fields such as Paragraphs.

When I looked at my site, the simplest migration was one from the Picture content type to the Gallery type. Both types have the following fields:

  • Title
  • Body
  • field_images, a multi-value core Image field

This means there was only one field that had a lot of complexity for us to worry about. The Title and the Body fields can be copied through. Next, we need a template. Like our other migrations, Drupal core provides a template for Drupal 7 nodes in the node module: core/modules/node/migration_templates/d7_node.yml. Customize the id and label, and add the migration_group as you did with all of our other migrations:

<code><code>id: yoursite_gallery
migration_tags:
 - 'Drupal 7'
migration_group: yoursite
label: 'yoursite gallery'</code></code>

Then, scroll down to the bottom of the node migration *.yml and add your user and file migrations as dependencies:

<code><code>migration_dependencies:
 required:
   - yoursite_users
   - yoursite_file</code></code>

Specifying the source type

If you look over the rest of the migration *.yml, you might find something...missing. For each of the migrations we've created so far, the source entity type on the Drupal 7 site has been the same as on the new Drupal 8 site. Users were migrated to users, roles to roles, and files to files. When migrating nodes, however, you need to specify both the source and destination content type. When you examined our node migration template, you might have noticed the source section:

<code><code>source:
  plugin: d7_node
</code></code>

You might think, "There's got to be a way to specify the node type there!" and you're right! Admittedly, the migration template doesn't make this obvious at all.

Notice that under the source section we specify a plugin with the unique ID of d7_node. If we search the core directory for that plugin ID, we'll eventually find the following PHP class: core/modules/node/src/Plugin/migrate/source/d7/Node.php. This class queries the Drupal 7 database, pulling both built-in and custom node fields out of the database. It stands to reason that core would make this plugin configurable, so that we might tell it what node type to pull from the database. Looking at the plugin code, we find this:

<code><code>if (isset($this->configuration['node_type'])) {
  $query->condition('n.type', $this->configuration['node_type']);
}</code></code>

Ah-ha! The d7_node plugin is passed a configuration parameter named node_type. When provided, the database query performed against the node table of the Drupal 7 database is restricted to the provided node type. You might hang your head wondering how to pass this value to something buried as deep in core as this plugin, but it's actually really easy!

<code><code>source:
 plugin: d7_node
 node_type: 'picture'
</code></code>

Not too hard at all!

Customizing the field mapping

In our previous migrations, we largely ignored the bulk of the migration *.yml contained in the process section. This section specifies one or more field mappings between the target and source objects. The simplest is a direct mapping:

<code><code>field_name_on_target: field_name_on_source</code></code>

In a direct mapping, the source field value is copied into the target field. Notice that in the *.yml the mapping is written opposite what you might first expect. The field name on the Drupal 8 site is on the left, whereas the field name on the Drupal 7 site is on the right. For fields like the title, a direct mapping is good enough:

<code><code>title: title</code></code>

Having the target field name on the left and the source on the right seems silly at first, but there's actually a very good reason for it! The purpose of the process section is more than just to specify mappings. As its name implies, the process section can be used to modify field values before storing them in your new site. Drupal 8 uses process plugins to allow you to modify field values during the migration. Several process plugins are included out of the box, but you can always write your own. The full format of a field mapping looks like this:

<code><code>field_name_on_target:
  source: field_name_on_source
  plugin: the_process_plugin_id
  parameter: value
  parameter2: value2</code></code>

When we use the short format, Drupal 8 falls back to the default process plugin, get. The get process plugin attempts to copy the source field entirely into the target field. This works fine for fields that hold a string or number like our title field.
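
Put another way, the short mapping for the title should behave the same as spelling out the get plugin by hand. A minimal sketch, using the full format from earlier in this section:

<code><code>title:
  # Written out explicitly; the short form "title: title" implies this plugin.
  plugin: get
  source: title</code></code>

Nothing is gained by writing it this way. It only shows what the short format expands to.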

Migrating multi-part fields

For fields like the body, the get plugin isn't enough. The body field actually is a multi-part field, containing a complex structure we need to traverse to migrate correctly. The two that are important to us are the body's value part which contains the actual text, and the format part. The format contains the text format of the value, usually plain_text, filtered_html, or full_html. There is also a third part, the summary, which was unused in my site, so I ignore it in my migration. In Drupal 8.3 or later, we can traverse each part of a multipart field by using the iterator plugin:

<code><code> body:
   plugin: iterator
   source: body
   process:
     value: value
     format:
       plugin: default_value
       default_value: full_html</code></code>

There's a lot going on in this! First we specify the plugin and then the source. The iterator plugin lets us specify how to migrate each part under its own process section. For the body's value, we just use a direct mapping and copy it over. For the format, however, we do something a little different.

In the case of my site, the filtered_html text format was a lot more permissive than I wanted it to be on Drupal 8. Since I didn't want to lose any potential formatting, I decided to migrate all my content setting the full_html format instead. To accomplish this, I employed yet another plugin, default_value. As its name implies, it allows you to specify a static, default value in place of copying a value from the source site. The plugin takes only one parameter, which specifies the static value to save to the target site.
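
If you would rather keep some formats as they are and only remap others, core's static_map process plugin is one alternative. The following is a sketch rather than the configuration used for the migration described here. The format names in the map are assumptions, so substitute the machine names of your own text formats:

<code><code>body:
  plugin: iterator
  source: body
  process:
    value: value
    format:
      plugin: static_map
      source: format
      map:
        # Assumed format names; adjust these to match your own sites.
        filtered_html: full_html
        plain_text: plain_text
      # Any format not listed in the map falls back to full_html.
      default_value: full_html</code></code>

The trade-off is a little more configuration in exchange for not flattening every body to a single text format.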

Mapping a field value through a migration

That takes care of the title and body field, but we still have all the images to migrate! Like the body field, field_images is a multi-part field. We're interested in preserving the following parts:

  • target_id/fid specifies the file entity ID. It's called target_id in Drupal 8, and simply fid in Drupal 7.
  • alt and title which specify the image alt and title attributes respectively.
  • width and height which specify the image's dimensions.

Again, like the body, we're going to use the iterator plugin to handle each part:

<code><code> field_images:
   plugin: iterator
   source: field_picture_file
   process:
     target_id: fid
     alt: alt
     title: title
     height: height
     width: width</code></code>

This works, but we've made a potentially dangerous assumption. The default migration template for file migrations is set up to preserve the original site's fid for each file entity. This is probably a safe assumption to make if we're migrating to a pristine and empty Drupal 8 site, but what if we weren't? What if we added content -- including attached images or files -- prior to running the migration? In that case it's possible that our file migration might fail when it encounters an fid that's already in use, or worse, overwrite the existing file entity. The solution for the file migration is to remove the following mapping:

<code><code>fid: fid</code></code>

If we do that, however, it creates another problem. In our Gallery migration, we only have the original Drupal 7 fid. If it's not the same as the fid in Drupal 8, how can we know what to set for the target_id? The answer is simple: ask the file migration! The migration_lookup process plugin lets us ask a previously run migration for the target ID when given the source ID. We can nest the call to the migration_lookup plugin inside our call to the iterator plugin:

<code><code> field_images:
   plugin: iterator
   source: field_picture_file
   process:
     target_id:
       plugin: migration_lookup
       migration: yoursite_file
       source: fid
     alt: alt
     title: title
     height: height
     width: width</code></code>

Here we see that under the target_id mapping, we call the migration_lookup plugin, passing it the name of the migration to query as well as where to get the source ID. The migration_lookup plugin queries a table in our Drupal 8 database that preserves the ID mappings for each migration. You might wonder, however, how does the migration know what field/database column/thingawhatsit to use for the ID? That logic is actually contained in the migration's source plugin. If you look at our previous migrations, the source plugin was specific for the kind of entity we were migrating. So the d7_user plugin knows to use the uid, the d7_file plugin uses the fid, and the d7_node source plugin uses the nid. Nifty!

What about that destination type?

Speaking of source plugins... We know the d7_node source plugin knows what content type to retrieve data from because we specify it in the node_type parameter. Where do we specify the destination node type? You might think that we can do what we did before. We would look at the destination section of our migration template, find the plugin class in the core directory, and then specify a parameter for the destination type. When you look at the destination section, however, things look very, very different:

<code><code>destination:
 plugin: entity:node</code></code>

Huh. Well, if we dig around we do find out there's a plugin class with the ID entity:node, but it doesn't seem to take any parameters. What? This is also something about Drupal 8 node migrations that's rather counterintuitive. Instead of setting a plugin parameter, the destination node type is treated as a field. We need to specify the type in our field mapping. Recall, though, that there's a complication. Our source type name IS NOT the same as our target type name. Our source node type is named picture, but our target type is named gallery. Fortunately, we've already learned everything we need to solve this problem:

<code><code> type:
   plugin: default_value
   default_value: gallery</code></code>

In our process section we add a new mapping for the type field. Since we set the node_type in the source plugin already, we know that every piece of content we'll migrate here will be a picture. So, we only need to specify that the type is gallery each time. For that, we use the default_value plugin.

Why is the type a field mapping and not a parameter of the destination plugin? It turns out that the source plugin is the culprit. The source plugin does NOT require the node_type parameter! When it's not specified, the plugin defaults to all content types. In order to be sure all nodes are migrated properly, it's more versatile to treat the type as a field. That way, one node migration could slurp up all source nodes, and then create new Drupal 8 nodes with the correct content types. This is how Drupal 8's auto-generated migrations work for nodes. 
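
To make the catch-all case concrete: a migration that handles every node type could copy the type straight through with a direct mapping (type: type). If only some type names differ between the two sites, core's static_map plugin could translate them instead. The sketch below is an illustration, not part of the gallery migration. Its single map entry is an assumption based on the picture and gallery types used in this post:

<code><code>type:
  plugin: static_map
  source: type
  map:
    # Assumed rename: Drupal 7 "picture" nodes become Drupal 8 "gallery" nodes.
    picture: gallery
  # Types missing from the map keep their original machine name.
  bypass: true</code></code>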

Save your node migration to the sync directory like you did the others. Do a configuration import, check the migration status for errors, then run the migration.
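
Pulling the pieces from this post together, the customized parts of the gallery migration might look like the sketch below. The remaining mappings from the template are left out for brevity, and the dependency IDs assume the user and file migrations are named yoursite_users and yoursite_file:

<code><code>id: yoursite_gallery
migration_tags:
  - 'Drupal 7'
migration_group: yoursite
label: 'yoursite gallery'
source:
  plugin: d7_node
  # Only pull Picture nodes from the Drupal 7 database.
  node_type: 'picture'
process:
  # Mappings kept unchanged from the d7_node template are omitted here.
  type:
    plugin: default_value
    default_value: gallery
  title: title
  body:
    plugin: iterator
    source: body
    process:
      value: value
      format:
        plugin: default_value
        default_value: full_html
  field_images:
    plugin: iterator
    source: field_picture_file
    process:
      # Look up the new file ID recorded by the file migration.
      target_id:
        plugin: migration_lookup
        migration: yoursite_file
        source: fid
      alt: alt
      title: title
      height: height
      width: width
destination:
  plugin: entity:node
migration_dependencies:
  required:
    # Assumed IDs; use the IDs of your own user and file migrations.
    - yoursite_users
    - yoursite_file</code></code>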

Summary

Wow! We've come a long, long way in this post. We created a file migration and a simple node migration. We've leveraged the hierarchical nature of the migration system to keep references between our entities intact even when their IDs differ between Drupal 7 and 8. We've finally explored the process section and learned to customize our field mappings. We've uncovered and dissected multi-part fields, as well as picked up some useful tools like the default_value plugin to further customize how we migrate our data. Next time, we start to explore even more complex node migrations involving the Paragraphs module.

Thanks to our sponsors!

This post was created with the support of my wonderful supporters on Patreon:

  • Alina Mackenzie
  • Karoly Negyesi
  • Chris Weber

If you like this post, consider becoming a supporter at patreon.com/socketwench.

Thank you!!!