
Version 2 (modified by Mallory Knodel, 4 months ago)


Content note: We followed this guide and added some lessons of our own: https://www.drupal.org/node/27882

Your site might be running on a content management system (CMS) like Drupal or WordPress, but you aren't actively adding content to it anymore. You might be wondering how to handle an old site that still has value to readers but costs you a lot of time to maintain, just to keep up with security updates to your CMS.

If your site doesn't require regular updates, or serves only as an information archive, consider moving it off CMS software and into a format that will never need another security upgrade: HTML. We're calling this process site archival, and it's a good end-of-life solution for your site: keep the content without the hassle.

Your site will remain up, it will use the same domain, and all of the links will still work. Here's how you do it[0]:

Take note of the archive

You want your readers to know when this site was archived, so we suggest creating a block or footer message on your site that will appear on all pages, noting when the content was archived.

Remember, the CMS backend is about to go away for good. You are about to make all of your pages independent from one another, so changing anything site-wide at a later date will be next to impossible.

Disable any interactivity

Your readers won't be able to fill in forms, log in or submit content on the static site. You need to disable all of these features at this stage in the process.

For Drupal: Follow this documentation for a list of all modules that should be disabled and tips on how to do this automatically.

For WordPress and other CMSs: Look for any search widgets, contact forms, or other bits of interactivity that wouldn't work on a static site anyway, and remove them entirely from your site.
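
One way to double-check this step later, once you have static copies of your pages, is to scan them for leftover forms. The following is a minimal sketch in Python, not part of the original guide. The names FormFinder and pages_with_forms are our own, and it assumes the pages are saved as .html files under a single directory.

```python
from html.parser import HTMLParser
from pathlib import Path


class FormFinder(HTMLParser):
    """Counts <form> tags in one HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.form_count = 0

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.form_count += 1


def pages_with_forms(root: str) -> dict[str, int]:
    """Map each .html file under root to its number of forms (pages with none are omitted)."""
    root_path = Path(root)
    found = {}
    for page in sorted(root_path.rglob("*.html")):
        finder = FormFinder()
        finder.feed(page.read_text(encoding="utf-8", errors="replace"))
        if finder.form_count:
            found[page.relative_to(root_path).as_posix()] = finder.form_count
    return found
```

Calling pages_with_forms("./static-archive") (a hypothetical directory name) returns the pages that still contain a form, so you know where to go back and remove a search box or contact form in the CMS before archiving again.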

Create static pages

There are a few ways you can create HTML pages from your entire site. Several are listed at https://www.drupal.org/node/27882 along with examples of how to run the commands. There is also software you can download to your laptop, like SiteSucker.

Tip: Choose HTTrack. In our experience it gives slightly better results than wget.
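
As a rough illustration of what an HTTrack run can look like, here is a small Python sketch that assembles the command line. It is only a starting point, not the exact command we used: the URL and output directory are placeholders, build_httrack_command is a hypothetical helper name, and the options shown (-O for the output path, a "+host/*" filter to stay on your own site, -v for verbose output) are basic ones that you will likely need to adjust.

```python
import shlex
from urllib.parse import urlparse


def build_httrack_command(site_url: str, output_dir: str) -> list[str]:
    """Return an HTTrack argument list that mirrors one site into output_dir."""
    host = urlparse(site_url).hostname
    if not host:
        raise ValueError(f"not an absolute URL: {site_url!r}")
    return [
        "httrack",
        site_url,
        "-O", output_dir,   # directory where the mirrored files are written
        f"+{host}/*",       # filter: only follow links on this host
        "-v",               # verbose progress output
    ]


if __name__ == "__main__":
    # Placeholder values; replace with your own site and target directory.
    cmd = build_httrack_command("https://www.example.org/", "./static-archive")
    # Print the command for review. To run it from Python instead,
    # pass the list to subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Printing the command first makes it easy to review or tweak it before a long crawl starts.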

Move your old CMS files

Eventually you'll want to remove these files from the server completely. But if you'd like to compare your site side-by-side with its static version, you can move your site files to a new place, like old.mysite.org, for now.

Upload your new HTML files

With your old CMS files out of the way, you can now upload the static files generated by HTTrack (or another method) to the server directory where your CMS lived before. Nothing needs to be tweaked and no cron needs to run. It's that simple.

Click around the site and make sure it's all there

This process never goes the same way twice, but here are some common problems and their solutions:

  • The site's pages now end in .html, breaking the site's internal links. Solution: https://support.mayfirst.org/ticket/13775#comment:9
  • Some pages didn't get archived. If many pages are missing, one solution is to run HTTrack again, perhaps tweaking the command parameters. Another is to grab each missing page with a one-off wget and upload it to the server with the rest.
  • ...
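
Clicking around by hand only goes so far. As a complement, a short script can list internal links in the static files that point to pages missing from the archive. This is a minimal sketch under our own assumptions, not something from the guide we followed: find_broken_links and LinkCollector are hypothetical names, only <a href> links are checked, and a link to a directory is assumed to be served by an index.html inside it.

```python
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import unquote, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_broken_links(root: str) -> list[tuple[str, str]]:
    """Return (page, href) pairs whose local target file does not exist under root."""
    root_path = Path(root).resolve()
    broken = []
    for page in sorted(root_path.rglob("*.html")):
        collector = LinkCollector()
        collector.feed(page.read_text(encoding="utf-8", errors="replace"))
        for href in collector.hrefs:
            parts = urlparse(href)
            # Skip external links, mailto: and similar schemes, and pure #fragment links.
            if parts.scheme or parts.netloc or not parts.path:
                continue
            rel = unquote(parts.path)
            # Site-absolute paths resolve from root; others from the page's own folder.
            base = root_path if rel.startswith("/") else page.parent
            target = (base / rel.lstrip("/")).resolve()
            if target.is_dir():
                target = target / "index.html"
            if not target.exists():
                broken.append((page.relative_to(root_path).as_posix(), href))
    return broken
```

Running find_broken_links("./static-archive") (again a placeholder path) before you upload gives you a list of pages to re-fetch. It also flags links that still point to the old extensionless paths.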