Opened 11 months ago

Closed 10 months ago

Last modified 10 months ago

#13775 closed Task/To do item (fixed)

Upgrade and Old Drupal Site

Reported by: dswanson
Owned by:
Priority: Medium
Component: Tech
Keywords:
Cc:
Sensitive: no


Hi guys. As you know, is on old Drupal. Please don't cause it to cease working. THANKS

Change History (19)

comment:1 Changed 11 months ago by Jamie McClelland

Resolution: fixed
Status: new → closed

Done. I've coded that web site to remain on PHP5 through December 31, 2018.

comment:2 Changed 11 months ago by dswanson

thanks and 2019?

comment:3 Changed 11 months ago by Jamie McClelland

Unfortunately, we can't support PHP 5 beyond 2018 (after that it's simply no longer being maintained by anyone).

One option might be to convert it to a static HTML site. That's not a simple process for a web site as large as yours, but it may be the best option, especially since I think you are mostly interested in keeping your articles available, right?

comment:4 Changed 11 months ago by dswanson

right. how can we do that?

comment:5 Changed 10 months ago by Marc Eliot Stein

Resolution: fixed
Status: closed → assigned

We would like to use a reverse proxy / front end cache such as Varnish to keep this website alive even though we are no longer adding new content. Would MayFirst be able to help with this? Do you have Varnish already working on your servers? I am a Drupal developer but not a Varnish expert, so any advice you can give would be appreciated.

comment:6 Changed 10 months ago by Jamie McClelland

We don't use varnish any more - but we do run an nginx reverse proxy and can set up the site behind it. That won't solve the PHP problem though - since even a reverse proxy depends on the back end server working properly.

I think what you may be looking for is this tutorial on how to convert a Drupal web site into a static HTML site:

In my experience, httrack works better than wget. I would be happy to help you get around any tight spots with the project.

comment:7 Changed 10 months ago by Marc Eliot Stein

Thanks for the suggestion of httrack. I'm reading the docs right now, and yes, this looks good.

Thanks for offering to help get around any problems - if you don't mind I'll keep this ticket open while I work on this.

comment:8 Changed 10 months ago by Marc Eliot Stein

I think I now have a complete archive of in this directory:


This appears to be complete, and as far as I can see httrack returned no error messages after running for several days.

What I'm stuck on now is how to turn this archive into an active mirror replacing the actual site. Can you advise me on this? I believe we want to disable or remove the current Drupal site and replace it with this archive. But I can't find clear documentation about how to do this in such a way that all URLs remain exactly as is. Can you advise or help with this? Thanks again.

comment:9 Changed 10 months ago by Jamie McClelland

Excellent work!

I just moved the old PHP files out of the way (I placed them in the include directory), then moved the httrack-generated files into the web directory, and it all seems to be working fine.

I noticed just one problem: httrack creates all files with .html at the end.

However, Drupal links do not have the .html at the end.

That means if you have a link out there somewhere that is:

You get a page not found, because the httrack file is:

So, I fixed it with this .htaccess:

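RewriteEngine On
# If the requested path plus ".html" exists as a regular file...
RewriteCond %{REQUEST_FILENAME}\.html -f
# ...serve that file internally; the visible URL stays unchanged.
RewriteRule ^(.*)$ $1.html [L]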
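
To make the effect of that rule concrete, here is a rough Python sketch of the lookup it performs. This is only an illustration, not how Apache is implemented: the function name `resolve_static` and the temporary directory standing in for the web directory are assumptions made for the example.

```python
import tempfile
from pathlib import Path
from typing import Optional


def resolve_static(docroot: Path, request_path: str) -> Optional[Path]:
    """Return the file that would be served for request_path, or None for a 404."""
    candidate = docroot / request_path.lstrip("/")
    # Mirrors the RewriteCond: does "<requested file>.html" exist as a regular file?
    with_html = Path(f"{candidate}.html")
    if with_html.is_file():
        return with_html
    # No rewrite applies: serve the path only if it is itself a regular file.
    return candidate if candidate.is_file() else None


def demo_rewrite():
    """Build a throwaway docroot holding iraq.html and resolve two requests."""
    with tempfile.TemporaryDirectory() as tmp:
        docroot = Path(tmp)
        (docroot / "iraq.html").write_text("<html>archived page</html>")
        hit = resolve_static(docroot, "/iraq")
        miss = resolve_static(docroot, "/no-such-page")
        return (hit.name if hit else None, miss.name if miss else None)


if __name__ == "__main__":
    print(demo_rewrite())
```

A request for the old Drupal-style path is answered from the matching .html file, while a path with no matching file still ends in a page not found.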

Let me know if you notice any other problems. If all looks fine, we can get rid of the PHP files and drop the database.

comment:10 Changed 10 months ago by dswanson

If this works to preserve the site forever that will be wonderful! Thank you! But can you all please not actually remove the actual site without me while I'm on vacation. Pages like /iraq seem gone. Graphics in sidebar seem gone. I can't be sure these things weren't gone already. I have nothing to compare it to.

comment:11 Changed 10 months ago by Marc Eliot Stein

Jamie - thanks for your help with this. Yes, it was the question of the .html extensions and the .htaccess that I wasn't sure about. Now - As David asks above, would it be possible to move the new archive into a subdirectory called "archive", and reactivate the original site, so that David can do a side by side comparison?

Alternatively, if the above is difficult or if the Drupal site can not be recovered, would it be possible to leave the new site in place but place the files for the old site in a directory called "drupal"? That would at least let us access images and assets from the old site that might not have come over via httrack.

Thanks again.

comment:12 Changed 10 months ago by Jamie McClelland

Got it. I've put the old site back in place here: - that's the old location.

I've put the new archive version of the site here:

So now you can compare before we make a final change over. Let us know how it looks.

comment:13 Changed 10 months ago by dswanson

The first thing I checked is this: /iraq after the old site works; /iraq after the new one does not.

comment:14 in reply to:  13 Changed 10 months ago by Marc Eliot Stein

I'm trying to understand why "/iraq" wasn't archived, so we can know if this is an isolated problem that just involves this one page or not. David, is it possible that there are no active links to "/iraq" on the site? This archive process is a crawl that follows links starting from the front page. I don't know the structure of this site at all, but I haven't been able to find any active link to this page. Can you?

If it turns out the only problem is this one missing page, then we can solve it with a wget. If we have a more comprehensive loss of pages, we need to figure out why.
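
As a rough illustration of why a link-following crawl can miss a page, here is a small Python sketch. The link graph is entirely hypothetical (it is not the real structure of the site), and `crawl` is a made-up helper, not part of httrack.

```python
from collections import deque


def crawl(links, start):
    """Return every page reachable from start by following links, breadth-first."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical link graph: "/iraq" exists and links out, but nothing links to it.
SITE = {
    "/": ["/bush", "/cheney"],
    "/bush": ["/", "/cheney"],
    "/cheney": ["/"],
    "/iraq": ["/"],
}

archived = crawl(SITE, "/")
orphans = sorted(set(SITE) - archived)

if __name__ == "__main__":
    print("archived:", sorted(archived))
    print("never reached:", orphans)
```

In this toy graph the crawl starting at the front page never reaches "/iraq", which is the situation described above: the page works on the live site but has no inbound link for the crawler to follow.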



comment:15 Changed 10 months ago by dswanson

I don't know.

comment:16 Changed 10 months ago by Jamie McClelland

Marc - I think your theory is correct.

I just added iraq by hand using wget.

David - can you keep checking and let us know if you find any other missing pages? Based on the number of missing pages we can re-assess whether it's uncommon and decide this process worked, or if it is more widespread and will require a different process to ensure we get all the pages.
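
If a list of known paths is available, the check could be partly automated. The following Python sketch assumes such a list exists and that archived pages are stored either under their exact name or with .html appended; `missing_pages` and the sample paths are illustrative only.

```python
import tempfile
from pathlib import Path


def missing_pages(archive_root, paths):
    """Return the paths with neither an exact file nor a '<path>.html' file in the archive."""
    missing = []
    for path in paths:
        target = archive_root / path.lstrip("/")
        if not (target.is_file() or Path(f"{target}.html").is_file()):
            missing.append(path)
    return missing


def demo_missing():
    """Throwaway archive containing two pages; check three paths against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("bush.html", "cheney.html"):
            (root / name).write_text("<html></html>")
        return missing_pages(root, ["/bush", "/cheney", "/iraq"])


if __name__ == "__main__":
    print(demo_missing())
```

The length of the returned list gives a quick sense of whether missing pages are isolated or widespread.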


I added the iraq page by hand as follows:

  • Logged in as archive-warisacrime user
  • mkdir web
  • cd web
  • wget -p
  • cd

I examined the contents. I chose to ignore the image_captcha folder and the robots.txt file. Next, I checked to see whether any new files had been added to the sites directory:

rsync --dry-run -a -v sites/ ../../

There were a few so I re-ran without --dry-run.
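
The dry-run-then-copy pattern can be sketched in Python as below. This is a simplified stand-in, not a reimplementation of rsync: it only handles files that are missing from the destination, whereas rsync -a also updates changed files and preserves permissions and ownership. The helper names and the sample directory layout are assumptions for the example.

```python
import shutil
import tempfile
from pathlib import Path


def new_files(src, dst):
    """List files under src (as relative paths) that do not yet exist under dst."""
    return sorted(
        p.relative_to(src)
        for p in src.rglob("*")
        if p.is_file() and not (dst / p.relative_to(src)).exists()
    )


def copy_new_files(src, dst, dry_run=True):
    """Report the new files; copy them only when dry_run is False."""
    pending = new_files(src, dst)
    if not dry_run:
        for rel in pending:
            (dst / rel).parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src / rel, dst / rel)  # copy2 keeps timestamps where it can
    return pending


def demo_sync():
    """Dry run first, then the real copy, then confirm nothing is left to copy."""
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = Path(tmp, "fetched"), Path(tmp, "archive")
        (src / "files").mkdir(parents=True)
        dst.mkdir()
        (src / "files" / "map.png").write_bytes(b"png")
        planned = [p.as_posix() for p in copy_new_files(src, dst)]
        untouched = not (dst / "files" / "map.png").exists()
        copy_new_files(src, dst, dry_run=False)
        copied = (dst / "files" / "map.png").is_file()
        remaining = copy_new_files(src, dst)
        return planned, untouched, copied, remaining


if __name__ == "__main__":
    print(demo_sync())
```

Running the report first and the copy second keeps the two steps separate, so the list can be reviewed before anything in the archive changes.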

Then, I copied the iraq file but added an html extension to it.

comment:17 Changed 10 months ago by dswanson

I checked /bush and /cheney and they seemed to be there. I hadn't realized graphics were missing from the blocks/widgets on the righthand side of every page, and that others are out of date. Ideally, I would have just removed the whole righthand column or all the widgets in it. But if it's too late for that, then it's at least wonderful to have the site preserved as is. Please go ahead with saving it at THANKS.

comment:18 Changed 10 months ago by Jamie McClelland

Resolution: fixed
Status: assigned → closed

Great - all done. The site is now a static site and will be immune from future upgrades.

Also - with some style sheet tweaking, it might be possible to kill the side bar.

comment:19 Changed 10 months ago by dswanson

