Opened 6 months ago

Closed 5 months ago

Last modified 5 months ago

#13775 closed Task/To do item (fixed)

Upgrade and Old Drupal Site

Reported by: https://id.mayfirst.org/dswanson
Owned by:
Priority: Medium
Component: Tech
Keywords:
Cc:
Sensitive: no

Description

Hi guys. As you know, http://old.warisacrime.org is on old Drupal. Please don't cause it to cease working. THANKS

Change History (19)

comment:1 Changed 6 months ago by https://id.mayfirst.org/jamie

  • Resolution set to fixed
  • Status changed from new to closed

Done. I've coded that web site to remain on PHP5 through December 31, 2018.

comment:2 Changed 6 months ago by https://id.mayfirst.org/dswanson

thanks and 2019?

comment:3 Changed 6 months ago by https://id.mayfirst.org/jamie

Unfortunately, we can't support PHP 5 beyond 2018 (it's simply no longer being maintained by anyone).

One option might be to convert it to a static HTML site. That's not a simple process for a web site as large as yours, but it may be the best option, especially since I think you are mostly interested in keeping your articles available, right?

comment:4 Changed 6 months ago by https://id.mayfirst.org/dswanson

right. how can we do that?

comment:5 Changed 6 months ago by https://id.mayfirst.org/marceliotstein

  • Resolution fixed deleted
  • Status changed from closed to assigned

We would like to use a reverse proxy / front end cache such as Varnish to keep this website alive even though we are no longer adding new content. Would MayFirst be able to help with this? Do you have Varnish already working on your servers? I am a Drupal developer but not a Varnish expert, so any advice you can give would be appreciated.

comment:6 Changed 6 months ago by https://id.mayfirst.org/jamie

We don't use Varnish any more, but we do run an nginx reverse proxy and can set up the site behind it. That won't solve the PHP problem though, since even a reverse proxy depends on the back-end server working properly.

I think what you may be looking for is this tutorial on how to convert a Drupal web site into a static HTML site:

https://www.drupal.org/node/27882

In my experience httrack works better than wget. I would be happy to help you get around any tight spots with the project.
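For anyone following along, a crawl of this kind might be started roughly as below. This is only a sketch: the httrack options actually used on this ticket are not recorded here, the helper names `site_host` and `mirror_site` are made up for illustration, and the output directory in the example is an assumption. It relies on httrack's `-O` output-path option and a `+host/*` filter to keep the crawl on one site.

```shell
#!/bin/sh
# Sketch only: the options actually used on this ticket are not recorded.
# site_host and mirror_site are hypothetical helper names.

# Print the host part of a URL, e.g. http://example.org/a/b -> example.org
site_host() {
    printf '%s\n' "$1" | sed -e 's|^[a-zA-Z]*://||' -e 's|/.*$||'
}

# Mirror one site into out_dir, keeping the crawl on that site's host.
mirror_site() {
    site_url=$1
    out_dir=$2
    host=$(site_host "$site_url")
    httrack "$site_url" -O "$out_dir" "+${host}/*"
}

# Example (not run automatically; the output directory is an assumption):
# mirror_site "http://old.warisacrime.org/" "$HOME/old-warisacrime"
```

Nothing is crawled until `mirror_site` is called explicitly, so the file can be sourced safely before a long-running crawl is started.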

comment:7 Changed 6 months ago by https://id.mayfirst.org/marceliotstein

Thanks for the suggestion of httrack. I'm reading the docs right now, and yes, this looks good.

Thanks for offering to help get around any problems - if you don't mind I'll keep this ticket open while I work on this.

comment:8 Changed 6 months ago by https://id.mayfirst.org/marceliotstein

I think I now have a complete archive of old.warisacrime.org in this directory:

/home/members/dswanson/sites/dev.worldbeyondwar.org/users/wbwdev/websites/old-warisacrime

This appears to be complete, and as far as I can see httrack returned no error messages after running for several days.

What I'm stuck on now is how to turn this archive into an active mirror replacing the actual site. Can you advise me on this? I believe we want to disable or remove the current Drupal site and replace it with this archive. But I can't find clear documentation about how to do this in such a way that all URLs remain exactly as is. Can you advise or help with this? Thanks again.

comment:9 Changed 6 months ago by https://id.mayfirst.org/jamie

Excellent work!

I just moved the old PHP files out of the way (I placed them in the include directory), then moved the httrack-generated files into the web directory, and it all seems to be working fine.

I noticed just one problem: httrack creates all files with .html at the end.

However, Drupal links do not have the .html at the end.

That means if you have a link out there somewhere that is: http://old.warisacrime.org/content/democratic-party-beyond-hope-we-need-mass-movement-demand-radical-progressive-change

You get a page not found, because the httrack file is:

http://old.warisacrime.org/content/democratic-party-beyond-hope-we-need-mass-movement-demand-radical-progressive-change.html

So, I fixed it with this .htaccess:

# If <requested path>.html exists on disk, serve that file instead.
RewriteEngine On
RewriteCond %{REQUEST_FILENAME}\.html -f
RewriteRule ^(.*)$ $1.html [L]

Let me know if you notice any other problems. If all looks fine, we can get rid of the PHP files and drop the database.
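To spot-check old links without a browser, the effect of that rewrite rule can be approximated on disk. The sketch below is not part of the server setup; `resolve_request` is a hypothetical helper, and the docroot path in the example is an assumption. It mimics the rule: if `<path>.html` exists, that file is served, otherwise the path is looked up as-is.

```shell
#!/bin/sh
# Sketch only: approximates the .htaccess rule on the filesystem.
# resolve_request is a hypothetical helper name.

# Print the file that would be served for a request path, or fail
# (non-zero exit status) if nothing matches.
resolve_request() {
    docroot=$1
    path=${2#/}                      # drop the leading slash
    if [ -f "$docroot/$path.html" ]; then
        # Mirrors the RewriteCond: <path>.html exists, so serve it.
        printf '%s\n' "$docroot/$path.html"
    elif [ -e "$docroot/$path" ]; then
        # No rewrite applies; the path is served unchanged.
        printf '%s\n' "$docroot/$path"
    else
        return 1                     # would be a "page not found"
    fi
}

# Example (the docroot location is an assumption):
# resolve_request /path/to/web /iraq || echo "missing: /iraq"
```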

comment:10 Changed 6 months ago by https://id.mayfirst.org/dswanson

If this works to preserve the site forever that will be wonderful! Thank you! But can you all please not remove the actual site while I'm away on vacation? Pages like /iraq seem gone. Graphics in the sidebar seem gone. I can't be sure these things weren't gone already. I have nothing to compare it to.

comment:11 Changed 6 months ago by https://id.mayfirst.org/marceliotstein

Jamie - thanks for your help with this. Yes, it was the question of the .html extensions and the .htaccess that I wasn't sure about. Now - As David asks above, would it be possible to move the new archive into a subdirectory called "archive", and reactivate the original site, so that David can do a side by side comparison?

Alternatively, if the above is difficult or if the Drupal site cannot be recovered, would it be possible to leave the new site in place but place the files for the old site in a directory called "drupal"? That would at least let us access images and assets from the old site that might not have come over via httrack.

Thanks again.

comment:12 Changed 6 months ago by https://id.mayfirst.org/jamie

Got it. I've put the old site back in place here: http://old.warisacrime.org/ - that's the old location.

I've put the new archive version of the site here: http://archive.warisacrime.org/

So now you can compare before we make a final change over. Let us know how it looks.

comment:13 follow-up: ↓ 14 Changed 5 months ago by https://id.mayfirst.org/dswanson

The first thing I checked is this: /iraq after the old site works; /iraq after the new one does not.

comment:14 in reply to: ↑ 13 Changed 5 months ago by https://id.mayfirst.org/marceliotstein

I'm trying to understand why "/iraq" wasn't archived, so we can know if this is an isolated problem that just involves this one page or not. David, is it possible that there are no active links to "/iraq" on the site? This archive process is a crawl that follows links starting from the front page. I don't know the structure of this site at all, but I haven't been able to find any active link to this page. Can you?

If it turns out the only problem is this one missing page, then we can solve it with a wget. If we have a more comprehensive loss of pages, we need to figure out why.
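If a list of known paths is available, one way to judge how widespread the loss is would be to check each path against the archive. This is a sketch under assumptions: `missing_pages` is a hypothetical helper, and both the `paths.txt` file (one path per line) and the archive directory in the example are illustrative, not something that exists on the server.

```shell
#!/bin/sh
# Sketch only: missing_pages is a hypothetical helper name.

# Read request paths (one per line) on stdin and print those that have
# neither <path>.html nor <path> under the given archive directory.
missing_pages() {
    docroot=$1
    while IFS= read -r path; do
        [ -n "$path" ] || continue   # skip blank lines
        rel=${path#/}                # drop the leading slash
        if [ ! -f "$docroot/$rel.html" ] && [ ! -e "$docroot/$rel" ]; then
            printf '%s\n' "$path"
        fi
    done
}

# Example (paths.txt and the archive location are assumptions):
# missing_pages /path/to/archive/web < paths.txt
```

An empty result would suggest the crawl was complete for the paths tested; it says nothing about pages that are not in the list.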

Marc

Replying to https://id.mayfirst.org/dswanson:

The first thing I checked is this: /iraq after the old site works; /iraq after the new one does not.

comment:15 Changed 5 months ago by https://id.mayfirst.org/dswanson

I don't know.

comment:16 Changed 5 months ago by https://id.mayfirst.org/jamie

Marc - I think your theory is correct.

I just added iraq by hand using wget.

David - can you keep checking and let us know if you find any other missing pages? Based on the number of missing pages we can re-assess whether the problem is uncommon, in which case we can decide this process worked, or more widespread, in which case we will need a different process to ensure we get all the pages.

p.s.

I added the iraq page by hand as follows:

  • Logged in as the archive-warisacrime user
  • mkdir web
  • cd web
  • wget -p http://old.warisacrime.org/iraq
  • cd old.warisacrime.org

I examined the contents. I chose to ignore the image_captcha folder and the robots.txt file. Next, I checked whether any new files would be added to the sites directory:

rsync --dry-run -a -v sites/ ../../archive.warisacrime.org/web/sites/

There were a few so I re-ran without --dry-run.

Then, I copied the iraq file but added an html extension to it.
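The last two steps could be wrapped up for reuse if more pages turn out to be missing. This is a rough sketch, not the exact commands that were run: `add_page_to_archive` is a hypothetical helper, and the destination of the final copy (the archive's web directory) is inferred from the rsync target above rather than stated.

```shell
#!/bin/sh
# Sketch only: add_page_to_archive is a hypothetical helper name.

# Copy one page fetched with "wget -p" into the static archive.
add_page_to_archive() {
    fetched_dir=$1   # directory created by wget, e.g. old.warisacrime.org
    archive_web=$2   # web directory of the archive site (assumed target)
    page=$3          # page name without extension, e.g. iraq

    # Bring over page requisites (images, CSS) the crawl did not pick up.
    if [ -d "$fetched_dir/sites" ]; then
        rsync -a -v "$fetched_dir/sites/" "$archive_web/sites/"
    fi

    # httrack names every page *.html, so store the new page the same way.
    cp "$fetched_dir/$page" "$archive_web/$page.html"
}

# Example, run from inside the directory wget created (not run automatically):
# wget -p http://old.warisacrime.org/iraq
# cd old.warisacrime.org
# add_page_to_archive . ../../archive.warisacrime.org/web iraq
```

Running the rsync line with `--dry-run` first, as above, is still a sensible way to preview what would be copied.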

comment:17 Changed 5 months ago by https://id.mayfirst.org/dswanson

I checked /bush and /cheney and they seemed to be there. I hadn't realized graphics were missing from the blocks/widgets on the right-hand side of every page, and that others are out of date. Ideally, I would have just removed the whole right-hand column or all the widgets in it. But if it's too late for that, then it's at least wonderful to have the site preserved as is. Please go ahead with saving it at old.warisacrime.org. THANKS.

comment:18 Changed 5 months ago by https://id.mayfirst.org/jamie

  • Resolution set to fixed
  • Status changed from assigned to closed

Great - all done. The site is now a static site and will be immune from future upgrades.

Also - with some style sheet tweaking, it might be possible to kill the sidebar.

comment:19 Changed 5 months ago by https://id.mayfirst.org/dswanson

THANKS
