Opened 5 months ago

Last modified 3 months ago

#13853 assigned Bug/Something is broken

upgrade all servers to stretch

Reported by: https://id.mayfirst.org/jamie Owned by: https://id.mayfirst.org/jamie
Priority: Medium Component: Tech
Keywords: Cc:
Sensitive: no

Description

This ticket is intended to track our progress upgrading all servers to stretch.

See our jessie stretch upgrade page for more info.

Change History (15)

comment:1 Changed 5 months ago by https://id.mayfirst.org/jamie

  • Owner set to https://id.mayfirst.org/jamie
  • Status changed from new to assigned

Over the past weekend I planned to upgrade over a dozen MOSHes, but I only got through debs, boggs, viewsic, hashmi, didier, chavez, and octavia.

It was rough going for the first three because I used them to perfect the mf-dist-upgrade-mosh script, which automates the process and reduces the need for user input.

The second batch was rough thanks to #13851, which should not affect most other MOSHes.

So... hopefully the rest of the MOSHes will go more smoothly.

In addition, I took advantage of the weekend to upgrade one server from each of several redundant pairs:

  • gil (one of two mail.mayfirst.org servers, the other is paul)
  • kennedy (one of two authoritative DNS servers, the other is gamiz)
  • sankara (one of two icecast servers, the other is toussaint - which also runs mumble)

If things go smoothly with these newly upgraded servers, we can expect the same for their partners.

comment:2 Changed 5 months ago by https://id.mayfirst.org/jaimev

Great work jamie. Thank you for doing all of this.

comment:3 Changed 4 months ago by https://id.mayfirst.org/jaimev

This evening/morning I went ahead with upgrading the following moshes to stretch:

ginsberg bety daza binh ekpo albizu chelsea biko annette annapurna yippie brown rodolpho buffy gaspar julia clara dorothy sojourner jones caceres goldman larkin eagle roe jacobs

The helper script works great. I did run into issues with the upgrade on some moshes; the two most common were these. First, if mysql ends up in a failed state for some reason during the upgrade, the phpmyadmin update throws a fit. Getting mysql into a working state and reinstalling phpmyadmin seems to fix it. We may want to run dpkg-reconfigure phpmyadmin on all of these for good measure.
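A minimal sketch of that recovery, to run once mysql itself is healthy again. It assumes the database service answers to the name mysql under systemctl; run and fix_phpmyadmin are hypothetical helper names, and setting DRY_RUN=1 prints each command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: recover phpmyadmin after a failed upgrade, once the database is up.
# Assumption: the database service is reachable as "mysql" via systemctl.

# Print commands instead of executing them when DRY_RUN is set.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

fix_phpmyadmin() {
  # Reinstalling phpmyadmin while the database is down just repeats the failure.
  if ! run systemctl is-active --quiet mysql; then
    echo "mysql is not active; get it running before touching phpmyadmin" >&2
    return 1
  fi
  run apt-get install --reinstall -y phpmyadmin
  # For good measure: re-run the package's configuration step.
  run dpkg-reconfigure phpmyadmin
}
```

After sourcing it, DRY_RUN=1 fix_phpmyadmin previews the commands and fix_phpmyadmin runs them; dpkg-reconfigure will prompt interactively.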

The other frequent problem was php5-fpm failing because of an empty pool.d dir on servers where all configs had been moved to php7.0-fpm. We could either adjust the main config to stop referencing that dir when we know it is empty, or leave the default www.conf config there. I've been doing the latter.
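A sketch of the "leave the default www.conf there" option, assuming the packaged default pool is still sitting in pool.d as www.conf.dpkg-dist (the same file the pre-upgrade step in comment 12 relies on). The function name ensure_fpm_pool is hypothetical; it only restores the default when the directory has no *.conf pool at all.

```shell
#!/usr/bin/env bash
# Sketch: make sure php5-fpm has at least one pool to load after its site
# configs have moved to php7.0-fpm. Assumes the packaged default pool is
# still present as www.conf.dpkg-dist.

ensure_fpm_pool() {
  local pool_dir="${1:-/etc/php5/fpm/pool.d}" f
  # If the glob matches nothing, bash leaves it literal and the -f test fails.
  for f in "$pool_dir"/*.conf; do
    if [[ -f "$f" ]]; then
      echo "pool configs present in $pool_dir"
      return 0
    fi
  done
  if [[ -f "$pool_dir/www.conf.dpkg-dist" ]]; then
    cp "$pool_dir/www.conf.dpkg-dist" "$pool_dir/www.conf"
    echo "restored default pool: $pool_dir/www.conf"
  else
    echo "no pool configs and no www.conf.dpkg-dist in $pool_dir" >&2
    return 1
  fi
}
```

Copying rather than moving keeps the .dpkg-dist file around, so running the function twice is harmless.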

A few other random things came up: old emacs packages on old moshes gumming up the works (why do we even need emacs on the moshes?), and a half-installed redmine package on brown that I had to purge. I didn't see any web configs using it, but I could be wrong; I'll write to them in the morning to ask. Some old dovecot packages were causing problems on roe, so I purged them all and let puppet reinstall. And a dialog screen asked me to configure KERBEROS on sojourner? What was that?

Please check on any nagios warnings regarding the above servers in the morning if you get a chance. Still waiting on sojourner and eagle to finish now... they're soooo slow.

comment:4 Changed 4 months ago by https://id.mayfirst.org/jamie

Awesome!! Thanks for pushing all this through. milk and dubois have been showing nagios errors for a while; I just reconfigured their checks so the errors go away (until they are fully set up). Everything else seems to be up.

I'm working through tickets now.

comment:5 Changed 4 months ago by https://id.mayfirst.org/jamie

Three problems so far:

  • one site that doesn't work in php7 (easy to downgrade)
  • two sites that did not properly switch to php7. agaric.com failed because /etc/apache2/sites-available/agaric.conf.conf was not a symlink. We can ignore this one; it should be rare. The other was tierramor: I simply re-saved the web conf and it was properly converted. Not sure why it didn't convert in the first place, since all the other sites on gaspar did.

The tierramor site revealed another potential problem. It failed because the php5-mysql package had been removed on gaspar, so any site that is downgraded to php5 and depends on mysql will fail. It might have been removed via apt autoremove.

I just copied the deb files for php5-mysql and libmysqlclient18 from menchu to gaspar:/root and installed them via dpkg -i. We may want to check other servers to make sure they have these packages installed too.
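To check the other servers, a small sketch that asks dpkg whether both packages are installed. A package on hold reports its status as "hold ok installed" rather than "install ok installed", so the match accepts either. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report whether the php5 MySQL packages survived the upgrade.
# Package names are from this ticket; the function name is hypothetical.

check_php5_mysql_pkgs() {
  local pkg status missing=0
  for pkg in php5-mysql libmysqlclient18; do
    # dpkg-query exits non-zero for a package it has no record of.
    status=$(dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null) || status=""
    # "install ok installed" normally, "hold ok installed" after apt-mark hold.
    if [[ "$status" == *" ok installed" ]]; then
      echo "ok: $pkg"
    else
      echo "missing: $pkg"
      missing=1
    fi
  done
  return "$missing"
}
```

Run on each MOSH, it prints one ok:/missing: line per package and returns non-zero if either is missing.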

comment:6 Changed 4 months ago by https://id.mayfirst.org/jaimev

Those missing packages on binh also affected a site that needed php5, and I found another case of web configs on biko not properly upgrading to use the new php.

comment:7 Changed 4 months ago by https://id.mayfirst.org/jaimev

Going through the list, it looks like quite a large number of web configs that did not include the # mfplphpversion: 5 comment were not properly upgraded to use php7.0-fpm.

Should I just run red-regenerate-web-config again on these?
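To find the candidates, a sketch that lists config files lacking the mfplphpversion comment. It assumes the web configs are *.conf files directly inside a single directory you pass in; that layout and the function name are assumptions, not something red documents here.

```shell
#!/usr/bin/env bash
# Sketch: list web configs that lack the "# mfplphpversion:" marker.
# Assumption: configs are *.conf files directly inside the given directory.

find_unmarked_configs() {
  if [[ -z "${1:-}" ]]; then
    echo "usage: find_unmarked_configs DIR" >&2
    return 2
  fi
  local dir="$1" conf
  for conf in "$dir"/*.conf; do
    # With no matches the glob stays literal; skip it.
    [[ -f "$conf" ]] || continue
    if ! grep -q '# mfplphpversion:' "$conf"; then
      echo "$conf"
    fi
  done
}
```

The output is the list you would run red-regenerate-web-config against; that command's arguments aren't shown in this ticket, so the sketch stops at listing.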

comment:8 Changed 4 months ago by https://id.mayfirst.org/jamie

Yes - that worked for me (and returned an error on the site that did not properly have a symlink).

comment:9 Changed 4 months ago by https://id.mayfirst.org/jamie

One more weird thing - about red-regenerate-web-config...

red has to support two ways to securely connect to the mysql server, one for jessie and one for stretch.

If the /usr/local/etc/red/red_node.conf file contains mysql_cert_file then it uses the php7 way, otherwise it uses the php5 way.
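The decision described here can be sketched in shell: look for a non-commented line mentioning mysql_cert_file in red_node.conf. The real check lives inside red and may parse the file differently; the comment syntaxes skipped (#, ;, //) and the function name are assumptions.

```shell
#!/usr/bin/env bash
# Sketch of the choice between the jessie (php5) and stretch (php7) ways of
# connecting securely to mysql. Assumptions: comment lines start with
# #, ; or //, and any other line mentioning mysql_cert_file counts as set.

red_mysql_connect_mode() {
  local conf="${1:-/usr/local/etc/red/red_node.conf}"
  if [[ ! -r "$conf" ]]; then
    echo php5   # no readable config: fall back to the jessie way
    return 0
  fi
  # awk exits 0 only if a non-comment line mentions mysql_cert_file.
  if awk '!/^[ \t]*(#|;|\/\/)/ && /mysql_cert_file/ { found = 1 }
          END { exit !found }' "$conf"; then
    echo php7
  else
    echo php5
  fi
}
```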

However, red-regenerate-web-config was ignoring mysql_cert_file, so it always used the php5 way. I think this succeeded in many cases because, when it was run, php5 was still the default cli version of php.

But just now on jacobs it failed, and I had to patch red-regenerate-web-config.

It is now patched in puppet, so you can push the latest puppet to get the fix.

comment:10 Changed 4 months ago by https://id.mayfirst.org/jaimev

Also, I was basing my estimate that many sites had not been upgraded correctly on the number of files in the php5/fpm/pool.d/ directory. It turns out that most of these are left over from sites that have been disabled.

comment:11 Changed 4 months ago by https://id.mayfirst.org/jaimev

And the only remaining sites that I found without the mfplphpversion comment that couldn't be upgraded with red-regenerate-web-config turned out to have been left in soft-error-mode some time ago.

comment:12 Changed 3 months ago by https://id.mayfirst.org/jaimev

Currently upgrading emma kahlo claudette kerr stokely lewis stone june floriberto magon colin molina tresca mandela rose ella rushdie ossie malcolm

To avoid some of the problems I ran into last time I'm running the following commands before running the mf-dist-upgrade-mosh script.

Make sure the default php5-fpm pool config exists

[[ -f /etc/php5/fpm/pool.d/www.conf ]] || mv /etc/php5/fpm/pool.d/www.conf.dpkg-dist /etc/php5/fpm/pool.d/www.conf

Don't remove packages needed for php5

apt-mark hold libmysqlclient18
apt-mark hold php5-mysql

Remove emacs (sorry, St. IGNUcius); it can be put back after the upgrade if anyone really needs it.

apt-get remove emacs emacs-nox emacs24-common emacs24 emacs24-nox emacs23 emacs23-common emacs23-nox emacsen-common --purge --auto-remove -y

Don't upgrade phpmyadmin for now. After the script finishes, I will unhold phpmyadmin and upgrade it once I've confirmed mariadb is up and running correctly.

apt-mark hold phpmyadmin

Remove any references to log_slow_queries in the mysql configs

grep -R -l log_slow_queries /etc/mysql/ | while read -r file; do echo "$file"; sed -i '/log_slow_queries/d' "$file"; done
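A sketch of a pre-flight check that confirms the steps above took effect before starting mf-dist-upgrade-mosh. It uses the paths and package names from those steps; the POOL_DIR and MYSQL_CONF_DIR overrides, the function name and its output are hypothetical, and allowing an optional :arch suffix is only a precaution in case apt-mark showhold prints architecture-qualified names.

```shell
#!/usr/bin/env bash
# Sketch: verify the pre-upgrade steps before running mf-dist-upgrade-mosh.
# Paths and package names come from the steps above; POOL_DIR,
# MYSQL_CONF_DIR and the function name are hypothetical.

preflight_check() {
  local pool="${POOL_DIR:-/etc/php5/fpm/pool.d}"
  local mysql_conf="${MYSQL_CONF_DIR:-/etc/mysql}"
  local held pkg failed=0

  if [[ ! -f "$pool/www.conf" ]]; then
    echo "FAIL: $pool/www.conf missing"
    failed=1
  fi

  # apt-mark showhold lists held packages, one per line; allow an optional
  # :arch suffix in case names are printed architecture-qualified.
  held=$(apt-mark showhold)
  for pkg in libmysqlclient18 php5-mysql phpmyadmin; do
    if ! grep -Eqx "${pkg}(:[a-z0-9]+)?" <<< "$held"; then
      echo "FAIL: $pkg is not on hold"
      failed=1
    fi
  done

  if grep -R -q log_slow_queries "$mysql_conf" 2>/dev/null; then
    echo "FAIL: log_slow_queries still present under $mysql_conf"
    failed=1
  fi

  if (( failed == 0 )); then
    echo "preflight OK"
  fi
  return "$failed"
}
```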

comment:13 Changed 3 months ago by https://id.mayfirst.org/jaimev

The moshes listed above have now been upgraded. The process was quite slow on some servers, but for the most part it worked without any major issues. After mf-dist-upgrade-mosh finished, some servers were unable to start mariadb, but in all of those cases it worked without problems after a reboot. For dedicated moshes, I spot-checked the principal websites to make sure they were working after the upgrade, and downgraded them to php5 when necessary. Reports about sites on the shared servers that might be having problems under php7.0 will likely stream in throughout the day.

I also noticed that the mosh kerr does not seem to be in use.

comment:14 Changed 3 months ago by https://id.mayfirst.org/jamie

Great work Jaime! I'm not sure how many moshes remain, but I think building the changes you documented into the script itself is a great idea: the next time we do upgrades we will probably pick the script up again, so having all of these steps documented in code will be helpful.

I'll keep an eye on the support queue for problems this morning, since I hope you will be getting some rest.

comment:15 Changed 3 months ago by https://id.mayfirst.org/jaimev

It looks like I missed one more step last night. After doing the final phpmyadmin upgrade at the end, I should have run freepuppet-run again to ensure phpmyadmin was configured correctly. Doing so now for the moshes listed above.
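Putting the end-of-night steps from comments 12, 13 and 15 together, a sketch of the post-upgrade sequence. It assumes the database service answers to mysql under systemctl and that freepuppet-run takes no arguments (the ticket only ever shows it bare); DRY_RUN=1 prints the commands instead of running them, and the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch of the post-upgrade sequence: make sure mariadb is running, release
# the phpmyadmin hold, upgrade it, then re-run puppet so phpmyadmin ends up
# configured correctly. Assumptions: the database unit answers to "mysql"
# under systemctl, and freepuppet-run needs no arguments.

# Print commands instead of executing them when DRY_RUN is set.
run() {
  if [[ -n "${DRY_RUN:-}" ]]; then
    echo "+ $*"
  else
    "$@"
  fi
}

post_upgrade_phpmyadmin() {
  if ! run systemctl is-active --quiet mysql; then
    # Comment 13: in every case seen so far, a reboot brought mariadb back.
    echo "mariadb is not running; reboot (or fix it) and re-run this" >&2
    return 1
  fi
  run apt-mark unhold phpmyadmin
  run apt-get install --only-upgrade -y phpmyadmin
  run freepuppet-run
}
```

Folding a step like this into mf-dist-upgrade-mosh, as suggested in comment 14, would keep it from being forgotten next time.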

