Opened 3 years ago

Closed 4 months ago

#11535 closed Bug/Something is broken (worksforme)

apache fails to restart on log rotation

Reported by: https://id.mayfirst.org/jamie Owned by:
Priority: Medium Component: Tech
Keywords: Cc:
Sensitive: no

Description

This has been happening regularly on dorothy these days and a few times on floriberto.

The apache error log reports:

[Wed Mar 09 06:25:18.693746 2016] [core:notice] [pid 10447] AH00060: seg fault or similar nasty error detected in the parent process

Change History (20)

comment:2 Changed 3 years ago by https://id.mayfirst.org/jamie

I just edited /etc/logrotate.d/red (via puppet - modules/mayfirst/files/red/logrotate.d/red) and I edited /etc/logrotate.d/apache2 (on the server directly since this is not in puppet). Those are the only logrotate files that restart apache.

Let's wait a few days and see if this stops the problem on dorothy and if so we can push these changes to all moshes.

This might only affect servers that have been switched to php-fpm since those are running the worker mpm. This would explain why this problem has suddenly surfaced as we have been switching servers over.

comment:3 Changed 3 years ago by https://id.mayfirst.org/jamie

  • Resolution set to fixed
  • Status changed from new to closed

I've signed a tag that pushes these changes to all moshes.

comment:4 Changed 3 years ago by https://id.mayfirst.org/jamie

  • Resolution fixed deleted
  • Status changed from closed to assigned

This fix didn't seem to do the trick - more web sites were down this morning:

menchu, gaspar, mandela, viewsic, roe, buffy, stone, julia

comment:5 Changed 3 years ago by https://id.mayfirst.org/jamie

Again this morning: debs menchu ossie stone viewsic

comment:6 Changed 3 years ago by https://id.mayfirst.org/jamie

I'm pretty sure we are stuck in the middle of the php transition. It seems that servers running apache prefork (php cgi) crash if we run restart, and the apache worker servers (php-fpm) crash if we run reload.

Currently, we are using restart - so it is the php cgi servers that are struggling.

comment:7 Changed 3 years ago by https://id.mayfirst.org/jamie

I have switched mandela, malcolm, gaspar, viewsic, menchu and debs to php-fpm. I will continue switching servers that seem to have trouble restarting after logrotate.

comment:8 Changed 3 years ago by https://id.mayfirst.org/jamie

I've updated all servers running jessie to php-fpm and am still getting this error on a handful of sites each morning. Now making this change to the red logrotate script:

0 jamie@turkey:logrotate.d$ git diff
diff --git a/modules/mayfirst/files/red/logrotate.d/red b/modules/mayfirst/files/red/logrotate.d/red
index 26f23cc..89b0660 100644
--- a/modules/mayfirst/files/red/logrotate.d/red
+++ b/modules/mayfirst/files/red/logrotate.d/red
@@ -8,8 +8,8 @@
        create 644 root adm
        sharedscripts
        postrotate
-               if [ -f "`. /etc/apache2/envvars ; echo ${APACHE_PID_FILE:-/var/run/apache2.pid}`" ]; then
-                       /etc/init.d/apache2 restart > /dev/null
-               fi
+    if /etc/init.d/apache2 status > /dev/null ; then \
+                       /etc/init.d/apache2 restart > /dev/null \
+               fi;
        endscript
 }
0 jamie@turkey:logrotate.d

comment:9 Changed 3 years ago by https://id.mayfirst.org/jamie

I'm still getting about 3 - 4 servers whose apache2 service doesn't come back up after log rotation. This morning's servers were: magon, mandela, buffy.

Now I've signed a tag that combines the red and apache2 logrotate files into a single file. Maybe they were running parallel and the postrotate scripts (which now both check if apache2 is running) were somehow conflicting with each other?

Also - I removed all the line continuation backslashes and changed all tabs to spaces after getting this error:

logrotate_script: 5: logrotate_script: Syntax error: end of file unexpected (expecting "fi")
error: error running shared postrotate script for '/home/members/*/sites/*/logs/*.log '
run-parts: /etc/cron.daily/logrotate exited with return code 1
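
For reference, the check-then-restart logic without any line continuations could look roughly like the sketch below. The function name `restart_if_running` and the `INIT_SCRIPT` override are made up for illustration (they are not in the puppet repo); in the real logrotate file only the `if ... fi` lines would sit between `postrotate` and `endscript`.

```shell
#!/bin/sh
# Sketch only: restart apache2 after rotation, but only if it is running.
# INIT_SCRIPT is a hypothetical override so the logic can be tried
# without touching a real apache2 service.
INIT_SCRIPT="${INIT_SCRIPT:-/etc/init.d/apache2}"

restart_if_running() {
    # "status" exits 0 only when the service is currently running.
    # Each statement is on its own line, so no trailing backslashes
    # are needed and "fi" closes the block unambiguously.
    if "$INIT_SCRIPT" status > /dev/null 2>&1; then
        "$INIT_SCRIPT" restart > /dev/null
    fi
}
```

With backslash continuations the shell joins the lines together, so a `fi` that follows a continued command without a separating `;` is parsed as an argument to that command rather than as the end of the `if`, which would explain the "expecting fi" error above.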

comment:10 Changed 3 years ago by https://id.mayfirst.org/jamie

  • Resolution set to fixed
  • Status changed from assigned to closed

This last change seems to have fixed it - no web servers failed to restart this morning.

comment:11 Changed 3 years ago by https://id.mayfirst.org/jaimev

Thank you for resolving this jamie.

comment:12 Changed 3 years ago by https://id.mayfirst.org/jaimev

  • Resolution fixed deleted
  • Status changed from closed to assigned

This may not be resolved yet. Reports of dorothy crashing this morning. #11607

comment:13 Changed 3 years ago by https://id.mayfirst.org/jamie

Still struggling with this issue. Now putting the following in /etc/logrotate.d/apache2:

logger -t "mfpl" "apache2 rotating" && /etc/init.d/apache2 stop && sleep 5 && /etc/init.d/apache2 restart

And pushing to dorothy to test (seems to only happen on dorothy, malcolm, buffy, magon, mandela, menchu).

comment:14 Changed 3 years ago by https://id.mayfirst.org/dkg

apparently this happened on rose again this morning.

comment:15 Changed 4 months ago by https://id.mayfirst.org/jamie

See #12487

comment:16 Changed 4 months ago by https://id.mayfirst.org/jamie

comment:17 Changed 4 months ago by https://id.mayfirst.org/dkg

fwiw, if apache would just log to syslog (supplied in our case by journald) then there wouldn't be this weird logrotation restart at all.

that might make splitting out the logs more difficult, but that could itself probably be handled either by journald or by a journald consumer (e.g. the way that rsyslog can be attached to journald).

i don't have a concrete fix here, just a pointer to the fact that the problem we're seeing appears to be based on the use of logrotate itself, and it needing to restart the daemon. a better architecture wouldn't have that situation.

comment:18 Changed 4 months ago by https://id.mayfirst.org/jaimev

After a few searches I learned that this might be possible in apache 2.5 which is not yet released.

https://httpd.apache.org/docs/trunk/mod/mod_journald.html

However I also found lots of proposed solutions/hacks that appear to use apache's current ability to pipe log output.

https://www.loggly.com/ultimate-guide/centralizing-apache-logs/

https://raymii.org/s/snippets/Apache_access_and_error_log_to_syslog.html

https://stackoverflow.com/questions/18637921/how-to-log-apache-errors-to-different-syslog-facilities-for-each-virtual-host

I don't know how viable some variation of any of the above would be for us.

comment:19 Changed 4 months ago by https://id.mayfirst.org/jamie

After some initial investigation, I'm not sure this round of apache crashes is related to logrotate. Jury is still out though.

As for changing logrotate - I think the best ticket to discuss that would be #13018 which has a slightly bigger scope but seems like the right place for it.

comment:20 Changed 4 months ago by https://id.mayfirst.org/jamie

  • Resolution set to worksforme
  • Status changed from assigned to closed

I think I found the culprit. I suspect needrestart is trying to restart apache during package upgrades in a way that apache doesn't like.

Here's a comparison of when apache last reported the error with a grep of /var/log/dpkg.log for "startup archives unpack". The two timestamps line up closely on nearly every host.

annette: apache error.log: Mon Jul 09 03:15
annette: dpkg.log: 2018-07-09 03:14:56 startup archives unpack
bety: apache error.log: Mon Jul 09 03:51
bety: dpkg.log: 2018-07-09 03:51:10 startup archives unpack
daza: apache error.log: Sat Jul 07 03:18
daza: dpkg.log: 2018-07-07 03:18:28 startup archives unpack
dorothy: apache error.log: Mon Jul 09 03:43
dorothy: dpkg.log: 2018-07-09 03:41:50 startup archives unpack
jacobs: apache error.log: Mon Jul 09 03:08
jacobs: dpkg.log: 2018-07-09 03:08:12 startup archives unpack
julia: apache error.log: Sat Jul 07 03:32
julia: dpkg.log: 2018-07-07 03:32:00 startup archives unpack
june: apache error.log: Mon Jul 09 03:58
june: dpkg.log: 2018-07-09 03:43:58 startup archives unpack
marx: apache error.log: Sat Jul 07 03:43
marx: dpkg.log: 2018-07-07 03:43:27 startup archives unpack
ossie: apache error.log: Sun Jul 08 03:48
ossie: dpkg.log: 2018-07-08 03:47:59 startup archives unpack
proudhon: apache error.log: Sun Jul 08 04:24
proudhon: dpkg.log: 2018-07-08 04:23:28 startup archives unpack
rodolpho: apache error.log: Thu Mar 10 17:13
rodolpho: dpkg.log: 2018-07-06 03:57:22 startup archives unpack
sarah: apache error.log: Sun Jul 08 03:44
sarah: dpkg.log: 2018-07-08 03:44:04 startup archives unpack
slaapbeen: apache error.log: Sun Jul 08 03:56
slaapbeen: dpkg.log: 2018-07-08 03:53:55 startup archives unpack
stone: apache error.log: Mon Jul 09 03:57
stone: dpkg.log: 2018-07-09 03:55:16 startup archives unpack
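
A comparison like the one above can be pulled together with something along these lines. This is a rough sketch, not the exact commands used: the function names are made up, and the log paths in the usage note are the usual Debian defaults, assumed here.

```shell
#!/bin/sh
# Sketch only: extract the timestamps needed to compare apache crashes
# with dpkg activity. Both functions read the named file, or stdin if
# no file is given.

# Print "Day Mon DD HH:MM" for each AH00060 line in an apache error log.
apache_crash_times() {
    sed -n 's/^\[\([^:]*:[0-9][0-9]\).*AH00060.*/\1/p' "$@"
}

# Print "YYYY-MM-DD HH:MM:SS" for each dpkg "startup archives unpack" line.
dpkg_unpack_times() {
    grep 'startup archives unpack' "$@" | cut -d' ' -f1,2
}
```

For example, `apache_crash_times /var/log/apache2/error.log | tail -n 1` and `dpkg_unpack_times /var/log/dpkg.log | tail -n 1` would give the most recent entry of each kind on a single host.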

I think this round of upgrades is done so closing as worksforme.
