Opened 12 days ago

Last modified 7 days ago

#14386 assigned Bug/Something is broken

Site is down

Reported by: https://id.mayfirst.org/cvtweb Owned by: https://id.mayfirst.org/jaimev
Priority: Urgent Component: Tech
Keywords: Cc: https://id.mayfirst.org/jamie
Sensitive: no

Description

Our site, healtorture.org, is down or very slow to load, depending on the browser. In Google and Internet Explorer the message is that the site cannot be reached. In Firefox it will load, but it takes a long time. Help!

Change History (6)

comment:1 Changed 12 days ago by https://id.mayfirst.org/jaimev

  • Owner set to https://id.mayfirst.org/jaimev
  • Status changed from new to assigned

Hi, the site appears to load for me from here, and I don't see anything in the resource usage graphs for ossie that indicates there should be a problem. Are you still experiencing issues with the site?

comment:2 Changed 12 days ago by https://id.mayfirst.org/jackaponte

Hi folks! I'm not sure about earlier, but we're definitely seeing site downtime for three Palante-monitored sites on ossie.mayfirst.org including healtorture.org. https://monitor.mayfirst.org/cgi-bin/nagios3/extinfo.cgi?type=2&host=ossie&service=HTTP currently shows HTTP as critical for the past 18 minutes.

comment:3 Changed 12 days ago by https://id.mayfirst.org/jaimev

  • Cc https://id.mayfirst.org/jamie added

Thanks jack. I was able to confirm the same.

Systemd reported the apache process as up; however, in the logs I could see the error "server reached MaxRequestWorkers setting". Restarting apache seems to have resolved the issue. We've seen this problem before, but we haven't figured out how to detect and solve it automatically.

I like this explanation of why this occurs, but pinpointing which site's PHP scripts are responsible isn't as easy for us.

https://serverpilot.io/docs/fix-apache-error-server-reached-maxrequestworkers-setting

comment:4 Changed 12 days ago by https://id.mayfirst.org/jamie

I have been seeing more of these lately - the main problem is that apache2 is not able to recover from the state of too many client connections. If everything is working properly, the server should eventually start accepting connections again. But for some reason, it seems to get stuck in the state until we reboot.

I think we'll need to add a new monitoring script to check for that error and restart apache if it sees it. But... it would be nice to fix the underlying error.
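A minimal sketch of what such a check could look like, not the script actually deployed. It reads only the log lines appended since the previous run, looks for the error string quoted above, and restarts the service if it appears. The log path, the state-file path, and the function names are all assumptions made for illustration; only the error text and the apache2 service name come from this ticket.

```python
import os
import subprocess

ERROR_MARKER = "server reached MaxRequestWorkers setting"
ERROR_LOG = "/var/log/apache2/error.log"        # assumed Debian-style log path
STATE_FILE = "/var/lib/apache-watchdog/offset"  # hypothetical state file
SERVICE = "apache2"


def read_new_lines(log_path, offset):
    """Return (lines appended since offset, new offset).

    If the file is now shorter than the stored offset, the log was
    rotated or truncated, so reading starts again from the beginning.
    """
    with open(log_path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        size = fh.tell()
        if size < offset:
            offset = 0
        fh.seek(offset)
        data = fh.read()
    lines = data.decode("utf-8", errors="replace").splitlines()
    return lines, offset + len(data)


def saw_worker_exhaustion(lines, marker=ERROR_MARKER):
    """True if any of the given log lines contains the marker."""
    return any(marker in line for line in lines)


def run_check(log_path=ERROR_LOG, state_path=STATE_FILE):
    """Scan new log lines and restart the service if the marker appears."""
    try:
        with open(state_path) as fh:
            offset = int(fh.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0
    lines, new_offset = read_new_lines(log_path, offset)
    # Record progress first so one old error does not trigger repeated restarts.
    with open(state_path, "w") as fh:
        fh.write(str(new_offset))
    if saw_worker_exhaustion(lines):
        subprocess.run(["systemctl", "restart", SERVICE], check=True)
        return True
    return False
```

Calling run_check() from cron or a systemd timer every minute or so would give the automatic restart. It only treats the symptom, so the restarts should be logged or alerted on to keep the underlying problem visible.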

Lastly... I've upped the limit on ossie (via apache_max_request_workers => 250) - we may simply need to reset this default on our larger Moshes.

comment:5 Changed 8 days ago by https://id.mayfirst.org/jaimev

In ticket #14393 cvtweb reports that problems have continued. I do not see any more instances of the "server reached MaxRequestWorkers setting" error in the apache logs since Friday, so it appears the changes to apache prevent that error message but do not resolve the root cause of the problem. In the munin graphs I see some spikes throughout the day yesterday, but nothing severe.

comment:6 Changed 7 days ago by https://id.mayfirst.org/cvtweb

I am not sure if this is related to the current issue; however, our eLearning classes are not running on our website today. I am not sure if this is due to the speed of the site or to something else. The last time I know it was working was on January 11th. Can you get back to me on whether this is an issue you can see?

I have tested with Google and Firefox to see if the class would run on either, and neither worked. The classes require login, so I'm not sure if you already have access to that. Here is a link to one of the classes (although none are running): https://healtorture.org/content/fundamentals-self-care
