Opened 2 months ago

Last modified 2 months ago

#14386 assigned Bug/Something is broken

Site is down

Reported by: Owned by:
Priority: Urgent Component: Tech
Keywords: Cc:
Sensitive: no


Our site is down or very slow to load, depending on the browser. In Google and Internet Explorer the message is that the site cannot be reached. In Firefox it will load, but it takes a long time. Help!

Change History (6)

comment:1 Changed 2 months ago by

  • Owner set to
  • Status changed from new to assigned

Hi, the site appears to load for me from here, and I don't see anything in the resource usage graphs for ossie that indicates there should be a problem. Are you still experiencing issues with the site?

comment:2 Changed 2 months ago by

Hi folks! I'm not sure about earlier, but we're definitely seeing downtime for three Palante-monitored sites, and monitoring currently shows HTTP as critical for the past 18 minutes.

comment:3 Changed 2 months ago by

  • Cc added

Thanks jack. I was able to confirm the same.

Systemd reported the apache process as up; however, in the logs I could see the error "server reached MaxRequestWorkers setting". Restarting apache seems to have resolved the issue. We've seen this problem before, but we haven't figured out how to detect and solve it automatically.

I like this explanation of why this occurs, but pinpointing which site's PHP scripts are responsible isn't as easy for us.
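One rough way to narrow down which site is busiest is to count requests per virtual host in the access log. The sketch below is only an illustration: it assumes the log uses Debian's `vhost_combined` format, where each line starts with `vhost:port`, and the function name `requests_per_vhost` is made up for this example. Request volume is a proxy at best; it does not show which PHP scripts are slow.

```python
from collections import Counter


def requests_per_vhost(lines):
    """Count access-log lines per virtual host.

    Assumes each line starts with "vhost:port " (as in Debian's
    vhost_combined LogFormat). Lines without that prefix are skipped.
    """
    counts = Counter()
    for line in lines:
        first, _, rest = line.partition(" ")
        host, sep, port = first.rpartition(":")
        # Require a non-empty host without further colons and a numeric port.
        if sep and host and ":" not in host and port.isdigit() and rest:
            counts[host] += 1
    return counts
```

Assuming the log lives at `/var/log/apache2/other_vhosts_access.log`, something like `requests_per_vhost(open(path)).most_common(5)` would list the five busiest virtual hosts.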

comment:4 Changed 2 months ago by

I have been seeing more of these lately - the main problem is that apache2 is not able to recover once it has too many client connections. If everything were working properly, the server would eventually start accepting connections again. But for some reason, it seems to get stuck in that state until we reboot.

I think we'll need to add a new monitoring script to check for that error and restart apache if it sees it. But... it would be nice to fix the underlying error.
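A minimal sketch of what such a monitoring script might look like, not a tested implementation. The log path, the systemd unit name, and the function names are all assumptions for illustration. It reads only the lines appended since the last check and restarts apache if the error message shows up.

```python
import subprocess
from pathlib import Path

# Assumed values; adjust for the actual host.
ERROR_LOG = Path("/var/log/apache2/error.log")  # assumed Debian-style path
SERVICE = "apache2"                              # assumed systemd unit name
MARKER = "server reached MaxRequestWorkers setting"


def count_marker(lines, marker=MARKER):
    """Return how many log lines contain the saturation message."""
    return sum(1 for line in lines if marker in line)


def read_new_lines(path, offset):
    """Read lines appended since byte `offset`; return (lines, new_offset)."""
    size = path.stat().st_size
    if size < offset:  # log was rotated or truncated, so start over
        offset = 0
    with path.open("rb") as fh:
        fh.seek(offset)
        data = fh.read()
    return data.decode("utf-8", errors="replace").splitlines(), offset + len(data)


def check_and_restart(path=ERROR_LOG, offset=0, restart=None):
    """Restart the service if the marker appears in new log lines.

    `restart` can be any callable (useful for dry runs); by default
    this calls `systemctl restart` on the assumed unit name.
    """
    lines, new_offset = read_new_lines(path, offset)
    hit = count_marker(lines) > 0
    if hit:
        if restart is None:
            subprocess.run(["systemctl", "restart", SERVICE], check=True)
        else:
            restart()
    return hit, new_offset
```

The caller would need to store the returned offset between runs (for example in a small state file) so that an old log entry does not trigger a restart on every check.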

Lastly... I've upped the limit on ossie (via apache_max_request_workers => 250) - we may simply need to reset this default on our larger Moshes.

comment:5 Changed 2 months ago by

In ticket #14393 cvtweb reports that problems have continued. I have not seen any more instances of the "server reached MaxRequestWorkers setting" error in the apache logs since Friday, so it appears the changes to apache prevent that error message but do not address the root cause of the problem. In the munin graphs I see some spikes throughout the day yesterday, but nothing severe.

comment:6 Changed 2 months ago by

I am not sure whether this is related to the current issue, but our eLearning classes are not running on our website today. I am not sure if this is due to the speed of the site or to something else. The last time I know they were working was January 11th. Can you get back to me on whether you can see a problem with this?

I have tested with Google and Firefox to see if the class would run on either, and neither worked. The classes require login, so I'm not sure if you already have access to that. Here is a link to one of the classes (although none are running).
