Opened 3 months ago

Last modified 6 days ago

#14068 assigned Bug/Something is broken

Website not working

Reported by: https://id.mayfirst.org/gvshp
Owned by: https://id.mayfirst.org/jaimev
Priority: Urgent
Component: Tech
Keywords:
Cc: https://id.mayfirst.org/jamie
Sensitive: no

Description

Hi, our blog gvshp.org/blog is not loading. What is the problem? This is like a daily occurrence. Something is obviously wrong. Please help us track down the issue.

Change History (36)

comment:1 Changed 3 months ago by https://id.mayfirst.org/jaimev

  • Owner set to https://id.mayfirst.org/jaimev
  • Status changed from new to assigned

Hi, the problem seems to be intermittent, which has made it hard to track down. I am looking for any source of resource contention on your shared server or its physical host, and checking your blog's WordPress files for any anomalies.

comment:2 Changed 3 months ago by https://id.mayfirst.org/gvshp

Thanks. It is still down now so a good hour.

comment:3 Changed 3 months ago by https://id.mayfirst.org/jaimev

Actually, since the moment I first responded to the ticket I have been able to load http://gvshp.org/blog/ without problems. We are, however, making plans to move your host, chavez, to a new physical server to alleviate some of the load. Would you like me to activate a TLS certificate for your site so it can be reached securely over https and browsers won't complain about it?

Last edited 3 months ago by https://id.mayfirst.org/jaimev

comment:4 Changed 3 months ago by https://id.mayfirst.org/gvshp

Yes, thanks.

comment:5 Changed 3 months ago by https://id.mayfirst.org/jaimev

As you may have seen in our service advisory, we have been experiencing general slowdowns on several servers, especially on chavez, where your site is. We were consistently seeing your WordPress blog reach the limit of PHP processes on chavez, and tracked that back to a cron job that was recurring every few seconds.

209.51.172.13 - - [24/Sep/2018:13:28:42 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810122.5123519897460937500000 HTTP/1.1" 200 166 "http://gvshp.org/blog/wp-cron.php?doing_wp_cron=1537810122.5123519897460937500000" "WordPress/4.9.8; http://gvshp.org/blog"
209.51.172.13 - - [24/Sep/2018:13:28:43 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810123.8951709270477294921875 HTTP/1.1" 200 166 "http://gvshp.org/blog/wp-cron.php?doing_wp_cron=1537810123.8951709270477294921875" "WordPress/4.9.8; http://gvshp.org/blog"
209.51.172.13 - - [24/Sep/2018:13:28:44 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810124.8592278957366943359375 HTTP/1.1" 200 166 "http://gvshp.org/blog/wp-cron.php?doing_wp_cron=1537810124.8592278957366943359375" "WordPress/4.9.8; http://gvshp.org/blog"
209.51.172.13 - - [24/Sep/2018:13:28:56 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810136.6724200248718261718750 HTTP/1.1" 200 166 "http://gvshp.org/blog/wp-cron.php?doing_wp_cron=1537810136.6724200248718261718750" "WordPress/4.9.8; http://gvshp.org/blog"
209.51.172.13 - - [24/Sep/2018:13:29:01 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810141.9011449813842773437500 HTTP/1.1" 200 166 "http://gvshp.org/blog/wp-cron.php?doing_wp_cron=1537810141.9011449813842773437500" "WordPress/4.9.8; http://gvshp.org/blog"

That cron job is named "inpsyde_phone-home_checkin" and is apparently triggered by the "backwpup" WordPress plugin. I've found mention of it in several forum posts on the WordPress support site. It is only supposed to run every 14 days but, as other users have reported, it is running out of control on this instance. I have disabled the plugin for your site.

https://wordpress.org/support/topic/cron-task/

https://wordpress.org/support/topic/inpsyde_phone-home_checkin/
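For anyone who wants to check a log for the same pattern, here is a minimal sketch in Python that counts wp-cron.php requests per minute. It assumes the access log uses the same combined format as the excerpt above. The function name and the abridged sample lines (referrer and user agent dropped) are only for illustration and are not a tool we actually run.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the timestamp and request line of a combined-format access log
# entry, e.g. [24/Sep/2018:13:28:42 -0400] "POST /blog/wp-cron.php?... HTTP/1.1"
ACCESS_RE = re.compile(r'\[(?P<ts>[^\]]+)\] "(?P<method>[A-Z]+) (?P<path>\S+)')


def wp_cron_hits_per_minute(lines):
    """Count requests to wp-cron.php per minute (hypothetical helper)."""
    hits = Counter()
    for line in lines:
        match = ACCESS_RE.search(line)
        if not match or "wp-cron.php" not in match.group("path"):
            continue
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hits[ts.strftime("%Y-%m-%d %H:%M")] += 1
    return hits


# Abridged from the log excerpt above: referrer and user agent are omitted.
cron_sample = [
    '209.51.172.13 - - [24/Sep/2018:13:28:42 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810122.5123519897460937500000 HTTP/1.1" 200 166',
    '209.51.172.13 - - [24/Sep/2018:13:28:43 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810123.8951709270477294921875 HTTP/1.1" 200 166',
    '209.51.172.13 - - [24/Sep/2018:13:28:44 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810124.8592278957366943359375 HTTP/1.1" 200 166',
    '209.51.172.13 - - [24/Sep/2018:13:28:56 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810136.6724200248718261718750 HTTP/1.1" 200 166',
    '209.51.172.13 - - [24/Sep/2018:13:29:01 -0400] "POST /blog/wp-cron.php?doing_wp_cron=1537810141.9011449813842773437500 HTTP/1.1" 200 166',
]

cron_hits = wp_cron_hits_per_minute(cron_sample)
for minute, count in sorted(cron_hits.items()):
    print(minute, count)
```

A job that is meant to run every 14 days should show up here almost never, so several hits within a single minute is a strong hint that the schedule is misfiring.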

comment:6 Changed 7 weeks ago by https://id.mayfirst.org/gvshp

Thanks. We're still having major issues with our site and blog crashing every day. This has become a major issue for us. It's a drag on staff productivity and it makes us look like crap to people trying to access our page. What else can we do here?

comment:7 Changed 7 weeks ago by https://id.mayfirst.org/jaimev

Hi, we are sorry about this. I can see chavez under heavy load right now and I am working on resolving the issue.

comment:8 Changed 7 weeks ago by https://id.mayfirst.org/jaimev

While it has been difficult to pinpoint the exact cause of the problems on chavez, we think it may be related to sudden spikes in memory consumption that then provoke a flood of disk I/O, slowing down all processes. We have taken some steps to limit memory consumption and will be monitoring for changes.

comment:9 Changed 5 weeks ago by https://id.mayfirst.org/gvshp

Is there anything we can do? Our blog gvshp.org/blog seems to go down for 5-10 mins at a time several times every day.

comment:10 Changed 5 weeks ago by https://id.mayfirst.org/jaimev

  • Cc https://id.mayfirst.org/jamie added

There were some issues with chavez this morning still running backups from overnight.

Now that we've moved the chavez VM to a new physical server, we can add an SSD-backed virtual drive for MySQL. I can get that set up tonight, and I think this should provide better performance for all sites on chavez.

comment:11 Changed 5 weeks ago by https://id.mayfirst.org/gvshp

Thanks. Another issue that just popped up is on the web configuration page we are getting the following error message: You cannot modify a record that with the status set to pending-update.

comment:12 Changed 5 weeks ago by https://id.mayfirst.org/jaimev

I've just restored the web configuration.

comment:13 Changed 5 weeks ago by https://id.mayfirst.org/gvshp

Thanks, working now.

comment:14 Changed 4 weeks ago by https://id.mayfirst.org/gvshp

Our blog is down right now. Again, this is very critical to us. What can we do to improve this? This issue is not affecting our general website as often. Just the blog. Please advise.

comment:15 Changed 4 weeks ago by https://id.mayfirst.org/jamie

Is it still down for you? Is this the address: http://gvshp.org/blog/

That site appears to be working at the moment. But I see that your site exceeded the limit on the allowed number of processes at around 11:38 am which would have resulted in it refusing additional connections.

comment:16 Changed 4 weeks ago by https://id.mayfirst.org/gvshp

It's back up now but it is regularly going down for 5 or so mins at a time. Is there anything we can do about this processes issue?

comment:17 Changed 4 weeks ago by https://id.mayfirst.org/jamie

It seems you are running two WordPress sites and a static HTML site under the same hosting order. If we could split up the three sites into their own hosting orders, that would help a lot. It would mean you would have 12 processes per site instead of having to share the 12 processes among all three.

It also will help us understand which of the three sites is causing the problem.

If you need help with splitting them up let us know.

It would mean setting up new domain names, e.g. blog.gvshp.org and buildingblocks.gvshp.org, for each site. However, we can automatically redirect from the old addresses to the new ones.

comment:18 Changed 4 weeks ago by https://id.mayfirst.org/gvshp

Yes, we should do this. Please advise. I will speak to our IT guy about it.

comment:19 Changed 4 weeks ago by https://id.mayfirst.org/jaimev

Hi, let me know if you need any help getting this done. I think it is something we've talked about before.

https://support.mayfirst.org/ticket/13571#comment:4

https://support.mayfirst.org/ticket/13222#comment:3

comment:20 Changed 2 weeks ago by https://id.mayfirst.org/gvshp

Thanks, we are working on this. Should take a couple weeks. Our blogs are down again now for the past 10 minutes or so. Anything that can be done now?

comment:21 Changed 2 weeks ago by https://id.mayfirst.org/jamie

working on it now...

comment:22 Changed 2 weeks ago by https://id.mayfirst.org/jamie

Things should be better now on chavez.

comment:23 Changed 2 weeks ago by https://id.mayfirst.org/jaimev

It isn't clear to us exactly why yet, but the gvshp.org/blog WordPress instance is attracting several thousand POST requests from Google's robots. This is unusual because most legitimate bots send only GET requests. POST requests generally create more load on the server. Google does say, however, that they use POST requests for pages that "are missing information and/or look completely broken without the resources returned from POST". The URL attracting these requests is created by the wordpress-popular-posts plugin on your site. I have disabled this temporarily. We'd like to leave this off for a few days to see if there is any change in behavior.
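As a rough illustration of how POST traffic from a crawler can be tallied, here is a short Python sketch that counts POST requests per path for a given user-agent substring. The sample log lines, the IP addresses, and the path /blog/example-endpoint are invented for the example and are not taken from the gvshp.org logs. Note that a user-agent string can be spoofed, so a match on "Googlebot" alone does not prove the request came from Google.

```python
import re
from collections import Counter

# Combined log format: "METHOD path protocol" status size "referrer" "user agent"
REQUEST_RE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def post_counts_by_path(lines, agent_substring="Googlebot"):
    """Count POST requests per path (query string removed) for one user agent."""
    counts = Counter()
    for line in lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue
        if match.group("method") != "POST":
            continue
        if agent_substring not in match.group("agent"):
            continue
        counts[match.group("path").split("?")[0]] += 1
    return counts


# Invented sample lines, for illustration only.
GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
bot_sample = [
    f'192.0.2.10 - - [04/Dec/2018:12:00:01 -0500] "POST /blog/example-endpoint HTTP/1.1" 200 166 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.10 - - [04/Dec/2018:12:00:02 -0500] "POST /blog/example-endpoint?x=1 HTTP/1.1" 200 166 "-" "{GOOGLEBOT_UA}"',
    f'192.0.2.20 - - [04/Dec/2018:12:00:03 -0500] "GET /blog/ HTTP/1.1" 200 5120 "-" "{GOOGLEBOT_UA}"',
    '192.0.2.30 - - [04/Dec/2018:12:00:04 -0500] "POST /blog/wp-login.php HTTP/1.1" 200 900 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

bot_posts = post_counts_by_path(bot_sample)
for path, count in bot_posts.most_common():
    print(count, path)
```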

comment:24 Changed 2 weeks ago by https://id.mayfirst.org/gvshp

Thanks. I don't know if that's the case because /blog and /buildingblocks are both down again now.

comment:25 Changed 2 weeks ago by https://id.mayfirst.org/gvshp

It seems like it is getting worse. We have long periods of time when these sites are down.

comment:26 Changed 2 weeks ago by https://id.mayfirst.org/jamie

Working on it...

comment:27 Changed 2 weeks ago by https://id.mayfirst.org/jamie

chavez is back to normal.

Would you be willing to give us permission to break up your sites into separate hosting orders? We can do it fairly easily for you.

That would allow us to better pinpoint the activity generated by your username and figure out why it is so resource hungry.

comment:28 Changed 10 days ago by https://id.mayfirst.org/gvshp

Yes, we have our IT guy Roger working on it. I will ask him to comment here to check in on that status. Our sites are currently all down, both blogs and regular gvshp.org now.

comment:29 Changed 10 days ago by https://id.mayfirst.org/gvshp

Yes, please proceed to break up our sites into separate hosting orders for us and we will follow up on this ticket.

comment:30 Changed 10 days ago by https://id.mayfirst.org/gvshp

But our sites are still down now so anything you can do would be great. Thanks.

comment:31 Changed 10 days ago by https://id.mayfirst.org/jaimev

Hi, sorry. I've just restarted Apache on chavez.

This time there wasn't any clear resource contention, but searching the Apache logs I see this:

[Tue Dec 04 12:46:25.435813 2018] [mpm_worker:error] [pid 19097:tid 140146692650176] AH00286: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting

comment:32 Changed 10 days ago by https://id.mayfirst.org/jaimev

I've increased the number of Apache worker processes on chavez, following jamie's example from resolving similar issues for colin in ticket https://support.mayfirst.org/ticket/14283#comment:3
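To see how often the worker limit is being hit, an error log can be scanned for the AH00286 message. The Python sketch below is only an illustration, assuming the error log lines look like the one quoted in comment:31. The function name is hypothetical, and the second sample line is made up to show that non-matching lines are skipped.

```python
import re
from collections import Counter
from datetime import datetime

# Matches lines such as:
# [Tue Dec 04 12:46:25.435813 2018] [mpm_worker:error] [pid ...:tid ...] AH00286: ...
AH00286_RE = re.compile(r"^\[(?P<ts>[^\]]+)\] \[[^\]]+\] \[[^\]]+\] AH00286:")


def worker_limit_hits_per_day(lines):
    """Count AH00286 (MaxRequestWorkers reached) messages per calendar day."""
    days = Counter()
    for line in lines:
        match = AH00286_RE.match(line)
        if not match:
            continue
        ts = datetime.strptime(match.group("ts"), "%a %b %d %H:%M:%S.%f %Y")
        days[ts.date().isoformat()] += 1
    return days


error_sample = [
    # Taken from comment:31 above.
    "[Tue Dec 04 12:46:25.435813 2018] [mpm_worker:error] [pid 19097:tid 140146692650176] "
    "AH00286: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting",
    # Made-up line that should be ignored.
    "some unrelated log output",
]

limit_hits = worker_limit_hits_per_day(error_sample)
for day, count in sorted(limit_hits.items()):
    print(day, count)
```

If the count stays above zero after the limit is raised, that suggests the real problem is slow requests tying up workers and not the limit itself.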

comment:33 Changed 8 days ago by https://id.mayfirst.org/gvshp

Thanks, do you know the timeline on splitting the hosting orders? I want to have our IT guy aware so he is available to go fix any issues that may arise because of it. Thanks.

comment:34 Changed 8 days ago by https://id.mayfirst.org/jaimev

I can start working on this tomorrow afternoon so you can monitor for problems over the weekend.

comment:35 Changed 8 days ago by https://id.mayfirst.org/gvshp

Thanks.

comment:36 Changed 6 days ago by https://id.mayfirst.org/jaimev

I wasn't able to start this yesterday, but I spent several hours today trying to separate the https://www.gvshp.org/buildingblocks WordPress folder into its own site at https://buildingblocks.gvshp.org

I was able to move the folder and restore WordPress there, but could not get subpages like guided-tours and building-blocks, or some other elements, working again.

Honestly, all of gvshp.org and its subsites appear to be an interwoven, tangled mess. In my humble opinion no website should be created this way. I want to tone down that last comment. You may have a perfectly good reason for using this meshing strategy between multiple WordPress installs. I do think finding a way to separate the subsites cleanly will make maintenance easier. Unfortunately, I don't think this is something I can do for you without more knowledge of how the underlying structure is intended to work.

I've commented out my redirects and the site continues as it was before.

Last edited 6 days ago by https://id.mayfirst.org/jaimev
