Opened 3 years ago

Last modified 3 years ago

#11794 assigned Bug/Something is broken

Civi again going super slow

Reported by: https://id.mayfirst.org/nlg-membership Owned by: https://id.mayfirst.org/jamie
Priority: Urgent Component: Tech
Keywords: Cc: multi.lectical@…, ben@…, maya@…, nadia@…
Sensitive: no

Description (last modified by https://id.mayfirst.org/fatimab)

Update from our drupal/website support person, Ben:

I would definitely ask Mayfirst about the slowness though, as well as the max_user_connections.


Civi is slow & people have been getting various error messages.

See attached photos as reference.

From Dave, our civi support technician:

hmm a mysterious message indeed. In the future, a capture of the full screen, and especially the URL, would be helpful. Also the exact time it happened could help me find errors in the logs.

As far as slowness, CIVI does use a good deal of resources but really should be doing better on the system it's on. If the server is a shared one, it could be that others are using the resources and making nesri.org slower in general.

I took a look and saw in Drupal:

Database schema: Inconsistent. The Schema comparison report shows:

  • 33 modules with matching tables
  • 54 extra tables
  • 6 modules with mis-matching tables

I also took a look at the memory available and there's only 335 MB of 3965 MB RAM available, which is not a ton but should be plenty for civi to run on.

So I ran a drupal database update to see if that would fix the errors with the tables. It seems to have fixed some, but a number are still mismatched. I know you guys have a drupal person. Maybe they should check that out, though I'm not sure that's the cause of the slowness.

I also saw some other errors related to civi but will need some more time to untangle what the issue might be there.

Attachments (5)

Screenshot 2016-05-24 17.01.12.png (13.1 KB) - added by https://id.mayfirst.org/fatimab 3 years ago.
SnipImage.JPG (28.0 KB) - added by https://id.mayfirst.org/fatimab 3 years ago.
Screenshot 2016-05-23 17.17.55.png (31.6 KB) - added by https://id.mayfirst.org/fatimab 3 years ago.
mfpl-civimail-bounce.png (47.6 KB) - added by https://id.mayfirst.org/jamie 3 years ago.
settings.png (81.3 KB) - added by https://id.mayfirst.org/daveo 3 years ago.
bounce settings


Change History (20)

comment:1 Changed 3 years ago by https://id.mayfirst.org/jaimev

  • Owner set to https://id.mayfirst.org/jaimev
  • Status changed from new to assigned

Right now kinoy seems to be responding well. I see you have opened a previous ticket, #11598, so I am going to continue following up there.

Changed 3 years ago by https://id.mayfirst.org/fatimab

Changed 3 years ago by https://id.mayfirst.org/fatimab

Changed 3 years ago by https://id.mayfirst.org/fatimab

comment:2 Changed 3 years ago by https://id.mayfirst.org/fatimab

Civi is slow & people have been getting various error messages.

See attached photos as reference.

From Dave, our civi support technician:

hmm a mysterious message indeed. In the future, a capture of the full screen, and especially the URL, would be helpful. Also the exact time it happened could help me find errors in the logs.

As far as slowness, CIVI does use a good deal of resources but really should be doing better on the system it's on. If the server is a shared one, it could be that others are using the resources and making nesri.org slower in general.

I took a look and saw in Drupal:

Database schema: Inconsistent. The Schema comparison report shows:

  • 33 modules with matching tables
  • 54 extra tables
  • 6 modules with mis-matching tables

I also took a look at the memory available and there's only 335 MB of 3965 MB RAM available, which is not a ton but should be plenty for civi to run on.

So I ran a drupal database update to see if that would fix the errors with the tables. It seems to have fixed some, but a number are still mismatched. I know you guys have a drupal person. Maybe they should check that out, though I'm not sure that's the cause of the slowness.

I also saw some other errors related to civi but will need some more time to untangle what the issue might be there.

anyhow that's what i came up with for now, will talk to both of you at 1pm.

comment:3 Changed 3 years ago by https://id.mayfirst.org/fatimab

  • Priority changed from Medium to Urgent

comment:4 Changed 3 years ago by https://id.mayfirst.org/fatimab

  • Cc multi.lectical@… ben@… maya@… nadia@… added
  • Description modified (diff)

comment:5 Changed 3 years ago by https://id.mayfirst.org/fatimab

  • Description modified (diff)

comment:6 Changed 3 years ago by https://id.mayfirst.org/jaimev

  • Owner changed from https://id.mayfirst.org/jaimev to https://id.mayfirst.org/jamie

Let's get jamie's input on this.

comment:7 Changed 3 years ago by https://id.mayfirst.org/jamie

I just increased the number of mysql user connections from 25 to 50. I imagine that hitting that limit may have caused some sql query to fail. However, usually hitting that max is a symptom of a different, underlying problem that should be addressed.

I've recently seen many CiviCRM sites hit MySQL deadlock issues - particularly large sites or ones with many groups. The CiviLogConfig file should have a traceback for the error and it probably reports "deadlock" - in which case you are probably hitting this problem.
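One quick way to check whether a log mentions deadlocks is to scan it line by line. The following is only a minimal sketch in Python: the function name `find_deadlocks` and the sample lines are made up for illustration, and the real log location and line format on any given site are not known here.

```python
import re

# Case-insensitive match, since log lines may say "Deadlock" or "DEADLOCK".
DEADLOCK = re.compile(r"deadlock", re.IGNORECASE)


def find_deadlocks(lines):
    """Return (line_number, text) pairs for every line that mentions a deadlock."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if DEADLOCK.search(line)
    ]


# Illustrative sample only; real log lines will look different.
sample = [
    "some unrelated log line\n",
    "Deadlock found when trying to get lock; try restarting transaction\n",
]
for number, text in find_deadlocks(sample):
    print(f"line {number}: {text}")
```

To run it against a real file, pass an open file handle instead of the sample list, for example `find_deadlocks(open(path))` with `path` pointing at whichever log file the site actually writes.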

In short, the group cache gets triggered too often and sometimes in ways that conflict so that one query locks a table needed by another (and many many others) - so the queries pile up until you hit the max and then you start getting failures. This makes the site feel slow because the page load is waiting for queries that only complete after the deadlock time out has been reached.
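The pile-up described above can be estimated with simple arithmetic: the number of connections held at once is roughly the rate at which blocked queries arrive multiplied by how long each one waits. The sketch below is a rough model only. The arrival rate of 1 blocked query per second is an invented figure, and the 50 second wait assumes the server uses InnoDB's default lock wait timeout, which has not been confirmed for this host.

```python
def steady_state_connections(blocked_per_sec, seconds_held):
    """Connections held at once if each blocked query holds one for seconds_held."""
    return blocked_per_sec * seconds_held


def seconds_until_limit(blocked_per_sec, max_user_connections):
    """Time for blocked queries to use up every allowed connection."""
    if blocked_per_sec <= 0:
        return float("inf")
    return max_user_connections / blocked_per_sec


# Assumed numbers: 1 blocked query per second, each waiting 50 seconds.
print(steady_state_connections(1.0, 50))  # connections needed to ride out the wait
print(seconds_until_limit(1.0, 25))       # time to exhaust the old limit of 25
print(seconds_until_limit(1.0, 50))       # time to exhaust the new limit of 50
```

Under these assumptions, raising the limit from 25 to 50 only doubles the time before failures start, which fits the point that the limit is a symptom and not the underlying problem.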

This seems to be addressed (at least a bit) in an upcoming release: https://civicrm.org/blog/eileen/478-group-contact-cache-deadlocks-improvement

comment:8 Changed 3 years ago by https://id.mayfirst.org/fatimab

Thanks everyone!

Can you give us guidance on how to get the issue more thoroughly addressed, especially regarding the underlying problem that this is signaling? And who is the person/group/ or what is the expertise that is needed to address it? Is that a mayfirst issue, a drupal issue, a civi issue?

Is there anything we/as NESRI should do regarding the group cache (and pardon this likely silly question, but does this have anything to do with our number of groups on civi)?

Thanks!

comment:9 Changed 3 years ago by https://id.mayfirst.org/jamie

Hi - All good questions and there aren't crystal clear answers.

It is almost certainly not a problem with Drupal.

It's possible that May First could tweak things like your MySQL database settings to get slightly better performance or we could add more memory to your guest that might help. However, in my experience hosting CiviCRM for PTP (via https://ourpowerbase.net) - where we have dedicated hardware with plenty of resources, we still see this issue quite frequently which suggests to me that it's a CiviCRM issue.

One tactic that has worked for CiviCRM installs with more than 500 groups is to reduce the number of groups you have - specifically the number of smart groups. That will help. Also, you might create non-smart groups for any smart groups that you use frequently but that are not likely to change - at least until the next version of CiviCRM comes out, which might help. To do that, run a search on your smart group, select all contacts, and create a new regular group with them.

Other tips include:

  • Don't de-dupe during the day. De-duping is a performance killer and can contribute to the problem
  • Try to avoid creating smart groups made out of smart groups. Sometimes you really need to - but if you can avoid nesting to the degree possible it will help

Let us know if that helps or if there are any other questions you have.

jamie

comment:10 Changed 3 years ago by https://id.mayfirst.org/daveo

ahh, figured out how to get on this ticket. Pasting what I tried to submit via email:

It does feel a bit faster at moments (though not sure if it's a placebo effect), but watching the logs while doing some more expensive searches (like hitting return on a blank search box) still shows a SQL deadlock. Also, running top while loading a page, one can see mysqld and apache blow up beyond what seems right, even on some non-civi drupal pages when logged in.

One issue I've had was getting the bounce processing set up correctly (largely so it doesn't overwhelm the logs). I managed to reset the bounce user password and reconfigure it but getting these errors now when bounce processing i

Parameters parsed (and passed to API method): 
a:1:{s:7:"version";i:3;}

Full message: 
Finished execution of Fetch Bounces with result: Failure, Error message: An error occured while sending or receiving mail. Could not read from the stream. It was probably terminated by the host.

Having a hard time understanding what's happening on that front. Do you have recommended settings for reading the mayfirst email from civi?

comment:11 Changed 3 years ago by https://id.mayfirst.org/jamie

Hi - That's strange. Did you use mail.mayfirst.org as the server name? Also - I would try logging in via our webmail using the same credentials in PowerBase to ensure that they work properly.

We haven't had any known problems with this on other CiviCRM sites.

Changed 3 years ago by https://id.mayfirst.org/jamie

comment:12 Changed 3 years ago by https://id.mayfirst.org/jamie

I just attached what our config looks like:

Changed 3 years ago by https://id.mayfirst.org/daveo

bounce settings

comment:13 Changed 3 years ago by https://id.mayfirst.org/daveo

thanks! looks pretty similar to what nesri is using. does anything look amiss? https://support.mayfirst.org/attachment/ticket/11794/settings.png

Also still seeing super slow load times, even when loading drupal pages with some basic views (about 5-25 seconds). While trying to load the homepage (about 25 secs on that one), I noticed in the process list that puppet hit the top of top several times, followed by some process called something like restart-needed. I didn't have time to copy down the exact command before it disappeared. Is this a puppet thing? Could it be that the server or apache needs restarting?

comment:14 Changed 3 years ago by https://id.mayfirst.org/jamie

Sorry for the slow response!

The slow down during a puppet call is fairly unusual - it means we are performing server updates, which cause a performance hit. That usually happens once or twice a month and usually during off hours (unless there is a security concern).

As for the bounce processing - can you login with those credentials via https://roundcube.mayfirst.org? If you don't have a plain text version of the password, you might try changing the password, waiting 15 minutes, then logging in via roundcube. If that works, update the password in your CiviCRM config and re-run the scheduled job to see if it works.

comment:15 Changed 3 years ago by https://id.mayfirst.org/jamie

btw - I just hit this exact same error on a different CiviCRM site and the cause was: no email had been delivered to that address so the Maildir was not yet setup. Once email was delivered, it went away.
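The same check as the webmail login can be scripted. The sketch below is a guess at how that might look in Python, assuming the bounce mailbox is read over IMAP with SSL (the ticket does not say which protocol the CiviCRM mail account uses). The helper name `check_mailbox` is hypothetical. Whether a mailbox with no Maildir yet shows up as a failed INBOX select or as a dropped connection is also not known, so both cases are reported as failures.

```python
import imaplib


def check_mailbox(host, user, password, connect=imaplib.IMAP4_SSL):
    """Try to log in and open INBOX read-only; return (ok, detail)."""
    try:
        conn = connect(host)
    except OSError as exc:
        return False, f"connect failed: {exc}"
    try:
        conn.login(user, password)
        status, data = conn.select("INBOX", readonly=True)
        if status != "OK":
            # The server answered but refused to open the mailbox.
            return False, f"INBOX not selectable: {data!r}"
        return True, f"INBOX has {int(data[0])} message(s)"
    except (imaplib.IMAP4.error, OSError) as exc:
        # Covers bad credentials and connections closed by the host.
        return False, f"IMAP error: {exc}"
    finally:
        try:
            conn.logout()
        except Exception:
            pass  # the connection may already be gone


# Example call (not run here, it needs real credentials):
# ok, detail = check_mailbox("mail.mayfirst.org", "bounce-user", "secret")
```

The `connect` parameter exists only so the function can be exercised without a network connection. The user name and password in the commented example are placeholders.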
