Opened 3 months ago

Last modified 2 months ago

#13866 assigned Bug/Something is broken

under stretch, mariadb is turning on innodb monitors

Reported by: https://id.mayfirst.org/jamie Owned by: https://id.mayfirst.org/jamie
Priority: Medium Component: Tech
Keywords: chavez hashmi Cc:
Sensitive: no

Description

innodb monitors means that mariadb writes copious information to /var/log/mysql/error.log every minute about the status of mariadb.

This causes nagios alerts.

In can be turned on and off via the innodb_status_output variable.

It's currently turned on on chavez and hashmi.

I suspect it was turned on automatically, which happens:

  • A long semaphore wait
  • InnoDB cannot find free blocks in the buffer pool
  • Over 67% of the buffer pool is occupied by lock heaps or the adaptive hash index

I'm turning it off on both hashmi and chavez via:

mysql -e "SET GLOBAL innodb_status_output=OFF"

I'll be curious to see if it keeps coming back on.

Change History (21)

comment:1 Changed 3 months ago by https://id.mayfirst.org/jamie

  • Owner set to https://id.mayfirst.org/jamie
  • Status changed from new to assigned

comment:2 Changed 2 months ago by https://id.mayfirst.org/jamie

I think this is going to be a problem on many servers as we upgrade.

On chavez it has come back, and it seems the cause is "innoDB cannot find free blocks in the buffer pool" as I found this in the error.log:

2018-07-10 12:12:49 7f9b669e3700  InnoDB: Warning: difficult to find free blocks in
InnoDB: the buffer pool (336 search iterations)!
InnoDB: 0 failed attempts to flush a page! Consider
InnoDB: increasing the buffer pool size.
InnoDB: It is also possible that in your Unix version
InnoDB: fsync is very slow, or completely frozen inside
InnoDB: the OS kernel. Then upgrading to a newer version
InnoDB: of your operating system may help. Look at the
InnoDB: number of fsyncs in diagnostic info below.
InnoDB: Pending flushes (fsync) log: 0; buffer pool: 1
InnoDB: 5800936 OS file reads, 15979334 OS file writes, 1825702 OS fsyncs
InnoDB: Starting InnoDB Monitor to print further
InnoDB: diagnostics to the standard output.

chavez's db is not on a ssd drive, and the innodb buffer pool size is: 256MB (which is probably grossly inadequate - especially since chavez has 20GB of RAM (really??).

I've just boosted the buffer pool size to 1GB and am restarting mysql. Monitoring to be continued...

comment:3 Changed 2 months ago by https://id.mayfirst.org/jamie

Now moving on to viewsic - the current error.log did not seem to indicate why it was turned on... but you can find that with:

zgrep 'Starting InnoDB Monitor to print further' /var/log/mysql/error.log*

And it shows the same message as chavez and also has only 256MB buffer pool which I am increasing to 1GB.

comment:4 Changed 2 months ago by https://id.mayfirst.org/jamie

Chavez immediately had the monitor turned back on. During the next off hours, I will boost it from 1GB to 2GB.

comment:5 Changed 2 months ago by https://id.mayfirst.org/jaimev

Should we begin increasing this value to 1G on all servers who have at least 6GB of ram?

comment:6 Changed 2 months ago by https://id.mayfirst.org/jamie

I would suggest that we:

  • Edit modules/mayfirst/manifests/m_mosh.pp and change the default $mysql_innodb_buffer_pool_size from 256MB to 1GB.
  • Edit the .pp file of any MOSH with less than 6GB of RAM and add a mysql_innodb_buffer_pool_size => 512MB to the pp file so we don't kill those servers by allowing MySQL to balloon like crazy
  • Edit all other .pp files that have a custom mysql_innodb_buffer_pool_size that is less than 1GB and remove that customization to allow it to take the new default of 1GB

But... puppet is configured to restart mysql when this configuration change happens, so I would suggest waiting until off hours, make all puppet changes, push to a single server and make it sure it works properly, and then sign a tag.

comment:7 Changed 2 months ago by https://id.mayfirst.org/jaimev

So from what I'm reading this change might also imply a change to innodb_log_file_size like in #12965 ?

The general rule of thumb seems to be that the preferred value size for the log_file_size is 25% of the innodb_buffer_pool_size.

This post argues differently:

https://www.percona.com/blog/2008/11/21/how-to-calculate-a-good-innodb-log-file-size/

This is the reference:

https://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_log_file_size

comment:8 Changed 2 months ago by https://id.mayfirst.org/jaimev

A more recent blog post by the same author suggests that there are advantages to increasing the innodb_log_file_size and in this example he goes with the 25% ratio.

https://www.percona.com/blog/2016/05/31/what-is-a-big-innodb_log_file_size/

MySQL tuners recommendations uses a calculus suggests that the innodb_log_file_size multiplied by the innodb_log_files_in_group (default 2) should be 25% of innodb_buffer_pool_size.

https://github.com/major/MySQLTuner-perl/issues/258#issuecomment-330470943

So following the calculation we would increase innodb_log_file_size to 128MB everywhere we are increasing innodb_buffer_pool_size to 1G. The above would require a clean shutdown to do properly.

On some servers we've already increased innodb_log_file_size by creating config files with the server name in modules/mayfirst/files/mysql-server/ that are copied by puppet to /etc/mysql/conf.d/

It seems like we should incorporate innodb_log_file_size as a variable we can set to create a single config file in /etc/mysql/conf.d/ using a template the same way we are doing with innodb-buffer-pool

 class mayfirst::m_mysql_server ( $innodb_buffer_pool_size = "256MB" ) 
 ... 
  m_mysql::config { "innodb-buffer-pool-size.cnf": 
    content => template("mayfirst/mysql-server/innodb-buffer-pool-size.cnf.erb")
  }

Could that become something like the following if we add the template?

 class mayfirst::m_mysql_server ( $innodb_buffer_pool_size = "256MB", $innodb_log_file_size = "48MB" ) 
 ... 
  m_mysql::config { "innodb-buffer-pool-size.cnf": 
    content => template("mayfirst/mysql-server/innodb-buffer-pool-size.cnf.erb")
  }
  m_mysql::config { "innodb-log-file-size.cnf": 
    content => template("mayfirst/mysql-server/innodb-log-file-size.cnf.erb")
  }

Then we could override both those default values with the higher values by setting them in the m_mosh.pp manifest and then for special cases directly in the mosh manifest file like is already done for innodb-buffer-pool-size for some servers.

Last edited 2 months ago by https://id.mayfirst.org/jaimev (previous) (diff)

comment:9 Changed 2 months ago by https://id.mayfirst.org/jaimev

I am going to go ahead and try this now. I've commented out the lines in the new template innodb-log-file-size.cnf.erb so it shouldn't affect anything.

comment:10 Changed 2 months ago by https://id.mayfirst.org/jaimev

Even with the lines commented out creating the new file still triggers a restart so that is something to be careful of. I would like to go forward with uncommenting these lines and ensure that all mosh servers that already have innodb_log_file_size set to anything other than the default value, have that same value set in the new variable in their manifest file to avoid any mismatch problems.

comment:11 Changed 2 months ago by https://id.mayfirst.org/jaimev

Ok, I've done the above for any servers already using a non-default innodb_log_file_size. In some cases this meant I could go ahead and remove a custom config file created under the same name of the server as the variables set there are already covered by the manifests now.

I'm not pushing this anywhere else yet. Something I think I have to be careful of are any non mosh servers that have set a innodb_log_file_size in some local file that isn't tracked by puppet. Aside from m_mosh it seems only m_dovecot references m_mysql_server. Also I see that variable is set in the file modules/mayfirst/files/mysql-server/optimize-ptp-civicrm.cnf It's not clear whether to me the default value of 48MB in m_mysql_server would cause a conflict there.

comment:12 Changed 2 months ago by https://id.mayfirst.org/jamie

Nice work on all of this. And I think your change to allow setting the log file size are excellent.

I think we should consider changing puppet so a change in these files does not trigger a mysql restart. I think in most cases we may want to standardize some of these settings and waiting for the next mysql restart is good enough (some of the settings can be changed via a query in the form of SET GLOBAL foo = bar; but that is probably more complicated to automate that it's worth and some variables are read-only unless set in the config file so this method won't work).

The only big exception here is the innodb log file - that one is really tricky and potentially dangerous to set without a restart. It not only requires a clean shutdown of mysql (as you mentioned), but you have to move the old /var/lib/mysql/ib_data{0,1} files out of the way before you start mysql again or mysql will fail. And, if you don't get a clean shutdown and you delete those files, then we will have data loss. So, it requires a bit of a dance: shutdown mysql, move files out of the way, set new value, start mysql, ensure it started properly, delete old files.

comment:13 Changed 2 months ago by https://id.mayfirst.org/jamie

So... perhaps our next steps could be:

  • Change puppet so a mysql restart is not triggered with a change in these files
  • Move forward with the innodb_buffer changes and sign a release
  • During off hours, run through the innodb log file changes on a one-by-one basis

Also, for the record, I boosted chavez innodb buffer pool size to 2GB this morning. Tomorrow the mysql error log will rotate and hopefully the innodb monitor will stay off.

comment:14 Changed 2 months ago by https://id.mayfirst.org/jaimev

Ok, after reading up a little on puppet syntax I think I found a simple way to choose which config files in /etc/mysql/conf.d will trigger a restart with a default of true.

However before going forward I think there is some cleaning up to do. Some mosh's are still using custom config files that aren't tracked by puppet but are referencing the innodb_buffer_pool_size or innodb_log_file_size. So I think I'm going to turn off restarts of mysql for all config file changes while I move things around, eliminate unneeded custom configs or if unique make sure they are tracked under puppet.

Then I think we'll be ready to go forward without any surprises.

comment:15 Changed 2 months ago by https://id.mayfirst.org/jaimev

So I've made the above changes and gone ahead with changing default innodb_buffer_pool_size committed and pushed to origin but for some reason I can't sign a new commit now.

git tag -s mfpl-puppet-3.68 mfpl-puppet-3.68
fatal: Failed to resolve 'mfpl-puppet-3.68' as a valid ref.

comment:16 Changed 2 months ago by https://id.mayfirst.org/jamie

I think you just want:

git tag -s mfpl-puppet-3.68

comment:17 Changed 2 months ago by https://id.mayfirst.org/jaimev

Oh! I see what I did now, the second reference is supposed to be the message , forgot the '-m' argument.

comment:18 Changed 2 months ago by https://id.mayfirst.org/jaimev

I went ahead and signed a tag that has set innodb_buffer_pool_size to 1G. I'm going to go ahead with changing the default innodb_log_file_size to 256MB which should allow for up to 2G innodb_buffer_pool_size within the recommended ratio. Then I will begin shutting down mysql manually and doing the dance to start with new logfile size for each mosh one by one.

comment:19 Changed 2 months ago by https://id.mayfirst.org/jaimev

Ok, well while the above didn't go as smoothly as I'd hoped with issues restarting some databases for various different reasons I was eventually able to get all mysql/maridb servers up and running with the new innodb values.

Instead of signing the tag that would push out the innodb_log_file_size I decided it would be safer to pull in the changes from puppet origin master manually when I was ready to change them. I did this for all mosh's to ensure the transition.

But now that this is done I want to sign the tag that will make the innodb_log_file_size final. I am having trouble signing the tag again, now with a different error.

~/mayfirst/puppet$ GIT_TRACE=1 git tag -s mfpl-puppet-3.69 -m mfpl-puppet-3.69
01:34:10.351201 git.c:371               trace: built-in: git 'tag' '-s' 'mfpl-puppet-3.69' '-m' 'mfpl-puppet-3.69'
01:34:10.357875 run-command.c:350       trace: run_command: 'gpg' '--status-fd=2' '-bsau' '0x2EEBCD9E3FE324F2'
error: gpg failed to sign the data
error: unable to sign the tag

comment:20 Changed 2 months ago by https://id.mayfirst.org/jaimev

Ok, resolved the signing error again, this time it was some issue with gpg-agent.

comment:21 Changed 2 months ago by https://id.mayfirst.org/jamie

Great work Jaime! This is huge move for us and I think will give us fairly significant improvements in disk i/o immediately.

Please login to add comments to this ticket.

Note: See TracTickets for help on using tickets.