Opened 6 years ago

Closed 6 years ago

#561 closed Bug/Something is broken (invalid)

leslie voluntarily re-synced its RAID arrays

Reported by: https://id.mayfirst.org/dkg Owned by: https://id.mayfirst.org/jamie
Priority: Medium Component: Tech
Keywords: leslie.mayfirst.org RAID Cc:
Sensitive: no

Description

During the latest round of kernel upgrades, we noticed that leslie decided to re-sync all of its RAID arrays at 2008-02-03 01:06:01 (America/New_York):

Feb  3 01:06:01 leslie kernel: md: syncing RAID array md0
Feb  3 01:06:01 leslie kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Feb  3 01:06:01 leslie kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Feb  3 01:06:01 leslie kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Feb  3 01:06:01 leslie kernel: md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
Feb  3 01:06:01 leslie kernel: md: delaying resync of md3 until md0 has finished resync (they share one or more physical units)
Feb  3 01:06:01 leslie kernel: md: delaying resync of md1 until md2 has finished resync (they share one or more physical units)
Feb  3 01:06:01 leslie kernel: md: delaying resync of md2 until md3 has finished resync (they share one or more physical units)
Feb  3 01:06:01 leslie kernel: md: using 128k window, over a total of 2931712 blocks.

The disks in question are both 160GB SATA devices, reported by smartctl this way:

Model Family:     Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model:     ST3160023AS
Serial Number:    ********
Firmware Version: 8.12

smartctl didn't report any problems with the disks, though.

The RAID re-syncs completed by 03:21:48, so we weren't running with only one disk for very long.

It's worrisome that this happened without alerts, though, and it's also unclear what caused the re-sync in the first place.
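One way to tell whether an array is really running on a single disk is to look at the member-status field in /proc/mdstat, where a healthy two-disk mirror shows [UU] and a degraded one shows an underscore, such as [U_]. A minimal sketch, assuming the usual mdstat layout; the function name degraded_arrays is made up for illustration and is not part of mdadm:

```shell
#!/bin/sh
# Print the name of every md array whose member-status field (e.g. [U_])
# contains an underscore, meaning at least one member is missing.
# Reads mdstat-formatted text from the file in $1 (default: /proc/mdstat).
degraded_arrays() {
    awk '
        /^md[0-9]+ :/         { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/     { print name }    # status field with a missing member
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument, degraded_arrays reads /proc/mdstat; empty output means no array is missing a member. For actual alerting, mdadm's own monitor mode (mdadm --monitor) is the more usual tool.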

Change History (4)

comment:1 Changed 6 years ago by https://id.mayfirst.org/dkg

I've just backed up the relevant data on leslie for later review if we want:

0 leslie:~# cp /var/log/kern.log.0 ticket561/
0 leslie:~# smartctl -d ata -a /dev/sda > ticket561/smartctl.sda
0 leslie:~# smartctl -d ata -a /dev/sdb > ticket561/smartctl.sdb
0 leslie:~# 

comment:2 Changed 6 years ago by https://id.mayfirst.org/jamie

The same thing happened on malcolm at the same time, which makes me wonder if there was a power surge.

0 malcolm:~# zgrep "kernel: md:" /var/log/syslog*
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: syncing RAID array md0
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md3 until md2 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md1 until md2 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: delaying resync of md3 until md2 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:06:02 malcolm kernel: md: using 128k window, over a total of 4883648 blocks.
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: md0: sync done.
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: delaying resync of md3 until md2 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: syncing RAID array md2
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: using 128k window, over a total of 497920 blocks.
/var/log/syslog.0:Feb  3 01:07:37 malcolm kernel: md: delaying resync of md1 until md2 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:07:44 malcolm kernel: md: md2: sync done.
/var/log/syslog.0:Feb  3 01:07:44 malcolm kernel: md: syncing RAID array md1
/var/log/syslog.0:Feb  3 01:07:44 malcolm kernel: md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
/var/log/syslog.0:Feb  3 01:07:44 malcolm kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
/var/log/syslog.0:Feb  3 01:07:44 malcolm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
/var/log/syslog.0:Feb  3 01:07:44 malcolm kernel: md: using 128k window, over a total of 4883648 blocks.
/var/log/syslog.0:Feb  3 01:09:30 malcolm kernel: md: md1: sync done.
/var/log/syslog.0:Feb  3 01:09:30 malcolm kernel: md: syncing RAID array md3
/var/log/syslog.0:Feb  3 01:09:30 malcolm kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
/var/log/syslog.0:Feb  3 01:09:30 malcolm kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
/var/log/syslog.0:Feb  3 01:09:30 malcolm kernel: md: using 128k window, over a total of 145982528 blocks.
/var/log/syslog.0:Feb  3 02:32:55 malcolm kernel: md: md3: sync done.
1 malcolm:~#

smartctl doesn't report any errors either, and it reports the disks as:

Device Model:     ST3160812AS
Firmware Version: 3.ADJ

comment:3 Changed 6 years ago by https://id.mayfirst.org/jamie

Backed up data for malcolm as well:

0 malcolm:~# mkdir ticket561
0 malcolm:~# cp /var/log/kern.log.0 ticket561/
0 malcolm:~# smartctl -d ata -a /dev/sda > ticket561/smartctl.sda
0 malcolm:~# smartctl -d ata -a /dev/sdb > ticket561/smartctl.sdb
0 malcolm:~#

comment:4 Changed 6 years ago by https://id.mayfirst.org/dkg

  • Resolution set to invalid
  • Status changed from new to closed
  • Summary changed from leslie voluntarily re-synced its RAID arrays and we don't know why. to leslie voluntarily re-synced its RAID arrays

Hrm. This was happening on chun also. Looking into /var/log/syslog.0, i can see that the resync appears to be triggered by a monthly mdadm cron job that rescans the arrays:

if [ $cron = 1 ] && ! is_true ${AUTOCHECK:-false}; then
0 chun:~# grep mdadm /var/log/syslog.0
Feb  3 01:06:01 localhost /USR/SBIN/CRON[1707]: (root) CMD ([ -x /usr/share/mdadm/checkarray ] && [ $(date +%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet)
0 chun:~# cat /etc/cron.d/mdadm 
#
# cron.d/mdadm -- schedules periodic redundancy checks of MD devices
#
# Copyright © martin f. krafft <madduck@madduck.net>
# distributed under the terms of the Artistic Licence 2.0
#
# $Id: mdadm.cron.d 147 2006-08-30 09:26:11Z madduck $
#

# By default, run at 01:06 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
0 chun:~# 
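The cron entry above splits the "first Sunday of the month" test in two: the schedule field 0 restricts the job to Sundays, and the date +%d guard restricts it to days 1 through 7. (The backslash in \%d is there because % is special in crontab lines.) A small sketch of that guard as a shell function; the name in_first_week is hypothetical, since the real job inlines the test:

```shell
#!/bin/sh
# Succeed only if the given day of the month (as printed by `date +%d`,
# e.g. 03) falls within the first seven days of the month.
# Combined with a Sundays-only cron schedule, this selects the first Sunday.
in_first_week() {
    dom=${1#0}            # strip a leading zero so 08 is read as 8
    [ "$dom" -le 7 ]
}

# Usage for today's date:
#   in_first_week "$(date +%d)" && echo "would run checkarray"
```

On 2008-02-03, a Sunday, the day of the month is 03, so the guard passes; on the following Sunday (10) it fails and the job does nothing.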

The checkarray run apparently shows up in the kernel log as a RAID resync, but it is in fact just a healthy redundancy check. I'm removing the "we don't know why" from the summary here, and closing this as "invalid" because it's not a problem -- just healthy, routine maintenance that i didn't recognize as such. Whew!
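On kernels that expose md state through sysfs, a check and a real rebuild can be told apart by reading /sys/block/mdN/md/sync_action, which, as far as i know, is also the file checkarray writes "check" to. A hedged sketch; describe_sync_action and report_md_actions are illustrative names, and the descriptions are my own wording for the kernel's documented values:

```shell
#!/bin/sh
# Map a value read from /sys/block/mdN/md/sync_action to a short description.
describe_sync_action() {
    case "$1" in
        idle)    echo "idle" ;;
        check)   echo "routine redundancy check" ;;
        repair)  echo "requested repair" ;;
        resync)  echo "resync after an unclean shutdown or array creation" ;;
        recover) echo "rebuild onto a replacement or spare disk" ;;
        *)       echo "unrecognized state: $1" ;;
    esac
}

# Print one line per md array found under the sysfs root in $1
# (default: /sys/block). Prints nothing if there are no md arrays.
report_md_actions() {
    for f in "${1:-/sys/block}"/md*/md/sync_action; do
        [ -r "$f" ] || continue          # unmatched glob or unreadable file
        dev=${f%/md/sync_action}         # .../md0/md/sync_action -> .../md0
        dev=${dev##*/}                   # .../md0 -> md0
        read -r action < "$f"
        echo "$dev: $(describe_sync_action "$action")"
    done
}
```

Running report_md_actions while the monthly job is active should show "routine redundancy check" rather than a rebuild, which is the distinction the kernel log messages above don't make.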
