2007-09-03 07:56:23

by Xavier Bestel

Subject: very very strange simultaneous RAID resync on sep 2, 01:06 CEST (+2)

Hi,

I have a server running with RAID5 disks, under Debian stable, kernel
2.6.18-5-686. Yesterday the RAID resync'd for no apparent reason,
and mdadm did not even send a mail to warn about it:

Sep 2 01:06:01 awak kernel: md: syncing RAID array md0
Sep 2 01:06:01 awak kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 2 01:06:01 awak kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 2 01:06:01 awak kernel: md: using 128k window, over a total of 48064 blocks.
Sep 2 01:06:01 awak kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 awak kernel: md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:05 awak kernel: md: md0: sync done.
Sep 2 01:06:05 awak kernel: RAID1 conf printout:
Sep 2 01:06:05 awak kernel: --- wd:3 rd:3
Sep 2 01:06:05 awak kernel: disk 0, wo:0, o:1, dev:hda1
Sep 2 01:06:05 awak kernel: disk 1, wo:0, o:1, dev:hde1
Sep 2 01:06:05 awak kernel: disk 2, wo:0, o:1, dev:hdc1
Sep 2 01:06:05 awak kernel: md: delaying resync of md2 until md1 has finished resync (they share one or more physical units)
Sep 2 01:06:05 awak kernel: md: syncing RAID array md1
Sep 2 01:06:05 awak kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 2 01:06:05 awak kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 2 01:06:05 awak kernel: md: using 128k window, over a total of 10000384 blocks.

In itself, this event is already strange. But what's even stranger is
that someone else had the same resync at exactly the same time (all
times are CEST (+0200)):

Sep 2 01:06:01 in22 kernel: md: syncing RAID array md0
Sep 2 01:06:01 in22 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 2 01:06:01 in22 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 2 01:06:01 in22 kernel: md: using 128k window, over a total of 1003904 blocks.
Sep 2 01:06:01 in22 kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md2 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md3 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md1 until md0 has finished resync (they share one or more physical units)
Sep 2 01:06:01 in22 kernel: md: delaying resync of md2 until md3 has finished resync (they share one or more physical units)
Sep 2 01:06:39 in22 kernel: md: md0: sync done.
Sep 2 01:06:39 in22 kernel: md: delaying resync of md2 until md3 has finished resync (they share one or more physical units)
Sep 2 01:06:39 in22 kernel: md: syncing RAID array md1
Sep 2 01:06:39 in22 kernel: md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
Sep 2 01:06:39 in22 kernel: md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
Sep 2 01:06:39 in22 kernel: md: using 128k window, over a total of 7004224 blocks.
Sep 2 01:06:39 in22 kernel: md: delaying resync of md3 until md1 has finished resync (they share one or more physical units)
Sep 2 01:06:39 in22 kernel: RAID1 conf printout:
Sep 2 01:06:39 in22 kernel: --- wd:2 rd:2
Sep 2 01:06:39 in22 kernel: disk 0, wo:0, o:1, dev:hda1
Sep 2 01:06:39 in22 kernel: disk 1, wo:0, o:1, dev:hdb1
(reboot)

I'm still gathering information (no idea what his disks are, etc.), but
does anyone have the same problem? Does anyone know where it could come
from (Debian trouble, md bug, drive firmware problem, rootkit, ...) and
how I can pinpoint it?

Thanks,
Xav



2007-09-03 08:06:38

by Xavier Bestel

Subject: forget the noise (Re: very very strange simultaneous RAID resync on sep 2, 01:06 CEST (+2))

On Mon, 2007-09-03 at 09:56 +0200, Xavier Bestel wrote:
> In itself, this event is already strange. But what's even stranger is
> that someone else had the same resync at exactly the same time

That mystery is solved, see /etc/cron.d/mdadm:

# By default, run at 01:06 on every Sunday, but do nothing unless the day of
# the month is less than or equal to 7. Thus, only run on the first Sunday of
# each month. crontab(5) sucks, unfortunately, in this regard; therefore this
# hack (see #380425).
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
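The date guard in that entry can be tried outside cron. What follows is only a minimal sketch, assuming a POSIX shell; the function name `in_first_week` is made up for illustration and is not part of mdadm or checkarray. Note that the `\%` in the crontab line is cron escaping; in a plain shell it is written `%d`.

```shell
#!/bin/sh
# Hypothetical helper (not part of mdadm): succeeds when a day-of-month
# value falls within the first seven days of the month.
in_first_week() {
    [ "$1" -le 7 ]
}

# cron already restricts the job to Sundays (the 0 in the fifth field).
# Exactly one Sunday falls on days 1..7 of any month, so "Sunday" plus
# this test together mean "first Sunday of the month".
if in_first_week "$(date +%d)"; then
    echo "first week of the month: the check would run"
else
    echo "later in the month: the check would be skipped"
fi
```

The guard lives in the command rather than in the schedule fields because, per crontab(5), a job whose day-of-month and day-of-week fields are both restricted runs when either one matches, so `1-7` combined with `0` would fire on every Sunday and on each of the first seven days.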

Sorry for the noise.

Xav


2007-09-03 08:07:52

by Patrick Mau

Subject: Re: very very strange simultaneous RAID resync on sep 2, 01:06 CEST (+2)

On Mon, Sep 03, 2007 at 09:56:10AM +0200, Xavier Bestel wrote:
> Hi,

Hi Xavier

> I have a server running with RAID5 disks, under Debian stable, kernel
> 2.6.18-5-686. Yesterday the RAID resync'd for no apparent reason,
> and mdadm did not even send a mail to warn about it:
>
> Sep 2 01:06:01 awak kernel: md: syncing RAID array md0

[snip]

> I'm still gathering information (no idea what his disks are, etc.), but
> does anyone have the same problem? Does anyone know where it could come
> from (Debian trouble, md bug, drive firmware problem, rootkit, ...) and
> how I can pinpoint it?

My Debian installation has a system cron job that performs a resync
on the first Sunday of every month at 1:06 AM:

[root@oscar] cat /etc/cron.d/mdadm
...
6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet

I did not read the manpage, but my guess is that 'quiet' will suppress
the mail notification.

Regards,
Patrick

2007-09-03 08:13:57

by Xavier Bestel

Subject: [OT] Re: very very strange simultaneous RAID resync on sep 2, 01:06 CEST (+2)

On Mon, 2007-09-03 at 10:06 +0200, Patrick Mau wrote:
> My Debian installation has a system cron job that performs a resync
> on the first Sunday of every month at 1:06 AM:
>
> [root@oscar] cat /etc/cron.d/mdadm
> ...
> 6 1 * * 0 root [ -x /usr/share/mdadm/checkarray ] && [ $(date +\%d) -le 7 ] && /usr/share/mdadm/checkarray --cron --all --quiet
>
> I did not read the manpage, but my guess is that 'quiet' will suppress
> the mail notification.

Yes, that was it; checkarray leaves traces in the syslog.
Now I'm really ashamed that I jumped on my mailer before using what's
left of my brain cells. Could I take it back, please?

Thanks,

Xav


2007-09-03 09:39:58

by Justin Piszcz

Subject: Re: very very strange simultaneous RAID resync on sep 2, 01:06 CEST (+2)



On Mon, 3 Sep 2007, Xavier Bestel wrote:

> Hi,
>
> I have a server running with RAID5 disks, under Debian stable, kernel
> 2.6.18-5-686. Yesterday the RAID resync'd for no apparent reason,
> and mdadm did not even send a mail to warn about it:

This is normal. You are probably running Debian(?) or a Debian-derived
distribution, where, I believe, the checkarray script runs once a month
by default.

Justin.
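
For anyone trying to tell a scheduled check from a real rebuild, the md driver exposes the current operation through sysfs. The sketch below is illustrative only: it assumes the running kernel provides `/sys/block/<array>/md/sync_action`, and the helper name `md_sync_state` is hypothetical. As far as I know, a scheduled check reports `check`, while a rebuild reports `resync` or `recover`, and an idle array reports `idle`.

```shell
#!/bin/sh
# Hypothetical helper: print the md sync state stored in the given file,
# or "unknown" when the file cannot be read.
md_sync_state() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unknown"
    fi
}

# Assumed sysfs layout: one sync_action file per md array.
for f in /sys/block/md*/md/sync_action; do
    [ -e "$f" ] || continue   # glob did not match: no md arrays here
    echo "$f: $(md_sync_state "$f")"
done
```

On a machine without md arrays the loop prints nothing, so the script is safe to run anywhere.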