2006-08-12 06:54:09

by Chuck Ebbert

[permalink] [raw]
Subject: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

Doing this on a raid1 array:
echo "check" >/sys/block/md0/md/sync_action

On 2.6.16.27:
Activity lights on both mirrors show activity for a while,
then the array status prints on the console.

On 2.6.18-rc4 + the below patch:
Drive activity light blinks once on one drive, then the
array status prints (obviously no checking takes place.)


Applied hotfix on 2.6.18-rc4:

--- .prev/drivers/md/md.c 2006-08-08 09:00:44.000000000 +1000
+++ ./drivers/md/md.c 2006-08-08 09:04:04.000000000 +1000
@@ -1597,6 +1597,19 @@ void md_update_sb(mddev_t * mddev)

repeat:
spin_lock_irq(&mddev->write_lock);
+
+ if (mddev->degraded && mddev->sb_dirty == 3)
+ /* If the array is degraded, then skipping spares is both
+ * dangerous and fairly pointless.
+ * Dangerous because a device that was removed from the array
+ * might have a event_count that still looks up-to-date,
+ * so it can be re-added without a resync.
+ * Pointless because if there are any spares to skip,
+ * then a recovery will happen and soon that array won't
+ * be degraded any more and the spare can go back to sleep then.
+ */
+ mddev->sb_dirty = 1;
+
sync_req = mddev->in_sync;
mddev->utime = get_seconds();
if (mddev->sb_dirty == 3)
--
Chuck


2006-08-12 09:13:13

by Justin Piszcz

[permalink] [raw]
Subject: Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4



On Sat, 12 Aug 2006, Chuck Ebbert wrote:

> Doing this on a raid1 array:
> echo "check" >/sys/block/md0/md/sync_action
>
> On 2.6.16.27:
> Activity lights on both mirrors show activity for a while,
> then the array status prints on the console.
>
> On 2.6.18-rc4 + the below patch:
> Drive activity light blinks once on one drive, then the
> array status prints (obviously no checking takes place.)
>
>
> Applied hotfix on 2.6.18-rc4:
>
> --- .prev/drivers/md/md.c 2006-08-08 09:00:44.000000000 +1000
> +++ ./drivers/md/md.c 2006-08-08 09:04:04.000000000 +1000
> @@ -1597,6 +1597,19 @@ void md_update_sb(mddev_t * mddev)
>
> repeat:
> spin_lock_irq(&mddev->write_lock);
> +
> + if (mddev->degraded && mddev->sb_dirty == 3)
> + /* If the array is degraded, then skipping spares is both
> + * dangerous and fairly pointless.
> + * Dangerous because a device that was removed from the array
> + * might have a event_count that still looks up-to-date,
> + * so it can be re-added without a resync.
> + * Pointless because if there are any spares to skip,
> + * then a recovery will happen and soon that array won't
> + * be degraded any more and the spare can go back to sleep then.
> + */
> + mddev->sb_dirty = 1;
> +
> sync_req = mddev->in_sync;
> mddev->utime = get_seconds();
> if (mddev->sb_dirty == 3)
> --
> Chuck
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to [email protected]
> More majordomo info at http://vger.kernel.org/majordomo-info.html
>

Is there a doc for all of the options you can echo into the sync_action?
I'm assuming mdadm does these as well and echo is just another way to run
work with the array?

2006-08-12 11:59:46

by Michael Tokarev

[permalink] [raw]
Subject: Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

Justin Piszcz wrote:
> Is there a doc for all of the options you can echo into the sync_action?
> I'm assuming mdadm does these as well and echo is just another way to
> run work with the array?

How about the obvious, Documentation/md.txt ?

And no, mdadm does not perform or trigger data integrity checking.

/mjt

2006-08-14 06:14:12

by NeilBrown

[permalink] [raw]
Subject: Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

On Saturday August 12, [email protected] wrote:
> Doing this on a raid1 array:
> echo "check" >/sys/block/md0/md/sync_action
>
> On 2.6.16.27:
> Activity lights on both mirrors show activity for a while,
> then the array status prints on the console.
>
> On 2.6.18-rc4 + the below patch:
> Drive activity light blinks once on one drive, then the
> array status prints (obviously no checking takes place.)
>

Thanks for the report.
Easily duplicated, easily fixed.
I'll make sure this patch gets into 2.6.18.

Thanks again,
NeilBrown

Signed-off-by: Neil Brown <[email protected]>

diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
--- .prev/drivers/md/raid1.c 2006-07-31 17:24:36.000000000 +1000
+++ ./drivers/md/raid1.c 2006-08-14 15:52:48.000000000 +1000
@@ -1644,15 +1644,16 @@ static sector_t sync_request(mddev_t *md
return 0;
}

- /* before building a request, check if we can skip these blocks..
- * This call the bitmap_start_sync doesn't actually record anything
- */
if (mddev->bitmap == NULL &&
mddev->recovery_cp == MaxSector &&
+ !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
conf->fullsync == 0) {
*skipped = 1;
return max_sector - sector_nr;
}
+ /* before building a request, check if we can skip these blocks..
+ * This call the bitmap_start_sync doesn't actually record anything
+ */
if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
!conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
/* We can skip this block, and probably several more */

2006-08-17 20:06:30

by Chuck Ebbert

[permalink] [raw]
Subject: Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

In-Reply-To: <[email protected]>

On Mon, 14 Aug 2006 16:14:06 +1000, Neil Brown wrote:

> > On 2.6.18-rc4 + the below patch:
> > Drive activity light blinks once on one drive, then the
> > array status prints (obviously no checking takes place.)
> >
>
> Thanks for the report.
> Easily duplicated, easily fixed.
> I'll make sure this patch gets into 2.6.18.
>
> Thanks again,
> NeilBrown
>

I just tried the patch and now it seems to be syncing the drives instead
of only checking them? (At the very least the message is misleading.)

# echo "check" >/sys/block/md0/md/sync_action
# dmesg | tail -9
md: syncing RAID array md0
md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
md: using 128k window, over a total of 104256 blocks.
md: md0: sync done.
RAID1 conf printout:
--- wd:2 rd:2
disk 0, wo:0, o:1, dev:hda9
disk 1, wo:0, o:1, dev:sda5


> Signed-off-by: Neil Brown <[email protected]>
>
> diff .prev/drivers/md/raid1.c ./drivers/md/raid1.c
> --- .prev/drivers/md/raid1.c 2006-07-31 17:24:36.000000000 +1000
> +++ ./drivers/md/raid1.c 2006-08-14 15:52:48.000000000 +1000
> @@ -1644,15 +1644,16 @@ static sector_t sync_request(mddev_t *md
> return 0;
> }
>
> - /* before building a request, check if we can skip these blocks..
> - * This call the bitmap_start_sync doesn't actually record anything
> - */
> if (mddev->bitmap == NULL &&
> mddev->recovery_cp == MaxSector &&
> + !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery) &&
> conf->fullsync == 0) {
> *skipped = 1;
> return max_sector - sector_nr;
> }
> + /* before building a request, check if we can skip these blocks..
> + * This call the bitmap_start_sync doesn't actually record anything
> + */
> if (!bitmap_start_sync(mddev->bitmap, sector_nr, &sync_blocks, 1) &&
> !conf->fullsync && !test_bit(MD_RECOVERY_REQUESTED, &mddev->recovery)) {
> /* We can skip this block, and probably several more */

--
Chuck

2006-08-28 03:47:22

by NeilBrown

[permalink] [raw]
Subject: Re: [bug?] raid1 integrity checking is broken on 2.6.18-rc4

On Thursday August 17, [email protected] wrote:
>
> I just tried the patch and now it seems to be syncing the drives instead
> of only checking them? (At the very least the message is misleading.)
>

Yes, the message is misleading. I should fix that.

NeilBrown


> # echo "check" >/sys/block/md0/md/sync_action
> # dmesg | tail -9
> md: syncing RAID array md0
> md: minimum _guaranteed_ reconstruction speed: 1000 KB/sec/disc.
> md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for reconstruction.
> md: using 128k window, over a total of 104256 blocks.
> md: md0: sync done.
> RAID1 conf printout:
> --- wd:2 rd:2
> disk 0, wo:0, o:1, dev:hda9
> disk 1, wo:0, o:1, dev:sda5
>