2013-03-12 17:49:12

by Markus

Subject: hybrid raid1 with trim support

Hello!

I created a hybrid RAID1 from one SSD and one HDD, using write-mostly and write-behind, and put ext4 with discard enabled on it.
This setup worked well for the last few months (the last kernel was 3.7.6). But when I booted 3.8.2, the HDD was dropped from the array with:
> md/raid1:md2: Disk failure on sdb1, disabling device.
> md/raid1:md2: Operation continuing on 1 devices.

After re-adding the drive, a resync starts, but the HDD is dropped again shortly afterwards. When I remounted the filesystem without the discard option, the resync and normal usage worked as before.
After remounting it with discard enabled, the HDD was dropped again. So I think discard is the culprit, as the HDD obviously does not support TRIM...
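
One way to confirm which array member lacks discard support is to read the block queue limits from sysfs. The following is only a minimal sketch: it assumes the kernel exposes `/sys/block/<disk>/queue/discard_max_bytes` and that a value of 0 there means the device does not support discard. The function names are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 1 if the given file holds a non-zero number, 0 if it holds
 * zero, and -1 if the file cannot be opened or parsed.
 */
int discard_limit_nonzero(const char *path)
{
    unsigned long long max_bytes;
    FILE *f;
    int ok;

    assert(path != NULL);
    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%llu", &max_bytes);
    fclose(f);
    if (ok != 1)
        return -1;
    return max_bytes > 0 ? 1 : 0;
}

/*
 * Builds the sysfs path for a whole-disk name such as "sdb" (not a
 * partition like "sdb1") and checks the discard limit found there.
 * The path layout is an assumption about the running kernel.
 */
int disk_supports_discard(const char *disk)
{
    char path[256];
    int n;

    assert(disk != NULL);
    n = snprintf(path, sizeof(path),
                 "/sys/block/%s/queue/discard_max_bytes", disk);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;
    return discard_limit_nonzero(path);
}
```

Calling `disk_supports_discard("sdb")` for the HDD and the equivalent for the SSD should show whether only one member advertises discard, which is the situation described above.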

As it worked fine before, I think this is a regression. Or is this an intended change?


Thanks,
Markus


2013-03-24 09:22:30

by Markus

Subject: Re: hybrid raid1 with trim support

Hi!

Still the same with 3.8.4 ... anybody?

Best regards,
Markus


2013-04-27 16:29:55

by Markus

Subject: Re: hybrid raid1 with trim support [REGRESSION]

Hi!

I have now had time to bisect, starting with 3.7 as good and 3.8 as bad.
0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9 is the bad commit [1]:
block: add plug for blkdev_issue_discard

3.8.10 is still bad, but the same kernel with that commit reverted works fine.


I found another report. [2]


Thanks,
Markus

[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9
[2] http://www.spinics.net/lists/raid/msg42758.html


2013-04-28 00:54:55

by Shaohua Li

Subject: Re: hybrid raid1 with trim support [REGRESSION]

On Sat, Apr 27, 2013 at 06:29:49PM +0200, Markus wrote:
> Hi!
>
> Now I had the time to bisect, started with 3.7 as good and 3.8 as bad.
> 0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9 is the bad commit. [1]
> block: add plug for blkdev_issue_discard
>
> While 3.8.10 was still bad, the same kernel with the reverted patch applied is fine.
Thanks for the report. Does the patch below work for you?

Thanks,
Shaohua


---
 drivers/md/raid1.c |    4 ++++
 1 file changed, 4 insertions(+)

Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c	2013-03-07 14:14:05.950824173 +0800
+++ linux/drivers/md/raid1.c	2013-04-28 08:52:06.761964780 +0800
@@ -981,6 +981,10 @@ static void raid1_unplug(struct blk_plug
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
 		bio->bi_next = NULL;
+		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio, 0);
 		generic_make_request(bio);
 		bio = next;
 	}

2013-04-28 01:00:24

by Shaohua Li

Subject: Re: hybrid raid1 with trim support [REGRESSION]

On Sun, Apr 28, 2013 at 08:54:46AM +0800, Shaohua Li wrote:
> Thanks for the reporting. Does below patch work for you?
Oops, there is a typo in that patch; it should be this one:

---
 drivers/md/raid1.c  |    7 ++++++-
 drivers/md/raid10.c |    7 ++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c	2013-03-07 14:14:05.950824173 +0800
+++ linux/drivers/md/raid1.c	2013-04-28 08:57:17.874058434 +0800
@@ -981,7 +981,12 @@ static void raid1_unplug(struct blk_plug
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
 		bio->bi_next = NULL;
-		generic_make_request(bio);
+		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio, 0);
+		else
+			generic_make_request(bio);
 		bio = next;
 	}
 	kfree(plug);
Index: linux/drivers/md/raid10.c
===================================================================
--- linux.orig/drivers/md/raid10.c	2013-03-07 14:14:05.950824173 +0800
+++ linux/drivers/md/raid10.c	2013-04-28 08:57:44.765719067 +0800
@@ -1133,7 +1133,12 @@ static void raid10_unplug(struct blk_plu
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
 		bio->bi_next = NULL;
-		generic_make_request(bio);
+		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio, 0);
+		else
+			generic_make_request(bio);
 		bio = next;
 	}
 	kfree(plug);

2013-04-28 09:40:47

by Markus

Subject: Re: hybrid raid1 with trim support [REGRESSION]

Hi!

Thanks for your work. The patch seems to work for me on a vanilla 3.8.10; at
least the HDDs are no longer dropped from the array.
Does the code now ignore some requests? Why did the disks fall out of the
array? Are the discards still passed to the SSD?


Thanks,
Markus



2013-04-28 10:10:42

by Shaohua Li

Subject: Re: hybrid raid1 with trim support [REGRESSION]

On Sun, Apr 28, 2013 at 11:40:42AM +0200, Markus wrote:
> Hi!
>
> Thanks for your work. The patch seems to work for me on a vanilla 3.8.10, at
> least the hdds are no longer dropped from the raid.
> The code now ignores some request? What was the reason the disks fell off the
> raid? The discards are still passed to the ssd?
Thanks for testing; I'll send the patch to Neil soon.

Yes, discards are still passed to the SSD; we just ignore the request for the hard disk.
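
The behaviour of the patched unplug loop can be modelled in user space. This is only a rough sketch with made-up types (`model_dev`, `model_bio`) standing in for the kernel structures; the counters are purely illustrative. A discard aimed at a member that does not advertise discard support is counted as completed without reaching the device, and everything else is submitted as usual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures; not the real definitions. */
struct model_dev {
    bool supports_discard;  /* models blk_queue_discard() on the member's queue */
    int submitted;          /* requests handed down to the device */
    int ignored;            /* discards completed without touching the device */
};

struct model_bio {
    bool is_discard;        /* models bio->bi_rw & REQ_DISCARD */
    struct model_dev *dev;  /* models bio->bi_bdev */
    struct model_bio *next; /* models bio->bi_next */
};

/* Walk the pending list in the same order as the patched unplug loop. */
void model_unplug(struct model_bio *bio)
{
    while (bio) {
        struct model_bio *next = bio->next;

        bio->next = NULL;
        assert(bio->dev != NULL);
        if (bio->is_discard && !bio->dev->supports_discard)
            bio->dev->ignored++;    /* models bio_endio(bio, 0): report success, skip device */
        else
            bio->dev->submitted++;  /* models generic_make_request(bio) */
        bio = next;
    }
}
```

With one discard queued for each member of a hybrid array, the SSD's `submitted` counter goes up while the HDD's `ignored` counter goes up; ordinary writes to the HDD are still submitted. The `else` branch matters: without it, as in the first version of the patch, a bio that had already been completed would also be submitted.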

Thanks,
Shaohua
