Hello!
I created a hybrid RAID1 with one SSD and one HDD, using write-mostly and write-behind, and put ext4 with discard enabled on it.
This setup has worked well for the last few months (the last kernel was 3.7.6). But after booting 3.8.2, the HDD was dropped from the array with:
> md/raid1:md2: Disk failure on sdb1, disabling device.
> md/raid1:md2: Operation continuing on 1 devices.
When I re-add the drive, it tries to resync, but the HDD is dropped again shortly afterwards. I then remounted the device without the discard option, and the resync and normal usage work as before.
After remounting it with discard enabled, the HDD is dropped again. So I think discard is the culprit, as the HDD obviously does not support TRIM...
Since it worked fine before, is this a regression? Or is it an intended change?
Thanks,
Markus
Hi!
Still the same with 3.8.4 ... anybody?
Best regards,
Markus
Markus wrote on 12.03.2013:
> [snip]
Hi!
I finally had time to bisect, starting with 3.7 as good and 3.8 as bad.
0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9 is the bad commit. [1]
block: add plug for blkdev_issue_discard
Vanilla 3.8.10 is still bad, but the same kernel with that commit reverted works fine.
I found another report. [2]
Thanks,
Markus
[1] https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9
[2] http://www.spinics.net/lists/raid/msg42758.html
Markus wrote on 24.03.2013:
> [snip]
On Sat, Apr 27, 2013 at 06:29:49PM +0200, Markus wrote:
> Hi!
>
> Now I had the time to bisect, started with 3.7 as good and 3.8 as bad.
> 0cfbcafcae8b7364b5fa96c2b26ccde7a3a296a9 is the bad commit. [1]
> block: add plug for blkdev_issue_discard
>
> While 3.8.10 was still bad, the same kernel with the reverted patch applied is fine.
Thanks for the report. Does the patch below work for you?
Thanks,
Shaohua
---
drivers/md/raid1.c | 4 ++++
1 file changed, 4 insertions(+)
Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c 2013-03-07 14:14:05.950824173 +0800
+++ linux/drivers/md/raid1.c 2013-04-28 08:52:06.761964780 +0800
@@ -981,6 +981,10 @@ static void raid1_unplug(struct blk_plug
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
 		bio->bi_next = NULL;
+		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio, 0);
 		generic_make_request(bio);
 		bio = next;
 	}
On Sun, Apr 28, 2013 at 08:54:46AM +0800, Shaohua Li wrote:
> On Sat, Apr 27, 2013 at 06:29:49PM +0200, Markus wrote:
> > [snip]
> Thanks for the reporting. Does below patch work for you?
Oops, there is a typo there (without the else, the bio is still submitted after bio_endio()); it should be this one:
---
drivers/md/raid1.c | 7 ++++++-
drivers/md/raid10.c | 7 ++++++-
2 files changed, 12 insertions(+), 2 deletions(-)
Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c 2013-03-07 14:14:05.950824173 +0800
+++ linux/drivers/md/raid1.c 2013-04-28 08:57:17.874058434 +0800
@@ -981,7 +981,12 @@ static void raid1_unplug(struct blk_plug
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
 		bio->bi_next = NULL;
-		generic_make_request(bio);
+		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio, 0);
+		else
+			generic_make_request(bio);
 		bio = next;
 	}
 	kfree(plug);
Index: linux/drivers/md/raid10.c
===================================================================
--- linux.orig/drivers/md/raid10.c 2013-03-07 14:14:05.950824173 +0800
+++ linux/drivers/md/raid10.c 2013-04-28 08:57:44.765719067 +0800
@@ -1133,7 +1133,12 @@ static void raid10_unplug(struct blk_plu
 	while (bio) { /* submit pending writes */
 		struct bio *next = bio->bi_next;
 		bio->bi_next = NULL;
-		generic_make_request(bio);
+		if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+			/* Just ignore it */
+			bio_endio(bio, 0);
+		else
+			generic_make_request(bio);
 		bio = next;
 	}
 	kfree(plug);
Hi!
Thanks for your work. The patch seems to work for me on vanilla 3.8.10; at
least the HDDs are no longer dropped from the array.
So the code now ignores some requests? What was the reason the disks fell out
of the array? Are the discards still passed to the SSD?
Thanks,
Markus
Shaohua Li wrote on 28.04.2013:
> [snip]
On Sun, Apr 28, 2013 at 11:40:42AM +0200, Markus wrote:
> Hi!
>
> Thanks for your work. The patch seems to work for me on a vanilla 3.8.10, at
> least the hdds are no longer dropped from the raid.
> The code now ignores some request? What was the reason the disks fell off the
> raid? The discards are still passed to the ssd?
Thanks for testing, I'll send it to Neil soon.
Yes, the discard is still passed to the SSD; we just ignore the request for the hard disk.
Thanks,
Shaohua