2009-09-09 16:47:02

by Ed L. Cashin

[permalink] [raw]
Subject: [PATCH] aoe: end barrier bios with EOPNOTSUPP

BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=13942

Bruno Premont noticed that aoe throws a BUG during umount of an XFS in
2.6.31:

[ 5259.349897] aoe: bi_io_vec is NULL
[ 5259.349940] ------------[ cut here ]------------
[ 5259.349958] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177!
[ 5259.349990] invalid opcode: 0000 [#1]

The bio in question is a barrier. Jens Axboe suggested that such bios
need to be recognized and ended with -EOPNOTSUPP by any driver that
provides its own ->make_request_fn handler and does not handle
barriers.

In testing the changes below eliminate the BUG.

Signed-off-by: Ed L. Cashin <[email protected]>
---
After this patch has been reviewed I plan to add this patch to
the aoe quilt tree for linux-next.

drivers/block/aoe/aoeblk.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 2307a27..d6806fb 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -172,6 +172,9 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
BUG();
bio_endio(bio, -ENXIO);
return 0;
+ } else if (bio_barrier(bio)) {
+ bio_endio(bio, -EOPNOTSUPP);
+ return 0;
} else if (bio->bi_io_vec == NULL) {
printk(KERN_ERR "aoe: bi_io_vec is NULL\n");
BUG();
--
1.5.6.5


--
Ed Cashin
http://noserose.net/e/
http://www.coraid.com/


2009-09-09 16:50:22

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Wed Sep 9 12:45:17 EDT 2009, [email protected] wrote:
...
> + } else if (bio_barrier(bio)) {
> + bio_endio(bio, -EOPNOTSUPP);
> + return 0;
> } else if (bio->bi_io_vec == NULL) {
> printk(KERN_ERR "aoe: bi_io_vec is NULL\n");
> BUG();

Bah. I should have run checkpatch.pl. I will re-do this patch using
real tabs.

--
Ed

2009-09-09 16:51:08

by Daniel Walker

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Wed, 2009-09-09 at 12:45 -0400, Ed Cashin wrote:
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -172,6 +172,9 @@ aoeblk_make_request(struct request_queue *q,
> struct bio *bio)
> BUG();
> bio_endio(bio, -ENXIO);
> return 0;
> + } else if (bio_barrier(bio)) {
> + bio_endio(bio, -EOPNOTSUPP);
> + return 0;
> } else if (bio->bi_io_vec == NULL) {
> printk(KERN_ERR "aoe: bi_io_vec is NULL\n");
> BUG();

Could you run this through script/checkpatch.pl and fix any errors you
find.. It looks like your not indenting properly ..

Daniel

2009-09-09 17:06:59

by Alan

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

> The bio in question is a barrier. Jens Axboe suggested that such bios
> need to be recognized and ended with -EOPNOTSUPP by any driver that
> provides its own ->make_request_fn handler and does not handle
> barriers.
>
> In testing the changes below eliminate the BUG.

Presumably AoE should actually issue an ATA cache flush for this case ?

2009-09-09 18:02:11

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Wed Sep 9 13:06:42 EDT 2009, [email protected] wrote:
> > The bio in question is a barrier. Jens Axboe suggested that such bios
> > need to be recognized and ended with -EOPNOTSUPP by any driver that
> > provides its own ->make_request_fn handler and does not handle
> > barriers.
> >
> > In testing the changes below eliminate the BUG.
>
> Presumably AoE should actually issue an ATA cache flush for this case ?

Yes. The aoe driver juggles a set of in-process I/O operations that
have been sent to the AoE target but have not yet received responses.

To implement the barrier, it would stop generating new AoE write
commands, wait for AoE write responses for all the outstanding write
commands, issue the ATA cache flush command, wait for the response to
the flush, and then resume normal activity.

--
Ed

2009-09-09 19:04:23

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

aoe: end barrier bios with EOPNOTSUPP

BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=13942

Bruno Premont noticed that aoe throws a BUG during umount of an XFS in
2.6.31:

[ 5259.349897] aoe: bi_io_vec is NULL
[ 5259.349940] ------------[ cut here ]------------
[ 5259.349958] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177!
[ 5259.349990] invalid opcode: 0000 [#1]

The bio in question is a barrier. Jens Axboe suggested that such bios
need to be recognized and ended with -EOPNOTSUPP by any driver that
provides its own ->make_request_fn handler and does not handle
barriers.

In testing the changes below eliminate the BUG.

Signed-off-by: Ed L. Cashin <[email protected]>
---
After this patch has been reviewed I plan to add this patch to
the aoe quilt tree for linux-next.

drivers/block/aoe/aoeblk.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 2307a27..22efb33 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -172,6 +172,9 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
BUG();
bio_endio(bio, -ENXIO);
return 0;
+ } else if (bio_barrier(bio)) {
+ bio_endio(bio, -EOPNOTSUPP);
+ return 0;
} else if (bio->bi_io_vec == NULL) {
printk(KERN_ERR "aoe: bi_io_vec is NULL\n");
BUG();
--
1.5.6.5


--
Ed Cashin <[email protected]>
http://noserose.net/e/
http://www.coraid.com/

2009-09-09 20:58:34

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Wed, Sep 09 2009, Ed Cashin wrote:
> On Wed Sep 9 13:06:42 EDT 2009, [email protected] wrote:
> > > The bio in question is a barrier. Jens Axboe suggested that such bios
> > > need to be recognized and ended with -EOPNOTSUPP by any driver that
> > > provides its own ->make_request_fn handler and does not handle
> > > barriers.
> > >
> > > In testing the changes below eliminate the BUG.
> >
> > Presumably AoE should actually issue an ATA cache flush for this case ?
>
> Yes. The aoe driver juggles a set of in-process I/O operations that
> have been sent to the AoE target but have not yet received responses.
>
> To implement the barrier, it would stop generating new AoE write
> commands, wait for AoE write responses for all the outstanding write
> commands, issue the ATA cache flush command, wait for the response to
> the flush, and then resume normal activity.

That's how barriers work with sata to begin with, so I'd very much
recommend that you do the same in aoe instead of just not supporting it.

--
Jens Axboe

2009-09-09 21:24:53

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Wed Sep 9 16:59:30 EDT 2009, [email protected] wrote:
> On Wed, Sep 09 2009, Ed Cashin wrote:
...
> > To implement the barrier, [aoe] would stop generating new AoE write
> > commands, wait for AoE write responses for all the outstanding write
> > commands, issue the ATA cache flush command, wait for the response to
> > the flush, and then resume normal activity.
>
> That's how barriers work with sata to begin with, so I'd very much
> recommend that you do the same in aoe instead of just not supporting it.

This patch with EOPNOTSUPP just fixes the regression in 2.6.31-rc9.

A patch adding support for barriers would be nice. Since I have
limited time I tried to motivate that work by using the torture tests
that Chris Mason (I believe it was) posted a while ago, but I could
not get any problems to manifest using that torture test.

If I had a reproducable problem case that the barrier implementation
could fix, it would be very helpful.

--
Ed

2009-09-10 07:48:19

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Wed, Sep 09 2009, Ed Cashin wrote:
> On Wed Sep 9 16:59:30 EDT 2009, [email protected] wrote:
> > On Wed, Sep 09 2009, Ed Cashin wrote:
> ...
> > > To implement the barrier, [aoe] would stop generating new AoE write
> > > commands, wait for AoE write responses for all the outstanding write
> > > commands, issue the ATA cache flush command, wait for the response to
> > > the flush, and then resume normal activity.
> >
> > That's how barriers work with sata to begin with, so I'd very much
> > recommend that you do the same in aoe instead of just not supporting it.
>
> This patch with EOPNOTSUPP just fixes the regression in 2.6.31-rc9.
>
> A patch adding support for barriers would be nice. Since I have
> limited time I tried to motivate that work by using the torture tests
> that Chris Mason (I believe it was) posted a while ago, but I could
> not get any problems to manifest using that torture test.
>
> If I had a reproducable problem case that the barrier implementation
> could fix, it would be very helpful.

Depending on timing and the other end, it may not be easy to reproduce.
But the problem is indeed real, so I think you should just add the
proper barrier implementation instead of spending too much time trying
to create a reproducable problem. By the time you get there, you could
have fixed it many times over :-)

--
Jens Axboe

2009-09-10 15:17:56

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Thu Sep 10 03:48:22 EDT 2009, [email protected] wrote:
> On Wed, Sep 09 2009, Ed Cashin wrote:
...
> > If I had a reproducable problem case that the barrier implementation
> > could fix, it would be very helpful.
>
> Depending on timing and the other end, it may not be easy to reproduce.
> But the problem is indeed real, so I think you should just add the
> proper barrier implementation instead of spending too much time trying
> to create a reproducable problem. By the time you get there, you could
> have fixed it many times over :-)

OK. I will start work on it. But if anybody can give me a torture
test that works on aoe devices, I'll be grateful.

I think the EOPNOTSUPP patch is good for the for-next aoe quilt tree,
while the new barrier support will be good for 2.6.32.

--
Ed

2009-09-10 19:50:20

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Thu, Sep 10 2009, Ed Cashin wrote:
> On Thu Sep 10 03:48:22 EDT 2009, [email protected] wrote:
> > On Wed, Sep 09 2009, Ed Cashin wrote:
> ...
> > > If I had a reproducable problem case that the barrier implementation
> > > could fix, it would be very helpful.
> >
> > Depending on timing and the other end, it may not be easy to reproduce.
> > But the problem is indeed real, so I think you should just add the
> > proper barrier implementation instead of spending too much time trying
> > to create a reproducable problem. By the time you get there, you could
> > have fixed it many times over :-)
>
> OK. I will start work on it. But if anybody can give me a torture
> test that works on aoe devices, I'll be grateful.
>
> I think the EOPNOTSUPP patch is good for the for-next aoe quilt tree,
> while the new barrier support will be good for 2.6.32.

.31 is already out, so there's plenty of time to fix it for .32 for
real. I don't mind putting in the temporary fix in the mean time,
though.

--
Jens Axboe

2009-09-10 20:04:51

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Thu Sep 10 15:50:23 EDT 2009, [email protected] wrote:
> On Thu, Sep 10 2009, Ed Cashin wrote:
...
> > I think the EOPNOTSUPP patch is good for the for-next aoe quilt tree,
> > while the new barrier support will be good for 2.6.32.
>
> .31 is already out, so there's plenty of time to fix it for .32 for
> real. I don't mind putting in the temporary fix in the mean time,
> though.

I missed that. It was still rc9 when I pulled earlier today, so I put
the temporary fix into the aoe quilt tree for linux-next.

Now that 2.6.31 is out, I suppose this patch should be taken out of
the aoe quilt tree for linux-next and submitted for inclusion in
2.6.31.1.

So, yes, if you don't mind pushing the patch to the appropriate place,
that would be great.

Now I have to figure out why the other patch in aoe's quilt tree for
linux-next never made it to 2.6.31.

--
Ed

2009-09-10 20:07:34

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Thu, Sep 10 2009, Ed Cashin wrote:
> On Thu Sep 10 15:50:23 EDT 2009, [email protected] wrote:
> > On Thu, Sep 10 2009, Ed Cashin wrote:
> ...
> > > I think the EOPNOTSUPP patch is good for the for-next aoe quilt tree,
> > > while the new barrier support will be good for 2.6.32.
> >
> > .31 is already out, so there's plenty of time to fix it for .32 for
> > real. I don't mind putting in the temporary fix in the mean time,
> > though.
>
> I missed that. It was still rc9 when I pulled earlier today, so I put
> the temporary fix into the aoe quilt tree for linux-next.
>
> Now that 2.6.31 is out, I suppose this patch should be taken out of
> the aoe quilt tree for linux-next and submitted for inclusion in
> 2.6.31.1.
>
> So, yes, if you don't mind pushing the patch to the appropriate place,
> that would be great.

Did you repost a non-whitespace damaged version?

> Now I have to figure out why the other patch in aoe's quilt tree for
> linux-next never made it to 2.6.31.

Things don't go from -next to mainstream automatically, -next is just a
testing base. You have to submit it explicitly, eg send it to me.

--
Jens Axboe

2009-09-10 20:22:11

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Thu Sep 10 16:07:37 EDT 2009, [email protected] wrote:
> On Thu, Sep 10 2009, Ed Cashin wrote:
...
> > So, yes, if you don't mind pushing the patch to the appropriate place,
> > that would be great.
>
> Did you repost a non-whitespace damaged version?

Yes. I'll post it again as a response to this email.

> > Now I have to figure out why the other patch in aoe's quilt tree for
> > linux-next never made it to 2.6.31.
>
> Things don't go from -next to mainstream automatically, -next is just a
> testing base. You have to submit it explicitly, eg send it to me.

Ah. Thanks.

--
Ed

2009-09-10 20:27:47

by Ed L. Cashin

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

aoe: end barrier bios with EOPNOTSUPP

BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=13942

Bruno Premont noticed that aoe throws a BUG during umount of an XFS in
2.6.31:

[ 5259.349897] aoe: bi_io_vec is NULL
[ 5259.349940] ------------[ cut here ]------------
[ 5259.349958] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177!
[ 5259.349990] invalid opcode: 0000 [#1]

The bio in question is a barrier. Jens Axboe suggested that such bios
need to be recognized and ended with -EOPNOTSUPP by any driver that
provides its own ->make_request_fn handler and does not handle
barriers.

In testing the changes below eliminate the BUG.

Signed-off-by: Ed L. Cashin <[email protected]>
---
drivers/block/aoe/aoeblk.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
index 2307a27..22efb33 100644
--- a/drivers/block/aoe/aoeblk.c
+++ b/drivers/block/aoe/aoeblk.c
@@ -172,6 +172,9 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
BUG();
bio_endio(bio, -ENXIO);
return 0;
+ } else if (bio_barrier(bio)) {
+ bio_endio(bio, -EOPNOTSUPP);
+ return 0;
} else if (bio->bi_io_vec == NULL) {
printk(KERN_ERR "aoe: bi_io_vec is NULL\n");
BUG();
--
1.5.6.5


--
Ed Cashin <[email protected]>
http://noserose.net/e/
http://www.coraid.com/

2009-09-10 20:29:07

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] aoe: end barrier bios with EOPNOTSUPP

On Thu, Sep 10 2009, Ed Cashin wrote:
> aoe: end barrier bios with EOPNOTSUPP
>
> BugLink: http://bugzilla.kernel.org/show_bug.cgi?id=13942
>
> Bruno Premont noticed that aoe throws a BUG during umount of an XFS in
> 2.6.31:
>
> [ 5259.349897] aoe: bi_io_vec is NULL
> [ 5259.349940] ------------[ cut here ]------------
> [ 5259.349958] kernel BUG at /usr/src/linux-2.6/drivers/block/aoe/aoeblk.c:177!
> [ 5259.349990] invalid opcode: 0000 [#1]
>
> The bio in question is a barrier. Jens Axboe suggested that such bios
> need to be recognized and ended with -EOPNOTSUPP by any driver that
> provides its own ->make_request_fn handler and does not handle
> barriers.
>
> In testing the changes below eliminate the BUG.
>
> Signed-off-by: Ed L. Cashin <[email protected]>
> ---
> drivers/block/aoe/aoeblk.c | 3 +++
> 1 files changed, 3 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/block/aoe/aoeblk.c b/drivers/block/aoe/aoeblk.c
> index 2307a27..22efb33 100644
> --- a/drivers/block/aoe/aoeblk.c
> +++ b/drivers/block/aoe/aoeblk.c
> @@ -172,6 +172,9 @@ aoeblk_make_request(struct request_queue *q, struct bio *bio)
> BUG();
> bio_endio(bio, -ENXIO);
> return 0;
> + } else if (bio_barrier(bio)) {
> + bio_endio(bio, -EOPNOTSUPP);
> + return 0;
> } else if (bio->bi_io_vec == NULL) {
> printk(KERN_ERR "aoe: bi_io_vec is NULL\n");
> BUG();
> --
> 1.5.6.5

I have applied this with a note that proper barrier support is coming.

--
Jens Axboe