2007-09-13 12:27:20

by Jens Axboe

[permalink] [raw]
Subject: [PATCH] Fix race with shared tag queue maps

Hi,

There's a race condition in blk_queue_end_tag() for shared tag maps,
users include stex (promise supertrak thingy) and qla2xxx. The former at
least has reported bugs in this area, not sure why we haven't seen any
for the latter. It could be because the window is narrow and that other
conditions in the qla2xxx code hide this. It's a real bug, though, as
the stex smp users can attest.

We need to ensure two things - the tag bit clearing needs to happen
AFTER we cleared the tag pointer, as the tag bit clearing/setting is
what protects this map. Secondly, we need to ensure that the visibility
of the tag pointer and tag bit clear are ordered properly.

This is for 2.6.23-rc6-current, but it needs to go into -stable as well
once Linus has committed it. I'm cc'ing users that reported stex
problems, hopefully they can test this patch and report back.

Also see http://bugzilla.kernel.org/show_bug.cgi?id=7842

Signed-off-by: Jens Axboe <[email protected]>

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index a15845c..3d9e6a1 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1075,12 +1075,6 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
*/
return;

- if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
- printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
- __FUNCTION__, tag);
- return;
- }
-
list_del_init(&rq->queuelist);
rq->cmd_flags &= ~REQ_QUEUED;
rq->tag = -1;
@@ -1090,6 +1084,23 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
__FUNCTION__, tag);

bqt->tag_index[tag] = NULL;
+
+ /*
+ * Ensure ordering with tag section
+ */
+ smp_mb__before_clear_bit();
+
+ if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) {
+ printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
+ __FUNCTION__, tag);
+ return;
+ }
+
+ /*
+ * Ensure ordering between ->tag_index[tag] clear and tag clear
+ */
+ smp_mb__after_clear_bit();
+
bqt->busy--;
}


--
Jens Axboe


2007-09-13 15:19:20

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix race with shared tag queue maps



On Thu, 13 Sep 2007, Jens Axboe wrote:
> +
> + /*
> + * Ensure ordering with tag section
> + */
> + smp_mb__before_clear_bit();
> +
> + if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) {

You don't need the "smp_mb__before_clear_bit()" there.

The regular "clear_bit()" needs it, but the "test_and_xxx()" versions are
architecturally defined to be memory barriers, exactly because they are
regularly used for locking.

This is even documented - see Documentation/atomic_ops.txt.

Linus

2007-09-13 15:22:49

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Fix race with shared tag queue maps

On Thu, Sep 13 2007, Linus Torvalds wrote:
>
>
> On Thu, 13 Sep 2007, Jens Axboe wrote:
> > +
> > + /*
> > + * Ensure ordering with tag section
> > + */
> > + smp_mb__before_clear_bit();
> > +
> > + if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) {
>
> You don't need the "smp_mb__before_clear_bit()" there.
>
> The regular "clear_bit()" needs it, but the "test_and_xxx()" versions are
> architecturally defined to be memory barriers, exactly because they are
> regularly used for locking.
>
> This is even documented - see Documentation/atomic_ops.txt.

My bad, I think I added the smp_mb__before_clear_bit() when it was
__test_and_set_bit() like in the first hunk.

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index a15845c..dfc9f22 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -1075,12 +1075,6 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
*/
return;

- if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
- printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
- __FUNCTION__, tag);
- return;
- }
-
list_del_init(&rq->queuelist);
rq->cmd_flags &= ~REQ_QUEUED;
rq->tag = -1;
@@ -1090,6 +1084,18 @@ void blk_queue_end_tag(struct request_queue *q, struct request *rq)
__FUNCTION__, tag);

bqt->tag_index[tag] = NULL;
+
+ if (unlikely(!test_and_clear_bit(tag, bqt->tag_map))) {
+ printk(KERN_ERR "%s: attempt to clear non-busy tag (%d)\n",
+ __FUNCTION__, tag);
+ return;
+ }
+
+ /*
+ * Ensure ordering between ->tag_index[tag] clear and tag clear
+ */
+ smp_mb__after_clear_bit();
+
bqt->busy--;
}


--
Jens Axboe

2007-09-13 15:43:34

by Linus Torvalds

[permalink] [raw]
Subject: Re: [PATCH] Fix race with shared tag queue maps



On Thu, 13 Sep 2007, Jens Axboe wrote:
>
> My bad, I think I added the smp_mb__before_clear_bit() when it was
> __test_and_set_bit() like in the first hunk.

Ahh, that wouldn't work at all. The "__test_and_set_bit()" thing isn't
atomic at all, and no amount of memory barriers around it would help
(you'd need to use real locking, but at that point the memory barriers are
pointless anyway).

> + /*
> + * Ensure ordering between ->tag_index[tag] clear and tag clear
> + */
> + smp_mb__after_clear_bit();

You still left this one. But never mind - I already edited your original
patch and it's in my tree with both of those things removed.

Linus

2007-09-14 06:19:20

by Arkadiusz Miśkiewicz

[permalink] [raw]
Subject: Re: [PATCH] Fix race with shared tag queue maps

On Thursday 13 of September 2007, Jens Axboe wrote:
> Hi,
>
> There's a race condition in blk_queue_end_tag() for shared tag maps,
> users include stex (promise supertrak thingy) and qla2xxx.
[...]
> I'm cc'ing users that reported stex
> problems, hopefully they can test this patch and report back.

I'm running 2.6.22 with f3da54ba140c6427fa4a32913e1bf406f41b5dda patch from
Linus git tree (instead of Ed Lin workaround patch) and doing heavy rsync for
last 8 hours. No oops - seems to be working fine (+stex driver).

Thanks!

--
Arkadiusz Mi?kiewicz PLD/Linux Team
arekm / maven.pl http://ftp.pld-linux.org/

2007-09-14 07:48:01

by Jens Axboe

[permalink] [raw]
Subject: Re: [PATCH] Fix race with shared tag queue maps

On Thu, Sep 13 2007, Linus Torvalds wrote:
>
>
> On Thu, 13 Sep 2007, Jens Axboe wrote:
> >
> > My bad, I think I added the smp_mb__before_clear_bit() when it was
> > __test_and_set_bit() like in the first hunk.
>
> Ahh, that wouldn't work at all. The "__test_and_set_bit()" thing isn't
> atomic at all, and no amount of memory barriers around it would help
> (you'd need to use real locking, but at that point the memory barriers are
> pointless anyway).

Hence the change, it looks like an oversight from when we didn't allow
sharing of tag maps (then the queue lock provided adequate protection).

> > + /*
> > + * Ensure ordering between ->tag_index[tag] clear and tag clear
> > + */
> > + smp_mb__after_clear_bit();
>
> You still left this one. But never mind - I already edited your original
> patch and it's in my tree with both of those things removed.

I took at look at the committed patch and it looks fine, thanks Linus.

--
Jens Axboe