2003-08-05 16:53:10

by Lou Langholtz

Subject: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

diff -urN linux-2.6.0-test2-mm4/drivers/block/nbd.c linux-2.6.0-test2-mm4-no_send_race/drivers/block/nbd.c
--- linux-2.6.0-test2-mm4/drivers/block/nbd.c 2003-08-04 22:01:24.000000000 -0600
+++ linux-2.6.0-test2-mm4-no_send_race/drivers/block/nbd.c 2003-08-04 22:01:45.000000000 -0600
@@ -234,15 +234,16 @@
return result;
}

-void nbd_send_req(struct nbd_device *lo, struct request *req)
+static int nbd_send_req(struct nbd_device *lo, struct request *req)
{
- int result, i, flags;
+ int result, i, flags, rw;
struct nbd_request request;
unsigned long size = req->nr_sectors << 9;
struct socket *sock = lo->sock;

+ rw = nbd_cmd(req);
request.magic = htonl(NBD_REQUEST_MAGIC);
- request.type = htonl(nbd_cmd(req));
+ request.type = htonl(rw);
request.from = cpu_to_be64((u64) req->sector << 9);
request.len = htonl(size);
memcpy(request.handle, &req, sizeof(req));
@@ -256,19 +257,18 @@
}

dprintk(DBG_TX, "%s: request %p: sending control (%s@%llu,%luB)\n",
- lo->disk->disk_name, req,
- nbdcmd_to_ascii(nbd_cmd(req)),
+ lo->disk->disk_name, req, nbdcmd_to_ascii(rw),
(unsigned long long)req->sector << 9,
req->nr_sectors << 9);
result = sock_xmit(sock, 1, &request, sizeof(request),
- (nbd_cmd(req) == NBD_CMD_WRITE)? MSG_MORE: 0);
+ (rw == NBD_CMD_WRITE)? MSG_MORE: 0);
if (result <= 0) {
printk(KERN_ERR "%s: Send control failed (result %d)\n",
lo->disk->disk_name, result);
goto error_out;
}

- if (nbd_cmd(req) == NBD_CMD_WRITE) {
+ if (rw == NBD_CMD_WRITE) {
struct bio *bio;
/*
* we are really probing at internals to determine
@@ -294,11 +294,12 @@
}
}
up(&lo->tx_lock);
- return;
+ return 0;

error_out:
up(&lo->tx_lock);
req->errors++;
+ return req->errors;
}

static struct request *nbd_find_request(struct nbd_device *lo, char *handle)
@@ -492,9 +493,7 @@
list_add(&req->queuelist, &lo->queue_head);
spin_unlock(&lo->queue_lock);

- nbd_send_req(lo, req);
-
- if (req->errors) {
+ if (nbd_send_req(lo, req) != 0) {
printk(KERN_ERR "%s: Request send failed\n",
lo->disk->disk_name);
spin_lock(&lo->queue_lock);


Attachments:
patch-2.6.0-test2-mm4-no_send_race (2.09 kB)

2003-08-05 19:39:49

by Paul Clements

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Lou Langholtz wrote:
>
> The following patch removes a race condition in the network block device
> driver in 2.6.0*. Without this patch, the reply-receiving thread could
> end (and free up the memory for) the request structure before the
> request-sending thread is completely done accessing it; the sending
> thread would then access invalid memory.

Indeed, there is a race condition here. It's a very small window, but it
looks like it could possibly be trouble on SMP/preempt kernels.

This patch looks OK, but it appears to still leave the race window open
in the error case (it seems to fix the non-error case, though). We
probably could actually use the ref_count field of struct request to
better fix this problem. I'll take a look at doing this, and send a
patch out in a while.

Thanks,
Paul

2003-08-05 22:48:39

by Lou Langholtz

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Paul Clements wrote:

>Lou Langholtz wrote:
>
>
>>The following patch removes a race condition in the network block device
>>driver in 2.6.0*. Without this patch, the reply-receiving thread could
>>end (and free up the memory for) the request structure before the
>>request-sending thread is completely done accessing it; the sending
>>thread would then access invalid memory.
>>
>>
>
>Indeed, there is a race condition here. It's a very small window, but it
>looks like it could possibly be trouble on SMP/preempt kernels.
>
>This patch looks OK, but it appears to still leave the race window open
>in the error case (it seems to fix the non-error case, though). We
>probably could actually use the ref_count field of struct request to
>better fix this problem. I'll take a look at doing this, and send a
>patch out in a while.
>
>Thanks,
>Paul
>
>
Except that in the error case, the send basically didn't succeed. So
there's no need to worry about receiving a reply, and no race
possibility in that case.

2003-08-06 00:52:56

by Paul Clements

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Lou Langholtz wrote:
>
> Paul Clements wrote:
>
> >Lou Langholtz wrote:
> >
> >
> >>The following patch removes a race condition in the network block device
> >>driver in 2.6.0*. Without this patch, the reply-receiving thread could
> >>end (and free up the memory for) the request structure before the
> >>request-sending thread is completely done accessing it; the sending
> >>thread would then access invalid memory.
> >>
> >>
> >
> >Indeed, there is a race condition here. It's a very small window, but it
> >looks like it could possibly be trouble on SMP/preempt kernels.
> >
> >This patch looks OK, but it appears to still leave the race window open
> >in the error case (it seems to fix the non-error case, though). We
> >probably could actually use the ref_count field of struct request to
> >better fix this problem. I'll take a look at doing this, and send a
> >patch out in a while.
> >
> >Thanks,
> >Paul
> >
> >
> Except that in the error case, the send basically didn't succeed. So
> there's no need to worry about receiving a reply, and no race
> possibility in that case.

As long as the request is on the queue, it is possible for nbd-client to
die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
and leaving us with a race between the free and do_nbd_request()
accessing the request structure.
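
To be concrete, here's the window in terms of the patched
do_nbd_request (code from the patch above; the comments are mine):

	if (nbd_send_req(lo, req) != 0) {
		/* nbd_clear_que() on another CPU may already have
		 * called nbd_end_request(req) and freed req here... */
		printk(KERN_ERR "%s: Request send failed\n",
			lo->disk->disk_name);
		spin_lock(&lo->queue_lock);
		list_del_init(&req->queuelist);	/* ...use after free */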

--
Paul

2003-08-06 07:34:33

by Lou Langholtz

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Paul Clements wrote:

> . . .
>
>>Except that in the error case, the send basically didn't succeed. So
>>there's no need to worry about receiving a reply, and no race
>>possibility in that case.
>>
>>
>
>As long as the request is on the queue, it is possible for nbd-client to
>die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
>and leaving us with a race between the free and do_nbd_request()
>accessing the request structure.
>
>--
>Paul
>
>
Quite right. I missed that case in this last patch (when nbd_do_it has
returned and NBD_DO_IT is about to call nbd_clear_que [1]). Just moving
the errors increment (near the end of nbd_send_req) to within the
semaphore-protected region would fix this particular case. An even
larger race window exists where the request gets freed when nbd-client
is used to disconnect, since it calls NBD_CLEAR_QUE before
NBD_DISCONNECT [2]. In that case, moving the errors increment doesn't
help, of course, since nbd_clear_que in 2.6.0-test2 doesn't bother to
check the tx_lock semaphore anyway. I believe reference counting the
request (as you suggest) would protect against both these windows,
though.
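
For concreteness, a sketch of that first fix (an illustration only, not
a patch I've posted): bump the count before dropping tx_lock, and cache
the return value so req isn't touched after the lock is released:

 error_out:
	req->errors++;		/* still under tx_lock, so the request
				 * can't be freed out from under us */
	result = req->errors;	/* cache before dropping the lock */
	up(&lo->tx_lock);
	return result;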

It's ironic that I'd fixed both these races [1+2] a ways back in an
earlier patch and had forgotten about these cases in this last patch I
submitted. The earlier patch p6.2 against linux-2.5.73 looks about
right. In that patch, the call to clear the queue before NBD_DO_IT
returned was gone, and the clear-queue functionality returned -EBUSY if
invoked while the socket wasn't NULL (and thus potentially while
nbd_send_req could still be called). Not that I'm arguing we should
roll in these earlier patches again; that would re-introduce the
compatibility break, which I wouldn't want either.
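
From memory, the p6.2 clear-queue path looked roughly like this in the
ioctl handler (a reconstruction for illustration, not the literal
patch):

	case NBD_CLEAR_QUE:
		if (lo->sock)
			return -EBUSY;	/* still running; disconnect first */
		nbd_clear_que(lo);
		return 0;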

Will you be working on closing the other clear-queue race too, then?
Here are the comments I shared on this in one of those earlier patches
that didn't make it into mainline (from patch #7):

/*
 * Don't allow queue to be cleared while device is running!
 * Device must be stopped (disconnected) first. Otherwise
 * clearing is meaningless & can lock up processes: it's a
 * race with users who may queue up more requests after the
 * clearing is done that may then never be freed till the
 * system reboots or clear que is run again which just
 * opens the race yet again.
 */


2003-08-08 05:03:58

by Paul Clements

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Lou Langholtz wrote:
>
> Paul Clements wrote:
>
> >>Except that in the error case, the send basically didn't succeed. So
> >>there's no need to worry about receiving a reply, and no race
> >>possibility in that case.
> >
> >As long as the request is on the queue, it is possible for nbd-client to
> >die, thus freeing the request (via nbd_clear_que -> nbd_end_request),
> >and leaving us with a race between the free and do_nbd_request()
> >accessing the request structure.
>
> Quite right. I missed that case in this last patch (when nbd_do_it has
> returned and NBD_DO_IT is about to call nbd_clear_que [1]). Just moving
> the errors increment (near the end of nbd_send_req) to within the
> semaphore-protected region would fix this particular case. An even
> larger race window exists where the request gets freed when nbd-client
> is used to disconnect, since it calls NBD_CLEAR_QUE before
> NBD_DISCONNECT [2]. In that case, moving the errors increment doesn't
> help, of course, since nbd_clear_que in 2.6.0-test2 doesn't bother to
> check the tx_lock semaphore anyway. I believe reference counting the
> request (as you suggest) would protect against both these windows,
> though.

> Will you be working on closing the other clear-queue race too, then?

Here's the patch to fix up several race conditions in nbd. It requires
reverting the already included (but admittedly incomplete)
nbd-race-fix.patch that's in -mm5.

Andrew, please apply.

Thanks,
Paul


Attachments:
nbd-race_fixes.diff (3.00 kB)

2003-08-08 05:25:24

by Andrew Morton

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Paul Clements <[email protected]> wrote:
>
> Here's the patch to fix up several race conditions in nbd. It requires
> reverting the already included (but admittedly incomplete)
> nbd-race-fix.patch that's in -mm5.
>
> Andrew, please apply.

Sure. Could I please have a summary of what races were fixed, and how?

Thanks.

2003-08-08 06:30:53

by Lou Langholtz

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Paul Clements wrote:

>. . .
>Here's the patch to fix up several race conditions in nbd. It requires
>reverting the already included (but admittedly incomplete)
>nbd-race-fix.patch that's in -mm5.
>
>Andrew, please apply.
>
>Thanks,
>Paul
>
>------------------------------------------------------------------------
>
>--- linux-2.6.0-test2-mm4-PRISTINE/drivers/block/nbd.c Sun Jul 27 12:58:51 2003
>+++ linux-2.6.0-test2-mm4/drivers/block/nbd.c Thu Aug 7 18:02:23 2003
>@@ -416,11 +416,19 @@ void nbd_clear_que(struct nbd_device *lo
> BUG_ON(lo->magic != LO_MAGIC);
> #endif
>
>+retry:
> do {
> req = NULL;
> spin_lock(&lo->queue_lock);
> if (!list_empty(&lo->queue_head)) {
> req = list_entry(lo->queue_head.next, struct request, queuelist);
>+ if (req->ref_count > 1) { /* still in xmit */
>+ spin_unlock(&lo->queue_lock);
>+ printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
>+ lo->disk->disk_name, req, req->ref_count);
>+ schedule_timeout(HZ); /* wait a second */
>
Isn't there something more deterministic you can use here than just
waiting a second and hoping things clear up? How about not clearing the
queue unless lo->sock is NULL, using whatever lock now protects
lo->sock? That way the queue-clearing race can be eliminated too.

>+ goto retry;
>+ }
> list_del_init(&req->queuelist);
> }
> spin_unlock(&lo->queue_lock);
>@@ -490,6 +498,7 @@ static void do_nbd_request(request_queue
> }
>
> list_add(&req->queuelist, &lo->queue_head);
>+ req->ref_count++; /* make sure req does not get freed */
> spin_unlock(&lo->queue_lock);
>
> nbd_send_req(lo, req);
>@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> lo->disk->disk_name);
> spin_lock(&lo->queue_lock);
> list_del_init(&req->queuelist);
>+ req->ref_count--;
> spin_unlock(&lo->queue_lock);
> nbd_end_request(req);
> spin_lock_irq(q->queue_lock);
> continue;
> }
>
>+ req->ref_count--;
> spin_lock_irq(q->queue_lock);
>
Since ref_count isn't atomic, shouldn't ref_count only be changed while
the queue_lock is held???
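
I.e., something like this for that second decrement (an untested
sketch):

	spin_lock(&lo->queue_lock);
	req->ref_count--;	/* only ever changed under queue_lock */
	spin_unlock(&lo->queue_lock);

	spin_lock_irq(q->queue_lock);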


2003-08-08 06:41:50

by Andrew Morton

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Lou Langholtz <[email protected]> wrote:
>
> >+ spin_unlock(&lo->queue_lock);
> >+ printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
> >+ lo->disk->disk_name, req, req->ref_count);
> >+ schedule_timeout(HZ); /* wait a second */
> >
> Isn't there something more deterministic you can use here than just
> waiting a second and hoping things clear up?

you'll be needing a set_current_state() before calling schedule_timeout()
anyway. It will fall straight through as it is now.
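
That is, the usual idiom is:

	set_current_state(TASK_INTERRUPTIBLE);
	schedule_timeout(HZ);	/* now actually sleeps up to a second */

Without setting the task state first, the task is still TASK_RUNNING
and schedule_timeout() returns immediately.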

2003-08-08 06:59:22

by Jens Axboe

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

On Fri, Aug 08 2003, Lou Langholtz wrote:
> >@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> > lo->disk->disk_name);
> > spin_lock(&lo->queue_lock);
> > list_del_init(&req->queuelist);
> >+ req->ref_count--;
> > spin_unlock(&lo->queue_lock);
> > nbd_end_request(req);
> > spin_lock_irq(q->queue_lock);
> > continue;
> > }
> >
> >+ req->ref_count--;
> > spin_lock_irq(q->queue_lock);
> >
> Since ref_count isn't atomic, shouldn't ref_count only be changed while
> the queue_lock is held???

Indeed, needs to be done after regrabbing the lock.

--
Jens Axboe

2003-08-08 15:02:42

by Paul Clements

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Jens Axboe wrote:
>
> On Fri, Aug 08 2003, Lou Langholtz wrote:
> > >@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> > > lo->disk->disk_name);
> > > spin_lock(&lo->queue_lock);
> > > list_del_init(&req->queuelist);
> > >+ req->ref_count--;
> > > spin_unlock(&lo->queue_lock);
> > > nbd_end_request(req);
> > > spin_lock_irq(q->queue_lock);
> > > continue;
> > > }
> > >
> > >+ req->ref_count--;
> > > spin_lock_irq(q->queue_lock);
> > >
> > Since ref_count isn't atomic, shouldn't ref_count only be changed while
> > the queue_lock is held???
>
> Indeed, needs to be done after regrabbing the lock.

But req is pulled off of nbd's main request queue at this point, so I
don't think anyone else could be touching it, could they?

--
Paul

2003-08-08 16:49:17

by Paul Clements

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Lou Langholtz wrote:

> >--- linux-2.6.0-test2-mm4-PRISTINE/drivers/block/nbd.c Sun Jul 27 12:58:51 2003
> >+++ linux-2.6.0-test2-mm4/drivers/block/nbd.c Thu Aug 7 18:02:23 2003
> >@@ -416,11 +416,19 @@ void nbd_clear_que(struct nbd_device *lo
> > BUG_ON(lo->magic != LO_MAGIC);
> > #endif
> >
> >+retry:
> > do {
> > req = NULL;
> > spin_lock(&lo->queue_lock);
> > if (!list_empty(&lo->queue_head)) {
> > req = list_entry(lo->queue_head.next, struct request, queuelist);
> >+ if (req->ref_count > 1) { /* still in xmit */
> >+ spin_unlock(&lo->queue_lock);
> >+ printk(KERN_DEBUG "%s: request %p: still in use (%d), waiting...\n",
> >+ lo->disk->disk_name, req, req->ref_count);
> >+ schedule_timeout(HZ); /* wait a second */
> >

> Isn't there something more deterministic you can use here than just
> waiting a second and hoping things clear up?

It's not exactly "hoping" something will happen... we're waiting until
do_nbd_request decrements the ref_count, which is completely
deterministic, but just not guaranteed to have already happened when
nbd_clear_que() starts.

The use of the ref_count closes a race window (the one that I pointed
out in my response to Lou's original patch a few days ago). It protects
us in the following case(s):

1) do_nbd_request begins -- sock and file are valid
2) do_nbd_request queues the request and then calls nbd_send_req to send
the request out on the network
3) on another CPU, or after preempt, nbd-client gets a signal or an
"nbd-client -d" is called, which results in sock and file being set to
NULL, and the queue begins to be cleared, and requests that were on the
queue get freed (before do_nbd_request has finished accessing req for
the last time)


> How about not clearing the queue unless lo->sock is NULL, using
> whatever lock now protects lo->sock? That way the queue-clearing race
> can be eliminated too.

That makes sense. I hadn't done this (for compatibility reasons) since
"nbd-client -d" (disconnect) calls NBD_CLEAR_QUE before disconnecting
the socket. But I think we can just make NBD_CLEAR_QUE silently fail
(return 0) if lo->sock is set and basically turn that first
NBD_CLEAR_QUE into a noop, and let the nbd_clear_que() call at the end
of NBD_CLEAR_SOCK handle it (properly). Note that the race condition
above still applies, even with this lo->sock check in place. But this
check does eliminate the race in "nbd-client -d" where, e.g., requests
are being queued up faster than nbd_clear_que can dequeue them,
potentially making nbd_clear_que loop forever.
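
Concretely, something like this in the ioctl handler (a sketch of what
I'm testing; details may change):

	case NBD_CLEAR_QUE:
		if (lo->sock)
			return 0;	/* silent no-op while connected;
					 * the nbd_clear_que() in
					 * NBD_CLEAR_SOCK does the real work */
		nbd_clear_que(lo);
		return 0;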

As a side note, NBD_CLEAR_QUE is actually now completely superfluous,
since NBD_DO_IT and NBD_CLEAR_SOCK (which always get called in
conjunction with NBD_CLEAR_QUE) contain the nbd_clear_que() call
themselves. We could just make NBD_CLEAR_QUE a noop in all cases, but I
guess we'd risk breaking some nbd client tool that's out there...

I'm testing the updated patch now, I'll send it out in a little while...

Thanks,
Paul

2003-08-08 17:06:45

by Paul Clements

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

Andrew Morton wrote:
>
> Paul Clements <[email protected]> wrote:
> >
> > Here's the patch to fix up several race conditions in nbd. It requires
> > reverting the already included (but admittedly incomplete)
> > nbd-race-fix.patch that's in -mm5.
> >
> > Andrew, please apply.
>
> Sure. Could I please have a summary of what races were fixed, and how?

I outlined the other races in the mail I just sent. In addition, the
updated patch will fix an Oops where lo->sock gets set to NULL (by
NBD_CLEAR_SOCK) before NBD_DO_IT completes. This can happen when
"nbd-client -d" is called to disconnect the nbd socket.

--
Paul

2003-08-25 09:59:16

by Jens Axboe

Subject: Re: [PATCH] 2.6.0 NBD driver: remove send/receive race for request

On Fri, Aug 08 2003, Paul Clements wrote:
> Jens Axboe wrote:
> >
> > On Fri, Aug 08 2003, Lou Langholtz wrote:
> > > >@@ -499,12 +508,14 @@ static void do_nbd_request(request_queue
> > > > lo->disk->disk_name);
> > > > spin_lock(&lo->queue_lock);
> > > > list_del_init(&req->queuelist);
> > > >+ req->ref_count--;
> > > > spin_unlock(&lo->queue_lock);
> > > > nbd_end_request(req);
> > > > spin_lock_irq(q->queue_lock);
> > > > continue;
> > > > }
> > > >
> > > >+ req->ref_count--;
> > > > spin_lock_irq(q->queue_lock);
> > > >
> > > Since ref_count isn't atomic, shouldn't ref_count only be changed while
> > > the queue_lock is held???
> >
> > Indeed, needs to be done after regrabbing the lock.
>
> But req is pulled off of nbd's main request queue at this point, so I
> don't think anyone else could be touching it, could they?

In that case you are right, and it would probably be best not to touch
the reference count at all (just leave it at 1).

--
Jens Axboe