2006-05-05 12:20:08

by Jan Niehusmann

[permalink] [raw]
Subject: [PATCH] smbfs: Fix slab corruption in samba error path

Yesterday, I got the following error with 2.6.16.13 during a file copy
from a smb filesystem over a wireless link. I guess there was some error
on the wireless link, which in turn caused an error condition for the
smb filesystem.

In the log, smb_file_read reports error=4294966784 (0xfffffe00), which
also shows up in the slab dumps, and also is -ERESTARTSYS. Error code
27499 corresponds to 0x6b6b, so the rq_errno field seems to be the only
one being set after freeing the slab.

In smb_add_request (which is the only place in smbfs where I found
ERESTARTSYS), I found the following:

if (!timeleft || signal_pending(current)) {
/*
* On timeout or on interrupt we want to try and remove the
* request from the recvq/xmitq.
*/
smb_lock_server(server);
if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
list_del_init(&req->rq_queue);
smb_rput(req);
}
smb_unlock_server(server);
}
[...]
if (signal_pending(current))
req->rq_errno = -ERESTARTSYS;

I guess that some codepath like smbiod_flush() caused the request
to be removed from the queue, and smb_rput(req) be called, without
SMB_REQ_RECEIVED being set. This violates an asumption made by the
quoted code.

Then, the above code calls smb_rput(req) again, the req gets freed,
and req->rq_errno = -ERESTARTSYS writes into the already freed slab. As
list_del_init doesn't cause an error if called multiple times, that does
cause the observed behaviour (freed slab with rq_errno=-ERESTARTSYS).

If this observation is correct, the following patch should fix it.

I wonder why the smb code uses list_del_init everywhere - using
list_del instead would catch such situations by poisoning the next and
prev pointers.

Signed-off-by: Jan Niehusmann <[email protected]>

---

May 4 23:29:21 knautsch kernel: [17180085.456000] ipw2200: Firmware error detected. Restarting.
May 4 23:29:21 knautsch kernel: [17180085.456000] ipw2200: Sysfs 'error' log captured.
May 4 23:33:02 knautsch kernel: [17180306.316000] ipw2200: Firmware error detected. Restarting.
May 4 23:33:02 knautsch kernel: [17180306.316000] ipw2200: Sysfs 'error' log already exists.
May 4 23:33:02 knautsch kernel: [17180306.968000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:34:18 knautsch kernel: [17180383.256000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:34:18 knautsch kernel: [17180383.284000] SMB connection re-established (-5)
May 4 23:37:19 knautsch kernel: [17180563.956000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:40:09 knautsch kernel: [17180733.636000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:40:26 knautsch kernel: [17180750.700000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:43:02 knautsch kernel: [17180907.304000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:43:08 knautsch kernel: [17180912.324000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:43:34 knautsch kernel: [17180938.416000] smb_errno: class Unknown, code 27499 from command 0x6b
May 4 23:43:34 knautsch kernel: [17180938.416000] Slab corruption: start=c4ebe09c, len=244
May 4 23:43:34 knautsch kernel: [17180938.416000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:43:34 knautsch kernel: [17180938.416000] Last user: [<e087b903>](smb_rput+0x53/0x90 [smbfs])
May 4 23:43:34 knautsch kernel: [17180938.416000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
May 4 23:43:34 knautsch kernel: [17180938.416000] 0f0: 00 fe ff ff
May 4 23:43:34 knautsch kernel: [17180938.416000] Next obj: start=c4ebe19c, len=244
May 4 23:43:34 knautsch kernel: [17180938.416000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:43:34 knautsch kernel: [17180938.416000] Last user: [<00000000>](_stext+0x3feffde0/0x30)
May 4 23:43:34 knautsch kernel: [17180938.416000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:43:34 knautsch kernel: [17180938.416000] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:43:34 knautsch kernel: [17180938.460000] SMB connection re-established (-5)
May 4 23:43:42 knautsch kernel: [17180946.292000] ipw2200: Firmware error detected. Restarting.
May 4 23:43:42 knautsch kernel: [17180946.292000] ipw2200: Sysfs 'error' log already exists.
May 4 23:45:04 knautsch kernel: [17181028.752000] ipw2200: Firmware error detected. Restarting.
May 4 23:45:04 knautsch kernel: [17181028.752000] ipw2200: Sysfs 'error' log already exists.
May 4 23:45:05 knautsch kernel: [17181029.868000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:45:36 knautsch kernel: [17181060.984000] smb_errno: class Unknown, code 27499 from command 0x6b
May 4 23:45:36 knautsch kernel: [17181060.984000] Slab corruption: start=c4ebe09c, len=244
May 4 23:45:36 knautsch kernel: [17181060.984000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:45:36 knautsch kernel: [17181060.984000] Last user: [<e087b903>](smb_rput+0x53/0x90 [smbfs])
May 4 23:45:36 knautsch kernel: [17181060.984000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
May 4 23:45:36 knautsch kernel: [17181060.984000] 0f0: 00 fe ff ff
May 4 23:45:36 knautsch kernel: [17181060.984000] Next obj: start=c4ebe19c, len=244
May 4 23:45:36 knautsch kernel: [17181060.984000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:45:36 knautsch kernel: [17181060.984000] Last user: [<00000000>](_stext+0x3feffde0/0x30)
May 4 23:45:36 knautsch kernel: [17181060.984000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:45:36 knautsch kernel: [17181060.984000] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:45:36 knautsch kernel: [17181061.024000] SMB connection re-established (-5)
May 4 23:46:17 knautsch kernel: [17181102.132000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:47:46 knautsch kernel: [17181190.468000] smb_errno: class Unknown, code 27499 from command 0x6b
May 4 23:47:46 knautsch kernel: [17181190.468000] Slab corruption: start=c4ebe09c, len=244
May 4 23:47:46 knautsch kernel: [17181190.468000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:47:46 knautsch kernel: [17181190.468000] Last user: [<e087b903>](smb_rput+0x53/0x90 [smbfs])
May 4 23:47:46 knautsch kernel: [17181190.468000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6a 6b 6b 6b 6b 6b 6b 6b
May 4 23:47:46 knautsch kernel: [17181190.468000] 0f0: 00 fe ff ff
May 4 23:47:46 knautsch kernel: [17181190.468000] Next obj: start=c4ebe19c, len=244
May 4 23:47:46 knautsch kernel: [17181190.468000] Redzone: 0x5a2cf071/0x5a2cf071.
May 4 23:47:46 knautsch kernel: [17181190.468000] Last user: [<00000000>](_stext+0x3feffde0/0x30)
May 4 23:47:46 knautsch kernel: [17181190.468000] 000: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:47:46 knautsch kernel: [17181190.468000] 010: 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b
May 4 23:47:46 knautsch kernel: [17181190.492000] SMB connection re-established (-5)
May 4 23:49:20 knautsch kernel: [17181284.828000] smb_file_read: //some_file validation failed, error=4294966784
May 4 23:49:39 knautsch kernel: [17181303.896000] smb_file_read: //some_file validation failed, error=4294966784


diff --git a/fs/smbfs/request.c b/fs/smbfs/request.c
index c71c375..d2d8226 100644
--- a/fs/smbfs/request.c
+++ b/fs/smbfs/request.c
@@ -339,9 +339,11 @@ #endif
/*
* On timeout or on interrupt we want to try and remove the
* request from the recvq/xmitq.
+ * First check if the request is still part of a queue. (May
+ * have been removed by some error condition)
*/
smb_lock_server(server);
- if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
+ if (&req->rq_queue != req->rq_queue.next) {
list_del_init(&req->rq_queue);
smb_rput(req);
}


2006-05-10 09:29:01

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] smbfs: Fix slab corruption in samba error path

Jan Niehusmann <[email protected]> wrote:
>
> Yesterday, I got the following error with 2.6.16.13 during a file copy
> from a smb filesystem over a wireless link. I guess there was some error
> on the wireless link, which in turn caused an error condition for the
> smb filesystem.
>
> In the log, smb_file_read reports error=4294966784 (0xfffffe00), which
> also shows up in the slab dumps, and also is -ERESTARTSYS. Error code
> 27499 corresponds to 0x6b6b, so the rq_errno field seems to be the only
> one being set after freeing the slab.
>
> In smb_add_request (which is the only place in smbfs where I found
> ERESTARTSYS), I found the following:
>
> if (!timeleft || signal_pending(current)) {
> /*
> * On timeout or on interrupt we want to try and remove the
> * request from the recvq/xmitq.
> */
> smb_lock_server(server);
> if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
> list_del_init(&req->rq_queue);
> smb_rput(req);
> }
> smb_unlock_server(server);
> }
> [...]
> if (signal_pending(current))
> req->rq_errno = -ERESTARTSYS;
>
> I guess that some codepath like smbiod_flush() caused the request
> to be removed from the queue, and smb_rput(req) be called, without
> SMB_REQ_RECEIVED being set. This violates an asumption made by the
> quoted code.
>
> Then, the above code calls smb_rput(req) again, the req gets freed,
> and req->rq_errno = -ERESTARTSYS writes into the already freed slab. As
> list_del_init doesn't cause an error if called multiple times, that does
> cause the observed behaviour (freed slab with rq_errno=-ERESTARTSYS).
>
> If this observation is correct, the following patch should fix it.
>
> ..
>
> diff --git a/fs/smbfs/request.c b/fs/smbfs/request.c
> index c71c375..d2d8226 100644
> --- a/fs/smbfs/request.c
> +++ b/fs/smbfs/request.c
> @@ -339,9 +339,11 @@ #endif
> /*
> * On timeout or on interrupt we want to try and remove the
> * request from the recvq/xmitq.
> + * First check if the request is still part of a queue. (May
> + * have been removed by some error condition)
> */
> smb_lock_server(server);
> - if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
> + if (&req->rq_queue != req->rq_queue.next) {
> list_del_init(&req->rq_queue);
> smb_rput(req);
> }
>

I think the bug is actually that this code is accessing *req after having
doen the smb_rput(). I worry that your patch "fixes" this by accidentally
leaking the request.

We can fairly simply restructure this code so that it doesn't touch the
request after that possible smb_rput().

How does this look? If "OK", are you able to test it?


fs/smbfs/request.c | 30 ++++++++++++++++--------------
1 files changed, 16 insertions(+), 14 deletions(-)

diff -puN fs/smbfs/request.c~smbfs-fix-slab-corruption-in-samba-error-path fs/smbfs/request.c
--- devel/fs/smbfs/request.c~smbfs-fix-slab-corruption-in-samba-error-path 2006-05-10 01:59:08.000000000 -0700
+++ devel-akpm/fs/smbfs/request.c 2006-05-10 02:21:41.000000000 -0700
@@ -335,19 +335,6 @@ int smb_add_request(struct smb_request *

timeleft = wait_event_interruptible_timeout(req->rq_wait,
req->rq_flags & SMB_REQ_RECEIVED, 30*HZ);
- if (!timeleft || signal_pending(current)) {
- /*
- * On timeout or on interrupt we want to try and remove the
- * request from the recvq/xmitq.
- */
- smb_lock_server(server);
- if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
- list_del_init(&req->rq_queue);
- smb_rput(req);
- }
- smb_unlock_server(server);
- }
-
if (!timeleft) {
PARANOIA("request [%p, mid=%d] timed out!\n",
req, req->rq_mid);
@@ -372,7 +359,22 @@ int smb_add_request(struct smb_request *
req->rq_errno = smb_errno(req);
if (signal_pending(current))
req->rq_errno = -ERESTARTSYS;
- return req->rq_errno;
+ result = req->rq_errno;
+
+ if (!timeleft || signal_pending(current)) {
+ /*
+ * On timeout or on interrupt we want to try and remove the
+ * request from the recvq/xmitq.
+ */
+ smb_lock_server(server);
+ if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
+ list_del_init(&req->rq_queue);
+ smb_rput(req);
+ }
+ smb_unlock_server(server);
+ }
+
+ return result;
}

/*
_

2006-05-10 10:32:44

by Jan Niehusmann

[permalink] [raw]
Subject: Re: [PATCH] smbfs: Fix slab corruption in samba error path

On Wed, May 10, 2006 at 02:25:29AM -0700, Andrew Morton wrote:
> I think the bug is actually that this code is accessing *req after having
> doen the smb_rput(). I worry that your patch "fixes" this by accidentally
> leaking the request.
>
> We can fairly simply restructure this code so that it doesn't touch the
> request after that possible smb_rput().
>
> How does this look? If "OK", are you able to test it?

No, it doesn't look ok: The callers of smb_add_request (which are all in
fs/smbfs/proc.c) do touch the req structure after calling
smb_add_request, even if an error is returned. So your code would still
cause access after release and double frees on the req object.

As I understand the code smb_add_request is not allowed to completely
release the req structure at all.

What smb_add_request should do is
- increase the req usage counter by one (by calling smb_rget), and add
the req to one of the work queues
- or leave the usage counter alone, and don't add the req to one of the
work queues

On error, one has to be careful: If we actively remove the req from the
work queues again, we have to decrease the usage counter (otherwise we
leak requests). But if some other function already removed the req from
the queue, that function already did decrease the counter, so we are not
allowed to do it again.

The original code did get the latter case partly wrong. It assumed that
the only way a req could be removed from the work queue would be in
smb_request_recv, where the SMB_REQ_RECEIVED flag gets set. But it did
miss the error cases in smbiod.c, eg. smbiod_flush(), where the req gets
removed from the queues (and the usage counter decreased), without the
SMB_REQ_RECEIVED flag being set.

Therefore I changed the code to not check SMB_REQ_RECEIVED, but test if
the req is still on one of the work queue linked lists.

After that change, smb_add_request never releases the req, so reordering
is not necessary.

Unfortunately it's not easy to test the patch: Of course I did check
that it properly compiles, but the bug is not easily reproducible.

Jan

2006-05-11 03:19:32

by Andrew Morton

[permalink] [raw]
Subject: Re: [PATCH] smbfs: Fix slab corruption in samba error path

Jan Niehusmann <[email protected]> wrote:
>
> On Wed, May 10, 2006 at 02:25:29AM -0700, Andrew Morton wrote:
> > I think the bug is actually that this code is accessing *req after having
> > doen the smb_rput(). I worry that your patch "fixes" this by accidentally
> > leaking the request.
> >
> > We can fairly simply restructure this code so that it doesn't touch the
> > request after that possible smb_rput().
> >
> > How does this look? If "OK", are you able to test it?
>
> No, it doesn't look ok: The callers of smb_add_request (which are all in
> fs/smbfs/proc.c) do touch the req structure after calling
> smb_add_request, even if an error is returned. So your code would still
> cause access after release and double frees on the req object.

OK.

> As I understand the code smb_add_request is not allowed to completely
> release the req structure at all.

yup.

> What smb_add_request should do is
> - increase the req usage counter by one (by calling smb_rget), and add
> the req to one of the work queues
> - or leave the usage counter alone, and don't add the req to one of the
> work queues
>
> On error, one has to be careful: If we actively remove the req from the
> work queues again, we have to decrease the usage counter (otherwise we
> leak requests). But if some other function already removed the req from
> the queue, that function already did decrease the counter, so we are not
> allowed to do it again.
>
> The original code did get the latter case partly wrong. It assumed that
> the only way a req could be removed from the work queue would be in
> smb_request_recv, where the SMB_REQ_RECEIVED flag gets set. But it did
> miss the error cases in smbiod.c, eg. smbiod_flush(), where the req gets
> removed from the queues (and the usage counter decreased), without the
> SMB_REQ_RECEIVED flag being set.
>
> Therefore I changed the code to not check SMB_REQ_RECEIVED, but test if
> the req is still on one of the work queue linked lists.
>
> After that change, smb_add_request never releases the req, so reordering
> is not necessary.

yup, makes sense. It'd be nice if we knew who was doing the smb_rput()
without setting SMB_REQ_RECEIVED.

> Unfortunately it's not easy to test the patch: Of course I did check
> that it properly compiles, but the bug is not easily reproducible.

btw, is there any particular reason why you're using smbfs rather than
cifs?

I'll queue up an smbfs patch to inform people that the fs is deprecated and
I'll schedule its removal.

I tweaked your patch a bit:

diff -puN fs/smbfs/request.c~smbfs-fix-slab-corruption-in-samba-error-path fs/smbfs/request.c
--- devel/fs/smbfs/request.c~smbfs-fix-slab-corruption-in-samba-error-path 2006-05-10 20:11:53.000000000 -0700
+++ devel-akpm/fs/smbfs/request.c 2006-05-10 20:12:25.000000000 -0700
@@ -339,9 +339,11 @@ int smb_add_request(struct smb_request *
/*
* On timeout or on interrupt we want to try and remove the
* request from the recvq/xmitq.
+ * First check if the request is still part of a queue. (May
+ * have been removed by some error condition)
*/
smb_lock_server(server);
- if (!(req->rq_flags & SMB_REQ_RECEIVED)) {
+ if (!list_empty(&req->rq_queue)) {
list_del_init(&req->rq_queue);
smb_rput(req);
}
_