2003-09-19 19:16:58

by Andrew Zabolotny

[permalink] [raw]
Subject: __make_request() bug and a fix variant

Hello!

While developing a driver I found something that I think is a bug in
kernel (I'm using 2.4.20, I just hope it is already fixed in 2.6.x
series). It manifests itself by bread() randomly crashing (in different
places) if called for, say, reading 1024 bytes from block 0 of device
0300 from a driver's module_init() (at least in my very stripped debug
environment, this could differ from system to system).

Here's a somewhat long description of the problem roots:

bread() in the first place calls getblk(), which first tries to find the
requested buffer in hash tables, and if it is not there, it calls
grow_buffers(), the later calls grow_dev_page() and finally that calls
create_buffers(). The later gets a set of free buffer_head's from the
pool, and puts them in a chain attached to a page. Many fields are left
in a indefinite state since they are initialized before usage. The
b_reqnext field is left in a indefinite state as well, and it happens
to be filled with garbage in my case (actually it's a leftover from
previous usage of the buffer_head).

Now when bread() gets this buffer, it is passed to ll_rw_block() which
is passing it to generic_make_request(), and, in turn (for many block
devices including IDE) to __make_request.

And finally, if elevator returns ELEVATOR_NO_MERGE, the b_reqnext field
of the buffer_head structure is left uninitialized! So when b_end_io
(and in turn end_that_request_first) is called, it looks at b_reqnext
and sees there's another bufhead waiting for processing. What happens
next is limited just by your imagination :-)

Also I observed the ELEVATOR_BACK_MERGE case also has the same problem
(bh->b_reqnext is left in a indefinite state). So maybe __make_request
always assumes that b_reqnext is initially NULL? In this case the bug is
in create_buffers which should NULLify this field. In any case, I'm
leaving the final solution up to kernel wizards.

--
Greetings,
Andrew

P.S. If you reply to this letter, please cc: it to my address since I'm
not subscribed to this list.


2003-09-20 11:37:49

by Jens Axboe

[permalink] [raw]
Subject: Re: __make_request() bug and a fix variant

On Fri, Sep 19 2003, Andrew Zabolotny wrote:
> Hello!
>
> While developing a driver I found something that I think is a bug in
> kernel (I'm using 2.4.20, I just hope it is already fixed in 2.6.x
> series). It manifests itself by bread() randomly crashing (in different
> places) if called for, say, reading 1024 bytes from block 0 of device
> 0300 from a driver's module_init() (at least in my very stripped debug
> environment, this could differ from system to system).
>
> Here's a somewhat long description of the problem roots:
>
> bread() in the first place calls getblk(), which first tries to find the
> requested buffer in hash tables, and if it is not there, it calls
> grow_buffers(), the later calls grow_dev_page() and finally that calls
> create_buffers(). The later gets a set of free buffer_head's from the
> pool, and puts them in a chain attached to a page. Many fields are left
> in a indefinite state since they are initialized before usage. The
> b_reqnext field is left in a indefinite state as well, and it happens
> to be filled with garbage in my case (actually it's a leftover from
> previous usage of the buffer_head).
>
> Now when bread() gets this buffer, it is passed to ll_rw_block() which
> is passing it to generic_make_request(), and, in turn (for many block
> devices including IDE) to __make_request.
>
> And finally, if elevator returns ELEVATOR_NO_MERGE, the b_reqnext field
> of the buffer_head structure is left uninitialized! So when b_end_io
> (and in turn end_that_request_first) is called, it looks at b_reqnext
> and sees there's another bufhead waiting for processing. What happens
> next is limited just by your imagination :-)
>
> Also I observed the ELEVATOR_BACK_MERGE case also has the same problem
> (bh->b_reqnext is left in a indefinite state). So maybe __make_request
> always assumes that b_reqnext is initially NULL? In this case the bug is
> in create_buffers which should NULLify this field. In any case, I'm
> leaving the final solution up to kernel wizards.

I dunno if you were the one posting this issue here some months ago?

Show me a regular kernel path that passes invalid b_reqnext to
__make_request? That would be a bug, indeed, but I've never heard of
such a bug. Most likely it's a bug in your driver, not initialising
b_reqnext. You can see the initialisor for buffer_heads is
init_buffer_head, which memsets the entire buffer_head. When a
buffer_head is detached from the request list, b_reqnext is cleared too.

--
Jens Axboe

2003-09-20 15:37:57

by Andrew Zabolotny

[permalink] [raw]
Subject: __make_request() bug and a fix variant

On Sat, 20 Sep 2003 13:37:37 +0200
Jens Axboe <[email protected]> wrote:

> I dunno if you were the one posting this issue here some months ago?
No, it wasn't me :-)

> Show me a regular kernel path that passes invalid b_reqnext to
> __make_request? That would be a bug, indeed, but I've never heard of
> such a bug. Most likely it's a bug in your driver, not initialising
> b_reqnext.
I have been calling bread() which was causing me troubles. bread does
not accept a buffer_head from outside, it gets a new one and returns it.
So I don't have any control over b_reqnext field - the bug happens
inside bread() between these lines:

bh = getblk(dev, block, size);
/* here bh_reqnext is already junk. In fact, I partially solved this
problem by making my own clone of bread() and setting b_reqnext
to NULL right here. But unfortunately, there is no guarantee we'll
fix all invalid buffer_heads - maybe some remain in the pool and
will be returned to other innocent drivers requesting them. */
if (buffer_uptodate(bh))
return bh;
/* and now ll_rw_block will try to merge the bh with those already in
the queue, and if it will take the ELEVATOR_NO_MERGE path, bh_reqnext
will still remain junk. */
ll_rw_block(READ, 1, &bh);

> You can see the initialisor for buffer_heads is
> init_buffer_head, which memsets the entire buffer_head. When a
> buffer_head is detached from the request list, b_reqnext is cleared
> too.
Ah, so I was correct that __make_request expects b_reqnext to be already
set to NULL. In this case the bug should be somewhere else - in some
code that returns buffer_head's back to the pool of buffers.

Interesting that right before the driver crashes in bread() I call
grok_partitions. I think the bug is somewhere there. I will do a new
debug session at Monday (the code that breaks is at work), so I will
post new details if I find any.

--
Greetings,
Andrew

2003-09-20 15:47:44

by Arjan van de Ven

[permalink] [raw]
Subject: Re: __make_request() bug and a fix variant

On Sat, 2003-09-20 at 17:36, Andrew Zabolotny wrote:
> On Sat, 20 Sep 2003 13:37:37 +0200
> Jens Axboe <[email protected]> wrote:
>
> > I dunno if you were the one posting this issue here some months ago?
> No, it wasn't me :-)
>
> > Show me a regular kernel path that passes invalid b_reqnext to
> > __make_request? That would be a bug, indeed, but I've never heard of
> > such a bug. Most likely it's a bug in your driver, not initialising
> > b_reqnext.
> I have been calling bread() which was causing me troubles. bread does
> not accept a buffer_head from outside, it gets a new one and returns it.
> So I don't have any control over b_reqnext field - the bug happens
> inside bread() between these lines:
>
> bh = getblk(dev, block, size);
> /* here bh_reqnext is already junk. In fact, I partially solved this
> problem by making my own clone of bread() and setting b_reqnext
> to NULL right here. But unfortunately, there is no guarantee we'll
> fix all invalid buffer_heads - maybe some remain in the pool and
> will be returned to other innocent drivers requesting them. */
> if (buffer_uptodate(bh))
> return bh;
> /* and now ll_rw_block will try to merge the bh with those already in
> the queue, and if it will take the ELEVATOR_NO_MERGE path, bh_reqnext
> will still remain junk. */
> ll_rw_block(READ, 1, &bh);
>
> > You can see the initialisor for buffer_heads is
> > init_buffer_head, which memsets the entire buffer_head. When a
> > buffer_head is detached from the request list, b_reqnext is cleared
> > too.
> Ah, so I was correct that __make_request expects b_reqnext to be already
> set to NULL. In this case the bug should be somewhere else - in some
> code that returns buffer_head's back to the pool of buffers.
>
> Interesting that right before the driver crashes in bread() I call
> grok_partitions. I think the bug is somewhere there. I will do a new
> debug session at Monday (the code that breaks is at work), so I will
> post new details if I find any.

it may help if you post the url to your full driver code here; others
may help you debugging....


Attachments:
signature.asc (189.00 B)
This is a digitally signed message part

2003-09-24 09:21:33

by Jens Axboe

[permalink] [raw]
Subject: Re: __make_request() bug and a fix variant

On Sat, Sep 20 2003, Andrew Zabolotny wrote:
> On Sat, 20 Sep 2003 13:37:37 +0200
> Jens Axboe <[email protected]> wrote:
>
> > I dunno if you were the one posting this issue here some months ago?
> No, it wasn't me :-)
>
> > Show me a regular kernel path that passes invalid b_reqnext to
> > __make_request? That would be a bug, indeed, but I've never heard of
> > such a bug. Most likely it's a bug in your driver, not initialising
> > b_reqnext.
> I have been calling bread() which was causing me troubles. bread does
> not accept a buffer_head from outside, it gets a new one and returns it.
> So I don't have any control over b_reqnext field - the bug happens
> inside bread() between these lines:
>
> bh = getblk(dev, block, size);
> /* here bh_reqnext is already junk. In fact, I partially solved this
> problem by making my own clone of bread() and setting b_reqnext
> to NULL right here. But unfortunately, there is no guarantee we'll
> fix all invalid buffer_heads - maybe some remain in the pool and
> will be returned to other innocent drivers requesting them. */
> if (buffer_uptodate(bh))
> return bh;
> /* and now ll_rw_block will try to merge the bh with those already in
> the queue, and if it will take the ELEVATOR_NO_MERGE path, bh_reqnext
> will still remain junk. */
> ll_rw_block(READ, 1, &bh);

Looks very odd, there must be a bug elsewhere. What else is junk in the
bh?

It follows that if you submit a buffer_head for io, it must be properly
initialized for io. Nobody complains that if b_blocknr is crap that data
ends up in the wrong location. Likewise, b_reqnext must be initialized
to NULL.

> > You can see the initialisor for buffer_heads is
> > init_buffer_head, which memsets the entire buffer_head. When a
> > buffer_head is detached from the request list, b_reqnext is cleared
> > too.
> Ah, so I was correct that __make_request expects b_reqnext to be already
> set to NULL. In this case the bug should be somewhere else - in some
> code that returns buffer_head's back to the pool of buffers.

Exactly!

> Interesting that right before the driver crashes in bread() I call
> grok_partitions. I think the bug is somewhere there. I will do a new
> debug session at Monday (the code that breaks is at work), so I will
> post new details if I find any.

Please do.

--
Jens Axboe

2003-09-24 19:38:18

by Andrew Zabolotny

[permalink] [raw]
Subject: Re: __make_request() bug and a fix variant

On Sat, 20 Sep 2003 13:37:37 +0200
Jens Axboe <[email protected]> wrote:

> You can see the initialisor for buffer_heads is
> init_buffer_head, which memsets the entire buffer_head. When a
> buffer_head is detached from the request list, b_reqnext is cleared
> too.
I did some more research on this subject. Indeed, I was wrong. The
problem is that when you acquire a buffer_head by calling
kmem_cache_alloc and you return it with kmem_cache_free the returned
buffer can be filled with any garbage *except* the b_reqnext field
which *should* be set to NULL, otherwise the next driver who'll get this
buffer_head will most probably crash.

Not that relying on this field being NULL is a bad practice (because
in kernel performance is top priority), but it's not clean (and
nowhere mentioned - even in the source code). So I think it would be a
Good Thing{tm} to mention somewhere(say at least in buffers.c before
put_unused_buffer_head) that before returning a buffer to the common
pool you should ensure that b_reqnext is NULL (although it is not
related only to put_unused_buffer_head but to kmem_cache_free as well -
but the later is not related to buffer_head's:).

Sorry for borrowed time and thank very much for assistance.

--
Greetings,
Andrew