2003-08-21 21:40:46

by Yoav Weiss

Subject: Bug in drivers/block/ll_rw_blk.c ?

A few days ago I posted the report attached below. After some more
research, I'm starting to think I've hit a bug in ll_rw_blk.c.

If the maintainer of the block dev subsystem happens to be reading
this, please contact me on the list or by mail.

Thanks,
Yoav Weiss

---------- Forwarded message ----------
Date: Tue, 19 Aug 2003 22:34:42 +0300 (IDT)
From: Yoav Weiss <[email protected]>
To: [email protected]
Subject: disk stalls - request disappears until kicked

While researching stalls of a cloop device under recent 2.4.x kernels,
I ran across what seems to be a bug in the request handling initiated by
do_generic_file_read().

The cloop (compressed loop) code I'm debugging is this one:

http://developer.linuxtag.net/knoppix/sources/cloop_1.0-2.tar.gz

I'm testing with kernel 2.4.22-rc2.

The code uses do_generic_file_read() in a similar manner to loop.o.
Under stress-testing, reading processes stall in TASK_UNINTERRUPTIBLE and
remain in that state until another process accesses some non-cached file
on the underlying filesystem. As soon as such an access occurs, the
stalled processes resume.
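
In case it helps, here's roughly the read path both drivers use
(reconstructed from memory of 2.4's loop.c, not the actual cloop code;
the names read_backing_page and my_read_actor are mine):

/* Actor passed to do_generic_file_read(): called once for each
 * page-cache page of the backing file, copies the data out. */
static int my_read_actor(read_descriptor_t *desc, struct page *page,
			 unsigned long offset, unsigned long size)
{
	char *kaddr;

	if (size > desc->count)
		size = desc->count;
	kaddr = kmap(page);
	memcpy(desc->buf, kaddr + offset, size);
	kunmap(page);
	desc->count -= size;
	desc->written += size;
	desc->buf += size;
	return size;
}

/* Synchronous read from the backing file.  do_generic_file_read()
 * kicks off readpage() on the backing fs and then sleeps in
 * wait_on_page() until the low-level driver completes the I/O. */
static int read_backing_page(struct file *file, char *buf,
			     unsigned long size, loff_t pos)
{
	read_descriptor_t desc;

	desc.written = 0;
	desc.count = size;
	desc.buf = buf;
	desc.error = 0;
	do_generic_file_read(file, &pos, &desc, my_read_actor);
	return desc.error;
}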

The stalled process waits on a page in mm/filemap.c:1505:

/* Again, try some read-ahead while waiting for the page to finish.. */
generic_file_readahead(reada_ok, filp, inode, page);
------> wait_on_page(page);
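
(For reference, wait_on_page() in 2.4 is just a wait for PG_locked to
clear -- roughly this, per include/linux/pagemap.h, from memory:

static inline void wait_on_page(struct page * page)
{
	if (PageLocked(page))
		___wait_on_page(page);	/* sleeps in TASK_UNINTERRUPTIBLE
					 * until UnlockPage() is called */
}

so nothing will ever wake the process except whoever unlocks that page.)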


I found what wakes it up in reads that don't stall: unlock_page(),
reached from drivers/block/ll_rw_blk.c:end_that_request_first() via
bh->b_end_io(bh, uptodate).
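
Hand-traced, the wake-up chain for a page-cache read seems to be the
following (from my reading of 2.4.22-rc2, so take it with a grain of
salt):

ide_end_request()
  -> end_that_request_first(req, uptodate, name)
       -> bh->b_end_io(bh, uptodate)		/* end_buffer_io_async() */
	    -> unlock_buffer(bh)
	    -> once the last buffer on the page completes:
		 SetPageUptodate(page)
		 UnlockPage(page)		/* wakes wait_on_page() */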

Tracking end_that_request_first()'s callers leads all the way back to the
IDE code, and none of them seems to be called for the stalled request
until it's kicked by having another process perform some access that
wakes the stalled request.

Seems like some request queue doesn't get fully consumed under stress,
but so far I've been unable to find what causes it. I'm not even sure
whether the request was never passed to the hardware, or whether the
hardware handled it and the BH was somehow mishandled.

Having traced this to the IDE code, I tried the same with a USB disk
instead. It withstood the same stress-testing much longer than the IDE
disk did, although eventually it stalled in a similar manner. I'm not
sure whether the problem is in ll_rw_blk.c/filemap.c or happens to be
shared by ide and usb-storage/sd.

Curiously, the problem seems to happen when the underlying filesystem is
ext3, but doesn't happen when it's vfat, as far as I can tell. This could
be related to the fact that ext3 uses generic_file_read and vfat doesn't.

Anyone else experiencing similar stalls ? Suggestions ?

Yoav Weiss




2003-08-22 15:36:06

by Livio Baldini Soares

Subject: Re: Bug in drivers/block/ll_rw_blk.c ?

Hi Yoav!

Yoav Weiss writes:
> A few days ago I posted the report attached below. After some more
> research, I'm starting to think I've hit a bug in ll_rw_blk.c.
>
> If the maintainer of the block dev subsystem happens to be reading
> this, please contact me on the list or by mail.

I'm not the maintainer, but I'm pretty sure there is no problem in that
specific code... rather, I think you're hitting another bug in the 2.4
tree (read below...)

[...snip...]

> The cloop (compressed loop) code I'm debugging is this one:
>
> http://developer.linuxtag.net/knoppix/sources/cloop_1.0-2.tar.gz
>
> I'm testing with kernel 2.4.22-rc2.

[...snip...]

> The stalled process waits on a page in mm/filemap.c:1505:
>
> /* Again, try some read-ahead while waiting for the page to finish.. */
> generic_file_readahead(reada_ok, filp, inode, page);
> ------> wait_on_page(page);
>
>
> I found what wakes it up in reads that don't stall: unlock_page(),
> reached from drivers/block/ll_rw_blk.c:end_that_request_first() via
> bh->b_end_io(bh, uptodate).


From this description it seems that you are hitting a bug which was
discussed to death here on the list. Here's a thread with 143 messages for
you:

http://marc.theaimsgroup.com/?t=105400721000001&r=5&w=2

And here are the threads in which a solution was discussed:

http://marc.theaimsgroup.com/?t=105519528200001&r=1&w=2
http://marc.theaimsgroup.com/?t=105769525800005&r=3&w=2

Notice, however, that the patch Chris, Andrea, Jens and others made for
this problem is _already_ included in 2.4 (so, yes, 2.4.22-rc2 has the
fix).

So, you are probably hitting the same bug, which was not fixed 100%. If
you think that your test is very easily reproducible and can shed more
light on this problem, perhaps you should write to Chris, Andrea and Jens
(with Cc: to the list), and show them the test. I don't know if they would
be willing to spend more time on this issue, especially with 2.6 around the
corner...

best regards,

--
Livio B. Soares

2003-08-22 18:25:19

by Yoav Weiss

Subject: io-stalls again (was "Re: Bug in drivers/block/ll_rw_blk.c")

On Fri, 22 Aug 2003, Livio Baldini Soares wrote:

[...snip...]

[ for people who jump in, my original description of the problem can be
found here: http://lkml.org/lkml/2003/8/19/259 ]

> From this description it seems that you are hitting a bug which was
> discussed to death here on the list. Here's a thread with 143 messages for
> you:
>
> http://marc.theaimsgroup.com/?t=105400721000001&r=5&w=2
>
> And here are the threads in which a solution was discussed:
>
> http://marc.theaimsgroup.com/?t=105519528200001&r=1&w=2
> http://marc.theaimsgroup.com/?t=105769525800005&r=3&w=2
>
> Notice, however, that the patch Chris, Andrea, Jens and others made for
> this problem is _already_ included in 2.4 (so, yes, 2.4.22-rc2 has the
> fix).
>

Yes, I guess it's related to the same problem. I think the patch actually
broke something. I see that it was introduced in 2.4.22-pre3, and that's
exactly where the problem became much worse. I switched back to
2.4.22-pre2 and it mostly works. It still stalls under extreme conditions,
but not as easily as with later kernels. With pre7 and rc2, which I tested
lately, it happens very quickly under heavy load.

> So, you are probably hitting the same bug, which was not fixed 100%. If
> you think that your test is very easily reproducible and can shed more
> light on this problem, perhaps you should write to Chris, Andrea and Jens
> (with Cc: to the list), and show them the test. I don't know if they would
> be willing to spend more time on this issue, especially with 2.6 around the
> corner...
>

Not only did the patch fail to fix it 100%, but it actually made it a lot
worse in my case.

It's easily reproducible once you have a big cloop image in place, but I
guess that doesn't qualify as easy reproduction for busy kernel
developers. I hope someone will still take the time to look into it.

If someone has speculations or suggestions but no time to test them, send
them to me and I'll run the tests and post the results.

The easiest way to trigger it with recent kernels is to download a large
cloop image, such as the big file called KNOPPIX inside the Knoppix ISO
image, attach it, and create a lot of load on it.

If someone wishes to try this, here's how I reproduce it:
* Download the latest ISO from http://knoppix.net/get.php
* mount -o loop the image and extract the file KNOPPIX/KNOPPIX
* Download cloop from
http://developer.linuxtag.net/knoppix/sources/cloop_1.0-2.tar.gz
* extract cloop, make, insmod cloop.o, mknod /dev/cloop b 200 0
* losetup /dev/cloop /path/to/KNOPPIX && mount /dev/cloop /mnt
* tar cf - /mnt >/dev/null
* while tar is running, access some random files in /mnt.

With 2.4.22-rc2 the above will stall in less than a minute and will remain
stalled until another process accesses other files in the filesystem
storing KNOPPIX.

It may be possible to reproduce the same stall with loop.o, but it takes
much longer. This could be related to the fact that cloop.o is
single-threaded while loop.o has a separate reader thread. Could this
affect the problem ?

Anyway, if someone has a suggested test/patch, post it and I'll post the
results. Hopefully we can nail this next-to-last bug :)

> best regards,
>
> --
> Livio B. Soares
>

Thanks,
Yoav Weiss

2003-08-24 21:58:11

by Andrea Arcangeli

Subject: Re: io-stalls again (was "Re: Bug in drivers/block/ll_rw_blk.c")

On Fri, Aug 22, 2003 at 09:25:01PM +0300, Yoav Weiss wrote:
> It may be possible to reproduce the same stall with loop.o, but it takes
> much longer. This could be related to the fact that cloop.o is
> single-threaded while loop.o has a separate reader thread. Could this
> affect the problem ?

It may not be related to cloop of course, but it would reduce the number
of variables for us to have a how-to-reproduce that doesn't involve
running kernel code outside the mainline kernel. If that's the only way
to reproduce it, we simply have to look into the whole cloop code first,
before we can actually look again into the mainline kernel code for
this.

Andrea

2003-08-28 01:11:38

by Yoav Weiss

Subject: Re: io-stalls again (was "Re: Bug in drivers/block/ll_rw_blk.c")

Andrea,

The only way to consistently get these stalls is with the cloop code.

I don't want to waste your time on reading non-mainstream code if the bug
is there rather than in the kernel, so I'm trying to figure out what the
relevant differences between loop and cloop are, to see if the bug hides
there.

So far, the only difference I see is that do_generic_file_read is called
from a separate thread in loop.o, via loop_thread(), which was added by
Jens Axboe. In cloop.o, everything is done in the context of the reading
process.

Jens, maybe you can help me a bit here. Is it wrong to call
do_generic_file_read in a loop-like driver in the caller's context, rather
than using a helper thread ? Could it somehow cause a stall under heavy
load ? When the stall occurs, it seems the last request never gets
handled, but once another thread accesses the underlying fs/device, the
old request is handled along with the new one.
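
To make the difference concrete, here's the shape of the two paths as I
understand them (illustrative only -- the names are mine, the real
loop.c queueing has more locking, read_backing_page() is the sketch from
my original report, and backing_file stands for wherever the driver
keeps its struct file *):

/* loop.o (2.4): submission and the backing read are decoupled.
 * make_request only queues the bh; loop_thread() later does the
 * do_generic_file_read() and calls bh->b_end_io() itself. */
static int loop_style_make_request(request_queue_t *q, int rw,
				   struct buffer_head *bh)
{
	struct loop_device *lo = &loop_dev[MINOR(bh->b_rdev)];

	loop_add_bh(lo, bh);		/* enqueue under lo->lo_lock */
	up(&lo->lo_bh_mutex);		/* wake the helper thread */
	return 0;
}

/* cloop: the backing read happens inline, so the process that
 * submitted the request is the one that ends up asleep in
 * wait_on_page() on the backing device's pages. */
static int cloop_style_make_request(request_queue_t *q, int rw,
				    struct buffer_head *bh)
{
	loff_t pos = (loff_t) bh->b_rsector << 9;
	int err;

	err = read_backing_page(backing_file, bh->b_data,
				bh->b_size, pos);	/* may sleep */
	bh->b_end_io(bh, err == 0);
	return 0;
}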

btw, I verified that the stall still occurs in 2.4.22. The no-stall patch
only made it worse.

Yoav Weiss

On Sun, 24 Aug 2003, Andrea Arcangeli wrote:

> On Fri, Aug 22, 2003 at 09:25:01PM +0300, Yoav Weiss wrote:
> > It may be possible to reproduce the same stall with loop.o, but it takes
> > much longer. This could be related to the fact that cloop.o is
> > single-threaded while loop.o has a separate reader thread. Could this
> > affect the problem ?
>
> It may not be related to cloop of course, but it would reduce the number
> of variables for us to have a how-to-reproduce that doesn't involve
> running kernel code outside the mainline kernel. If that's the only way
> to reproduce it, we simply have to look into the whole cloop code first,
> before we can actually look again into the mainline kernel code for
> this.
>
> Andrea
>