2000-10-27 16:29:53

by Rui Sousa

Subject: Re: Blocked processes <=> Elevator starvation?

On Sun, 8 Oct 2000, Rik van Riel wrote:

> On Sun, 8 Oct 2000, Rui Sousa wrote:
>
> > After starting 2 processes that scan a lot of files (diff, find,
> > slocate, ...) it's impossible to run any other processes that
> > touch the disk; they stall until one of the first two stops.
> > Could this be a sign of starvation in the elevator code?
>
> It could well be. I've seen this problem too and don't
> really have another explanation for this phenomenon.
>
> OTOH, maybe there is another reason for it that hasn't
> been found yet ;)
>

I finally had time to take a closer look at this. It now seems the problem
is in the VM system.

I patched a test10-pre4 kernel with kdb, then started two "diff -ur
linux-2.4.0testX linux-2.4.0testY > log1" and two "find / -true >
log". After this I tried to cat a small file. The cat never
returned. At this point I entered kdb and did a stack trace on the "cat"
process:

schedule()
___wait_on_page()
do_generic_file_read()
generic_file_read()
sys_read()
system_call()

So it seems the process is either looping in ___wait_on_page(),
racing for the PageLock, or it never wakes up. (I guess I could add a
printk to check which.)
Unfortunately I didn't find anything obviously wrong with the code.
I hope you can do a better job tracking the problem down.
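
The two possibilities above (looping and losing the race, or never being
woken) can be told apart by counting wakeups. Below is a minimal user-space
sketch in Python, not kernel code: the Page class, unlock_page() and
wait_on_page() are made-up stand-ins for the page lock bit and its wait
queue, and the timeout exists only so the sketch terminates.

```python
import threading

class Page:
    """Toy stand-in for a page (hypothetical, not the kernel struct)."""
    def __init__(self):
        self.locked = True                 # stands in for the page lock bit
        self.cond = threading.Condition()  # stands in for the page wait queue

def unlock_page(page):
    """Clear the lock bit and wake every waiter, as an IO completion would."""
    with page.cond:
        page.locked = False
        page.cond.notify_all()

def wait_on_page(page, timeout):
    """Sleep until the page is unlocked.

    Returns (outcome, wakeups). The wakeup counter plays the role of the
    printk: it records how often the waiter was woken while still locked out.
    """
    wakeups = 0
    with page.cond:
        while page.locked:
            if not page.cond.wait(timeout):
                return ("timed out", wakeups)  # nobody signalled us in time
            wakeups += 1                       # woken: re-check the lock bit
    return ("unlocked", wakeups)
```

A timeout with zero wakeups suggests the waiter was never woken at all; a
timeout after one or more wakeups suggests it keeps being woken but finds the
page locked again each time.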

As a reminder:
i686, UP, 64 MB RAM, IDE disks, ext2.

Rui Sousa


2000-10-27 16:32:03

by Rik van Riel

Subject: Re: Blocked processes <=> Elevator starvation?

On Fri, 27 Oct 2000, Rui Sousa wrote:

> I finally had time to take a closer look at this. It now seems the
> problem is in the VM system.

*sigh*

> schedule()
> ___wait_on_page()
> do_generic_file_read()
> generic_file_read()
> sys_read()
> system_call()
>
> So it seems the process is either looping in ___wait_on_page(),
> racing for the PageLock, or it never wakes up. (I guess I could
> add a printk to check which.)

It is spinning in ___wait_on_page() because the page never
becomes available: the IO for it doesn't get scheduled to
disk in time.

This appears to be an elevator problem, not a VM problem.
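
To make the starvation argument concrete, here is a toy model in Python.
It is not the 2.4 elevator code: it assumes a scheduler that always serves
the request nearest to the disk head, one streaming reader that keeps
submitting the next sequential sector, and a single distant request (the
"cat"). The function name and the max_passes parameter are made up for
illustration.

```python
def steps_until_served(far_sector, max_passes=None, steps=1000):
    """Return the step at which the distant request is served, or None.

    max_passes, if set, bounds how often a queued request may be passed
    over before it must be served next (a simple anti-starvation rule).
    """
    head = 0
    queue = [{"sector": far_sector, "passes": 0, "far": True}]
    for step in range(1, steps + 1):
        # The streaming reader queues the sector right after the head.
        queue.append({"sector": head + 1, "passes": 0, "far": False})
        overdue = [r for r in queue
                   if max_passes is not None and r["passes"] >= max_passes]
        if overdue:
            chosen = overdue[0]  # oldest overdue request goes first
        else:
            # Otherwise pick whatever needs the shortest seek.
            chosen = min(queue, key=lambda r: abs(r["sector"] - head))
        queue.remove(chosen)
        if chosen["far"]:
            return step
        head = chosen["sector"]
        for r in queue:
            r["passes"] += 1     # everything left behind was passed over
    return None                  # never served within the simulated window

print(steps_until_served(1_000_000))                 # unbounded passing
print(steps_until_served(1_000_000, max_passes=16))  # bounded passing
```

In this model the distant request is never served while the stream keeps
going, unless the number of times it can be passed over is bounded. Whether
the real elevator behaves like this under the reported load is exactly what
would need checking.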

regards,

Rik
--
"What you're running that piece of shit Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000

http://www.conectiva.com/ http://www.surriel.com/