2000-10-27 20:44:29

by Jens Axboe

Subject: Re: Blocked processes <=> Elevator starvation?

On Fri, Oct 27 2000, Rui Sousa wrote:
> I finally had time to give this a better look. It now seems the problem
> is in the VM system.
>
> I patched a test10-pre4 kernel with kdb, then started two "diff -ur
> linux-2.4.0testX linux-2.4.0testY > log1" and two "find / -true >
> log". After this I tried cat"ing" a small file. The cat never
> returned. At this point I entered kdb and did a stack trace on the "cat"
> process:
>
> schedule()
> ___wait_on_page()
> do_generic_file_read()
> generic_file_read()
> sys_read()
> system_call()
>
> So it seems the process is either in a loop in ___wait_on_page()
> racing for the PageLock or it never wakes up... (I guess I could add a
> printk to check which)
> Unfortunately I didn't find anything obviously wrong with the code.
> I hope you can do a better job tracking the problem down.

Rik is right, just because you are seeing long waits on wait_on_page
doesn't make it a vm problem. When an I/O on a page completes, the
page will be unlocked and wait_on_page can grab it -- so I/O stalls
would result in this behaviour.
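
For reference, ___wait_on_page() looks roughly like this in the
current sources -- a simplified sketch, details approximate. I/O
completion ends in UnlockPage(), which clears PG_locked and wakes
the queue; if the request never completes, the waiter sleeps here
forever, which would look exactly like the trace above:

	/* mm/filemap.c, simplified: sleep until the page is unlocked */
	void ___wait_on_page(struct page *page)
	{
		struct task_struct *tsk = current;
		DECLARE_WAITQUEUE(wait, tsk);

		add_wait_queue(&page->wait, &wait);
		do {
			run_task_queue(&tq_disk);	/* kick queued disk I/O */
			set_task_state(tsk, TASK_UNINTERRUPTIBLE);
			if (!PageLocked(page))
				break;
			schedule();	/* woken by UnlockPage() at I/O completion */
		} while (PageLocked(page));
		tsk->state = TASK_RUNNING;
		remove_wait_queue(&page->wait, &wait);
	}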

Could you try this patch:

*.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.0-test10-pre6/blk-7.bz2

and see if it makes a difference?

--
* Jens Axboe <[email protected]>
* SuSE Labs


2000-10-28 03:14:53

by Rui Sousa

Subject: Re: Blocked processes <=> Elevator starvation?

On Fri, 27 Oct 2000, Jens Axboe wrote:

...
> > So it seems the process is either in a loop in ___wait_on_page()
> > racing for the PageLock or it never wakes up... (I guess I could add a
> > printk to check which)
> > Unfortunately I didn't find anything obviously wrong with the code.
> > I hope you can do a better job tracking the problem down.
>
> Rik is right, just because you are seeing long waits on wait_on_page
> doesn't make it a vm problem. When an I/O on a page completes, the
> page will be unlocked and wait_on_page can grab it -- so I/O stalls
> would result in this behaviour.
>
> Could you try this patch:
>
> *.kernel.org/pub/linux/kernel/people/axboe/patches/2.4.0-test10-pre6/blk-7.bz2
>
> and see if it makes a difference?

After adding

#define ELEVATOR_HOLE_MERGE 3

to linux/include/linux/elevator.h it compiled ok.
There were still some stalls but they only lasted a couple of
seconds. The patch did make a difference and for the better.

Rui Sousa

2000-10-28 03:25:45

by Jens Axboe

Subject: Re: Blocked processes <=> Elevator starvation?

On Sat, Oct 28 2000, Rui Sousa wrote:
> After adding
>
> #define ELEVATOR_HOLE_MERGE 3
>
> to linux/include/linux/elevator.h it compiled ok.

Oops sorry, I'm on the road so the patch was extracted
from my packet writing tree (and not my regular tree).

> There were still some stalls but they only lasted a couple of
> seconds. The patch did make a difference and for the better.

Ok, still needs a bit of work. Thanks for the feedback.

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-10-29 18:28:01

by Miles Lane

Subject: Re: Blocked processes <=> Elevator starvation?

Jens Axboe wrote:

> On Sat, Oct 28 2000, Rui Sousa wrote:
>
>> After adding
>>
>> #define ELEVATOR_HOLE_MERGE 3
>>
>> to linux/include/linux/elevator.h it compiled ok.
>
>
> Oops sorry, I'm on the road so the patch was extracted
> from my packet writing tree (and not my regular tree).
>
>
>> There were still some stalls but they only lasted a couple of
>> seconds. The patch did make a difference and for the better.
>
>
> Ok, still needs a bit of work. Thanks for the feedback.

Have you resolved this problem completely, now?

I am testing the USB Storage support with my ORB backup
drive. When I run:

dd if=/dev/zero of=/dev/sda bs=1k count=2G

The drive gets data quickly for about thirty seconds.
Then the throughput drops off to about ten percent
of its previous transfer rate. This dropoff appears to
be due to conflict over accessing filesystems. Specifically,
I have USB_STORAGE_DEBUG enabled, which shoots a ton of
debugging output into my kernel log. When the throughput
to the ORB drive falls off, all writing to the syslog
ceases. At least, that's what "tail -f" shows.

I would be happy to test any patches you have for this
problem.

I hope this helps,

Miles

2000-10-29 21:55:24

by Jens Axboe

Subject: Re: Blocked processes <=> Elevator starvation?

On Sun, Oct 29 2000, Miles Lane wrote:
> >> There were still some stalls but they only lasted a couple of
> >> seconds. The patch did make a difference and for the better.
> >
> >
> > Ok, still needs a bit of work. Thanks for the feedback.
>
> Have you resolved this problem completely, now?
>
> I am testing the USB Storage support with my ORB backup
> drive. When I run:
>
> dd if=/dev/zero of=/dev/sda bs=1k count=2G
>
> The drive gets data quickly for about thirty seconds.
> Then the throughput drops off to about ten percent
> of its previous transfer rate. This dropoff appears to
> be due to conflict over accessing filesystems. Specifically,
> I have USB_STORAGE_DEBUG enabled, which shoots a ton of
> debugging output into my kernel log. When the throughput
> to the ORB drive falls off, all writing to the syslog
> ceases. At least, that's what "tail -f" shows.
>
> I would be happy to test any patches you have for this
> problem.

Could you send vmstat 1 info from the start of the copy
and until the i/o rate drops off?

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-10-29 22:25:02

by Jeff Merkey

Subject: Re: Blocked processes <=> Elevator starvation?


I am posting the completed NWFS 2.4.4 tomorrow, and it NEVER exhibits
this lockup problem on the console, no matter how busy the I/O subsystem
underneath becomes. I think this is probably because I use my own
elevator and LRU and don't use Linus's buffer cache. Whatever is
causing it is, I would guess, related to the way the buffer cache
interacts with ll_rw_block(). It's NOT in the VFS, since if it were I
would be seeing it as well, and I don't. I do see it when I use the
buffer cache with LINUX_BUFFER_CACHE = 1. I think it may be related to
the bdflush daemon and the way it interacts with ll_rw_block(). The
whole buffer cache needs some serious rework anyway, since it's physical
and not logical, and clustered file systems that use it will always end
up buffering multiple copies of the same data on systems across a
cluster. We need something that uses logical partition semantics.
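
To illustrate "physical and not logical": the buffer cache keys
everything by (device, physical block number), so a write through it
goes roughly like this -- a simplified sketch from memory, not exact
code:

	/* typical 2.4 buffer-cache write path, simplified */
	struct buffer_head *bh = getblk(dev, block, size); /* keyed by physical dev/block */

	memcpy(bh->b_data, data, size);
	mark_buffer_uptodate(bh, 1);
	mark_buffer_dirty(bh);		/* bdflush may flush it later... */
	ll_rw_block(WRITE, 1, &bh);	/* ...or push it to the elevator now */
	wait_on_buffer(bh);
	brelse(bh);

Every node in a cluster ends up with its own copy keyed to the
physical block, which is the duplication problem I mean.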

Jeff

Jens Axboe wrote:
>
> On Sun, Oct 29 2000, Miles Lane wrote:
> > >> There were still some stalls but they only lasted a couple of
> > >> seconds. The patch did make a difference and for the better.
> > >
> > >
> > > Ok, still needs a bit of work. Thanks for the feedback.
> >
> > Have you resolved this problem completely, now?
> >
> > I am testing the USB Storage support with my ORB backup
> > drive. When I run:
> >
> > dd if=/dev/zero of=/dev/sda bs=1k count=2G
> >
> > The drive gets data quickly for about thirty seconds.
> > Then the throughput drops off to about ten percent
> > of its previous transfer rate. This dropoff appears to
> > be due to conflict over accessing filesystems. Specifically,
> > I have USB_STORAGE_DEBUG enabled, which shoots a ton of
> > debugging output into my kernel log. When the throughput
> > to the ORB drive falls off, all writing to the syslog
> > ceases. At least, that's what "tail -f" shows.
> >
> > I would be happy to test any patches you have for this
> > problem.
>
> Could you send vmstat 1 info from the start of the copy
> and until the i/o rate drops off?
>
> --
> * Jens Axboe <[email protected]>
> * SuSE Labs

2000-10-31 00:43:23

by Miles Lane

Subject: Re: Blocked processes <=> Elevator starvation?

Jens Axboe wrote:

> On Sun, Oct 29 2000, Miles Lane wrote:
>
>>>> There were still some stalls but they only lasted a couple of
>>>> seconds. The patch did make a difference and for the better.
>>>
>>>
>>> Ok, still needs a bit of work. Thanks for the feedback.
>>
>> Have you resolved this problem completely, now?
>>
>> I am testing the USB Storage support with my ORB backup
>> drive. When I run:
>>
>> dd if=/dev/zero of=/dev/sda bs=1k count=2G
>>
>> The drive gets data quickly for about thirty seconds.
>> Then the throughput drops off to about ten percent
>> of its previous transfer rate. This dropoff appears to
>> be due to conflict over accessing filesystems. Specifically,
>> I have USB_STORAGE_DEBUG enabled, which shoots a ton of
>> debugging output into my kernel log. When the throughput
>> to the ORB drive falls off, all writing to the syslog
>> ceases. At least, that's what "tail -f" shows.
>>
>> I would be happy to test any patches you have for this
>> problem.
>
>
> Could you send vmstat 1 info from the start of the copy
> and until the i/o rate drops off?

I just reproduced the problem in test10-pre7. Here's the
output you requested:

vmstat 1
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 0 126308 5724 59480 0 0 23 9 162 306 6 2 92
0 0 0 0 126292 5724 59480 0 0 0 0 297 786 4 4 92
0 0 0 0 126292 5724 59480 0 0 0 0 210 593 2 1 97
1 2 2 0 59768 64308 59508 0 0 0 17425 950 37095 11 66 23
0 2 3 0 59460 64580 59508 0 0 0 645 317 1169 0 1 99
0 2 2 0 58672 65272 59508 0 0 0 657 293 1928 0 2 98
0 2 2 0 58272 65628 59508 0 0 0 647 283 1228 1 0 99
0 2 2 0 57868 65984 59508 0 0 0 647 284 1230 0 4 96
0 2 2 0 57464 66340 59508 0 0 0 648 284 1221 0 0 100
1 2 2 0 56600 67100 59512 0 0 0 662 295 2062 0 0 100
0 2 2 0 56192 67460 59512 0 0 0 522 285 1237 0 1 99
0 2 2 0 55888 67728 59512 0 0 0 644 282 1040 0 1 99
0 2 2 0 55484 68084 59512 0 0 0 646 285 1227 0 1 99
0 2 2 0 54648 68820 59512 0 0 0 661 299 2019 0 2 98
1 2 2 0 54240 69176 59512 0 0 0 648 406 1549 1 3 96
1 2 2 0 53824 69532 59512 0 0 0 649 637 2073 6 1 93
0 2 2 0 53424 69884 59516 0 0 0 649 466 1705 2 1 97
2 2 2 0 52172 70380 59516 0 0 0 656 1011 3612 14 4 82
3 2 2 0 51728 70716 59516 0 0 0 647 1994 2329 9 6 85
1 2 2 0 51312 71064 59516 0 0 0 636 1883 1517 12 9 79
2 2 2 0 50972 71420 59516 0 0 0 636 1092 1541 12 0 88
0 2 2 0 50100 72176 59520 0 0 0 671 611 2266 19 6 75
1 2 2 0 49692 72532 59520 0 0 0 638 305 682 1 1 98
1 2 2 0 49292 72888 59520 0 0 0 650 368 1574 0 1 99
0 2 2 0 48580 73516 59520 0 0 0 638 289 613 0 5 95
1 2 2 0 48180 73868 59520 0 0 0 636 278 542 1 0 99
0 2 2 0 47776 74224 59520 0 0 0 637 285 595 0 0 100
0 2 2 0 47372 74580 59520 0 0 0 636 274 513 0 8 92
0 2 2 0 46572 75288 59520 0 0 0 638 273 525 0 1 99
0 2 2 0 46168 75644 59520 0 0 0 637 281 588 0 2 98
0 2 2 0 45764 76000 59520 0 0 0 509 272 508 0 0 100
0 2 2 0 45300 76324 59520 0 0 0 636 276 548 1 0 99
0 2 2 0 44896 76680 59520 0 0 0 636 273 523 0 0 100
0 2 2 0 44088 77392 59520 0 0 0 650 281 1307 0 7 93
1 1 2 0 43736 77680 59548 0 0 0 1279 908 2637 1 8 91
0 2 3 0 43072 78040 59592 0 0 0 1660 1281 4119 5 6 89

>>> /var/log/kernel output stopped being emitted here <<<
>>> CRUNCH! <<<

0 2 3 0 42656 78384 59592 0 0 0 259 271 551 0 0 100
0 2 3 0 42656 78384 59592 0 0 0 5 271 499 0 0 100
0 2 3 0 42656 78384 59592 0 0 0 5 272 511 0 2 98
0 2 3 0 42656 78384 59592 0 0 0 4 268 502 0 0 100
0 2 3 0 42656 78384 59592 0 0 0 5 272 508 0 0 100
0 2 3 0 42656 78384 59592 0 0 0 5 274 523 0 0 100
0 2 3 0 42656 78384 59592 0 0 0 5 274 508 0 0 100
procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 2 3 0 42656 78384 59592 0 0 0 5 273 515 0 0 100
0 2 3 0 42652 78384 59592 0 0 0 5 273 511 0 0 100
0 2 3 0 42652 78384 59592 0 0 0 5 273 513 0 0 100
0 2 3 0 42652 78384 59592 0 0 0 5 272 585 0 0 100
0 2 3 0 42652 78384 59592 0 0 0 5 291 946 0 0 100
1 2 3 0 42652 78384 59592 0 0 0 5 310 563 6 0 94
0 2 3 0 42652 78384 59592 0 0 0 5 272 520 0 0 100
1 2 3 0 42652 78384 59592 0 0 0 5 271 502 0 0 100
1 2 3 0 42648 78384 59592 0 0 0 5 300 608 0 0 100
1 2 3 0 42648 78384 59592 0 0 0 4 307 633 0 1 99
0 2 3 0 42648 78384 59592 0 0 0 5 276 526 0 0 100
1 2 3 0 42648 78384 59592 0 0 0 5 278 529 0 0 100
0 2 3 0 42648 78384 59592 0 0 0 5 274 512 0 4 96
1 2 3 0 42648 78384 59592 0 0 0 5 273 514 0 0 100
0 2 3 0 42648 78384 59592 0 0 0 5 272 511 0 0 100
1 2 3 0 42648 78384 59592 0 0 0 5 270 498 0 0 100
0 2 3 0 42648 78384 59592 0 0 0 5 271 512 0 0 100
1 2 3 0 42648 78384 59592 0 0 0 5 273 506 0 0 100
0 2 3 0 42648 78384 59592 0 0 0 5 386 926 3 0 97
1 2 3 0 42640 78384 59592 0 0 0 18 434 1072 2 2 96
0 2 3 0 42636 78384 59592 0 0 0 30 292 759 2 0 98

2000-11-03 17:53:46

by Jens Axboe

Subject: Re: Blocked processes <=> Elevator starvation?

On Mon, Oct 30 2000, Miles Lane wrote:
> I just reproduced the problem in test10-pre7. Here's the
> output you requested:

Thanks

> vmstat 1
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 1 1 2 0 43736 77680 59548 0 0 0 1279 908 2637 1 8 91
> 0 2 3 0 43072 78040 59592 0 0 0 1660 1281 4119 5 6 89
>
> >>> /var/log/kernel output stopped being emitted here <<<
> >>> CRUNCH! <<<
>
> 0 2 3 0 42656 78384 59592 0 0 0 259 271 551 0 0 100
> 0 2 3 0 42656 78384 59592 0 0 0 5 271 499 0 0 100

Yes, I would assume this would be the case. The current vm does not
behave very well once the cache has been filled up. I'm not knocking
the new vm, this has been a problem for some time. Basically I/O
performance goes badly down the drain at this point.

--
* Jens Axboe <[email protected]>
* SuSE Labs

2000-11-07 20:52:57

by Giuliano Pochini

Subject: Re: Blocked processes <=> Elevator starvation?



> I just reproduced the problem in test10-pre7. Here's the
> output you requested:
>
> vmstat 1
> procs memory swap io system cpu
> r b w swpd free buff cache si so bi bo in cs us sy id
> 0 2 2 0 45764 76000 59520 0 0 0 509 272 508 0 0 100
> 0 2 2 0 45300 76324 59520 0 0 0 636 276 548 1 0 99
> 0 2 2 0 44896 76680 59520 0 0 0 636 273 523 0 0 100
> 0 2 2 0 44088 77392 59520 0 0 0 650 281 1307 0 7 93
> 1 1 2 0 43736 77680 59548 0 0 0 1279 908 2637 1 8 91
> 0 2 3 0 43072 78040 59592 0 0 0 1660 1281 4119 5 6 89
>
> >>> /var/log/kernel output stopped being emitted here <<<
> >>> CRUNCH! <<<
>
> 0 2 3 0 42656 78384 59592 0 0 0 259 271 551 0 0 100
> 0 2 3 0 42656 78384 59592 0 0 0 5 271 499 0 0 100
> 0 2 3 0 42656 78384 59592 0 0 0 5 272 511 0 2 98
> 0 2 3 0 42656 78384 59592 0 0 0 4 268 502 0 0 100
> 0 2 3 0 42656 78384 59592 0 0 0 5 272 508 0 0 100
> 0 2 3 0 42656 78384 59592 0 0 0 5 274 523 0 0 100


test7 had the problem that when the request queue is full, new requests
are no longer merged together (the disk is slow because it writes a lot
of 512-byte blocks instead of 128K blocks). There is a patch for this.
There is still the problem that when the request queue is full, other
processes can no longer access *any* device that uses ll_rw_block(),
and they remain blocked in D state.
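
To make "merged together" concrete, the contiguity test is roughly
the following -- a conceptual sketch, not the literal __make_request()
code:

	/* a new buffer can be back-merged into an existing request
	 * only if it is physically contiguous with its tail */
	static int can_back_merge(struct request *req, struct buffer_head *bh)
	{
		return req->rq_dev == bh->b_rdev &&
		       req->sector + req->nr_sectors == bh->b_rsector;
	}

With the queue full, new buffers miss this chance to merge, so
contiguous 512-byte writes go out as separate tiny requests.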


Bye.