2002-01-07 18:46:28

by Roy Sigurd Karlsbakk

Subject: [BUG] Error reading multiple large files

Hi all

I've sent this before, but as far as I can see, nothing's changed.

I'm having problems reading multiple large files at once, for example 100
1GB files concurrently.

What happens is, when the buffer cache gets filled up, it all stalls, and
transfer speed drops from 40-50 MB/s to a mere 2MB/s.

This has been tested on all versions from 2.4.16 through 2.4.18-pre1.

I've been testing with Tux, Khttpd, Apache 1.3.22, Apache 2, thttpd, cp and
dd to verify the bug.
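
A workload of this shape can be approximated with a short script. The following is a minimal sketch and not the test that was actually run: the one-thread-per-file design, the 1 MiB chunk size, the function names and the temporary test files are all assumptions made for illustration.

```python
import os
import tempfile
import threading
import time

CHUNK = 1 << 20  # read in 1 MiB chunks (arbitrary, assumed value)


def read_file(path, totals, index):
    """Read one file sequentially to EOF and record how many bytes were read."""
    count = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            data = f.read(CHUNK)
            if not data:
                break
            count += len(data)
    totals[index] = count


def read_concurrently(paths):
    """Read all files at once, one thread per file; return (bytes, seconds)."""
    totals = [0] * len(paths)
    threads = [
        threading.Thread(target=read_file, args=(p, totals, i))
        for i, p in enumerate(paths)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(totals), time.monotonic() - start


if __name__ == "__main__":
    # Small self-check with temporary files; point `paths` at real large
    # files (hypothetical locations) to approximate the reported workload.
    with tempfile.TemporaryDirectory() as d:
        paths = []
        for i in range(4):
            p = os.path.join(d, f"f{i}")
            with open(p, "wb") as f:
                f.write(b"\0" * (2 * CHUNK))
            paths.append(p)
        nbytes, secs = read_concurrently(paths)
        print(f"{nbytes / 1e6:.1f} MB in {secs:.2f} s")
```

To see the stall, the total size of the files would have to exceed the memory available for caching, and throughput would have to be sampled over time (for example with vmstat) rather than averaged over the whole run.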

Please help!

roy

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.


2002-01-08 00:14:59

by Marcelo Tosatti

Subject: Re: [BUG] Error reading multiple large files


Roy,

I suspect this is a use-once effect.

Could you please try http://surriel.com/patches/2.4/2.4.17-pre8-2ndchance
?

Thanks

On Mon, 7 Jan 2002, Roy Sigurd Karlsbakk wrote:

> Hi all
>
> I've sent this before, but as far as I can see, nothing's changed.
>
> I'm having problems reading multiple large files at once, for example 100
> 1GB files concurrently.
>
> What happens is, when the buffer cache gets filled up, it all stalls, and
> transfer speed drops from 40-50 MB/s to a mere 2MB/s.


2002-01-09 10:45:10

by Roy Sigurd Karlsbakk

Subject: Re: [BUG] Error reading multiple large files

> Roy,
>
> I suspect this is a use-once effect.
>
> Could you please try http://surriel.com/patches/2.4/2.4.17-pre8-2ndchance
> ?

I tried it, but without any success. The vmstat output below indicates what
happens. As it shows, 'bi' is quite stable (but slightly falling) at around
30-35 MB per sec. After the buffer memory is filled up, the system tries to
swap out a little, and throughput works its way downwards before stabilizing
at ~900kB/sec - from two 120G IDE drives in RAID-0 !!!

thanks for any help.

roy

    procs                      memory    swap         io      system         cpu
 r   b  w  swpd    free  buff   cache  si  so     bi  bo    in    cs  us  sy  id
 0  99  0     0  583376  7456  191076   0   0  34822   0   669   391   0   6  94
 0  99  0     0  513392  7540  258720   0   0  33864   0   657   458   0   9  91
 0  99  0     0  447432  7608  322484   0   0  31916   0   639   417   0   6  93
 0  99  0     0  379816  7692  387836   0   0  32718   0   666   354   1   7  92
 0  99  0     0  308704  7784  456560   0   0  34408   0   673   410   0   7  93
 0  99  0     0  237428  7876  525444   0   0  34486   0   670   407   1   7  92
 0  99  0     0  171460  7988  589172   0   0  31920   0   630   387   0   6  93
 0  99  0     0  110328  8080  648240   0   0  29582   0   634   295   1   4  95
 0  99  0     0   41876  8172  714392   0   0  33118   0  1076   552   0  13  87
 0  99  1   376    2340  1612  760852   0   0  33836   6   661   313   0  14  86
 0  99  1   376    3312  1288  760240   0   0  29870   0  1018   595   1  11  88
 0  99  2   376    3252  1260  760428   0   0  30616  22   667   447   1  11  88
 0  99  0   376    3268  1296  760344   0   0  25324   0   647   414   0   8  91
 0  99  1   376    3316  1320  760140   0   0  28272   0   700   500   0   6  93
 0  99  0   376    2564  1352  760920   0   0  25960   0   758   455   1   7  91
 0  99  0   376    3276  1372  760236   0   0  19910   0   746   415   0   6  93
 0  99  0   376    3264  1404  760108   0   0  18396   0   824   454   1   7  92
 0  99  0   376    3272  1444  760056   0   0  16324   0   846   480   0   6  94
 0  99  0   376    3268  1472  760024   0   0  14414   0   886   493   0   6  94
 0  99  0   376    3304  1500  759964   0   0  10630   0   891   463   0   6  94
 0  99  0   376    3244  1532  759984   0   0   7198   0   913   449   0   6  94
 0  99  0   376    3276  1564  759924   0   0   4730   0   920   488   0   4  96
 0  99  0   376    3264  1568  759924   0   0   4102   0   966   479   0   6  94
 0  99  0   376    3332  1584  759872   0   0    942   0   936   486   0   4  96
 0  99  0   376    3288  1584  759884   0   0    902   0   948   464   1   4  94
 0  99  0   376    3268  1584  759904   0   0    970   0   956   499   0   3  97


--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-09 13:56:58

by Roy Sigurd Karlsbakk

Subject: Re: [BUG] Error reading multiple large files

> you really should try akpm's "[patch, CFT] improved disk read latency"
> patch. it sounds almost perfect for your application.

hi

It seemed to help at first, but after a while some 99 processes went
defunct and locked up. After this, the total 'bi' as reported by vmstat
went down to ~900 kB per sec.

What should I do? Run Windoze?

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-09 14:00:28

by Jens Axboe

Subject: Re: [BUG] Error reading multiple large files

On Wed, Jan 09 2002, Roy Sigurd Karlsbakk wrote:
> > you really should try akpm's "[patch, CFT] improved disk read latency"
> > patch. it sounds almost perfect for your application.
>
> hi
>
> It seemed to help at first, but after a while some 99 processes went
> defunct and locked up. After this, the total 'bi' as reported by vmstat
> went down to ~900 kB per sec.

Bad news for Andrew's patch; however, I really don't think it would have
helped you much in the first place. The problem seems to be down to
losing read-ahead when the cache ends up eating all available memory;
I've seen this effect myself too. Maybe the VM needs to be more
aggressive about tossing out pages when this happens. I'm quite sure
that would help tremendously for this workload.
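
The hypothesis above can be illustrated with a toy model. This is a sketch under stated assumptions and says nothing about how the 2.4 VM is actually implemented: the round-robin readers, the per-page read-ahead window, the oldest-first eviction and every name in the code are invented for illustration. It compares a cache that evicts its oldest page to make room for read-ahead with one that drops read-ahead requests once it is full.

```python
from collections import OrderedDict


def simulate(streams, cache_pages, window, rounds, evict_old):
    """Count blocking reads for round-robin sequential readers (toy model).

    evict_old=True:  a full cache evicts its oldest page to admit read-ahead.
    evict_old=False: a full cache refuses read-ahead pages, so read-ahead
                     is lost once the cache has filled up.
    """
    cache = OrderedDict()      # (stream, page) keys, oldest insertion first
    pos = [0] * streams        # next page each stream will consume
    blocking = 0

    def admit(key, required):
        """Insert a page, evicting the oldest one only when allowed."""
        if key in cache:
            return
        if len(cache) >= cache_pages:
            if not (required or evict_old):
                return         # no free page: the read-ahead is dropped
            cache.popitem(last=False)
        cache[key] = None

    for _ in range(rounds):
        for s in range(streams):
            page = pos[s]
            if (s, page) not in cache:
                blocking += 1  # the reader has to wait for the disk
                admit((s, page), required=True)
            for ahead in range(1, window + 1):
                admit((s, page + ahead), required=False)
            pos[s] += 1
    return blocking


if __name__ == "__main__":
    for policy in (True, False):
        n = simulate(streams=4, cache_pages=64, window=4, rounds=100,
                     evict_old=policy)
        print(f"evict_old={policy}: {n} blocking reads")
```

In this model the aggressive policy only blocks on the very first page of each stream, while the other policy behaves identically until the cache fills and then blocks on nearly every read, which is the shape of the slowdown being discussed.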

--
Jens Axboe

2002-01-09 14:03:48

by Rik van Riel

Subject: Re: [BUG] Error reading multiple large files

On Wed, 9 Jan 2002, Roy Sigurd Karlsbakk wrote:

> > you really should try akpm's "[patch, CFT] improved disk read latency"
> > patch. it sounds almost perfect for your application.

> It seemed to help at first, but after a while some 99 processes
> went defunct and locked up. After this, the total 'bi' as reported by
> vmstat went down to ~900 kB per sec.
>
> What should I do?

I've done a little bit of low memory testing with my -rmap
VM patch, the system seems to be working just fine with 8MB
of RAM ...

If you have the time, could you try the following patch ?

http://surriel.com/patches/2.4/2.4.17-rmap-11a


regards,

Rik
--
"Linux holds advantages over the single-vendor commercial OS"
-- Microsoft's "Competing with Linux" document

http://www.surriel.com/ http://distro.conectiva.com/

2002-01-09 14:04:38

by Roy Sigurd Karlsbakk

Subject: Re: [BUG] Error reading multiple large files

> > It seemed to help at first, but after a while some 99 processes went
> > defunct and locked up. After this, the total 'bi' as reported by vmstat
> > went down to ~900 kB per sec.
>
> Bad news for Andrew's patch; however, I really don't think it would have
> helped you much in the first place. The problem seems to be down to
> losing read-ahead when the cache ends up eating all available memory;
> I've seen this effect myself too. Maybe the VM needs to be more
> aggressive about tossing out pages when this happens. I'm quite sure
> that would help tremendously for this workload.

Thanks for answering. I'm really close to giving up and have already
started testing on *BSD unices.

It seems plausible that this (old pages not being tossed out) could be the
problem.

Thanks, guys

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-09 14:07:00

by Roy Sigurd Karlsbakk

Subject: Re: [BUG] Error reading multiple large files

> I've done a little bit of low memory testing with my -rmap
> VM patch, the system seems to be working just fine with 8MB
> of RAM ...
>
> If you have the time, could you try the following patch ?
>
> http://surriel.com/patches/2.4/2.4.17-rmap-11a

Do you think this applies in my case? I have 1GB of memory in the box (no highmem).

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-09 14:19:08

by MrChuoi

Subject: Re: [BUG] Error reading multiple large files

On Wednesday 09 January 2002 08:56 pm, Roy Sigurd Karlsbakk wrote:
> > you really should try akpm's "[patch, CFT] improved disk read latency"
> > patch. it sounds almost perfect for your application.
>
> hi
>
> It seemed to help at first, but after a while some 99 processes went
> defunct and locked up. After this, the total 'bi' as reported by vmstat
> went down to ~900 kB per sec.
>
> What should I do? Run Windoze?
Can Windoze do it any better? :-\

2002-01-09 15:45:26

by Roy Sigurd Karlsbakk

Subject: Re: [BUG] Error reading multiple large files (analyzing... ?)

> Bad news for Andrew's patch; however, I really don't think it would have
> helped you much in the first place. The problem seems to be down to
> losing read-ahead when the cache ends up eating all available memory;
> I've seen this effect myself too. Maybe the VM needs to be more
> aggressive about tossing out pages when this happens. I'm quite sure
> that would help tremendously for this workload.

I just wanted to say that I've tried this on 2.4.9 (which I believe predates
the new VM) with the same result. What's interesting is that the error shows
up not when the buffer memory is first used up, but seemingly after it's been
used _twice_ (or three times?).

Can this help anyone get a clue about what the h... is happening here?

The computer's got 1GB memory. Highmem is disabled.

roy

vmstat output:
---
    procs                      memory    swap         io      system         cpu
 r   b  w  swpd    free  buff   cache  si  so     bi  bo    in    cs  us  sy  id
 0 100  0     0  511448  2492  244828   0   0  29938   0  1970   882   1  15  84
 0 100  0     0  440652  2516  313320   0   0  34258   0   407   338   1   6  93
 0 100  0     0  367956  2608  383576   0   0  35164   0   433   355   0   6  93
 0 100  0     0  298868  2696  450348   0   0  34382   0   414   363   1   6  93
 0 100  0     0  229452  2876  517344   0   0  32620   0  1037   560   1  12  87
 0 100  0     0  158716  3040  585636   0   0  34238   0   429   353   0   7  93
 0 100  0     0   88372  3156  653596   0   0  34034   0   436   278   0   7  92
 0 100  0     0   19068  3204  720616   0   0  33550   0   444   219   0   9  91
 0 100  0     0    2560  3284  737580   0   0  38622   0   468  1282   1  24  75
 0 100  0     0    2768  3360  737340   0   0  36626   0   664  2763   1  23  76
 0 100  0     0    2816  3440  737120   0   0  35466   0   559  2262   1  18  81
 0 100  0     0    3056  3500  736716   0   0  36492   0   486  1963   1  19  80
 0 100  0     0    3056  3548  736708   0   0  39018   0   543  2101   1  14  86
 0 100  0     0    3056  3628  736508   0   0  36868   0   875  1707   1  17  83
 0 100  0     0    2856  3688  736728   0   0  29708   0   493  1793   0  14  86
 0 100  0     0    3056  3716  736448   0   0  32000  34   526  1888   1  16  83
 0 100  0     0    3056  3740  736448   0   0  29102   0   563  1400   1  14  85
 0 100  0     0    3056  3764  736396   0   0  24992   0   618  1266   1   9  90
 0 100  0     0    3056  3780  736384   0   0  24678   0   671  1184   1  13  86
 2  98  0     0    3004  3792  736464   0   0  20288   0  1071  1537   1  16  82
 0 100  0     0    3008  3804  736376   0   0  14620   0   932   812   0  10  90
 0 100  0     0    3008  3804  736372   0   0   6776   0  1122   726   1   8  90
 0 100  0     0    3056  3812  736328   0   0   4326   0  1195   714   0   6  93
 0 100  0     0    3056  3812  736352   0   0   1826   0  1142   623   0   4  96
 0 100  0     0    3056  3812  736364   0   0   1026   0  1093   563   0   5  95
 0 100  0     0    3056  3812  736364   0   0   1058   0  1114   567   1   3  96
 0 100  0     0    3056  3812  736360   0   0   1066   0  1124   586   0   3  96
 0 100  0     0    3056  3816  736356   0   0   2474   0  1112   574   0   4  96
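
The "used _twice_" impression can be checked roughly against these numbers. The sketch below rests on several assumptions that the output itself does not state: that 'bi' is in kB/s and the memory columns in kB, that while memory is still free every block read ends up in the cache (so cache growth divided by 'bi' gives the sampling interval), and that "full" and "collapsed" can be defined by the arbitrary thresholds in the code.

```python
# (free, cache, bi) copied from the vmstat rows above, in order.
SAMPLES = [
    (511448, 244828, 29938), (440652, 313320, 34258), (367956, 383576, 35164),
    (298868, 450348, 34382), (229452, 517344, 32620), (158716, 585636, 34238),
    (88372, 653596, 34034), (19068, 720616, 33550), (2560, 737580, 38622),
    (2768, 737340, 36626), (2816, 737120, 35466), (3056, 736716, 36492),
    (3056, 736708, 39018), (3056, 736508, 36868), (2856, 736728, 29708),
    (3056, 736448, 32000), (3056, 736448, 29102), (3056, 736396, 24992),
    (3056, 736384, 24678), (3004, 736464, 20288), (3008, 736376, 14620),
    (3008, 736372, 6776), (3056, 736328, 4326), (3056, 736352, 1826),
    (3056, 736364, 1026), (3056, 736364, 1058), (3056, 736360, 1066),
    (3056, 736356, 2474),
]

FREE, CACHE, BI = 0, 1, 2
LOW_FREE_KB = 10_000  # assumed threshold for "memory is full"


def analyse(samples):
    """Return (seconds per sample, cache sizes read between full and collapse)."""
    # While memory is still free, cache growth / bi approximates elapsed time.
    filling = [i for i in range(1, len(samples))
               if samples[i][FREE] > LOW_FREE_KB]
    interval = sum(
        (samples[i][CACHE] - samples[i - 1][CACHE]) / samples[i][BI]
        for i in filling
    ) / len(filling)

    # First sample with (almost) no free memory left.
    full = next(i for i, s in enumerate(samples) if s[FREE] <= LOW_FREE_KB)

    # First later sample whose bi is below 10% of the overall peak.
    peak = max(s[BI] for s in samples)
    collapse = next(i for i, s in enumerate(samples)
                    if i > full and s[BI] < 0.1 * peak)

    read_kb = interval * sum(s[BI] for s in samples[full + 1:collapse])
    return interval, read_kb / samples[full][CACHE]


if __name__ == "__main__":
    interval, turnovers = analyse(SAMPLES)
    print(f"~{interval:.1f} s per sample, "
          f"~{turnovers:.1f} cache sizes read after memory filled")
```

Under these assumptions the samples are about two seconds apart, and roughly one further cache size worth of data is read between memory filling up and the collapse, which is consistent with the cache having been filled about twice.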


--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-09 18:03:36

by Roy Sigurd Karlsbakk

Subject: Re: [BUG] Error reading multiple large files :-)

Thanks, guys!

The -rmap patch actually solved the problem, and even gave me a little
increase in read speed as a bonus.

Is it planned for merging into 2.4?

roy

On Wed, 9 Jan 2002, Rik van Riel wrote:

> On Wed, 9 Jan 2002, Roy Sigurd Karlsbakk wrote:
>
> > > you really should try akpm's "[patch, CFT] improved disk read latency"
> > > patch. it sounds almost perfect for your application.
>
> > It seemed to help at first, but after a while some 99 processes
> > went defunct and locked up. After this, the total 'bi' as reported by
> > vmstat went down to ~900 kB per sec.
> >
> > What should I do?
>
> I've done a little bit of low memory testing with my -rmap
> VM patch, the system seems to be working just fine with 8MB
> of RAM ...
>
> If you have the time, could you try the following patch ?
>
> http://surriel.com/patches/2.4/2.4.17-rmap-11a
>
>
> regards,
>
> Rik
> --
> "Linux holds advantages over the single-vendor commercial OS"
> -- Microsoft's "Competing with Linux" document
>
> http://www.surriel.com/ http://distro.conectiva.com/
>

--
Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA

Computers are like air conditioners.
They stop working when you open Windows.

2002-01-10 15:38:58

by Andrea Arcangeli

Subject: Re: [BUG] Error reading multiple large files

On Mon, Jan 07, 2002 at 07:45:57PM +0100, Roy Sigurd Karlsbakk wrote:
> Hi all
>
> I've sent this before, but as far as I can see, nothing's changed.
>
> I'm having problems reading multiple large files at once, for example 100
> 1GB files concurrently.
>
> What happens is, when the buffer cache gets filled up, it all stalls, and
> transfer speed drops from 40-50 MB/s to a mere 2MB/s.
>
> This has been tested on all versions from 2.4.16 through 2.4.18-pre1.

Please try to reproduce this on 2.4.18pre2aa2:

ftp://ftp.us.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.18pre2aa2.bz2

>
> I've been testing with Tux, Khttpd, Apache 1.3.22, Apache 2, thttpd, cp and
> dd to verify the bug.

The latest version of Tux is already included in -aa. Please don't apply any
incremental patches before testing 18pre2aa2, so that we can be sure the
problem is not introduced by some other patch.

>
> Please help!
>
> roy
>
> --
> Roy Sigurd Karlsbakk, MCSE, MCNE, CLS, LCA
>
> Computers are like air conditioners.
> They stop working when you open Windows.
>


Andrea