2002-10-03 20:38:12

by Andrew Morton

Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

Dave Jones wrote:
>
> On Thu, Oct 03, 2002 at 08:57:13AM -0700, Linus Torvalds wrote:
>
> > The memory management issues would qualify for 3.0, but my argument there
> > is really that I doubt everybody really is happy yet. Which was why I
> > asked for people to test it and complain about VM behaviour - and we've
> > had some complaints ("too swap-happy") although they haven't sounded like
> > really horrible problems.
>
> We still need some work for low memory boxes (where low isn't
> necessarily all that low). On my 128MB laptop I can lock up the box
> for a minute or two at a time by doing two things at the same time,
> like a bk pull, and switching desktops.

Specific version info and all the usual how-to-reproduce info
would help here. Things have changed a _lot_ in the past
week or two.

Comparisons with 2.4 are useful. Simple "here's how to
reproduce" instructions are 100% golden ;)

> I dread to think how a 16 or 32MB box performs these days..

Well last I looked, a 2.5 kernel with NR_CPUS=8 had 22MB
of unreclaimable memory by the time it reached the console
login prompt.

Yet John Bradford says that in swapless 8MB, 2.5.40 is "springier"
than 2.4.x, so weird.

Jens did some aggressive scaling work against the BIO pools
recently which saved a ton of memory, and 2.5.40 now consumes
slightly less than 2.4.x to get started.

But the major thing we can do for the tiny boxes is to scale back
much harder on the big caches, the mempools, etc. I hope to
be able to remove the radix-tree and pte_chain mempools altogether,
which will free up a quarter meg or so.


Apart from that, I'm reasonably happy with where the VM stands at
present. It's very simple, very fast to identify which pages to
replace, and pretty accurate and efficient at doing that.

It should be immune to our traditional catastrophic failure
scenarios, and that's something which we want to keep. There are
some ten- or twenty-percent regressions in some areas, but at this
time that's a reasonable price to pay for not locking up, not having
five-minute comas, not exhibiting massive stalls when there's a
lot of disk writeout, etc. I think history teaches us to value
simplicity, predictability and robustness over performance-in-corner-cases.

There are some OOM problems on really big highmem machines which
still need investigation. I expect they can be largely cleaned
up by making the throttling be per-zone rather than global. Which
would complete the migration of the VM to being a per-zone thing.
Zone fallbacks then become known only to the page allocator and
the VM proper only cares for individual zones.
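To picture why global throttling can miss trouble that per-zone throttling would catch, here is a minimal C sketch. Everything in it is a hypothetical illustration, not kernel code: struct zone_sketch, zone_should_throttle(), global_should_throttle(), alloc_pick_zone() and the "more than 25% of the zone under writeback" threshold are all invented for the example. It shows a system-wide writeback ratio that looks harmless while one small zone is saturated, and keeps zone fallback inside the allocator alone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified zone; NOT the kernel's struct zone. */
struct zone_sketch {
    const char *name;
    unsigned long free_pages;
    unsigned long pages_min;     /* allocator won't dip below this */
    unsigned long nr_writeback;  /* pages currently under writeback */
    unsigned long present_pages;
};

/* Per-zone throttle (hypothetical threshold): throttle a reclaimer only
 * if *this* zone has more than 25% of its pages under writeback. */
static bool zone_should_throttle(const struct zone_sketch *z)
{
    assert(z->present_pages > 0);
    return z->nr_writeback * 4 > z->present_pages;
}

/* Global throttle, for contrast: sums every zone, so a saturated small
 * zone can be hidden by a large, idle one. */
static bool global_should_throttle(const struct zone_sketch *zones, size_t n)
{
    unsigned long wb = 0, present = 0;
    for (size_t i = 0; i < n; i++) {
        wb += zones[i].nr_writeback;
        present += zones[i].present_pages;
    }
    assert(present > 0);
    return wb * 4 > present;
}

/* Zone fallback known only to the allocator: walk the zones in
 * preference order and take the first with pages above its minimum. */
static struct zone_sketch *alloc_pick_zone(struct zone_sketch *zones, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (zones[i].free_pages > zones[i].pages_min)
            return &zones[i];
    return NULL; /* caller would then reclaim/throttle zone by zone */
}
```

With a small highmem zone that has 30% of its pages under writeback next to a large, quiet normal zone, the global check says "don't throttle" while the per-zone check flags the highmem zone; the VM proper only ever looks at one zone at a time.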

The reverse map was a huge conceptual cleanup. It trumped a
whole class of nasty, fallible when-to-unmap decision making
logic.
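The core idea of the reverse map is that each physical page records every PTE that maps it (the pte_chain), so unmapping a page means walking that list rather than guessing when to scan every process's page tables. The C sketch below is a deliberately simplified model under stated assumptions: page_sketch, pte_chain_sketch, page_add_rmap_sketch() and try_to_unmap_sketch() are hypothetical names, a "PTE" is just a slot holding a page pointer, and the real 2.5 pte_chain code differs in detail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: a "PTE" is a slot that points at a page (or NULL). */
struct page_sketch;
typedef struct page_sketch *pte_sketch_t;

/* One link in a page's reverse-map chain: which PTE maps this page. */
struct pte_chain_sketch {
    pte_sketch_t *ptep;
    struct pte_chain_sketch *next;
};

struct page_sketch {
    struct pte_chain_sketch *chain; /* every PTE currently mapping the page */
    int mapcount;
};

/* Map the page through *ptep and record that PTE in the page's chain.
 * The caller supplies the chain node, so no allocation happens here. */
static void page_add_rmap_sketch(struct page_sketch *page,
                                 struct pte_chain_sketch *pc,
                                 pte_sketch_t *ptep)
{
    assert(page != NULL && pc != NULL && ptep != NULL);
    *ptep = page;
    pc->ptep = ptep;
    pc->next = page->chain;
    page->chain = pc;
    page->mapcount++;
}

/* Unmap the page from all of its mappers by walking the chain directly;
 * no page-table scanning is needed. Returns how many PTEs were cleared. */
static int try_to_unmap_sketch(struct page_sketch *page)
{
    int unmapped = 0;
    for (struct pte_chain_sketch *pc = page->chain; pc != NULL; pc = pc->next) {
        *pc->ptep = NULL;
        unmapped++;
    }
    page->chain = NULL;
    page->mapcount = 0;
    return unmapped;
}
```

The cost is memory for the chain nodes, which is why trimming the pte_chain mempool matters on small boxes.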

Yeah, it swaps a lot. It's the use-it-or-lose-it VM, and it's
mean. People (damn them) don't like that.

Right now, I am rather disinclined to fix this via algorithmic changes,
by twiddling with aging-of-mapped-memory versus aging-of-pagecache,
or anything like that. Because any such algorithmic change tends to
unbalance things, and to cause incorrect latency under sudden load
changes which could cause false OOM failures, or excessive CPU burn.

What I'm more inclined to do is to leave things conceptually unchanged,
and to bolt a really obvious, bloody great ugly knob on the side;
maybe something as simple as:

if (mapped_memory / total_memory < sysctl_the_user_is_a_wimp)
        only_reclaim_pagecache();

We shall see...
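A slightly more concrete version of that knob, as a minimal C sketch: sysctl_the_user_is_a_wimp is taken from the pseudocode above, but treating it as a percentage and the helper name should_only_reclaim_pagecache() are assumptions made for illustration, not kernel code. The percentage form matters because in integer arithmetic mapped_memory / total_memory truncates to 0 whenever mapped memory is less than total memory, so the comparison is done by cross-multiplying instead.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tunable from the pseudocode, assumed to be a percentage
 * (0..100). 0 disables the knob entirely. */
static unsigned int sysctl_the_user_is_a_wimp = 50;

/* Returns true when mapped pages make up less than the configured
 * percentage of memory, i.e. when reclaim should touch only pagecache
 * and leave mapped (process) pages alone.
 * Tests mapped/total < wimp/100 as mapped*100 < wimp*total, widened to
 * unsigned long long so the products cannot overflow on 32-bit. */
static bool should_only_reclaim_pagecache(unsigned long mapped_pages,
                                          unsigned long total_pages)
{
    assert(total_pages > 0);
    assert(sysctl_the_user_is_a_wimp <= 100);
    return (unsigned long long)mapped_pages * 100ULL <
           (unsigned long long)sysctl_the_user_is_a_wimp * total_pages;
}
```

With the knob at 50, a box whose memory is 10% mapped would reclaim only pagecache, while one that is 60% mapped would fall back to normal reclaim (including swapping mapped pages). The appeal is that the underlying aging algorithm stays untouched.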


2002-10-03 21:57:56

by Dave Jones

Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, Oct 03, 2002 at 01:43:33PM -0700, Andrew Morton wrote:

> > > The memory management issues would qualify for 3.0, but my argument there
> > > is really that I doubt everybody really is happy yet. Which was why I
> > > asked for people to test it and complain about VM behaviour - and we've
> > > had some complaints ("too swap-happy") although they haven't sounded like
> > > really horrible problems.
> >
> > We still need some work for low memory boxes (where low isn't
> > necessarily all that low). On my 128MB laptop I can lock up the box
> > for a minute or two at a time by doing two things at the same time,
> > like a bk pull, and switching desktops.
>
> Specific version info and all the usual how-to-reproduce info
> would help here. Things have changed a _lot_ in the past
> week or two.

That was 2.5.39 + bk from just before .40 hit the streets.
I'll pull something current in a tick, and give that a shot.

> Comparisons with 2.4 are useful. Simple "here's how to
> reproduce" instructions are 100% golden ;)

There's usually not too much going on on the laptop.
It runs enlightenment + gnome 1.4. A few gnome-terminals,
and that's about it. After bitkeeper had sucked down a few
changesets and started its "let's grind the disk for a while"
consistency thing, interactive feel is approaching nil.
Trying to focus a different window takes about 5 seconds minimum.
Switching desktops takes 30 seconds minimum.

My completely unscientific guess here is that bitkeeper is
whoring all the I/O bandwidth, and we're trying to swap at
the same time, which is getting starved.
I'll try and reproduce after some sleep with vmstat running
if this will be of use.

> > I dread to think how a 16 or 32MB box performs these days..
> Well last I looked, a 2.5 kernel with NR_CPUS=8 had 22MB
> of unreclaimable memory by the time it reached the console
> login prompt.

Ouch.

> Yet John Bradford says that in swapless 8MB, 2.5.40 is "springier"
> than 2.4.x, so weird.

Depends on what the tests are, I suppose. "Springier" doesn't
really say too much. We do minimise memory usage in a few
places if mem<16M though, iirc, which could be helping this case.

> It should be immune to our traditional catastrophic failure
> scenarios, and that's something which we want to keep. There are
> some ten- or twenty-percent regressions in some areas, but at this
> time that's a reasonable price to pay for not locking up, not having
> five-minute comas, not exhibiting massive stalls when there's a
> lot of disk writeout, etc. I think history teaches us to value
> simplicity, predictability and robustness over performance-in-corner-cases.

Hmm, my case seems to be everything you say should not be happening
any more. Sorry 8-)
I *can't* be the only one seeing this though.
The laptop disk is no speed demon, but it's quite nippy at ~12MB/s.
For obvious reasons, having swap and / on the same disk is making
a considerable impact here.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk

2002-10-04 02:28:17

by Andreas Boman

Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

On Thu, 2002-10-03 at 17:05, Dave Jones wrote:
<SNIP>
> > > We still need some work for low memory boxes (where low isn't
> > > necessarily all that low). On my 128MB laptop I can lock up the box
> > > for a minute or two at a time by doing two things at the same time,
> > > like a bk pull, and switching desktops.
> >
> > Specific version info and all the usual how-to-reproduce info
> > would help here. Things have changed a _lot_ in the past
> > week or two.
> That was 2.5.39 + bk from just before .40 hit the streets.
> I'll pull something current in a tick, and give that a shot.
>
> > Comparisons with 2.4 are useful. Simple "here's how to
> > reproduce" instructions are 100% golden ;)
Usually it's difficult with these 'feeling' issues, though...

> There's usually not too much going on on the laptop.
> It runs enlightenment + gnome 1.4. A few gnome-terminals,
> and that's about it. After bitkeeper had sucked down a few
> changesets and started its "let's grind the disk for a while"
> consistency thing, interactive feel is approaching nil.
> Trying to focus a different window takes about 5 seconds minimum.
> Switching desktops takes 30 seconds minimum.
>
> My completely unscientific guess here is that bitkeeper is
> whoring all the I/O bandwidth, and we're trying to swap at
> the same time, which is getting starved.
> I'll try and reproduce after some sleep with vmstat running
> if this will be of use.
>
<SNIP>
>
> > It should be immune to our traditional catastrophic failure
> > scenarios, and that's something which we want to keep. There are
> > some ten- or twenty-percent regressions in some areas, but at this
> > time that's a reasonable price to pay for not locking up, not having
> > five-minute comas, not exhibiting massive stalls when there's a
> > lot of disk writeout, etc. I think history teaches us to value
> > simplicity, predictability and robustness over performance-in-corner-cases.
>
> Hmm, my case seems to be everything you say should not be happening
> any more. Sorry 8-)
> I *can't* be the only one seeing this though.
You aren't ;)

> The laptop disk is no speed demon, but it's quite nippy at ~12MB/s
> For obvious reasons, having swap and / on the same disk is making
> a considerable impact here.
> Dave

I'm seeing similar behavior, though not to the extent you describe. 512MB
RAM, ~600MB or so of swap on a U160 SCSI disk (only one disk in the box;
definitely need one more).

I run rpm -ba mozilla.spec, and while it's untarring/gunzipping I keep
switching desktops ("edge flip" in E) between one with a few Eterms and
misc stuff and one with Mozilla. At first it behaves fine, but eventually
the mouse pointer starts jerking around and it's slightly slower to
switch; a little later the swapping starts and xmms skips (sometimes
just once, other times repeatedly). Once the untarring is done and the
build starts, the box becomes responsive again.

Doing the same thing on 2.4.20-pre5aa2, xmms never skipped. Starting a
build of Mozilla and Evolution at the same time: still no skipping.
Dropping xmms and playing a music video in MPlayer: still no skipping. I
could even move the MPlayer output window back and forth between the
desktops repeatedly without sound skips (although I didn't move it
around as fast as when I was just switching desktops). Video playback
did freeze up a bit and left funky trails across the Mozilla page at
times, but sound didn't skip. Sound started skipping when I had Mozilla
and Evolution both untarring/gunzipping and I moved the window around
madly between heads and desktops.

The attached vmstat 1 output from 2.5.40 covers the period from when the
build had just started until a little after I killed it (once it had
untarred and started ./configure). For a little while after I kill it I
see some more I/O, and then the box just idles again.

--
Andreas Boman


Attachments:
vmstat-2.5 (13.27 kB)
signature.asc (232.00 B)
This is a digitally signed message part

2002-10-04 07:31:53

by jbradford

Subject: Re: [OT] 2.6 not 3.0 - (WAS Re: [PATCH-RFC] 4 of 4 - New problem logging macros, SCSI RAIDdevice)

> > Yet John Bradford says that in swapless 8MB, 2.5.40 is "springier"
> > than 2.4.x, so weird.
>
> Depends on what the tests are, I suppose. "springier" doesn't
> really say too much. We do minimise memory usage in a few
> places if mem<16M though iirc which could be helping this case.

Well, I've got the following:

486 SX-25 laptop with 8MB RAM, no swap, running 2.5.40 and also 2.4.19.
486 SX-20 laptop with 4MB RAM, 20MB swap, running 2.2.21 and 2.2.13.

Both are capable of running the latest Apache with PHP support, and Lynx, at a usable speed (I use the 8MB RAM machine for debugging small bits of PHP while I'm on the Tube going up to London :-) ).

I know "feels springier" isn't very helpful, but what benchmarks do you expect me to run on machines with 120MB hard disks? :-) Suggest something and I'll give it a go. It's not really faster, just more responsive (e.g. running updatedb and using jed at the same time is better in 2.5.x).

By the way, I've got X11 running on the 4 meg one, and it's quite usable. I have even demoed a graphical browser accessing the local Apache, serving PHP content.

If anybody doesn't believe me, come along to Linux Expo UK next week, and see for yourselves :-).

John.