On Friday 31 October 2003 3:57 am, Rik van Riel wrote:
> On Wed, 29 Oct 2003, Chris Vine wrote:
> > However, on a low end machine (200MHz Pentium MMX uniprocessor with only
> > 32MB of RAM and 70MB of swap) I get poor performance once extensive use
> > is made of the swap space.
>
> Could you try the patch Con Kolivas posted on the 25th ?
>
> Subject: [PATCH] Autoregulate vm swappiness cleanup
OK. I have now done some testing.
The default swappiness in the kernel (without Con's patch) is 60. This gives
hopeless swapping results on a 200MHz Pentium with 32MB of RAM once the
amount of memory swapped out exceeds about 15 to 20MB. A static swappiness
of 10 gives results which work under load, with up to 40MB swapped out (I
haven't tested beyond that). Compile times with a test file requiring about
35MB of swap and with everything else the same are:
2.4.22 - average of 1 minute 35 seconds
2.6.0-test9 (swappiness 10) - average of 5 minutes 56 seconds
A swappiness of 5 on the test compile causes the machine to hang in some kind
of "won't swap/can't continue without more memory" stand-off, and a
swappiness of 20 starts the machine thrashing to the point where I stopped
the compile. A swappiness of 10 would complete anything I threw at it and
without excessive thrashing, but more slowly (and using a little more swap
space) than 2.4.22.
With Con's dynamic swappiness patch things were worse, rather than better.
With no load, the swappiness (now read-only) was around 37. Under load with
the test compile, swappiness went up to around 62, thrashing began, and after
30 minutes the compile still had not completed, swappiness had reached 70,
and I abandoned it.
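For reference, as I understand it the knob feeds the 2.6 reclaim decision
roughly like this (a paraphrase from a reading of mm/vmscan.c, with percentages
as inputs; not the exact kernel source):

```python
def reclaim_mapped(mapped_ratio, distress, swappiness):
    """Decide whether reclaim should go after mapped (in-use) pages.

    Paraphrased sketch of 2.6's refill_inactive_zone() logic; all three
    inputs are percentages in 0-100.  Not the exact kernel source.
    """
    swap_tendency = mapped_ratio // 2 + distress + swappiness
    return swap_tendency >= 100

# On a 32MB box nearly all memory is mapped by running tasks, so the
# default swappiness of 60 tips the balance toward evicting working sets:
print(reclaim_mapped(90, 0, 60))  # True  -> swap out mapped pages
print(reclaim_mapped(90, 0, 10))  # False -> leave working sets resident
```

This is consistent with what I see: with almost all of 32MB mapped, a
swappiness of 60 pushes the sum past 100 and the working sets get evicted,
while 10 keeps it below.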
Chris.
On Mon, 3 Nov 2003 10:06, Chris Vine wrote:
> On Friday 31 October 2003 3:57 am, Rik van Riel wrote:
> > On Wed, 29 Oct 2003, Chris Vine wrote:
> > > However, on a low end machine (200MHz Pentium MMX uniprocessor with
> > > only 32MB of RAM and 70MB of swap) I get poor performance once
> > > extensive use is made of the swap space.
> >
> > Could you try the patch Con Kolivas posted on the 25th ?
> >
> > Subject: [PATCH] Autoregulate vm swappiness cleanup
>
> OK. I have now done some testing.
>
> The default swappiness in the kernel (without Con's patch) is 60. This
> gives hopeless swapping results on a 200MHz Pentium with 32MB of RAM once
> the amount of memory swapped out exceeds about 15 to 20MB. A static
> swappiness of 10 gives results which work under load, with up to 40MB
> swapped out (I haven't tested beyond that). Compile times with a test file
> requiring about 35MB of swap and with everything else the same are:
>
> 2.4.22 - average of 1 minute 35 seconds
> 2.6.0-test9 (swappiness 10) - average of 5 minutes 56 seconds
>
> A swappiness of 5 on the test compile causes the machine to hang in some
> kind of "won't swap/can't continue without more memory" stand-off, and a
> swappiness of 20 starts the machine thrashing to the point where I stopped
> the compile. A swappiness of 10 would complete anything I threw at it and
> without excessive thrashing, but more slowly (and using a little more swap
> space) than 2.4.22.
>
> With Con's dynamic swappiness patch things were worse, rather than better.
> With no load, the swappiness (now read only) was around 37. Under load
> with the test compile, swappiness went up to around 62, thrashing began,
> and after 30 minutes the compile still had not completed, swappiness had
> reached 70, and I abandoned it.
Well, I was considering adding swap pressure to this algorithm, but I had
hoped 2.6 behaved better than this under swap overload, which appears to be
what is happening on your machine. Can you try this patch? It takes swap
pressure into account as well. It won't be as aggressive as manually setting
the swappiness to 10, but unlike a swappiness of 10 it should be useful over a
wide range of hardware and circumstances.
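Roughly, the idea is this (a toy sketch only; the function and the scaling
here are made up for illustration, not the patch code): derive the swappiness
from how much memory is mapped, and pull it back down as swap fills up:

```python
def auto_swappiness(mapped_ratio, swap_used_pct):
    """Toy model of autoregulated swappiness (illustrative only, not the
    patch code).  Both inputs are percentages in 0-100."""
    s = mapped_ratio                       # more mapped memory -> swap harder
    s = s * (100 - swap_used_pct) // 100   # swap pressure pulls it back down
    return max(0, min(100, s))

print(auto_swappiness(37, 0))   # idle box: 37
print(auto_swappiness(70, 50))  # loaded, half the swap in use: 35
```

The point is the second term: without it, swappiness only ever tracks how much
memory is mapped, which is what drove it upwards in your earlier test.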
Con
P.S. patches available here: http://ck.kolivas.org/patches
On Monday 03 November 2003 12:48 am, Con Kolivas wrote:
> Well, I was considering adding swap pressure to this algorithm, but I had
> hoped 2.6 behaved better than this under swap overload, which appears to be
> what is happening on your machine. Can you try this patch? It takes swap
> pressure into account as well. It won't be as aggressive as manually
> setting the swappiness to 10, but unlike a swappiness of 10 it should be
> useful over a wide range of hardware and circumstances.
Hi,
I applied the patch.
The test compile started in a similar way to the compile when using your first
patch. Swappiness under no load was 37. At the beginning of the compile it
went up to 67, but when thrashing was well established it started to come
down slowly. After 40 minutes of thrashing it came down to 53. At that
point I stopped the compile attempt (which did not complete).
So, there is a slight move in the right direction, but given that a swappiness
of 20 generates thrashing with 32 MB of RAM when more than about 20MB of
memory is swapped out, it is a drop in the ocean.
The conclusion appears to be that on low-end systems, once the memory swapped
out reaches about 60% of installed RAM, swap ceases to work effectively unless
swappiness is much more aggressively low than your patch achieves. The
ability to tune it manually therefore seems to be required (and even then,
2.4.22 is considerably better, compiling the test file in about 1 minute 35
seconds).
I suppose one question is whether I would get the same thrashiness with my
other machine (which has 512MB of RAM) once more than about 300MB is swapped
out. However, I cannot answer that question as I do not have anything here
which makes memory demands of that kind.
Chris.
On Tue, 4 Nov 2003 08:13, Chris Vine wrote:
> On Monday 03 November 2003 12:48 am, Con Kolivas wrote:
> > Well I was considering adding the swap pressure to this algorithm but I
> > had hoped 2.6 behaved better than this under swap overload which is what
> > appears to happen to yours. Can you try this patch? It takes into account
> > swap pressure as well. It wont be as aggressive as setting the swappiness
> > manually to 10, but unlike a swappiness of 10 it will be more useful over
> > a wide range of hardware and circumstances.
>
> The test compile started in a similar way to the compile when using your
> first patch. Swappiness under no load was 37. At the beginning of the
> compile it went up to 67, but when thrashing was well established it
> started to come down slowly. After 40 minutes of thrashing it came down to
> 53. At that point I stopped the compile attempt (which did not complete).
>
> So, there is a slight move in the right direction, but given that a
> swappiness of 20 generates thrashing with 32 MB of RAM when more than about
> 20MB of memory is swapped out, it is a drop in the ocean.
>
> The conclusion appears to be that on low-end systems, once the memory
> swapped out reaches about 60% of installed RAM, swap ceases to work
> effectively unless swappiness is much more aggressively low than your patch
> achieves. The ability to tune it manually therefore seems to be required
> (and even then, 2.4.22 is considerably better, compiling the test file in
> about 1 minute 35 seconds).
>
> I suppose one question is whether I would get the same thrashiness with my
> other machine (which has 512MB of RAM) once more than about 300MB is
> swapped out. However, I cannot answer that question as I do not have
> anything here which makes memory demands of that kind.
That's pretty much what I expected. Overall I'm happier with this later
version as it doesn't harm the noticeable improvement on systems that are not
overloaded, yet keeps performance at least at that of the untuned version. I
can tune it to be better for this workload, but it would be to the detriment
of the rest.
Ultimately this is the problem I see with 2.6: there is no way for the vm to
know that "all the pages belonging to the currently running tasks should try
their best to fit into the available space by getting an equal share". It
seems the 2.6 vm gives nice emphasis to the most current task, but to the
detriment of other tasks that are on the runqueue and still need ram. The
original design of the 2.6 vm didn't even include this last-ditch effort at
taming swappiness with the "knob", and behaved as though the swappiness was
always set at 100. Trying to tune this further with just the swappiness value
will prove futile, as can be seen by the "best" setting of 10 in your test
case still taking 4 times longer to compile the kernel.
This is now a balancing act: trying to set a value that works for your
combination of the ram required by the applications you run concurrently, the
physical ram and the swap space. As you can see from your example, in your
workload there would seem to be no point in having more swap than physical
ram, since even if it tries to use say 40MB it just drowns in a swapstorm.
Clearly this is not the case on a machine with more ram in different
circumstances, as swapping out say openoffice and mozilla while they're not
being used will not cause any harm to a kernel compile that takes up all the
available physical ram (it would actually be beneficial). Fortunately, on
most modern machines the balance of ram versus application sizes is of the
latter kind.
There's always so much more you can do...
wli, riel care to comment?
Cheers,
Con
On Tuesday 04 November 2003 2:55 am, Con Kolivas wrote:
> That's pretty much what I expected. Overall I'm happier with this later
> version as it doesn't harm the noticeable improvement on systems that are
> not overloaded, yet keeps performance at least at that of the untuned
> version. I can tune it to be better for this workload, but it would be to
> the detriment of the rest.
>
> Ultimately this is the problem I see with 2.6: there is no way for the vm
> to know that "all the pages belonging to the currently running tasks should
> try their best to fit into the available space by getting an equal share".
> It seems the 2.6 vm gives nice emphasis to the most current task, but to
> the detriment of other tasks that are on the runqueue and still need ram.
> The original design of the 2.6 vm didn't even include this last-ditch
> effort at taming swappiness with the "knob", and behaved as though the
> swappiness was always set at 100. Trying to tune this further with just the
> swappiness value will prove futile, as can be seen by the "best" setting of
> 10 in your test case still taking 4 times longer to compile the kernel.
>
> This is now a balancing act: trying to set a value that works for your
> combination of the ram required by the applications you run concurrently,
> the physical ram and the swap space. As you can see from your example, in
> your workload there would seem to be no point in having more swap than
> physical ram, since even if it tries to use say 40MB it just drowns in a
> swapstorm. Clearly this is not the case on a machine with more ram in
> different circumstances, as swapping out say openoffice and mozilla while
> they're not being used will not cause any harm to a kernel compile that
> takes up all the available physical ram (it would actually be beneficial).
> Fortunately, on most modern machines the balance of ram versus application
> sizes is of the latter kind.
Your diagnosis looks right, but two points:
1. The test compile was not of the kernel but of a file in a C++ program
which uses quite a lot of templates and is therefore quite memory-intensive
(for the sake of choosing something, it was a compile of src/main.o in
http://www.cvine.freeserve.co.uk/efax-gtk/efax-gtk-2.2.2.src.tgz). It would
be a sad day if the kernel could not be compiled under 2.6 in 32MB of memory,
and I am glad to say that it does compile - a 2.6.0-test9 kernel builds on
the 32MB machine in on average 45 minutes 13 seconds when running kernel
2.4.22, and in 54 minutes 11 seconds when running 2.6.0-test9 with your
latest patch, which is not an enormous difference. (As a digression, in the
2.0 days the kernel would compile in 6 minutes on the machine in question,
and at the time I was very impressed.)
2. Being able to choose a manual setting for swappiness is not "futile". As
I mentioned in an earlier post, a swappiness of 10 will enable 2.6.0-test9 to
compile the things I threw at it on a low end machine, albeit slowly, whereas
with dynamic swappiness it would not compile at all. So the difference is
between being able to do something and not being able to do it.
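For anyone wanting to try the same tuning: the knob lives at
/proc/sys/vm/swappiness in 2.6. A small helper along these lines (the helper
itself is just for illustration) reads and writes it:

```python
from pathlib import Path

SWAPPINESS = Path("/proc/sys/vm/swappiness")  # the 2.6 sysctl file

def get_swappiness(path=SWAPPINESS):
    """Read the current swappiness value (0-100)."""
    return int(path.read_text())

def set_swappiness(value, path=SWAPPINESS):
    """Write a new swappiness value; needs root on a real system."""
    if not 0 <= value <= 100:
        raise ValueError("swappiness must be 0-100")
    path.write_text(f"{value}\n")
```

Equivalently, `echo 10 > /proc/sys/vm/swappiness` as root, which is all I did
for the tests above.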
Chris.
On Wed, 5 Nov 2003 09:08, Chris Vine wrote:
> Your diagnosis looks right, but two points -
>
> 1. The test compile was not of the kernel but of a file in a C++ program
> which uses quite a lot of templates and is therefore quite memory-intensive
> (for the sake of choosing something, it was a compile of src/main.o in
> http://www.cvine.freeserve.co.uk/efax-gtk/efax-gtk-2.2.2.src.tgz). It
> would be a sad day if the kernel could not be compiled under 2.6 in 32MB of
> memory, and I am glad to say that it does compile - a 2.6.0-test9 kernel
> builds on the 32MB machine in on average 45 minutes 13 seconds when running
> kernel 2.4.22, and in 54 minutes 11 seconds when running 2.6.0-test9 with
> your latest patch, which is not an enormous difference. (As a digression,
> in the 2.0 days the kernel would compile in 6 minutes on the machine in
> question, and at the time I was very impressed.)
Phew. It would be sad if it couldn't compile a kernel indeed.
>
> 2. Being able to choose a manual setting for swappiness is not "futile".
> As I mentioned in an earlier post, a swappiness of 10 will enable
> 2.6.0-test9 to compile the things I threw at it on a low end machine,
> albeit slowly, whereas with dynamic swappiness it would not compile at all.
> So the difference is between being able to do something and not being able
> to do it.
I agree with you on that; I meant it would be futile to try to get the compile
times back to 2.4 levels by modifying this tunable alone (statically or
dynamically)... which means we should look elsewhere for ways to tackle this.
Con