2002-10-25 03:20:36

by James Cleverdon

Subject: Kswapd madness in 2.4 kernels

Folks,

We have some customers with some fairly beefy servers. They can get the
system into an unusable state that has been reported on lkml before. Namely,
kswapd starts taking 100% of a CPU, any other process that attempts to
allocate memory starts to spin on memory locks, etc. The box slows way down,
and is pretty much dead when this happens. Kswapd never drops below 50%.
Slabinfo shows that the inode and buffer caches have grown enormously, and
low memory is nearly gone. (But several GB of high memory are available.)
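
One way to see which slab caches dominate is to rank them by approximate size. The following is a minimal sketch, not a tool used in this report: it assumes the first four columns of /proc/slabinfo are cache name, active objects, total objects and object size, and it approximates usage as total objects times object size (ignoring per-slab overhead). The SAMPLE text uses invented numbers purely for illustration; they are not measurements from the affected machines.

```python
def parse_slabinfo(text):
    """Return {cache_name: approx_bytes} from slabinfo-style text.

    Assumes columns: name, active_objs, num_objs, objsize, ...
    """
    usage = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the version banner, comment headers, and short lines.
        if len(fields) < 4 or line.startswith(("slabinfo", "#")):
            continue
        name, _active, total, objsize = fields[:4]
        if not (total.isdigit() and objsize.isdigit()):
            continue
        usage[name] = int(total) * int(objsize)
    return usage


def top_caches(text, n=3):
    """Return the n largest caches as (name, approx_bytes), biggest first."""
    usage = parse_slabinfo(text)
    return sorted(usage.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Hypothetical sample, invented for illustration only.
SAMPLE = """slabinfo - version: 1.1 (SMP)
inode_cache       900000 900100    512 128586 128586    1
buffer_head      2000000 2000040    96  50001  50001    1
dentry_cache      500000  500010   128  16667  16667    1
"""

if __name__ == "__main__":
    for name, nbytes in top_caches(SAMPLE):
        print(f"{name:16s} {nbytes / 2**20:8.1f} MiB")
```

On a live system the text would come from reading /proc/slabinfo (which may require root) instead of SAMPLE.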

This pathological behavior can be triggered by something as simple as:
"cd / ; cp -r . /raidfs"
Where /raidfs and root are HW RAID arrays.

The two attached patches applied to 2.4.19 fix the problem on our test boxes.

Are these patches still considered a good idea for 2.4? Is there something
better I should be using?

TIA,

--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com


Attachments:
Andrea_Archangeli-inode_highmem_imbalance.patch (5.48 kB)
Andrew_Morton-2.4_VM_sucks._Again.patch (10.67 kB)

2002-10-25 04:25:55

by Andrew Morton

Subject: Re: Kswapd madness in 2.4 kernels

James Cleverdon wrote:
>
> Andrea_Archangeli-inode_highmem_imbalance.patch Type: text/x-diff

That's in -aa kernels, is correct and is needed.

> Andrew_Morton-2.4_VM_sucks._Again.patch Type: text/x-diff

hmm. Someone seems to have renamed my nuke-buffers patch ;)

My main concern is that this was a real quickie; it does a very
aggressive takedown of buffer_heads. Andrea's kernels contain a
patch which takes a very different approach. See
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20pre8aa2/05_vm_16_active_free_zone_bhs-1

I don't think anyone has tried that patch in isolation though...

If nuke-buffers passes testing and doesn't impact performance then
fine. A more cautious approach would be to use the active_free_zone_bhs
patch. If that proves inadequate then add in the "read" part of nuke-buffers.
That means dropping the fs/buffer.c part.

2002-10-25 16:51:13

by Rik van Riel

Subject: Re: Kswapd madness in 2.4 kernels

On Thu, 24 Oct 2002, James Cleverdon wrote:

> We have some customers with some fairly beefy servers. They can get the
> system into an unusable state that has been reported on lkml before.

> The two attached patches applied to 2.4.19 fix the problem on our test boxes.
>
> Are these patches still considered a good idea for 2.4? Is there something
> better I should be using?

Yes, these patches are a good idea. I'm curious why they
haven't been submitted to Marcelo yet ;)

Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
Current spamtrap: [email protected]

2002-11-05 22:07:06

by James Cleverdon

Subject: Re: Kswapd madness in 2.4 kernels

Status report:

Due to dependencies, I didn't try the two recommended patches alone. I ran
Andrea's 2.4.20-pre10aa1 kernel on the test load for one week. Low memory
was conserved and kswapd never went out of control. Presumably,
05_vm_16_active_free_zone_bhs-1 did the job for buffers, and the inode patch
continued to work.
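
A rough way to watch for the low/high memory imbalance during such a test run is sketched below. It assumes a highmem-enabled kernel whose /proc/meminfo exposes LowTotal, LowFree, HighTotal and HighFree in kB; the function names and the 5% / 25% thresholds are arbitrary assumptions, and the SAMPLE values are invented for illustration, not taken from the test boxes.

```python
def parse_meminfo(text):
    """Return {field: value} for 'Name:  1234 kB' style lines."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[key.strip()] = int(parts[0])
    return values


def lowmem_starved(info, low_frac=0.05, high_frac=0.25):
    """True if low memory is nearly exhausted while high memory is plentiful.

    The thresholds are illustrative assumptions, not kernel tunables.
    """
    low_total = info.get("LowTotal", 0)
    high_total = info.get("HighTotal", 0)
    if low_total == 0 or high_total == 0:
        return False  # no highmem split reported; nothing to compare
    low_ratio = info.get("LowFree", 0) / low_total
    high_ratio = info.get("HighFree", 0) / high_total
    return low_ratio < low_frac and high_ratio > high_frac


# Hypothetical sample, invented for illustration only.
SAMPLE = """MemTotal:      7800000 kB
HighTotal:     7000000 kB
HighFree:      3000000 kB
LowTotal:       800000 kB
LowFree:          8000 kB
"""

if __name__ == "__main__":
    print("lowmem starved:", lowmem_starved(parse_meminfo(SAMPLE)))
```

Run periodically against the real /proc/meminfo, a check like this would only flag the symptom; it says nothing about which cache is responsible.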

Are there any plans on getting these into 2.4.21?


On Thursday 24 October 2002 09:32 pm, Andrew Morton wrote:
> James Cleverdon wrote:
> > Andrea_Archangeli-inode_highmem_imbalance.patch Type: text/x-diff
>
> That's in -aa kernels, is correct and is needed.
>
> > Andrew_Morton-2.4_VM_sucks._Again.patch Type: text/x-diff
>
> hmm. Someone seems to have renamed my nuke-buffers patch ;)
>
> My main concern is that this was a real quickie; it does a very
> aggressive takedown of buffer_heads. Andrea's kernels contain a
> patch which takes a very different approach. See
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.20pre8aa2/05_vm_16_active_free_zone_bhs-1
>
> I don't think anyone has tried that patch in isolation though...
>
> If nuke-buffers passes testing and doesn't impact performance then
> fine. A more cautious approach would be to use the active_free_zone_bhs
> patch. If that proves inadequate then add in the "read" part of
> nuke-buffers. That means dropping the fs/buffer.c part.
> -


On Friday 25 October 2002 09:57 am, Rik van Riel wrote:
> On Thu, 24 Oct 2002, James Cleverdon wrote:
> > We have some customers with some fairly beefy servers. They can get the
> > system into an unusable state that has been reported on lkml before.
> >
> > The two attached patches applied to 2.4.19 fix the problem on our test
> > boxes.
> >
> > Are these patches still considered a good idea for 2.4? Is there
> > something better I should be using?
>
> Yes, these patches are a good idea. I'm curious why they
> haven't been submitted to Marcelo yet ;)
>
> Rik


--
James Cleverdon
IBM xSeries Linux Solutions
{jamesclv(Unix, preferred), cleverdj(Notes)} at us dot ibm dot com

2002-11-06 11:06:25

by Andrea Arcangeli

Subject: Re: Kswapd madness in 2.4 kernels

On Tue, Nov 05, 2002 at 02:13:00PM -0800, James Cleverdon wrote:
> Status report:
>
> Due to dependencies, I didn't try the two recommended patches alone. I ran
> Andrea's 2.4.20-pre10aa1 kernel on the test load for one week. Low memory
> was conserved and kswapd never went out of control. Presumably,
> 05_vm_16_active_free_zone_bhs-1 did the job for buffers, and the inode patch
> continued to work.

yes, for stability the related-bh patch is known to be more than enough
and this is a nice confirmation. I would also like to integrate some
bits of Andrew's nuke-buffers patch for performance reasons (to maximize
the free memory utilization), not for stability. For stability, teaching
the VM about the problem is the right fix IMHO; it is good to have
regardless, in case the bh cannot be nuked for some reason, e.g. if we
can't take a lock. But the bit that drops the bhs after reads may improve
memory utilization when there is no memory pressure at all. The part of
Andrew's patch I wouldn't merge in 2.4 is the drop after writes,
which has the potential of slowing down rewrite. I'm not saying it will
slow down the rewrite performance, but there is definitely the
potential. My fix instead has no way to affect reads/writes w/o memory
pressure compared to mainline (i.e. in a <1G machine).
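
The trade-off between the two "drop" policies can be illustrated with a toy model. This is only a sketch under stated assumptions, not kernel code and not either patch: it assumes that a page needs buffer_heads attached whenever block I/O is issued for it (first read, or any write), that a cache-hit read needs none, and that re-attaching them has some cost counted here as bh_setups. The class and function names are hypothetical. It does not model the active_free_zone_bhs approach, which acts only under memory pressure.

```python
class ToyPageCache:
    """Toy model: tracks which cached pages still hold buffer_heads."""

    def __init__(self, drop_after_read=False, drop_after_write=False):
        self.drop_after_read = drop_after_read
        self.drop_after_write = drop_after_write
        self.has_bhs = {}    # page index -> bool; key present means cached
        self.bh_setups = 0   # times buffer_heads had to be (re)created

    def _ensure_bhs(self, idx):
        if not self.has_bhs.get(idx, False):
            self.bh_setups += 1
        self.has_bhs[idx] = True

    def read(self, idx):
        if idx in self.has_bhs:
            return  # cache hit: no block I/O, so no buffer_heads needed
        self._ensure_bhs(idx)
        if self.drop_after_read:
            self.has_bhs[idx] = False

    def write(self, idx):
        self._ensure_bhs(idx)
        if self.drop_after_write:
            self.has_bhs[idx] = False

    def pinned(self):
        """Number of pages still holding buffer_heads."""
        return sum(self.has_bhs.values())


def run(drop_after_read, drop_after_write):
    """Read 1000 pages once, then rewrite 10 of them 5 times each."""
    cache = ToyPageCache(drop_after_read, drop_after_write)
    for i in range(1000):
        cache.read(i)
    for _ in range(5):
        for i in range(10):
            cache.write(i)
    return cache.pinned(), cache.bh_setups


if __name__ == "__main__":
    for name, policy in [("keep all", (False, False)),
                         ("drop after read", (True, False)),
                         ("drop after read+write", (True, True))]:
        pinned, setups = run(*policy)
        print(f"{name:22s} pinned={pinned:5d} bh_setups={setups}")
```

In this model, dropping after reads frees almost all pinned buffer_heads at the cost of one extra setup per page that is later written, while dropping after writes as well adds a setup on every rewrite, which is the potential slowdown described above.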

> Are there any plans on getting these into 2.4.21?

the related-bhs fix should definitely be integrated. Then Andrew's patch
that drops bhs after reads may be an obvious further optimization, but it
is not really related to this bug anymore. I haven't experimented with it
yet because it was low priority and it can only improve performance by
saving some RAM. Even merging only the drop-bhs-after-read part of
nuke-buffers might still decrease performance in a read+write case on
the same pagecache block, but that case probably isn't very common;
rewrite is more likely to happen.

> > Yes, these patches are a good idea. I'm curious why they
> > haven't been submitted to Marcelo yet ;)
> >

they have been submitted, I think I covered this bit with Marcelo during
the kernel summit, but you know there are many other important patches
to merge, the google fix etc. We must not forget that lots of them
have just been integrated in 2.4.20pre; it is normal that we discuss
only the pending fixes that haven't been integrated yet and forget about
the ones that were just included ;).

Andrea

2002-11-06 16:11:28

by Marcelo Tosatti

Subject: Re: Kswapd madness in 2.4 kernels



On Wed, 6 Nov 2002, Andrea Arcangeli wrote:

> On Tue, Nov 05, 2002 at 02:13:00PM -0800, James Cleverdon wrote:
> > Status report:
> >
> > Due to dependencies, I didn't try the two recommended patches alone. I ran
> > Andrea's 2.4.20-pre10aa1 kernel on the test load for one week. Low memory
> > was conserved and kswapd never went out of control. Presumably,
> > 05_vm_16_active_free_zone_bhs-1 did the job for buffers, and the inode patch
> > continued to work.
>
> yes, for stability the related-bh patch is known to be more than enough
> and this is a nice confirmation. I would also like to integrate some
> bits of Andrew's nuke-buffers patch for performance reasons (to maximize
> the free memory utilization), not for stability. For stability, teaching
> the VM about the problem is the right fix IMHO; it is good to have
> regardless, in case the bh cannot be nuked for some reason, e.g. if we
> can't take a lock. But the bit that drops the bhs after reads may improve
> memory utilization when there is no memory pressure at all. The part of
> Andrew's patch I wouldn't merge in 2.4 is the drop after writes,
> which has the potential of slowing down rewrite. I'm not saying it will
> slow down the rewrite performance, but there is definitely the
> potential. My fix instead has no way to affect reads/writes w/o memory
> pressure compared to mainline (i.e. in a <1G machine).
>
> > Are there any plans on getting these into 2.4.21?

I will look closely at -aa during 2.4.21-pre stage, yes.

Andrea, please bug me on that.

2002-11-06 16:34:27

by Andrea Arcangeli

Subject: Re: Kswapd madness in 2.4 kernels

On Wed, Nov 06, 2002 at 11:18:42AM -0200, Marcelo Tosatti wrote:
> I will look closely at -aa during 2.4.21-pre stage, yes.
>
> Andrea, please bug me on that.

sure ;)

Andrea