I would appreciate any expert insight into a problem that is causing me
considerable grief. Please cc to me directly. Thank you very much.
Problem: kswapd, kreclaimd, and kupdated push the load average high during a simple
tar. System responsiveness drops so far that even keystroke echoes are
noticeably delayed.
Specifics:
Machine- 4-processor 700 MHz Dell with 4 GB RAM and 6 GB swap space running
the stock Red Hat 7.2 distribution. All disks are SCSI using ext2.
Command- this command is executed on a remote machine, piping into the Linux
machine: tar cBf - . | rsh linux "(tar xBpf -)"
Manifestation of problem- As this command runs on a freshly booted
Linux machine, the free memory reported by 'top' slowly drops to a low
number. When it bottoms out, the processes 'kswapd', 'kreclaimd', and
'kupdated' begin to run, pushing the load average above 4 at times.
Responsiveness of the machine drops dramatically, with even keystroke echoes
delayed for seconds.
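To watch this symptom while the copy runs, a minimal sketch that samples /proc/meminfo. It assumes the 2.4-style `MemFree:`, `Cached:` and `SwapFree:` lines (values in kB); the function names and the 5-second default interval are hypothetical choices for illustration, not part of the original report.

```shell
#!/bin/sh
# mem_snapshot (hypothetical helper): print MemFree, Cached and SwapFree, in kB,
# from a meminfo-format file. Defaults to /proc/meminfo; another path is
# accepted only to make testing easy.
mem_snapshot() {
    awk '
        $1 == "MemFree:"  { free = $2 }
        $1 == "Cached:"   { cached = $2 }
        $1 == "SwapFree:" { swapfree = $2 }
        END { printf "free=%s cached=%s swapfree=%s\n", free, cached, swapfree }
    ' "${1:-/proc/meminfo}"
}

# watch_mem (hypothetical helper): print a timestamped snapshot every
# INTERVAL seconds (default 5) until interrupted with Ctrl-C.
watch_mem() {
    interval=${1:-5}
    while :; do
        printf '%s %s\n' "$(date +%H:%M:%S)" "$(mem_snapshot)"
        sleep "$interval"
    done
}
```

Running `watch_mem` in a second terminal on the Linux box during the `tar | rsh` pipeline should show the `free` figure falling steadily before kswapd and friends start to dominate.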
I apologize if this is well known. If there is a simple solution I would
appreciate even a terse pointer to it. Thanks.
--
Art Hays
Bldg 49 Rm 2A50 (301) 496-7143 (voice)
Nat. Institutes of Health (301) 402-0511 (fax)
Bethesda, MD 20892
Well known, yes -- simple solution, not yet :)). This problem has been
kicking around in one form or another for *months*, and although partial
solutions have made their way into more recent kernels, someone reports
issues of this nature on a more or less daily basis. What is happening is
that Linux sees nothing better to do with free memory, so it fills it up
with data from I/O into the page cache. Then when something comes along that
wants memory, the system goes into conniption fits trying to reclaim the
memory from the page cache and give it to the process that wants it.
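The page-cache behaviour described above can be observed directly. A minimal sketch, assuming a 2.4-style /proc/meminfo with a `Cached:` line in kB: read a large file that has not been touched since boot and compare `Cached:` before and after. The function names and the `MEMINFO` override are hypothetical, added only for illustration and testing.

```shell
#!/bin/sh
# cached_kb (hypothetical helper): current page-cache size in kB, taken from
# the "Cached:" line. MEMINFO may point at another meminfo-format file.
cached_kb() {
    awk '$1 == "Cached:" { print $2 }' "${MEMINFO:-/proc/meminfo}"
}

# page_cache_growth FILE (hypothetical helper): read FILE once and print how
# much "Cached:" grew, in kB. Other activity also moves this number, so the
# result is only a rough indication.
page_cache_growth() {
    before=$(cached_kb)
    cat "$1" > /dev/null
    after=$(cached_kb)
    echo $((after - before))
}
```

On an otherwise idle machine with free memory, reading an uncached file of N MB should raise the figure by roughly N MB, and that memory then stays in the cache until something else asks for it.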
There were a whole bunch of tuning parameters in the VM in 2.2 that got
dropped in 2.4; maybe re-instating some of them and returning them to their
rightful owner, the system administrator, would solve this problem once and
for all. But for some reason, those who control Linux have decided that this
is "a bug in the VM" and pursued fixes in code and the associated logic
rather than give us sysadmins what I believe is rightfully ours. I request
such tuning parameters at least once a week here, and get ignored. I'll keep
asking until I know enough about the code to put them in myself, assuming no
one has broken down and admitted that someone who's been performance tuning
operating systems since 1974 just might know what he's talking about :)).
Anyone else want to share my soapbox??? :))
--
M. Edward Borasky
http://www.borasky-research.net
On Wed, 2 Jan 2002, M. Edward Borasky wrote:
> There were a whole bunch of tuning parameters in the VM in 2.2 that got
> dropped in 2.4; maybe re-instating some of them and returning them to their
> rightful owner, the system administrator, would solve this problem once and
> for all. But for some reason, those who control Linux have decided that this
> is "a bug in the VM" and pursued fixes in code and the associated logic
> rather than give us sysadmins what I believe is rightfully ours.
Extra magic number twiddling is available in Andrea's -aa tree.
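To see which VM knobs a given kernel actually exposes (for example, to compare a stock 2.4 kernel with an -aa kernel, or with 2.2), a minimal sketch that prints every readable file under /proc/sys/vm. It makes no assumption about which tunables exist; the function name and the optional directory argument are hypothetical.

```shell
#!/bin/sh
# show_vm_tunables (hypothetical helper): print "name = value" for every
# readable file in DIR, which defaults to /proc/sys/vm.
show_vm_tunables() {
    dir=${1:-/proc/sys/vm}
    for f in "$dir"/*; do
        # Skip anything that is not a readable regular file.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
}
```

Saving the output on each kernel and diffing the two files shows which knobs were added or dropped between them.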
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
"M. Edward Borasky" wrote:
>
> ...
> There were a whole bunch of tuning parameters in the VM in 2.2 that got
> dropped in 2.4; maybe re-instating some of them and returning them to their
> rightful owner, the system administrator, would solve this problem once and
> for all.
> ...
>
> Anyone else want to share my soapbox??? :))
Nope. Not yet.
The VM system in Art's machine is not working correctly. It is swapping
and evicting useful data when it should be dropping written-back write()
pages. That's a bug, and there's no point in adding knobs to twiddle
the behaviour when the system clearly isn't working *as designed* yet.
If we reach the stage where everything is exactly operating as we designed
it to, and it _still_ fails under some usage patterns then yes, that's the
time to throw up our hands and add knobs.
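The point here is about which pages get reclaimed first. As a toy illustration only (not the 2.4 VM's actual algorithm), the intended behaviour can be sketched as a priority order: clean page-cache pages that have already been written back are dropped first, dirty pages need writeback before they can be freed, and anonymous pages need swap I/O, so they go last. Within each class the oldest page goes first. The page ids, the three class names and the function name are hypothetical.

```shell
#!/bin/sh
# reclaim_order (hypothetical toy model): read "<page-id> <kind>" lines on
# stdin, oldest first, where kind is clean, dirty or anon. Print the page ids
# in the order this policy would evict them: clean, then dirty, then anon,
# oldest first within each class.
reclaim_order() {
    awk '{
        rank = ($2 == "clean") ? 0 : ($2 == "dirty") ? 1 : 2
        print rank, NR, $1
    }' | sort -n -k1,1 -k2,2 | awk '{ print $3 }'
}
```

For example, `printf 'a anon\nb clean\nc dirty\nd clean\n' | reclaim_order` prints b, d, c and a, one per line: the anonymous page `a` is the oldest but is still evicted last, which is the opposite of the swap-before-dropping-cache behaviour Art's machine shows.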
But Art's kernel (what kernel is in RH7.2 anyway? 2.4.9 with vendor
hacks^Wfixes, I think) is nowhere near that stage.
And we, the kernel developers, should hang our heads over this. A
vendor-released, stable kernel is performing terribly with such a
simple workload. One year after the release of 2.4.0!
The good news is that 2.4.17 has pretty much slain this dragon. The
-aa patches are better still, and 2.4.18 will be even better than
that.
So where does this leave Art Hays? Yup, he's going to have to apply
the latest Service Pack. The rawhide kernel appears to be at 2.4.16,
which isn't recent enough. He'll need to build his own. I'd recommend
2.4.17-rc2 with http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.17rc2aa2.bz2
applied on top.
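A minimal sketch of that build, assuming the 2.4.17-rc2 source is already unpacked, the -aa patch from the URL above has been downloaded, and it applies with `patch -p1` (the usual convention for kernel patches, but worth checking first). The helper names and the `DRY_RUN` switch are hypothetical; installing the image into /boot and updating the boot loader are left out.

```shell
#!/bin/sh
# run (hypothetical helper): execute a command string, or only print it
# when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$1"
    else
        sh -c "$1"
    fi
}

# build_aa_kernel SRC_DIR PATCH_FILE (hypothetical helper)
# SRC_DIR:    unpacked 2.4.17-rc2 tree that already contains a .config
#             (for example, copied from the running kernel's configuration).
# PATCH_FILE: absolute path to the downloaded 2.4.17rc2aa2.bz2.
build_aa_kernel() {
    src=$1
    patchfile=$2
    run "cd '$src' && bzip2 -dc '$patchfile' | patch -p1" || return 1
    # 2.4 kernels still need "make dep" before building.
    run "cd '$src' && make oldconfig && make dep && make bzImage && make modules" || return 1
    run "cd '$src' && make modules_install"
}
```

Running it once with `DRY_RUN=1` shows the exact commands before anything is touched. On x86 the finished image ends up in arch/i386/boot/bzImage.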
-
On January 3, 2002 06:15 am, Andrew Morton wrote:
> And we, the kernel developers, should hang our heads over this. A
> vendor-released, stable kernel is performing terribly with such a
> simple workload. One year after the release of 2.4.0!
To be fair, in the year leading up to 2.4.0 much energy was expended on
getting the bugs out of the unified and heavily threaded page+buffer
cache[1], at the expense of work on the memory manager, so 2001 ended up
being like a whole new kernel cycle. Anyway, the saving grace is that 2.2
managed to metamorphose from ugly duckling to... quite a nice duck, with
almost all the features of 2.4 from the user's point of view. So everybody
has something to run.
With 20/20 hindsight, the VM work could have been managed better but I don't
see why anybody's head needs to be hung. It was a bumpy road, we had to
change a few tires, but we got to the other side of the mountain.
--
Daniel
[1] According to Matt Dillon's interview today, FreeBSD went through the same
pain unifying their caches, and they have yet to seriously tackle the SMP
issues.
On Thu, 3 Jan 2002, Daniel Phillips wrote:
> On January 3, 2002 06:15 am, Andrew Morton wrote:
> > And we, the kernel developers, should hang our heads over this. A
> > vendor-released, stable kernel is performing terribly with such a
> > simple workload. One year after the release of 2.4.0!
>
> To be fair, in the year leading up to 2.4.0 much energy was expended on
> getting the bugs out of the unified and heavily threaded page+buffer
> cache[1], at the expense of work on the memory manager, so 2001 ended up
> being like a whole new kernel cycle. Anyway, the saving grace is that 2.2
> managed to metamorphose from ugly duckling to... quite a nice duck, with
> almost all the features of 2.4 from the user's point of view. So everybody
> has something to run.
>
> With 20/20 hindsight, the VM work could have been managed better but I don't
> see why anybody's head needs to be hung. It was a bumpy road, we had to
> change a few tires, but we got to the other side of the mountain.
We did? I'm running the last released kernel and today I got an OOM event
when 1.4 GB main memory was used for buffer cache. I have to babysit any
Linux 2.4 machines that have interesting workloads. 2.4 may have reached
a local maximum, but the ascent to the peak is still in front of us.
-jwb
> But Art's kernel (what kernel is in RH7.2 anyway? 2.4.9 with vendor
> hacks^Wfixes, I think) is nowhere near that stage.
7.2 is 2.4.7-ac ish, 7.2 + errata is 2.4.9-ac ish
> The good news is that 2.4.17 has pretty much slain this dragon. The
> -aa patches are better still, and 2.4.18 will be even better than
> that.
Bollocks. I get regular mails from large numbers of people who are stuck
at 2.4.12/13-ac and are hoping I'll do an update because their machines
die in hours or run 25-50% slower with 2.4.1x.
2.4.1x VM code is performing better under light loads but it's absolutely
and completely hopeless under a real paging load. 2.4.17-aa is somewhat
better, interestingly.
Alan
> Machine- 4 processor 700Mhz Dell with 4G Ram and 6G swap space running
> stock Redhat 7.2 distribution. All disks are SCSI using ext2.
What I/O controller? megaraid?
> I apologize if this is well known. If there is a simple solution I would
> appreciate even a terse pointer to it. Thanks.
If you are running the 2.4.7 kernel from the distro, make sure you get the
errata one.
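To tell whether a machine is already running the errata kernel, a small comparison on `uname -r` is enough. This sketch assumes Red Hat-style release strings such as `2.4.9-13smp` and uses 2.4.9-13, the update Art reports further down as fixing the problem, as the threshold; the function name is hypothetical.

```shell
#!/bin/sh
# version_ge HAVE WANT (hypothetical helper): succeed if kernel release HAVE
# is at least WANT. Trailing text such as "smp" is dropped, then version and
# release numbers are compared field by field ("2.4.9-13" -> 2 4 9 13).
version_ge() {
    have=${1%%[!0-9.-]*}
    want=${2%%[!0-9.-]*}
    awk -v a="$have" -v b="$want" 'BEGIN {
        na = split(a, x, "[.-]")
        nb = split(b, y, "[.-]")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}
```

For example, `version_ge "$(uname -r)" 2.4.9-13 || echo "older than the 2.4.9-13 errata kernel"`.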
On Thu, 3 Jan 2002, Alan Cox wrote:
> 2.4.1x VM code is performing better under light loads but it's
> absolutely and completely hopeless under a real paging load. 2.4.17-aa
> is somewhat better interestingly.
A quick 'make -j bzImage' test I did yesterday got the system
to use near 70% of its CPU time in user mode and 30% in system
mode. This was with 2.4.17-rmap-10b, btw.
Though I have to admit the rmap patches should be considered
experimental, the last one does seem to survive some loads
where the standard kernel falls over ;)
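To get comparable user/system numbers for a `make -j bzImage` run, one option is to sample the aggregate `cpu` line of /proc/stat before and after the build. This sketch assumes the first four fields after `cpu` are user, nice, system and idle jiffies (the 2.4 layout; later kernels append more fields, which it ignores), counts nice time as user time, and reports each as a share of non-idle time. The function name is hypothetical.

```shell
#!/bin/sh
# cpu_split BEFORE AFTER (hypothetical helper)
# BEFORE and AFTER are "cpu ..." lines from /proc/stat. Prints the user and
# system shares, in rounded percent, of the busy time between the samples.
cpu_split() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { u0 = $2 + $3; s0 = $4 }
        NR == 2 { u1 = $2 + $3; s1 = $4 }
        END {
            du = u1 - u0; ds = s1 - s0
            if (du + ds == 0) { print "0 0"; exit }
            printf "%.0f %.0f\n", 100 * du / (du + ds), 100 * ds / (du + ds)
        }'
}
```

Usage: capture `before=$(grep '^cpu ' /proc/stat)`, run the build, capture `after` the same way, then call `cpu_split "$before" "$after"`. Measuring both kernels this way, with the same amount of RAM, makes the 70/30 and 80-90% figures directly comparable.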
regards,
Rik
--
Shortwave goes a long way: irc.starchat.net #swl
http://www.surriel.com/ http://distro.conectiva.com/
On Thu, 3 Jan 2002 14:51:01 -0200 (BRST)
Rik van Riel wrote:
> On Thu, 3 Jan 2002, Alan Cox wrote:
>
> > 2.4.1x VM code is performing better under light loads but it's
> > absolutely and completely hopeless under a real paging load. 2.4.17-aa
> > is somewhat better interestingly.
>
> A quick 'make -j bzImage' test I did yesterday got the system
> to use near 70% of its CPU time in user mode and 30% in system
> mode. This was with 2.4.17-rmap-10b, btw.
And what kind of an argument is this? This is an honest question, really. If I
do this make I end up at around 80-90% in user mode and the rest in system on a
standard 2.4.17 SMP box (configured with too little swap, btw).
???
Regards,
Stephan
On Thu, 3 Jan 2002, Stephan von Krawczynski wrote:
> On Thu, 3 Jan 2002 14:51:01 -0200 (BRST)
> Rik van Riel wrote:
> > On Thu, 3 Jan 2002, Alan Cox wrote:
> >
> > > 2.4.1x VM code is performing better under light loads but it's
> > > absolutely and completely hopeless under a real paging load. 2.4.17-aa
> > > is somewhat better interestingly.
> >
> > A quick 'make -j bzImage' test I did yesterday got the system
> > to use near 70% of its CPU time in user mode and 30% in system
> > mode. This was with 2.4.17-rmap-10b, btw.
>
> And what kind of an argument is this? This is an honest question,
> really. If I do this make I end up around 80-90% in user mode and the
> rest in system on a standard 2.4.17 SMP box (configured with too little
> swap btw).
How much memory does that box have?
In my case, with 512 MB of RAM, the system went almost
900 MB into swap.
If your machine has one GB of RAM (or more), I expect the gccs
to fit mostly into RAM, which would give much better behaviour.
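The two machines are easiest to compare by how far into swap each one went. A minimal sketch, assuming /proc/meminfo reports `SwapTotal:` and `SwapFree:` in kB as 2.4 kernels do; it polls while a given process (for example the build) is alive and prints the peak. The function names and the 2-second default interval are hypothetical.

```shell
#!/bin/sh
# swap_used_mb (hypothetical helper): swap in use, in MB, from a
# meminfo-format file (default /proc/meminfo).
swap_used_mb() {
    awk '
        $1 == "SwapTotal:" { total = $2 }
        $1 == "SwapFree:"  { free = $2 }
        END { print int((total - free) / 1024) }
    ' "${1:-/proc/meminfo}"
}

# peak_swap_mb PID [INTERVAL] (hypothetical helper): poll every INTERVAL
# seconds (default 2) while PID is alive, then print the largest swap use seen.
peak_swap_mb() {
    pid=$1
    interval=${2:-2}
    peak=0
    while kill -0 "$pid" 2>/dev/null; do
        now=$(swap_used_mb)
        [ "$now" -gt "$peak" ] && peak=$now
        sleep "$interval"
    done
    echo "$peak"
}
```

For example, `make -j bzImage & peak_swap_mb $!` reports the high-water mark once the build exits.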
regards,
Rik
Alan Cox wrote:
>
> > But Art's kernel (what kernel is in RH7.2 anyway? 2.4.9 with vendor
> > hacks^Wfixes, I think) is nowhere near that stage.
>
> 7.2 is 2.4.7-ac ish, 7.2 + errata is 2.4.9-ac ish
OK, thanks.
> > The good news is that 2.4.17 has pretty much slain this dragon. The
> > -aa patches are better still, and 2.4.18 will be even better than
> > that.
>
> Bollocks. I get regular mails from large numbers of people who are stuck
> at 2.4.12/13-ac and are hoping I'll do an update because their machines
> die in hours or run 25-50% slower with 2.4.1x.
I was referring to the swap and evict in the presence of heavy write
traffic.
> 2.4.1x VM code is performing better under light loads but it's absolutely
> and completely hopeless under a real paging load. 2.4.17-aa is somewhat
> better interestingly.
>
s/interestingly/frustratingly/. -aa has some interesting changes to
the write scheduling as well. I just wish I knew what problem they're
solving.
-
> On Thu, 3 Jan 2002, Stephan von Krawczynski wrote:
> > On Thu, 3 Jan 2002 14:51:01 -0200 (BRST)
> > Rik van Riel wrote:
> > > On Thu, 3 Jan 2002, Alan Cox wrote:
> > >
> > > > 2.4.1x VM code is performing better under light loads but it's
> > > > absolutely and completely hopeless under a real paging load. 2.4.17-aa
> > > > is somewhat better interestingly.
> > >
> > > A quick 'make -j bzImage' test I did yesterday got the system
> > > to use near 70% of its CPU time in user mode and 30% in system
> > > mode. This was with 2.4.17-rmap-10b, btw.
> >
> > And what kind of an argument is this? This is an honest question,
> > really. If I do this make I end up around 80-90% in user mode and the
> > rest in system on a standard 2.4.17 SMP box (configured with too little
> > swap btw).
>
> How much memory does that box have?
>
> In my case it was with 512 MB of RAM, the system went almost
> 900 MB into swap.
I cannot confirm this. Although my machine has 1 GB of RAM, I ran the
make while countless processes were running, among them a mail client,
Mozilla with several windows, seti, and numerous xterms: all in all, 16
desktops full of this and that. I'd say a bit more than 600 MB was free
(meaning cached, of course). But I have only 256 MB of swap. During the
make a damn lot of paging went on, and swap usage went from 70 MB up to
about 210 MB, but that was it. The load average peaked at around 154.
When it was over, everything came back to normal and the resulting
bzImage worked.
I did not see any problem.
I will reduce memory to around 400 MB tomorrow for another test.
Regards,
Stephan
An update: the behavior I observed no longer occurs with
2.4.9-13smp, the latest update supplied by Red Hat. Thanks for all the
very helpful info I received.
--
Art Hays