2003-09-19 17:16:20

by Roland Bless

Subject: Fix for wrong OOM killer trigger?

Hi Miquel,

I read your e-mail http://www.cs.helsinki.fi/linux/linux-kernel/2003-27/1274.html
in the archive, but was not able to find a solution.
We have a similar problem:
HW: 4x 2,4GHz Xeon, 4GB Ram, 3ware 7000-series ATA-RAID
SW: Kernel 2.4.22 (also seen on 2.4.21, 2.4.22-ac3), lvm, software raid,
reiserfs, SuSE 8.1. Swap turned off (see later).

Symptom: programs that heavily search the whole filesystem
(e.g., rsync, ssync, TSM backup client dsmc) trigger the
OOM killer (not very funny if NIS or NFS gets killed).

Sep 17 21:49:07 fs1 kernel: Out of Memory: Killed process 1384 (lmgrd).
Sep 17 21:49:12 fs1 kernel: Out of Memory: Killed process 1617 (exim).
Sep 17 21:49:18 fs1 kernel: Out of Memory: Killed process 1402 (ntpd).
Sep 17 21:49:23 fs1 kernel: Out of Memory: Killed process 1278 (portmap).
Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2715 (dsmc).
Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2716 (dsmc).
Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2717 (dsmc).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1600 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1601 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1603 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1604 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1605 (nscd).
Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1606 (nscd).
Sep 17 21:49:40 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1421 (ypbind).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1422 (ypbind).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1423 (ypbind).
Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1424 (ypbind).
Sep 17 21:49:51 fs1 kernel: Out of Memory: Killed process 1584 (atd).
Sep 17 21:49:57 fs1 kernel: Out of Memory: Killed process 1329 (ypserv).

The OOM kill also occurred when the cache memory had not exhausted the
available memory (total mem usage was around 1.8GB).
echo 2>/proc/sys/vm/overcommit_memory did not solve the problem either.
As far as I understand, it has something to do with the fs cache/vm. We have some
files that are larger than 2GB, but the killing usually starts at
different points in time.
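
A side note on the command quoted above: in a POSIX shell, `echo 2>/proc/sys/vm/overcommit_memory` is parsed as "run echo with no arguments and redirect stderr (file descriptor 2) to the file", so no 2 is written; the intended form has a space after the 2. Whether this is what was actually typed on the server cannot be known from the post. A minimal sketch of the difference, using a scratch file (the variable names are illustrative and nothing below touches /proc):

```shell
#!/bin/sh
# Scratch file standing in for /proc/sys/vm/overcommit_memory.
target=$(mktemp)

# No space: "2>" redirects stderr. echo has no arguments, so it prints an
# empty line to stdout and the file is only truncated.
echo 2>"$target"
without_space=$(cat "$target")

# With a space: "2" is the argument and stdout goes into the file.
echo 2 > "$target"
with_space=$(cat "$target")

rm -f "$target"
printf 'without space: "%s"\nwith space: "%s"\n' "$without_space" "$with_space"
```

On a live system the corresponding command would be `echo 2 > /proc/sys/vm/overcommit_memory`, run as root; whether a given 2.4 kernel implements mode 2 depends on the tree in use.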

However, I also saw that kswapd used a lot of CPU even though swap was not active.
With swap space activated, the load on the cpu increases dramatically
so that the system becomes unusable, too. This is our file server and
I'm currently not able to make a backup to other systems. That's really
frustrating.

Anyone any ideas? Please Cc: to me in your replies since I'm not on the lkml.
Cheers,
Roland

--
Roland Bless -- e-Mail: [email protected] WWW: http://www.tm.uka.de/~bless
Institute of Telematics, University of Karlsruhe, Germany


2003-09-19 17:34:06

by Marc-Christian Petersen

Subject: Re: Fix for wrong OOM killer trigger?

On Friday 19 September 2003 19:16, Roland Bless wrote:

Hi Roland,

> SW: Kernel 2.4.22 (also seen on 2.4.21, 2.4.22-ac3), lvm, software raid,
> reiserfs, SuSE 8.1. Swap turned off (see later).
> .... <snip> ....
> Anyone any ideas? Please Cc: to me in your replies since I'm not on the
> lkml. Cheers,

Please try v2.4.23-pre5 or rmap 15k for 2.4.22 vanilla.

ciao, Marc


2003-09-19 19:25:55

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

Hi Roland,

On Fri, Sep 19, 2003 at 07:16:13PM +0200, Roland Bless wrote:
> Hi Miquel,
>
> I read your e-mail http://www.cs.helsinki.fi/linux/linux-kernel/2003-27/1274.html
> in the archive, but was not able to find a solution.
> We have a similar problem:
> HW: 4x 2,4GHz Xeon, 4GB Ram, 3ware 7000-series ATA-RAID
> SW: Kernel 2.4.22 (also seen on 2.4.21, 2.4.22-ac3), lvm, software raid,
> reiserfs, SuSE 8.1. Swap turned off (see later).
>
> Symptom: programs that heavily search the whole filesystem
> (e.g., rsync, ssync, TSM backup client dsmc) trigger the
> OOM killer (not very funny if NIS or NFS gets killed).
>
> Sep 17 21:49:07 fs1 kernel: Out of Memory: Killed process 1384 (lmgrd).
> Sep 17 21:49:12 fs1 kernel: Out of Memory: Killed process 1617 (exim).
> Sep 17 21:49:18 fs1 kernel: Out of Memory: Killed process 1402 (ntpd).
> Sep 17 21:49:23 fs1 kernel: Out of Memory: Killed process 1278 (portmap).
> Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2715 (dsmc).
> Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2716 (dsmc).
> Sep 17 21:49:29 fs1 kernel: Out of Memory: Killed process 2717 (dsmc).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1600 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1601 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1603 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1604 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1605 (nscd).
> Sep 17 21:49:35 fs1 kernel: Out of Memory: Killed process 1606 (nscd).
> Sep 17 21:49:40 fs1 kernel: Out of Memory: Killed process 1602 (nscd).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1421 (ypbind).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1422 (ypbind).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1423 (ypbind).
> Sep 17 21:49:46 fs1 kernel: Out of Memory: Killed process 1424 (ypbind).
> Sep 17 21:49:51 fs1 kernel: Out of Memory: Killed process 1584 (atd).
> Sep 17 21:49:57 fs1 kernel: Out of Memory: Killed process 1329 (ypserv).
>
> The OOM kill also occurred when the cache memory had not exhausted the
> available memory (total mem usage was around 1.8GB).
> echo 2>/proc/sys/vm/overcommit_memory did not solve the problem either.
> In my understanding it has to do something with the fs cache/vm. We have some
> files that are larger than 2GB, but usually the killing process starts at
> different points in time.
>
> However, I also saw that kswapd used a lot of CPU even though swap was not active.
> With swap space activated, the load on the cpu increases dramatically
> so that the system becomes unusable, too. This is our file server and
> I'm currently not able to make a backup to other systems. That's really
> frustrating.
>
> Anyone any ideas? Please Cc: to me in your replies since I'm not on the lkml.

can you try with 2.4.22aa1? the oom killer there will only work on tasks
that are allocating memory, not on idle daemons, so the probability of
killing rsync first should be higher. stock SuSE 8.1 kernel should do
the same too.

Andrea

/*
* If you refuse to depend on closed software for a critical
* part of your business, these links may be useful:
*
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5/
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.4/
* http://www.cobite.com/cvsps/
*
* svn://svn.kernel.org/linux-2.6/trunk
* svn://svn.kernel.org/linux-2.4/trunk
*/

2003-09-19 19:35:50

by Russell King

Subject: Re: Fix for wrong OOM killer trigger?

On Fri, Sep 19, 2003 at 09:25:44PM +0200, Andrea Arcangeli wrote:
>...
> the same too.
>
> Andrea
>
> /*
> * If you refuse to depend on closed software for a critical
> * part of your business, these links may be useful:
> *
> * rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5/
> * rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.4/
> * http://www.cobite.com/cvsps/
> *
> * svn://svn.kernel.org/linux-2.6/trunk
> * svn://svn.kernel.org/linux-2.4/trunk
> */

Would you mind following netiquette guidelines for your signature?
It appears to be overly large and to contain inflammatory material, both
of which are equally good reasons on their _own_ not to use it.

--
Russell King ([email protected]) http://www.arm.linux.org.uk/personal/
Linux kernel maintainer of:
2.6 ARM Linux - http://www.arm.linux.org.uk/
2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
2.6 Serial core

2003-09-19 20:01:53

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

On Fri, Sep 19, 2003 at 08:35:38PM +0100, Russell King wrote:
> On Fri, Sep 19, 2003 at 09:25:44PM +0200, Andrea Arcangeli wrote:
> >...
> > the same too.
> >
> > Andrea
> >
> > /*
> > * If you refuse to depend on closed software for a critical
> > * part of your business, these links may be useful:
> > *
> > * rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.5/
> > * rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.4/
> > * http://www.cobite.com/cvsps/
> > *
> > * svn://svn.kernel.org/linux-2.6/trunk
> > * svn://svn.kernel.org/linux-2.4/trunk
> > */
>
> Would you mind following netiquette guidelines for your signature?
> It appears to be overly large and to contain inflammatory material, both
> of which are equally good reasons on their _own_ not to use it.
>
> --
> Russell King ([email protected]) http://www.arm.linux.org.uk/personal/
> Linux kernel maintainer of:
> 2.6 ARM Linux - http://www.arm.linux.org.uk/
> 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/
> 2.6 Serial core

does it look better if I change it like this:

-------------------------
Andrea - If you refuse to depend on closed software for a critical
part of your business, these links may be useful:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk
-------------------------

Hope it's ok in terms of bandwidth now.

And if you don't like the text I write in my signature, I'm sorry and I
would simply suggest that you not read it. If you need a marker to strip
it reliably, just ask me and I'll be glad to add it. IMHO the value those
services provide is too high not to advertise them as much as I
possibly can.

Andrea

2003-09-19 20:52:32

by Larry McVoy

Subject: Re: Fix for wrong OOM killer trigger?

On Fri, Sep 19, 2003 at 10:01:17PM +0200, Andrea Arcangeli wrote:
> Andrea - If you refuse to depend on closed software for a critical
> part of your business, these links may be useful:
> rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> http://www.cobite.com/cvsps/
> svn://svn.kernel.org/linux-2.[46]/trunk
> -------------------------
>
> Hope it's ok in terms of bandwidth now.
>
> And if you don't like the text I write in my signature, I'm sorry and I
> would simply suggest that you not read it. If you need a marker to strip
> it reliably, just ask me and I'll be glad to add it. IMHO the value those
> services provide is too high not to advertise them as much as I
> possibly can.

Then put the URLs in the FAQ and be done with it.

I can assure you that the first time the CVS gateway has a problem it
won't come back until you have stopped being rude. You do understand
that the SVN and RSYNC data come from the CVS gateway and that the
CVS gateway comes from BitMover and that all of this crap is hosted by
BitMover, right? {cvs,svn}.kernel.org are cnames for kernel.bkbits.net.

Here's a suggested replacement signature, it accomplishes your goal of
advertising the services. Why this needs to be here and not in the FAQ
is beyond me.

/*
* SCM access to the kernel
*
* BitKeeper: bk://linux.bkbits.net/linux-2.[45]
* CVS: :pserver:[email protected]:/home/cvs/linux-2.[45]
* RSYNC: rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
* Subversion: svn://svn.kernel.org/linux-2.[46]/trunk
*/
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-09-19 21:06:10

by Mr. James W. Laferriere

Subject: Formal complaint , Re: Fix for wrong OOM killer trigger?

Hello Dave , I am well aware that you are not the keeper of the
responsible individuals of the ilk that keeps this thread cropping
up every so often . Please have a few words with them .
Tia , JimL

On Fri, 19 Sep 2003, Larry McVoy wrote:
> On Fri, Sep 19, 2003 at 10:01:17PM +0200, Andrea Arcangeli wrote:
> > Andrea - If you refuse to depend on closed software for a critical
> > part of your business, these links may be useful:
> > rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> > http://www.cobite.com/cvsps/
> > svn://svn.kernel.org/linux-2.[46]/trunk
> > -------------------------
> >
> > Hope it's ok in terms of bandwidth now.
> >
> > And if you don't like the text I write in my signature, I'm sorry and I
> > would simply suggest that you not read it. If you need a marker to strip
> > it reliably, just ask me and I'll be glad to add it. IMHO the value those
> > services provide is too high not to advertise them as much as I
> > possibly can.
>
> Then put the URLs in the FAQ and be done with it.
>
> I can assure you that the first time the CVS gateway has a problem it
> won't come back until you have stopped being rude. You do understand
> that the SVN and RSYNC data come from the CVS gateway and that the
> CVS gateway comes from BitMover and that all of this crap is hosted by
> BitMover, right? {cvs,svn}.kernel.org are cnames for kernel.bkbits.net.
>
> Here's a suggested replacement signature, it accomplishes your goal of
> advertising the services. Why this needs to be here and not in the FAQ
> is beyond me.
>
> /*
> * SCM access to the kernel
> *
> * BitKeeper: bk://linux.bkbits.net/linux-2.[45]
> * CVS: :pserver:[email protected]:/home/cvs/linux-2.[45]
> * RSYNC: rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
> * Subversion: svn://svn.kernel.org/linux-2.[46]/trunk
> */
>

--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| [email protected] | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+

2003-09-20 03:31:52

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

Hi Larry,

On Fri, Sep 19, 2003 at 01:52:20PM -0700, Larry McVoy wrote:
> won't come back until you have stopped being rude. You do understand

If I remotely thought my signature was rude to you, or anybody
else, I wouldn't post it anymore, especially after you pointed it out to me.

Some people in the past, and probably even today, thought they would
never depend on open source for critical things; a number of people
like me think just the opposite.

I don't see why you find this fact rude. Do you think it's rude that
some people refuse to depend on closed software for critical tasks?
Then why do you think the source code of some closed software is
being offered to governments for the first time after 20 years? Is it
rude that some governments prefer to have the source too, and apparently
also see value in not depending on closed software? I mean,
you really can't just complain that I'm rude, as if I were the
only one on earth holding this view.

And clearly, if somebody is interested in my links, it's because he
shares my view; otherwise he could just use bitkeeper, which, despite
being born after cvs, is much more feature rich (that's the reason for
the comment!).

I will never say that you're rude because of the claims against open
source you have posted several times on linux-kernel (you know, the parasite
that eats the host, and lots and lots of stuff like that, all things that I
absolutely and totally disagree with). I will never say the bitkeeper
"free" licence is rude or anything like that, even though I find it much less
acceptable than all the other proprietary licences I have dealt with in my
limited experience with proprietary software. But people are different: it's
not about being rude, it's about thinking differently, and I will never buy
from you that thinking differently is the same as being rude.

Andrea - If you refuse to depend on closed software for a critical
part of your business, these links may be useful:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk

2003-09-20 04:30:35

by Larry McVoy

Subject: Re: Fix for wrong OOM killer trigger?

On Sat, Sep 20, 2003 at 05:31:53AM +0200, Andrea Arcangeli wrote:
> If I remotely thought my signature was rude to you, or anybody
> else, I wouldn't post it anymore, especially after you pointed it out to me.
> [etc]

The problem is that you are saying that closed source is bad, in
particular, that BitKeeper is bad. That's not the problem, lots of
people think that closed source is bad, but in the same breath you
promote some free gateways PAID FOR BY BITKEEPER and requested by you.
That's hypocritical in the extreme.

Let me clue you in. The economy sucks. Nobody except Microsoft is
getting rich in this economy. Everyone is looking to cut costs and we
are too. It costs us money to provide those gateways. I write checks
every month to keep them going. As a corporation we derive zero benefit
from providing those gateways. They are there because you asked for them
and I thought the deal was that you would stop whining once you got them.
If that's the deal, then stop whining. If that's not the deal, ok,
I guess I misunderstood, but I can save some money and shut them down.

Then your signature can read:

/*
* We used to be able to depend on the following license free gateways
* but I was deliberately rude to the people providing them so they went
* away:
*
* rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
* :pserver:[email protected]:/home/cvs/linux-2.[45]/
* svn://svn.kernel.org/linux-2.[46]/trunk
*/

Sooner or later, I expect the more reasonable people out there to explain
to you that your actions are hurting them and maybe they'll help you
decide which is more important, getting at the data you want, in a timely
manner, without a license, or doing negative advertising against us.
If the other folks don't care enough to do that then that's fine, the
gateways are not important and you can whine all you want but you'll be
back to waiting for tarball releases and we can save some money.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-09-20 06:48:48

by David Miller

Subject: Re: Formal complaint , Re: Fix for wrong OOM killer trigger?

On Fri, 19 Sep 2003 17:05:41 -0400 (EDT)
"Mr. James W. Laferriere" <[email protected]> wrote:

> Hello Dave , I am well aware that you are not the keeper of the
> responsible individuals of the ilk that keeps this thread cropping
> up every so often . Please have a few words with them .

I see nothing wrong with Larry's posting that you're showing
me here.

2003-09-20 11:09:36

by Roland Bless

Subject: Re: Fix for wrong OOM killer trigger?

On Fri, Sep 19, 2003 at 09:25:44PM +0200, Andrea Arcangeli wrote:
>
> can you try with 2.4.22aa1? the oom killer there will only work on tasks
> that are allocating memory, not on idle daemons, so the probability of
> killing rsync first should be higher. stock SuSE 8.1 kernel should do
> the same too.

This will only help to avoid shooting important daemons.
The real cause, however, seems to be that the filesystem cache
memory is not properly re-used when it should be, or that it tries to
allocate a huge amount of memory. The programs themselves do not
allocate much memory! It must be the system, because I also
ran the programs with memory restrictions set by ulimit. The programs
are definitely not allocating the memory, and 4GB of RAM is really
enough for a simple file server like ours.
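
For reference, the kind of ulimit restriction mentioned above can be sketched as follows. The 262144 kB cap and the choice of `ulimit -v` (the address-space limit, supported by bash and dash but not mandated by POSIX) are assumptions for illustration; the post does not say which limit or value was used.

```shell
#!/bin/sh
# Hypothetical cap: 262144 kB (256 MB) of virtual address space.
cap_kb=262144

# Command substitution runs in a subshell, so the soft limit set here is
# inherited only by commands started inside it; the parent keeps its own.
limit_inside=$(ulimit -S -v "$cap_kb" && ulimit -S -v)
limit_outside=$(ulimit -S -v)

echo "inside subshell: $limit_inside, parent shell: $limit_outside"
```

A real run would start the program under test inside the subshell in place of the second `ulimit -S -v`. Such a limit only constrains the process's own address space; page cache held by the kernel is not charged to it, which is consistent with the observation that capping the programs did not stop the OOM kills.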

Regards,
Roland

2003-09-20 12:22:45

by Bernd Schmidt

Subject: Re: Fix for wrong OOM killer trigger?

On Fri, 19 Sep 2003, Larry McVoy wrote:
>
> The problem is that you are saying that closed source is bad,

[snip]

> Sooner or later, I expect the more reasonable people out there to explain
> to you that your actions are hurting them and maybe they'll help you
> decide which is more important, getting at the data you want, in a timely
> manner, without a license, or doing negative advertising against us.
> If the other folks don't care enough to do that then that's fine, the
> gateways are not important and you can whine all you want but you'll be
> back to waiting for tarball releases and we can save some money.

Thank you for demonstrating exactly _why_ closed source software is bad.
Your posts clearly show that with a closed-source solution, you put
yourself at the mercy of a single vendor, and have to put up with whatever
demands and threats he feels like making.


Bernd

2003-09-20 13:54:50

by Larry McVoy

Subject: Re: Fix for wrong OOM killer trigger?

On Sat, Sep 20, 2003 at 01:22:42PM +0100, Bernd Schmidt wrote:
> On Fri, 19 Sep 2003, Larry McVoy wrote:
> > Sooner or later, I expect the more reasonable people out there to explain
> > to you that your actions are hurting them and maybe they'll help you
> > decide which is more important, getting at the data you want, in a timely
> > manner, without a license, or doing negative advertising against us.
> > If the other folks don't care enough to do that then that's fine, the
> > gateways are not important and you can whine all you want but you'll be
> > back to waiting for tarball releases and we can save some money.
>
> Thank you for demonstrating exactly _why_ closed source software is bad.
> Your posts clearly show that with a closed-source solution, you put
> yourself at the mercy of a single vendor, and have to put up with whatever
> demands and threats he feels like making.

Nonsense. This isn't a closed source issue at all because the issue is the
CVS gateway. You don't need source to write that gateway and you could
have (and recall that Linus said you should have) written the gateway
yourself, hosted it yourself, and maintained it yourself.

There are no closed source issues here, you're just trying to redirect
attention there because it meets your agenda. Nice try but no dice.

Since we are providing something that you want, you asked for, and you
could have built yourself, you get it under our terms. Which are "shut
the heck up already, you got what you wanted."
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-09-20 13:52:31

by Willy Tarreau

Subject: Re: Fix for wrong OOM killer trigger?

On Sat, Sep 20, 2003 at 01:22:42PM +0100, Bernd Schmidt wrote:
> On Fri, 19 Sep 2003, Larry McVoy wrote:
> > Sooner or later, I expect the more reasonable people out there to explain
> > to you that your actions are hurting them and maybe they'll help you
> > decide which is more important, getting at the data you want, in a timely
> > manner, without a license, or doing negative advertising against us.
> > If the other folks don't care enough to do that then that's fine, the
> > gateways are not important and you can whine all you want but you'll be
> > back to waiting for tarball releases and we can save some money.
>
> Thank you for demonstrating exactly _why_ closed source software is bad.
> Your posts clearly show that with a closed-source solution, you put
> yourself at the mercy of a single vendor, and have to put up with whatever
> demands and threats he feels like making.

Larry didn't show anything related to whether closed source is bad or not,
he simply tried to explain that someone has to PAY to host all those gateways
and that since HE actually pays for them, he at least expects not to be the
subject of permanent complaints, or he will finally stop paying for those. If
you have the lines, the machine and the time to do this yourself, perhaps you
could propose to Larry that you do it yourself, and then complain about his
closed source software, which you're not forced to use.

Now please, please... before RMS jumps into the wagon again, stop using every
mail which contains the two letters B and K as the pretext to start a new war.

Thanks,
Willy

PS: replies to this mail will go to /dev/null, so please don't pollute the list.

2003-09-20 14:23:28

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

On Fri, Sep 19, 2003 at 09:30:26PM -0700, Larry McVoy wrote:
> The problem is that you are saying that closed source is bad, in
> particular, that BitKeeper is bad. That's not the problem, lots of

this is not true. I never said that and I will never say that.

Find a quote where I said closed source is bad, I think I never said
that in my whole life, or if I said that it had to be a joke or
something like that. I may have said sometime that binary only drivers
are bad but that's just because of the pain they give after you
recompile a kernel, and I change the kernel very often ;). I refuse to
use closed software myself for my critical tasks true, but I've never
said closed software is bad.

Closed software is just not acceptable for my needs, but it can be
perfect for others. It's not about good or bad right/wrong here. It's
about different people having different needs. And please avoid
imagination and stick to facts and to what I write, not to what you
think I want to say, because you're wrong about that.

As for the economy comments, I would suggest having a look here:

http://insider.thomsonfn.com/tfn/stocks.asp?imodule=coTearsheet&ticker=msft&ttype=A

As for the replacement of my signature, you should stop insulting me
with "I was deliberately rude to the people providing them" or with
speculation that I'm saying that closed software is bad or whatever.
This is the last email of yours I will answer if you keep making deliberately
wrong assumptions about something I never said and will never say,
because I simply wouldn't agree with those claims myself.

I'm very satisfied with the service you're providing, I thank you a lot
for that and for giving us the data in the open; we could have lost nearly all
the 2.5 development logs if it wasn't for your effort. So I very much
hope that you will be able to provide it in the long run and that you
will be very successful, but the day you won't be able to provide it
anymore for whatever reason, I'm optimistic the open community will be
able to find a (possibly inferior) substitute.

Andrea - If you refuse to depend on closed software for a critical
part of your business, these links may be useful:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk

2003-09-20 14:34:22

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

On Sat, Sep 20, 2003 at 01:09:28PM +0200, Roland Bless wrote:
> On Fri, Sep 19, 2003 at 09:25:44PM +0200, Andrea Arcangeli wrote:
> >
> > can you try with 2.4.22aa1? the oom killer there will only work on tasks
> > that are allocating memory, not on idle daemons, so the probability of
> > killing rsync first should be higher. stock SuSE 8.1 kernel should do
> > the same too.
>
> This will only help to avoid shooting important daemons.
> The real cause, however, seems to be that the filesystem cache
> memory is not properly re-used when it should be, or that it tries to
> allocate a huge amount of memory. The programs themselves do not
> allocate much memory! It must be the system, because I also
> ran the programs with memory restrictions set by ulimit. The programs
> are definitely not allocating the memory, and 4GB of RAM is really
> enough for a simple file server like ours.

that might be an accounting error in the oom killing then (even that
should be corrected in my tree or in the stock 8.1 SuSE kernel).

the reason oom accounting errors normally never show up is that when the
amount of free swap is >0, the oom-killer is never invoked (that's a bit
of magic that probably keeps those situations from normally arising in the
stock kernel).

so maybe you had no swap; if you had no swap that would explain it.

and of course if you have 4G of ram and you know you've more than enough
ram then you'd be right to use 0 swap (it's just that the stock kernel oom
killer may malfunction, but that's not going to happen with the kernels I
suggested you try, they'll be fine with 0 swap)
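
The condition described above (the stock OOM heuristic reportedly being skipped while free swap is above zero) depends on a value that can be read from /proc/meminfo. A minimal sketch, assuming the `SwapFree:` line format with values in kB; the sample figures are made up and the function name `swap_free_kb` is illustrative:

```shell
#!/bin/sh
# swap_free_kb: print the SwapFree value (kB) from meminfo-style text on
# stdin, or 0 if no such line is present.
swap_free_kb() {
    awk '$1 == "SwapFree:" { print $2; found = 1 } END { if (!found) print 0 }'
}

# Hypothetical sample resembling /proc/meminfo on a box with swap turned off.
sample='MemTotal:      4126344 kB
MemFree:       2301120 kB
SwapTotal:           0 kB
SwapFree:            0 kB'

free_kb=$(printf '%s\n' "$sample" | swap_free_kb)
if [ "$free_kb" -gt 0 ]; then
    verdict="free swap present"
else
    verdict="no free swap"
fi
echo "$verdict"
```

On a live machine the function would be fed with `swap_free_kb < /proc/meminfo` instead of the sample text.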

hope this helps ;)

Andrea - If you refuse to depend on closed software for a critical
part of your business, these links may be useful:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk

2003-09-20 15:13:36

by Larry McVoy

Subject: Re: Fix for wrong OOM killer trigger?

On Sat, Sep 20, 2003 at 04:23:14PM +0200, Andrea Arcangeli wrote:
> I refuse to use closed software myself for my critical tasks true,
> but I've never said closed software is bad.

Really? So where's the source to the BIOS of your machine? Your drive
firmware? Do you drive a car? Turn on a microwave? Use a cell phone?

And didn't you say:

> I may have said sometime that binary only drivers are bad but that's
> just because of the pain they give after you recompile a kernel, and I
> change the kernel very often ;).

So if you don't use closed source then why is it that those binaries are
giving you a problem? Could it be that you don't practice what you
preach? Oh, I see, it's OK to use closed source if you need to play
quake but not if you want to check in some code. Sure, I can see how
that makes sense. NOT.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-09-20 16:07:37

by Valdis Klētnieks

Subject: Re: Fix for wrong OOM killer trigger?

On Sat, 20 Sep 2003 13:22:42 BST, Bernd Schmidt said:

> Thank you for demonstrating exactly _why_ closed source software is bad.
> Your posts clearly show that with a closed-source solution, you put
> yourself at the mercy of a single vendor, and have to put up with whatever
> demands and threats he feels like making.

Larry has on several occasions posted explaining why at the current time, he
has to make the difficult choice between keeping it closed source and actually
paying his programmers.

The open-source zealots are not going to make any friends by beating up on
people who aren't in a position to do anything. You write Larry a big enough
check, it'll be open-source by Wednesday. The only problem here is that nobody
is writing Larry a check...



2003-09-20 17:14:24

by Stephen Satchell

Subject: Flames (was: Fix for wrong OOM killer trigger?)

At 08:13 AM 9/20/2003 -0700, Larry McVoy wrote:
>Really? So where's the source to the BIOS of your machine? Your drive
>firmware? Do you drive a car? Turn on a microwave? Use a cell phone?

And when you start flaming, PLEASE CHANGE THE SUBJECT LINE!

I am EXCEEDINGLY INTERESTED in finding a solution to the
killing-the-wrong-task problem, because I have 50 Linux boxes that do it
all the time. ANY discussion of that topic deserves my time. Discussions
of signature blocks do not -- nothing bores me more, in fact.

So either post on topic, or change the subject line.

I'm saying "please."


--
Human beings, who are almost unique in having the ability to learn from the
experience of others, are also remarkable for their apparent disinclination
to do so. -- Douglas Adams

2003-09-20 17:48:22

by Alan

Subject: Re: Flames (was: Fix for wrong OOM killer trigger?)

> I am EXCEEDINGLY INTERESTED in finding a solution to the
> killing-the-wrong-task problem, because I have 50 Linux boxes that do it
> all the time. ANY discussion of that topic deserves my time. Discussions
> of signature blocks do not -- nothing bores me more, in fact.

Does the -ac overcommit disabling option meet your needs?

2003-09-20 19:56:32

by Jamie Lokier

Subject: Gateways (was Re: Fix for wrong OOM killer trigger?)

Larry McVoy wrote:
> Nonsense. This isn't a closed source issue at all because the issue is the
> CVS gateway. You don't need source to write that gateway and you could
> have (and recall that Linus said you should have) written the gateway
> yourself, hosted it yourself, and maintained it yourself.

I was prepared to write such a gateway.

We discussed it, and found that the combination of BitKeeper license
and BitMover's control over the kernel repository prevents it. This
was the subject of a heated debate.

I believe that debate was the reason BitMover wrote and now hosts the
BK->CVS gateway, which other gateways are built upon.

It's a brilliant solution, and thank you, I am glad of your work,
but let's not pretend that a 3rd party is in a position to offer such
a gateway.

(You need either the BK protocol, the right to run BK, or a copy of
the BK repository files to extract data from, and none of these are
available to a 3rd party who wants to write and support a BK->whatever
gateway for the kernel tree. I asked; all 3 were refused).

-- Jamie

2003-09-20 20:15:16

by Larry McVoy

Subject: Re: Gateways (was Re: Fix for wrong OOM killer trigger?)

On Sat, Sep 20, 2003 at 08:56:10PM +0100, Jamie Lokier wrote:
> Larry McVoy wrote:
> > Nonsense. This isn't closed source issue at all because the issue is the
> > CVS gateway. You don't need source to write that gateway and you could
> > have (and recall that Linus said you should have) written the gateway
> > yourself, hosted it yourself, and maintained it yourself.
>
> I was prepared to write such a gateway.
>
> We discussed it, and found that the combination of BitKeeper license
> and BitMover's control over the kernel repository prevents it. This
> was the subject of a heated debate.
>
> I believe that debate was the reason BitMover wrote and now hosts the
> BK->CVS gateway, which other gateways are built upon.
>
> It's a brilliant solution, and thank you, I am glad of your work,
> but let's not pretend that a 3rd party is in a position to offer such
> a gateway.

Anyone who obeys the license is welcome to write a gateway. Lots of
people have done lots of interesting things around BK without violating
the license. You explicitly stated an intent that violated the license,
so no, _you_ can't write one but plenty of other people can.

Let's also not pretend that it is an easy task or that keeping it
working is easy. We're going from a system that works to a system that
is extremely fragile. When it breaks it takes about 4-5 hours of a
2.1GHz Athlon with a GB of RAM to rebuild the gateway.

Let's also not pretend that it is cheap to host this.

It's all well and good to complain that you weren't allowed or whatever,
but unless you are going to build the gateway, make it work, make it
keep working, host it, and maintain that host, then you need to stop
pretending that you were going to solve the problem. You were prepared
to _attempt_ to write such a gateway. Pavel was going to write one,
Daniel was just dying to write a BK replacement, etc. Lots of people
would love to have a BK replacement but they all go look at the problem
space and find out it's a lot harder than they thought and they go work
on something more fun.

I'd really like to know what all you guys would do if you were in
my shoes. Over and over you are willing to throw stones but not one
of you has done 1/100th of the SCM work required to replace BK or even
build a gateway. Complaining is fun and all, but I'd sure like to see
you come up with and actually execute on a plan that provides everything
we provide and has a GPLed result. That would be an amazing feat and
I'd come work for you. Until you do, however, how about backing down
a bit? In spite of all the flames, we have an excellent track record
of providing you free service, providing you free tools, and providing
you free support. In the face of your non-stop complaining that's pretty
amazing and is it really so much to ask you to leave off the flaming?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-09-21 10:40:45

by Eric W. Biederman

Subject: Re: Fix for wrong OOM killer trigger?

Larry McVoy <[email protected]> writes:

> On Sat, Sep 20, 2003 at 04:23:14PM +0200, Andrea Arcangeli wrote:
> > I refuse to use closed software myself for my critical tasks true,
> > but I've never said closed software is bad.
>
> Really? So where's the source to the BIOS of your machine? Your drive
> firmware? Do you drive a car? Turn on a microwave? Use a cell phone?

Careful with your accusations Larry, some of us can answer those questions,
in ways that won't support your argument.

Eric

2003-09-21 11:52:24

by David Miller

Subject: Re: Gateways (was Re: Fix for wrong OOM killer trigger?)

On Sat, 20 Sep 2003 13:14:56 -0700
Larry McVoy <[email protected]> wrote:

> is it really so much to ask you to leave off the flaming?

Please, let this be the end of this thread.

Thanks everyone.

2003-09-21 14:22:26

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

On Sun, Sep 21, 2003 at 04:40:29AM -0600, Eric W. Biederman wrote:
> Careful with your accusations Larry, some of us can answer those questions,
> in ways that won't support your argument.

It wasn't worth an answer IMHO, he's ignoring lots of efforts going on.
AFAIK you're in the BIOS area like many others, and especially for x86-64 it
sounds very promising. Notably, these days my PDA strictly runs open
source since I strictly need it for security reasons; for instance, I
nuked Opera immediately and replaced it with konqueror and the whole
openzaurus suite, and I will do the same soon with the cellphone.
Everything he listed is anything but critical, and we pay for that as well to
have some sort of warranty most of the time, at least for the first few
years, nothing like bkbits.net, which can be shut down any day; Larry
made sure he can turn everything "free" off at any time, AFAIK. And we have
many providers for cellphones, microwaves, cars, etc., not just one. If
something breaks and can't be repaired I throw it away and buy another
one.

But it would be unacceptable to throw away the whole 2.5 changesets
instead. And without this bkcvs export in the open, they could be lost
any day of the week. And I can't even try to extract those with b*tkeeper,
since it's illegal for me to do so. Yeah, if there weren't bkcvs,
somebody would have to sacrifice his freedom for us to extract this closed
info encoded in proprietary form (like a .doc). Since many already sacrificed
their freedom of development in this area, maybe it wouldn't be too bad;
they're already screwed so it can't get worse for them. But bkcvs to me
sounds much safer than a hope that somebody one day will do the
conversion after sacrificing his freedom and after sorting out the
linearization of the tree.

Andrea - If you prefer relying on open source software, check these links:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk

2003-09-21 14:52:47

by Larry McVoy

Subject: Re: Fix for wrong OOM killer trigger?

On Sun, Sep 21, 2003 at 04:22:52PM +0200, Andrea Arcangeli wrote:
> On Sun, Sep 21, 2003 at 04:40:29AM -0600, Eric W. Biederman wrote:
> > Careful with your accusations Larry, some of us can answer those questions,
> > in ways that won't support your argument.
>
> It wasn't worth an answer IMHO, he's ignoring lots of efforts going on,

First of all, I didn't accuse anyone of anything, I asked if you were
using open source for everything that you use each day. And you
are ignoring the question. You stated

>> I refuse to use closed software myself for my critical tasks true,

and I asked

> So where's the source to the BIOS of your machine? Your drive
> firmware? Do you drive a car? Turn on a microwave? Use a cell phone?

And you tell me you "will" be running free software on all that soon.
Until you are, how about you go attack the drive people, the bios people,
the car people, the cellphone people, etc? Why constantly harp on the one
thing that has done an enormous amount of good for the kernel?

I also asked about those binaries that give you such problems when you
recompile kernels, you seem perfectly OK using closed source to play
quake. Oh, that's not "critical" because it is for your fun, I see.

How convenient for you that some closed source is OK for you to use but
other closed source is not OK. I see you are a man of principle, of
strong ethics and principles. What a great role model.

If you feel so strongly about closed source then stop using EVERYTHING
that doesn't have open source in it. When you have done that, and only
then, you have earned the right to whine about BK or whatever. Until
then, it's pathetic. You are complaining about the stuff that it is
easy for you not to use, but you are silent about the stuff that you
want to use and there is no open source alternative.

Don't you find it a bit pathetic that you are whining at the very people
who did the work so you wouldn't have to use anything but open source?
Do you have no sense of shame at all?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm

2003-09-21 15:52:37

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

On Sun, Sep 21, 2003 at 07:52:35AM -0700, Larry McVoy wrote:
> On Sun, Sep 21, 2003 at 04:22:52PM +0200, Andrea Arcangeli wrote:
> > On Sun, Sep 21, 2003 at 04:40:29AM -0600, Eric W. Biederman wrote:
> > > Careful with your accusations Larry, some of us can answer those questions,
> > > in ways that won't support your argument.
> >
> > It wasn't worth an answer IMHO, he's ignoring lots of efforts going on,
>
> First of all, I didn't accuse anyone of anything, I asked if you were
> using open source for everything that you use each day. And you
> are ignoring the question. You stated
>
> >> I refuse to use closed software myself for my critical tasks true,
>
> and I asked
>
> > So where's the source to the BIOS of your machine? Your drive
> > firmware? Do you drive a car? Turn on a microwave? Use a cell phone?

I told you none of this is critical to me. They can all break, and I
will throw them away and replace them, and most of them come with a
reasonable warranty anyway.

This is not the case for creative unique data encoded in .doc without a
lossless converter freely available.

Sure, if I were maintaining the kernel with software and a
processor located in the electronic fuel injection board in my car,
then I would demand that the data be stored in a standard documented
format and that the program be open source (so I can migrate to another
platform instead of the CPU embedded in the car) in the future.

> And you tell me you "will" be running free software on all that soon.
> Until you are, how about you go attack the drive people, the bios people,

I am already using open source for everything that runs with my
critical data. I turn off ACPI as well to be sure the BIOS doesn't run (I
only use ACPI to do the PCI discovery of the devices at boot; I never
allow my box to call into the BIOS, and my data is encrypted on disk so
when the BIOS runs at boot it has no way to look into it).

From my point of view, a bug in the BIOS that destroys the data is the
same as a hardware bug that corrupts the fs or whatever.

Comparing the worth of a piece of hardware with ~2 years of development
by hundreds of people sounds very stupid.

> recompile kernels, you seem perfectly OK using closed source to play
> quake. Oh, that's not "critical" because it is for your fun, I see.

Please, I don't play quake, and quake is anything but critical. And I think
you couldn't find a worse example anyway, since AFAIK quake is GPL too
(I'm sure doom was at least open source since I compiled it myself some
years ago).

> If you feel so strongly about closed source then stop using EVERYTHING
> that doesn't have open source in it. When you have done that, and only

I already did years ago; my data won't risk being touched by any
closed software (when it's in decrypted form).

This is my last email on this topic; I feel these emails aren't worth an
answer, sorry. It's stunning that you seem really convinced that I'm
making a special case for b*tkeeper or that I'm not consistent with my
view. I've nothing against b*tkeeper, like I've nothing against closed
software at all; in fact I always did my best, and I will definitely still
do the very best I can, to support all the proprietary software available
for Linux. I do my best to support proprietary binary-only extensions to
the kernel too, in fact! But you have no way at all to demand that I
be a user of any proprietary closed software for anything very
important to me.

And since you didn't mention it: I also fly in airplanes, which have a lot
more software than any car. FWIW I definitely think a lot of the
software critical for lives, like the software meant to avoid
collisions between airplanes, should be open source. If there's a bug I
definitely prefer that people are able to find it and fix it. We know
from practice that security through obscurity can lead to disasters in
the long run.

Andrea - If you prefer relying on open source software, check these links:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk

2003-09-22 10:11:56

by Roland Bless

Subject: Re: Fix for wrong OOM killer trigger?

Hi Andrea,

On Sat, 20 Sep 2003 16:34:10 +0200 Andrea Arcangeli <[email protected]> wrote:

> > The real cause, however, seems to be that the filesystem cache
> > memory is not properly re-used when it should be, or that it tries to
> > allocate a huge amount of memory. The programs themselves do not
> > allocate much memory! It must be the system, because I also
> > ran programs with memory restrictions by ulimit. The programs
> > are definitely not allocating the memory, and, 4GB RAM are really
> > enough for a simple file server like ours.
>
> that might be an accounting error in the oom killing then (even that
> should be corrected in my tree or in the stock 8.1 SuSE kernel).
>
> the reason normally oom accounting errors never show up, is that when the
> amount of free-swap is >0, the oom-killer is never invoked (that's a
> magic that probably avoids those situations to normally arise in the
> stock kernel).
>
> so maybe you had no swap, if you had no swap that would explain it.

That's clear then, however, some kernel process/procedure must have tried
to allocate a huge block of memory.

> and of course if you have 4G of ram and you know you've more than enough
> ram then you'd be right using 0 swap (just the stock kernel oom killer
> may malfunction, but that's not going to happen with the kernels I
> suggested you to try, they'll be fine with 0 swap)

> hope this helps ;)

The suggestion from Marc-Christian Petersen <[email protected]>,
namely using v2.4.23-pre5, worked for me. I was not sure before,
because I was not able to guess from the Changelog whether there
was a fix for the particular bug. My guess is that the log entry
below describes the bug fix for it:

Summary of changes from v2.4.22 to v2.4.23-pre1
============================================
...
Marc-Christian Petersen:
o Cleanup kmem_cache_reap()
or was it related to this one:
o Avoid potentially leaking pagetables into the per-cpu queues

I hope that it was also fixed in 2.6, or is there a different mechanism
used?

Best regards,
Roland

2003-09-22 13:01:55

by Andrea Arcangeli

Subject: Re: Fix for wrong OOM killer trigger?

On Mon, Sep 22, 2003 at 12:11:40PM +0200, Roland Bless wrote:
> was a fix for the particular bug. My suggestion is that the log entry
> below describes the bug fix for it:

the kmem cleanup wasn't a bug fix. So in theory it could even be the
pagetable leak fix that went from -aa to mainline in 23pre1, but I think it
really was the removal of the oom killer with the -aa VM merges that
went into 2.4.23pre[2-5] that fixed your problem (if it's true
that you had no swap, which I understood is the case; having no swap
exposes the brokenness of the oom killer). That leak is a minor one:
many other places shrink the per-cpu queues, so it's unlikely to be
able to leak lots of ram in a misc workload.

It's good to hear that pre5 is fixed. thanks.


>
> Summary of changes from v2.4.22 to v2.4.23-pre1
> ============================================
> ...
> Marc-Christian Petersen:
> o Cleanup kmem_cache_reap()
> or was it related to this one:
> o Avoid potentially leaking pagetables into the per-cpu queues
>
> I hope that it was also fixed in 2.6, or is there a different mechanism
> used?

dunno, but the oom killer certainly does not have enough information in 2.6
either to be able to make a reliable decision.

Andrea - If you prefer relying on open source software, check these links:
rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
http://www.cobite.com/cvsps/
svn://svn.kernel.org/linux-2.[46]/trunk
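
Andrea's explanation in this sub-thread is that the stock 2.4 OOM killer is normally never invoked while free swap is above zero, so a box with no swap is the one that exposes its accounting errors. A minimal sketch for checking whether a machine is in that situation follows. It assumes /proc/meminfo reports `SwapTotal` and `SwapFree` lines in kB; the function names are illustrative and not taken from any kernel tool.

```python
def parse_meminfo(text):
    """Return {field name: first numeric value} from /proc/meminfo-style text."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        # Skip header lines whose first token is not a plain number.
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def swap_exhausted(fields):
    """True when no swap is configured or none of it is free."""
    return fields.get("SwapFree", 0) == 0


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

On the file server described in this thread, with swap turned off, `swap_exhausted(read_meminfo())` would be expected to return True, which is the condition under which Andrea says the stock OOM killer can misfire.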

2003-09-23 12:45:30

by Bas Mevissen

Subject: Re: Fix for wrong OOM killer trigger?

Andrea Arcangeli wrote:

> And we have
> many providers for cellphones, microwaves, cars, etc., not just one. If
> something breaks and can't be repaired I throw it away and buy another
> one.
>
> But it would be unacceptable to throw away the whole 2.5 changesets
> instead.

Maybe for other people it is unacceptable to throw away their cellphone
with the address book in it, and they would care about the whole 2.5
development changeset as much as you care about your cellphone...

Bas.