2004-10-07 09:18:41

by Michael Buesch

Subject: [2.4] 0-order allocation failed

Hi all,

I'm running 2.4.28 bk snapshot of 2004.09.03
The machine has an uptime of 7 days, 23:46 now.

I was running several bittorrent clients inside of
a screen session. Suddenly they all died (including the
screen session).
dmesg said this:

__alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process python
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
__alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
VM: killing process screen
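
The gfp= value in these messages is a bitmask of allocation flags. Below is a small sketch of how one might decode it, assuming the flag values from 2.4's include/linux/mm.h (quoted from memory, so check them against your own tree). gfp_decode() and the table are illustrative names, not kernel code:

```c
#include <stdio.h>
#include <string.h>

struct gfp_bit {
	unsigned int mask;
	const char *name;
};

/* ASSUMPTION: bit values as recalled from 2.4 include/linux/mm.h. */
static const struct gfp_bit gfp_bits[] = {
	{ 0x001, "__GFP_DMA" },
	{ 0x002, "__GFP_HIGHMEM" },
	{ 0x010, "__GFP_WAIT" },
	{ 0x020, "__GFP_HIGH" },
	{ 0x040, "__GFP_IO" },
	{ 0x080, "__GFP_HIGHIO" },
	{ 0x100, "__GFP_FS" },
};

/* Write a '|'-separated list of the flag names set in mask into buf.
 * Output is truncated if buf is too small. Returns buf. */
char *gfp_decode(unsigned int mask, char *buf, size_t size)
{
	size_t i;

	if (size == 0)
		return buf;
	buf[0] = '\0';

	for (i = 0; i < sizeof(gfp_bits) / sizeof(gfp_bits[0]); i++) {
		if (!(mask & gfp_bits[i].mask))
			continue;
		if (buf[0] != '\0')
			strncat(buf, "|", size - strlen(buf) - 1);
		strncat(buf, gfp_bits[i].name, size - strlen(buf) - 1);
	}
	return buf;
}
```

If the values above are right, 0x1f0 is the GFP_KERNEL combination and 0x1d2 is GFP_HIGHUSER, so these look like ordinary kernel and user-page allocations failing rather than anything exotic.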

I already got this with vanilla kernel 2.4.27 after a
longer uptime (I think it was over 10 days).
This was exactly the reason I updated to the bk snapshot.

What can be the reason for this? Is it OOM? (I can't
really believe it is).
Is it a kernel memory leak?

With 2.4.26 I never got these errors, and I ran uptimes
of up to 50 days.

--
Regards Michael Buesch [ http://www.tuxsoft.de.vu ]



2004-10-07 17:22:51

by Michael Buesch

Subject: Re: [2.4] 0-order allocation failed

Quoting Marcelo Tosatti <[email protected]>:
> On Thu, Oct 07, 2004 at 01:18:13PM +0200, Michael Buesch wrote:
> > Hi all,
> >
> > I'm running 2.4.28 bk snapshot of 2004.09.03
> > The machine has an uptime of 7 days, 23:46 now.
> >
> > I was running several bittorrent clients inside of
> > a screen session. Suddenly they all died (including the
> > screen session).
> > dmesg sayed this:
> >
> > __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process python
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > VM: killing process screen
> >
> > I already got this with kernel 2.4.27 vanilla after a
> > higher amount of uptime (I think it was over 10 days).
> > This was exactly the reason I updated to bk snapshot.
> >
> > What can be the reason for this? Is it OOM? (I can't
> > really believe it is).
>
> Can you check how much swap space is there available when
> the OOM killer trigger? I bet this is the case.

The machine doesn't have swap.

> If its not, we have a problem.
>
> > Is it a kernel memory leak?
> >
> > With 2.4.26 I never got these errors. And I ran uptimes
> > up to 50 days.
>
>

--
Regards Michael Buesch [ http://www.tuxsoft.de.vu ]



2004-10-07 17:36:30

by Marcelo Tosatti

Subject: Re: [2.4] 0-order allocation failed

On Thu, Oct 07, 2004 at 01:18:13PM +0200, Michael Buesch wrote:
> Hi all,
>
> I'm running 2.4.28 bk snapshot of 2004.09.03
> The machine has an uptime of 7 days, 23:46 now.
>
> I was running several bittorrent clients inside of
> a screen session. Suddenly they all died (including the
> screen session).
> dmesg sayed this:
>
> __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process python
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> VM: killing process screen
>
> I already got this with kernel 2.4.27 vanilla after a
> higher amount of uptime (I think it was over 10 days).
> This was exactly the reason I updated to bk snapshot.
>
> What can be the reason for this? Is it OOM? (I can't
> really believe it is).

Can you check how much swap space is available when
the OOM killer triggers? I bet it has run out.

If it's not, we have a problem.
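
One way to answer this on a running system is to read SwapTotal and SwapFree from /proc/meminfo. Here is a minimal sketch, assuming the "Key: value kB" line format. meminfo_kb() and read_meminfo() are hypothetical helper names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the number following "key:" on the line
 * that starts with key in meminfo-style text, or -1 if there is none. */
long meminfo_kb(const char *text, const char *key)
{
	const char *line = text;
	size_t klen;

	assert(text != NULL && key != NULL);
	klen = strlen(key);

	while (line && *line) {
		long val;

		/* The key must start the line and be followed by ':'. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ':' &&
		    sscanf(line + klen + 1, "%ld", &val) == 1)
			return val;

		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}

/* Read /proc/meminfo into buf (NUL-terminated). Returns 0 on success. */
int read_meminfo(char *buf, size_t size)
{
	FILE *f;
	size_t n;

	if (size == 0)
		return -1;
	f = fopen("/proc/meminfo", "r");
	if (!f)
		return -1;
	n = fread(buf, 1, size - 1, f);
	buf[n] = '\0';
	fclose(f);
	return 0;
}
```

After read_meminfo(buf, sizeof buf), calling meminfo_kb(buf, "SwapTotal") and meminfo_kb(buf, "SwapFree") gives the two numbers in question. A SwapTotal of 0 means the machine has no swap configured at all.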

> Is it a kernel memory leak?
>
> With 2.4.26 I never got these errors. And I ran uptimes
> up to 50 days.

2004-10-07 17:46:13

by Marcelo Tosatti

Subject: Re: [2.4] 0-order allocation failed

On Thu, Oct 07, 2004 at 07:17:30PM +0200, Michael Buesch wrote:
> Quoting Marcelo Tosatti <[email protected]>:
> > On Thu, Oct 07, 2004 at 01:18:13PM +0200, Michael Buesch wrote:
> > > Hi all,
> > >
> > > I'm running 2.4.28 bk snapshot of 2004.09.03
> > > The machine has an uptime of 7 days, 23:46 now.
> > >
> > > I was running several bittorrent clients inside of
> > > a screen session. Suddenly they all died (including the
> > > screen session).
> > > dmesg sayed this:
> > >
> > > __alloc_pages: 0-order allocation failed (gfp=0x1f0/0)
> > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > VM: killing process python
> > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > __alloc_pages: 0-order allocation failed (gfp=0x1d2/0)
> > > VM: killing process screen
> > >
> > > I already got this with kernel 2.4.27 vanilla after a
> > > higher amount of uptime (I think it was over 10 days).
> > > This was exactly the reason I updated to bk snapshot.
> > >
> > > What can be the reason for this? Is it OOM? (I can't
> > > really believe it is).
> >
> > Can you check how much swap space is there available when
> > the OOM killer trigger? I bet this is the case.
>
> The machine doesn't have swap.

Well then you're probably facing true OOM.

Add some swap.

> > If its not, we have a problem.
> >
> > > Is it a kernel memory leak?
> > >
> > > With 2.4.26 I never got these errors. And I ran uptimes
> > > up to 50 days.

2004-10-07 18:30:37

by Gabor Z. Papp

Subject: Re: [2.4] 0-order allocation failed

* Marcelo Tosatti <[email protected]>:

| > > Can you check how much swap space is there available when
| > > the OOM killer trigger? I bet this is the case.
| >
| > The machine doesn't have swap.
|
| Well then you're probably facing true OOM.
|
| Add some swap.

Is there really no way to run 2.4 without swap?

I have the same problem with nfsroot and ramdisk based setups after
1-2 weeks uptime.

2004-10-07 18:46:32

by Gabor Z. Papp

Subject: Re: [2.4] 0-order allocation failed

* Marcelo Tosatti <[email protected]>:

| > There is really no way to run 2.4 without swap?
|
| Nope. Any kernel can't. The thing is the system overcommits
| memory (it allows applications to allocate more memory than the system
| is able to handle).

Okay, then what's the minimum swap size needed to avoid
such crashes?

When the whole system lives in RAM, it is quite funny to allocate a
swap file on the ramdisk anyway...

2004-10-07 18:46:33

by Marcelo Tosatti

Subject: Re: [2.4] 0-order allocation failed

On Thu, Oct 07, 2004 at 08:28:04PM +0200, Gabor Z. Papp wrote:
> * Marcelo Tosatti <[email protected]>:
>
> | > > Can you check how much swap space is there available when
> | > > the OOM killer trigger? I bet this is the case.
> | >
> | > The machine doesn't have swap.
> |
> | Well then you're probably facing true OOM.
> |
> | Add some swap.
>
> There is really no way to run 2.4 without swap?

Nope. No kernel can. The thing is that the system overcommits
memory (it allows applications to allocate more memory than the system
is able to handle).

If there is no place to "save" that memory once you run out of it,
you're dead. It's a physical problem. :)

> I have the same problem with nfsroot and ramdisk based setups after
> 1-2 weeks uptime.

You can try to remove the following lines from mm/vmscan.c::try_to_free_pages_zone()

	if (likely(current->pid != 1))
		break;
	if (!check_classzone_need_balance(classzone))
		break;

And disable CONFIG_OOM_KILLER. See if that makes a difference.

What will happen is that the kernel will try to free memory and
never go into the OOM killer. If it can't free memory, the
system will hang forever at a certain point.

There's not much to do about it really.

In 2.6 you can decrease swappiness so that it frees pagecache harder,
but it's the same game.

2004-10-07 18:55:58

by Neil Horman

Subject: Re: [2.4] 0-order allocation failed

Gabor Z. Papp wrote:
> * Marcelo Tosatti <[email protected]>:
>
> | > > Can you check how much swap space is there available when
> | > > the OOM killer trigger? I bet this is the case.
> | >
> | > The machine doesn't have swap.
> |
> | Well then you're probably facing true OOM.
> |
> | Add some swap.
>
> There is really no way to run 2.4 without swap?
>
> I have the same problem with nfsroot and ramdisk based setups after
> 1-2 weeks uptime.

Sure, you can run a system without swap; you just run the risk of not
being able to start new processes, or of OOM kills if your running
processes try to allocate too much memory. You can turn the OOM killer
off if you like, but then you run the risk of wedging the machine.

Neil

--
/***************************************************
*Neil Horman
*Software Engineer
*Red Hat, Inc.
*[email protected]
*gpg keyid: 1024D / 0x92A74FA1
*http://pgp.mit.edu
***************************************************/

2004-10-07 18:55:58

by Marcelo Tosatti

Subject: Re: [2.4] 0-order allocation failed

On Thu, Oct 07, 2004 at 08:43:39PM +0200, Gabor Z. Papp wrote:
> * Marcelo Tosatti <[email protected]>:
>
> | > There is really no way to run 2.4 without swap?
> |
> | Nope. Any kernel can't. The thing is the system overcommits
> | memory (it allows applications to allocate more memory than the system
> | is able to handle).
>
> Okay, then whats the required minimum swap size that needed to avoid
> such crashes?
>
> In the case when the system is in the ram, quite funny to allocate a
> swap file on the ramdisk anyway...

It depends on how much memory you have and on how much you expect
your workload to use.

How much memory do you have? Try 128MB for a test.

Also investigate swap-over-nfs/swap-over-nbd.


2004-10-07 19:00:37

by Marc-Christian Petersen

Subject: Re: [2.4] 0-order allocation failed

On Thursday 07 October 2004 20:28, Gabor Z. Papp wrote:

Hi all,

> | > > Can you check how much swap space is there available when
> | > > the OOM killer trigger? I bet this is the case.
> | > The machine doesn't have swap.
> | Well then you're probably facing true OOM.
> | Add some swap.

> There is really no way to run 2.4 without swap?
> I have the same problem with nfsroot and ramdisk based setups after
> 1-2 weeks uptime.

stop whining about braindead 2.4 mainline vm. Apply the attached patch and be
happy :p

Marcelo: Is there something wrong with my VM documentation update patches for
2.4? Or do you not care and think: "Hello my friend, let's stick with 2.2 VM
documentation even if almost all of the documentation is no longer valid"

;-)

ciao, Marc


Attachments:
vm-anon-lru-3.patch (4.50 kB)

2004-10-07 19:00:36

by Gabor Z. Papp

Subject: Re: [2.4] 0-order allocation failed

* Marcelo Tosatti <[email protected]>:

| Also investigate swap-over-nfs/swap-over-nbd.

I would like a *stable* solution, but I can't find
swap-over-nfs/swap-over-nbd in the kernel. Thanks anyway.

2004-10-07 19:25:46

by Aleksandar Milivojevic

Subject: Re: [2.4] 0-order allocation failed

Marcelo Tosatti wrote:
> On Thu, Oct 07, 2004 at 08:28:04PM +0200, Gabor Z. Papp wrote:
>>
>>There is really no way to run 2.4 without swap?
>
> Nope. Any kernel can't. The thing is the system overcommits
> memory (it allows applications to allocate more memory than the system
> is able to handle).
>
> If there is no place to "save" that memory once you run out of it,
> you're dead. Its a physical problem. :)

Hm, shouldn't there be a command-line option to instruct the kernel not to
overcommit memory? For servers that don't really need a disk, and
wouldn't have much use for swap anyhow (like firewalls, for example). I
do vaguely remember such an option in OSF/1 (Compaq Unix or whatever it
is called nowadays) or Solaris. Not sure which one. Or maybe both. If
I remember correctly, the default there was not to overcommit, and a (not
recommended) option would allow the kernel to overcommit. Or I might be
wrong about the default setting, but the option was there. I don't
remember whether an application that tried to allocate more memory than
was free on the system would be killed or whether the system call would
simply fail.
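
For what it's worth, Linux does expose a knob along these lines as the vm.overcommit_memory sysctl (/proc/sys/vm/overcommit_memory). The sketch below assumes the three modes documented for 2.6 kernels (0 heuristic, 1 always overcommit, 2 strict accounting). 2.4 trees may implement fewer modes, and the function names are made up for illustration:

```c
#include <stdio.h>

/* Describe a vm.overcommit_memory mode.
 * ASSUMPTION: mode meanings as documented for 2.6 kernels. */
const char *overcommit_mode_name(int mode)
{
	switch (mode) {
	case 0:
		return "heuristic overcommit";
	case 1:
		return "always overcommit";
	case 2:
		return "strict accounting, no overcommit";
	default:
		return "unknown";
	}
}

/* Return the current mode, or -1 if the file cannot be read or parsed. */
int read_overcommit_mode(void)
{
	FILE *f = fopen("/proc/sys/vm/overcommit_memory", "r");
	int mode = -1;

	if (!f)
		return -1;
	if (fscanf(f, "%d", &mode) != 1)
		mode = -1;
	fclose(f);
	return mode;
}
```

Printing overcommit_mode_name(read_overcommit_mode()) shows which policy a box is running with. Whether a failed allocation then shows up as a failing system call or as an OOM kill depends on that policy and on the kernel version.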

--
Aleksandar Milivojevic <[email protected]> Pollard Banknote Limited
Systems Administrator 1499 Buffalo Place
Tel: (204) 474-2323 ext 276 Winnipeg, MB R3T 1L7

2004-10-07 19:40:49

by Adam Heath

Subject: Re: [2.4] 0-order allocation failed

On Thu, 7 Oct 2004, Gabor Z. Papp wrote:

> * Marcelo Tosatti <[email protected]>:
>
> | Also investigate swap-over-nfs/swap-over-nbd.
>
> I would like something *stable* solution, but I can't find
> swap-over-nfs/swap-over-nbd in kernel. Thanks anyway.

google.

2004-10-08 03:16:24

by Marcelo Tosatti

Subject: Re: [2.4] 0-order allocation failed

On Thu, Oct 07, 2004 at 08:54:16PM +0200, Marc-Christian Petersen wrote:
> On Thursday 07 October 2004 20:28, Gabor Z. Papp wrote:
>
> Hi all,
>
> > | > > Can you check how much swap space is there available when
> > | > > the OOM killer trigger? I bet this is the case.
> > | > The machine doesn't have swap.
> > | Well then you're probably facing true OOM.
> > | Add some swap.
>
> > There is really no way to run 2.4 without swap?
> > I have the same problem with nfsroot and ramdisk based setups after
> > 1-2 weeks uptime.
>
> stop whining about braindead 2.4 mainline vm. Apply the attached patch and be
> happy :p

As I told you in private, I can't see how badly this patch could affect performance.
But then, as you answered, with all anonymous pages added to LRU you see much better
behavior (tons less swapping) on several workloads. That must be due to
refill_inactive()/shrink_cache() balancing.

The same patch also fixes excessive kswapd CPU consumption on huge-memory
boxes.

It's easy enough to apply because behaviour is unchanged by default
(you need to change a sysctl value for that).

I would like to understand why it improves behaviour so much,
though.

> Marcelo: Is there something wrong with my VM documentation update patches for
> 2.4? Or do you not care and think: "Hello my friend, let's stick with 2.2 VM
> documentation even if almost all of the documentation is not longer valid"

As I said to you in private, please resend.

Thanks!

2004-10-08 03:16:24

by Marcelo Tosatti

Subject: Re: [2.4] 0-order allocation failed

On Thu, Oct 07, 2004 at 10:05:39PM -0300, Marcelo Tosatti wrote:
> On Thu, Oct 07, 2004 at 08:54:16PM +0200, Marc-Christian Petersen wrote:
> > On Thursday 07 October 2004 20:28, Gabor Z. Papp wrote:
> >
> > Hi all,
> >
> > > | > > Can you check how much swap space is there available when
> > > | > > the OOM killer trigger? I bet this is the case.
> > > | > The machine doesn't have swap.
> > > | Well then you're probably facing true OOM.
> > > | Add some swap.
> >
> > > There is really no way to run 2.4 without swap?
> > > I have the same problem with nfsroot and ramdisk based setups after
> > > 1-2 weeks uptime.
> >
> > stop whining about braindead 2.4 mainline vm. Apply the attached patch and be
> > happy :p
>
> As I told you in private, I can't see how badly this patch could affect performance.
> But then, as you answered, with all anonymous pages added to LRU you see much better
> behavior (tons less swapping) on several workloads. That must be due to
> refill_inactive()/shrink_cache() balancing.

Ah, I don't think this will fix the OOM killer cases with no swap. They look
like a plain OOM condition to me.

I hope I'm wrong.

> The same patch also fixes kswapd excessive CPU consumption on huge
> memory box.
>
> Its easy enough to be applied because behaviour is unchanged by default
> (you need to change a sysctl value for that).
>
> I would like to understand why does it cause so much improved behaviour
> though.
>
> > Marcelo: Is there something wrong with my VM documentation update patches for
> > 2.4? Or do you not care and think: "Hello my friend, let's stick with 2.2 VM
> > documentation even if almost all of the documentation is not longer valid"
>
> As I said to you in private, please resend.
>

> Thanks!