2005-03-27 06:52:55

by Paul Jackson

Subject: [PATCH 2.6.12-rc1] cpusets special case GFP_ATOMIC allocs

Stringent enforcement of cpuset memory placement could cause
the kernel to panic on a GFP_ATOMIC (!wait) memory allocation,
even though memory was available elsewhere in the system.

Relax the cpuset constraint, on the last zone loop in
mm/page_alloc.c:__alloc_pages(), for ATOMIC requests.
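
For illustration, here is a simplified, self-contained model of the
decision this hunk makes (plain userspace C, not the kernel code; the
zone, watermark and cpuset checks are reduced to booleans and the flag
value is a stand-in):

/*
 * Model of the last-chance zone loop: the cpuset restriction is applied
 * only when the request is allowed to wait, so a GFP_ATOMIC (!wait)
 * request may take a page from outside the task's cpuset.
 */
#include <stdbool.h>
#include <stdio.h>

#define MODEL_GFP_WAIT 0x1u		/* stands in for __GFP_WAIT */

struct model_zone {
	bool watermark_ok;		/* enough free pages in this zone */
	bool cpuset_allowed;		/* zone lies in the task's cpuset */
	bool has_page;			/* a page can actually be taken */
};

static bool last_chance_alloc(unsigned int gfp_mask,
			      const struct model_zone *zones, int nzones)
{
	bool wait = gfp_mask & MODEL_GFP_WAIT;	/* GFP_ATOMIC => !wait */
	int i;

	for (i = 0; i < nzones; i++) {
		if (!zones[i].watermark_ok)
			continue;
		/* The patch: only honor cpuset placement if we could wait. */
		if (wait && !zones[i].cpuset_allowed)
			continue;
		if (zones[i].has_page)
			return true;
	}
	return false;
}

int main(void)
{
	/* Memory exists, but only outside the task's cpuset. */
	struct model_zone zones[1] = {
		{ .watermark_ok = true, .cpuset_allowed = false, .has_page = true }
	};

	printf("waiting alloc: %d\n", last_chance_alloc(MODEL_GFP_WAIT, zones, 1));
	printf("atomic alloc:  %d\n", last_chance_alloc(0, zones, 1));
	return 0;
}

In this model the waiting request still fails (it stays within the
cpuset), while the atomic request succeeds by taking the page from
outside it, which is the intended effect of the change.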

Signed-off-by: Paul Jackson <[email protected]>

Index: 2.6.12-pj/Documentation/cpusets.txt
===================================================================
--- 2.6.12-pj.orig/Documentation/cpusets.txt 2005-03-26 22:34:46.000000000 -0800
+++ 2.6.12-pj/Documentation/cpusets.txt 2005-03-26 22:34:47.000000000 -0800
@@ -262,6 +262,14 @@ that has had all its allowed CPUs or Mem
code should reconfigure cpusets to only refer to online CPUs and Memory
Nodes when using hotplug to add or remove such resources.

+There is a second exception to the above. GFP_ATOMIC requests are
+kernel internal allocations that must be satisfied, immediately.
+The kernel may panic if such a requested page is not allocated.
+If such a request cannot be satisfied within the cpusets allowed
+memory, then we relax the cpuset boundaries and allow any page in
+the system to satisfy a GFP_ATOMIC request. It is better to violate
+the cpuset constraints than it is to panic the kernel.
+
To start a new job that is to be contained within a cpuset, the steps are:

1) mkdir /dev/cpuset
Index: 2.6.12-pj/mm/page_alloc.c
===================================================================
--- 2.6.12-pj.orig/mm/page_alloc.c 2005-03-26 22:34:38.000000000 -0800
+++ 2.6.12-pj/mm/page_alloc.c 2005-03-26 22:47:49.000000000 -0800
@@ -780,6 +780,9 @@ __alloc_pages(unsigned int gfp_mask, uns
/*
* Go through the zonelist again. Let __GFP_HIGH and allocations
* coming from realtime tasks to go deeper into reserves
+ *
+ * This is the last chance, in general, before the goto nopage.
+ * Ignore cpuset if GFP_ATOMIC (!wait) - better that than panic.
*/
for (i = 0; (z = zones[i]) != NULL; i++) {
if (!zone_watermark_ok(z, order, z->pages_min,
@@ -787,7 +790,7 @@ __alloc_pages(unsigned int gfp_mask, uns
gfp_mask & __GFP_HIGH))
continue;

- if (!cpuset_zone_allowed(z))
+ if (wait && !cpuset_zone_allowed(z))
continue;

page = buffered_rmqueue(z, order, gfp_mask);

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401


2005-03-27 06:59:46

by Nick Piggin

Subject: Re: [PATCH 2.6.12-rc1] cpusets special case GFP_ATOMIC allocs

Paul Jackson wrote:
> Stringent enforcement of cpuset memory placement could cause
> the kernel to panic on a GFP_ATOMIC (!wait) memory allocation,
> even though memory was available elsewhere in the system.
>
> Relax the cpuset constraint, on the last zone loop in
> mm/page_alloc.c:__alloc_pages(), for ATOMIC requests.
>

Kernel should not panic if a GFP_ATOMIC allocation fails.
Where is this happening?

2005-03-27 09:16:59

by Paul Jackson

Subject: Re: [PATCH 2.6.12-rc1] cpusets special case GFP_ATOMIC allocs

Nick wrote:
> Kernel should not panic if a GFP_ATOMIC allocation fails.
> Where is this happening?

I didn't actually see any such panics occur. This patch came from
reading code, not from any actual crash seen so far. The closest thing
to a real-world event was a network connection that got dropped once,
on a system under severe memory distress; _if_ the allocating task had
been in a cpuset (it wasn't, in this instance), a failed GFP_ATOMIC
would likely have dropped the connection even sooner.

But in any case, since __alloc_pages() already has other special cases
for ATOMIC (!wait or can_try_harder) requests, it seemed like an
unsurprising tradeoff to special-case this one too.
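
For reference, the request classes involved look roughly like this (a
hedged sketch with stand-in helper names and an illustrative flag value,
not a quote of the 2.6.12 source):

#include <stdbool.h>

#define SKETCH_GFP_WAIT 0x10u			/* stands in for __GFP_WAIT */

static bool is_rt_task(void)     { return false; }	/* stands in for rt_task(p) */
static bool in_irq_context(void) { return false; }	/* stands in for in_interrupt() */

void classify_request(unsigned int gfp_mask, bool *wait, bool *can_try_harder)
{
	*wait = gfp_mask & SKETCH_GFP_WAIT;	/* may the caller sleep? */
	/* May dip deeper into reserves: a realtime task outside interrupt
	 * context, or any request that cannot wait (i.e. GFP_ATOMIC). */
	*can_try_harder = (is_rt_task() && !in_irq_context()) || !*wait;
}

A GFP_ATOMIC request is therefore both !wait and can_try_harder, so it
already gets deeper reserves; this patch adds skipping the cpuset check
as one more concession.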

Even if the following apparent panics on a failed ATOMIC allocation are
not relevant (perhaps they are all init routines, and so don't apply to
a normally running system), it seemed to me that the kernel would be
under increased stress and start dropping things, such as the network
connection I saw dropped, if ATOMICs failed. I did not want an ordinary
user task, in its own small cpuset, to be able to degrade kernel
behaviour for the rest of the system, at least not that easily.

At least for our (SGI) users, cpusets are of greatest interest for page
allocations backing user address space, and they expect that the kernel
will do whatever it needs to do in order to stay healthy, including
taking some memory (so long as it's not too much) off each node in the
system. My understanding is that GFP_ATOMIC allocations are more or
less always done for pages backing kernel address space, so letting them
steal from outside the cpuset under memory stress seems fine to me.

====

The following places were found simply by grepping for failed GFP_ATOMIC
allocation requests followed by a panic:

==

arch/ppc64/kernel/eeh.c: event = kmalloc(sizeof(*event), GFP_ATOMIC);
arch/ppc64/kernel/eeh.c- if (event == NULL) {
arch/ppc64/kernel/eeh.c- eeh_panic(dev, reset_state);

==

arch/ppc64/kernel/iommu.c: tbl->it_map = (unsigned long *)__get_free_pages(GFP_ATOMIC, get_order(sz));
arch/ppc64/kernel/iommu.c- if (!tbl->it_map)
arch/ppc64/kernel/iommu.c- panic("iommu_init_table: Can't allocate %ld bytes\n",sz);

==

arch/ppc64/mm/init.c: kcore_mem = kmalloc(sizeof(struct kcore_list), GFP_ATOMIC);
arch/ppc64/mm/init.c- if (!kcore_mem)
arch/ppc64/mm/init.c- panic("mem_init: kmalloc failed\n");

==

arch/sparc64/kernel/ebus.c: mem = kmalloc(size, GFP_ATOMIC);
arch/sparc64/kernel/ebus.c- if (!mem)
arch/sparc64/kernel/ebus.c- panic("ebus_alloc: out of memory");

==

arch/v850/kernel/rte_mb_a_pci.c: block = kmalloc (block_size, GFP_ATOMIC);
arch/v850/kernel/rte_mb_a_pci.c- if (! block)
arch/v850/kernel/rte_mb_a_pci.c- panic ("free_mb_sram: can't allocate free-list entry");

==

arch/x86_64/kernel/setup64.c- pda->irqstackptr = (char *)
arch/x86_64/kernel/setup64.c: __get_free_pages(GFP_ATOMIC, IRQSTACK_ORDER);
arch/x86_64/kernel/setup64.c- if (!pda->irqstackptr)
arch/x86_64/kernel/setup64.c- panic("cannot allocate irqstack for cpu %d", cpu);

==

drivers/nubus/nubus.c- /* Actually we should probably panic if this fails */
drivers/nubus/nubus.c: if ((dev = kmalloc(sizeof(*dev), GFP_ATOMIC)) == NULL)
drivers/nubus/nubus.c- return NULL;

==

fs/dquot.c: dquot_hash = (struct hlist_head *)__get_free_pages(GFP_ATOMIC, order);
fs/dquot.c- if (!dquot_hash)
fs/dquot.c- panic("Cannot create dquot hash table");

==

sound/core/init.c: s_f_ops = kmalloc(sizeof(struct snd_shutdown_f_ops), GFP_ATOMIC);
sound/core/init.c- if (s_f_ops == NULL)
sound/core/init.c- panic("Atomic allocation failed for snd_shutdown_f_ops!");


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401

2005-03-28 07:03:36

by Paul Jackson

Subject: [PATCH 2.6.12-rc1] cpusets GFP_ATOMIC fix: tonedown panic comment

This patch applies on top of my patch of March 26, entitled "cpusets
special case GFP_ATOMIC allocs". It tones down my panicky commentary.

My commentary shouldn't imply that failed GFP_ATOMICs should lead to,
or normally lead to, panics. Even though there are a few panic()
calls following failed GFP_ATOMIC allocs, this is not the usual or
desired result of a failed GFP_ATOMIC. The kernel will probably
drop some detail on the floor and keep on working.
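
For instance, the common pattern in 2.6-era network drivers is to count
a dropped packet and carry on when an atomic allocation fails (an
illustrative composite, not a quote from any particular driver;
struct example_priv is hypothetical):

#include <linux/netdevice.h>
#include <linux/skbuff.h>

struct example_priv {				/* hypothetical driver private data */
	struct net_device_stats stats;
};

static void example_rx(struct net_device *dev, unsigned int pkt_len)
{
	struct example_priv *priv = netdev_priv(dev);
	struct sk_buff *skb = dev_alloc_skb(pkt_len + 2);	/* GFP_ATOMIC inside */

	if (skb == NULL) {
		priv->stats.rx_dropped++;	/* lose one packet, keep running */
		return;
	}
	/* ... copy the frame into skb and hand it to netif_rx() ... */
	dev_kfree_skb(skb);			/* placeholder for the real rx path */
}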

Thanks to Nick Piggin for noticing (I hope this answers his point.)

Signed-off-by: Paul Jackson <[email protected]>

Index: 2.6.12-pj/Documentation/cpusets.txt
===================================================================
--- 2.6.12-pj.orig/Documentation/cpusets.txt 2005-03-27 22:48:14.000000000 -0800
+++ 2.6.12-pj/Documentation/cpusets.txt 2005-03-27 22:48:22.000000000 -0800
@@ -264,11 +264,11 @@ Nodes when using hotplug to add or remov

There is a second exception to the above. GFP_ATOMIC requests are
kernel internal allocations that must be satisfied, immediately.
-The kernel may panic if such a requested page is not allocated.
-If such a request cannot be satisfied within the cpusets allowed
-memory, then we relax the cpuset boundaries and allow any page in
-the system to satisfy a GFP_ATOMIC request. It is better to violate
-the cpuset constraints than it is to panic the kernel.
+The kernel may drop some request, in rare cases even panic, if a
+GFP_ATOMIC alloc fails. If the request cannot be satisfied within
+the current tasks cpuset, then we relax the cpuset, and look for
+memory anywhere we can find it. It's better to violate the cpuset
+than stress the kernel.

To start a new job that is to be contained within a cpuset, the steps are:

Index: 2.6.12-pj/mm/page_alloc.c
===================================================================
--- 2.6.12-pj.orig/mm/page_alloc.c 2005-03-27 22:48:14.000000000 -0800
+++ 2.6.12-pj/mm/page_alloc.c 2005-03-27 22:48:42.000000000 -0800
@@ -782,7 +782,7 @@ __alloc_pages(unsigned int gfp_mask, uns
* coming from realtime tasks to go deeper into reserves
*
* This is the last chance, in general, before the goto nopage.
- * Ignore cpuset if GFP_ATOMIC (!wait) - better that than panic.
+ * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc.
*/
for (i = 0; (z = zones[i]) != NULL; i++) {
if (!zone_watermark_ok(z, order, z->pages_min,


--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[email protected]> 1.650.933.1373, 1.925.600.0401