2007-10-04 04:03:23

by Christoph Lameter

Subject: [13/18] x86_64: Allow fallback for the stack

Peter Zijlstra has recently demonstrated that we can have order 1 allocation
failures under memory pressure with small memory configurations. The
x86_64 stack has a size of 8k and thus requires an order 1 allocation.

This patch adds a virtual fallback capability for the stack. The system may
continue even in extreme situations and we may be able to increase the stack
size if necessary (see next patch).

Cc: [email protected]
Cc: [email protected]
Signed-off-by: Christoph Lameter <[email protected]>

---
include/asm-x86_64/thread_info.h | 16 +++++-----------
1 file changed, 5 insertions(+), 11 deletions(-)

Index: linux-2.6/include/asm-x86_64/thread_info.h
===================================================================
--- linux-2.6.orig/include/asm-x86_64/thread_info.h 2007-10-03 14:49:48.000000000 -0700
+++ linux-2.6/include/asm-x86_64/thread_info.h 2007-10-03 14:51:00.000000000 -0700
@@ -74,20 +74,14 @@ static inline struct thread_info *stack_

/* thread information allocation */
#ifdef CONFIG_DEBUG_STACK_USAGE
-#define alloc_thread_info(tsk) \
- ({ \
- struct thread_info *ret; \
- \
- ret = ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER)); \
- if (ret) \
- memset(ret, 0, THREAD_SIZE); \
- ret; \
- })
+#define THREAD_FLAGS (GFP_VFALLBACK | __GFP_ZERO)
#else
-#define alloc_thread_info(tsk) \
- ((struct thread_info *) __get_free_pages(GFP_KERNEL,THREAD_ORDER))
+#define THREAD_FLAGS GFP_VFALLBACK
#endif

+#define alloc_thread_info(tsk) \
+ ((struct thread_info *) __get_free_pages(THREAD_FLAGS, THREAD_ORDER))
+
#define free_thread_info(ti) free_pages((unsigned long) (ti), THREAD_ORDER)

#else /* !__ASSEMBLY__ */

--
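
The link between the 8k stack and an "order 1" allocation in the description above can be illustrated with a small userspace sketch. It assumes a 4k page size and uses a hypothetical helper name, order_for_size(); it is not the kernel's own helper and is only meant to show the arithmetic.

```c
#include <stddef.h>

/*
 * Hypothetical userspace helper (not kernel code): return the smallest
 * order such that (page_size << order) >= size. An allocation of
 * order N covers 2^N physically contiguous pages.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    /* Double the covered span until it is large enough for size. */
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

Under the assumed 4k page size, an 8k stack maps to order 1 (two contiguous pages), while a 4k stack fits in a single order 0 page.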


2007-10-04 11:59:31

by Andi Kleen

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thursday 04 October 2007 05:59:48 Christoph Lameter wrote:
> Peter Zijlstra has recently demonstrated that we can have order 1 allocation
> failures under memory pressure with small memory configurations. The
> x86_64 stack has a size of 8k and thus requires an order 1 allocation.

We've known for ages that it is possible. But it has been always so rare
that it was ignored.

Is there any evidence this is more common now than it used to be?

-Andi

2007-10-04 12:08:28

by Peter Zijlstra

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thu, 2007-10-04 at 13:56 +0200, Andi Kleen wrote:
> On Thursday 04 October 2007 05:59:48 Christoph Lameter wrote:
> > Peter Zijlstra has recently demonstrated that we can have order 1 allocation
> > failures under memory pressure with small memory configurations. The
> > x86_64 stack has a size of 8k and thus requires an order 1 allocation.
>
> We've known for ages that it is possible. But it has been always so rare
> that it was ignored.
>
> Is there any evidence this is more common now than it used to be?

The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
order for everything. Kernel stack allocation is GFP_KERNEL I presume.
Also, I use 4k stacks on all my machines.

Maybe the cpumask thing needs an extended api, one that falls back to
kmalloc if NR_CPUS >> sane.

That way that cannot be an argument to inflate stacks.

2007-10-04 12:25:56

by Andi Kleen

Subject: Re: [13/18] x86_64: Allow fallback for the stack


> The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
> order for everything.

slub is wrong then. Can it be fixed?

> Kernel stack allocation is GFP_KERNEL I presume.

Of course.

> Also, I use 4k stacks on all my machines.

You don't have any x86-64 machines?

-Andi

2007-10-04 12:30:25

by Peter Zijlstra

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thu, 2007-10-04 at 14:25 +0200, Andi Kleen wrote:
> > The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
> > order for everything.
>
> slub is wrong then. Can it be fixed?

I think mainline slub doesn't do this, just -mm.

See DEFAULT_MAX_ORDER in mm/slub.c

> > Kernel stack allocation is GFP_KERNEL I presume.
>
> Of course.
>
> > Also, I use 4k stacks on all my machines.
>
> You don't have any x86-64 machines?

Ah, my bad, yes I do, but I (wrongly) thought they had that option too.

2007-10-04 17:40:49

by Christoph Lameter

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thu, 4 Oct 2007, Andi Kleen wrote:

> > The order-1 allocation failures were GFP_ATOMIC, because SLUB uses !0
> > order for everything.
>
> slub is wrong then. Can it be fixed?

SLUB in mm kernels was using higher order allocations for some slabs
for the last 6 months or so. Not true for upstream.

2007-10-04 19:21:01

by Christoph Lameter

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thu, 4 Oct 2007, Andi Kleen wrote:

> We've known for ages that it is possible. But it has been always so rare
> that it was ignored.

Well we can now address the rarity. That is the whole point of the
patchset.

> Is there any evidence this is more common now than it used to be?

It will be more common if the stack size is increased beyond 8k.


2007-10-04 19:40:11

by Rik van Riel

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thu, 4 Oct 2007 12:20:50 -0700 (PDT)
Christoph Lameter <[email protected]> wrote:

> On Thu, 4 Oct 2007, Andi Kleen wrote:
>
> > We've known for ages that it is possible. But it has been always so
> > rare that it was ignored.
>
> Well we can now address the rarity. That is the whole point of the
> patchset.

Introducing complexity to fight a very rare problem with a good
fallback (refusing to fork more tasks, as well as lumpy reclaim)
somehow does not seem like a good tradeoff.

> > Is there any evidence this is more common now than it used to be?
>
> It will be more common if the stack size is increased beyond 8k.

Why would we want to do such a thing?

8kB stacks are large enough...

--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

2007-10-04 21:20:30

by Christoph Lameter

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Thu, 4 Oct 2007, Rik van Riel wrote:

> > Well we can now address the rarity. That is the whole point of the
> > patchset.
>
> Introducing complexity to fight a very rare problem with a good
> fallback (refusing to fork more tasks, as well as lumpy reclaim)
> somehow does not seem like a good tradeoff.

The problem can become non-rare on special low memory machines doing wild
swapping things though.

> > It will be more common if the stack size is increased beyond 8k.
>
> Why would we want to do such a thing?

Because NUMA requires more stack space. In particular support for very
large cpu configurations of 16k cpus may require 2k byte cpumasks on the stack.

> 8kB stacks are large enough...

For many things yes. I just want to have the compile time option to
increase it.
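
The 16k/2k figure above follows from one bit per possible CPU in a cpumask. A minimal userspace sketch of that arithmetic, using a hypothetical helper name cpumask_bytes() and assuming the mask is stored as an array of unsigned long words:

```c
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical userspace helper (not kernel code): bytes needed for a
 * bitmap holding one bit per CPU, rounded up to whole unsigned long
 * words.
 */
static size_t cpumask_bytes(size_t nr_cpus)
{
    size_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    size_t longs = (nr_cpus + bits_per_long - 1) / bits_per_long;

    return longs * sizeof(unsigned long);
}
```

For 16384 CPUs this gives 2048 bytes per mask, which is a quarter of an 8k stack for a single on-stack cpumask.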

2007-10-06 18:47:44

by Bill Davidsen

Subject: Re: [13/18] x86_64: Allow fallback for the stack

Rik van Riel wrote:
> On Thu, 4 Oct 2007 12:20:50 -0700 (PDT)
> Christoph Lameter <[email protected]> wrote:
>
>> On Thu, 4 Oct 2007, Andi Kleen wrote:
>>
>>> We've known for ages that it is possible. But it has been always so
>>> rare that it was ignored.
>> Well we can now address the rarity. That is the whole point of the
>> patchset.
>
> Introducing complexity to fight a very rare problem with a good
> fallback (refusing to fork more tasks, as well as lumpy reclaim)
> somehow does not seem like a good tradeoff.
>
>>> Is there any evidence this is more common now than it used to be?
>> It will be more common if the stack size is increased beyond 8k.
>
> Why would we want to do such a thing?
>
> 8kB stacks are large enough...
>
Why would anyone need more than 640k... In addition to NUMA, who can
tell what some future hardware might do, given that the size of memory
is expanding as if it were covered by Moore's Law. As memory sizes
increase someone will bump the page size again. Better to let people
make it as large as they feel they need and warn at build time that
performance may suck.

--
Bill Davidsen <[email protected]>
"We have more to fear from the bungling of the incompetent than from
the machinations of the wicked." - from Slashdot

2007-10-08 00:07:36

by Nick Piggin

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Friday 05 October 2007 07:20, Christoph Lameter wrote:
> On Thu, 4 Oct 2007, Rik van Riel wrote:
> > > Well we can now address the rarity. That is the whole point of the
> > > patchset.
> >
> > Introducing complexity to fight a very rare problem with a good
> > fallback (refusing to fork more tasks, as well as lumpy reclaim)
> > somehow does not seem like a good tradeoff.
>
> The problem can become non-rare on special low memory machines doing wild
> swapping things though.

But only your huge systems will be using huge stacks?

2007-10-08 17:36:54

by Christoph Lameter

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Sun, 7 Oct 2007, Nick Piggin wrote:

> > The problem can become non-rare on special low memory machines doing wild
> > swapping things though.
>
> But only your huge systems will be using huge stacks?

I have no idea who else would be using such a feature. Relaxing the tight
memory restrictions on stack use may allow placing larger structures on
the stack in general.

I have some concerns about the medium NUMA systems (a few dozen of nodes)
also running out of stack since more data is placed on the stack through
the policy layer and since we may end up with a couple of stacked
filesystems. Most of the current NUMA systems on x86_64 are basically
two nodes on one motherboard. The use of NUMA controls is likely
limited there and the complexity of the filesystems is also not high.


2007-10-09 05:26:55

by Nick Piggin

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Tuesday 09 October 2007 03:36, Christoph Lameter wrote:
> On Sun, 7 Oct 2007, Nick Piggin wrote:
> > > The problem can become non-rare on special low memory machines doing
> > > wild swapping things though.
> >
> > But only your huge systems will be using huge stacks?
>
> I have no idea who else would be using such a feature. Relaxing the tight
> memory restrictions on stack use may allow placing larger structures on
> the stack in general.

The tight memory restrictions on stack usage do not come about because
of the difficulty in increasing the stack size :) It is because we want to
keep stack sizes small!

Increasing the stack size by 4K uses another 4MB of memory for every 1000
threads you have, right?

It would take a lot of good reason to move away from the general direction
we've been taking over the past years that 4/8K stacks are a good idea for
regular 32 and 64 bit builds in general.


> I have some concerns about the medium NUMA systems (a few dozen of nodes)
> also running out of stack since more data is placed on the stack through
> the policy layer and since we may end up with a couple of stacked
> filesystems. Most of the current NUMA systems on x86_64 are basically
> two nodes on one motherboard. The use of NUMA controls is likely
> limited there and the complexity of the filesystems is also not high.

The solution has until now always been to fix the problems so they don't
use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
systems, but I don't think you'd be able to make that assumption for most
normal systems.
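
The memory cost estimate in this message (another 4MB per 1000 threads for 4K more stack) can be checked with a short userspace sketch. The helper name extra_stack_bytes() is hypothetical, and the calculation assumes every thread has exactly one kernel stack that grows by the same amount.

```c
#include <stddef.h>

/*
 * Hypothetical userspace helper (not kernel code): additional memory
 * consumed when every thread's kernel stack grows by extra_per_thread
 * bytes.
 */
static size_t extra_stack_bytes(size_t nr_threads, size_t extra_per_thread)
{
    return nr_threads * extra_per_thread;
}
```

With 1000 threads and 4096 extra bytes each, the result is 4096000 bytes, which is roughly the 4MB quoted in the message.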

2007-10-09 18:39:31

by Christoph Lameter

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Mon, 8 Oct 2007, Nick Piggin wrote:

> The tight memory restrictions on stack usage do not come about because
> of the difficulty in increasing the stack size :) It is because we want to
> keep stack sizes small!
>
> Increasing the stack size by 4K uses another 4MB of memory for every 1000
> threads you have, right?
>
> It would take a lot of good reason to move away from the general direction
> we've been taking over the past years that 4/8K stacks are a good idea for
> regular 32 and 64 bit builds in general.

We already use 32k stacks on IA64. So the memory argument fails there.

> > I have some concerns about the medium NUMA systems (a few dozen of nodes)
> > also running out of stack since more data is placed on the stack through
> > the policy layer and since we may end up with a couple of stacked
> > filesystems. Most of the current NUMA systems on x86_64 are basically
> > two nodes on one motherboard. The use of NUMA controls is likely
> > limited there and the complexity of the filesystems is also not high.
>
> The solution has until now always been to fix the problems so they don't
> use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
> systems, but I don't think you'd be able to make that assumption for most
> normal systems.

Yes that is why I made the stack size configurable.

2007-10-10 01:18:19

by Nick Piggin

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Wednesday 10 October 2007 04:39, Christoph Lameter wrote:
> On Mon, 8 Oct 2007, Nick Piggin wrote:
> > The tight memory restrictions on stack usage do not come about because
> > of the difficulty in increasing the stack size :) It is because we want
> > to keep stack sizes small!
> >
> > Increasing the stack size by 4K uses another 4MB of memory for every 1000
> > threads you have, right?
> >
> > It would take a lot of good reason to move away from the general
> > direction we've been taking over the past years that 4/8K stacks are a
> > good idea for regular 32 and 64 bit builds in general.
>
> We already use 32k stacks on IA64. So the memory argument fails there.

I'm talking about generic code.


> > > I have some concerns about the medium NUMA systems (a few dozen of
> > > nodes) also running out of stack since more data is placed on the stack
> > > through the policy layer and since we may end up with a couple of
> > > stacked filesystems. Most of the current NUMA systems on x86_64 are
> > > basically two nodes on one motherboard. The use of NUMA controls is
> > > likely limited there and the complexity of the filesystems is also not
> > > high.
> >
> > The solution has until now always been to fix the problems so they don't
> > use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
> > systems, but I don't think you'd be able to make that assumption for most
> > normal systems.
>
> Yes that is why I made the stack size configurable.

Fine. I just don't see why you need this fallback.

2007-10-10 01:27:08

by Christoph Lameter

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Tue, 9 Oct 2007, Nick Piggin wrote:

> > We already use 32k stacks on IA64. So the memory argument fails there.
>
> I'm talking about generic code.

The stack size is set in arch code not in generic code.

> > > The solution has until now always been to fix the problems so they don't
> > > use so much stack. Maybe a bigger stack is OK for you for 1024+ CPU
> > > systems, but I don't think you'd be able to make that assumption for most
> > > normal systems.
> >
> > Yes that is why I made the stack size configurable.
>
> Fine. I just don't see why you need this fallback.

So you would be ok with submitting the configurable stacksize patches
separately without the fallback?

2007-10-10 02:28:19

by Nick Piggin

Subject: Re: [13/18] x86_64: Allow fallback for the stack

On Wednesday 10 October 2007 11:26, Christoph Lameter wrote:
> On Tue, 9 Oct 2007, Nick Piggin wrote:
> > > We already use 32k stacks on IA64. So the memory argument fails there.
> >
> > I'm talking about generic code.
>
> The stack size is set in arch code not in generic code.

Generic code must assume a 4K stack on 32-bit, in general (modulo
huge cpumasks and such, I guess).


> > > > The solution has until now always been to fix the problems so they
> > > > don't use so much stack. Maybe a bigger stack is OK for you for 1024+
> > > > CPU systems, but I don't think you'd be able to make that assumption
> > > > for most normal systems.
> > >
> > > Yes that is why I made the stack size configurable.
> >
> > Fine. I just don't see why you need this fallback.
>
> So you would be ok with submitting the configurable stacksize patches
> separately without the fallback?

Sure. It's already configurable on other architectures.