Perhaps the time for this has come and gone, but it occurred to me
that it should be relatively straightforward to make a form of
dynamically extended stacks that are appropriate to the kernel.
While we have a good handle on most of the worst stack offenders, we
can still run into trouble with pathological cases (say, symlink
recursion for XFS on a RAID built from loopback mounts over NFS
tunneled over IPSEC through GRE). So there's probably no
one-size-fits-all when it comes to stack size.
Rather than relying on guard pages and VM faults like userspace, we
can use a cooperative scheme where we "label" call paths that might be
extra deep (recursion through the block layer, network tunnels,
symlinks, etc.) with something like the following:
ret = grow_stack(function, arg, GFP_ATOMIC);
This is much like cond_resched() except for stack usage rather than
CPU usage. grow_stack() checks if we're in the danger zone for stack
usage (say 1k remaining), and if so, allocates a new stack and
swizzles the stack pointer over to it.
Then, whether we allocated a new stack page or not, we call
function(arg) to continue with our operation. When function() returns,
we deallocate the new stack (if we built one), switch back to the old
one, and propagate function's return value.
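In rough pseudo-C, the whole thing might look like the sketch below.
None of these helpers exist today -- stack_left(), alloc_stack_page(),
free_stack_page(), and call_on_stack() are made up for illustration,
and call_on_stack() would be a few lines of per-arch assembly, much
like running interrupt handlers on the separate IRQ stacks with 4k
stacks on i386:

	int grow_stack(int (*function)(void *), void *arg, gfp_t gfp_mask)
	{
		void *new_stack;
		int ret;

		/* Plenty of headroom left?  Just call directly. */
		if (stack_left() >= STACK_DANGER_ZONE)
			return function(arg);

		new_stack = alloc_stack_page(gfp_mask);
		if (!new_stack) {
			/* Current behavior, plus a warning of impending doom. */
			printk(KERN_WARNING
			       "grow_stack: no memory, staying on current stack\n");
			return function(arg);
		}

		/* Switch the stack pointer to new_stack, call
		 * function(arg), then switch back. */
		ret = call_on_stack(function, arg, new_stack);

		free_stack_page(new_stack);
		return ret;
	}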
We only get into trouble with this scheme when we can't allocate a new
stack, which will only happen when we're completely out of memory[1]
and we can't sleep waiting for more. In which case, we print a warning
of impending doom and proceed to run with our current stack. This is
the same as our current behavior but with a warning message. For
safety, we can keep a small mempool of extra stacks on hand to avoid
hitting this wall when dealing with OOM in an atomic context.
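Something like the following, with ordinary mempool callbacks (the
pool size of 8 is an arbitrary guess; alloc_stack_page() in the sketch
above would fall back to mempool_alloc() from this pool):

	static mempool_t *stack_pool;

	static void *stack_page_alloc(gfp_t gfp_mask, void *pool_data)
	{
		return (void *)__get_free_page(gfp_mask);
	}

	static void stack_page_free(void *element, void *pool_data)
	{
		free_page((unsigned long)element);
	}

	static int __init stack_pool_init(void)
	{
		stack_pool = mempool_create(8, stack_page_alloc,
					    stack_page_free, NULL);
		return stack_pool ? 0 : -ENOMEM;
	}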
We can also easily instrument the scheme to print warnings when a
process has allocated more than a couple stacks, with a hard limit to
catch any unbounded recursion.
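For instance, with a new per-task counter (nr_extra_stacks and
MAX_EXTRA_STACKS here are hypothetical):

	/* in grow_stack(), after a successful allocation */
	if (++current->nr_extra_stacks > 2)
		printk(KERN_WARNING "%s: %d extra stacks in use\n",
		       current->comm, current->nr_extra_stacks);
	BUG_ON(current->nr_extra_stacks > MAX_EXTRA_STACKS);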
[1] Assuming we're using 4k stacks, where fragmentation is not an
issue. But there's no reason not to use single-page stacks with this
scheme.
--
Mathematics is the supreme nostalgia of our time.
> While we have a good handle on most of the worst stack offenders, we
> can still run into trouble with pathological cases (say, symlink
> recursion for XFS on a RAID built from loopback mounts over NFS
> tunneled over IPSEC through GRE). So there's probably no
> one-size-fits-all when it comes to stack size.
This is pure conjecture at this time, not a fact, yet you post it as a
fact here... Have you tried something like this to see whether the
layers you add actually add stack usage?
On Sun, Dec 18, 2005 at 04:12:49PM -0800, Matt Mackall wrote:
> Perhaps the time for this has come and gone, but it occurred to me
> that it should be relatively straightforward to make a form of
> dynamically extended stacks that are appropriate to the kernel.
>
> While we have a good handle on most of the worst stack offenders, we
> can still run into trouble with pathological cases (say, symlink
> recursion for XFS on a RAID built from loopback mounts over NFS
> tunneled over IPSEC through GRE). So there's probably no
> one-size-fits-all when it comes to stack size.
My count of bug reports for problems with in-kernel code with 4k stacks
after Neil's patch went into -mm is still at 0. That's amazing
considering how many people have claimed in this thread how unstable
4k stacks were...
> Rather than relying on guard pages and VM faults like userspace, we
> can use a cooperative scheme where we "label" call paths that might be
> extra deep (recursion through the block layer, network tunnels,
> symlinks, etc.) with something like the following:
>
> ret = grow_stack(function, arg, GFP_ATOMIC);
>
> This is much like cond_resched() except for stack usage rather than
> CPU usage. grow_stack() checks if we're in the danger zone for stack
> usage (say 1k remaining), and if so, allocates a new stack and
> swizzles the stack pointer over to it.
>...
If more than 3 kB of stack is used on i386 that's a bug.
And we should fix bugs, not work around them.
CONFIG_DEBUG_STACKOVERFLOW in arch/i386/kernel/irq.c shows the correct
approach:
- a debug option for not harming performance
- dump_stack() when too much stack is used
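From memory, the check in do_IRQ() looks roughly like this (the code
in the tree is authoritative):

	#ifdef CONFIG_DEBUG_STACKOVERFLOW
		/* Debugging check for stack overflow: is there
		   less than 1KB free? */
		{
			long esp;

			__asm__ __volatile__("andl %%esp,%0" :
					"=r" (esp) : "0" (THREAD_SIZE - 1));
			if (unlikely(esp < (sizeof(struct thread_info) +
					    STACK_WARN))) {
				printk("do_IRQ: stack overflow: %ld\n",
					esp - sizeof(struct thread_info));
				dump_stack();
			}
		}
	#endif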
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Mon, Dec 19, 2005 at 07:36:04PM +0100, Adrian Bunk wrote:
> On Sun, Dec 18, 2005 at 04:12:49PM -0800, Matt Mackall wrote:
>
> > Perhaps the time for this has come and gone, but it occurred to me
> > that it should be relatively straightforward to make a form of
> > dynamically extended stacks that are appropriate to the kernel.
> >
> > While we have a good handle on most of the worst stack offenders, we
> > can still run into trouble with pathological cases (say, symlink
> > recursion for XFS on a RAID built from loopback mounts over NFS
> > tunneled over IPSEC through GRE). So there's probably no
> > one-size-fits-all when it comes to stack size.
>
> My count of bug reports for problems with in-kernel code with 4k stacks
> after Neil's patch went into -mm is still at 0. That's amazing
> considering how many people have claimed in this thread how unstable
> 4k stacks were...
I should have said up front that I don't know of any remaining
problems with 4k stacks and support switching to them. Remember, I
dusted off the 4k stack code, cleaned it up, and fixed up some of the
worst offenders in my -tiny tree well before Arjan pushed it to
mainline.
So why am I raising this idea now at all? Because I think Neil's patch
is too clever and too specific to block layer stacking and I'd rather
have a more general solution. Block is by no means the only part of
the system that allows nesting, and pathological combinations surely
still exist. More will be introduced in the future.
Also note that my approach might make it reasonable to use one-page
stacks everywhere, not just on x86.
> If more than 3 kB of stack is used on i386 that's a bug.
> And we should fix bugs, not work around them.
One man's fix is another man's work-around.
--
Mathematics is the supreme nostalgia of our time.
On Mon, Dec 19, 2005 at 06:27:59PM -0600, Matt Mackall wrote:
>...
> So why am I raising this idea now at all? Because I think Neil's patch
> is too clever and too specific to block layer stacking and I'd rather
> have a more general solution. Block is by no means the only part of
> the system that allows nesting, and pathological combinations surely
> still exist. More will be introduced in the future.
>
> Also note that my approach might make it reasonable to use one-page
> stacks everywhere, not just on x86.
>...
I'm really looking forward to seeing your patch.
It will be interesting, for example, to measure whether there'll be
any performance impact.
And since driver authors might become more sloppy with stack usage
after this patch, now that there's no longer a hard limit, it will be
especially interesting to see how you ensure that there are no
additional stack usages > 1 kB between two invocations of your check
function; otherwise your patch won't work reliably.
cu
Adrian
--
"Is there not promise of rain?" Ling Tan asked suddenly out
of the darkness. There had been need of rain for many days.
"Only a promise," Lao Er said.
Pearl S. Buck - Dragon Seed
On Tue, Dec 20, 2005 at 05:43:17PM +0100, Adrian Bunk wrote:
> On Mon, Dec 19, 2005 at 06:27:59PM -0600, Matt Mackall wrote:
> >...
> > So why am I raising this idea now at all? Because I think Neil's patch
> > is too clever and too specific to block layer stacking and I'd rather
> > have a more general solution. Block is by no means the only part of
> > the system that allows nesting, and pathological combinations surely
> > still exist. More will be introduced in the future.
> >
> > Also note that my approach might make it reasonable to use one-page
> > stacks everywhere, not just on x86.
> >...
>
> I'm really looking forward to seeing your patch.
I might get to it after the New Year.
> It will be interesting, for example, to measure whether there'll be
> any performance impact.
>
> And since driver authors might become more sloppy with stack usage
> after this patch, now that there's no longer a hard limit, it will be
> especially interesting to see how you ensure that there are no
> additional stack usages > 1 kB between two invocations of your check
> function; otherwise your patch won't work reliably.
I can give you the answer to that now: use the existing stack overflow
detection.
This is not intended to be an automatic scheme. To use it, you must
actually insert code into the troublesome codepaths, which will of
course serve as a red flag for code review.
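A call site would look something like this -- follow_link_deep() and
its use of nameidata are made up for the example:

	static int follow_link_deep(void *arg)
	{
		struct nameidata *nd = arg;

		/* ... the potentially deep symlink-following work ... */
		return 0;
	}

	...
		/* at the recursion point, instead of calling directly: */
		ret = grow_stack(follow_link_deep, nd, GFP_KERNEL);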
It's entirely possible that this idea won't be needed on x86 because
we can manage to squeeze everything into 4k. But 32-bit x86's days as
the majority platform are numbered.
--
Mathematics is the supreme nostalgia of our time.
Matt Mackall wrote:
>
> This is not intended to be an automatic scheme. To use it, you must
> actually insert code into the troublesome codepaths, which will of
> course serve as a red flag for code review.
>
It might be an idea to put in a #warning to make it an even bigger red
flag, and to really make people fix the problem rather than just ignore
it since it works with the dynamic stacks.
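Something along these lines at each annotated call site (reusing the
hypothetical follow_link_deep() example from earlier); the directive
fires on every compile of the file, so it's hard to miss:

	#warning "this path can overrun the stack; fix the stack usage"
	ret = grow_stack(follow_link_deep, nd, GFP_KERNEL);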
On Tue, 20 Dec 2005 14:40:06 EST, Patrick McLean said:
> Matt Mackall wrote:
> >
> > This is not intended to be an automatic scheme. To use it, you must
> > actually insert code into the troublesome codepaths, which will of
> > course serve as a red flag for code review.
> >
>
> It might be an idea to put in a #warning to make it an even bigger red
> flag, and to really make people fix the problem rather than just ignore
> it since it works with the dynamic stacks.
Stick a 'depends on CONFIG_BROKEN' on it? ;)