2008-06-18 00:48:21

by Mikulas Patocka

Subject: stack overflow on Sparc64

Hi

I am getting stack overflows on my Sparc64 station. They happen when I
copy to device-mapper snapshot origin device using small IO size (512
bytes) and simultaneously execute "lvs" command. The kernel is compiled
with most debugging functions enabled. The stack trace is this:

__ide_end_request
__blk_end_request
__end_that_request_first
req_bio_endio
bio_endio
clone_endio
dec_pending
bio_endio
clone_endio
dec_pending
bio_endio
clone_endio
dec_pending
bio_endio
end_bio_bh_io_sync
end_buffer_read_sync
__end_buffer_read_notouch
unlock_buffer
wake_up_bit
__wake_up_bit
__wake_up
__wake_up_common
wake_bit_function
autoremove_wake_function
default_wake_function
try_to_wake_up
task_rq_lock
__spin_lock
lock_acquire
__lock_acquire
*** crash, stack overflow

--- observations:

The loop bio_endio->clone_endio->dec_pending repeats once for each level
of nested devices --- so for any architecture there exists a nesting depth
at which it causes trouble. We need something to prevent the recursion,
maybe a trick similar to the one used to avoid recursion in the bio request
function (i.e. if bio_endio is called recursively, it just adds the bio to
a queue and lets the top level call the endio method).
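
The queueing idea above can be sketched in user-space C. This is only an illustration under stated assumptions: struct fake_bio, fake_bio_endio and the depth counters are hypothetical names, not the kernel's bio API, and the single global queue ignores SMP and locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio; this is not the kernel's struct bio. */
struct fake_bio {
    struct fake_bio *parent; /* bio of the next device up the stack, or NULL */
    struct fake_bio *next;   /* link in the pending queue */
    int done;                /* set once the completion handler has run */
};

static struct fake_bio *pending_head, *pending_tail;
static int draining;         /* non-zero while a top-level call is active */
static int depth, max_depth; /* instrumentation: nesting of handlers */

void fake_bio_endio(struct fake_bio *bio);

/* Completion handler for one level; completing it completes the parent. */
static void run_handler(struct fake_bio *bio)
{
    if (++depth > max_depth)
        max_depth = depth;
    bio->done = 1;
    if (bio->parent)
        fake_bio_endio(bio->parent); /* nested call: queued, not run here */
    depth--;
}

void fake_bio_endio(struct fake_bio *bio)
{
    assert(bio != NULL);

    if (draining) {
        /* Called from inside a handler: defer instead of recursing. */
        bio->next = NULL;
        if (pending_tail)
            pending_tail->next = bio;
        else
            pending_head = bio;
        pending_tail = bio;
        return;
    }

    /* Top-level call: run this bio, then everything queued meanwhile. */
    draining = 1;
    run_handler(bio);
    while (pending_head) {
        struct fake_bio *b = pending_head;

        pending_head = b->next;
        if (!pending_head)
            pending_tail = NULL;
        run_handler(b);
    }
    draining = 0;
}

/* Deepest handler nesting seen so far. */
int fake_bio_max_depth(void)
{
    return max_depth;
}
```

With this shape, the handler nesting stays at one level no matter how many devices are stacked. The price is a little state that a real implementation would have to keep per task or per CPU.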

Wait queue waking looks as if it was written by a high-level maniac --- it
contains 8 levels of calls (none of them inlined). 7 of these calls (up to
try_to_wake_up) do nothing but pass arguments to the next lower-level call,
and each of them allocates at least 192 bytes of stack space. Together these
7 useless calls consume 1360 bytes of stack (and cause window traps that
needlessly hurt performance). Would you agree to inline most of the
calls to save stack? Or do you see another solution?

Long-term consideration: Is it possible to implement interrupt stacks on
sparc64? Functions on sparc eat stack much more aggressively than on other
architectures (minimum stack size for a function is 192 bytes).

Mikulas


2008-06-18 04:02:12

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Tue, 17 Jun 2008 20:47:57 -0400 (EDT)

> Wait queue waking looks like being written by a high-level maniac --- it
> contains 8 levels of calls (none of them inlined). 7 of these calls (until
> try_to_wake_up) do nothing but pass arguments to lower level call. And
> each of these calls allocates at least 192 bytes of stack space. All these
> 7 useless calls consume 1360 bytes of stack (and cause window traps that
> needlessly damage performance). Would you agree to inline most of the
> calls to save stack? Or do you see another solution?

Some of them could be inlined but there are a few limiting
factors here.

Even spin lock acquisitions are function calls, limiting how
much leaf function and tail call optimizations can be done.

Also, wake_up_bit has this aggregate local variable "key" whose
address is passed down to subsequent functions, which limits
optimizations even further.

It could still be improved a lot, however.

> Long-term consideration: Is it possible to implement interrupt stacks on
> sparc64? Functions on sparc eat stack much more aggressively than on other
> architectures (minimum stack size for a function is 192 bytes).

I had a patch but at the time I wrote it (several years ago) I
couldn't make it stable enough to put mainline, I may resurrect it.

I just did a quick scan and I can't find the last copy I had, and
things have changed enough that I'd probably work from scratch
anyways.

But the level of recursion possible by the current device layer is
excessive and needs to be curtailed irrespective of these generic
wakeup and sparc64 interrupt stack issues.

2008-06-19 03:24:36

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Tue, 17 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Tue, 17 Jun 2008 20:47:57 -0400 (EDT)
>
>> Wait queue waking looks like being written by a high-level maniac --- it
>> contains 8 levels of calls (none of them inlined). 7 of these calls (until
>> try_to_wake_up) do nothing but pass arguments to lower level call. And
>> each of these calls allocates at least 192 bytes of stack space. All these
>> 7 useless calls consume 1360 bytes of stack (and cause window traps that
>> needlessly damage performance). Would you agree to inline most of the
>> calls to save stack? Or do you see another solution?
>
> Some of them could be inlined but there are a few limiting
> factors here.

I inlined three of them, and I think I can inline another two. So hopefully
I'll be able to shrink the 8-call depth to a 3-call depth.

> Even spin lock acquisitions are function calls, limiting how
> much leaf function and tail call optimizations can be done.

Tail call optimization is not done at all if you compile the kernel with
stack checking. This contributes to the stack overflow too.

> Also, wake_up_bit has this aggregate local variable "key" whose
> address is passed down to subsequent functions, which limits
> optimizations even further.
>
> It could still be improved a lot, however.
>
> But the level of recursion possible by the current device layer is
> excessive and needs to be curtailed irrespective of these generic
> wakeup and sparc64 interrupt stack issues.

I fixed that too.

BTW, what's the purpose of having a 192-byte stack frame? There are 16
8-byte registers being saved per function call, so a 128-byte frame should
be sufficient, shouldn't it? The ABI specifies that some additional entries
must be present even if unused, but I don't see the reason for them. Would
something bad happen if GCC started to generate 128-byte frames?

Mikulas

2008-06-19 04:00:00

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Wed, 18 Jun 2008 23:24:20 -0400 (EDT)

> BTW. what's the purpose of having 192-byte stack frame? There are 16
> 8-byte registers being saved per function call, so 128-byte frame should
> be sufficient, shouldn't it? The ABI specifies that some additional entries
> must be present even if unused, but I don't see reason for them. Would
> something bad happen if GCC started to generate 128-byte stacks?

The callee can pop the arguments into the area past the
register window.

So you have the 128 byte register window save area, 6
slots for incoming arguments, which gives us 176 bytes.
The rest is for some miscellaneous stack frame state,
which I don't remember the details of at the moment.
I'd have to read the sparc backend of gcc to remember.
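
The arithmetic above can be written out as a small sketch. The constants are taken only from the figures quoted in this thread (16 saved registers of 8 bytes, 6 argument slots, a 192-byte observed minimum); the names are made up for illustration and nothing here is read from the gcc backend or the ABI document.

```c
#include <assert.h>

enum {
    REG_BYTES      = 8,   /* one 64-bit register */
    WINDOW_REGS    = 16,  /* 8 local + 8 in registers per window */
    ARG_SLOTS      = 6,   /* slots for register-passed arguments */
    OBSERVED_FRAME = 192  /* minimum frame size quoted in this thread */
};

/* Bytes needed to spill one register window. */
int window_save_bytes(void)
{
    return WINDOW_REGS * REG_BYTES;
}

/* Window save area plus the six argument slots. */
int frame_with_arg_slots(void)
{
    return window_save_bytes() + ARG_SLOTS * REG_BYTES;
}

/* Part of the observed frame not explained by the two areas above. */
int unexplained_bytes(void)
{
    int rest = OBSERVED_FRAME - frame_with_arg_slots();

    assert(rest >= 0);
    return rest;
}
```

So the window save area accounts for 128 bytes, the argument slots bring it to 176, and 16 bytes of the observed 192 remain as the "miscellaneous" part.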

2008-06-19 05:17:53

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Wed, 18 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Wed, 18 Jun 2008 23:24:20 -0400 (EDT)
>
>> BTW. what's the purpose of having 192-byte stack frame? There are 16
>> 8-byte registers being saved per function call, so 128-byte frame should
>> be sufficient, shouldn't it? The ABI specifies that some additional entries
>> must be present even if unused, but I don't see reason for them. Would
>> something bad happen if GCC started to generate 128-byte stacks?
>
> The callee can pop the arguments into the area past the
> register window.

I see ... the callee writes arguments into the caller's stack frame if it
takes a variable number of arguments. That is a misdesign; the callee should
write register arguments into its own frame, like on AMD64 (then this space
would be allocated only if needed).
But nothing can be done about it since the ABI is already specified :-(

Mikulas

> So you have the 128 byte register window save area, 6
> slots for incoming arguments, which gives us 176 bytes.
> The rest is for some miscellaneous stack frame state,
> which I don't remember the details of at the moment.
> I'd have to read the sparc backend of gcc to remember.
>

2008-06-19 06:37:26

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Thu, 19 Jun 2008 01:17:39 -0400 (EDT)

> I see ... the callee writes arguments into caller's stack frame, if it has
> variable number of arguments. That is a misdesign, the callee should write
> register arguments into its own frame like on AMD64 (then this space
> would be allocated only if needed).

The callee can do this even for non-variable argument lists.

It's like a set of pre-allocated stack slots for those incoming
argument registers when reloading under register pressure.

In my opinion it is better to put this onus on the callee because only
the callee knows if it needs to pop these values onto the stack to
alleviate register pressure.

I think it might be possible for the compiler to only use 176 bytes.
I'll take a look at the gcc sparc backend and the ABI specification
to see if this is the case.

2008-06-19 13:02:28

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Wed, 18 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Thu, 19 Jun 2008 01:17:39 -0400 (EDT)
>
>> I see ... the callee writes arguments into caller's stack frame, if it has
>> variable number of arguments. That is a misdesign, the callee should write
>> register arguments into its own frame like on AMD64 (then this space
>> would be allocated only if needed).
>
> The callee can do this even for non-variable argument lists.
>
> It's like a set of pre-allocated stack slots for those incoming
> argument registers when reloading under register pressure.
>
> In my opinion it is better to put this onus on the callee because only
> the callee knows if it needs to pop these values onto the stack to
> alleviate register pressure.
>
> I think it might be possible for the compiler to only use 176 bytes.
> I'll take a look at the gcc sparc backend and the ABI specification
> to see if this is the case.

Yes, it could be shrunk to 176 bytes. Maybe there could be some
performance problems if the spills are cacheline-unaligned. Or better ---
add a special -mkernel-abi option to gcc that would drop this area entirely
and make 128-byte frames. In the kernel it wouldn't matter that the ABI is
incompatible.

Mikulas

2008-06-20 15:47:33

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

> It could still be improved a lot, however.
>
>> Long-term consideration: Is it possible to implement interrupt stacks on
>> sparc64? Functions on sparc eat stack much more aggressively than on other
>> architectures (minimum stack size for a function is 192 bytes).
>
> I had a patch but at the time I wrote it (several years ago) I
> couldn't make it stable enough to put mainline, I may resurrect it.
>
> I just did a quick scan and I can't find the last copy I had, and
> things have changed enough that I'd probably work from scratch
> anyways.
>
> But the level of recursion possible by the current device layer is
> excessive and needs to be curtailed irrespective of these generic
> wakeup and sparc64 interrupt stack issues.

I took another few traces (to track the whole stack content) and there is
another problem: nested interrupts. Does Sparc64 limit them somehow?

sys_call_table
timer_interrupt
irq_exit
do_softirq
__do_softirq
run_timer_softirq
_spin_unlock
sys_call_table
handler_irq
handler_fasteoi_irq
handle_irq_event
ide_intr
ide_dma_intr
task_end_request
ide_end_request
__ide_end_request
...

Mikulas

2008-06-20 17:26:33

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 11:47:12 -0400 (EDT)

> I took another few traces (to track the whole stack content) and there is
> another problem: nested interrupts. Does Sparc64 limit them somehow?

Two levels should be the deepest you will ever see, and this is
equivalent to what you get on other platforms.

That path occurs when softirq processing re-enabled HW interrupts when
returning from the top-level interrupt.

2008-06-20 20:47:22

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64



On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 11:47:12 -0400 (EDT)
>
>> I took another few traces (to track the whole stack content) and there is
>> another problem: nested interrupts. Does Sparc64 limit them somehow?
>
> Two levels should be the deepest you will ever see, and this is
> equivalent to what you get on other platforms.
>
> That path occurs when softirq processing re-enabled HW interrupts when
> returning from the top-level interrupt.

And what if a network softirq happened here? How much stack does it consume?

The whole overflowed stack trace has 75 functions. I was able to get rid
of 9 by avoiding bio_endio recursion and 10 by turning simple functions
into inlines --- so is that enough or not enough for possible networking
calls?

Maybe a good thing would be to add a check for stack size to __do_softirq
and hand the softirq to ksoftirqd if there's not enough space.

Mikulas

2008-06-20 20:48:26

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 16:34:23 -0400 (EDT)

> And what if network softirq happened here? How much stack does it consume?
>
> The whole overflowed stack trace has 75 functions, I was able to get rid
> of 9 by avoiding bio_endio recursion and 10 by turning simple functions
> into inlines. --- so is it enough or not enough for possible networking
> calls?

It should be OK, because the minimum stack of a (75 - 19) depth call
chain is under 11K and within safe limits I believe.
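
As a worked check of that estimate, here is a minimal sketch using only the numbers from this thread (75 frames in the trace, 19 removed, 192 bytes minimum per frame). The helper name is made up, and the result is a lower bound, since frames with local variables are larger than the minimum.

```c
#include <assert.h>

/* Lower bound on stack used by `depth` nested calls of >= `min_frame` bytes. */
unsigned long min_stack_bytes(unsigned long depth, unsigned long min_frame)
{
    assert(min_frame > 0);
    return depth * min_frame;
}
```

For example, min_stack_bytes(75 - 19, 192) is 56 * 192 = 10752 bytes, which is below 11K (11264 bytes), while the original 75-deep chain needs at least 75 * 192 = 14400 bytes.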

> Maybe a good thing would be to add a check for stack size to __do_softirq
> and handing the softirq to ksoftirqd if there's not enough space.

I'd rather it spit out a WARN_ON() message and a backtrace.

Otherwise it will be considered a feature and people won't fix
these deep call chains.

2008-06-20 21:14:56

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64



On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 11:47:12 -0400 (EDT)
>
>> I took another few traces (to track the whole stack content) and there is
>> another problem: nested interrupts. Does Sparc64 limit them somehow?
>
> Two levels should be the deepest you will ever see, and this is
> equivalent to what you get on other platforms.

Are you sure? What about this:
ide-io.c:ide_intr
    if (drive->unmask)
        local_irq_enable_in_hardirq();

or this:
kernel/irq/handle.c:handle_IRQ_event
    if (!(action->flags & IRQF_DISABLED))
        local_irq_enable_in_hardirq();


--- how is the number of nested interrupts here supposed to be limited?

If these things are not limited, you can get as many nested handlers
as there are hardware interrupts, which means a crash.

Mikulas

> That path occurs when softirq processing re-enabled HW interrupts when
> returning from the top-level interrupt.

2008-06-20 21:20:45

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 17:14:41 -0400 (EDT)

> Are you sure? What about this:
> ide-io.c:ide_intr
> if (drive->unmask)
> local_irq_enable_in_hardirq();
>
> or this:
> kernel/irq/handle.c:handle_IRQ_event
> if (!(action->flags & IRQF_DISABLED))
> local_irq_enable_in_hardirq();
>
>
> --- how is number of nested interrupts here supposed to be limited?
>
> If these things are not limited, you get at most as many nested handlers
> as there are hardware interrupts, which means crash.

It means i386 and every other platform potentially has the same exact
problem.

What point wrt. sparc64 are you trying to make here? :-)

2008-06-20 21:25:40

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 17:14:41 -0400 (EDT)
>
>> Are you sure? What about this:
>> ide-io.c:ide_intr
>> if (drive->unmask)
>> local_irq_enable_in_hardirq();
>>
>> or this:
>> kernel/irq/handle.c:handle_IRQ_event
>> if (!(action->flags & IRQF_DISABLED))
>> local_irq_enable_in_hardirq();
>>
>>
>> --- how is number of nested interrupts here supposed to be limited?
>>
>> If these things are not limited, you get at most as many nested handlers
>> as there are hardware interrupts, which means crash.
>
> It means i386 and every other platform potentially has the same exact
> problem.
>
> What point wrt. sparc64 are you trying to make here? :-)

The difference is that i386 takes a minimum of 4 bytes per stack frame and
sparc64 192 bytes per stack frame. So this problem will kill sparc64
sooner.

But yes, it is a general problem and should be solved in arch-independent
code.

Mikulas

2008-06-20 21:26:49

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 16:34:23 -0400 (EDT)
>
>> And what if network softirq happened here? How much stack does it consume?
>>
>> The whole overflowed stack trace has 75 functions, I was able to get rid
>> of 9 by avoiding bio_endio recursion and 10 by turning simple functions
>> into inlines. --- so is it enough or not enough for possible networking
>> calls?
>
> It should be OK, because the minimum stack of a (75 - 19) depth call
> chain is under 11K and within safe limits I believe.

What I meant was: could some fancy networking options eat those 19 frames
that I saved and crash it again? I use the computer as a workstation; it
doesn't have a high network load and it doesn't use any features except
basic TCP/IP.

>> Maybe a good thing would be to add a check for stack size to __do_softirq
>> and handing the softirq to ksoftirqd if there's not enough space.
>
> I'd rather it spit out a WARN_ON() message and a backtrace.
>
> Otherwise it will be considered a feature and people won't fix
> these deep call chains.

If you think that process context + network processing + hardirqs can fit
into 75 nested functions... I really have no idea how much the networking
takes. Given the number of protocols and features, and the inability to test
them all in one lab, it looks very scary.

Mikulas

2008-06-20 21:41:47

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 17:26:35 -0400 (EDT)

> it looks very scary.

I agree, it looks scary to me too.

Next week I'll make one of my primary projects the implementation of
IRQ stacks for sparc64.

2008-06-20 21:44:36

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 17:25:26 -0400 (EDT)

> On Fri, 20 Jun 2008, David Miller wrote:
>
> > From: Mikulas Patocka <[email protected]>
> > Date: Fri, 20 Jun 2008 17:14:41 -0400 (EDT)
> >
> > It means i386 and every other platform potentially has the same exact
> > problem.
> >
> > What point wrt. sparc64 are you trying to make here? :-)
>
> The difference is that i386 takes minimum 4 bytes per stack frame and
> sparc64 192 bytes per stack frame. So this problem will kill sparc64
> sooner.
>
> But yes, it is general problem and should be solved in arch-independent
> code.

I agree on both counts. Although I'm curious what the average stack
frame sizes look like on x86_64 and i386, and also how this area
appears on powerpc.

One mitigating factor on sparc64 is that typically when there are lots
of devices with interrupts there are also lots of cpus, and we evenly
distribute the IRQ targeting amongst the available cpus on sparc64.

This is probably why, in practice, these problems tend to not surface
often.

In any event, with the work you've accomplished and my implementation
of IRQ stacks for sparc64 we should be able to get things in much
better shape.

2008-06-20 21:47:49

by David Miller

Subject: Re: stack overflow on Sparc64

From: David Miller <[email protected]>
Date: Fri, 20 Jun 2008 14:44:24 -0700 (PDT)

> I agree on both counts. Although I'm curious what the average stack
> frame sizes look like on x86_64 and i386, and also how this area
> appears on powerpc.

I also want to mention in passing that another thing we can do to
help deep call stack sizes is to make call chains more tail-call
friendly when possible.
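
To illustrate what "tail-call friendly" can mean here, the following sketch contrasts two wrapper shapes. All names are hypothetical; struct wait_key_model only imitates the "key" aggregate mentioned earlier for wake_up_bit and is not the kernel's structure. Whether a given compiler actually emits a sibling call depends on its options (and it may simply inline functions this small), so treat this as a picture of the two shapes, not a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the "key" aggregate discussed for wake_up_bit. */
struct wait_key_model {
    int flags;
    int bit_nr;
};

/* Innermost function; takes the key by pointer. */
int wake_common_by_pointer(const struct wait_key_model *key)
{
    assert(key != NULL);
    return key->flags * 100 + key->bit_nr;
}

/* Innermost function; takes the same data by value. */
int wake_common_by_value(int flags, int bit_nr)
{
    return flags * 100 + bit_nr;
}

/*
 * Not tail-call friendly: `key` lives in this function's frame and its
 * address is passed down, so the frame must stay alive across the call.
 */
int wake_with_local_key(int bit_nr)
{
    struct wait_key_model key = { 1, bit_nr };

    return wake_common_by_pointer(&key);
}

/*
 * Tail-call friendly: the call is the last thing done and nothing in this
 * frame is needed afterwards, so a compiler is free to reuse the frame.
 */
int wake_with_scalar_args(int bit_nr)
{
    return wake_common_by_value(1, bit_nr);
}
```

Both wrappers compute the same result; only the second leaves the compiler free to replace the call with a jump and so avoid a new stack frame.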

2008-06-20 22:22:49

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Fri, 20 Jun 2008, David Miller wrote:

> From: David Miller <[email protected]>
> Date: Fri, 20 Jun 2008 14:44:24 -0700 (PDT)
>
>> I agree on both counts. Although I'm curious what the average stack
>> frame sizes look like on x86_64 and i386, and also how this area
>> appears on powerpc.
>
> I also want to mention in passing that another thing we can do to
> help deep call stack sizes is to make call chains more tail-call
> friendly when possible.

... and remove -fno-optimize-sibling-calls?:

Makefile:
ifdef CONFIG_FRAME_POINTER
KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
else
KBUILD_CFLAGS += -fomit-frame-pointer
endif

--- maybe it would be better to remove it, instead of some of the inlining
that I did. Or do you see a situation where, for debugging purposes, a user
would want -fno-optimize-sibling-calls?

Mikulas

2008-06-20 22:28:59

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 18:22:33 -0400 (EDT)

> On Fri, 20 Jun 2008, David Miller wrote:
>
> > From: David Miller <[email protected]>
> > Date: Fri, 20 Jun 2008 14:44:24 -0700 (PDT)
> >
> >> I agree on both counts. Although I'm curious what the average stack
> >> frame sizes look like on x86_64 and i386, and also how this area
> >> appears on powerpc.
> >
> > I also want to mention in passing that another thing we can do to
> > help deep call stack sizes is to make call chains more tail-call
> > friendly when possible.
>
> ... and remove -fno-optimize-sibling-calls?:
>
> Makefile:
> ifdef CONFIG_FRAME_POINTER
> KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
> else
> KBUILD_CFLAGS += -fomit-frame-pointer
> endif
>
> --- maybe it could be better to remove it, instead of some inlining that I
> made. Or do you see a situation when for debugging purpose, user would
> want -fno-optimize-sibling-calls?

Yes for debugging and other things it has to stay.

2008-06-20 22:33:20

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64



On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 17:25:26 -0400 (EDT)
>
>> On Fri, 20 Jun 2008, David Miller wrote:
>>
>>> From: Mikulas Patocka <[email protected]>
>>> Date: Fri, 20 Jun 2008 17:14:41 -0400 (EDT)
>>>
>>> It means i386 and every other platform potentially has the same exact
>>> problem.
>>>
>>> What point wrt. sparc64 are you trying to make here? :-)
>>
>> The difference is that i386 takes minimum 4 bytes per stack frame and
>> sparc64 192 bytes per stack frame. So this problem will kill sparc64
>> sooner.
>>
>> But yes, it is general problem and should be solved in arch-independent
>> code.
>
> I agree on both counts. Although I'm curious what the average stack
> frame sizes look like on x86_64 and i386, and also how this area
> appears on powerpc.

If I look at an old oops that I have in my log on i386: it's 1104 stack
bytes ~ 38 functions.
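
For comparison with the 192-byte sparc64 minimum, the average from that oops works out as below. This is a sketch using only the two numbers above; the helper name is made up.

```c
#include <assert.h>

/* Average stack bytes per function in a trace. */
double average_frame_bytes(unsigned long stack_bytes, unsigned long functions)
{
    assert(functions > 0);
    return (double)stack_bytes / (double)functions;
}
```

average_frame_bytes(1104, 38) is about 29 bytes per function, so the sparc64 minimum frame alone is roughly 6.6 times the i386 average seen in this one trace.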

> One mitigating factor on sparc64 is that typically when there are lots
> of devices with interrupts there are also lots of cpus, and we evenly
> distribute the IRQ targeting amongst the available cpus on sparc64.
>
> This is probably why, in practice, these problems tend to not surface
> often.
>
> In any event, with the work you've accomplished and my implementation
> of IRQ stacks for sparc64 we should be able to get things in much
> better shape.

I created this to help with nested irqs:
--- linux-2.6.26-rc5-devel.orig/include/linux/interrupt.h 2008-06-20 23:34:04.000000000 +0200
+++ linux-2.6.26-rc5-devel/include/linux/interrupt.h 2008-06-20 23:36:03.000000000 +0200
@@ -95,7 +95,7 @@
 #ifdef CONFIG_LOCKDEP
 # define local_irq_enable_in_hardirq() do { } while (0)
 #else
-# define local_irq_enable_in_hardirq() local_irq_enable()
+# define local_irq_enable_in_hardirq() do { if (hardirq_count() <= (1 << HARDIRQ_SHIFT)) local_irq_enable(); } while (0)
 #endif

 extern void disable_irq_nosync(unsigned int irq);
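
The effect of that condition can be modelled in user space. This is a sketch under assumptions: the shift of 16 and the 12-bit mask are assumed values for illustration, preempt_count_model stands in for the real preempt counter, and the simulation assumes the worst case where another device interrupts as soon as interrupts are re-enabled.

```c
#include <assert.h>

#define HARDIRQ_SHIFT_MODEL  16 /* assumed bit position of the hardirq count */
#define HARDIRQ_OFFSET_MODEL (1UL << HARDIRQ_SHIFT_MODEL)
#define HARDIRQ_MASK_MODEL   (0xfffUL << HARDIRQ_SHIFT_MODEL)

static unsigned long preempt_count_model; /* stands in for preempt_count() */
static int max_nesting;

static unsigned long hardirq_count_model(void)
{
    return preempt_count_model & HARDIRQ_MASK_MODEL;
}

/* Would the handler re-enable interrupts? `limited` selects the patched macro. */
static int handler_enables_irqs(int limited)
{
    return !limited || hardirq_count_model() <= HARDIRQ_OFFSET_MODEL;
}

static void simulate_irq(int still_pending, int limited)
{
    int nesting;

    preempt_count_model += HARDIRQ_OFFSET_MODEL; /* like irq_enter() */
    nesting = (int)(hardirq_count_model() >> HARDIRQ_SHIFT_MODEL);
    if (nesting > max_nesting)
        max_nesting = nesting;

    /* Worst case: another device interrupts the moment IRQs are enabled. */
    if (still_pending > 0 && handler_enables_irqs(limited))
        simulate_irq(still_pending - 1, limited);

    preempt_count_model -= HARDIRQ_OFFSET_MODEL; /* like irq_exit() */
}

/* Deepest handler nesting when `devices` interrupt lines fire back to back. */
int worst_case_nesting(int devices, int limited)
{
    assert(devices >= 1 && devices < 0xfff);
    preempt_count_model = 0;
    max_nesting = 0;
    simulate_irq(devices - 1, limited);
    return max_nesting;
}
```

In this model the unpatched macro lets the nesting grow with the number of interrupting devices, while the patched one stops at two levels: the first handler may re-enable interrupts, the handler that interrupts it may not.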

Mikulas

2008-06-20 22:36:28

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 18:22:33 -0400 (EDT)
>
>> On Fri, 20 Jun 2008, David Miller wrote:
>>
>>> From: David Miller <[email protected]>
>>> Date: Fri, 20 Jun 2008 14:44:24 -0700 (PDT)
>>>
>>>> I agree on both counts. Although I'm curious what the average stack
>>>> frame sizes look like on x86_64 and i386, and also how this area
>>>> appears on powerpc.
>>>
>>> I also want to mention in passing that another thing we can do to
>>> help deep call stack sizes is to make call chains more tail-call
>>> friendly when possible.
>>
>> ... and remove -fno-optimize-sibling-calls?:
>>
>> Makefile:
>> ifdef CONFIG_FRAME_POINTER
>> KBUILD_CFLAGS += -fno-omit-frame-pointer -fno-optimize-sibling-calls
>> else
>> KBUILD_CFLAGS += -fomit-frame-pointer
>> endif
>>
>> --- maybe it could be better to remove it, instead of some inlining that I
>> made. Or do you see a situation when for debugging purpose, user would
>> want -fno-optimize-sibling-calls?
>
> Yes for debugging and other things it has to stay.

If you want it to stay, then it doesn't make sense to make functions
tail-call-friendly --- because it should not crash with or without
debugging.

Mikulas

2008-06-20 22:47:26

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Fri, 20 Jun 2008 18:36:09 -0400 (EDT)

> On Fri, 20 Jun 2008, David Miller wrote:
>
> > Yes for debugging and other things it has to stay.
>
> If you want it to stay, then it doesn't make sense to make functions
> tail-call-friendly --- because it should not crash with or without
> debugging.

On the contrary, of course it makes sense to do so.

When debugging is disabled, the kernel will run faster.

We have to fix the stack usage in either case, but from a
performance standpoint when debugging is disabled the
tail-call friendly layout is still highly desirable.

2008-06-21 00:37:30

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64

On Fri, 20 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Fri, 20 Jun 2008 18:36:09 -0400 (EDT)
>
>> On Fri, 20 Jun 2008, David Miller wrote:
>>
>>> Yes for debugging and other things it has to stay.
>>
>> If you want it to stay, then it doesn't make sense to make functions
>> tail-call-friendly --- because it should not crash with or without
>> debugging.
>
> On the contrary, of course it makes sense to do so.
>
> When debugging is disabled, the kernel will run faster.
>
> We have to fix the stack usage in either case, but from a
> performance standpoint when debugging is disabled the
> tail-call friendly layout is still highly desirable.

I agree, but performance is a different problem than stack overflows.

I put all the patches for this overflow problem here:
http://people.redhat.com/mpatocka/patches/kernel-stack-overflow

Mikulas

2008-06-21 04:51:53

by David Miller

Subject: Re: stack overflow on Sparc64

From: David Miller <[email protected]>
Date: Fri, 20 Jun 2008 14:41:28 -0700 (PDT)

> Next week I'll make one of my primary projects the implementation of
> IRQ stacks for sparc64.

Next week came real fast :-)

Amazingly this patch worked the first time I booted it up on my
dual-cpu workstation (using SMP + PREEMPT). I'm worried that maybe
gcc can do some clever things and not allow my extremely simple
approach to work, but anyways it might work for you and it's worth
giving a try.

sparc64: Implement support for IRQ stacks.

Signed-off-by: David S. Miller <[email protected]>

diff --git a/arch/sparc64/Kconfig.debug b/arch/sparc64/Kconfig.debug
index 6a4d28a..df80962 100644
--- a/arch/sparc64/Kconfig.debug
+++ b/arch/sparc64/Kconfig.debug
@@ -41,4 +41,11 @@ config FRAME_POINTER
depends on MCOUNT
default y

+config IRQSTACKS
+ bool "Use separate kernel stacks when processing interrupts"
+ help
+ If you say Y here the kernel will use separate kernel stacks
+ for handling hard and soft interrupts. This can help avoid
+ overflowing the process kernel stacks.
+
endmenu
diff --git a/arch/sparc64/kernel/irq.c b/arch/sparc64/kernel/irq.c
index b441a26..24820fa 100644
--- a/arch/sparc64/kernel/irq.c
+++ b/arch/sparc64/kernel/irq.c
@@ -674,10 +674,42 @@ void ack_bad_irq(unsigned int virt_irq)
ino, virt_irq);
}

+#ifdef CONFIG_IRQSTACKS
+void *hardirq_stack[NR_CPUS];
+void *softirq_stack[NR_CPUS];
+
+static void *set_hardirq_stack(void)
+{
+ void *orig_sp, *sp = hardirq_stack[smp_processor_id()];
+
+ __asm__ __volatile__("mov %%sp, %0" : "=r" (orig_sp));
+ if (orig_sp < sp ||
+ orig_sp > (sp + THREAD_SIZE)) {
+ sp += THREAD_SIZE - 192 - STACK_BIAS;
+ __asm__ __volatile__("mov %0, %%sp" : : "r" (sp));
+ }
+
+ return orig_sp;
+}
+static void restore_hardirq_stack(void *orig_sp)
+{
+ __asm__ __volatile__("mov %0, %%sp" : : "r" (orig_sp));
+}
+#else
+static void *set_hardirq_stack(void)
+{
+ return NULL;
+}
+static void restore_hardirq_stack(void *orig_sp)
+{
+}
+#endif
+
void handler_irq(int irq, struct pt_regs *regs)
{
unsigned long pstate, bucket_pa;
struct pt_regs *old_regs;
+ void *orig_sp;

clear_softint(1 << irq);

@@ -695,6 +727,8 @@ void handler_irq(int irq, struct pt_regs *regs)
"i" (PSTATE_IE)
: "memory");

+ orig_sp = set_hardirq_stack();
+
while (bucket_pa) {
struct irq_desc *desc;
unsigned long next_pa;
@@ -711,10 +745,40 @@ void handler_irq(int irq, struct pt_regs *regs)
bucket_pa = next_pa;
}

+ restore_hardirq_stack(orig_sp);
+
irq_exit();
set_irq_regs(old_regs);
}

+#ifdef CONFIG_IRQSTACKS
+void do_softirq(void)
+{
+ unsigned long flags;
+
+ if (in_interrupt())
+ return;
+
+ local_irq_save(flags);
+
+ if (local_softirq_pending()) {
+ void *orig_sp, *sp = softirq_stack[smp_processor_id()];
+
+ sp += THREAD_SIZE - 192 - STACK_BIAS;
+
+ __asm__ __volatile__("mov %%sp, %0\n\t"
+ "mov %1, %%sp"
+ : "=&r" (orig_sp)
+ : "r" (sp));
+ __do_softirq();
+ __asm__ __volatile__("mov %0, %%sp"
+ : : "r" (orig_sp));
+ }
+
+ local_irq_restore(flags);
+}
+#endif
+
#ifdef CONFIG_HOTPLUG_CPU
void fixup_irqs(void)
{
diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index 84898c4..56e22f4 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -49,6 +49,7 @@
#include <asm/sstate.h>
#include <asm/mdesc.h>
#include <asm/cpudata.h>
+#include <asm/irq.h>

#define MAX_PHYS_ADDRESS (1UL << 42UL)
#define KPTE_BITMAP_CHUNK_SZ (256UL * 1024UL * 1024UL)
@@ -1817,6 +1818,18 @@ void __init paging_init(void)
if (tlb_type == hypervisor)
sun4v_mdesc_init();

+#ifdef CONFIG_IRQSTACKS
+ /* Once the OF device tree and MDESC have been setup, we know
+ * the list of possible cpus. Therefore we can allocate the
+ * IRQ stacks.
+ */
+ for_each_possible_cpu(i) {
+ /* XXX Use node local allocations... XXX */
+ softirq_stack[i] = __va(lmb_alloc(THREAD_SIZE, THREAD_SIZE));
+ hardirq_stack[i] = __va(lmb_alloc(THREAD_SIZE, THREAD_SIZE));
+ }
+#endif
+
/* Setup bootmem... */
last_valid_pfn = end_pfn = bootmem_init(phys_base);

diff --git a/include/asm-sparc64/irq.h b/include/asm-sparc64/irq.h
index 0bb9bf5..d71b5ff 100644
--- a/include/asm-sparc64/irq.h
+++ b/include/asm-sparc64/irq.h
@@ -90,4 +90,10 @@ static inline unsigned long get_softint(void)
return retval;
}

+#ifdef CONFIG_IRQSTACKS
+extern void *hardirq_stack[NR_CPUS];
+extern void *softirq_stack[NR_CPUS];
+#define __ARCH_HAS_DO_SOFTIRQ
+#endif
+
#endif
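
The stack-switch arithmetic in the patch above can be sketched in plain C. This is only an illustration: the THREAD_SIZE value is an assumption (16KB, i.e. two 8KB pages), STACK_BIAS is taken to be the usual sparc64 value of 2047, and the helper names irq_stack_initial_sp() and on_irq_stack() are hypothetical, not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones come from kernel headers. */
#define THREAD_SIZE (16UL * 1024UL) /* assumption: two 8KB pages */
#define STACK_BIAS  2047UL          /* assumption: usual sparc64 stack bias */
#define MIN_FRAME   192UL           /* frame size reserved by the patch */

/* Initial (biased) stack pointer for an IRQ stack that starts at 'base':
 * point near the top, leave room for one frame, then apply the bias. */
static uintptr_t irq_stack_initial_sp(uintptr_t base)
{
	return base + THREAD_SIZE - MIN_FRAME - STACK_BIAS;
}

/* Mirrors the range test in set_hardirq_stack(): the switch is skipped
 * when 'sp' already lies within [base, base + THREAD_SIZE]. */
static bool on_irq_stack(uintptr_t sp, uintptr_t base)
{
	return !(sp < base || sp > base + THREAD_SIZE);
}
```

With these assumed values the initial stack pointer sits 14145 bytes above the base of the IRQ stack, so a nested interrupt that arrives while already on that stack is detected and does not switch again.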

2008-06-21 19:43:18

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64



On Fri, 20 Jun 2008, David Miller wrote:

> From: David Miller <[email protected]>
> Date: Fri, 20 Jun 2008 14:41:28 -0700 (PDT)
>
>> Next week I'll make one of my primary projects the implementation of
>> IRQ stacks for sparc64.
>
> Next week came real fast :-)
>
> Amazingly this patch worked the first time I booted it up on my
> dual-cpu workstation (using SMP + PREEMPT). I'm worried that maybe
> gcc can do some clever things and not allow my extremely simple
> approach to work, but anyways it might work for you and it's worth
> giving a try.
>
> sparc64: Implement support for IRQ stacks.

For me it doesn't work. Locked up after "console: colour dummy device
80x25".

Mikulas

> Signed-off-by: David S. Miller <[email protected]>
>
> diff --git a/arch/sparc64/Kconfig.debug b/arch/sparc64/Kconfig.debug
> index 6a4d28a..df80962 100644
> --- a/arch/sparc64/Kconfig.debug
> +++ b/arch/sparc64/Kconfig.debug
> @@ -41,4 +41,11 @@ config FRAME_POINTER
> depends on MCOUNT
> default y
>
> +config IRQSTACKS
> + bool "Use separate kernel stacks when processing interrupts"
> + help
> + If you say Y here the kernel will use separate kernel stacks
> + for handling hard and soft interrupts. This can help avoid
> + overflowing the process kernel stacks.
> +
> endmenu
> diff --git a/arch/sparc64/kernel/irq.c b/arch/sparc64/kernel/irq.c
> index b441a26..24820fa 100644
> --- a/arch/sparc64/kernel/irq.c
> +++ b/arch/sparc64/kernel/irq.c
> @@ -674,10 +674,42 @@ void ack_bad_irq(unsigned int virt_irq)
> ino, virt_irq);
> }
>
> +#ifdef CONFIG_IRQSTACKS
> +void *hardirq_stack[NR_CPUS];
> +void *softirq_stack[NR_CPUS];
> +
> +static void *set_hardirq_stack(void)
> +{
> + void *orig_sp, *sp = hardirq_stack[smp_processor_id()];
> +
> + __asm__ __volatile__("mov %%sp, %0" : "=r" (orig_sp));
> + if (orig_sp < sp ||
> + orig_sp > (sp + THREAD_SIZE)) {
> + sp += THREAD_SIZE - 192 - STACK_BIAS;
> + __asm__ __volatile__("mov %0, %%sp" : : "r" (sp));
> + }
> +
> + return orig_sp;
> +}
> +static void restore_hardirq_stack(void *orig_sp)
> +{
> + __asm__ __volatile__("mov %0, %%sp" : : "r" (orig_sp));
> +}
> +#else
> +static void *set_hardirq_stack(void)
> +{
> + return NULL;
> +}
> +static void restore_hardirq_stack(void *orig_sp)
> +{
> +}
> +#endif
> +
> void handler_irq(int irq, struct pt_regs *regs)
> {
> unsigned long pstate, bucket_pa;
> struct pt_regs *old_regs;
> + void *orig_sp;
>
> clear_softint(1 << irq);
>
> @@ -695,6 +727,8 @@ void handler_irq(int irq, struct pt_regs *regs)
> "i" (PSTATE_IE)
> : "memory");
>
> + orig_sp = set_hardirq_stack();
> +
> while (bucket_pa) {
> struct irq_desc *desc;
> unsigned long next_pa;
> @@ -711,10 +745,40 @@ void handler_irq(int irq, struct pt_regs *regs)
> bucket_pa = next_pa;
> }
>
> + restore_hardirq_stack(orig_sp);
> +
> irq_exit();
> set_irq_regs(old_regs);
> }
>
> +#ifdef CONFIG_IRQSTACKS
> +void do_softirq(void)
> +{
> + unsigned long flags;
> +
> + if (in_interrupt())
> + return;
> +
> + local_irq_save(flags);
> +
> + if (local_softirq_pending()) {
> + void *orig_sp, *sp = softirq_stack[smp_processor_id()];
> +
> + sp += THREAD_SIZE - 192 - STACK_BIAS;
> +
> + __asm__ __volatile__("mov %%sp, %0\n\t"
> + "mov %1, %%sp"
> + : "=&r" (orig_sp)
> + : "r" (sp));
> + __do_softirq();
> + __asm__ __volatile__("mov %0, %%sp"
> + : : "r" (orig_sp));
> + }
> +
> + local_irq_restore(flags);
> +}
> +#endif
> +
> #ifdef CONFIG_HOTPLUG_CPU
> void fixup_irqs(void)
> {
> diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
> index 84898c4..56e22f4 100644
> --- a/arch/sparc64/mm/init.c
> +++ b/arch/sparc64/mm/init.c
> @@ -49,6 +49,7 @@
> #include <asm/sstate.h>
> #include <asm/mdesc.h>
> #include <asm/cpudata.h>
> +#include <asm/irq.h>
>
> #define MAX_PHYS_ADDRESS (1UL << 42UL)
> #define KPTE_BITMAP_CHUNK_SZ (256UL * 1024UL * 1024UL)
> @@ -1817,6 +1818,18 @@ void __init paging_init(void)
> if (tlb_type == hypervisor)
> sun4v_mdesc_init();
>
> +#ifdef CONFIG_IRQSTACKS
> + /* Once the OF device tree and MDESC have been setup, we know
> + * the list of possible cpus. Therefore we can allocate the
> + * IRQ stacks.
> + */
> + for_each_possible_cpu(i) {
> + /* XXX Use node local allocations... XXX */
> + softirq_stack[i] = __va(lmb_alloc(THREAD_SIZE, THREAD_SIZE));
> + hardirq_stack[i] = __va(lmb_alloc(THREAD_SIZE, THREAD_SIZE));
> + }
> +#endif
> +
> /* Setup bootmem... */
> last_valid_pfn = end_pfn = bootmem_init(phys_base);
>
> diff --git a/include/asm-sparc64/irq.h b/include/asm-sparc64/irq.h
> index 0bb9bf5..d71b5ff 100644
> --- a/include/asm-sparc64/irq.h
> +++ b/include/asm-sparc64/irq.h
> @@ -90,4 +90,10 @@ static inline unsigned long get_softint(void)
> return retval;
> }
>
> +#ifdef CONFIG_IRQSTACKS
> +extern void *hardirq_stack[NR_CPUS];
> +extern void *softirq_stack[NR_CPUS];
> +#define __ARCH_HAS_DO_SOFTIRQ
> +#endif
> +
> #endif
>

2008-06-22 07:03:22

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Sat, 21 Jun 2008 15:42:56 -0400 (EDT)

> For me it doesn't work. Locked up after "console: colour dummy device
> 80x25".

Machine type, compiler, and config please so I can debug this :-)

2008-06-22 13:49:16

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64



On Sun, 22 Jun 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Sat, 21 Jun 2008 15:42:56 -0400 (EDT)
>
>> For me it doesn't work. Locked up after "console: colour dummy device
>> 80x25".
>
> Machine type, compiler, and config please so I can debug this :-)

Ultra 5, 360MHz, 256MB, compiler gcc version 4.1.2 20061115 (prerelease)
(Debian 4.1.1-21). .config is attached.

Mikulas


Attachments:
.config (27.40 kB)

2008-08-12 06:30:26

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Sat, 21 Jun 2008 15:42:56 -0400 (EDT)

> On Fri, 20 Jun 2008, David Miller wrote:
>
> > giving a try.
> >
> > sparc64: Implement support for IRQ stacks.
>
> For me it doesn't work. Locked up after "console: colour dummy device
> 80x25".

Are you sure you didn't see a "Stack overflow" message on the
screen? :-)

That's what I get when I try to boot with your provided
kernel config.

The problem is that CONFIG_STACK_DEBUG doesn't understand
the IRQ stacks at all.

I'll see if I can tweak it to handle this.

2008-08-12 08:22:14

by David Miller

Subject: Re: stack overflow on Sparc64

From: David Miller <[email protected]>
Date: Mon, 11 Aug 2008 23:30:13 -0700 (PDT)

> The problem is that CONFIG_STACK_DEBUG doesn't understand
> the IRQ stacks at all.
>
> I'll see if I can tweak it to handle this.

This patch, on top of my original IRQSTACKS patch for
sparc64, seems to get things working for me.

diff --git a/arch/sparc64/lib/mcount.S b/arch/sparc64/lib/mcount.S
index 9e4534b..24e6a3c 100644
--- a/arch/sparc64/lib/mcount.S
+++ b/arch/sparc64/lib/mcount.S
@@ -45,11 +45,47 @@ _mcount:
sub %g3, STACK_BIAS, %g3
cmp %sp, %g3
bg,pt %xcc, 1f
- sethi %hi(panicstring), %g3
+ nop
+#ifdef CONFIG_IRQSTACKS
+ lduh [%g6 + TI_CPU], %g1
+ sethi %hi(hardirq_stack), %g3
+ or %g3, %lo(hardirq_stack), %g3
+ sllx %g1, 3, %g1
+ ldx [%g3 + %g1], %g7
+ sub %g7, STACK_BIAS, %g7
+ cmp %sp, %g7
+ bleu,pt %xcc, 2f
+ sethi %hi(THREAD_SIZE), %g3
+ add %g7, %g3, %g7
+ cmp %sp, %g7
+ blu,pn %xcc, 1f
+2: sethi %hi(softirq_stack), %g3
+ or %g3, %lo(softirq_stack), %g3
+ ldx [%g3 + %g1], %g7
+ cmp %sp, %g7
+ bleu,pt %xcc, 2f
+ sethi %hi(THREAD_SIZE), %g3
+ add %g7, %g3, %g7
+ cmp %sp, %g7
+ blu,pn %xcc, 1f
+ nop
+#endif
+ /* If we are already on panic stack, don't hop onto it
+ * again, we are already trying to output the stack overflow
+ * message.
+ */
sethi %hi(ovstack), %g7 ! cant move to panic stack fast enough
or %g7, %lo(ovstack), %g7
- add %g7, OVSTACKSIZE, %g7
+ add %g7, OVSTACKSIZE, %g3
+ sub %g3, STACK_BIAS - 192, %g3
sub %g7, STACK_BIAS, %g7
+ cmp %sp, %g7
+ blu,pn %xcc, 2f
+ cmp %sp, %g3
+ bleu,pn %xcc, 1f
+ nop
+2: mov %g3, %sp
+ sethi %hi(panicstring), %g3
mov %g7, %sp
call prom_printf
or %g3, %lo(panicstring), %o0
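
The idea behind the assembly above can be sketched in C, under simplifying assumptions. The sketch ignores the stack bias and the panic-stack check, uses unsigned comparisons throughout, assumes a THREAD_SIZE of 16KB, and the names inside_stack() and sp_looks_valid() are hypothetical. It only shows why a stack pointer on an IRQ stack must be accepted instead of being reported as an overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define THREAD_SIZE (16UL * 1024UL) /* assumption: 16KB stacks */

/* True if 'sp' lies strictly inside the stack that starts at 'base'. */
static bool inside_stack(uintptr_t sp, uintptr_t base)
{
	return sp > base && sp < base + THREAD_SIZE;
}

/* Simplified form of the check: the stack pointer is fine if it is above
 * the current task's low-water mark, or inside either per-cpu IRQ stack.
 * Anything else is treated as a stack overflow. */
static bool sp_looks_valid(uintptr_t sp, uintptr_t task_limit,
			   uintptr_t hardirq_base, uintptr_t softirq_base)
{
	if (sp > task_limit)
		return true;
	if (inside_stack(sp, hardirq_base))
		return true;
	if (inside_stack(sp, softirq_base))
		return true;
	return false;
}
```

Without the two inside_stack() tests, any stack pointer at or below the task limit is flagged, which is the "doesn't understand the IRQ stacks" behaviour described earlier in the thread.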

2008-08-13 00:53:19

by Mikulas Patocka

Subject: Re: stack overflow on Sparc64



On Tue, 12 Aug 2008, David Miller wrote:

> From: David Miller <[email protected]>
> Date: Mon, 11 Aug 2008 23:30:13 -0700 (PDT)
>
> > The problem is that CONFIG_STACK_DEBUG doesn't understand
> > the IRQ stacks at all.
> >
> > I'll see if I can tweak it to handle this.
>
> This patch, on top of my original IRQSTACKS patch for
> sparc64, seems to get things working for me.

Thanks, the patch works.

Mikulas

> diff --git a/arch/sparc64/lib/mcount.S b/arch/sparc64/lib/mcount.S
> index 9e4534b..24e6a3c 100644
> --- a/arch/sparc64/lib/mcount.S
> +++ b/arch/sparc64/lib/mcount.S
> @@ -45,11 +45,47 @@ _mcount:
> sub %g3, STACK_BIAS, %g3
> cmp %sp, %g3
> bg,pt %xcc, 1f
> - sethi %hi(panicstring), %g3
> + nop
> +#ifdef CONFIG_IRQSTACKS
> + lduh [%g6 + TI_CPU], %g1
> + sethi %hi(hardirq_stack), %g3
> + or %g3, %lo(hardirq_stack), %g3
> + sllx %g1, 3, %g1
> + ldx [%g3 + %g1], %g7
> + sub %g7, STACK_BIAS, %g7
> + cmp %sp, %g7
> + bleu,pt %xcc, 2f
> + sethi %hi(THREAD_SIZE), %g3
> + add %g7, %g3, %g7
> + cmp %sp, %g7
> + blu,pn %xcc, 1f
> +2: sethi %hi(softirq_stack), %g3
> + or %g3, %lo(softirq_stack), %g3
> + ldx [%g3 + %g1], %g7
> + cmp %sp, %g7
> + bleu,pt %xcc, 2f
> + sethi %hi(THREAD_SIZE), %g3
> + add %g7, %g3, %g7
> + cmp %sp, %g7
> + blu,pn %xcc, 1f
> + nop
> +#endif
> + /* If we are already on panic stack, don't hop onto it
> + * again, we are already trying to output the stack overflow
> + * message.
> + */
> sethi %hi(ovstack), %g7 ! cant move to panic stack fast enough
> or %g7, %lo(ovstack), %g7
> - add %g7, OVSTACKSIZE, %g7
> + add %g7, OVSTACKSIZE, %g3
> + sub %g3, STACK_BIAS - 192, %g3
> sub %g7, STACK_BIAS, %g7
> + cmp %sp, %g7
> + blu,pn %xcc, 2f
> + cmp %sp, %g3
> + bleu,pn %xcc, 1f
> + nop
> +2: mov %g3, %sp
> + sethi %hi(panicstring), %g3
> mov %g7, %sp
> call prom_printf
> or %g3, %lo(panicstring), %o0
>

2008-08-13 00:59:57

by David Miller

Subject: Re: stack overflow on Sparc64

From: Mikulas Patocka <[email protected]>
Date: Tue, 12 Aug 2008 20:53:04 -0400 (EDT)

> On Tue, 12 Aug 2008, David Miller wrote:
>
> > From: David Miller <[email protected]>
> > Date: Mon, 11 Aug 2008 23:30:13 -0700 (PDT)
> >
> > > The problem is that CONFIG_STACK_DEBUG doesn't understand
> > > the IRQ stacks at all.
> > >
> > > I'll see if I can tweak it to handle this.
> >
> > This patch, on top of my original IRQSTACKS patch for
> > sparc64, seems to get things working for me.
>
> Thanks, the patch works.

Thanks for testing. I'm engaged in a few activities right
now related to this:

1) I'm making a final version of this irqstacks patch which
I'll get into Linus's tree and then submit to -stable.

2) I have a patch for gcc, which I'm regression testing, that
gets rid of the 16-byte secondary-reload stack slot. With this,
gcc emits 176-byte default stack frames on sparc64.

3) I'm talking with some folks familiar with the V9 ABI about
the mandatory incoming argument stack slots. I think they
can be eliminated in all cases except functions taking
varargs. If it cannot be done universally for some
reason, I'll add a -mkernel option that will eliminate
them. This will emit 128 byte default stack frames when
possible.

So in the end we should hopefully have 128 byte stack frames
coming out of gcc and IRQ stacks in the kernel.
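
The frame sizes mentioned above (192, 176 and 128 bytes) can be reproduced with a small calculation. The breakdown is an assumption based on the usual reading of the V9 ABI: a 16-register window save area, six incoming argument slots of 8 bytes each, plus the 16-byte secondary-reload slot named in item 2. The function name frame_size() is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum {
	SLOT             = 8,         /* bytes per 64-bit slot */
	WINDOW_SAVE      = 16 * SLOT, /* register window save area: 128 bytes */
	INCOMING_ARGS    = 6 * SLOT,  /* incoming argument slots: 48 bytes */
	SECONDARY_RELOAD = 16         /* extra gcc slot from item 2 */
};

/* Minimal frame size for a function, depending on which optional
 * areas the compiler still reserves. */
static int frame_size(bool keep_arg_slots, bool keep_reload_slot)
{
	int size = WINDOW_SAVE;

	if (keep_arg_slots)
		size += INCOMING_ARGS;
	if (keep_reload_slot)
		size += SECONDARY_RELOAD;
	return size;
}
```

Under these assumptions frame_size(true, true) gives today's 192 bytes, dropping the reload slot gives 176, and also dropping the argument slots gives 128.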

2008-08-13 01:12:20

by Mikulas Patocka

Subject: console handover badness [was: stack overflow on Sparc64]



On Mon, 11 Aug 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Sat, 21 Jun 2008 15:42:56 -0400 (EDT)
>
> > On Fri, 20 Jun 2008, David Miller wrote:
> >
> > > giving a try.
> > >
> > > sparc64: Implement support for IRQ stacks.
> >
> > For me it doesn't work. Locked up after "console: colour dummy device
> > 80x25".
>
> Are you sure you didn't see a "Stack overflow" message on the
> screen? :-)
>
> That's what I get when I try to boot with your provided
> kernel config.

I don't think so; it just locked up solid. There is a problem with console
handover. See this dmesg that I get on boot.

Notice the lines:
(1) console handover: boot [earlyprom0] -> real [tty0]
and
(2) Console: switching to colour frame buffer device 128x48

At line (1), the kernel disables the PROM console. At line (2) it enables
the framebuffer. Between these lines, the kernel runs with no console at all.
Nothing that is printk'ed between these lines reaches the screen.

If the kernel hits an oops at some point between (1) and (2), you don't see
anything; it just appears as a lockup.

I have already hit three crashes that happened between these lines and didn't
generate any output: this one with interrupt stacks that you have just
fixed, a CONFIG_LOCKDEP+CONFIG_DEBUG_PAGEALLOC crash that I will send you a
patch for, and a boot failure of 2.6.27-rc[12] because of a bad memory
migratetype. Is this migratetype crash a known problem? --- the problem is
that starting with 2.6.27-rc1, I'm getting a crash with this backtrace:
__list_add
__free_pages_ok
__free_pages
__free_pages_bootmem
__free_all_bootmem
mem_init
start_kernel_tlb_fixup_code
--- the crash is due to migratetype == 5 in __free_one_page (inlined into
__free_pages_ok); because there are only 5 migratetypes (valid indices are
0 to 4), it attempts to add to a non-existent list.
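
A minimal sketch of why the value 5 is fatal, assuming the migrate types are laid out as in the enum below (the names follow the kernel's, but the exact list is an assumption here) and using a hypothetical helper pick_free_list() that is not in the kernel:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: five real types, MIGRATE_TYPES is the count. */
enum migratetype {
	MIGRATE_UNMOVABLE,
	MIGRATE_RECLAIMABLE,
	MIGRATE_MOVABLE,
	MIGRATE_RESERVE,
	MIGRATE_ISOLATE,
	MIGRATE_TYPES
};

struct list_head {
	struct list_head *next, *prev;
};

/* One free list per migrate type, as in the page allocator. */
struct free_area {
	struct list_head free_list[MIGRATE_TYPES];
};

/* Hypothetical checked lookup: returns NULL instead of indexing past
 * the end of free_list[] when the migrate type is out of range. */
static struct list_head *pick_free_list(struct free_area *area, int mt)
{
	if (mt < 0 || mt >= MIGRATE_TYPES)
		return NULL;
	return &area->free_list[mt];
}
```

With five lists, index 5 points one element past the array, so an unchecked list insertion there writes into whatever memory follows.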

The trace can be obtained if I disable console handover in kernel/printk.
But the handover should really be reworked so that the kernel can report
crashes during boot on the console without extra patching --- the PROM console
should be disabled just before the framebuffer is registered, not earlier.

Mikulas


PROMLIB: Sun IEEE Boot Prom 'OBP 3.31.0 2001/07/25 20:36'
PROMLIB: Root node compatible:
Linux version 2.6.26-devel (root@slunicko) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #24 Wed Aug 13 01:25:13 CEST 2008
console [earlyprom0] enabled
ARCH: SUN4U
Ethernet address: 08:00:20:f5:03:81
Kernel: Using 3 locked TLB entries for main kernel image.
Remapping the kernel... done.
OF stdout device is: /pci@1f,0/pci@1,1/SUNW,m64B@2:r1024x768x75
PROM: Built device tree with 44212 bytes of memory.
Top of RAM: 0x27f42000, Total RAM: 0x1ff40000
Memory hole size: 128MB
Entering add_active_range(0, 0, 16384) 0 entries of 256 used
Entering add_active_range(0, 32768, 81791) 1 entries of 256 used
Entering add_active_range(0, 81792, 81825) 2 entries of 256 used
[0000000200000000-fffff80000400000] page_structs=131072 node=0 entry=0/0
[0000000200000000-fffff80000800000] page_structs=131072 node=0 entry=1/0
Allocated 532480 bytes for kernel page tables.
Zone PFN ranges:
Normal 0 -> 81825
Movable zone start PFN for each node
early_node_map[3] active PFN ranges
0: 0 -> 16384
0: 32768 -> 81791
0: 81792 -> 81825
On node 0 totalpages: 65440
Normal zone: 560 pages used for memmap
Normal zone: 0 pages reserved
Normal zone: 64880 pages, LIFO batch:15
Movable zone: 0 pages used for memmap
Booting Linux...
Built 1 zonelists in Zone order, mobility grouping on. Total pages: 64880
Kernel command line: root=/dev/hda1 ro
PID hash table entries: 2048 (order: 11, 16384 bytes)
clocksource: mult[2c71c] shift[16]
clockevent: mult[5c28f5c2] shift[32]
Console: colour dummy device 80x25
console handover: boot [earlyprom0] -> real [tty0]
Lock dependency validator: Copyright (c) 2006 Red Hat, Inc., Ingo Molnar
... MAX_LOCKDEP_SUBCLASSES: 8
... MAX_LOCK_DEPTH: 48
... MAX_LOCKDEP_KEYS: 2048
... CLASSHASH_SIZE: 1024
... MAX_LOCKDEP_ENTRIES: 8192
... MAX_LOCKDEP_CHAINS: 16384
... CHAINHASH_SIZE: 8192
memory used by lock dependency info: 1648 kB
per task-struct memory footprint: 2688 bytes
Dentry cache hash table entries: 65536 (order: 6, 524288 bytes)
Inode-cache hash table entries: 32768 (order: 5, 262144 bytes)
Memory: 504888k available (1960k kernel code, 1032k data, 120k init) [fffff80000000000,0000000027f42000]
SLUB: Genslabs=13, HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
Calibrating delay using timer specific routine.. 728.70 BogoMIPS (lpj=1213930)
Mount-cache hash table entries: 512
khelper used greatest stack depth: 11152 bytes left
net_namespace: 456 bytes
NET: Registered protocol family 16
khelper used greatest stack depth: 10544 bytes left
PCI: Probing for controllers.
/pci@1f,0: SABRE PCI Bus Module
/pci@1f,0: PCI IO[1fe02000000] MEM[1ff00000000]
PCI: Scanning PBM /pci@1f,0
khelper used greatest stack depth: 9712 bytes left
ebus0: [auxio] [power] [SUNW,pll] [se] [su] [su] [ecpp] [fdthree] [eeprom] [flashprom] [SUNW,CS4231]
power: Control reg at 1fff1724000
AUXIO: Found device at /pci@1f,0/pci@1,1/ebus@1/auxio@14,726000
/pci@1f,0/pci@1,1/ebus@1/eeprom@14,0: Clock regs at 000001fff1000000
Switched to NOHz mode on CPU #0
NET: Registered protocol family 2
IP route cache hash table entries: 4096 (order: 2, 32768 bytes)
TCP established hash table entries: 16384 (order: 5, 262144 bytes)
TCP bind hash table entries: 16384 (order: 6, 917504 bytes)
TCP: Hash tables configured (established 16384 bind 16384)
TCP reno registered
Mini RTC Driver
khelper used greatest stack depth: 9696 bytes left
msgmni has been set to 987
io scheduler noop registered
io scheduler cfq registered (default)
atyfb: 3D RAGE PRO (Mach64 GP, PQFP, PCI) [0x4750 rev 0x7c]
atyfb: 4M SGRAM (1:1), 14.31818 MHz XTAL, 230 MHz PLL, 100 Mhz MCLK, 100 MHz XCLK
Console: switching to colour frame buffer device 128x48
atyfb: fb0: ATY Mach64 frame buffer device on PCI
khelper used greatest stack depth: 9520 bytes left
khelper used greatest stack depth: 9504 bytes left
/pci@1f,0/pci@1,1/ebus@1/su@14,3083f8: Keyboard port at 1fff13083f8, irq 6
/pci@1f,0/pci@1,1/ebus@1/su@14,3062f8: Mouse port at 1fff13062f8, irq 7
Uniform Multi-Platform E-IDE driver
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
CMD646: IDE controller (0x1095:0x0646 rev 0x03) at PCI slot 0000:01:03.0
CMD646: MultiWord DMA force limited
CMD646: 100% native mode on irq 14
ide0: BM-DMA at 0x1fe02c00020-0x1fe02c00027
ide1: BM-DMA at 0x1fe02c00028-0x1fe02c0002f
Probing IDE interface ide0...
hda: ST38410A, ATA DISK drive
hda: host max PIO5 wanted PIO255(auto-tune) selected PIO4
hda: MWDMA2 mode selected
Probing IDE interface ide1...
ide0 at 0x1fe02c00000-0x1fe02c00007,0x1fe02c0000a on irq 14
ide1 at 0x1fe02c00010-0x1fe02c00017,0x1fe02c0001a on irq 14 (shared with ide0)
hda: max request size: 128KiB
hda: 16841664 sectors (8622 MB) w/512KiB Cache, CHS=16708/16/63
hda: cache flushes not supported
hda: hda1 hda3
mice: PS/2 mouse device common for all mice
TCP cubic registered
input: Sun Type 5 keyboard as /devices/root/f005f9c0/f00601b4/f0061504/f0063594/serio0/input/input0
VFS: Mounted root (ext2 filesystem) readonly.
khelper used greatest stack depth: 6592 bytes left
NET: Registered protocol family 1
modprobe used greatest stack depth: 256 bytes left
tail used greatest stack depth: 32 bytes left
gunzip used greatest stack depth: 0 bytes left
input: Sun Mouse as /devices/root/f005f9c0/f00601b4/f0061504/f0064df4/serio1/input/input1
Adding 524272k swap on /swap. Priority:-1 extents:36 across:529920k
PCI: Enabling device: (0000:01:01.1), cmd 2
sunhme.c:v3.00 June 23, 2006 David S. Miller ([email protected])
eth0: HAPPY MEAL (PCI/CheerIO) 10/100BaseT Ethernet 08:00:20:f5:03:81
eth0: Link is up using internal transceiver at 100Mb/s, Full Duplex.

2008-08-13 01:22:43

by David Miller

Subject: Re: console handover badness

From: Mikulas Patocka <[email protected]>
Date: Tue, 12 Aug 2008 21:11:53 -0400 (EDT)

>
>
> On Mon, 11 Aug 2008, David Miller wrote:
>
> > From: Mikulas Patocka <[email protected]>
> > Date: Sat, 21 Jun 2008 15:42:56 -0400 (EDT)
> >
> > > On Fri, 20 Jun 2008, David Miller wrote:
> > >
> > > > giving a try.
> > > >
> > > > sparc64: Implement support for IRQ stacks.
> > >
> > > For me it doesn't work. Locked up after "console: colour dummy device
> > > 80x25".
> >
> > Are you sure you didn't see a "Stack overflow" message on the
> > screen? :-)
> >
> > That's what I get when I try to boot with your provided
> > kernel config.
>
> I think no, it just locked-up solid. There is a problem with console
> handover. See this dmesg that I get on boot.
>
> Notice the lines:
> (1) console handover: boot [earlyprom0] -> real [tty0]
> and
> (2) Console: switching to colour frame buffer device 128x48
>
> At line (1), the kernel disables the PROM console. At line (2) it enables
> framebuffer. Between these lines, the kernel runs with no console at all.
> Everything that is printk'ed between these lines doesn't go to the screen.

Yes, I know, this is such an incredible pain and it bothers
me a lot, as it makes bugs that trigger in between
these two points very difficult to diagnose.

The VT layer should not register its console until an upper level
provider (such as an fbdev driver or the plain VGA console) really has
its driver attached.

> I hit already three crashes that happened between these lines and didn't
> generate any output: this one with interrupt stacks that you have just
> fixed,

The interrupt stacks one would show up on the console, because it
uses prom_printf() to write to the firmware console directly.

Actually, I bet it got printed, but you didn't see it, because
the framebuffer driver changed the console palette, resulting in
the pixels the PROM console writes being black on the black
background :-/

> CONFIG_LOCKDEP+CONFIG_DEBUG_PAGEALLOC crash that I will send you
> patch for, and then boot failure of 2.6.27-rc[12] because of bad memory
> migratetype. Is this migratetype crash a known problem? --- the problem is
> that starting with 2.6.27rc1, I'm getting crash with this backtrace:
> __list_add
> __free_pages_ok
> __free_pages
> __free_pages_bootmem
> __free_all_bootmem
> mem_init
> start_kernel_tlb_fixup_code
> --- the crash is due to migratetype == 5 in __free_one_page (inlined into
> __free_pages_ok) and because there are only 5 migratetypes, it attempts
> to add to a non-existent list.

We have another report of this, thanks for grabbing the extra
information.

> The trace can be obtained if I disable console handover in kernel/printk.
> But it should really be somehow rewritten so that the kernel can write
> crashes during boot on console without extra patching --- the PROM console
> is disabled just before the framebuffer is registered and not too early.

Another way to capture this is to remove the CON_BOOT thing from
the prom console struct in arch/sparc64/kernel/setup.c

I am probably going to make the old "-p" boot command line option do
this dynamically.
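
The effect of the CON_BOOT flag can be illustrated with a toy model. This is not the kernel's console code: the struct fields, the flag value and both function names are hypothetical, and the model assumes only what is described in this thread, namely that registering the real console drops consoles marked as boot consoles.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CON_BOOT 0x8u /* illustrative flag value */

struct console {
	const char *name;
	unsigned int flags;
	bool registered;
	bool visible; /* does output actually reach the screen? */
};

/* Registering a real console unregisters every boot console. */
static void register_real_console(struct console *list, size_t n,
				  struct console *real)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].flags & CON_BOOT)
			list[i].registered = false;
	real->registered = true;
}

/* True if at least one registered console shows its output. */
static bool output_visible(const struct console *list, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (list[i].registered && list[i].visible)
			return true;
	return false;
}
```

In this model, a PROM console carrying CON_BOOT disappears as soon as tty0 registers, even though tty0 is still backed by the dummy device and shows nothing. Clearing the flag keeps the PROM console registered, so messages stay visible until the framebuffer takes over.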

2008-08-13 01:41:00

by David Miller

Subject: Re: console handover badness

From: Mikulas Patocka <[email protected]>
Date: Tue, 12 Aug 2008 21:11:53 -0400 (EDT)

> and then boot failure of 2.6.27-rc[12] because of bad memory
> migratetype. Is this migratetype crash a known problem? --- the problem is
> that starting with 2.6.27rc1, I'm getting crash with this backtrace:
> __list_add
> __free_pages_ok
> __free_pages
> __free_pages_bootmem
> __free_all_bootmem
> mem_init
> start_kernel_tlb_fixup_code
> --- the crash is due to migratetype == 5 in __free_one_page (inlined into
> __free_pages_ok) and because there are only 5 migratetypes, it attempts
> to add to a non-existent list.

Mikulas can you send me the .config you're using in 2.6.27 to trigger
this?

Thanks.

2008-08-13 08:51:20

by David Miller

Subject: Re: console handover badness

From: David Miller <[email protected]>
Date: Tue, 12 Aug 2008 18:40:52 -0700 (PDT)

> From: Mikulas Patocka <[email protected]>
> Date: Tue, 12 Aug 2008 21:11:53 -0400 (EDT)
>
> > and then boot failure of 2.6.27-rc[12] because of bad memory
> > migratetype. Is this migratetype crash a known problem? --- the problem is
> > that starting with 2.6.27rc1, I'm getting crash with this backtrace:
> > __list_add
> > __free_pages_ok
> > __free_pages
> > __free_pages_bootmem
> > __free_all_bootmem
> > mem_init
> > start_kernel_tlb_fixup_code
> > --- the crash is due to migratetype == 5 in __free_one_page (inlined into
> > __free_pages_ok) and because there are only 5 migratetypes, it attempts
> > to add to a non-existent list.
>
> Mikulas can you send me the .config you're using in 2.6.27 to trigger
> this?

Meanwhile I tried to figure out how this can go wrong like this.

The way this stuff works this early is very simple.

The pageblock bitmaps get allocated by sparse_init() as it iterates over
each mem section, via sparse_early_usemap_alloc(). These use the
various bootmem allocators, which will zero initialize the bitmap.

I added some debugging to sparse_early_usemap_alloc() to make sure
the size was correct and that the pointer looked sane.

What happens next is that memmap_init_zone() walks over each zone's
pages and initializes their pageblock migrate type to MIGRATE_MOVABLE,
which is "2".

So given the simplicity of that stuff, I can only imagine that something
is writing all over the bitmaps, clobbering them somehow.

I'll try to reproduce this here so I can try to narrow down the cause
a bit more, but so far my attempts have not been successful.
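
To make the clobbering theory concrete, here is a small sketch of a per-pageblock bitmap. It assumes each pageblock stores its migrate type in 3 consecutive bits, lowest bit first; that width and the helper names set_mt() and get_mt() are assumptions for illustration, not the kernel's actual accessors.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_BLOCK 3 /* assumption: 3 bits per pageblock */
#define WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/* Store migrate type 'mt' for pageblock 'block', one bit at a time. */
static void set_mt(unsigned long *map, unsigned int block, unsigned int mt)
{
	for (unsigned int i = 0; i < BITS_PER_BLOCK; i++) {
		unsigned int idx = block * BITS_PER_BLOCK + i;
		unsigned long mask = 1UL << (idx % WORD_BITS);

		if (mt & (1U << i))
			map[idx / WORD_BITS] |= mask;
		else
			map[idx / WORD_BITS] &= ~mask;
	}
}

/* Read back the migrate type of pageblock 'block'. */
static unsigned int get_mt(const unsigned long *map, unsigned int block)
{
	unsigned int mt = 0;

	for (unsigned int i = 0; i < BITS_PER_BLOCK; i++) {
		unsigned int idx = block * BITS_PER_BLOCK + i;

		if ((map[idx / WORD_BITS] >> (idx % WORD_BITS)) & 1UL)
			mt |= 1U << i;
	}
	return mt;
}
```

A zeroed bitmap reads back as 0 and an initialized block reads back as 2. Under this encoding, a value of 5 can only appear if something else wrote to the bitmap after initialization.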

2008-08-13 12:46:51

by Mikulas Patocka

Subject: Re: console handover badness



On Tue, 12 Aug 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Tue, 12 Aug 2008 21:11:53 -0400 (EDT)
>
> > and then boot failure of 2.6.27-rc[12] because of bad memory
> > migratetype. Is this migratetype crash a known problem? --- the problem is
> > that starting with 2.6.27rc1, I'm getting crash with this backtrace:
> > __list_add
> > __free_pages_ok
> > __free_pages
> > __free_pages_bootmem
> > __free_all_bootmem
> > mem_init
> > start_kernel_tlb_fixup_code
> > --- the crash is due to migratetype == 5 in __free_one_page (inlined into
> > __free_pages_ok) and because there are only 5 migratetypes, it attempts
> > to add to a non-existent list.
>
> Mikulas can you send me the .config you're using in 2.6.27 to trigger
> this?
>
> Thanks.

Here it is. The computer is Ultra 5 with 512MB RAM.

Mikulas

#
# Automatically generated make config: don't edit
# Linux kernel version: 2.6.27-rc2
# Thu Aug 7 14:30:44 2008
#
CONFIG_SPARC=y
CONFIG_SPARC64=y
CONFIG_GENERIC_TIME=y
CONFIG_GENERIC_CMOS_UPDATE=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_64BIT=y
CONFIG_MMU=y
CONFIG_IOMMU_HELPER=y
CONFIG_QUICKLIST=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
# CONFIG_ARCH_HAS_ILOG2_U32 is not set
# CONFIG_ARCH_HAS_ILOG2_U64 is not set
CONFIG_AUDIT_ARCH=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_ARCH_NO_VIRT_TO_BUS=y
CONFIG_OF=y
CONFIG_GENERIC_HARDIRQS_NO__DO_IRQ=y
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"

#
# General setup
#
CONFIG_EXPERIMENTAL=y
CONFIG_BROKEN_ON_SMP=y
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_LOCALVERSION="-devel"
# CONFIG_LOCALVERSION_AUTO is not set
CONFIG_SWAP=y
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
# CONFIG_BSD_PROCESS_ACCT is not set
# CONFIG_TASKSTATS is not set
# CONFIG_AUDIT is not set
CONFIG_IKCONFIG=y
CONFIG_IKCONFIG_PROC=y
CONFIG_LOG_BUF_SHIFT=18
# CONFIG_CGROUPS is not set
# CONFIG_GROUP_SCHED is not set
# CONFIG_SYSFS_DEPRECATED_V2 is not set
# CONFIG_RELAY is not set
CONFIG_NAMESPACES=y
# CONFIG_UTS_NS is not set
# CONFIG_IPC_NS is not set
# CONFIG_USER_NS is not set
# CONFIG_PID_NS is not set
# CONFIG_BLK_DEV_INITRD is not set
CONFIG_CC_OPTIMIZE_FOR_SIZE=y
CONFIG_SYSCTL=y
# CONFIG_EMBEDDED is not set
CONFIG_UID16=y
CONFIG_SYSCTL_SYSCALL=y
CONFIG_SYSCTL_SYSCALL_CHECK=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
# CONFIG_KALLSYMS_EXTRA_PASS is not set
CONFIG_HOTPLUG=y
CONFIG_PRINTK=y
CONFIG_BUG=y
CONFIG_ELF_CORE=y
CONFIG_COMPAT_BRK=y
CONFIG_BASE_FULL=y
CONFIG_FUTEX=y
CONFIG_ANON_INODES=y
CONFIG_EPOLL=y
CONFIG_SIGNALFD=y
CONFIG_TIMERFD=y
CONFIG_EVENTFD=y
CONFIG_SHMEM=y
CONFIG_VM_EVENT_COUNTERS=y
CONFIG_SLUB_DEBUG=y
# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
# CONFIG_PROFILING is not set
# CONFIG_MARKERS is not set
CONFIG_HAVE_OPROFILE=y
# CONFIG_KPROBES is not set
# CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS is not set
# CONFIG_HAVE_IOREMAP_PROT is not set
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KRETPROBES=y
CONFIG_HAVE_ARCH_TRACEHOOK=y
# CONFIG_HAVE_DMA_ATTRS is not set
# CONFIG_USE_GENERIC_SMP_HELPERS is not set
# CONFIG_HAVE_CLK is not set
CONFIG_PROC_PAGE_MONITOR=y
# CONFIG_HAVE_GENERIC_DMA_COHERENT is not set
CONFIG_SLABINFO=y
CONFIG_RT_MUTEXES=y
# CONFIG_TINY_SHMEM is not set
CONFIG_BASE_SMALL=0
CONFIG_MODULES=y
# CONFIG_MODULE_FORCE_LOAD is not set
CONFIG_MODULE_UNLOAD=y
# CONFIG_MODULE_FORCE_UNLOAD is not set
CONFIG_MODVERSIONS=y
CONFIG_MODULE_SRCVERSION_ALL=y
CONFIG_KMOD=y
CONFIG_BLOCK=y
# CONFIG_BLK_DEV_IO_TRACE is not set
# CONFIG_BLK_DEV_BSG is not set
# CONFIG_BLK_DEV_INTEGRITY is not set
CONFIG_BLOCK_COMPAT=y

#
# IO Schedulers
#
CONFIG_IOSCHED_NOOP=y
CONFIG_IOSCHED_AS=m
CONFIG_IOSCHED_DEADLINE=m
CONFIG_IOSCHED_CFQ=y
# CONFIG_DEFAULT_AS is not set
# CONFIG_DEFAULT_DEADLINE is not set
CONFIG_DEFAULT_CFQ=y
# CONFIG_DEFAULT_NOOP is not set
CONFIG_DEFAULT_IOSCHED="cfq"
CONFIG_CLASSIC_RCU=y

#
# Processor type and features
#
CONFIG_SPARC64_PAGE_SIZE_8KB=y
# CONFIG_SPARC64_PAGE_SIZE_64KB is not set
# CONFIG_SECCOMP is not set
# CONFIG_HZ_100 is not set
# CONFIG_HZ_250 is not set
CONFIG_HZ_300=y
# CONFIG_HZ_1000 is not set
CONFIG_HZ=300
# CONFIG_SCHED_HRTICK is not set
CONFIG_GENERIC_HARDIRQS=y
CONFIG_TICK_ONESHOT=y
CONFIG_NO_HZ=y
# CONFIG_HIGH_RES_TIMERS is not set
CONFIG_GENERIC_CLOCKEVENTS_BUILD=y
# CONFIG_SMP is not set
# CONFIG_CPU_FREQ is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_FIND_NEXT_BIT=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
# CONFIG_NUMA is not set
CONFIG_ARCH_POPULATES_NODE_MAP=y
CONFIG_ARCH_SELECT_MEMORY_MODEL=y
CONFIG_ARCH_SPARSEMEM_ENABLE=y
CONFIG_ARCH_SPARSEMEM_DEFAULT=y
CONFIG_SELECT_MEMORY_MODEL=y
# CONFIG_FLATMEM_MANUAL is not set
# CONFIG_DISCONTIGMEM_MANUAL is not set
CONFIG_SPARSEMEM_MANUAL=y
CONFIG_SPARSEMEM=y
CONFIG_HAVE_MEMORY_PRESENT=y
# CONFIG_SPARSEMEM_STATIC is not set
CONFIG_SPARSEMEM_EXTREME=y
CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y
CONFIG_SPARSEMEM_VMEMMAP=y
CONFIG_PAGEFLAGS_EXTENDED=y
CONFIG_SPLIT_PTLOCK_CPUS=4
CONFIG_RESOURCES_64BIT=y
CONFIG_ZONE_DMA_FLAG=0
CONFIG_NR_QUICK=1
CONFIG_SBUS=y
CONFIG_SBUSCHAR=y
CONFIG_SUN_AUXIO=y
CONFIG_SUN_IO=y
# CONFIG_SUN_LDOMS is not set
CONFIG_PCI=y
CONFIG_PCI_DOMAINS=y
CONFIG_PCI_SYSCALL=y
CONFIG_ARCH_SUPPORTS_MSI=y
# CONFIG_PCI_MSI is not set
# CONFIG_PCI_LEGACY is not set
# CONFIG_PCI_DEBUG is not set
CONFIG_SUN_OPENPROMFS=m

#
# Executable file formats
#
CONFIG_BINFMT_ELF=y
CONFIG_COMPAT_BINFMT_ELF=y
# CONFIG_BINFMT_MISC is not set
CONFIG_COMPAT=y
CONFIG_SYSVIPC_COMPAT=y
# CONFIG_PREEMPT_NONE is not set
CONFIG_PREEMPT_VOLUNTARY=y
# CONFIG_PREEMPT is not set
# CONFIG_CMDLINE_BOOL is not set
CONFIG_NET=y

#
# Networking options
#
CONFIG_PACKET=m
CONFIG_PACKET_MMAP=y
CONFIG_UNIX=m
# CONFIG_NET_KEY is not set
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
CONFIG_IP_FIB_HASH=y
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
CONFIG_SYN_COOKIES=y
# CONFIG_INET_AH is not set
# CONFIG_INET_ESP is not set
# CONFIG_INET_IPCOMP is not set
# CONFIG_INET_XFRM_TUNNEL is not set
# CONFIG_INET_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_TRANSPORT is not set
# CONFIG_INET_XFRM_MODE_TUNNEL is not set
# CONFIG_INET_XFRM_MODE_BEET is not set
# CONFIG_INET_LRO is not set
# CONFIG_INET_DIAG is not set
# CONFIG_TCP_CONG_ADVANCED is not set
CONFIG_TCP_CONG_CUBIC=y
CONFIG_DEFAULT_TCP_CONG="cubic"
# CONFIG_TCP_MD5SIG is not set
# CONFIG_IPV6 is not set
# CONFIG_NETWORK_SECMARK is not set
# CONFIG_NETFILTER is not set
# CONFIG_IP_DCCP is not set
# CONFIG_IP_SCTP is not set
# CONFIG_TIPC is not set
# CONFIG_ATM is not set
# CONFIG_BRIDGE is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_DECNET is not set
# CONFIG_LLC2 is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_SCHED is not set

#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
# CONFIG_HAMRADIO is not set
# CONFIG_CAN is not set
# CONFIG_IRDA is not set
# CONFIG_BT is not set
# CONFIG_AF_RXRPC is not set

#
# Wireless
#
# CONFIG_CFG80211 is not set
# CONFIG_WIRELESS_EXT is not set
# CONFIG_MAC80211 is not set
# CONFIG_IEEE80211 is not set
# CONFIG_RFKILL is not set
# CONFIG_NET_9P is not set

#
# Device Drivers
#

#
# Generic Driver Options
#
CONFIG_UEVENT_HELPER_PATH="/sbin/hotplug"
CONFIG_STANDALONE=y
CONFIG_PREVENT_FIRMWARE_BUILD=y
CONFIG_FW_LOADER=y
# CONFIG_FIRMWARE_IN_KERNEL is not set
CONFIG_EXTRA_FIRMWARE=""
# CONFIG_DEBUG_DRIVER is not set
# CONFIG_DEBUG_DEVRES is not set
# CONFIG_SYS_HYPERVISOR is not set
CONFIG_CONNECTOR=m
# CONFIG_MTD is not set
CONFIG_OF_DEVICE=y
CONFIG_PARPORT=m
CONFIG_PARPORT_PC=m
CONFIG_PARPORT_PC_FIFO=y
# CONFIG_PARPORT_PC_SUPERIO is not set
# CONFIG_PARPORT_GSC is not set
# CONFIG_PARPORT_SUNBPP is not set
# CONFIG_PARPORT_AX88796 is not set
CONFIG_PARPORT_1284=y
CONFIG_BLK_DEV=y
# CONFIG_BLK_DEV_FD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
# CONFIG_BLK_DEV_COW_COMMON is not set
CONFIG_BLK_DEV_LOOP=m
# CONFIG_BLK_DEV_CRYPTOLOOP is not set
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_SX8 is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_CDROM_PKTCDVD is not set
# CONFIG_ATA_OVER_ETH is not set
# CONFIG_BLK_DEV_HD is not set
# CONFIG_MISC_DEVICES is not set
CONFIG_HAVE_IDE=y
CONFIG_IDE=y
CONFIG_BLK_DEV_IDE=y

#
# Please see Documentation/ide/ide.txt for help/info on IDE drives
#
CONFIG_IDE_TIMINGS=y
# CONFIG_BLK_DEV_IDE_SATA is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_BLK_DEV_IDECD is not set
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set
CONFIG_IDE_PROC_FS=y

#
# IDE chipset support/bugfixes
#
# CONFIG_BLK_DEV_PLATFORM is not set
CONFIG_BLK_DEV_IDEDMA_SFF=y

#
# PCI IDE chipsets support
#
CONFIG_BLK_DEV_IDEPCI=y
CONFIG_IDEPCI_PCIBUS_ORDER=y
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_BLK_DEV_GENERIC=y
# CONFIG_BLK_DEV_OPTI621 is not set
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
CONFIG_BLK_DEV_CMD64X=y
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CS5520 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_JMICRON is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_IT8213 is not set
# CONFIG_BLK_DEV_IT821X is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_BLK_DEV_PDC202XX_NEW is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_BLK_DEV_TC86C001 is not set
CONFIG_BLK_DEV_IDEDMA=y

#
# SCSI device support
#
# CONFIG_RAID_ATTRS is not set
CONFIG_SCSI=m
CONFIG_SCSI_DMA=y
# CONFIG_SCSI_TGT is not set
# CONFIG_SCSI_NETLINK is not set
CONFIG_SCSI_PROC_FS=y

#
# SCSI support type (disk, tape, CD-ROM)
#
CONFIG_BLK_DEV_SD=m
# CONFIG_CHR_DEV_ST is not set
# CONFIG_CHR_DEV_OSST is not set
# CONFIG_BLK_DEV_SR is not set
# CONFIG_CHR_DEV_SG is not set
# CONFIG_CHR_DEV_SCH is not set

#
# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
#
# CONFIG_SCSI_MULTI_LUN is not set
CONFIG_SCSI_CONSTANTS=y
# CONFIG_SCSI_LOGGING is not set
# CONFIG_SCSI_SCAN_ASYNC is not set
CONFIG_SCSI_WAIT_SCAN=m

#
# SCSI Transports
#
# CONFIG_SCSI_SPI_ATTRS is not set
# CONFIG_SCSI_FC_ATTRS is not set
# CONFIG_SCSI_ISCSI_ATTRS is not set
# CONFIG_SCSI_SAS_LIBSAS is not set
# CONFIG_SCSI_SRP_ATTRS is not set
CONFIG_SCSI_LOWLEVEL=y
# CONFIG_ISCSI_TCP is not set
# CONFIG_BLK_DEV_3W_XXXX_RAID is not set
# CONFIG_SCSI_3W_9XXX is not set
# CONFIG_SCSI_ACARD is not set
# CONFIG_SCSI_AACRAID is not set
# CONFIG_SCSI_AIC7XXX is not set
# CONFIG_SCSI_AIC7XXX_OLD is not set
# CONFIG_SCSI_AIC79XX is not set
# CONFIG_SCSI_AIC94XX is not set
# CONFIG_SCSI_ARCMSR is not set
# CONFIG_MEGARAID_NEWGEN is not set
# CONFIG_MEGARAID_LEGACY is not set
# CONFIG_MEGARAID_SAS is not set
# CONFIG_SCSI_HPTIOP is not set
# CONFIG_SCSI_DMX3191D is not set
# CONFIG_SCSI_FUTURE_DOMAIN is not set
# CONFIG_SCSI_IPS is not set
# CONFIG_SCSI_INITIO is not set
CONFIG_SCSI_INIA100=m
# CONFIG_SCSI_PPA is not set
# CONFIG_SCSI_IMM is not set
# CONFIG_SCSI_MVSAS is not set
# CONFIG_SCSI_STEX is not set
# CONFIG_SCSI_SYM53C8XX_2 is not set
# CONFIG_SCSI_QLOGIC_1280 is not set
# CONFIG_SCSI_QLOGICPTI is not set
# CONFIG_SCSI_QLA_FC is not set
# CONFIG_SCSI_QLA_ISCSI is not set
# CONFIG_SCSI_LPFC is not set
# CONFIG_SCSI_DC395x is not set
# CONFIG_SCSI_DC390T is not set
# CONFIG_SCSI_DEBUG is not set
# CONFIG_SCSI_SUNESP is not set
# CONFIG_SCSI_SRP is not set
# CONFIG_SCSI_DH is not set
# CONFIG_ATA is not set
CONFIG_MD=y
# CONFIG_BLK_DEV_MD is not set
CONFIG_BLK_DEV_DM=m
# CONFIG_DM_DEBUG is not set
CONFIG_DM_CRYPT=m
CONFIG_DM_LOOP=m
CONFIG_DM_SNAPSHOT=m
CONFIG_DM_MIRROR=m
CONFIG_DM_LOG_CLUSTERED=m
CONFIG_DM_ZERO=m
CONFIG_DM_MULTIPATH=m
CONFIG_DM_DELAY=m
# CONFIG_DM_UEVENT is not set
# CONFIG_FUSION is not set

#
# IEEE 1394 (FireWire) support
#

#
# Enable only one of the two stacks, unless you know what you are doing
#
# CONFIG_FIREWIRE is not set
# CONFIG_IEEE1394 is not set
# CONFIG_I2O is not set
CONFIG_NETDEVICES=y
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_MACVLAN is not set
# CONFIG_EQUALIZER is not set
CONFIG_TUN=m
# CONFIG_VETH is not set
# CONFIG_ARCNET is not set
# CONFIG_PHYLIB is not set
CONFIG_NET_ETHERNET=y
CONFIG_MII=m
# CONFIG_SUNLANCE is not set
CONFIG_HAPPYMEAL=m
# CONFIG_SUNBMAC is not set
# CONFIG_SUNQE is not set
# CONFIG_SUNGEM is not set
# CONFIG_CASSINI is not set
# CONFIG_NET_VENDOR_3COM is not set
# CONFIG_NET_TULIP is not set
# CONFIG_HP100 is not set
# CONFIG_IBM_NEW_EMAC_ZMII is not set
# CONFIG_IBM_NEW_EMAC_RGMII is not set
# CONFIG_IBM_NEW_EMAC_TAH is not set
# CONFIG_IBM_NEW_EMAC_EMAC4 is not set
# CONFIG_NET_PCI is not set
# CONFIG_B44 is not set
# CONFIG_NET_POCKET is not set
# CONFIG_NETDEV_1000 is not set
# CONFIG_NETDEV_10000 is not set
# CONFIG_TR is not set

#
# Wireless LAN
#
# CONFIG_WLAN_PRE80211 is not set
# CONFIG_WLAN_80211 is not set
# CONFIG_IWLWIFI_LEDS is not set
# CONFIG_WAN is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
# CONFIG_NET_FC is not set
# CONFIG_NETCONSOLE is not set
# CONFIG_NETPOLL is not set
# CONFIG_NET_POLL_CONTROLLER is not set
# CONFIG_ISDN is not set
# CONFIG_PHONE is not set

#
# Input device support
#
CONFIG_INPUT=y
# CONFIG_INPUT_FF_MEMLESS is not set
# CONFIG_INPUT_POLLDEV is not set

#
# Userland interfaces
#
CONFIG_INPUT_MOUSEDEV=y
CONFIG_INPUT_MOUSEDEV_PSAUX=y
CONFIG_INPUT_MOUSEDEV_SCREEN_X=1024
CONFIG_INPUT_MOUSEDEV_SCREEN_Y=768
# CONFIG_INPUT_JOYDEV is not set
CONFIG_INPUT_EVDEV=m
# CONFIG_INPUT_EVBUG is not set

#
# Input Device Drivers
#
CONFIG_INPUT_KEYBOARD=y
# CONFIG_KEYBOARD_ATKBD is not set
CONFIG_KEYBOARD_SUNKBD=y
# CONFIG_KEYBOARD_LKKBD is not set
# CONFIG_KEYBOARD_XTKBD is not set
# CONFIG_KEYBOARD_NEWTON is not set
# CONFIG_KEYBOARD_STOWAWAY is not set
CONFIG_INPUT_MOUSE=y
# CONFIG_MOUSE_PS2 is not set
CONFIG_MOUSE_SERIAL=m
# CONFIG_MOUSE_VSXXXAA is not set
# CONFIG_INPUT_JOYSTICK is not set
# CONFIG_INPUT_TABLET is not set
# CONFIG_INPUT_TOUCHSCREEN is not set
CONFIG_INPUT_MISC=y
CONFIG_INPUT_SPARCSPKR=m
# CONFIG_INPUT_UINPUT is not set

#
# Hardware I/O ports
#
CONFIG_SERIO=y
# CONFIG_SERIO_I8042 is not set
CONFIG_SERIO_SERPORT=m
# CONFIG_SERIO_PARKBD is not set
# CONFIG_SERIO_PCIPS2 is not set
# CONFIG_SERIO_RAW is not set
# CONFIG_GAMEPORT is not set

#
# Character devices
#
CONFIG_VT=y
CONFIG_CONSOLE_TRANSLATIONS=y
CONFIG_VT_CONSOLE=y
CONFIG_HW_CONSOLE=y
# CONFIG_VT_HW_CONSOLE_BINDING is not set
CONFIG_DEVKMEM=y
# CONFIG_SERIAL_NONSTANDARD is not set
# CONFIG_NOZOMI is not set

#
# Serial drivers
#

#
# Non-8250 serial port support
#
CONFIG_SERIAL_SUNCORE=y
# CONFIG_SERIAL_SUNZILOG is not set
CONFIG_SERIAL_SUNSU=y
CONFIG_SERIAL_SUNSU_CONSOLE=y
CONFIG_SERIAL_SUNSAB=m
# CONFIG_SERIAL_SUNHV is not set
CONFIG_SERIAL_CORE=y
CONFIG_SERIAL_CORE_CONSOLE=y
# CONFIG_SERIAL_JSM is not set
CONFIG_UNIX98_PTYS=y
CONFIG_LEGACY_PTYS=y
CONFIG_LEGACY_PTY_COUNT=256
# CONFIG_PRINTER is not set
# CONFIG_PPDEV is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_HW_RANDOM is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_RAW_DRIVER is not set
# CONFIG_TCG_TPM is not set
CONFIG_DEVPORT=y
# CONFIG_I2C is not set
# CONFIG_SPI is not set
# CONFIG_W1 is not set
# CONFIG_POWER_SUPPLY is not set
# CONFIG_HWMON is not set
# CONFIG_THERMAL is not set
# CONFIG_THERMAL_HWMON is not set
# CONFIG_WATCHDOG is not set

#
# Sonics Silicon Backplane
#
CONFIG_SSB_POSSIBLE=y
# CONFIG_SSB is not set

#
# Multifunction device drivers
#
# CONFIG_MFD_CORE is not set
# CONFIG_MFD_SM501 is not set
# CONFIG_HTC_PASIC3 is not set

#
# Multimedia devices
#

#
# Multimedia core support
#
# CONFIG_VIDEO_DEV is not set
# CONFIG_DVB_CORE is not set
# CONFIG_VIDEO_MEDIA is not set

#
# Multimedia drivers
#
# CONFIG_DAB is not set

#
# Graphics support
#
# CONFIG_DRM is not set
# CONFIG_VGASTATE is not set
# CONFIG_VIDEO_OUTPUT_CONTROL is not set
CONFIG_FB=y
# CONFIG_FIRMWARE_EDID is not set
# CONFIG_FB_DDC is not set
CONFIG_FB_CFB_FILLRECT=y
CONFIG_FB_CFB_COPYAREA=y
CONFIG_FB_CFB_IMAGEBLIT=y
# CONFIG_FB_CFB_REV_PIXELS_IN_BYTE is not set
# CONFIG_FB_SYS_FILLRECT is not set
# CONFIG_FB_SYS_COPYAREA is not set
# CONFIG_FB_SYS_IMAGEBLIT is not set
# CONFIG_FB_FOREIGN_ENDIAN is not set
# CONFIG_FB_SYS_FOPS is not set
# CONFIG_FB_SVGALIB is not set
# CONFIG_FB_MACMODES is not set
# CONFIG_FB_BACKLIGHT is not set
# CONFIG_FB_MODE_HELPERS is not set
# CONFIG_FB_TILEBLITTING is not set

#
# Frame buffer hardware drivers
#
# CONFIG_FB_CIRRUS is not set
# CONFIG_FB_PM2 is not set
# CONFIG_FB_ASILIANT is not set
# CONFIG_FB_IMSTT is not set
# CONFIG_FB_UVESA is not set
# CONFIG_FB_SBUS is not set
# CONFIG_FB_XVR500 is not set
# CONFIG_FB_XVR2500 is not set
# CONFIG_FB_S1D13XXX is not set
# CONFIG_FB_NVIDIA is not set
# CONFIG_FB_RIVA is not set
# CONFIG_FB_MATROX is not set
# CONFIG_FB_RADEON is not set
# CONFIG_FB_ATY128 is not set
CONFIG_FB_ATY=y
CONFIG_FB_ATY_CT=y
# CONFIG_FB_ATY_GENERIC_LCD is not set
# CONFIG_FB_ATY_GX is not set
# CONFIG_FB_ATY_BACKLIGHT is not set
# CONFIG_FB_S3 is not set
# CONFIG_FB_SAVAGE is not set
# CONFIG_FB_SIS is not set
# CONFIG_FB_NEOMAGIC is not set
# CONFIG_FB_KYRO is not set
# CONFIG_FB_3DFX is not set
# CONFIG_FB_VOODOO1 is not set
# CONFIG_FB_VT8623 is not set
# CONFIG_FB_TRIDENT is not set
# CONFIG_FB_ARK is not set
# CONFIG_FB_PM3 is not set
# CONFIG_FB_CARMINE is not set
# CONFIG_FB_VIRTUAL is not set
# CONFIG_BACKLIGHT_LCD_SUPPORT is not set

#
# Display device support
#
# CONFIG_DISPLAY_SUPPORT is not set

#
# Console display driver support
#
# CONFIG_PROM_CONSOLE is not set
CONFIG_DUMMY_CONSOLE=y
CONFIG_FRAMEBUFFER_CONSOLE=y
# CONFIG_FRAMEBUFFER_CONSOLE_DETECT_PRIMARY is not set
# CONFIG_FRAMEBUFFER_CONSOLE_ROTATION is not set
CONFIG_FONTS=y
# CONFIG_FONT_8x8 is not set
# CONFIG_FONT_8x16 is not set
# CONFIG_FONT_6x11 is not set
# CONFIG_FONT_7x14 is not set
# CONFIG_FONT_PEARL_8x8 is not set
# CONFIG_FONT_ACORN_8x8 is not set
CONFIG_FONT_SUN8x16=y
# CONFIG_FONT_SUN12x22 is not set
# CONFIG_FONT_10x18 is not set
CONFIG_LOGO=y
# CONFIG_LOGO_LINUX_MONO is not set
# CONFIG_LOGO_LINUX_VGA16 is not set
# CONFIG_LOGO_LINUX_CLUT224 is not set
CONFIG_LOGO_SUN_CLUT224=y
CONFIG_SOUND=m
CONFIG_SND=m
CONFIG_SND_TIMER=m
CONFIG_SND_PCM=m
CONFIG_SND_RAWMIDI=m
CONFIG_SND_SEQUENCER=m
CONFIG_SND_SEQ_DUMMY=m
CONFIG_SND_OSSEMUL=y
CONFIG_SND_MIXER_OSS=m
CONFIG_SND_PCM_OSS=m
CONFIG_SND_PCM_OSS_PLUGINS=y
CONFIG_SND_SEQUENCER_OSS=y
# CONFIG_SND_DYNAMIC_MINORS is not set
CONFIG_SND_SUPPORT_OLD_API=y
CONFIG_SND_VERBOSE_PROCFS=y
# CONFIG_SND_VERBOSE_PRINTK is not set
# CONFIG_SND_DEBUG is not set
CONFIG_SND_DRIVERS=y
# CONFIG_SND_DUMMY is not set
CONFIG_SND_VIRMIDI=m
# CONFIG_SND_MTPAV is not set
# CONFIG_SND_MTS64 is not set
# CONFIG_SND_SERIAL_U16550 is not set
# CONFIG_SND_MPU401 is not set
# CONFIG_SND_PORTMAN2X4 is not set
# CONFIG_SND_PCI is not set
CONFIG_SND_SPARC=y
# CONFIG_SND_SUN_AMD7930 is not set
CONFIG_SND_SUN_CS4231=m
# CONFIG_SND_SUN_DBRI is not set
# CONFIG_SND_SOC is not set
# CONFIG_SOUND_PRIME is not set
# CONFIG_HID_SUPPORT is not set
# CONFIG_USB_SUPPORT is not set
# CONFIG_MMC is not set
# CONFIG_MEMSTICK is not set
# CONFIG_NEW_LEDS is not set
# CONFIG_ACCESSIBILITY is not set
# CONFIG_INFINIBAND is not set
# CONFIG_RTC_CLASS is not set
# CONFIG_DMADEVICES is not set
# CONFIG_AUXDISPLAY is not set
# CONFIG_UIO is not set

#
# Misc Linux/SPARC drivers
#
CONFIG_SUN_OPENPROMIO=m
# CONFIG_OBP_FLASH is not set
# CONFIG_SUN_BPP is not set
# CONFIG_BBC_I2C is not set
# CONFIG_ENVCTRL is not set
# CONFIG_DISPLAY7SEG is not set

#
# File systems
#
CONFIG_EXT2_FS=y
CONFIG_EXT2_FS_XATTR=y
CONFIG_EXT2_FS_POSIX_ACL=y
CONFIG_EXT2_FS_SECURITY=y
# CONFIG_EXT2_FS_XIP is not set
# CONFIG_EXT3_FS is not set
# CONFIG_EXT4DEV_FS is not set
CONFIG_FS_MBCACHE=y
# CONFIG_REISERFS_FS is not set
# CONFIG_JFS_FS is not set
CONFIG_FS_POSIX_ACL=y
# CONFIG_XFS_FS is not set
# CONFIG_GFS2_FS is not set
# CONFIG_OCFS2_FS is not set
CONFIG_DNOTIFY=y
CONFIG_INOTIFY=y
CONFIG_INOTIFY_USER=y
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_FUSE_FS is not set
CONFIG_GENERIC_ACL=y

#
# CD-ROM/DVD Filesystems
#
# CONFIG_ISO9660_FS is not set
# CONFIG_UDF_FS is not set

#
# DOS/FAT/NT Filesystems
#
# CONFIG_MSDOS_FS is not set
# CONFIG_VFAT_FS is not set
# CONFIG_NTFS_FS is not set

#
# Pseudo filesystems
#
CONFIG_PROC_FS=y
CONFIG_PROC_KCORE=y
CONFIG_PROC_SYSCTL=y
CONFIG_SYSFS=y
CONFIG_TMPFS=y
CONFIG_TMPFS_POSIX_ACL=y
# CONFIG_HUGETLBFS is not set
# CONFIG_HUGETLB_PAGE is not set
CONFIG_CONFIGFS_FS=m

#
# Miscellaneous filesystems
#
# CONFIG_ADFS_FS is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EFS_FS is not set
# CONFIG_CRAMFS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_OMFS_FS is not set
# CONFIG_HPFS_FS is not set
# CONFIG_QNX4FS_FS is not set
# CONFIG_ROMFS_FS is not set
# CONFIG_SYSV_FS is not set
# CONFIG_UFS_FS is not set
# CONFIG_NETWORK_FILESYSTEMS is not set

#
# Partition Types
#
CONFIG_PARTITION_ADVANCED=y
# CONFIG_ACORN_PARTITION is not set
# CONFIG_OSF_PARTITION is not set
# CONFIG_AMIGA_PARTITION is not set
# CONFIG_ATARI_PARTITION is not set
# CONFIG_MAC_PARTITION is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_BSD_DISKLABEL is not set
# CONFIG_MINIX_SUBPARTITION is not set
# CONFIG_SOLARIS_X86_PARTITION is not set
# CONFIG_UNIXWARE_DISKLABEL is not set
# CONFIG_LDM_PARTITION is not set
# CONFIG_SGI_PARTITION is not set
# CONFIG_ULTRIX_PARTITION is not set
CONFIG_SUN_PARTITION=y
# CONFIG_KARMA_PARTITION is not set
# CONFIG_EFI_PARTITION is not set
# CONFIG_SYSV68_PARTITION is not set
# CONFIG_NLS is not set
# CONFIG_DLM is not set

#
# Kernel hacking
#
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_PRINTK_TIME is not set
CONFIG_ENABLE_WARN_DEPRECATED=y
CONFIG_ENABLE_MUST_CHECK=y
CONFIG_FRAME_WARN=2048
CONFIG_MAGIC_SYSRQ=y
# CONFIG_UNUSED_SYMBOLS is not set
# CONFIG_DEBUG_FS is not set
# CONFIG_HEADERS_CHECK is not set
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SHIRQ=y
CONFIG_DETECT_SOFTLOCKUP=y
# CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC is not set
CONFIG_BOOTPARAM_SOFTLOCKUP_PANIC_VALUE=0
CONFIG_SCHED_DEBUG=y
# CONFIG_SCHEDSTATS is not set
# CONFIG_TIMER_STATS is not set
CONFIG_DEBUG_OBJECTS=y
# CONFIG_DEBUG_OBJECTS_SELFTEST is not set
CONFIG_DEBUG_OBJECTS_FREE=y
CONFIG_DEBUG_OBJECTS_TIMERS=y
CONFIG_SLUB_DEBUG_ON=y
# CONFIG_SLUB_STATS is not set
CONFIG_DEBUG_RT_MUTEXES=y
CONFIG_DEBUG_PI_LIST=y
# CONFIG_RT_MUTEX_TESTER is not set
CONFIG_DEBUG_SPINLOCK=y
CONFIG_DEBUG_MUTEXES=y
CONFIG_DEBUG_LOCK_ALLOC=y
CONFIG_PROVE_LOCKING=y
CONFIG_LOCKDEP=y
# CONFIG_LOCK_STAT is not set
# CONFIG_DEBUG_LOCKDEP is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_DEBUG_SPINLOCK_SLEEP=y
# CONFIG_DEBUG_LOCKING_API_SELFTESTS is not set
CONFIG_STACKTRACE=y
# CONFIG_DEBUG_KOBJECT is not set
CONFIG_DEBUG_BUGVERBOSE=y
# CONFIG_DEBUG_INFO is not set
CONFIG_DEBUG_VM=y
CONFIG_DEBUG_WRITECOUNT=y
CONFIG_DEBUG_MEMORY_INIT=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_SG=y
CONFIG_FRAME_POINTER=y
# CONFIG_BOOT_PRINTK_DELAY is not set
# CONFIG_RCU_TORTURE_TEST is not set
# CONFIG_BACKTRACE_SELF_TEST is not set
# CONFIG_FAULT_INJECTION is not set
CONFIG_HAVE_FTRACE=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
# CONFIG_FTRACE is not set
# CONFIG_IRQSOFF_TRACER is not set
# CONFIG_SCHED_TRACER is not set
# CONFIG_CONTEXT_SWITCH_TRACER is not set
# CONFIG_SAMPLES is not set
CONFIG_HAVE_ARCH_KGDB=y
# CONFIG_KGDB is not set
CONFIG_DEBUG_STACK_USAGE=y
CONFIG_DEBUG_DCFLUSH=y
CONFIG_STACK_DEBUG=y
# CONFIG_DEBUG_PAGEALLOC is not set
CONFIG_MCOUNT=y

#
# Security options
#
# CONFIG_KEYS is not set
# CONFIG_SECURITY is not set
# CONFIG_SECURITY_FILE_CAPABILITIES is not set
CONFIG_CRYPTO=m

#
# Crypto core or helper
#
CONFIG_CRYPTO_ALGAPI=m
CONFIG_CRYPTO_BLKCIPHER=m
CONFIG_CRYPTO_MANAGER=m
# CONFIG_CRYPTO_GF128MUL is not set
# CONFIG_CRYPTO_NULL is not set
# CONFIG_CRYPTO_CRYPTD is not set
# CONFIG_CRYPTO_AUTHENC is not set
# CONFIG_CRYPTO_TEST is not set

#
# Authenticated Encryption with Associated Data
#
# CONFIG_CRYPTO_CCM is not set
# CONFIG_CRYPTO_GCM is not set
# CONFIG_CRYPTO_SEQIV is not set

#
# Block modes
#
CONFIG_CRYPTO_CBC=m
# CONFIG_CRYPTO_CTR is not set
# CONFIG_CRYPTO_CTS is not set
# CONFIG_CRYPTO_ECB is not set
# CONFIG_CRYPTO_LRW is not set
# CONFIG_CRYPTO_PCBC is not set
# CONFIG_CRYPTO_XTS is not set

#
# Hash modes
#
# CONFIG_CRYPTO_HMAC is not set
# CONFIG_CRYPTO_XCBC is not set

#
# Digest
#
# CONFIG_CRYPTO_CRC32C is not set
# CONFIG_CRYPTO_MD4 is not set
# CONFIG_CRYPTO_MD5 is not set
# CONFIG_CRYPTO_MICHAEL_MIC is not set
# CONFIG_CRYPTO_RMD128 is not set
# CONFIG_CRYPTO_RMD160 is not set
# CONFIG_CRYPTO_RMD256 is not set
# CONFIG_CRYPTO_RMD320 is not set
# CONFIG_CRYPTO_SHA1 is not set
# CONFIG_CRYPTO_SHA256 is not set
# CONFIG_CRYPTO_SHA512 is not set
# CONFIG_CRYPTO_TGR192 is not set
# CONFIG_CRYPTO_WP512 is not set

#
# Ciphers
#
CONFIG_CRYPTO_AES=m
# CONFIG_CRYPTO_ANUBIS is not set
# CONFIG_CRYPTO_ARC4 is not set
# CONFIG_CRYPTO_BLOWFISH is not set
# CONFIG_CRYPTO_CAMELLIA is not set
# CONFIG_CRYPTO_CAST5 is not set
# CONFIG_CRYPTO_CAST6 is not set
# CONFIG_CRYPTO_DES is not set
# CONFIG_CRYPTO_FCRYPT is not set
# CONFIG_CRYPTO_KHAZAD is not set
# CONFIG_CRYPTO_SALSA20 is not set
# CONFIG_CRYPTO_SEED is not set
CONFIG_CRYPTO_SERPENT=m
# CONFIG_CRYPTO_TEA is not set
# CONFIG_CRYPTO_TWOFISH is not set

#
# Compression
#
# CONFIG_CRYPTO_DEFLATE is not set
# CONFIG_CRYPTO_LZO is not set
# CONFIG_CRYPTO_HW is not set

#
# Library routines
#
CONFIG_BITREVERSE=y
# CONFIG_GENERIC_FIND_FIRST_BIT is not set
# CONFIG_CRC_CCITT is not set
# CONFIG_CRC16 is not set
CONFIG_CRC_T10DIF=m
# CONFIG_CRC_ITU_T is not set
CONFIG_CRC32=y
# CONFIG_CRC7 is not set
# CONFIG_LIBCRC32C is not set
CONFIG_PLIST=y
CONFIG_HAS_IOMEM=y
CONFIG_HAS_IOPORT=y
CONFIG_HAS_DMA=y
CONFIG_HAVE_LMB=y

2008-08-14 03:26:16

by David Miller

Subject: Re: console handover badness

From: Mikulas Patocka <[email protected]>
Date: Wed, 13 Aug 2008 08:46:37 -0400 (EDT)

> On Tue, 12 Aug 2008, David Miller wrote:
>
> > Mikulas can you send me the .config you're using in 2.6.27 to trigger
> > this?
> >
> > Thanks.
>
> Here it is. The computer is Ultra 5 with 512MB RAM.

Thanks, I've tried a lot of things to try and reproduce this myself but to
no avail.

Let's try to track down exactly where the corruption happens. Please try
the debug patch below on your box. It will spit out a message something
like:

[ 0.000000] BUG: Bogus migrate type 5
[ 0.000000] BUG: Usemap for section 0 corrupted at sparse_init+0x17c/0x218[mm/sparse.c:498]

The theory is that some other piece of code is either not allocating a large
enough buffer or overwriting past the end of a validly sized other buffer
for some reason, and thus clobbering this pageblock flags bitmap.

Wherever this debugging check first triggers should give us some idea
of who the culprit might be.
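
To make the idea behind the check concrete, here is a minimal user-space sketch of the same test, not the kernel code itself. It assumes 3 bits of migrate type per pageblock (as the patch below does) and assumes MIGRATE_TYPES is 5, which matches the sample "Bogus migrate type 5" message above. The helpers `read_field` and `first_bogus_field` are hypothetical names used only for this illustration.

```c
#include <limits.h>

#define BITS_PER_LONG   (sizeof(unsigned long) * CHAR_BIT)
#define MIGRATE_TYPES   5   /* assumption: valid migrate types are 0..4 */
#define PAGEBLOCK_BITS  3   /* bits read per pageblock, as in the patch */

/* User-space stand-in for the kernel's test_bit(). */
static int test_bit(unsigned long nr, const unsigned long *map)
{
	return (int)((map[nr / BITS_PER_LONG] >> (nr % BITS_PER_LONG)) & 1UL);
}

/* Decode the field starting at bit `off`, least significant bit first. */
static unsigned long read_field(const unsigned long *map, unsigned long off)
{
	unsigned long i, value = 0;

	for (i = 0; i < PAGEBLOCK_BITS; i++)
		if (test_bit(off + i, map))
			value |= 1UL << i;
	return value;
}

/* Return the index of the first out-of-range field, or -1 if none. */
static long first_bogus_field(const unsigned long *map, unsigned long nfields)
{
	unsigned long n;

	for (n = 0; n < nfields; n++)
		if (read_field(map, n * PAGEBLOCK_BITS) >= MIGRATE_TYPES)
			return (long)n;
	return -1;
}
```

A bitmap that some other code has scribbled over (for example with string data) will almost always contain a field that decodes to 5, 6 or 7, so calling such a check after each init step narrows down which step did the damage.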

diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c
index 217de3e..26b018f 100644
--- a/arch/sparc64/mm/init.c
+++ b/arch/sparc64/mm/init.c
@@ -1643,6 +1643,8 @@ void __init setup_per_cpu_areas(void)
{
}

+extern void sparse_validate_usemap(const char *file, int line);
+
void __init paging_init(void)
{
unsigned long end_pfn, shift, phys_base;
@@ -1788,7 +1790,9 @@ void __init paging_init(void)
#ifndef CONFIG_NEED_MULTIPLE_NODES
max_mapnr = last_valid_pfn;
#endif
+ sparse_validate_usemap(__FILE__, __LINE__);
kernel_physical_mapping_init();
+ sparse_validate_usemap(__FILE__, __LINE__);

{
unsigned long max_zone_pfns[MAX_NR_ZONES];
@@ -1798,12 +1802,15 @@ void __init paging_init(void)
max_zone_pfns[ZONE_NORMAL] = end_pfn;

free_area_init_nodes(max_zone_pfns);
+ sparse_validate_usemap(__FILE__, __LINE__);
}

printk("Booting Linux...\n");

central_probe();
+ sparse_validate_usemap(__FILE__, __LINE__);
cpu_probe();
+ sparse_validate_usemap(__FILE__, __LINE__);
}

int __init page_in_phys_avail(unsigned long paddr)
diff --git a/init/main.c b/init/main.c
index 0bc7e16..80771f5 100644
--- a/init/main.c
+++ b/init/main.c
@@ -536,6 +536,8 @@ void __init __weak thread_info_cache_init(void)
{
}

+extern void sparse_validate_usemap(const char *file, int line);
+
asmlinkage void __init start_kernel(void)
{
char * command_line;
@@ -567,12 +569,19 @@ asmlinkage void __init start_kernel(void)
printk(KERN_NOTICE);
printk(linux_banner);
setup_arch(&command_line);
+ sparse_validate_usemap(__FILE__, __LINE__);
mm_init_owner(&init_mm, &init_task);
+ sparse_validate_usemap(__FILE__, __LINE__);
setup_command_line(command_line);
+ sparse_validate_usemap(__FILE__, __LINE__);
unwind_setup();
+ sparse_validate_usemap(__FILE__, __LINE__);
setup_per_cpu_areas();
+ sparse_validate_usemap(__FILE__, __LINE__);
setup_nr_cpu_ids();
+ sparse_validate_usemap(__FILE__, __LINE__);
smp_prepare_boot_cpu(); /* arch-specific boot-cpu hooks */
+ sparse_validate_usemap(__FILE__, __LINE__);

/*
* Set up the scheduler prior starting any interrupts (such as the
@@ -580,35 +589,52 @@ asmlinkage void __init start_kernel(void)
* time - but meanwhile we still have a functioning scheduler.
*/
sched_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
/*
* Disable preemption - early bootup scheduling is extremely
* fragile until we cpu_idle() for the first time.
*/
preempt_disable();
build_all_zonelists();
+ sparse_validate_usemap(__FILE__, __LINE__);
page_alloc_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
printk(KERN_NOTICE "Kernel command line: %s\n", boot_command_line);
parse_early_param();
+ sparse_validate_usemap(__FILE__, __LINE__);
parse_args("Booting kernel", static_command_line, __start___param,
__stop___param - __start___param,
&unknown_bootoption);
+ sparse_validate_usemap(__FILE__, __LINE__);
if (!irqs_disabled()) {
printk(KERN_WARNING "start_kernel(): bug: interrupts were "
"enabled *very* early, fixing it\n");
local_irq_disable();
}
sort_main_extable();
+ sparse_validate_usemap(__FILE__, __LINE__);
trap_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
rcu_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
init_IRQ();
+ sparse_validate_usemap(__FILE__, __LINE__);
pidhash_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
init_timers();
+ sparse_validate_usemap(__FILE__, __LINE__);
hrtimers_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
softirq_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
timekeeping_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
time_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
sched_clock_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
profile_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
if (!irqs_disabled())
printk("start_kernel(): bug: interrupts were enabled early\n");
early_boot_irqs_on();
@@ -620,10 +646,12 @@ asmlinkage void __init start_kernel(void)
* this. But we do want output early, in case something goes wrong.
*/
console_init();
+ sparse_validate_usemap(__FILE__, __LINE__);
if (panic_later)
panic(panic_later, panic_param);

lockdep_info();
+ sparse_validate_usemap(__FILE__, __LINE__);

/*
* Need to run this when irqs are enabled, because it wants
@@ -631,6 +659,7 @@ asmlinkage void __init start_kernel(void)
* too:
*/
locking_selftest();
+ sparse_validate_usemap(__FILE__, __LINE__);

#ifdef CONFIG_BLK_DEV_INITRD
if (initrd_start && !initrd_below_start_ok &&
@@ -643,7 +672,9 @@ asmlinkage void __init start_kernel(void)
}
#endif
vfs_caches_init_early();
+ sparse_validate_usemap(__FILE__, __LINE__);
cpuset_init_early();
+ sparse_validate_usemap(__FILE__, __LINE__);
mem_init();
enable_debug_pagealloc();
cpu_hotplug_init();
diff --git a/mm/sparse.c b/mm/sparse.c
index 5d9dbbb..116559c 100644
--- a/mm/sparse.c
+++ b/mm/sparse.c
@@ -262,6 +262,52 @@ unsigned long usemap_size(void)
return size_bytes;
}

+#if 1
+static int check_one_blockval(unsigned long *bitmap, unsigned long off, unsigned long nbits)
+{
+	unsigned long i, value = 1, flags = 0;
+
+	for (i = 0; i < nbits; i++, value <<= 1)
+		if (test_bit(off + i, bitmap))
+			flags |= value;
+
+	if (flags >= MIGRATE_TYPES) {
+		printk(KERN_ERR "BUG: Bogus migrate type %lu\n", flags);
+		return 1;
+	}
+	return 0;
+}
+
+void sparse_validate_usemap(const char *file, int line)
+{
+	void *caller = __builtin_return_address(0);
+	unsigned long size = usemap_size();
+	unsigned long pnum;
+	static int reported = 0;
+
+	if (reported)
+		return;
+
+	for (pnum = 0; pnum < NR_MEM_SECTIONS; pnum++) {
+		struct mem_section *ms;
+		unsigned long *bitmap;
+		unsigned long off;
+
+		if (!present_section_nr(pnum))
+			continue;
+		ms = __nr_to_section(pnum);
+		bitmap = ms->pageblock_flags;
+		for (off = 0; off < size; off += 3) {
+			if (check_one_blockval(bitmap, off, 3)) {
+				printk(KERN_ERR "BUG: Usemap for section %lu corrupted at %pS[%s:%d]\n",
+				       pnum, caller, file, line);
+				reported = 1;
+				break;
+			}
+		}
+	}
+}
+#endif
#ifdef CONFIG_MEMORY_HOTPLUG
static unsigned long *__kmalloc_section_usemap(void)
{
@@ -445,10 +491,16 @@ void __init sparse_init(void)
sparse_init_one_section(__nr_to_section(pnum), pnum, map,
usemap);
}
+#if 1
+ sparse_validate_usemap(__FILE__, __LINE__);
+#endif

vmemmap_populate_print_last();

free_bootmem(__pa(usemap_map), size);
+#if 1
+ sparse_validate_usemap(__FILE__, __LINE__);
+#endif
}

#ifdef CONFIG_MEMORY_HOTPLUG

2008-08-14 23:11:35

by Mikulas Patocka

Subject: Bootmem allocator broken [was: console handover badness]



On Wed, 13 Aug 2008, David Miller wrote:

> From: Mikulas Patocka <[email protected]>
> Date: Wed, 13 Aug 2008 08:46:37 -0400 (EDT)
>
> > On Tue, 12 Aug 2008, David Miller wrote:
> >
> > > Mikulas can you send me the .config you're using in 2.6.27 to trigger
> > > this?
> > >
> > > Thanks.
> >
> > Here it is. The computer is Ultra 5 with 512MB RAM.
>
> Thanks, I've tried a lot of things to try and reproduce this myself but to
> no avail.
>
> Let's try to track down exactly where the corruption happens. Please try
> the debug patch below on your box. It will spit out a message something
> like:
>
> [ 0.000000] BUG: Bogus migrate type 5
> [ 0.000000] BUG: Usemap for section 0 corrupted at sparse_init+0x17c/0x218[mm/sparse.c:498]
>
> The theory is that some other piece of code is either not allocating a large
> enough buffer or overwriting past the end of a validly sized other buffer
> for some reason, and thus clobbering this pageblock flags bitmap.
>
> Wherever this debugging check first triggers should give us some idea
> of who the culprit might be.

Hi

So I tried the patch and found out that the corruption happens in
setup_command_line --- the first strcpy call corrupted the migratetype
map.

Examining the problem further, I found that Johannes Weiner committed a new
bootmem allocator to 2.6.27-rc1 and that allocator is broken.

This is the minimal sequence that jams the allocator:

void *p, *q, *r;
p = alloc_bootmem(PAGE_SIZE);
q = alloc_bootmem(64);
free_bootmem(p, PAGE_SIZE);
p = alloc_bootmem(PAGE_SIZE);
r = alloc_bootmem(64);

--- after this sequence (assuming that the allocator was empty or
page-aligned before), pointer "q" will be equal to pointer "r".

What's happening inside the allocator:

p = alloc_bootmem(PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
q = alloc_bootmem(64);
    in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
free_bootmem(p, PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
p = alloc_bootmem(PAGE_SIZE);
    in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
r = alloc_bootmem(64);
    and now:
    it finds bit "2" as the place to allocate (sidx)
    it hits the condition
        if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)
            start_off = ALIGN(bdata->last_end_off, align);
--- you can see that the condition is true, so it assigns start_off =
ALIGN(bdata->last_end_off, align); --- that is PAGE_SIZE --- and allocates
over an already allocated block.

This patch fixes it (kernels 2.6.27-rc2 and 2.6.27-rc3 boot ok after the
patch). Johannes, please review the patch and submit it to Linus.

With the patch, it tries to continue at the end of the previous allocation
only if the previous allocation ended in the middle of a page.
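
The failure and the effect of the changed condition can be reproduced outside the kernel with a small model. The following is only a sketch under stated assumptions: it is not the code in mm/bootmem.c, it works on byte offsets instead of pointers, it uses a plain first-fit page search, and the names `struct model`, `model_alloc`, `model_free` and `run` are made up for this illustration. The page size (8 KB) and the 64-byte alignment are assumed values as well.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SHIFT  13                  /* assumed 8 KB pages; any size works */
#define PAGE_SIZE   (1UL << PAGE_SHIFT)
#define NPAGES      8UL                 /* size of the toy memory region */
#define PFN_DOWN(x) ((x) >> PAGE_SHIFT)
#define PFN_UP(x)   (((x) + PAGE_SIZE - 1) >> PAGE_SHIFT)
#define ALIGN(x, a) (((x) + (a) - 1) & ~((a) - 1))

struct model {
	unsigned char used[NPAGES];     /* one flag per page (the "bitmap") */
	unsigned long last_end_off;     /* where the last allocation ended */
	int fixed;                      /* 0: old condition, 1: patched one */
};

/* Returns the byte offset of the new block, or -1 if nothing fits. */
static long model_alloc(struct model *m, unsigned long size, unsigned long align)
{
	unsigned long sidx, eidx, i, start_off, end_off;
	int may_continue;

	assert(size > 0 && align > 0 && align <= PAGE_SIZE);
	assert((align & (align - 1)) == 0);

	/* First fit: find PFN_UP(size) consecutive free pages. */
	for (sidx = 0; ; sidx++) {
		eidx = sidx + PFN_UP(size);
		if (eidx > NPAGES)
			return -1;
		for (i = sidx; i < eidx && !m->used[i]; i++)
			;
		if (i == eidx)
			break;
	}

	/* The condition under discussion: continue in the previous page? */
	if (m->fixed)
		may_continue = (m->last_end_off & (PAGE_SIZE - 1)) != 0;
	else
		may_continue = m->last_end_off != 0;

	if (may_continue && PFN_DOWN(m->last_end_off) + 1 == sidx)
		start_off = ALIGN(m->last_end_off, align);
	else
		start_off = sidx << PAGE_SHIFT;

	end_off = start_off + size;
	m->last_end_off = end_off;
	for (i = PFN_DOWN(start_off); i < PFN_UP(end_off); i++)
		m->used[i] = 1;
	return (long)start_off;
}

/* Frees only the pages that lie completely inside [off, off + size). */
static void model_free(struct model *m, unsigned long off, unsigned long size)
{
	unsigned long i;

	for (i = PFN_UP(off); i < PFN_DOWN(off + size); i++)
		m->used[i] = 0;
}

/* Runs the minimal sequence; returns 1 if q and r overlap. */
static int run(int fixed, long *q, long *r)
{
	struct model m;
	long p;

	memset(&m, 0, sizeof(m));
	m.fixed = fixed;
	p = model_alloc(&m, PAGE_SIZE, 64);
	*q = model_alloc(&m, 64, 64);
	model_free(&m, (unsigned long)p, PAGE_SIZE);
	p = model_alloc(&m, PAGE_SIZE, 64);
	*r = model_alloc(&m, 64, 64);
	return *q == *r;
}
```

In this model, run(0, &q, &r) returns 1 with q and r both at offset PAGE_SIZE, while run(1, &q, &r) returns 0 with r placed at 2 * PAGE_SIZE.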

Signed-off-by: Mikulas Patocka <[email protected]>

---
mm/bootmem.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.27-rc2-orig/mm/bootmem.c
===================================================================
--- linux-2.6.27-rc2-orig.orig/mm/bootmem.c 2008-08-15 00:10:38.000000000 +0200
+++ linux-2.6.27-rc2-orig/mm/bootmem.c 2008-08-15 00:10:53.000000000 +0200
@@ -473,7 +473,7 @@ find_block:
 				goto find_block;
 			}
 
-		if (bdata->last_end_off &&
+		if (bdata->last_end_off & (PAGE_SIZE - 1) &&
 				PFN_DOWN(bdata->last_end_off) + 1 == sidx)
 			start_off = ALIGN(bdata->last_end_off, align);
 		else

2008-08-14 23:25:31

by David Miller

Subject: Re: Bootmem allocator broken

From: Mikulas Patocka <[email protected]>
Date: Thu, 14 Aug 2008 19:11:19 -0400 (EDT)

[ Adding Alexander back to the CC: as he is seeing this same
exact bug too, please keep him in the loop for testing. ]

> So I tried the patch and found out that the corruption happens in
> setup_command_line --- the first strcpy call corrupted the migratetype
> map.
>
> Examining the problem further, it turned out that Johannes Weiner
> committed new bootmem allocator to 2.6.27-rc1 and the allocator is broken.

Ok, I was just looking at Alexander's debugging dump from my patch
and in his case it pointed to kernel_physical_mapping_init() and
I couldn't find any obvious problems there.

But with the bug you found, it makes perfect sense, nice work!

> This is the minimal sequence that jams the allocator:
>
> void *p, *q, *r;
> p = alloc_bootmem(PAGE_SIZE);
> q = alloc_bootmem(64);
> free_bootmem(p, PAGE_SIZE);
> p = alloc_bootmem(PAGE_SIZE);
> r = alloc_bootmem(64);
>
> --- after this sequence (assuming that the allocator was empty or
> page-aligned before), pointer "q" will be equal to pointer "r".

Excellent detective work!

> What's happening inside the allocator:
> p = alloc_bootmem(PAGE_SIZE);
> in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
> q = alloc_bootmem(64);
> in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
> free_bootmem(p, PAGE_SIZE);
> in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
> p = alloc_bootmem(PAGE_SIZE);
> in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
> r = alloc_bootmem(64);
> and now:
> it finds bit "2", as a place where to allocate (sidx)
> it hits the condition
> if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)
> start_off = ALIGN(bdata->last_end_off, align);
> --- you can see that the condition is true, so it assigns start_off =
> ALIGN(bdata->last_end_off, align); --- that is PAGE_SIZE --- and allocates
> over already allocated block.
>
> This patch fixes it (kernels 2.6.27-rc2 and 2.6.27-rc3 boot ok after the
> patch). Johannes, please review the patch and submit it to Linus.
>
> With the patch it tries to continue at the end of previous allocation only
> if the previous allocation ended in the middle of the page.
>
> Signed-off-by: Mikulas Patocka <[email protected]>
>
> ---
> mm/bootmem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.27-rc2-orig/mm/bootmem.c
> ===================================================================
> --- linux-2.6.27-rc2-orig.orig/mm/bootmem.c 2008-08-15 00:10:38.000000000 +0200
> +++ linux-2.6.27-rc2-orig/mm/bootmem.c 2008-08-15 00:10:53.000000000 +0200
> @@ -473,7 +473,7 @@ find_block:
> goto find_block;
> }
>
> - if (bdata->last_end_off &&
> + if (bdata->last_end_off & (PAGE_SIZE - 1) &&
> PFN_DOWN(bdata->last_end_off) + 1 == sidx)
> start_off = ALIGN(bdata->last_end_off, align);
> else
>
>

2008-08-14 23:41:16

by Johannes Weiner

Subject: Re: Bootmem allocator broken

Hi Mikulas,

Mikulas Patocka <[email protected]> writes:

> Examining the problem further, it turned out that Johannes Weiner
> committed new bootmem allocator to 2.6.27-rc1 and the allocator is broken.
>
> This is the minimal sequence that jams the allocator:
>
> void *p, *q, *r;
> p = alloc_bootmem(PAGE_SIZE);
> q = alloc_bootmem(64);
> free_bootmem(p, PAGE_SIZE);
> p = alloc_bootmem(PAGE_SIZE);
> r = alloc_bootmem(64);
>
> --- after this sequence (assuming that the allocator was empty or
> page-aligned before), pointer "q" will be equal to pointer "r".
>
> What's happening inside the allocator:
> p = alloc_bootmem(PAGE_SIZE);
> in allocator: last_end_off == PAGE_SIZE, bitmap contains bits 10000...
> q = alloc_bootmem(64);
> in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 11000...
> free_bootmem(p, PAGE_SIZE);
> in allocator: last_end_off == PAGE_SIZE + 64, bitmap contains 01000...
> p = alloc_bootmem(PAGE_SIZE);
> in allocator: last_end_off == PAGE_SIZE, bitmap contains 11000...
> r = alloc_bootmem(64);
> and now:
> it finds bit "2", as a place where to allocate (sidx)
> it hits the condition
> if (bdata->last_end_off && PFN_DOWN(bdata->last_end_off) + 1 == sidx)
> start_off = ALIGN(bdata->last_end_off, align);
> --- you can see that the condition is true, so it assigns start_off =
> ALIGN(bdata->last_end_off, align); --- that is PAGE_SIZE --- and allocates
> over already allocated block.
>
> This patch fixes it (kernels 2.6.27-rc2 and 2.6.27-rc3 boot ok after the
> patch). Johannes, please review the patch and submit it to Linus.
>
> With the patch it tries to continue at the end of previous allocation only
> if the previous allocation ended in the middle of the page.

Yes, taking last_end_off into account when it's page-aligned is bogus as
the whole merging thing is about partial pages.
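
The quoted five-call sequence can be replayed against a toy model, sketched here under loud assumptions: this is not the kernel's bootmem code, only a simplified stand-in with a 16-page bitmap, a 4096-byte page, first-fit search from page 0, and alignments no larger than a page. All toy_ names are hypothetical. The "fixed" flag selects between the condition before and after the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096UL         /* assumed page size */
#define TOY_PAGES 16UL               /* assumed size of the toy bitmap */
#define TOY_FAIL (~0UL)

struct toy_bootmem {
    bool used[TOY_PAGES];            /* one flag per page */
    unsigned long last_end_off;      /* end of the previous allocation */
    bool fixed;                      /* true: use the patched condition */
};

static unsigned long toy_pfn_down(unsigned long off)
{
    return off / TOY_PAGE_SIZE;
}

static unsigned long toy_pfn_up(unsigned long off)
{
    return (off + TOY_PAGE_SIZE - 1) / TOY_PAGE_SIZE;
}

static unsigned long toy_align(unsigned long x, unsigned long a)
{
    assert(a && (a & (a - 1)) == 0); /* power-of-two alignment only */
    return (x + a - 1) & ~(a - 1);
}

/* Returns the byte offset of the allocation, or TOY_FAIL. */
static unsigned long toy_alloc(struct toy_bootmem *b, unsigned long size,
                               unsigned long align)
{
    unsigned long npages = toy_pfn_up(size);

    for (unsigned long sidx = 0; sidx + npages <= TOY_PAGES; sidx++) {
        unsigned long i, start_off, end_off;
        bool tail;

        for (i = sidx; i < sidx + npages; i++)
            if (b->used[i])
                break;
        if (i < sidx + npages)
            continue;                /* run is not free, try the next page */

        /* Before the patch any nonzero last_end_off counts as a tail. */
        tail = b->fixed ? (b->last_end_off & (TOY_PAGE_SIZE - 1)) != 0
                        : b->last_end_off != 0;
        if (tail && toy_pfn_down(b->last_end_off) + 1 == sidx)
            start_off = toy_align(b->last_end_off, align);
        else
            start_off = sidx * TOY_PAGE_SIZE;

        end_off = start_off + size;
        b->last_end_off = end_off;
        for (i = toy_pfn_down(start_off); i < toy_pfn_up(end_off); i++)
            b->used[i] = true;
        return start_off;
    }
    return TOY_FAIL;
}

/* Frees only the pages fully covered by [off, off + size). */
static void toy_free(struct toy_bootmem *b, unsigned long off,
                     unsigned long size)
{
    for (unsigned long i = toy_pfn_up(off); i < toy_pfn_down(off + size); i++)
        b->used[i] = false;
}

/* Replays the sequence from the report; true if q and r overlap. */
static bool toy_replay_overlaps(bool fixed)
{
    struct toy_bootmem b;
    unsigned long p, q, r;

    memset(&b, 0, sizeof(b));
    b.fixed = fixed;
    p = toy_alloc(&b, TOY_PAGE_SIZE, 64);
    q = toy_alloc(&b, 64, 64);
    toy_free(&b, p, TOY_PAGE_SIZE);
    p = toy_alloc(&b, TOY_PAGE_SIZE, 64);
    r = toy_alloc(&b, 64, 64);
    (void)p;
    return q == r;
}
```

In this model toy_replay_overlaps(false) reports that "q" and "r" coincide, while toy_replay_overlaps(true) does not, which matches the behaviour described in the report and the effect of the patch.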

Cool spot and nice fix!

> Signed-off-by: Mikulas Patocka <[email protected]>

Acked-by: Johannes Weiner <[email protected]>

Hannes

> ---
> mm/bootmem.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.27-rc2-orig/mm/bootmem.c
> ===================================================================
> --- linux-2.6.27-rc2-orig.orig/mm/bootmem.c 2008-08-15 00:10:38.000000000 +0200
> +++ linux-2.6.27-rc2-orig/mm/bootmem.c 2008-08-15 00:10:53.000000000 +0200
> @@ -473,7 +473,7 @@ find_block:
> goto find_block;
> }
>
> - if (bdata->last_end_off &&
> + if (bdata->last_end_off & (PAGE_SIZE - 1) &&
> PFN_DOWN(bdata->last_end_off) + 1 == sidx)
> start_off = ALIGN(bdata->last_end_off, align);
> else

2008-08-15 11:09:32

by Alexander Beregalov

Subject: Re: Bootmem allocator broken

2008/8/15 David Miller <[email protected]>:
> From: Mikulas Patocka <[email protected]>
> Date: Thu, 14 Aug 2008 19:11:19 -0400 (EDT)
>
> [ Adding Alexander back to the CC: as he is seeing this same
> exact bug too, please keep him in the loop for testing. ]

>> This patch fixes it (kernels 2.6.27-rc2 and 2.6.27-rc3 boot ok after the
>> patch). Johannes, please review the patch and submit it to Linus.
>>
>> With the patch it tries to continue at the end of previous allocation only
>> if the previous allocation ended in the middle of the page.
>>
>> Signed-off-by: Mikulas Patocka <[email protected]>


It is working! Thanks a lot David and Mikulas!

alexb@sparky ~ $ zgrep LOCKDEP /proc/config.gz
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_LOCKDEP=y
CONFIG_DEBUG_LOCKDEP=y
alexb@sparky ~ $ uname -a
Linux sparky 2.6.27-rc3-00171-gb635ace-dirty #3 PREEMPT Fri Aug 15
14:52:40 MSD 2008 sparc64 sun4u TI UltraSparc IIi (Sabre) GNU/Linux

2008-08-15 21:13:36

by David Miller

Subject: Re: Bootmem allocator broken

From: "Alexander Beregalov" <[email protected]>
Date: Fri, 15 Aug 2008 15:09:20 +0400

> It is working! Thanks a lot David and Mikulas!

Thanks for your testing and patience :)