2005-10-07 15:05:35

by John Rigg

Subject: 2.6.14-rc3-rt10 crashes on boot

On Friday 7 October 2005 Ingo Molnar wrote:
>I got overflows in initramfs's gunzip with certain debug options. I have
>improved the stack footprint of the worst offenders in -rt11 (see the
>standalone patch below) - John, does it boot any better?

Ah. I'm using initrd. With CONFIG_LATENCY_TRACE=y my initrd.img is
large, > 3.6MB. Maybe it's time to try initramfs.

BTW I'm having trouble enabling DEBUG_STACKOVERFLOW. I can see
it in arch/i386/Kconfig.debug (and not in arch/x86_64/Kconfig.debug),
but it doesn't appear in menuconfig no matter what other kernel hacking
options I enable. If I add it manually to .config it just gets removed
by `make oldconfig'. Is this an x86_64 issue?

For now I'll assume that there is a stack overflow and try initramfs.

John


2005-10-07 15:48:55

by Steven Rostedt

Subject: Re: 2.6.14-rc3-rt10 crashes on boot


On Fri, 7 Oct 2005, John Rigg wrote:

> On Friday 7 October 2005 Ingo Molnar wrote:
> >I got overflows in initramfs's gunzip with certain debug options. I have
> >improved the stack footprint of the worst offenders in -rt11 (see the
> >standalone patch below) - John, does it boot any better?
>
> Ah. I'm using initrd. With CONFIG_LATENCY_TRACE=y my initrd.img is
> large, > 3.6MB. Maybe it's time to try initramfs.
>
> BTW I'm having trouble enabling DEBUG_STACKOVERFLOW. I can see
> it in arch/i386/Kconfig.debug (and not in arch/x86_64/Kconfig.debug),
> but it doesn't appear in menuconfig no matter what other kernel hacking
> options I enable. If I add it manually to .config it just gets removed
> by `make oldconfig'. Is this an x86_64 issue?
>
> For now I'll assume that there is a stack overflow and try initramfs.
>

Here you go, John.

Add this patch and it will add the option for you in x86_64 (I forgot that
you were using that). I even set it to be default on. I didn't add a test
in do_IRQ, but I believe that the tests in latency.c should be good
enough.

-- Steve

Index: linux-rt-quilt/arch/x86_64/Kconfig.debug
===================================================================
--- linux-rt-quilt.orig/arch/x86_64/Kconfig.debug 2005-08-28 19:41:01.000000000 -0400
+++ linux-rt-quilt/arch/x86_64/Kconfig.debug 2005-10-07 11:43:45.000000000 -0400
@@ -33,6 +33,14 @@
options. See Documentation/x86_64/boot-options.txt for more
details.

+config DEBUG_STACKOVERFLOW
+ bool "Check for stack overflows"
+ depends on DEBUG_KERNEL
+ default y
+ help
+ This option will cause messages to be printed if free stack space
+ drops below a certain limit.
+
config KPROBES
bool "Kprobes"
depends on DEBUG_KERNEL
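
For context on the Kconfig side: when a `bool` option such as DEBUG_STACKOVERFLOW is set to `y`, the build writes `#define CONFIG_DEBUG_STACKOVERFLOW 1` into the generated include/linux/autoconf.h, and the debug code is wrapped in `#ifdef`. Because of `depends on DEBUG_KERNEL`, the option is only offered (and only survives `make oldconfig`) when DEBUG_KERNEL is enabled on an architecture whose Kconfig defines the symbol. A minimal userspace sketch of the C side, assuming that convention; the `#define` stands in for the generated header and `stack_overflow_checks_enabled()` is a hypothetical helper, not a kernel function:

```c
/* Stand-in for the line the kernel build generates in
 * include/linux/autoconf.h when DEBUG_STACKOVERFLOW=y.
 * Remove it to model DEBUG_STACKOVERFLOW=n. */
#define CONFIG_DEBUG_STACKOVERFLOW 1

/* Hypothetical helper: reports whether the stack-overflow
 * checks were compiled in. */
static int stack_overflow_checks_enabled(void)
{
#ifdef CONFIG_DEBUG_STACKOVERFLOW
	return 1;
#else
	return 0;
#endif
}
```

Guarding with `#ifdef` means the checks cost nothing when the option is off, which is why a debug option like this can safely default to `y` only under DEBUG_KERNEL.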

2005-10-07 19:10:49

by John Rigg

Subject: Re: 2.6.14-rc3-rt10 crashes on boot

On Friday, October 7 Steve Rostedt wrote:

>Add this patch and it will add the option for you in x86_64 (I forgot that
>you were using that). I even set it to be default on. I didn't add a test
>in do_IRQ, but I believe that the tests in latency.c should be good
>enough.

Hi Steve,

Thanks for the patch. I applied it to 2.6.14-rc3-rt12, looked in
arch/x86_64/Kconfig.debug just to be sure it applied OK to -rt12,
then ran make. It failed to compile, with the following message:

CC kernel/rt.o
CC kernel/latency.o
kernel/latency.c: In function '__print_worst_stack':
kernel/latency.c:336: warning: format '%d' expects type 'int', but argument 5 has type 'long unsigned int'
kernel/latency.c:384:3: error: #error Poke the author of above asm code line !
kernel/latency.c: In function 'debug_stackoverflow':
kernel/latency.c:386: error: 'STACK_WARN' undeclared (first use in this function)
kernel/latency.c:386: error: (Each undeclared identifier is reported only once
kernel/latency.c:386: error: for each function it appears in.)
make[1]: *** [kernel/latency.o] Error 1
make: *** [kernel] Error 2

I wonder if DEBUG_STACKOVERFLOW was left out of x86_64 for this reason.

John

2005-10-07 19:43:06

by Steven Rostedt

Subject: Re: 2.6.14-rc3-rt10 crashes on boot


John, please don't strip the CC list. Ingo may want to see what's
happening, and besides, it's not proper netiquette for LKML.

On Fri, 7 Oct 2005, John Rigg wrote:

> On Friday, October 7 Steve Rostedt wrote:
>
> >Add this patch and it will add the option for you in x86_64 (I forgot that
> >you were using that). I even set it to be default on. I didn't add a test
> >in do_IRQ, but I believe that the tests in latency.c should be good
> >enough.
>
> Hi Steve,
>
> Thanks for the patch. I applied it to 2.6.14-rc3-rt12, looked in
> arch/x86_64/Kconfig.debug just to be sure it applied OK to -rt12,
> then ran make. It failed to compile, with the following message:
>
> CC kernel/rt.o
> CC kernel/latency.o
> kernel/latency.c: In function '__print_worst_stack':
> kernel/latency.c:336: warning: format '%d' expects type 'int', but argument 5 has type 'long unsigned int'
> kernel/latency.c:384:3: error: #error Poke the author of above asm code line !
> kernel/latency.c: In function 'debug_stackoverflow':
> kernel/latency.c:386: error: 'STACK_WARN' undeclared (first use in this function)
> kernel/latency.c:386: error: (Each undeclared identifier is reported only once
> kernel/latency.c:386: error: for each function it appears in.)
> make[1]: *** [kernel/latency.o] Error 1
> make: *** [kernel] Error 2
>
> I wonder if DEBUG_STACKOVERFLOW was left out of x86_64 for this reason.
>

Here's an addon patch to my last one. I don't know x86_64 very well, but
I believe the asm is pretty much the same, so this patch removes the
check for __i386__ and also defines STACK_WARN.

I'm leaving for the weekend, so you are now on your own, unless you get
help from others. ;-)

-- Steve

Index: linux-rt-quilt/include/asm-x86_64/page.h
===================================================================
--- linux-rt-quilt.orig/include/asm-x86_64/page.h 2005-10-06 08:04:00.000000000 -0400
+++ linux-rt-quilt/include/asm-x86_64/page.h 2005-10-07 15:34:20.000000000 -0400
@@ -21,6 +21,8 @@
#endif
#define CURRENT_MASK (~(THREAD_SIZE-1))

+#define STACK_WARN (THREAD_SIZE/8)
+
#define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
#define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)

Index: linux-rt-quilt/kernel/latency.c
===================================================================
--- linux-rt-quilt.orig/kernel/latency.c 2005-10-06 08:04:56.000000000 -0400
+++ linux-rt-quilt/kernel/latency.c 2005-10-07 15:31:20.000000000 -0400
@@ -377,7 +377,8 @@
atomic_inc(&tr->disabled);

/* Debugging check for stack overflow: is there less than 1KB free? */
-#ifdef __i386__
+#if 1 // def __i386__
+ /* Hopefully this works on x86_64! */
__asm__ __volatile__("andl %%esp,%0" :
"=r" (stack_left) : "0" (THREAD_SIZE - 1));
#else
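
The check patched above relies on kernel stacks being THREAD_SIZE-aligned and growing downward: masking the stack pointer with THREAD_SIZE - 1 gives its offset above the base of the stack, which roughly equals the bytes still free (the base also holds struct thread_info). Assuming THREAD_SIZE is 8 KB, STACK_WARN = THREAD_SIZE/8 comes to 1 KB, matching the "less than 1KB free" comment. A minimal userspace sketch of that arithmetic, assuming 8 KB stacks and using the address of a local variable instead of inline asm to approximate the stack pointer. The helper names are hypothetical, and a userspace stack is not THREAD_SIZE-aligned, so the live reading only illustrates the masking:

```c
#include <stdint.h>

/* Assumption: 8 KB kernel stacks (two 4 KB pages). */
#define THREAD_SIZE 8192UL
/* Same definition as the page.h hunk: warn below 1/8 of the stack. */
#define STACK_WARN (THREAD_SIZE / 8)

/* Hypothetical helper: offset of 'sp' within its THREAD_SIZE-aligned
 * block. On a downward-growing, THREAD_SIZE-aligned stack this is
 * roughly the number of bytes still free below 'sp'. */
static unsigned long stack_bytes_left(uintptr_t sp)
{
	return (unsigned long)(sp & (THREAD_SIZE - 1));
}

/* Hypothetical helper: approximate the current stack pointer with
 * the address of a local variable, avoiding inline asm entirely. */
static unsigned long current_stack_bytes_left(void)
{
	char marker;

	return stack_bytes_left((uintptr_t)&marker);
}

/* Nonzero when the remaining space is below the warning threshold. */
static int stack_low(unsigned long left)
{
	return left < STACK_WARN;
}
```

For example, a stack pointer 3000 bytes above an 8 KB boundary reports 3000 bytes free and does not trip the 1 KB warning, while 512 bytes would.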

2005-10-09 16:22:57

by John Rigg

Subject: Re: 2.6.14-rc3-rt10 crashes on boot

On Friday, October 7 Steven Rostedt wrote:

>Here's an addon patch to my last one. I don't know x86_64 very well, but
>I believe the the asm is pretty much the same, so this patch removes the
>check for __i386__ and also defines STACK_WARN.

>Index: linux-rt-quilt/include/asm-x86_64/page.h
>===================================================================
>--- linux-rt-quilt.orig/include/asm-x86_64/page.h 2005-10-06 08:04:00.000000000 -0400
>+++ linux-rt-quilt/include/asm-x86_64/page.h 2005-10-07 15:34:20.000000000 -0400
>@@ -21,6 +21,8 @@
> #endif
> #define CURRENT_MASK (~(THREAD_SIZE-1))
>
>+#define STACK_WARN (THREAD_SIZE/8)
>+
> #define LARGE_PAGE_MASK (~(LARGE_PAGE_SIZE-1))
> #define LARGE_PAGE_SIZE (1UL << PMD_SHIFT)
>
>Index: linux-rt-quilt/kernel/latency.c
>===================================================================
>--- linux-rt-quilt.orig/kernel/latency.c 2005-10-06 08:04:56.000000000 -0400
>+++ linux-rt-quilt/kernel/latency.c 2005-10-07 15:31:20.000000000 -0400
>@@ -377,7 +377,8 @@
> atomic_inc(&tr->disabled);
>
> /* Debugging check for stack overflow: is there less than 1KB free? */
>-#ifdef __i386__
>+#if 1 // def __i386__
>+ /* Hopefully this works on x86_64! */
> __asm__ __volatile__("andl %%esp,%0" :
> "=r" (stack_left) : "0" (THREAD_SIZE - 1));
> #else

Steve, thanks for these patches. I got it to compile with 2.6.14-rc3-rt12
but had to change the assembly lines in (patched) latency.c to

__asm__ __volatile__("and %%rsp,%0" :
"=r" (stack_left) : "0" (THREAD_SIZE - 1));

i.e. `and' instead of `andl' and `%%rsp' instead of `%%esp'.
Somebody who understands x86_64 assembly better than I do should probably check
this before anyone tries using it.
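
Why the change is needed, assuming `stack_left` is an `unsigned long`: on x86_64 that type is 64 bits, so the "=r" output operand is a 64-bit register, and `andl` with `%%esp` does not assemble against it; plain `and` with `%%rsp` lets the assembler pick the 64-bit form. A sketch of selecting the form per architecture, assuming GCC-style inline asm and 8 KB stacks (`read_stack_left()` is a hypothetical name, not the function in latency.c). It only shows that each variant builds and yields a value below THREAD_SIZE, not that the check is correct for x86_64 kernel stacks:

```c
#define THREAD_SIZE 8192UL	/* assumption: 8 KB stacks */

/* Hypothetical helper: THREAD_SIZE-masked stack pointer, using the
 * register and operand size that match the architecture. */
static unsigned long read_stack_left(void)
{
	unsigned long stack_left;

#if defined(__x86_64__)
	/* 64-bit: %rsp, and the assembler infers a 64-bit 'and'. */
	__asm__ __volatile__("and %%rsp,%0"
			     : "=r" (stack_left) : "0" (THREAD_SIZE - 1));
#elif defined(__i386__)
	/* 32-bit: the original form from latency.c. */
	__asm__ __volatile__("andl %%esp,%0"
			     : "=r" (stack_left) : "0" (THREAD_SIZE - 1));
#else
	/* Other architectures: approximate the stack pointer with the
	 * address of a local variable instead of using asm. */
	char marker;
	stack_left = (unsigned long)&marker & (THREAD_SIZE - 1);
#endif
	return stack_left;
}
```

The "0" matching constraint preloads the output register with THREAD_SIZE - 1, so the single `and` leaves the masked stack pointer in `stack_left`.
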
While I was at it I changed a printk arg in line 335 of (patched) latency.c -
I think the last %d should be %ld, i.e.

printk("| new stack-footprint maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
worst_stack_comm, worst_stack_pid, MAX_STACK-worst_stack_left, MAX_STACK);
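
The compiler warning names argument 5, `MAX_STACK`, as `long unsigned int`. `%ld` silences it, but `%lu` is the exact match for an unsigned type. A userspace sketch of the corrected format with `snprintf`, assuming both `MAX_STACK` and `worst_stack_left` are `unsigned long`; `format_stack_max()` and the sample values are hypothetical:

```c
#include <stdio.h>

/* Hypothetical helper mirroring the printk in latency.c, with
 * length modifiers that match each argument's type. */
static int format_stack_max(char *buf, size_t len, const char *comm,
			    int pid, unsigned long max_stack,
			    unsigned long worst_left)
{
	return snprintf(buf, len,
			"| new stack-footprint maximum: %s/%d, %lu bytes (out of %lu bytes).\n",
			comm, pid, max_stack - worst_left, max_stack);
}
```

For example, `format_stack_max(buf, sizeof buf, "swapper", 1, 8192, 1024)` produces "| new stack-footprint maximum: swapper/1, 7168 bytes (out of 8192 bytes)." followed by a newline.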

John

2005-10-09 16:24:30

by John Rigg

Subject: Re: 2.6.14-rc3-rt10 crashes on boot

On Sunday October 9 2005 John Rigg wrote:
>Steve, thanks for these patches. I got it to compile with 2.6.14-rc3-rt12
>but had to change the assembly lines in (patched) latency.c to
>
> __asm__ __volatile__("and %%rsp,%0" :
> "=r" (stack_left) : "0" (THREAD_SIZE - 1));
>
>i.e. `and' instead of `andl' and `%%rsp' instead of `%%esp'.
>Somebody who understands x86_64 assembly better than I do should probably check
>this before anyone tries using it.
>While I was at it I changed a printk arg in line 335 of (patched)
>latency.c - I think the last %d should be %ld, i.e.
>
>printk("| new stack-footprint maximum: %s/%d, %ld bytes (out of %ld bytes).\n",
> worst_stack_comm, worst_stack_pid, MAX_STACK-worst_stack_left, MAX_STACK);

Ingo, thanks to help from Steve Rostedt, I got 2.6.14-rc3-rt12 to compile
with CONFIG_DEBUG_STACKOVERFLOW=y on x86_64 SMP. Unfortunately, if I enable
it along with latency tracing (which is causing the crash during boot)
it crashes so early that I can't get anything from the serial console,
even using earlyprintk. All I get is a blank screen for a few seconds
then the machine reboots.
I have all other debugging options disabled (apart from necessary dependencies
for these two). This of course means that I can't confirm whether the crash
is caused by stack overflow.
With latency tracing disabled but CONFIG_DEBUG_STACKOVERFLOW=y the kernel
boots and runs fine.
BTW this is still with initrd. If the stack footprint is likely to
be smaller with initramfs I'll give it a try, but it'll be a few days
before I can set this up (I still have to work out how to use initramfs).

John

2005-10-11 10:54:49

by Ingo Molnar

Subject: Re: 2.6.14-rc3-rt10 crashes on boot


* Steven Rostedt wrote:

> > I wonder if DEBUG_STACKOVERFLOW was left out of x86_64 for this reason.
>
> Here's an addon patch to my last one. I don't know x86_64 very well,
> but I believe the asm is pretty much the same, so this patch
> removes the check for __i386__ and also defines STACK_WARN.

This won't work on x86_64, but I've now implemented this in my tree and it
should work fine in -rc4-rt1.

Ingo

2005-10-11 11:17:20

by Ingo Molnar

Subject: Re: 2.6.14-rc3-rt10 crashes on boot


* John Rigg wrote:

> Ingo, thanks to help from Steve Rostedt I got 2.6.14-rc3-rt12 to
> compile with CONFIG_DEBUG_STACKOVERFLOW=y on x86_64 SMP. Unfortunately
> if I enable it along with latency tracing (which is causing the crash
> during boot) it crashes so early that I can't get anything from the
> serial console, even using earlyprintk.

I fixed this in -rc4-rt1. Could you give it a try?

Ingo