2006-05-31 04:45:33

by Piet Delaney

Subject: Re: linux-2.6 x86_64 kgdb issue

On Thu, 2006-05-25 at 12:07 +0530, Amit S. Kale wrote:
> On Wednesday 24 May 2006 23:41, Vladimir A. Barinov wrote:
> > Amit S. Kale wrote:
> > >Looking at this again:
> > >Call Trace: {kgdb_notify}
> > > {notifier_call_chain}
> > > {do_stack_segment}
> > > {stack_segment}
> > > {io_outb}
> > > {kgdb_mem2hex}
> > >
> > >Why is io_outb being called from kgdb_mem2hex? kgdb_mem2hex refers to data
> > >directly and not through io_outb.
> > >
> > >Perhaps it's got something to do with the iommu feature. Have you used "iommu"
> > > on the kernel command line?
> >
> > I have iommu switched on, but disabling it led to the same dump_stack
> > result.
>
> This is confusing. Could you shed some light on this?
>
> > But I've used an earlier version of kgdb_8250.c, and io_outb() was a
> > callback similar to kgdb_oiwrite() in kgdb-2.6.16.tar.bz2.
> > I merged the kgdb_8250.c from the kgdb-2.6.16.tar.bz2 and got the
> > dump_stack:
> > {notifier_call_chain}
> > {do_stack_segment}
> > {stack_segment}
> > {kgdb_mem2hex}
> > {kgdb_mem2hex}
> >
> >
> > Also, I've seen this stack exception behavior with 2.6.10. In 2.6.16
> > patched with kgdb-2.6.16.tar.bz2, the behavior is that the target
> > reboots after several steps and a final "continue" command.
>
> Now it looks like we have a stack overflow. It would result in a stack
> exception. Stack overflow usually results in a complete breakdown of a kernel
> since there is no stack to handle the stack exception itself. Processors up to
> the Pentium used to float their buses, which would be detected by the surrounding
> hardware and cause a reset. I am not sure whether modern processors and/or
> hardware do that.
>
> >
> > Just want to note that in 2.6.10 kernel the stack exception doesn't
> > occur if CONFIG_64BIT in linux/kernel/kgdb.c is not defined.
>
> CONFIG_64BIT probably requires more stack. That's why you see a stack
> exception.

I added some debug info to the thread and got stack overflows; it was
trivial to double the size of the stack. I was saving a backtrace of
the stack at each preemption point (e.g., a spinlock) to allow
me to see the context of the active holders of spinlocks. I configured
it with CONFIG_DEBUG_PREEMPT_AUDIT and enabled large stacks in:
------------------------------------------------------------------------
include/asm-i386/thread_info.h:
------------------------------------------------------------------------
#ifdef CONFIG_DEBUG_PREEMPT_AUDIT
#define THREAD_SIZE (8192 * 2)
#else
#ifdef CONFIG_4KSTACKS
#define THREAD_SIZE (4096)
#else
#define THREAD_SIZE (8192)
#endif
#endif
-------------------------------------------------------------------------
All you really need to do is change THREAD_SIZE from
(8192) to (8192 * 2). I didn't have any problems on i386.



>
> Unfortunately x86_64 architecture doesn't provide any stack overflow debugging
> mechanism. Perhaps you can implement a little code in kgdb_handle_exception,
> which checks whether we are beyond 7168 bytes of stack usage on entry. If we
> are, declare a panic indicating a possible stack overflow later.

I was getting stack overflows on the SPARC architecture when compiling
the kernel with -O1 for kgdb/kdbx debugging. I allocated a hot physical page
for each CPU as it was brought on line and then mapped it on the fly
when we got a stack overflow. I then pushed out the register window that
caused the trap, and then continued with the normal panic path.


Perhaps we should add a kgdb config menu option and #define
CONFIG_16KSTACKS to double the stack size so the kernel can be
debugged with more context available. I'm currently using -O0 for
the networking stack and -O1 for the rest of the kernel. Sounds like
it would be helpful now for AMD64 targets.

-piet

>
> -Amit
>
> >
> > Vladimir
> >
> > >-Amit
> > >
> > >On Friday 19 May 2006 23:45, Vladimir A. Barinov wrote:
> > >>Hi All,
> > >>
> > > >>I'm working with an EM64T dual Xeon board and have a problem with kgdb
> > > >>when SMP is on.
> > > >>During step-by-step debugging I got an error message and the gdb server
> > > >>lost its connection to the target (gdb log is attached).
> > >>
> > >>Putting simple printk() and dump_stack() into the kgdb_notify():
> > >> .....
> > > >> if (cmd == DIE_TRAP) {
> > > >>     printk("DIE_TRAP, args->str=%s, kgdb_may_fault=%d\n",
> > > >>            args->str, kgdb_may_fault);
> > > >>     dump_stack();
> > > >> }
> > >> ......
> > >>
> > >>I've got trace:
> > >>DIE_TRAP, args->str=stack segment, kgdb_may_fault=1
> > >>Call Trace: {kgdb_notify}
> > >> {notifier_call_chain}
> > >> {do_stack_segment}
> > >> {stack_segment}
> > >> {io_outb}
> > >> {kgdb_mem2hex}
> > >>
> > > >>The stack exception always occurs at the same step during debugging in
> > > >>kgdb_mem2hex().
> > > >>I've attached a patch that fixes this issue. Could you please review
> > > >>whether this patch is appropriate for the problem?
> > >>
> > >>Vladimir
>
>
> _______________________________________________
> Kgdb-bugreport mailing list
> https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport


2006-05-31 05:50:51

by Andi Kleen

Subject: Re: linux-2.6 x86_64 kgdb issue


> Perhaps we should add a kgdb config menu option and #define
> CONFIG_16KSTACKS to double the stack size so the kernel can be
> debugged with more context available. I'm currently using -O0 for
> the networking stack and -O1 for the rest of the kernel. Sounds like
> it would be helpful now for AMD64 targets.

You only got stack overflows when working with kgdb right?
Sounds like a kgdb problem to me then that can and should be probably fixed. e.g.
afaik kgdb isn't reentrant anyways so it could just use static buffers.

I would suggest against adding any new config options for this - it would
conflict with the great goal of having loadable debuggers that work
on any kernels.

-Andi

2006-05-31 06:46:32

by Piet Delaney

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wed, 2006-05-31 at 07:50 +0200, Andi Kleen wrote:
> > Perhaps we should add a kgdb config menu option and #define
> > CONFIG_16KSTACKS to double the stack size so the kernel can be
> > debugged with more context available. I'm currently using -O0 for
> > the networking stack and -O1 for the rest of the kernel. Sounds like
> > it would be helpful now for AMD64 targets.
>
> You only got stack overflows when working with kgdb right?

Yes, I was using kgdb to look at the stack audits I stored in
the thread structure.

>
> Sounds like a kgdb problem to me then that can and should be probably fixed. e.g.
> afaik kgdb isn't reentrant anyways so it could just use static buffers.

On Solaris the problem was that the kernel stack was larger because tail-call
optimization was disabled when optimization was turned off. I'm not having
a kgdb problem with i386; it seems that Vladimir is/was, and Amit
suspected it was due to AMD64 requiring larger stacks. Seems
plausible to me. I thought you might have thoughts on that.

>
> I would suggest against adding any new config options for this - it would
> conflict with the great goal of having loadable debuggers that work
> on any kernels.

What's the conflict with larger kernel stacks and a loadable (kgdb)
module? Like Andrew Morton, I prefer to avoid using modules when using
kgdb, so I wouldn't have run into a problem.

I was suggesting larger stack space for the kernel, not kgdb. I agree
this case might be one where kgdb has caused the kernel to trip over
the edge. I don't feel comfortable running on a kernel that is running
that close to running out of stack space. Maybe that's just a personal
preference; I'm paranoid, I guess. I like having rock-solid systems, and
whacking the stack isn't always detected. On SunOS we had a REDZONE, but
last I checked Linux didn't; has one been added? If it hasn't, it might
be good to keep in mind having a CPU specific physical page available
when we grow into the REDZONE. Looked to me like the stack grows right
into the thread structure; might make a nice exploit for a linux root
kit.

Having loadable debuggers seems like high hopes, as 'we' haven't even
released Linux with kgdb built into the Linux src yet.

-piet


> -Andi
>

2006-05-31 07:14:13

by Andi Kleen

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wednesday 31 May 2006 08:46, Piet Delaney wrote:
> On Wed, 2006-05-31 at 07:50 +0200, Andi Kleen wrote:
> > > Perhaps we should add a kgdb config menu option and #define
> > > CONFIG_16KSTACKS to double the stack size so the kernel can be
> > > debugged with more context available. I'm currently using -O0 for
> > > the networking stack and -O1 for the rest of the kernel. Sounds like
> > > it would be helpful now for AMD64 targets.
> >
> > You only got stack overflows when working with kgdb right?
>
> Yes, I was using kgdb to look at the stack audits I stored in
> the thread structure.

Again this likely points to kgdb using too much stack.


> > Sounds like a kgdb problem to me then that can and should be probably fixed. e.g.
> > afaik kgdb isn't reentrant anyways so it could just use static buffers.
>
> On Solaris the problem was that the kernel stack was larger because tail-call
> optimization was disabled when optimization was turned off. I'm not having
> a kgdb problem with i386; it seems that Vladimir is/was, and Amit
> suspected it was due to AMD64 requiring larger stacks. Seems
> plausible to me. I thought you might have thoughts on that.

Stack usage in Linux isn't that tight that it should make a difference.
If something needs too much stack we just fix it.

> >
> > I would suggest against adding any new config options for this - it would
> > conflict with the great goal of having loadable debuggers that work
> > on any kernels.
>
> What's the conflict with larger kernel stacks and a loadable (kgdb)
> module?

We won't increase the stacks by default, but the debugger should
work anyway.

> I was suggesting larger stack space for the kernel, not kgdb. I agree
> this case might be one where kgdb has caused the kernel to trip over
> the edge. I don't feel comfortable running on a kernel that is running
> that close to running out of stack space. Maybe that's just a personal
> preference; I'm paranoid, I guess. I like having rock-solid systems, and
> whacking the stack isn't always detected. On SunOS we had a REDZONE, but
> last I checked Linux didn't; has one been added?

Interrupts can check for too much stack space used.

> If it hasn't it might
> be good to keep in mind having a CPU specific physical page available
> when we grow into the REDZONE. Looked to me like the stack grows right
> into the thread structure; might make a nice exploit for a linux root
> kit.

If you have a stack overflow you usually have other problems than that.

> Having loadable debuggers seems like high hopes, as 'we' haven't even
> released Linux with kgdb built into the Linux src yet.

Yes, because if modular works you don't need to build it in.

Modular was working at some point on x86-64 for kdb and the original 2.6 version
of kgdb was nearly there too.

-Andi

2006-05-31 08:39:15

by Piet Delaney

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wed, 2006-05-31 at 09:13 +0200, Andi Kleen wrote:
> On Wednesday 31 May 2006 08:46, Piet Delaney wrote:
> > On Wed, 2006-05-31 at 07:50 +0200, Andi Kleen wrote:
> > > > Perhaps we should add a kgdb config menu option and #define
> > > > CONFIG_16KSTACKS to double the stack size so the kernel can be
> > > > debugged with more context available. I'm currently using -O0 for
> > > > the networking stack and -O1 for the rest of the kernel. Sounds like
> > > > it would be helpful now for AMD64 targets.
> > >
> > > You only got stack overflows when working with kgdb right?
> >
> > Yes, I was using kgdb to look at the stack audits I stored in
> > the thread structure.
>
> Again this likely points to kgdb using too much stack.

My bet is that in this case I was storing a LOT of
data in the thread structure, so the space left for
the stack was massively reduced.

I was using a debugging tool for taking snapshots of the
top 16 PCs on the stack when you take a spinlock, for
each preemption level. It's thread-specific data, so the
thread structure seemed like a reasonable place to store it.

>
>
> > > Sounds like a kgdb problem to me then that can and should be probably fixed. e.g.
> > > afaik kgdb isn't reentrant anyways so it could just use static buffers.
> >
> > On Solaris the problem was that the kernel stack was larger because tail-call
> > optimization was disabled when optimization was turned off. I'm not having
> > a kgdb problem with i386; it seems that Vladimir is/was, and Amit
> > suspected it was due to AMD64 requiring larger stacks. Seems
> > plausible to me. I thought you might have thoughts on that.
>
> Stack usage in Linux isn't that tight that it should make a difference.
> If something needs too much stack we just fix it.
>
> > >
> > > I would suggest against adding any new config options for this - it would
> > > conflict with the great goal of having loadable debuggers that work
> > > on any kernels.
> >
> > What's the conflict with larger kernel stacks and a loadable (kgdb)
> > module?
>
> We won't increase the stacks by default, but the debugger should
> work anyway.

Sure but the debugger environment must tolerate larger stacks.
A developer may prefer to use a larger stack, like in my case
of storing debug info in the thread structure. The interrupt
stack checks can easily miss nicking the thread structure, so
increasing the stack size for experimentation SHOULD always work.


>
> > I was suggesting larger stack space for the kernel, not kgdb. I agree
> > this case might be one where kgdb has caused the kernel to trip over
> > the edge. I don't feel comfortable running on a kernel that is running
> > that close to running out of stack space. Maybe that's just a personal
> > preference; I'm paranoid, I guess. I like having rock-solid systems, and
> > whacking the stack isn't always detected. On SunOS we had a REDZONE, but
> > last I checked Linux didn't; has one been added?
>
> Interrupts can check for too much stack space used.

But this can miss a minor abuse. The interrupt check
is a quick and simple hack but I wonder if it's really
optimal for commercial implementations.

>
> > If it hasn't it might
> > be good to keep in mind having a CPU specific physical page available
> > when we grow into the REDZONE. Looked to me like the stack grows right
> > into the thread structure; might make a nice exploit for a linux root
> > kit.
>
> If you have a stack overflow you usually have other problems than that.

Yep, like viewing the stack with kgdb. With Solaris all tasks use
the same virtual address space for the stack, so mapping in a physical
page was easy. Linux stacks are linearly mapped 1:1 to physical pages,
so it's not easy.


>
> > Having loadable debuggers seems like high hopes, as 'we' haven't even
> > released Linux with kgdb built into the Linux src yet.
>
> Yes, because if modular works you don't need to build it in.

I think all modules should be ABLE to be built in. 2.6 has made good
progress in that regard. I'd prefer to see kgdb also ABLE to be linked
in or used as a module, just like most modules are doing now. I don't see
any advantage to the Linux community in REQUIRING that the kgdb module
NOT be ABLE to be linked in. Using modules often requires more
work to set up and get working right. It has its advantages, but I'd
prefer to leave it up to the developer to choose. Sometimes you
unload a module and pre-existing callbacks can mess things up in
unexpected ways when the size of data structures changes. I've used
kgdb on modules, but for stuff like the TCP stack I really don't
want to mess with that. I'd prefer we make it easier for users
to get set up for kernel debugging: just configure kgdb, avoid using
kernel modules, install and run. Having kgdb in the kernel source
tree doesn't preclude its use as a module. It does require it to
be under the GNU license. Unfortunately the HPUX kgdb patch wasn't
under the GNU license; it had some nice features, like kgdb over
TCP with ssh encryption. And that was 10 years ago!

>
> Modular was working at some point on x86-64 for kdb and the original 2.6 version
> of kgdb was nearly there too.

Yeah, I saw that; too bad we can't seem to get a version of kgdb into
the kernel that can be updated with the rest of the kernel. As I recall,
NetBSD, FreeBSD, and OpenBSD have always done it that way. 'We' almost
got George Anzinger's kgdb patch into Linux; it was working great in the
mm series. Seems it should all be configurable with #ifdef CONFIG_KGDB,
just like other kernel features. If you want to support debuggers other
than gdb, I suppose common trap code could use a common CONFIG_* token.

-piet

>
> -Andi
>

2006-05-31 09:41:51

by Andi Kleen

Subject: Re: linux-2.6 x86_64 kgdb issue


> My bet is that in this case I was storing a LOT of
> data in the thread structure, so the space left for
> the stack was massively reduced.

Ok so it was your bug. Don't do that.

> Sure but the debugger environment must tolerate larger stacks.

No, Linux doesn't tolerate larger stacks.

> But this can miss a minor abuse. The interrupt check
> is a quick and simple hack but I wonder if it's really
> optimal for commercial implementations.

In practice if you overwrite thread_info you crash eventually
and it's noticed. If you write below thread_info but keep
ti intact then the redzone would likely not catch it either.

I don't think an additional red zone would improve overflow detection
in a significant way.

> I think all modules should be ABLE to be built in.

If you have a working module it can be easily built in too.
Just hacks that don't work with modules are bad.

-Andi

2006-05-31 15:03:45

by Tom Rini

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wed, May 31, 2006 at 09:13:53AM +0200, Andi Kleen wrote:

[snip]
> Yes, because if modular works you don't need to build it in.
>
> Modular was working at some point on x86-64 for kdb and the original 2.6 version
> of kgdb was nearly there too.

FWIW, the only change the current version of kgdb makes that would
prevent it from being totally modular is the debugger_active check in
__might_sleep().

--
Tom Rini

2006-05-31 21:01:09

by Andi Kleen

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wednesday 31 May 2006 17:03, Tom Rini wrote:
> On Wed, May 31, 2006 at 09:13:53AM +0200, Andi Kleen wrote:
>
> [snip]
>
> > Yes, because if modular works you don't need to build it in.
> >
> > Modular was working at some point on x86-64 for kdb and the original 2.6
> > version of kgdb was nearly there too.
>
> FWIW, the only change the current version of kgdb makes that would
> prevent it from being totally modular is the debugger_active check in
>

Can you post the patch and a description?

-Andi

2006-05-31 22:35:18

by Tom Rini

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wed, May 31, 2006 at 11:01:56PM +0200, Andi Kleen wrote:
> On Wednesday 31 May 2006 17:03, Tom Rini wrote:
> > On Wed, May 31, 2006 at 09:13:53AM +0200, Andi Kleen wrote:
> >
> > [snip]
> >
> > > Yes, because if modular works you don't need to build it in.
> > >
> > > Modular was working at some point on x86-64 for kdb and the original 2.6
> > > version of kgdb was nearly there too.
> >
> > FWIW, the only change the current version of kgdb makes that would
> > prevent it from being totally modular is the debugger_active check in
>
> Can you post the patch and a description?

The change is a simple if (atomic_read(&debugger_active)) return right
at the start. And I'm embarrassed to say the change predates me on the
project so I'm not 100% sure on the lineage and it might be totally
bogus now.

--
Tom Rini

2006-05-31 22:52:32

by Piet Delaney

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wed, 2006-05-31 at 23:01 +0200, Andi Kleen wrote:
> On Wednesday 31 May 2006 17:03, Tom Rini wrote:
> > On Wed, May 31, 2006 at 09:13:53AM +0200, Andi Kleen wrote:
> >
> > [snip]
> >
> > > Yes, because if modular works you don't need to build it in.
> > >
> > > Modular was working at some point on x86-64 for kdb and the original 2.6
> > > version of kgdb was nearly there too.
> >
> > FWIW, the only change the current version of kgdb makes that would
> > prevent it from being totally modular is the debugger_active check in
> >
>
> Can you post the patch and a description?

It's maintained at SourceForge:

http://sourceforge.net/projects/kgdb

Patches can be downloaded from:

http://sourceforge.net/project/showfiles.php?group_id=5073

I suspect that you will likely want to add yourself to the
kgdb-bugreport mailing list.

https://lists.sourceforge.net/lists/listinfo/kgdb-bugreport

Mailing list is acting up; I've been having trouble posting due to
a mailserver problem; I contacted the SourceForge folks about it.

Andrew Morton has recently been taking snapshots and including them
in his mm series. It would be nice to get it "in order" and ready
to be merged into the Linux tree. When I applied the 2.6.13
patch I think I noticed some '#ifdef KGDB' code missing. I'd like
to see the patch totally disabled when KGDB is not configured. The
patch instructions are also a bit misleading, at least for 2.6.13;
I had to apply all of the patches in the series to avoid a lot of
patch rejects.

-piet

>
> -Andi


Attachments:
(No filename) (2.87 kB)
Attached message - Re: mm patches - what's the heirarchy for patches on www.kernel.org web page?

2006-05-31 23:01:32

by Andi Kleen

Subject: Re: linux-2.6 x86_64 kgdb issue

On Thursday 01 June 2006 00:35, Tom Rini wrote:
> On Wed, May 31, 2006 at 11:01:56PM +0200, Andi Kleen wrote:
> > On Wednesday 31 May 2006 17:03, Tom Rini wrote:
> > > On Wed, May 31, 2006 at 09:13:53AM +0200, Andi Kleen wrote:
> > >
> > > [snip]
> > >
> > > > Yes, because if modular works you don't need to build it in.
> > > >
> > > > Modular was working at some point on x86-64 for kdb and the original 2.6
> > > > version of kgdb was nearly there too.
> > >
> > > FWIW, the only change the current version of kgdb makes that would
> > > prevent it from being totally modular is the debugger_active check in
> >
> > Can you post the patch and a description?
>
> The change is a simple if (atomic_read(&debugger_active)) return right
> at the start. And I'm embarrassed to say the change predates me on the
> project so I'm not 100% sure on the lineage and it might be totally
> bogus now.

And why do you need it? Where does the debugger call might sleep?

-Andi

2006-05-31 23:03:47

by Piet Delaney

Subject: Re: linux-2.6 x86_64 kgdb issue

On Wed, 2006-05-31 at 15:52 -0700, Piet Delaney wrote:

> Mailing list is acting up; I've been having trouble posting due to
> a mailserver problem; I contacted the SourceForge folks about it.

Perhaps my problem was not having a SourceForge account;
I added one last evening, and it looks like my posting showed up on the
mailing list.

http://sourceforge.net/account/newuser_emailverify.php

-piet