2005-05-10 12:35:03

by kus Kusche Klaus

Subject: Real-Time Preemption: BUG initializing kgdb

I tried to merge the kgdb and the rt patches (not too difficult, only
three rejects, and they all look trivial). The resulting kernel
compiles, boots, and works fine.

However, kgdb initialization generates a BUG, and kgdb does not work at
all (gdb on the host is unable to connect to the target, and Alt-SysRq-g
has no effect).

The bug:

May 10 15:56:46 OF455 kern.notice kernel: Linux version 2.6.12-rc3-RT-V0.7.46-02 (kk@silver) (gcc version 3.4.3) #3 Tue May 10 13:41:59 CEST 2005

...

May 10 15:56:46 OF455 kern.warn kernel: kgdb <20030915.1651.33> : port =2e8, IRQ=5, divisor =1
May 10 15:56:46 OF455 kern.err kernel: BUG: scheduling while atomic: swapper/0x00000001/1
May 10 15:56:46 OF455 kern.warn kernel: caller is schedule+0x102/0x110
May 10 15:56:46 OF455 kern.warn kernel: [<c0102f29>] dump_stack+0x17/0x19 (12)
May 10 15:56:46 OF455 kern.warn kernel: [<c025de85>] __schedule+0xa5/0x633 (88)
May 10 15:56:46 OF455 kern.warn kernel: [<c025e515>] schedule+0x102/0x110 (36)
May 10 15:56:46 OF455 kern.warn kernel: [<c025e671>] wait_for_completion+0x86/0xbd (84)
May 10 15:56:46 OF455 kern.warn kernel: [<c01262eb>] kthread_create+0x118/0x146 (208)
May 10 15:56:46 OF455 kern.warn kernel: [<c012be97>] start_irq_thread+0x42/0x77 (32)
May 10 15:56:46 OF455 kern.warn kernel: [<c012ba57>] setup_irq+0x51/0x12f (28)
May 10 15:56:46 OF455 kern.warn kernel: [<c012bc9e>] request_irq+0x86/0x9f (24)
May 10 15:56:46 OF455 kern.warn kernel: [<c01a0b73>] kgdb_enable_ints_now+0x76/0xae (16)
May 10 15:56:46 OF455 kern.warn kernel: [<c02d26e7>] kgdb_enable_ints+0x38/0x3f (8)
May 10 15:56:46 OF455 kern.warn kernel: [<c02c47ff>] do_initcalls+0x62/0xb3 (32)
May 10 15:56:46 OF455 kern.warn kernel: [<c02c4871>] do_basic_setup+0x21/0x23 (8)
May 10 15:56:46 OF455 kern.warn kernel: [<c01002cd>] init+0x32/0xfd (20)
May 10 15:56:46 OF455 kern.warn kernel: [<c0100c21>] kernel_thread_helper+0x5/0xb (1055084564)

...

This is with RT-V0.7.46-02 and the kgdb patchset from -rc3-mm3.

Any hints or suggestions?

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com


2005-05-10 21:58:11

by Bill Huey

Subject: Re: Real-Time Preemption: BUG initializing kgdb

On Tue, May 10, 2005 at 02:34:54PM +0200, kus Kusche Klaus wrote:
> I tried to merge the kgdb and the rt patches (not too difficult, only
> three rejects, and they all look trivial). The resulting kernel
> compiles, boots, and works fine.
...
> Any hints or suggestions?

Revert all spinlock_t types that kgdb uses to raw_spinlock_t to get the
actual spinlock code. A compile trick matches up the right functions
with the struct definition so that changes to the kernel code are minimized.
The spinlock_t definition in the RT patch is #defined to be a blocking lock,
which is not what kgdb wants in order to be happy.

Also, make the interrupt handler setup use SA_NODELAY, or something like
that, from memory. The rest is relatively trivial.

Thanks for making the attempt. Somebody needed to do this a long time
ago. :)

bill
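
Editorial illustration of the "compile trick" mentioned above, where one lock function name is matched to the lock's struct type at compile time. This is a minimal user-space sketch, not code from the RT patch: it uses C11 `_Generic` as a stand-in for whatever mechanism the patch uses, and the struct layouts, `raw_lock_impl`, `blocking_lock_impl`, `last_path`, and `demo` are all hypothetical names introduced here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the kernel's definitions. */
typedef struct { int held; } raw_spinlock_t; /* models a busy-waiting lock */
typedef struct { int held; } spinlock_t;     /* models the RT blocking lock */

enum lock_path { PATH_NONE, PATH_SPIN, PATH_BLOCK };
static enum lock_path last_path = PATH_NONE; /* records which path ran */

static void raw_lock_impl(raw_spinlock_t *l)
{
    l->held = 1;            /* a real lock would spin here, never sleep */
    last_path = PATH_SPIN;
}

static void blocking_lock_impl(spinlock_t *l)
{
    l->held = 1;            /* a real lock could sleep here */
    last_path = PATH_BLOCK;
}

/* One spin_lock() name; the argument's static type picks the function. */
#define spin_lock(l) _Generic((l),           \
        raw_spinlock_t *: raw_lock_impl,     \
        spinlock_t *: blocking_lock_impl)(l)

/* Only the declarations differ; the call sites are identical. */
static void demo(void)
{
    raw_spinlock_t debugger_lock = { 0 };
    spinlock_t driver_lock = { 0 };

    spin_lock(&debugger_lock);
    printf("debugger_lock: %s\n", last_path == PATH_SPIN ? "spin" : "block");
    spin_lock(&driver_lock);
    printf("driver_lock: %s\n", last_path == PATH_SPIN ? "spin" : "block");
}
```

In this model, switching a lock from blocking to spinning means changing its declared type only, which is why such a scheme keeps changes to the calling code small.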

2005-05-11 14:43:18

by kus Kusche Klaus

Subject: RE: Real-Time Preemption: BUG initializing kgdb

> On Tue, May 10, 2005 at 02:34:54PM +0200, kus Kusche Klaus wrote:
> > I tried to merge the kgdb and the rt patches (not too difficult, only
> > three rejects, and they all look trivial). The resulting kernel
> > compiles, boots, and works fine.
> ...
> > Any hints or suggestions?
>
> Revert all spinlock_t types that kgdb uses to raw_spinlock_t to get the
> actual spinlock code. A compile trick matches up the right functions
> with the struct definition so that changes to the kernel code are minimized.
> The spinlock_t definition in the RT patch is #defined to be a blocking lock,
> which is not what kgdb wants in order to be happy.
>
> Also, make the interrupt handler setup use SA_NODELAY, or something like
> that, from memory. The rest is relatively trivial.
>
> Thanks for making the attempt. Somebody needed to do this a long time
> ago. :)
>
> bill

Your hints helped a lot, thanks.

I changed three spinlocks (ts_spin, uart_interrupt_lock, one_at_atime)
and modified the kgdb serial interrupt handler to register with
SA_NODELAY (no need to change more locks, as I'm on a non-SMP target).

These changes resulted in a kernel which compiles and works fine, they
cured the BUG I reported yesterday, and they made kgdb "basically work":
I can connect over serial line or over ethernet, I can get "where"s and
variables etc., I can "cont", ...

However, there are still some issues:

* When debugging over ethernet, the kernel produces the following
messages in an infinite loop at full speed as long as it is halted by
gdb:

May 11 16:13:02 OF455 kern.warn kernel: ksoftirqd/0/2: BUG in netpoll_poll at net/core/netpoll.c:157
May 11 16:13:02 OF455 kern.warn kernel: [<c0102f29>] dump_stack+0x17/0x19 (12)
May 11 16:13:02 OF455 kern.warn kernel: [<c0224ac8>] netpoll_poll+0xd0/0xed (36)
May 11 16:13:02 OF455 kern.warn kernel: [<c01da116>] eth_getDebugChar+0x16/0x41 (8)
May 11 16:13:02 OF455 kern.warn kernel: [<c010c87e>] getDebugChar+0x18/0x1a (8)
May 11 16:13:02 OF455 kern.warn kernel: [<c010cba2>] putpacket+0x184/0x198 (80)
May 11 16:13:02 OF455 kern.warn kernel: [<c010e06b>] kgdb_handle_exception+0xbfa/0xd0c (184)
May 11 16:13:02 OF455 kern.warn kernel: [<c01034a5>] do_int3+0x2c/0x9b (64)
May 11 16:13:02 OF455 kern.warn kernel: [<c0102ca6>] int3+0x1e/0x2c (60)
May 11 16:13:02 OF455 kern.warn kernel: [<c021c900>] net_rx_action+0x14e/0x188 (28)
May 11 16:13:02 OF455 kern.warn kernel: [<c0118806>] ___do_softirq+0x42/0xc8 (44)
May 11 16:13:02 OF455 kern.warn kernel: [<c011890e>] _do_softirq+0x19/0x1c (8)
May 11 16:13:02 OF455 kern.warn kernel: [<c0118c1e>] ksoftirqd+0x8c/0xd9 (28)
May 11 16:13:02 OF455 kern.warn kernel: [<c012614e>] kthread+0x7b/0xab (32)
May 11 16:13:02 OF455 kern.warn kernel: [<c0100c21>] kernel_thread_helper+0x5/0xb (1055072276)

(this is a WARN_ON_RT(irqs_disabled()) in netpoll.c)

As soon as I "cont", the messages stop, and the kernel works fine. As
soon as I hit a breakpoint, these messages start again.

* When debugging over the ethernet, sooner or later all non-gdb traffic
fails: Although remote kgdb still works fine, the host and the target
can't even ping each other. The target even sends arp requests for the
host (which are answered, but it doesn't help).

This is a problem for me, because the targets usually have neither
keyboards nor screens; ssh is the only way in.

* Alt-SysRq-g is silently ignored when typed on the target's keyboard.
However, "echo g > /proc/sysrq-trigger" works fine.

Many thanks in advance for any help!

--
Klaus Kusche (Software Development - Control Systems)
KEBA AG Gewerbepark Urfahr, A-4041 Linz, Austria (Europe)
Tel: +43 / 732 / 7090-3120 Fax: +43 / 732 / 7090-6301
E-Mail: [email protected] WWW: http://www.keba.com

2005-05-11 23:16:52

by Bill Huey

Subject: Re: Real-Time Preemption: BUG initializing kgdb

On Wed, May 11, 2005 at 04:41:16PM +0200, kus Kusche Klaus wrote:
> These changes resulted in a kernel which compiles and works fine, they
> cured the BUG I reported yesterday, and they made kgdb "basically work":
> I can connect over serial line or over ethernet, I can get "where"s and
> variables etc., I can "cont", ...
>
> However, there are still some issues:
>
> * When debugging over ethernet, the kernel produces the following
> messages in an infinite loop at full speed as long as it is halted by
> gdb:

You'll have to survey the lock graph and make sure that all locks taken
beneath the reverted spinlocks are also atomic locks. You can't sleep within
an atomic critical section; doing so creates a deadlock situation. I suspect
that those warnings are related to that in one way or another.

That means any use of the serial or ethernet systems must have its
locks reverted to atomic locks as well. However, this makes those places
non-preemptible, and you'll have to be careful about this process so that
you don't defeat latency performance with these changes.

bill
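
Editorial illustration of the lock-graph rule described above: a lock that may sleep must not be taken while an atomic (spinning) lock is held. This is a minimal user-space model under stated assumptions, not kernel code. The types `atomic_lock_t` and `sleeping_lock_t`, the counters, and both helper functions are hypothetical names introduced here, and the violation counter only loosely mirrors the kernel's "scheduling while atomic" check.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; names do not come from the kernel or the RT patch. */
typedef struct { bool held; } atomic_lock_t;   /* spinning, non-preemptible */
typedef struct { bool held; } sleeping_lock_t; /* may block the caller */

static int atomic_depth; /* number of atomic locks currently held */
static int violations;   /* sleeping locks taken while atomic */

static void atomic_lock(atomic_lock_t *l)   { l->held = true;  atomic_depth++; }
static void atomic_unlock(atomic_lock_t *l) { l->held = false; atomic_depth--; }

static void sleeping_lock(sleeping_lock_t *l)
{
    if (atomic_depth > 0)
        violations++;    /* analogous to "scheduling while atomic" */
    l->held = true;
}

static void sleeping_unlock(sleeping_lock_t *l) { l->held = false; }

/* Inner lock still sleeping: returns the number of new violations. */
static int nest_sleeping_under_atomic(void)
{
    atomic_lock_t outer = { false };
    sleeping_lock_t inner = { false };
    int before = violations;

    atomic_lock(&outer);
    sleeping_lock(&inner);   /* broken nesting */
    sleeping_unlock(&inner);
    atomic_unlock(&outer);
    return violations - before;
}

/* Inner lock converted to atomic as well: no new violations expected. */
static int nest_atomic_under_atomic(void)
{
    atomic_lock_t outer = { false };
    atomic_lock_t inner = { false };
    int before = violations;

    atomic_lock(&outer);
    atomic_lock(&inner);     /* consistent nesting */
    atomic_unlock(&inner);
    atomic_unlock(&outer);
    return violations - before;
}
```

The second helper also shows the cost mentioned in the message: with both locks atomic, the whole nested region is non-preemptible in this model.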