2002-11-05 04:20:32

by Zwane Mwaikambo

[permalink] [raw]
Subject: 2.5.45 odd deref in serial_in

The only modifications to this code are a slightly hacked up nmi watchdog
timer.

Unable to handle kernel paging request at virtual address 81acc58
printing eip:
c023b42f
*pde = 00000000
Oops: 0000

CPU: 0
EIP: 0060:[<c023b42f>] Not tainted
EFLAGS: 00010293
EIP is at serial_in+0x1f/0x60
eax: 00000000 ebx: 81acc5f0 ecx: 00000000 edx: 00000005
esi: 000026e9 edi: c065a6e0 ebp: c06242ae esp: c1471ebc
ds: 0068 es: 0068 ss: 0068
Process swapper (pid: 1, threadinfo=c1470000 task=c7f9e040)
Stack: 00000020 c023d9d8 c065a6e0 00000005 00000008 00000000 0000001d c05c0
0000001d c06242a6 00000b23 c0121459 c05c0880 c06242a6 0000001d 00000
00000b43 00000296 c0121541 00000b26 00000b43 00000006 00000b43 c1470
Call Trace:
[<c023d9d8>] serial8250_console_write+0x68/0x1f0
[<c0121459>] __call_console_drivers+0x49/0x50
[<c0121541>] call_console_drivers+0x71/0x100
[<c012196d>] release_console_sem+0xbd/0x170
[<c01217cc>] printk+0x18c/0x220
[<c01170d5>] nmi_add_task+0xc5/0xe0
[<c01177e0>] nmi_watchdog_tick+0x0/0x120
[<c01050a1>] init+0x71/0x1a0
[<c0105030>] init+0x0/0x1a0
[<c0105030>] init+0x0/0x1a0
[<c0105030>] init+0x0/0x1a0
[<c01072b5>] kernel_thread_helper+0x5/0x10

Code: 8b 43 08 01 c2 ec 25 ff 00 00 00 5b c3 8d 74 26 00 68 ad 00
<0>Kernel panic: Attempted to kill init!

(gdb) list *serial_in+0x1f
0xc023b42f is in serial_in (include/asm/io.h:371).
366 } \
367 static inline void ins##bwl(int port, void *addr, unsigned long count) { \
368 __asm__ __volatile__("rep; ins" #bwl : "+D"(addr), "+c"(count) : "d"(port)); \
369 }
370
371 BUILDIO(b,b,char)
372 BUILDIO(w,w,short)
373 BUILDIO(l,,int)
374
375 #endif

0xc023b428 <serial_in+24>: je 0xc023b461 <serial_in+81>
0xc023b42a <serial_in+26>: cmp $0x2,%eax
0xc023b42d <serial_in+29>: je 0xc023b440 <serial_in+48>
0xc023b42f <serial_in+31>: mov 0x8(%ebx),%eax
0xc023b432 <serial_in+34>: add %eax,%edx
0xc023b434 <serial_in+36>: in (%dx),%al

eax: 00000000 ebx: 81acc5f0 ecx: 00000000 edx: 00000005

...
default:
return inb(up->port.iobase + offset); <--
}
}

--
function.linuxpower.ca


2002-11-05 05:31:37

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Mon, 4 Nov 2002, Zwane Mwaikambo wrote:

> The only modifications to this code are a slightly hacked up nmi watchdog
> timer.

Not that easy to reproduce, its probably the printk in my nmi handler.
Sorry, you can ignore this bugreport.

Zwane
--
function.linuxpower.ca

2002-11-05 08:56:24

by Russell King

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Mon, Nov 04, 2002 at 11:27:28PM -0500, Zwane Mwaikambo wrote:
> The only modifications to this code are a slightly hacked up nmi watchdog
> timer.

Well, Andi Kleen reported that serial_in appears in his boot time
profile after a boot.

Depending on the baud rate used, serial console printk spends time
spinning waiting for the serial port to send characters. The slower
the baud rate, obviously the longer it spins.

Andi tried to suggest ways to make serial console asynchronous so we
didn't have to spin, but any solution in that direction means that we:

a) have to rely on interrupts running
b) have to buffer the data somewhere, which may possibly fill up and
then what do we do with the printk message

Bear in mind that dropping random printk messages because we've filled
a buffer isn't acceptable. Also note that the behaviour in this area
hasn't changed since 2.4 times.

Obviously, the way to reduce the time spent writing console messages to
the serial port is to increase the baud rate. 8)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-11-05 09:12:33

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Tue, 5 Nov 2002, Russell King wrote:

> a) have to rely on interrupts running

Not really ;) If i'm successful i may get this working albeit only for
specific boxes. Although we'd need an IDT at least, but interrupts may
remain masked...

> b) have to buffer the data somewhere, which may possibly fill up and
> then what do we do with the printk message
>
> Bear in mind that dropping random printk messages because we've filled
> a buffer isn't acceptable. Also note that the behaviour in this area
> hasn't changed since 2.4 times.
>
> Obviously, the way to reduce the time spent writing console messages to
> the serial port is to increase the baud rate. 8)

I'm runnning 115200 :P It looks like a race however because i don't always
trigger it, but when i do the trace is always the same. However i'm not
going to make you run circles for my potentially dodgy code.

Cheers,
Zwane
--
function.linuxpower.ca

2002-11-05 10:09:54

by Russell King

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Tue, Nov 05, 2002 at 04:20:00AM -0500, Zwane Mwaikambo wrote:
> I'm runnning 115200 :P It looks like a race however because i don't always
> trigger it, but when i do the trace is always the same. However i'm not
> going to make you run circles for my potentially dodgy code.

The figures are actually rather horrifing. There are a couple of
messages I have in my boot log which are rather long - 137 chars
and 79 chars.

Even at 115200 baud is one character every 87us. This gives:

137 characters: 12ms
79 characters: 7ms

At these types of figures, x86 will drop 1000Hz interrupts like
crazy when writing console messages via the serial port (because
interrupts are turned off.)

With these figures, the longest message we can write at 115200
baud and not drop any timer ticks is 11 characters. Not many
kernel messages are less than 12 characters.

Now I'll go back and look at your original email... it was for a
slightly different problem. 8)

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-11-05 10:14:22

by Russell King

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Mon, Nov 04, 2002 at 11:27:28PM -0500, Zwane Mwaikambo wrote:
> 0xc023b428 <serial_in+24>: je 0xc023b461 <serial_in+81>
> 0xc023b42a <serial_in+26>: cmp $0x2,%eax
> 0xc023b42d <serial_in+29>: je 0xc023b440 <serial_in+48>
> 0xc023b42f <serial_in+31>: mov 0x8(%ebx),%eax
> 0xc023b432 <serial_in+34>: add %eax,%edx
> 0xc023b434 <serial_in+36>: in (%dx),%al
>
> eax: 00000000 ebx: 81acc5f0 ecx: 00000000 edx: 00000005
>
> ...
> default:
> return inb(up->port.iobase + offset); <--
> }

Ok, if I'm reading this correctly:

offset = %edx
up->port.iobase = 0x8(%ebx)
up = %ebx

To get to this return statement, we would have had to execute:

static _INLINE_ unsigned int serial_in(struct uart_8250_port *up, int offset)
{
offset <<= up->port.regshift;

switch (up->port.iotype) {

which also dereferences "up". So something may have corrupted %ebx
between executing that switch statement and executing the inb().

Could the NMI handler be corrupting %ebx ?

--
Russell King ([email protected]) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html

2002-11-05 14:22:23

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Tue, 5 Nov 2002, Russell King wrote:

> static _INLINE_ unsigned int serial_in(struct uart_8250_port *up, int offset)
> {
> offset <<= up->port.regshift;
>
> switch (up->port.iotype) {
>
> which also dereferences "up". So something may have corrupted %ebx
> between executing that switch statement and executing the inb().
>
> Could the NMI handler be corrupting %ebx ?

Yes that is highly likely, my current NMI handler does a fair amount of
work, however it chains from the current default nmi handler, i thought
register saving was already taken care of otherwise wouldn't i have all
sorts of other strange bugs creeping up?

Zwane
--
function.linuxpower.ca

2002-11-06 02:15:26

by Philippe Elie

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

Zwane Mwaikambo wrote:
> On Tue, 5 Nov 2002, Russell King wrote:
>
>
>>static _INLINE_ unsigned int serial_in(struct uart_8250_port *up, int offset)
>>{
>> offset <<= up->port.regshift;
>>
>> switch (up->port.iotype) {
>>
>>which also dereferences "up". So something may have corrupted %ebx
>>between executing that switch statement and executing the inb().
>>
>>Could the NMI handler be corrupting %ebx ?

ok but

> [<c023d9d8>] serial8250_console_write+0x68/0x1f0
> [<c0121459>] __call_console_drivers+0x49/0x50
> [<c0121541>] call_console_drivers+0x71/0x100
> [<c012196d>] release_console_sem+0xbd/0x170
> [<c01217cc>] printk+0x18c/0x220
> [<c01170d5>] nmi_add_task+0xc5/0xe0
> [<c01177e0>] nmi_watchdog_tick+0x0/0x120

the oops occur during a NMI so I wonder how a NMI
can occur and clobber ebx

--
Philippe Elie

2002-11-06 08:29:52

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Wed, 6 Nov 2002, Philippe Elie wrote:

> > [<c023d9d8>] serial8250_console_write+0x68/0x1f0
> > [<c0121459>] __call_console_drivers+0x49/0x50
> > [<c0121541>] call_console_drivers+0x71/0x100
> > [<c012196d>] release_console_sem+0xbd/0x170
> > [<c01217cc>] printk+0x18c/0x220
> > [<c01170d5>] nmi_add_task+0xc5/0xe0
> > [<c01177e0>] nmi_watchdog_tick+0x0/0x120
>
> the oops occur during a NMI so I wonder how a NMI
> can occur and clobber ebx

Actually it didn't occur during the NMI, it just appears to be that way
(ie don't consider the nmi_watchdog_tick ;) nmi_add_task is a registration
function and is called from normal kernel context.

Zwane
--
function.linuxpower.ca

2002-11-07 15:10:48

by David Woodhouse

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in


[email protected] said:
> I'm runnning 115200 :P

Why so slow? Most current chipsets can do at least 230400, usually 460800.

--
dwmw2


2002-11-07 15:18:27

by Zwane Mwaikambo

[permalink] [raw]
Subject: Re: 2.5.45 odd deref in serial_in

On Thu, 7 Nov 2002, David Woodhouse wrote:

> [email protected] said:
> > I'm runnning 115200 :P
>
> Why so slow? Most current chipsets can do at least 230400, usually 460800.

Its an offboard card (netmos), i'll try jacking it up a bit and see if it
holds up.

Zwane
--
function.linuxpower.ca