2006-12-23 01:35:47

by Jiri Slaby

Subject: Re: moxa serial driver testing

[email protected] wrote:
> Hi Jiri,
>
> I've figured out that both old and new mxser drivers have two similar
> problems:
>
> 1. When there are data coming to a port, sometimes opening of the port
> entirely locks the box. This is quite reproducible. Any idea what's
> wrong and how can I help to debug it?

Could you test the patch below, if something changes?

---

drivers/char/mxser_new.c | 6 ++++--
1 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
index a2bca5d..c0af201 100644
--- a/drivers/char/mxser_new.c
+++ b/drivers/char/mxser_new.c
@@ -2268,6 +2268,8 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
if (bits & irqbits)
continue;
port = &brd->ports[i];
+ if (!(port->flags & ASYNC_INITIALIZED))
+ continue;

int_cnt = 0;
do {
@@ -2320,9 +2322,9 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
if (status & UART_LSR_THRE)
mxser_transmit_chars(port);
}
- } while (int_cnt++ < MXSER_ISR_PASS_LIMIT);
+ } while (int_cnt++ < 256);
}
- if (pass_counter++ > MXSER_ISR_PASS_LIMIT)
+ if (pass_counter++ > 64)
break; /* Prevent infinite loops */
}


2006-12-23 09:36:25

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri,

Jiri Slaby <[email protected]> writes:
> [email protected] wrote:
>> Hi Jiri,
>>
>> I've figured out that both old and new mxser drivers have two similar
>> problems:
>>
>> 1. When there are data coming to a port, sometimes opening of the port
>> entirely locks the box. This is quite reproducible. Any idea what's
>> wrong and how can I help to debug it?
>
> Could you test the patch below, if something changes?

Thanks for looking into it. I'll be able to get to the box with moxa
installed on Monday and will try the patch.

As for SysRq, I'm afraid it didn't work though I'm not 100% sure. I'll
check that as well.

-- Sergei.

> ---
>
> drivers/char/mxser_new.c | 6 ++++--
> 1 files changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/char/mxser_new.c b/drivers/char/mxser_new.c
> index a2bca5d..c0af201 100644
> --- a/drivers/char/mxser_new.c
> +++ b/drivers/char/mxser_new.c
> @@ -2268,6 +2268,8 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
> if (bits & irqbits)
> continue;
> port = &brd->ports[i];
> + if (!(port->flags & ASYNC_INITIALIZED))
> + continue;
>
> int_cnt = 0;
> do {
> @@ -2320,9 +2322,9 @@ static irqreturn_t mxser_interrupt(int irq, void *dev_id)
> if (status & UART_LSR_THRE)
> mxser_transmit_chars(port);
> }
> - } while (int_cnt++ < MXSER_ISR_PASS_LIMIT);
> + } while (int_cnt++ < 256);
> }
> - if (pass_counter++ > MXSER_ISR_PASS_LIMIT)
> + if (pass_counter++ > 64)
> break; /* Prevent infinite loops */
> }
>

2006-12-25 11:26:59

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

> [email protected] wrote:
>> Hi Jiri,
>>
>> I've figured out that both old and new mxser drivers have two similar
>> problems:
>>
>> 1. When there are data coming to a port, sometimes opening of the port
>> entirely locks the box. This is quite reproducible. Any idea what's
>> wrong and how can I help to debug it?
>
> Could you test the patch below, if something changes?

I'm preparing to test it. However, it seems that my version of
mxser_new.c has got out of sync with yours:

osv@osv mxser_new$ patch -p3 < lock.patch
patching file mxser_new.c
Hunk #1 succeeded at 1900 (offset -368 lines).
Hunk #2 succeeded at 1968 (offset -354 lines).
osv@osv mxser_new$ patch -p3 < rmmod.patch
patching file mxser.c
Hunk #1 succeeded at 711 with fuzz 2 (offset -6 lines).
patching file mxser_new.c
Hunk #1 succeeded at 705 with fuzz 2 (offset -1985 lines).
osv@osv mxser_new$

I'll try this one anyway, but could you please either tell me where to
get the version you are using, or just send it to me by mail?

BTW, when the system hangs, SysRq magic doesn't work anymore.

-- Sergei.

2006-12-25 13:42:11

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

> [email protected] wrote:
>> Hi Jiri,
>>
>> I've figured out that both old and new mxser drivers have two similar
>> problems:
>>
>> 1. When there are data coming to a port, sometimes opening of the port
>> entirely locks the box. This is quite reproducible. Any idea what's
>> wrong and how can I help to debug it?
>
> Could you test the patch below, if something changes?

Something did change. It has now become rather difficult to get the box to
hang, though not impossible. Another thing that changed is that now I
can see [parts of] oopses on screen. I've got two pictures of the screen
with different oopses. If you need them, let me know and I'll send them
to you in a separate mail not to pollute lkml with JPEGs.

One of these oopses happened on a kernel with lock debugging enabled, but
I can't see anything relevant in the dmesg log after resetting the box.

SysRq magic still doesn't work after a hang.

As for the problem with module unloading when a port is open, your other
patch does indeed fix it.

-- Sergei.

2006-12-25 16:17:59

by Sergei Organov

Subject: Re: moxa serial driver testing (oopses)

Jiri Slaby <[email protected]> writes:
> [email protected] wrote:
>> Hi Jiri,
>>
>> I've figured out that both old and new mxser drivers have two similar
>> problems:
>>
>> 1. When there are data coming to a port, sometimes opening of the port
>> entirely locks the box. This is quite reproducible. Any idea what's
>> wrong and how can I help to debug it?
>
> Could you test the patch below, if something changes?

In addition to my previous answer: fortunately, I was able to log oopses
to a serial console, so below are two of them.

They seem to appear anywhere from a few seconds to a few minutes after I
run:

$ while true; do cat /dev/ttyM7 > /dev/null; done

when /dev/ttyM7 is set up so that 'cat' returns immediately (due to a zero
timeout).

Note that the unpatched version always hangs in less than a second.
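
For readers who want to reproduce this without the shell loop, here is a
rough C sketch of the same open/read/close cycle. It is only an
illustration under assumptions: the helper names set_zero_timeout() and
open_drain_close() are made up for this note, and the port is assumed to be
configured for non-canonical reads with VMIN=0 and VTIME=0, which is one way
to get the "zero timeout" behaviour described above.

```c
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Hypothetical helper: make read() return at once when nothing is pending. */
void set_zero_timeout(struct termios *t)
{
    t->c_lflag &= ~(tcflag_t)(ICANON | ECHO); /* non-canonical, no echo */
    t->c_cc[VMIN] = 0;                        /* do not wait for any byte */
    t->c_cc[VTIME] = 0;                       /* no inter-byte timer */
}

/*
 * Hypothetical helper: open the device, discard whatever is pending,
 * close it again, and repeat. Returns the number of failed open() calls.
 */
int open_drain_close(const char *path, int iterations)
{
    char buf[256];
    int failures = 0;

    for (int i = 0; i < iterations; i++) {
        int fd = open(path, O_RDONLY | O_NOCTTY);
        if (fd < 0) {
            failures++;
            continue;
        }

        struct termios t;
        if (tcgetattr(fd, &t) == 0) {     /* fails with ENOTTY on non-ttys */
            set_zero_timeout(&t);
            tcsetattr(fd, TCSANOW, &t);
        }

        while (read(fd, buf, sizeof buf) > 0)
            ;                             /* discard, like "cat > /dev/null" */
        close(fd);
    }
    return failures;
}
```

Calling open_drain_close("/dev/ttyM7", n) with a large n while data is
arriving on the port should exercise the same open path as the shell loop.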

The first oops is taken from the kernel with lock debugging enabled:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000080
printing eip:
f8fa5136
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer snd evdev mxser_new tsdev soundcore psmouse snd_page_alloc serio_raw floppy i2c_i801 parport_pc parport pcspkr i2c_core rtc ext3 jbd mbcache usb_storage sd_mod usbhid ide_cd cdrom ata_piix libata uhci_hcd scsi_mod piix generic usbcore ide_core skge thermal processor fan
CPU: 0
EIP: 0060:[<f8fa5136>] Not tainted VLI
EFLAGS: 00010046 (2.6.18 #1)
EIP is at mxser_stoprx+0x5a/0x7e [mxser_new]
eax: 00000000 ebx: f7a55000 ecx: f8fabc1c edx: f76f6340
esi: c1e4651b edi: eb466fff ebp: f3af3de8 esp: f3af3de4
ds: 007b es: 007b ss: 0068
Process cc1plus (pid: 9540, ti=f3af2000 task=f4012ab0 task.ti=f3af2000)
Stack: f7a55000 f3af3df0 f8fa5162 f3af3ec0 c020aedb 000000ff c1e4641c f7a554dc
00000002 00000008 00000000 00000008 f4012ff0 00000001 0000017d 00000000
0000017d f4012ff0 00000001 0000017d 00000000 f4012ff0 0000017d 00000000
Call Trace:
[<c0103e2f>] show_stack_log_lvl+0x8c/0x97
[<c0103fa6>] show_registers+0x130/0x19d
[<c0104194>] die+0x181/0x287
[<c0115b92>] do_page_fault+0x3ca/0x49e
[<c01039cd>] error_code+0x39/0x40
[<f8fa5162>] mxser_throttle+0x8/0xa [mxser_new]
[<c020aedb>] n_tty_receive_buf+0xdce/0xdf0
[<c02062fc>] flush_to_ldisc+0x112/0x149
[<c0206370>] tty_flip_buffer_push+0x3d/0x53
[<f8fa63c4>] mxser_receive_chars+0x23c/0x244 [mxser_new]
[<f8fa6e48>] mxser_interrupt+0x14c/0x1cb [mxser_new]
[<c014563a>] handle_IRQ_event+0x20/0x4d
[<c01456fb>] __do_IRQ+0x94/0xef
[<c0105342>] do_IRQ+0x4e/0x60
[<c0103835>] common_interrupt+0x25/0x2c
Code: 83 e0 ee 89 41 3c 25 ee 00 00 00 42 eb 19 0f b6 42 1a 8b 51 08 89 41 38 31 c0 42 ee 89 d8 83 c8 02 89 41 3c 0f b6 c0 ee 8b 41 04 <8b> 80 80 00 00 00 83 78 08 00 79 15 8b 41 40 8b 51 08 83 e0 fd
EIP: [<f8fa5136>] mxser_stoprx+0x5a/0x7e [mxser_new] SS:ESP 0068:f3af3de4
<0>Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:547/smp_call_function()
[<c0103e73>] show_trace+0xd/0x10
[<c010444c>] dump_stack+0x19/0x1b
[<c010f600>] smp_call_function+0x55/0xfd
[<c010f6be>] smp_send_stop+0x16/0x2a
[<c011d885>] panic+0x4d/0xec
[<c0104265>] die+0x252/0x287
[<c0115b92>] do_page_fault+0x3ca/0x49e
[<c01039cd>] error_code+0x39/0x40
[<f8fa5162>] mxser_throttle+0x8/0xa [mxser_new]
[<c020aedb>] n_tty_receive_buf+0xdce/0xdf0
[<c02062fc>] flush_to_ldisc+0x112/0x149
[<c0206370>] tty_flip_buffer_push+0x3d/0x53
[<f8fa63c4>] mxser_receive_chars+0x23c/0x244 [mxser_new]
[<f8fa6e48>] mxser_interrupt+0x14c/0x1cb [mxser_new]
[<c014563a>] handle_IRQ_event+0x20/0x4d
[<c01456fb>] __do_IRQ+0x94/0xef
[<c0105342>] do_IRQ+0x4e/0x60
[<c0103835>] common_interrupt+0x25/0x2c

The other oops is from a kernel with lock debugging disabled:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000068
printing eip:
f8f0911f
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm i2c_i801 psmouse snd_timer rtc parport_pc serio_raw i2c_core snd soundcore snd_page_alloc evdev pcspkr floppy parport tsdev mxser_new ext3 jbd mbcache usb_storage usbhid sd_mod ide_cd cdrom uhci_hcd ata_piix usbcore piix skge libata scsi_mod generic ide_core thermal processor fan
CPU: 0
EIP: 0060:[<f8f0911f>] Tainted: P VLI
EFLAGS: 00010246 (2.6.18-3-686 #1)
EIP is at mxser_stoprx+0x54/0x74 [mxser_new]
eax: 00000000 ebx: 00000001 ecx: f8f0f79c edx: f643a4c0
esi: efe6d51b edi: ee75dfff ebp: f6e2e000 esp: ec901e20
ds: 007b es: 007b ss: 0068
Process cc1plus (pid: 17764, ti=ec900000 task=dff41000 task.ti=ec900000)
Stack: 00000001 c01fe4af 000000ff efe6d41c f6e2e404 c02d6c68 00000000 00000001
ec901e64 c011669e 00000000 00000000 00000003 00000086 f6e2e00c 00000286
00000001 00000086 c01f97c9 00000000 f524bc00 00000286 f8f0a335 ec901ec8
Call Trace:
[<c01fe4af>] n_tty_receive_buf+0xcd6/0xcf9
[<c011669e>] __wake_up+0x2a/0x3d
[<c01f97c9>] tty_ldisc_deref+0x50/0x5f
[<f8f0a335>] mxser_receive_chars+0x241/0x249 [mxser_new]
[<f8f09678>] mxser_transmit_chars+0x14/0x164 [mxser_new]
[<f8f0adab>] mxser_interrupt+0x190/0x1e6 [mxser_new]
[<c0116412>] __activate_task+0x1c/0x29
[<c011776e>] try_to_wake_up+0x355/0x35f
[<c0116d0a>] find_busiest_group+0x177/0x46a
[<c01f9d21>] flush_to_ldisc+0x104/0x15c
[<f8f0a335>] mxser_receive_chars+0x241/0x249 [mxser_new]
[<f8f0ad7a>] mxser_interrupt+0x15f/0x1e6 [mxser_new]
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc3c>] __do_IRQ+0x93/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
Code: 83 e0 ee 89 41 3c 25 ee 00 00 00 42 eb 19 0f b6 42 1a 8b 51 08 89 41 38 31 c0 42 ee 89 d8 83 c8 02 89 41 3c 0f b6 c0 ee 8b 41 04 <8b> 40 68 83 78 08 00 79 15 8b 41 40 8b 51 08 83 e0 fd 89 41 40
EIP: [<f8f0911f>] mxser_stoprx+0x54/0x74 [mxser_new] SS:ESP 0068:ec901e20
<0>Kernel panic - not syncing: Fatal exception in interrupt

-- Sergei.


2006-12-25 18:38:14

by Jiri Slaby

Subject: Re: moxa serial driver testing

Sergei Organov wrote:
> Jiri Slaby <[email protected]> writes:
>
>> [email protected] wrote:
>>> Hi Jiri,
>>>
>>> I've figured out that both old and new mxser drivers have two similar
>>> problems:
>>>
>>> 1. When there are data coming to a port, sometimes opening of the port
>>> entirely locks the box. This is quite reproducible. Any idea what's
>>> wrong and how can I help to debug it?
>> Could you test the patch below, if something changes?
>
> Something did change. Now it becomes rather difficult to get the box to
> hang, though not impossible. Another thing that changed is that now I
> can see [parts of] oopses on screen. I've got two pictures of the screen

Positive.

> with different oopses. If you need them, let me know and I'll send them
> to you in a separate mail not to pollute lkml with JPEGs.

Are these the ones you've posted in another message?

> As for the problem with module unloading when port is open, your another
> patch does fix it indeed.

at least some good news, thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-12-25 18:49:58

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

> Sergei Organov wrote:
>> Jiri Slaby <[email protected]> writes:
>>
>>> [email protected] wrote:
>>>> Hi Jiri,
>>>>
>>>> I've figured out that both old and new mxser drivers have two similar
>>>> problems:
>>>>
>>>> 1. When there are data coming to a port, sometimes opening of the port
>>>> entirely locks the box. This is quite reproducible. Any idea what's
>>>> wrong and how can I help to debug it?
>>> Could you test the patch below, if something changes?
>>
>> Something did change. Now it becomes rather difficult to get the box to
>> hang, though not impossible. Another thing that changed is that now I
>> can see [parts of] oopses on screen. I've got two pictures of the screen
>
> Positive.
>
>> with different oopses. If you need them, let me know and I'll send them
>> to you in a separate mail not to pollute lkml with JPEGs.
>
> These are those you've posted in another post?

Not exactly. The oopses seem to vary from run to run, so I can't get
exactly the same one every time. I have kept those JPEGs, though, in case
you are still interested after looking at the oopses that I've collected
through the serial console and sent.

-- Sergei.

2006-12-27 11:09:48

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:
> [email protected] wrote:
>> Hi Jiri,
>>
>> I've figured out that both old and new mxser drivers have two similar
>> problems:
>>
>> 1. When there are data coming to a port, sometimes opening of the port
>> entirely locks the box. This is quite reproducible. Any idea what's
>> wrong and how can I help to debug it?
>
> Could you test the patch below, if something changes?

I've tested the latest version you sent me yesterday. Still no
luck. The oops is below. I'll try with low_latency commented out in a few
minutes.

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000068
printing eip:
f8f3711f
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss i2c_i801 psmouse snd_pcm floppy tsdev serio_raw snd_timer parport_pc parport snd i2c_core pcspkr evdev soundcore snd_page_alloc mxser_new rtc ext3 jbd mbcache sd_mod ide_cd cdrom usb_storage usbhid ata_piix libata piix scsi_mod generic uhci_hcd ide_core usbcore skge thermal processor fan
CPU: 0
EIP: 0060:[<f8f3711f>] Tainted: P VLI
EFLAGS: 00010246 (2.6.18-3-686 #1)
EIP is at mxser_stoprx+0x54/0x74 [mxser_new]
eax: 00000000 ebx: 00000001 ecx: f8f3d79c edx: dfde7240
esi: f391211b edi: f4f83fff ebp: c1b17800 esp: f5bc7d7c
ds: 007b es: 007b ss: 0068
Process make (pid: 17088, ti=f5bc6000 task=f4febaa0 task.ti=f5bc6000)
Stack: 00000001 c01fe4af 000000ff f391201c c1b17c04 c011669e 00000000 00000000
00000003 00000096 c1b1780c 00000286 00000001 00000096 c01f97c9 00000000
f456f800 00000286 f8f38335 f5bc7e14 f8f3d8a8 00000286 c01f9806 f456f800
Call Trace:
[<c01fe4af>] n_tty_receive_buf+0xcd6/0xcf9
[<c011669e>] __wake_up+0x2a/0x3d
[<c01f97c9>] tty_ldisc_deref+0x50/0x5f
[<f8f38335>] mxser_receive_chars+0x241/0x249 [mxser_new]
[<c01f9806>] tty_ldisc_try+0x2e/0x32
[<c01f9d4c>] flush_to_ldisc+0x12f/0x15c
[<c01f9793>] tty_ldisc_deref+0x1a/0x5f
[<f8f38335>] mxser_receive_chars+0x241/0x249 [mxser_new]
[<f8f37678>] mxser_transmit_chars+0x14/0x164 [mxser_new]
[<c01f9d21>] flush_to_ldisc+0x104/0x15c
[<f8f38335>] mxser_receive_chars+0x241/0x249 [mxser_new]
[<f8f38d7a>] mxser_interrupt+0x15f/0x1e6 [mxser_new]
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc3c>] __do_IRQ+0x93/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
[<c01bacf4>] __copy_from_user_ll+0x19/0x38
[<c01071de>] convert_fxsr_from_user+0x15/0xd5
[<c0164bc8>] pipe_read+0x0/0x1e
[<c010774e>] restore_i387+0x6f/0xc0
[<c0102203>] restore_sigcontext+0x10f/0x165
[<c0102b3d>] sys_sigreturn+0xa4/0xcb
[<c0102c7b>] syscall_call+0x7/0xb
Code: 83 e0 ee 89 41 3c 25 ee 00 00 00 42 eb 19 0f b6 42 1a 8b 51 08 89 41 38 31 c0 42 ee 89 d8 83 c8 02 89 41 3c 0f b6 c0 ee 8b 41 04 <8b> 40 68 83 78 08 00 79 15 8b 41 40 8b 51 08 83 e0 fd 89 41 40
EIP: [<f8f3711f>] mxser_stoprx+0x54/0x74 [mxser_new] SS:ESP 0068:f5bc7d7c
<0>Kernel panic - not syncing: Fatal exception in interrupt

2006-12-27 11:49:17

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

> [email protected] wrote:
>> Hi Jiri,
>>
>> I've figured out that both old and new mxser drivers have two similar
>> problems:
>>
>> 1. When there are data coming to a port, sometimes opening of the port
>> entirely locks the box. This is quite reproducible. Any idea what's
>> wrong and how can I help to debug it?
>
> Could you test the patch below, if something changes?

Just tested with low_latency commented out. Still oopses:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
f8f1730f
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_i801 tsdev psmouse snd soundcore snd_page_alloc i2c_core serio_raw parport_pc parport mxser_new evdev floppy pcspkr rtc ext3 jbd mbcache usb_storage usbhid ide_cd cdrom sd_mod uhci_hcd piix usbcore skge ata_piix libata scsi_mod generic ide_core thermal processor fan
CPU: 0
EIP: 0060:[<f8f1730f>] Tainted: P VLI
EFLAGS: 00010046 (2.6.18-3-686 #1)
EIP is at mxser_receive_chars+0x21b/0x249 [mxser_new]
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000286
esi: f8f1c79c edi: 00000001 ebp: c1be6000 esp: c0313efc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c0312000 task=c02c76a0 task.ti=c0312000)
Stack: c0313f48 f8f1c8a8 0000903d 00000000 00000fff 000000ff 00000286 0000007b
f8f1c79c 00000006 0000007f 000000c6 f8f17d7a 00000007 00000008 00000080
00000000 00000000 f8f1bdc0 00000060 df985500 00000000 00000000 0000003a
Call Trace:
[<f8f17d7a>] mxser_interrupt+0x15f/0x1e6 [mxser_new]
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc3c>] __do_IRQ+0x93/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
[<c0101b91>] mwait_idle+0x25/0x38
[<c0101b52>] cpu_idle+0x9f/0xb9
[<c03186fd>] start_kernel+0x379/0x380
Code: ed ff ff eb 1f 8b 06 83 78 18 00 75 17 8b 56 08 83 c2 05 ec 8b 14 24 0f b6 c0 a8 01 89 02 0f 85 d4 fe ff ff 8b 46 04 8b 54 24 18 <8b> 40 08 01 3c 85 c4 ec f1 f8 8b 44 24 04 01 be f4 00 00 00 01
EIP: [<f8f1730f>] mxser_receive_chars+0x21b/0x249 [mxser_new] SS:ESP 0068:c0313efc
<0>Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:547/smp_call_function()
[<c010f5a3>] smp_call_function+0x53/0xfe
[<c011d97e>] printk+0x14/0x18
[<c010f661>] smp_send_stop+0x13/0x1c
[<c011cfc6>] panic+0x4c/0xe2
[<c0104013>] die+0x256/0x28a
[<c01156e0>] do_page_fault+0x3b4/0x481
[<c01f9f8d>] tty_buffer_request_room+0x107/0x112
[<c011532c>] do_page_fault+0x0/0x481
[<c01037f9>] error_code+0x39/0x40
[<f8f1730f>] mxser_receive_chars+0x21b/0x249 [mxser_new]
[<f8f17d7a>] mxser_interrupt+0x15f/0x1e6 [mxser_new]
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc3c>] __do_IRQ+0x93/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
[<c0101b91>] mwait_idle+0x25/0x38
[<c0101b52>] cpu_idle+0x9f/0xb9
[<c03186fd>] start_kernel+0x379/0x380

2006-12-27 13:36:52

by Jiri Slaby

Subject: Re: moxa serial driver testing

> Jiri Slaby <[email protected]> writes:
>
> > Could you test the patch below, if something changes?
>
> Just tested with low_latency commented out. Still oopses:
>
> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
> printing eip:
> f8f1730f
> *pde = 00000000
> Oops: 0000 [#1]
> SMP
> Modules linked in: nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_i801 tsdev psmouse snd soundcore snd_page_alloc i2c_core serio_raw parport_pc parport mxser_new evdev floppy pcspkr rtc ext3 jbd mbcache usb_storage usbhid ide_cd cdrom sd_mod uhci_hcd piix usbcore skge ata_piix libata scsi_mod generic ide_core thermal processor fan
> CPU: 0
> EIP: 0060:[<f8f1730f>] Tainted: P VLI
> EFLAGS: 00010046 (2.6.18-3-686 #1)
> EIP is at mxser_receive_chars+0x21b/0x249 [mxser_new]

Yes, port->tty still somewhere becomes NULL -- does this patch help?

---

diff -Nrup mxser_newa/mxser_new.c mxser_newb/mxser_new.c
--- mxser_newa/mxser_new.c 2006-12-26 18:22:30.000000000 +0100
+++ mxser_newb/mxser_new.c 2006-12-27 14:26:22.000000000 +0100
@@ -2051,13 +2051,16 @@ static void mxser_wait_until_sent(struct
void mxser_hangup(struct tty_struct *tty)
{
struct mxser_port *info = tty->driver_data;
+ unsigned long flags;

mxser_flush_buffer(tty);
mxser_shutdown(info);
+ spin_lock_irqsave(&info->slock, flags);
info->event = 0;
info->count = 0;
info->flags &= ~ASYNC_NORMAL_ACTIVE;
info->tty = NULL;
+ spin_unlock_irqrestore(&info->slock, flags);
wake_up_interruptible(&info->open_wait);
}

@@ -2263,8 +2266,6 @@ static irqreturn_t mxser_interrupt(int i
if (bits & irqbits)
continue;
port = &brd->ports[i];
- if (!(port->flags & ASYNC_INITIALIZED))
- continue;

int_cnt = 0;
spin_lock(&port->slock);
@@ -2274,7 +2275,9 @@ static irqreturn_t mxser_interrupt(int i
break;
iir &= MOXA_MUST_IIR_MASK;
if (!port->tty ||
- (port->flags & ASYNC_CLOSING)) {
+ (port->flags & ASYNC_CLOSING) ||
+ !(port->flags &
+ ASYNC_INITIALIZED)) {
status = inb(port->ioaddr + UART_LSR);
outb(0x27, port->ioaddr + UART_FCR);
inb(port->ioaddr + UART_MSR);
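
As far as can be read from the diff above, the idea is that the hangup path
should clear the tty pointer while holding the same lock the interrupt
handler takes, so that the handler's NULL check and its later use of the
pointer cannot be split by a concurrent hangup. The following user-space
sketch illustrates only that pattern; it is not driver code. The struct
layout and the names port_init(), port_receive() and port_hangup() are
hypothetical, and a C11 atomic_flag stands in for the kernel spinlock.

```c
#include <stdatomic.h>
#include <stddef.h>

struct tty {
    int received;          /* counts deliveries, for illustration only */
};

struct port {
    atomic_flag slock;     /* stand-in for the driver's spinlock */
    struct tty *tty;       /* NULL once the port has been hung up */
};

static void port_lock(struct port *p)
{
    while (atomic_flag_test_and_set_explicit(&p->slock, memory_order_acquire))
        ;                  /* spin until the flag is released */
}

static void port_unlock(struct port *p)
{
    atomic_flag_clear_explicit(&p->slock, memory_order_release);
}

void port_init(struct port *p, struct tty *t)
{
    atomic_flag_clear(&p->slock);
    p->tty = t;
}

/* Interrupt-side path: returns 1 if data was delivered, 0 if skipped. */
int port_receive(struct port *p)
{
    int delivered = 0;

    port_lock(p);
    if (p->tty != NULL) {  /* check and use happen under one lock hold */
        p->tty->received++;
        delivered = 1;
    }
    port_unlock(p);
    return delivered;
}

/* Hangup-side path: the pointer is cleared only while holding the lock. */
void port_hangup(struct port *p)
{
    port_lock(p);
    p->tty = NULL;
    port_unlock(p);
}
```

With both paths serialized on the lock, port_receive() either sees a valid
tty for its whole critical section or sees NULL and skips the port.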

2006-12-27 14:01:12

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

>> Jiri Slaby <[email protected]> writes:
>>
>> > Could you test the patch below, if something changes?
>>
>> Just tested with low_latency commented out. Still oopses:
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
>> printing eip:
>> f8f1730f
>> *pde = 00000000
>> Oops: 0000 [#1]
>> SMP
>> Modules linked in: nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_i801 tsdev psmouse snd soundcore snd_page_alloc i2c_core serio_raw parport_pc parport mxser_new evdev floppy pcspkr rtc ext3 jbd mbcache usb_storage usbhid ide_cd cdrom sd_mod uhci_hcd piix usbcore skge ata_piix libata scsi_mod generic ide_core thermal processor fan
>> CPU: 0
>> EIP: 0060:[<f8f1730f>] Tainted: P VLI
>> EFLAGS: 00010046 (2.6.18-3-686 #1)
>> EIP is at mxser_receive_chars+0x21b/0x249 [mxser_new]
>
> Yes, port->tty still somewhere becomes NULL -- does this patch help?

No, it still oopses (the oops is at the end).

In addition, before this latest patch was applied, when playing with two
simultaneously run programs that do nothing but open/close the port in a
loop (and the port was idle, i.e., nothing came to it from outside),
I've once got this thing:

irq 58: nobody cared (try booting with the "irqpoll" option)
[<c014037f>] __report_bad_irq+0x2b/0x69
[<c014056c>] note_interrupt+0x1af/0x1e7
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc5c>] __do_IRQ+0xb3/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
[<c011007b>] __cpu_up+0x88/0x164
[<c02810a0>] _spin_unlock_irqrestore+0x8/0x9
[<f8f8bf57>] mxser_startup+0x177/0x190 [mxser_new]
[<f8f8c3da>] mxser_open+0x6b/0x251 [mxser_new]
[<c01fcafd>] tty_open+0x16a/0x2e8
[<c01619a1>] chrdev_open+0x126/0x141
[<c016187b>] chrdev_open+0x0/0x141
[<c0158cb1>] __dentry_open+0xc8/0x1ac
[<c0158df9>] nameidata_to_filp+0x19/0x28
[<c0158e33>] do_filp_open+0x2b/0x31
[<c0158e77>] do_sys_open+0x3e/0xb3
[<c0158f19>] sys_open+0x16/0x18
[<c0102c11>] sysenter_past_esp+0x56/0x79
handlers:
[<f8f8cc1b>] (mxser_interrupt+0x0/0x1e6 [mxser_new])
Disabling IRQ #58

Another thing that I've noticed is that when two programs that
open/close a port in a loop run simultaneously, each of them sometimes
fails to open the port with errno=5 (I/O error). This happens even if I
use /dev/ttyS0, though, so it is not mxser-specific (/dev/null never
fails).
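
A test program of the kind described above might look roughly like the
sketch below. The function name open_close_loop() and the open flags are
assumptions made for this note, not the code that was actually run; it only
counts how many open() calls fail and how many of those fail with EIO
(errno 5 on Linux).

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: open and immediately close `path` `iterations`
 * times. Returns the total number of failed open() calls and stores in
 * *eio_failures how many of them failed with EIO.
 */
int open_close_loop(const char *path, int iterations, int *eio_failures)
{
    int failures = 0;

    *eio_failures = 0;
    for (int i = 0; i < iterations; i++) {
        /*
         * Assumed flags: O_NOCTTY keeps the port from becoming the
         * controlling terminal, O_NONBLOCK avoids waiting for carrier.
         */
        int fd = open(path, O_RDWR | O_NOCTTY | O_NONBLOCK);
        if (fd < 0) {
            failures++;
            if (errno == EIO)
                (*eio_failures)++;
            continue;
        }
        close(fd);
    }
    return failures;
}
```

Running two processes that each call this on the same port at the same time
corresponds to the scenario above; a single instance on /dev/null is a
useful baseline because it should never fail.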

Here is the oops I promised at the beginning:

BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
printing eip:
f927630f
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: mxser_new nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm psmouse snd_timer i2c_i801 snd serio_raw i2c_core tsdev parport_pc parport pcspkr evdev rtc soundcore snd_page_alloc floppy ext3 jbd mbcache sd_mod usb_storage usbhid ide_cd cdrom ata_piix libata scsi_mod uhci_hcd piix generic skge usbcore ide_core thermal processor fan
CPU: 0
EIP: 0060:[<f927630f>] Tainted: P VLI
EFLAGS: 00010046 (2.6.18-3-686 #1)
EIP is at mxser_receive_chars+0x21b/0x249 [mxser_new]
eax: 00000000 ebx: 00000000 ecx: 00000000 edx: 00000286
esi: f927b79c edi: 00000001 ebp: c1991800 esp: c0313efc
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, ti=c0312000 task=c02c76a0 task.ti=c0312000)
Stack: c0313f48 f927b8a8 0000903a 00000000 00000fff 000000ff 00000286 0000007b
f927b79c 00000006 0000007f 000000c6 f9276d7a 00000007 00000008 00000080
00000000 00000000 f927adc0 00000060 dfc74980 00000000 00000000 0000003a
Call Trace:
[<f9276d7a>] mxser_interrupt+0x15f/0x1e6 [mxser_new]
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc3c>] __do_IRQ+0x93/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
[<c0101b91>] mwait_idle+0x25/0x38
[<c0101b52>] cpu_idle+0x9f/0xb9
[<c03186fd>] start_kernel+0x379/0x380
Code: ed ff ff eb 1f 8b 06 83 78 18 00 75 17 8b 56 08 83 c2 05 ec 8b 14 24 0f b6 c0 a8 01 89 02 0f 85 d4 fe ff ff 8b 46 04 8b 54 24 18 <8b> 40 08 01 3c 85 c4 dc 27 f9 8b 44 24 04 01 be f4 00 00 00 01
EIP: [<f927630f>] mxser_receive_chars+0x21b/0x249 [mxser_new] SS:ESP 0068:c0313efc
<0>Kernel panic - not syncing: Fatal exception in interrupt
BUG: warning at arch/i386/kernel/smp.c:547/smp_call_function()
[<c010f5a3>] smp_call_function+0x53/0xfe
[<c011d97e>] printk+0x14/0x18
[<c010f661>] smp_send_stop+0x13/0x1c
[<c011cfc6>] panic+0x4c/0xe2
[<c0104013>] die+0x256/0x28a
[<c01156e0>] do_page_fault+0x3b4/0x481
[<c01f9f8d>] tty_buffer_request_room+0x107/0x112
[<c011532c>] do_page_fault+0x0/0x481
[<c01037f9>] error_code+0x39/0x40
[<f927630f>] mxser_receive_chars+0x21b/0x249 [mxser_new]
[<f9276d7a>] mxser_interrupt+0x15f/0x1e6 [mxser_new]
[<c013fb83>] handle_IRQ_event+0x23/0x49
[<c013fc3c>] __do_IRQ+0x93/0xe8
[<c01050e5>] do_IRQ+0x43/0x52
[<c01036b6>] common_interrupt+0x1a/0x20
[<c0101b91>] mwait_idle+0x25/0x38
[<c0101b52>] cpu_idle+0x9f/0xb9
[<c03186fd>] start_kernel+0x379/0x380

2006-12-27 15:30:06

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

>> Jiri Slaby <[email protected]> writes:
>>
>> > Could you test the patch below, if something changes?
>>
>> Just tested with low_latency commented out. Still oopses:
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
>> printing eip:
>> f8f1730f
>> *pde = 00000000
>> Oops: 0000 [#1]
>> SMP
>> Modules linked in: nvidia agpgart ipv6 nfs lockd nfs_acl sunrpc dm_mod sr_mod sbp2 ieee1394 ide_generic ide_disk e1000 snd_hda_intel snd_hda_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer i2c_i801 tsdev psmouse snd soundcore snd_page_alloc i2c_core serio_raw parport_pc parport mxser_new evdev floppy pcspkr rtc ext3 jbd mbcache usb_storage usbhid ide_cd cdrom sd_mod uhci_hcd piix usbcore skge ata_piix libata scsi_mod generic ide_core thermal processor fan
>> CPU: 0
>> EIP: 0060:[<f8f1730f>] Tainted: P VLI
>> EFLAGS: 00010046 (2.6.18-3-686 #1)
>> EIP is at mxser_receive_chars+0x21b/0x249 [mxser_new]
>
> Yes, port->tty still somewhere becomes NULL -- does this patch help?

In addition to my previous answer: I've performed the same tests with
a regular PC serial port. The issues with "Disabling IRQ #N" and with the
I/O error on open() exist with this port as well, so they aren't
mxser-specific. I wasn't able to reproduce the oopses, so they do seem to
be mxser-specific.

-- Sergei.

2006-12-28 17:15:05

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

>> Jiri Slaby <[email protected]> writes:
>>
>> > Could you test the patch below, if something changes?
>>
>> Just tested with low_latency commented out. Still oopses:
>>
>> BUG: unable to handle kernel NULL pointer dereference at virtual address 00000008
>> printing eip:
>> f8f1730f
>> *pde = 00000000
>> Oops: 0000 [#1]
>> SMP
>> Modules linked in: ...
>> CPU: 0
>> EIP: 0060:[<f8f1730f>] Tainted: P VLI
>> EFLAGS: 00010046 (2.6.18-3-686 #1)
>> EIP is at mxser_receive_chars+0x21b/0x249 [mxser_new]
>
> Yes, port->tty still somewhere becomes NULL -- does this patch help?

Hi, Jiri!

I'm so sorry, I don't know what I smoked yesterday, but the latest
version you've sent me does not have this problem anymore! I think I
copied the compiled module to the modules directory of a different kernel
and thus tested old code all the time. BTW, had you told me that the
ports are now called /dev/ttyMIx instead of /dev/ttyMx, I think I'd have
noticed my mistake earlier. Yes, in fact I didn't test any version where
ports are called /dev/ttyMIx until now! In particular, it means that
maybe some of the changes you made yesterday based on my wrong
reports are not in fact required.

Anyway, the only mxser-specific problem that currently remains for
me and that I didn't see before, is the following:

# rmmod mxser_new
Trying to free already-free IRQ 58
Trying to free nonexistent resource <0000000000009000-000000000000903f>
Trying to free nonexistent resource <0000000000008800-0000000000008800>
#

-- Sergei.

2006-12-28 23:31:32

by Jiri Slaby

Subject: Re: moxa serial driver testing

Sergei Organov wrote:
> Hi, Jiri!

Hi.

> I'm so sorry, I don't know what I smoked yesterday, but the latest
> version you've sent me does not have this problem anymore! I think I

YES!

> copied compiled module to modules directory for different kernel and
> thus tested old code all the time. BTW, should you tell me that the
> ports are now called /dev/ttyMIx instead of /dev/ttyMx, I think I'd

ttyM was reserved for isicom, and it caused many warnings in the kernel when
both isicom and mxser were built and loaded. The proper name for mxser is (and
always was) ttyMI -- sorry for not giving you notice (I hadn't realized the change).

> notice my mistake earlier. Yes, in fact I didn't test any version where
> ports are called /dev/ttyMIx until now! In particular, it means that
> maybe some of the recent changes you've made yesterday based on my wrong
> reports are not in fact required.

I think the one with the ASYNC_CLOSING test in the ISR is the one that matters
(but the wakeup spinlock change also needs to go upstream).

> Anyway, the only mxser-specific problem that currently remains for
> me and that I didn't see before, is the following:
>
> # rmmod mxser_new
> Trying to free already-free IRQ 58
> Trying to free nonexistent resource <0000000000009000-000000000000903f>
> Trying to free nonexistent resource <0000000000008800-0000000000008800>

Thanks, I'll fix this and let you know. Does this happen every time you try to
unload it?

thanks,
--
http://www.fi.muni.cz/~xslaby/ Jiri Slaby
faculty of informatics, masaryk university, brno, cz
e-mail: jirislaby gmail com, gpg pubkey fingerprint:
B674 9967 0407 CE62 ACC8 22A0 32CC 55C3 39D4 7A7E

2006-12-29 08:56:39

by Sergei Organov

Subject: Re: moxa serial driver testing

Jiri Slaby <[email protected]> writes:

[...]

>> # rmmod mxser_new
>> Trying to free already-free IRQ 58
>> Trying to free nonexistent resource <0000000000009000-000000000000903f>
>> Trying to free nonexistent resource <0000000000008800-0000000000008800>
>
> Thanks, I'll fix this and let you know. Does this happen every time you try to
> unload it?

Yes, it's stable. Happens every time.

-- Sergei.