2006-05-26 18:22:58

by Frank Gevaerts

[permalink] [raw]
Subject: usb-serial ipaq kernel problem

Hi,

I got this when disconnecting an ipaq with 2,6,17rc4.

Frank

usb 1-4.5.7: USB disconnect, address 79
------------[ cut here ]------------
kernel BUG at kernel/workqueue.c:110!
invalid opcode: 0000 [#1]
Modules linked in: uhci_hcd ohci_hcd ehci_hcd ppp_deflate zlib_deflate bsd_comp ppp_async crc_ccitt ppp_generic slhc usbhid ipaq usbserial usbcore 8139too mii sr_mod sbp2 scsi_mod ieee1394 psmouse ide_generic ide_cd cdrom genrtc ext3 jbd mbcache ide_disk generic via82cxxx ide_core evdev mousedev
CPU: 0
EIP: 0060:[<b0121c03>] Not tainted VLI
EFLAGS: 00010213 (2.6.17-rc4 #3)
EIP is at queue_work+0x17/0x2f
eax: c1e5193c ebx: b13f2a40 ecx: 00000000 edx: c1e51938
esi: c7104160 edi: b9c90a14 ebp: 00000000 esp: c92bbeb8
ds: 007b es: 007b ss: 0068
Process khubd (pid: 1559, threadinfo=c92ba000 task=cbf6c050)
Stack: <0>c7104160 cc985ace c1e51800 b9c90a00 cc9cd980 cc9cd9a4 b9c90a14 cc9dd838
b9c90a00 b9c90a7c b9c90a14 b01fb254 b9c90a14 b9c90a14 00000000 cc9f0ba0
b01fb419 b9c90a14 b01fac3d b9c90a14 b9c90a5c b9c90a14 c8913058 00000000
Call Trace:
<cc985ace> usb_serial_disconnect+0x59/0xa1 [usbserial] <cc9dd838> usb_unbind_interface+0x36/0x6f [usbcore]
<b01fb254> __device_release_driver+0x5c/0x72 <b01fb419> device_release_driver+0x18/0x26
<b01fac3d> bus_remove_device+0x74/0x8c <b01fa0cc> device_del+0x39/0x65
<cc9dcaa1> usb_disable_device+0x6a/0xd4 [usbcore] <cc9d9225> usb_disconnect+0x7c/0xc9 [usbcore]
<cc9d9f3d> hub_thread+0x35b/0x9eb [usbcore] <b0123f84> autoremove_wake_function+0x0/0x3a
<b0123f36> kthread+0x80/0xc1 <cc9d9be2> hub_thread+0x0/0x9eb [usbcore]
<b0123f4a> kthread+0x94/0xc1 <b0123eb6> kthread+0x0/0xc1
<b0101005> kernel_thread_helper+0x5/0xb
Code: 89 d8 5b 5e 5f c3 89 d1 89 c2 a1 f4 71 32 b0 e9 86 ff ff ff 53 89 c3 0f ba 2a 00 19 c0 31 c9 85 c0 75 1c 8d 42 04 39 42 04 74 08 <0f> 0b 6e 00 64 61 27 b0 8b 03 e8 4a fc ff ff b9 01 00 00 00 5b
EIP: [<b0121c03>] queue_work+0x17/0x2f SS:ESP 0068:c92bbeb8

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19


2006-05-26 20:35:43

by Pete Zaitcev

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Fri, 26 May 2006 20:22:17 +0200, Frank Gevaerts <[email protected]> wrote:

> usb 1-4.5.7: USB disconnect, address 79
> ------------[ cut here ]------------
> kernel BUG at kernel/workqueue.c:110!

Please let me know if this helps:

--- linux-2.6.17-rc2/drivers/usb/serial/usb-serial.c 2006-04-23 21:06:18.000000000 -0700
+++ linux-2.6.17-rc2-lem/drivers/usb/serial/usb-serial.c 2006-05-22 21:23:29.000000000 -0700
@@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
}
}

+ flush_scheduled_work(); /* port->work */
+
usb_put_dev(serial->dev);

/* free up any memory that we allocated */

-- Pete

2006-05-26 21:12:46

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Fri, May 26, 2006 at 01:34:10PM -0700, Pete Zaitcev wrote:
> On Fri, 26 May 2006 20:22:17 +0200, Frank Gevaerts <[email protected]> wrote:
>
> > usb 1-4.5.7: USB disconnect, address 79
> > ------------[ cut here ]------------
> > kernel BUG at kernel/workqueue.c:110!
>
> Please let me know if this helps:

Thanks. It's now running with this applied. I'll let you know if it's
still running in a few days (I got the last one after running +- 2 days)

Frank

>
> --- linux-2.6.17-rc2/drivers/usb/serial/usb-serial.c 2006-04-23 21:06:18.000000000 -0700
> +++ linux-2.6.17-rc2-lem/drivers/usb/serial/usb-serial.c 2006-05-22 21:23:29.000000000 -0700
> @@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
> }
> }
>
> + flush_scheduled_work(); /* port->work */
> +
> usb_put_dev(serial->dev);
>
> /* free up any memory that we allocated */
>
> -- Pete

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

2006-05-27 11:41:48

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Fri, May 26, 2006 at 01:34:10PM -0700, Pete Zaitcev wrote:
> On Fri, 26 May 2006 20:22:17 +0200, Frank Gevaerts <[email protected]> wrote:
>
> > usb 1-4.5.7: USB disconnect, address 79
> > ------------[ cut here ]------------
> > kernel BUG at kernel/workqueue.c:110!
>
> Please let me know if this helps:

It didn't help, I still get the error.

Frank

>
> --- linux-2.6.17-rc2/drivers/usb/serial/usb-serial.c 2006-04-23 21:06:18.000000000 -0700
> +++ linux-2.6.17-rc2-lem/drivers/usb/serial/usb-serial.c 2006-05-22 21:23:29.000000000 -0700
> @@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
> }
> }
>
> + flush_scheduled_work(); /* port->work */
> +
> usb_put_dev(serial->dev);
>
> /* free up any memory that we allocated */
>
> -- Pete

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem


Hi Pete,

On Fri, 26 May 2006 13:34:10 -0700
Pete Zaitcev <[email protected]> wrote:

| On Fri, 26 May 2006 20:22:17 +0200, Frank Gevaerts <[email protected]> wrote:
|
| > usb 1-4.5.7: USB disconnect, address 79
| > ------------[ cut here ]------------
| > kernel BUG at kernel/workqueue.c:110!
|
| Please let me know if this helps:
|
| --- linux-2.6.17-rc2/drivers/usb/serial/usb-serial.c 2006-04-23 21:06:18.000000000 -0700
| +++ linux-2.6.17-rc2-lem/drivers/usb/serial/usb-serial.c 2006-05-22 21:23:29.000000000 -0700
| @@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
| }
| }
|
| + flush_scheduled_work(); /* port->work */
| +
| usb_put_dev(serial->dev);
|
| /* free up any memory that we allocated */

IIUC, the problem occurred before the call to destroy_serial(),
otherwise it should be in the backtrace.

It seems that 'port->work' is becoming NULL when the device is
disconnected, but the ipaq_write_bulk_callback() is executing after
that.

I'm checking this also.

--
Luiz Fernando N. Capitulino

Subject: Re: usb-serial ipaq kernel problem

On Mon, 29 May 2006 12:01:02 -0300
"Luiz Fernando N. Capitulino" <[email protected]> wrote:

|
| Hi Pete,
|
| On Fri, 26 May 2006 13:34:10 -0700
| Pete Zaitcev <[email protected]> wrote:
|
| | On Fri, 26 May 2006 20:22:17 +0200, Frank Gevaerts <[email protected]> wrote:
| |
| | > usb 1-4.5.7: USB disconnect, address 79
| | > ------------[ cut here ]------------
| | > kernel BUG at kernel/workqueue.c:110!
| |
| | Please let me know if this helps:
| |
| | --- linux-2.6.17-rc2/drivers/usb/serial/usb-serial.c 2006-04-23 21:06:18.000000000 -0700
| | +++ linux-2.6.17-rc2-lem/drivers/usb/serial/usb-serial.c 2006-05-22 21:23:29.000000000 -0700
| | @@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
| | }
| | }
| |
| | + flush_scheduled_work(); /* port->work */
| | +
| | usb_put_dev(serial->dev);
| |
| | /* free up any memory that we allocated */
|
| IIUC, the problem occurred before the call to destroy_serial(),
| otherwise it should be in the backtrace.
|
| It seems that 'port->work' is becoming NULL when the device is
| disconnected, but the ipaq_write_bulk_callback() is executing after
| that.

Err, I meant 'port->work->entry' is empty, of course.

--
Luiz Fernando N. Capitulino

Subject: Re: usb-serial ipaq kernel problem

On Mon, 29 May 2006 13:25:53 -0300
"Luiz Fernando N. Capitulino" <[email protected]> wrote:

| On Mon, 29 May 2006 12:01:02 -0300
| "Luiz Fernando N. Capitulino" <[email protected]> wrote:
|
| |
| | Hi Pete,
| |
| | On Fri, 26 May 2006 13:34:10 -0700
| | Pete Zaitcev <[email protected]> wrote:
| |
| | | On Fri, 26 May 2006 20:22:17 +0200, Frank Gevaerts <[email protected]> wrote:
| | |
| | | > usb 1-4.5.7: USB disconnect, address 79
| | | > ------------[ cut here ]------------
| | | > kernel BUG at kernel/workqueue.c:110!
| | |
| | | Please let me know if this helps:
| | |
| | | --- linux-2.6.17-rc2/drivers/usb/serial/usb-serial.c 2006-04-23 21:06:18.000000000 -0700
| | | +++ linux-2.6.17-rc2-lem/drivers/usb/serial/usb-serial.c 2006-05-22 21:23:29.000000000 -0700
| | | @@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
| | | }
| | | }
| | |
| | | + flush_scheduled_work(); /* port->work */
| | | +
| | | usb_put_dev(serial->dev);
| | |
| | | /* free up any memory that we allocated */
| |
| | IIUC, the problem occurred before the call to destroy_serial(),
| | otherwise it should be in the backtrace.
| |
| | It seems that 'port->work' is becoming NULL when the device is
| | disconnected, but the ipaq_write_bulk_callback() is executing after
| | that.
|
| Err, I meant 'port->work->entry' is empty, of course.

Frank, could you try this one please?

I have no sure whether this makes sense, but every USB-Serial driver
I know exits in the write URB callback if the URB got an error.

-------------------

usbserial: ipaq: Don't handle URB on errors.

ipaq_write_bulk_callback() should exit if the URB got an error, otherwise
we can get weird problems.

Signed-off-by: Luiz Fernando N. Capitulino <[email protected]>

---

drivers/usb/serial/ipaq.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

6ad106187344769a4722f9e6d6f265529844d568
diff --git a/drivers/usb/serial/ipaq.c b/drivers/usb/serial/ipaq.c
index 9a5c979..d263a5e 100644
--- a/drivers/usb/serial/ipaq.c
+++ b/drivers/usb/serial/ipaq.c
@@ -855,6 +855,7 @@ static void ipaq_write_bulk_callback(str

if (urb->status) {
dbg("%s - nonzero write bulk status received: %d", __FUNCTION__, urb->status);
+ return;
}

spin_lock_irqsave(&write_list_lock, flags);
--
1.3.2



--
Luiz Fernando N. Capitulino

2006-05-29 19:44:23

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Mon, May 29, 2006 at 02:11:10PM -0300, Luiz Fernando N. Capitulino wrote:
>
> Frank, could you try this one please?
>
> I have no sure whether this makes sense, but every USB-Serial driver
> I know exits in the write URB callback if the URB got an error.

It looks sane to me at least.
The machine is now running with this patch (and my ipaq_open patch, see
http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1901.html).
If the problem is still there, it should occur within 24 hours in our
usage mode (25 ipaqs rebooting every 15 minutes to provide lots of
connects and disconnects). I'll let you know the results.

Frank

>
> -------------------
>
> usbserial: ipaq: Don't handle URB on errors.
>
> ipaq_write_bulk_callback() should exit if the URB got an error, otherwise
> we can get weird problems.
>
> Signed-off-by: Luiz Fernando N. Capitulino <[email protected]>
>
> ---
>
> drivers/usb/serial/ipaq.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> 6ad106187344769a4722f9e6d6f265529844d568
> diff --git a/drivers/usb/serial/ipaq.c b/drivers/usb/serial/ipaq.c
> index 9a5c979..d263a5e 100644
> --- a/drivers/usb/serial/ipaq.c
> +++ b/drivers/usb/serial/ipaq.c
> @@ -855,6 +855,7 @@ static void ipaq_write_bulk_callback(str
>
> if (urb->status) {
> dbg("%s - nonzero write bulk status received: %d", __FUNCTION__, urb->status);
> + return;
> }
>
> spin_lock_irqsave(&write_list_lock, flags);
> --
> 1.3.2
>
>
>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem

On Mon, 29 May 2006 21:43:35 +0200
Frank Gevaerts <[email protected]> wrote:

| On Mon, May 29, 2006 at 02:11:10PM -0300, Luiz Fernando N. Capitulino wrote:
| >
| > Frank, could you try this one please?
| >
| > I have no sure whether this makes sense, but every USB-Serial driver
| > I know exits in the write URB callback if the URB got an error.
|
| It looks sane to me at least.
| The machine is now running with this patch (and my ipaq_open patch, see
| http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1901.html).

Hmmm. Then does the workqueue problem began to happen _after_ you applied
your patch?

Are you sure your patch is the right thing to do? Does it look reasonable
to submit that urb 1000 times that way?

At first, it seems something else.

Couldn't you run your test-case in a kernel previous to the TTY layer
buffering revamp change?

| If the problem is still there, it should occur within 24 hours in our
| usage mode (25 ipaqs rebooting every 15 minutes to provide lots of
| connects and disconnects). I'll let you know the results.

Wow, nice.

--
Luiz Fernando N. Capitulino

2006-05-29 20:47:56

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Mon, May 29, 2006 at 05:24:10PM -0300, Luiz Fernando N. Capitulino wrote:
> On Mon, 29 May 2006 21:43:35 +0200
> Frank Gevaerts <[email protected]> wrote:
>
> | On Mon, May 29, 2006 at 02:11:10PM -0300, Luiz Fernando N. Capitulino wrote:
> | >
> | > Frank, could you try this one please?
> | >
> | > I have no sure whether this makes sense, but every USB-Serial driver
> | > I know exits in the write URB callback if the URB got an error.
> |
> | It looks sane to me at least.
> | The machine is now running with this patch (and my ipaq_open patch, see
> | http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1901.html).
>
> Hmmm. Then does the workqueue problem began to happen _after_ you applied
> your patch?

No. I saw it a few times before that as well. Here is the oldest one I found (using 2.6.15)

May 8 13:11:17 localhost kernel: kernel BUG at kernel/workqueue.c:109!
May 8 13:11:17 localhost kernel: invalid operand: 0000 [#1]
May 8 13:11:17 localhost kernel: Modules linked in: ppp_generic slhc ipaq usbserial sd_mod uhci_hcd yealink usb_storage usbhid ohci_hcd ehci_hcd usbcore tun 8139too mii sr_mod sbp2 scsi_mod ieee1394 psmouse ide_generic ide_cd cdrom genrtc ext3 jbd mbcache ide_disk generic via82cxxx ide_core evdev mousedev
May 8 13:11:17 localhost kernel: CPU: 0
May 8 13:11:17 localhost kernel: EIP: 0060:[queue_work+23/50] Not tainted VLI
May 8 13:11:17 localhost kernel: EFLAGS: 00010213 (2.6.15-1-686)
May 8 13:11:17 localhost kernel: EIP is at queue_work+0x17/0x32
May 8 13:11:17 localhost kernel: eax: d7fcf944 ebx: dbec6680 ecx: 00000000 edx: d7fcf940
May 8 13:11:17 localhost kernel: esi: 00000000 edi: d9b6aa14 ebp: d9b6aa14 esp: d7a03e18
May 8 13:11:17 localhost kernel: ds: 007b es: 007b ss: 0068
May 8 13:11:17 localhost kernel: Process rmmod (pid: 4340, threadinfo=d7a02000 task=d77fb050)
May 8 13:11:17 localhost kernel: Stack: d9ab4f20 dca3a9df d7fcf000 d9b6aa00 dca36740 dca36760 dc9e70ea d9b6aa00
May 8 13:11:17 localhost kernel: d9b6aa7c d9b6aa14 c0202e2a d9b6aa14 d9b6aa14 d9fe1068 d9fe1000 c0202e5c
May 8 13:11:17 localhost kernel: d9b6aa14 d9b6aa14 c02027f9 d9b6aa14 d9b6aa14 c0201ae9 d9b6aa14 00000000
May 8 13:11:17 localhost kernel: Call Trace:
May 8 13:11:17 localhost kernel: [pg0+476928479/1070175232] usb_serial_disconnect+0x5b/0x9f [usbserial]
May 8 13:11:17 localhost kernel: [pg0+476586218/1070175232] usb_unbind_interface+0x36/0x6f [usbcore]
May 8 13:11:17 localhost kernel: [__device_release_driver+72/99] __device_release_driver+0x48/0x63
May 8 13:11:17 localhost kernel: [device_release_driver+23/38] device_release_driver+0x17/0x26
May 8 13:11:17 localhost kernel: [bus_remove_device+82/101] bus_remove_device+0x52/0x65
May 8 13:11:17 localhost kernel: [device_del+57/101] device_del+0x39/0x65
May 8 13:11:17 localhost kernel: [pg0+476612208/1070175232] usb_disable_device+0x73/0xe7 [usbcore]
May 8 13:11:17 localhost kernel: [pg0+476594141/1070175232] usb_disconnect+0x93/0xec [usbcore]
May 8 13:11:17 localhost kernel: [pg0+476594123/1070175232] usb_disconnect+0x81/0xec [usbcore]
May 8 13:11:17 localhost kernel: [pg0+476594123/1070175232] usb_disconnect+0x81/0xec [usbcore]
May 8 13:11:17 localhost kernel: [pg0+476607388/1070175232] usb_remove_hcd+0x58/0xa3 [usbcore]
May 8 13:11:17 localhost kernel: [pg0+476632530/1070175232] usb_hcd_pci_remove+0x16/0x77 [usbcore]
May 8 13:11:17 localhost kernel: [pci_device_remove+25/44] pci_device_remove+0x19/0x2c
May 8 13:11:17 localhost kernel: [__device_release_driver+72/99] __device_release_driver+0x48/0x63
May 8 13:11:17 localhost kernel: [driver_detach+54/76] driver_detach+0x36/0x4c
May 8 13:11:17 localhost kernel: [bus_remove_driver+60/93] bus_remove_driver+0x3c/0x5d
May 8 13:11:17 localhost kernel: [driver_unregister+11/21] driver_unregister+0xb/0x15
May 8 13:11:17 localhost kernel: [pci_unregister_driver+14/25] pci_unregister_driver+0xe/0x19
May 8 13:11:17 localhost kernel: [pg0+476326132/1070175232] ehci_hcd_pci_cleanup+0xa/0xc [ehci_hcd]
May 8 13:11:17 localhost kernel: [sys_delete_module+304/347] sys_delete_module+0x130/0x15b
May 8 13:11:17 localhost kernel: [do_munmap+223/235] do_munmap+0xdf/0xeb
May 8 13:11:17 localhost kernel: [sys_munmap+58/85] sys_munmap+0x3a/0x55
May 8 13:11:17 localhost kernel: [sysenter_past_esp+84/117] sysenter_past_esp+0x54/0x75
May 8 13:11:17 localhost kernel: Code: ff 40 04 83 c0 10 6a 00 e8 fb 17 ff ff 58 57 9d 5b 5e 5f c3 53 31 c9 89 c3 0f ba 2a 00 19 c0 85 c0 75 1f 8d 42 04 39 42 04 74 08 <0f> 0b 6d 00 65 5f 28 c0 52 ff 33 e8 96 ff ff ff 5a b9 01 00 00
May 8 13:11:17 localhost kernel: <6>ohci_hcd 0000:00:0a.1: remove, state 1

> Are you sure your patch is the right thing to do? Does it look reasonable
> to submit that urb 1000 times that way?

It only submits it once, just after the control message has succeeded.
The loop is needed because sometimes the ipaq takes a very long time
(more than a minute) before it starts accepting the control message.

> At first, it seems something else.
>
> Couldn't you run your test-case in a kernel previous to the TTY layer
> buffering revamp change?

We first used 2.6.15. We got different types of error : a panic in
ipaq_read_bulk_callback(), the bug I mentionned in
http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1770.html and the
current problem. We first tried upgrading to 2.6.16, which did not help.

The panic was caused by the read urb being submitten in ipaq_open,
regardless of success, and never killed in case of failure. What my
patch basically does is to submit the urb only after succesfully sending
the control message, and adding a sleep between tries. As long as this
patch is not applied, we hardly get any other error because the kernel
panics as soon as an ipaq reboots.

After changing ipaq_open, we did not get the panic any more, and the
first error (in do_tty_hangup) seems to have gone at the same time, but
the usb_serial_disconnect bug was still there.

Frank

> | If the problem is still there, it should occur within 24 hours in our
> | usage mode (25 ipaqs rebooting every 15 minutes to provide lots of
> | connects and disconnects). I'll let you know the results.
>
> Wow, nice.
>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem

On Mon, 29 May 2006 22:47:24 +0200
Frank Gevaerts <[email protected]> wrote:

| On Mon, May 29, 2006 at 05:24:10PM -0300, Luiz Fernando N. Capitulino wrote:
| > On Mon, 29 May 2006 21:43:35 +0200
| > Frank Gevaerts <[email protected]> wrote:
| >
| > | On Mon, May 29, 2006 at 02:11:10PM -0300, Luiz Fernando N. Capitulino wrote:
| > | >
| > | > Frank, could you try this one please?
| > | >
| > | > I have no sure whether this makes sense, but every USB-Serial driver
| > | > I know exits in the write URB callback if the URB got an error.
| > |
| > | It looks sane to me at least.
| > | The machine is now running with this patch (and my ipaq_open patch, see
| > | http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1901.html).
| >
| > Hmmm. Then does the workqueue problem began to happen _after_ you applied
| > your patch?
|
| No. I saw it a few times before that as well. Here is the oldest one I found (using 2.6.15)

Okay.

| > Are you sure your patch is the right thing to do? Does it look reasonable
| > to submit that urb 1000 times that way?
|
| It only submits it once, just after the control message has succeeded.

Oh, that's right. I didn't see the return statement.

| The loop is needed because sometimes the ipaq takes a very long time
| (more than a minute) before it starts accepting the control message.

Ok.

| > At first, it seems something else.
| >
| > Couldn't you run your test-case in a kernel previous to the TTY layer
| > buffering revamp change?
|
| We first used 2.6.15. We got different types of error : a panic in
| ipaq_read_bulk_callback(), the bug I mentionned in
| http://www.ussg.iu.edu/hypermail/linux/kernel/0605.2/1770.html and the
| current problem. We first tried upgrading to 2.6.16, which did not help.
|
| The panic was caused by the read urb being submitten in ipaq_open,
| regardless of success, and never killed in case of failure. What my
| patch basically does is to submit the urb only after succesfully sending
| the control message, and adding a sleep between tries. As long as this
| patch is not applied, we hardly get any other error because the kernel
| panics as soon as an ipaq reboots.

I see.

Did you try to just kill the read urb in the ipaq_open's error path?

--
Luiz Fernando N. Capitulino

2006-05-30 08:22:15

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Mon, May 29, 2006 at 07:33:30PM -0300, Luiz Fernando N. Capitulino wrote:
> On Mon, 29 May 2006 22:47:24 +0200
> Frank Gevaerts <[email protected]> wrote:
> |
> | The panic was caused by the read urb being submitten in ipaq_open,
> | regardless of success, and never killed in case of failure. What my
> | patch basically does is to submit the urb only after succesfully sending
> | the control message, and adding a sleep between tries. As long as this
> | patch is not applied, we hardly get any other error because the kernel
> | panics as soon as an ipaq reboots.
>
> I see.
>
> Did you try to just kill the read urb in the ipaq_open's error path?

Yes, that's what I did at first. It works, but with the long waits (we see
waits up to 80-90 seconds right now) I was afraid that the urb might timeout
before the control message succeeds.

Frank

>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 10:21:41 +0200
Frank Gevaerts <[email protected]> wrote:

| On Mon, May 29, 2006 at 07:33:30PM -0300, Luiz Fernando N. Capitulino wrote:
| > On Mon, 29 May 2006 22:47:24 +0200
| > Frank Gevaerts <[email protected]> wrote:
| > |
| > | The panic was caused by the read urb being submitten in ipaq_open,
| > | regardless of success, and never killed in case of failure. What my
| > | patch basically does is to submit the urb only after succesfully sending
| > | the control message, and adding a sleep between tries. As long as this
| > | patch is not applied, we hardly get any other error because the kernel
| > | panics as soon as an ipaq reboots.
| >
| > I see.
| >
| > Did you try to just kill the read urb in the ipaq_open's error path?
|
| Yes, that's what I did at first. It works, but with the long waits (we see
| waits up to 80-90 seconds right now) I was afraid that the urb might timeout
| before the control message succeeds.

Hmmm, I see.

My only (obvious) question is that if it's really safe to send the read
urb after the control message..

Greg, are you following this thread? WDT?

Anyways, if it's safe to do it, I do prefer the following (not tested)
version. Which is cleaner, IMO.

------------

diff --git a/drivers/usb/serial/ipaq.c b/drivers/usb/serial/ipaq.c
index 9a5c979..9333169 100644
--- a/drivers/usb/serial/ipaq.c
+++ b/drivers/usb/serial/ipaq.c
@@ -646,17 +646,6 @@ static int ipaq_open(struct usb_serial_p
port->write_urb->transfer_buffer = port->bulk_out_buffer;
port->read_urb->transfer_buffer_length = URBDATA_SIZE;
port->bulk_out_size = port->write_urb->transfer_buffer_length = URBDATA_SIZE;
-
- /* Start reading from the device */
- usb_fill_bulk_urb(port->read_urb, serial->dev,
- usb_rcvbulkpipe(serial->dev, port->bulk_in_endpointAddress),
- port->read_urb->transfer_buffer, port->read_urb->transfer_buffer_length,
- ipaq_read_bulk_callback, port);
- result = usb_submit_urb(port->read_urb, GFP_KERNEL);
- if (result) {
- err("%s - failed submitting read urb, error %d", __FUNCTION__, result);
- goto error;
- }

/*
* Send out control message observed in win98 sniffs. Not sure what
@@ -665,17 +654,31 @@ static int ipaq_open(struct usb_serial_p
* through. Since this has a reasonably high failure rate, we retry
* several times.
*/
-
while (retries--) {
result = usb_control_msg(serial->dev,
usb_sndctrlpipe(serial->dev, 0), 0x22, 0x21,
0x1, 0, NULL, 0, 100);
- if (result == 0) {
- return 0;
- }
+ if (!result)
+ break;
+ }
+ if (result) {
+ err("%s - failed doing control urb, error %d", __FUNCTION__,
+ result);
+ goto error;
}
- err("%s - failed doing control urb, error %d", __FUNCTION__, result);
- goto error;
+
+ /* Start reading from the device */
+ usb_fill_bulk_urb(port->read_urb, serial->dev,
+ usb_rcvbulkpipe(serial->dev, port->bulk_in_endpointAddress),
+ port->read_urb->transfer_buffer, port->read_urb->transfer_buffer_length,
+ ipaq_read_bulk_callback, port);
+ result = usb_submit_urb(port->read_urb, GFP_KERNEL);
+ if (result) {
+ err("%s - failed submitting read urb, error %d", __FUNCTION__, result);
+ goto error;
+ }
+
+ return 0;

enomem:
result = -ENOMEM;


--
Luiz Fernando N. Capitulino

Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 11:38:01 -0300
"Luiz Fernando N. Capitulino" <[email protected]> wrote:

| On Tue, 30 May 2006 10:21:41 +0200
| Frank Gevaerts <[email protected]> wrote:
|
| | On Mon, May 29, 2006 at 07:33:30PM -0300, Luiz Fernando N. Capitulino wrote:
| | > On Mon, 29 May 2006 22:47:24 +0200
| | > Frank Gevaerts <[email protected]> wrote:
| | > |
| | > | The panic was caused by the read urb being submitten in ipaq_open,
| | > | regardless of success, and never killed in case of failure. What my
| | > | patch basically does is to submit the urb only after succesfully sending
| | > | the control message, and adding a sleep between tries. As long as this
| | > | patch is not applied, we hardly get any other error because the kernel
| | > | panics as soon as an ipaq reboots.
| | >
| | > I see.
| | >
| | > Did you try to just kill the read urb in the ipaq_open's error path?
| |
| | Yes, that's what I did at first. It works, but with the long waits (we see
| | waits up to 80-90 seconds right now) I was afraid that the urb might timeout
| | before the control message succeeds.
|
| Hmmm, I see.

Thinking about this again, are you sure the read urb depends on the
control message? It's quite easy to test, just a add a long timeout after
the read URB was sent (say, five minutes) and waits for the read urb
callback to run.

If it ran _before_ the timeout expires with no timeout error it does not
depend. Then we can do the simpler solution: just kill the read urb in the
ipaq_open's error path.

--
Luiz Fernando N. Capitulino

2006-05-30 15:06:54

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, May 30, 2006 at 11:38:01AM -0300, Luiz Fernando N. Capitulino wrote:
> On Tue, 30 May 2006 10:21:41 +0200
> Frank Gevaerts <[email protected]> wrote:
>
> | On Mon, May 29, 2006 at 07:33:30PM -0300, Luiz Fernando N. Capitulino wrote:
> | > On Mon, 29 May 2006 22:47:24 +0200
> | > I see.
> | >
> | > Did you try to just kill the read urb in the ipaq_open's error path?
> |
> | Yes, that's what I did at first. It works, but with the long waits (we see
> | waits up to 80-90 seconds right now) I was afraid that the urb might timeout
> | before the control message succeeds.
>
> Hmmm, I see.
>
> My only (obvious) question is that if it's really safe to send the read
> urb after the control message..

Since it is bulk, it is not guaranteed to start before the control
message anyway, so it should be safe.

The patch looks correct to me, but I would still like to increase
KP_RETRIES a bit. If I read the code correctly, the current setup allows
for 10 seconds between usb connect and acknowledging the control
message. This is enough if the device is only connected when booted
(which is the normal use case). However, in our case, we do
software-initiated reboots of the ipaq while it is attached to the usb
bus, which can take much longer, so for us KP_RETRIES should be at least
1000, maybe 2000. Of course, we can always run a patched kernel for this.

I'll test the patch later today.

Anyway, we have not seen the usb_serial_disconnect bug since applying
your patch, so that bug is also probably gone (we have had nearly 1000
successfull connects/disconnects since then)

Frank

> Greg, are you following this thread? WDT?
>
> Anyways, if it's safe to do it, I do prefer the following (not tested)
> version. Which is cleaner, IMO.
>
> ------------
>
> diff --git a/drivers/usb/serial/ipaq.c b/drivers/usb/serial/ipaq.c
> index 9a5c979..9333169 100644
> --- a/drivers/usb/serial/ipaq.c
> +++ b/drivers/usb/serial/ipaq.c
> @@ -646,17 +646,6 @@ static int ipaq_open(struct usb_serial_p
> port->write_urb->transfer_buffer = port->bulk_out_buffer;
> port->read_urb->transfer_buffer_length = URBDATA_SIZE;
> port->bulk_out_size = port->write_urb->transfer_buffer_length = URBDATA_SIZE;
> -
> - /* Start reading from the device */
> - usb_fill_bulk_urb(port->read_urb, serial->dev,
> - usb_rcvbulkpipe(serial->dev, port->bulk_in_endpointAddress),
> - port->read_urb->transfer_buffer, port->read_urb->transfer_buffer_length,
> - ipaq_read_bulk_callback, port);
> - result = usb_submit_urb(port->read_urb, GFP_KERNEL);
> - if (result) {
> - err("%s - failed submitting read urb, error %d", __FUNCTION__, result);
> - goto error;
> - }
>
> /*
> * Send out control message observed in win98 sniffs. Not sure what
> @@ -665,17 +654,31 @@ static int ipaq_open(struct usb_serial_p
> * through. Since this has a reasonably high failure rate, we retry
> * several times.
> */
> -
> while (retries--) {
> result = usb_control_msg(serial->dev,
> usb_sndctrlpipe(serial->dev, 0), 0x22, 0x21,
> 0x1, 0, NULL, 0, 100);
> - if (result == 0) {
> - return 0;
> - }
> + if (!result)
> + break;
> + }
> + if (result) {
> + err("%s - failed doing control urb, error %d", __FUNCTION__,
> + result);
> + goto error;
> }
> - err("%s - failed doing control urb, error %d", __FUNCTION__, result);
> - goto error;
> +
> + /* Start reading from the device */
> + usb_fill_bulk_urb(port->read_urb, serial->dev,
> + usb_rcvbulkpipe(serial->dev, port->bulk_in_endpointAddress),
> + port->read_urb->transfer_buffer, port->read_urb->transfer_buffer_length,
> + ipaq_read_bulk_callback, port);
> + result = usb_submit_urb(port->read_urb, GFP_KERNEL);
> + if (result) {
> + err("%s - failed submitting read urb, error %d", __FUNCTION__, result);
> + goto error;
> + }
> +
> + return 0;
>
> enomem:
> result = -ENOMEM;
>
>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

2006-05-30 15:10:17

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, May 30, 2006 at 11:53:29AM -0300, Luiz Fernando N. Capitulino wrote:
> On Tue, 30 May 2006 11:38:01 -0300
> "Luiz Fernando N. Capitulino" <[email protected]> wrote:
>
> | On Tue, 30 May 2006 10:21:41 +0200
> | Frank Gevaerts <[email protected]> wrote:
> |
> | | On Mon, May 29, 2006 at 07:33:30PM -0300, Luiz Fernando N. Capitulino wrote:
> | | > On Mon, 29 May 2006 22:47:24 +0200
> | | > Frank Gevaerts <[email protected]> wrote:
> | | > |
> | | > | The panic was caused by the read urb being submitten in ipaq_open,
> | | > | regardless of success, and never killed in case of failure. What my
> | | > | patch basically does is to submit the urb only after succesfully sending
> | | > | the control message, and adding a sleep between tries. As long as this
> | | > | patch is not applied, we hardly get any other error because the kernel
> | | > | panics as soon as an ipaq reboots.
> | | >
> | | > I see.
> | | >
> | | > Did you try to just kill the read urb in the ipaq_open's error path?
> | |
> | | Yes, that's what I did at first. It works, but with the long waits (we see
> | | waits up to 80-90 seconds right now) I was afraid that the urb might timeout
> | | before the control message succeeds.
> |
> | Hmmm, I see.
>
> Thinking about this again, are you sure the read urb depends on the
> control message? It's quite easy to test, just a add a long timeout after
> the read URB was sent (say, five minutes) and waits for the read urb
> callback to run.
>
> If it ran _before_ the timeout expires with no timeout error it does not
> depend. Then we can do the simpler solution: just kill the read urb in the
> ipaq_open's error path.

I'll try it sometime today.

Frank

>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 17:06:27 +0200
Frank Gevaerts <[email protected]> wrote:

| On Tue, May 30, 2006 at 11:38:01AM -0300, Luiz Fernando N. Capitulino wrote:
| > On Tue, 30 May 2006 10:21:41 +0200
| > Frank Gevaerts <[email protected]> wrote:
| >
| > | On Mon, May 29, 2006 at 07:33:30PM -0300, Luiz Fernando N. Capitulino wrote:
| > | > On Mon, 29 May 2006 22:47:24 +0200
| > | > I see.
| > | >
| > | > Did you try to just kill the read urb in the ipaq_open's error path?
| > |
| > | Yes, that's what I did at first. It works, but with the long waits (we see
| > | waits up to 80-90 seconds right now) I was afraid that the urb might timeout
| > | before the control message succeeds.
| >
| > Hmmm, I see.
| >
| > My only (obvious) question is that if it's really safe to send the read
| > urb after the control message..
|
| Since it is bulk, it is not guaranteed to start before the control
| message anyway, so it should be safe.
|
| The patch looks correct to me, but I would still like to increase
| KP_RETRIES a bit. If I read the code correctly, the current setup allows
| for 10 seconds between usb connect and acknowledging the control
| message. This is enough if the device is only connected when booted
| (which is the normal use case). However, in our case, we do
| software-initiated reboots of the ipaq while it is attached to the usb
| bus, which can take much longer, so for us KP_RETRIES should be at least
| 1000, maybe 2000. Of course, we can always run a patched kernel for this.

Hmmm, what do you think about keeping the current default value and
adding a module parameter to change it?

| I'll test the patch later today.
|
| Anyway, we have not seen the usb_serial_disconnect bug since applying
| your patch, so that bug is also probably gone (we have had nearly 1000
| successfull connects/disconnects since then)

Nice to know.

--
Luiz Fernando N. Capitulino

2006-05-30 17:48:59

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, May 30, 2006 at 11:53:29AM -0300, Luiz Fernando N. Capitulino wrote:
> On Tue, 30 May 2006 11:38:01 -0300
> "Luiz Fernando N. Capitulino" <[email protected]> wrote:
>
> If it ran _before_ the timeout expires with no timeout error it does not
> depend. Then we can do the simpler solution: just kill the read urb in the
> ipaq_open's error path.

That seems to work.
I also found that both the return in ipaq_write_bulk_callback and the
flush_scheduled_work() in destroy_serial() are needed to get rid of the
usb_serial_disconnect() bug.

It's now running with the following patch:

Signed-off-by: Frank Gevaerts <[email protected]>

diff -pur linux-2.6.17-rc4/drivers/usb/serial/ipaq.c linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c
--- linux-2.6.17-rc4/drivers/usb/serial/ipaq.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c 2006-05-30 19:41:19.000000000 +0200
@@ -71,6 +71,7 @@

static __u16 product, vendor;
static int debug;
+static int connect_retries;

/* Function prototypes for an ipaq */
static int ipaq_open (struct usb_serial_port *port, struct file *filp);
@@ -583,7 +584,7 @@ static int ipaq_open(struct usb_serial_p
struct ipaq_private *priv;
struct ipaq_packet *pkt;
int i, result = 0;
- int retries = KP_RETRIES;
+ int retries = connect_retries;

dbg("%s - port %d", __FUNCTION__, port->number);

@@ -681,6 +682,7 @@ enomem:
result = -ENOMEM;
err("%s - Out of memory", __FUNCTION__);
error:
+ usb_kill_urb(port->read_urb);
ipaq_destroy_lists(port);
kfree(priv);
return result;
@@ -692,6 +694,7 @@ static void ipaq_close(struct usb_serial
struct ipaq_private *priv = usb_get_serial_port_data(port);

dbg("%s - port %d", __FUNCTION__, port->number);
+

/*
* shut down bulk read and write
@@ -855,6 +858,7 @@ static void ipaq_write_bulk_callback(str

if (urb->status) {
dbg("%s - nonzero write bulk status received: %d", __FUNCTION__, urb->status);
+ return;
}

spin_lock_irqsave(&write_list_lock, flags);
@@ -967,3 +971,6 @@ MODULE_PARM_DESC(vendor, "User specified

module_param(product, ushort, 0);
MODULE_PARM_DESC(product, "User specified USB idProduct");
+
+module_param(connect_retries, int, KP_RETRIES);
+MODULE_PARM_DESC(product, "Maximum number of connect retries (100ms each)");
diff -pur linux-2.6.17-rc4/drivers/usb/serial/usb-serial.c linux-2.6.17-rc4.test/drivers/usb/serial/usb-serial.c
--- linux-2.6.17-rc4/drivers/usb/serial/usb-serial.c 2006-05-30 19:01:16.000000000 +0200
+++ linux-2.6.17-rc4.test/drivers/usb/serial/usb-serial.c 2006-05-30 19:01:24.000000000 +0200
@@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
}
}

+ flush_scheduled_work(); /* port->work */
+
usb_put_dev(serial->dev);

/* free up any memory that we allocated */

Frank

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

2006-05-30 18:35:00

by Pete Zaitcev

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 19:48:21 +0200, Frank Gevaerts <[email protected]> wrote:
+0100
> +++ linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c 2006-05-30 19:41:19.000000000 +0200
> @@ -692,6 +694,7 @@ static void ipaq_close(struct usb_serial
> struct ipaq_private *priv = usb_get_serial_port_data(port);
>
> dbg("%s - port %d", __FUNCTION__, port->number);
> +
>
> /*
> * shut down bulk read and write

Please get rid of the above.

> @@ -967,3 +971,6 @@ MODULE_PARM_DESC(vendor, "User specified
>
> module_param(product, ushort, 0);
> MODULE_PARM_DESC(product, "User specified USB idProduct");
> +
> +module_param(connect_retries, int, KP_RETRIES);
> +MODULE_PARM_DESC(product, "Maximum number of connect retries (100ms each)");

Personally, I'm not keen on adding knobs.

-- Pete

2006-05-30 19:05:04

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, May 30, 2006 at 11:33:27AM -0700, Pete Zaitcev wrote:
> On Tue, 30 May 2006 19:48:21 +0200, Frank Gevaerts <[email protected]> wrote:
> +0100
> > +++ linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c 2006-05-30 19:41:19.000000000 +0200
> > @@ -692,6 +694,7 @@ static void ipaq_close(struct usb_serial
> > struct ipaq_private *priv = usb_get_serial_port_data(port);
> >
> > dbg("%s - port %d", __FUNCTION__, port->number);
> > +
> >
> > /*
> > * shut down bulk read and write
>
> Please get rid of the above.

OK. I missed one while cleaning up.

> > @@ -967,3 +971,6 @@ MODULE_PARM_DESC(vendor, "User specified
> >
> > module_param(product, ushort, 0);
> > MODULE_PARM_DESC(product, "User specified USB idProduct");
> > +
> > +module_param(connect_retries, int, KP_RETRIES);
> > +MODULE_PARM_DESC(product, "Maximum number of connect retries (100ms each)");
>
> Personally, I'm not keen on adding knobs.

As far as I can see, the alternatives are that either it does not work
without patches for scenarios where the ipaq is rebooted while connected
(like we do), since these need a large number of retries (up to 3500 in
our tests today, about 6 minutes), or the default value is much higher
than the current 100 (which gives a 10 seconds timeout).

I'm not sure what the best solution is.

Frank

>
> -- Pete

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 19:48:21 +0200
Frank Gevaerts <[email protected]> wrote:

| On Tue, May 30, 2006 at 11:53:29AM -0300, Luiz Fernando N. Capitulino wrote:
| > On Tue, 30 May 2006 11:38:01 -0300
| > "Luiz Fernando N. Capitulino" <[email protected]> wrote:
| >
| > If it ran _before_ the timeout expires with no timeout error it does not
| > depend. Then we can do the simpler solution: just kill the read urb in the
| > ipaq_open's error path.
|
| That seems to work.
| I also found that both the return in ipaq_write_bulk_callback and the
| flush_scheduled_work() in destroy_serial() are needed to get rid of the
| usb_serial_disconnect() bug.

Then did you hit it with my patch?

I'm just worried with the fact that you're hitting it with every
proposed fix. Maybe it's something else.

--
Luiz Fernando N. Capitulino

Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 11:33:27 -0700
Pete Zaitcev <[email protected]> wrote:

| > @@ -967,3 +971,6 @@ MODULE_PARM_DESC(vendor, "User specified
| >
| > module_param(product, ushort, 0);
| > MODULE_PARM_DESC(product, "User specified USB idProduct");
| > +
| > +module_param(connect_retries, int, KP_RETRIES);
| > +MODULE_PARM_DESC(product, "Maximum number of connect retries (100ms each)");
|
| Personally, I'm not keen on adding knobs.

We have a device-dependent parameter here, I can't think in anything
better..

--
Luiz Fernando N. Capitulino

2006-05-30 21:37:23

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, May 30, 2006 at 05:52:08PM -0300, Luiz Fernando N. Capitulino wrote:
> On Tue, 30 May 2006 19:48:21 +0200
> Frank Gevaerts <[email protected]> wrote:
>
> | On Tue, May 30, 2006 at 11:53:29AM -0300, Luiz Fernando N. Capitulino wrote:
> | > On Tue, 30 May 2006 11:38:01 -0300
> | > "Luiz Fernando N. Capitulino" <[email protected]> wrote:
> | >
> | > If it ran _before_ the timeout expires with no timeout error it does not
> | > depend. Then we can do the simpler solution: just kill the read urb in the
> | > ipaq_open's error path.
> |
> | That seems to work.
> | I also found that both the return in ipaq_write_bulk_callback and the
> | flush_scheduled_work() in destroy_serial() are needed to get rid of the
> | usb_serial_disconnect() bug.
>
> Then did you hit it with my patch?
>
> I'm just worried with the fact that you're hitting it with every
> proposed fix. Maybe it's something else.

I'm hitting it with either of the proposed fixes, but not when both are
applied.

Frank

>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

Subject: Re: usb-serial ipaq kernel problem

On Tue, 30 May 2006 23:36:35 +0200
Frank Gevaerts <[email protected]> wrote:

| On Tue, May 30, 2006 at 05:52:08PM -0300, Luiz Fernando N. Capitulino wrote:
| > On Tue, 30 May 2006 19:48:21 +0200
| > Frank Gevaerts <[email protected]> wrote:
| >
| > | On Tue, May 30, 2006 at 11:53:29AM -0300, Luiz Fernando N. Capitulino wrote:
| > | > On Tue, 30 May 2006 11:38:01 -0300
| > | > "Luiz Fernando N. Capitulino" <[email protected]> wrote:
| > | >
| > | > If it ran _before_ the timeout expires with no timeout error it does not
| > | > depend. Then we can do the simpler solution: just kill the read urb in the
| > | > ipaq_open's error path.
| > |
| > | That seems to work.
| > | I also found that both the return in ipaq_write_bulk_callback and the
| > | flush_scheduled_work() in destroy_serial() are needed to get rid of the
| > | usb_serial_disconnect() bug.
| >
| > Then did you hit it with my patch?
| >
| > I'm just worried with the fact that you're hitting it with every
| > proposed fix. Maybe it's something else.
|
| I'm hitting it with either of the proposed fixes, but not when both are
| applied.

Is this still true? :)

--
Luiz Fernando N. Capitulino

2006-05-31 21:25:01

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Wed, May 31, 2006 at 06:10:42PM -0300, Luiz Fernando N. Capitulino wrote:
> On Tue, 30 May 2006 23:36:35 +0200
> Frank Gevaerts <[email protected]> wrote:
>
> | On Tue, May 30, 2006 at 05:52:08PM -0300, Luiz Fernando N. Capitulino wrote:
> | > On Tue, 30 May 2006 19:48:21 +0200
> | > Frank Gevaerts <[email protected]> wrote:
> | >
> | > | On Tue, May 30, 2006 at 11:53:29AM -0300, Luiz Fernando N. Capitulino wrote:
> | > | > On Tue, 30 May 2006 11:38:01 -0300
> | > | > "Luiz Fernando N. Capitulino" <[email protected]> wrote:
> | > | >
> | > | > If it ran _before_ the timeout expires with no timeout error it does not
> | > | > depend. Then we can do the simpler solution: just kill the read urb in the
> | > | > ipaq_open's error path.
> | > |
> | > | That seems to work.
> | > | I also found that both the return in ipaq_write_bulk_callback and the
> | > | flush_scheduled_work() in destroy_serial() are needed to get rid of the
> | > | usb_serial_disconnect() bug.
> | >
> | > Then did you hit it with my patch?
> | >
> | > I'm just worried with the fact that you're hitting it with every
> | > proposed fix. Maybe it's something else.
> |
> | I'm hitting it with either of the proposed fixes, but not when both are
> | applied.
>
> Is this still true? :)

Yes. It's still running, with about 2000 disconnects since the last
boot.

Frank

>
> --
> Luiz Fernando N. Capitulino

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

2006-05-31 21:38:57

by Frank Gevaerts

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Tue, May 30, 2006 at 11:33:27AM -0700, Pete Zaitcev wrote:
>
> Please get rid of the above.
> > * shut down bulk read and write

OK, So here's the corrected patch:

Signed-off-by: Frank Gevaerts <[email protected]>


diff -pur linux-2.6.17-rc4/drivers/usb/serial/ipaq.c linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c
--- linux-2.6.17-rc4/drivers/usb/serial/ipaq.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c 2006-05-30 20:46:23.000000000 +0200
@@ -71,6 +71,7 @@

static __u16 product, vendor;
static int debug;
+static int connect_retries;

/* Function prototypes for an ipaq */
static int ipaq_open (struct usb_serial_port *port, struct file *filp);
@@ -583,7 +584,7 @@ static int ipaq_open(struct usb_serial_p
struct ipaq_private *priv;
struct ipaq_packet *pkt;
int i, result = 0;
- int retries = KP_RETRIES;
+ int retries = connect_retries;

dbg("%s - port %d", __FUNCTION__, port->number);

@@ -681,6 +682,7 @@ enomem:
result = -ENOMEM;
err("%s - Out of memory", __FUNCTION__);
error:
+ usb_kill_urb(port->read_urb);
ipaq_destroy_lists(port);
kfree(priv);
return result;
@@ -855,6 +857,7 @@ static void ipaq_write_bulk_callback(str

if (urb->status) {
dbg("%s - nonzero write bulk status received: %d", __FUNCTION__, urb->status);
+ return;
}

spin_lock_irqsave(&write_list_lock, flags);
@@ -967,3 +970,6 @@ MODULE_PARM_DESC(vendor, "User specified

module_param(product, ushort, 0);
MODULE_PARM_DESC(product, "User specified USB idProduct");
+
+module_param(connect_retries, int, KP_RETRIES);
+MODULE_PARM_DESC(product, "Maximum number of connect retries (100ms each)");
diff -pur linux-2.6.17-rc4/drivers/usb/serial/usb-serial.c linux-2.6.17-rc4.test/drivers/usb/serial/usb-serial.c
--- linux-2.6.17-rc4/drivers/usb/serial/usb-serial.c 2006-05-30 19:01:16.000000000 +0200
+++ linux-2.6.17-rc4.test/drivers/usb/serial/usb-serial.c 2006-05-30 19:01:24.000000000 +0200
@@ -162,6 +162,8 @@ static void destroy_serial(struct kref *
}
}

+ flush_scheduled_work(); /* port->work */
+
usb_put_dev(serial->dev);

/* free up any memory that we allocated */



--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

2006-05-31 21:58:00

by Greg KH

[permalink] [raw]
Subject: Re: usb-serial ipaq kernel problem

On Wed, May 31, 2006 at 11:38:28PM +0200, Frank Gevaerts wrote:
> On Tue, May 30, 2006 at 11:33:27AM -0700, Pete Zaitcev wrote:
> >
> > Please get rid of the above.
> > > * shut down bulk read and write
>
> OK, So here's the corrected patch:
>
> Signed-off-by: Frank Gevaerts <[email protected]>

Care to send it with a proper changelog description? And not the
usb-serial.c fix as that's already in my tree.

thanks,

greg k-h

2006-05-31 22:43:11

by Frank Gevaerts

[permalink] [raw]
Subject: [PATCH] ipaq.c bugfixes

This patch fixes several problems in the ipaq.c driver with connecting
and disconnecting pocketpc devices:
* The read urb stayed active if the connect failed, causing nullpointer
dereferences later on.
* If a write failed, the driver continued as if nothing happened. Now it
handles that case the same way as other usb serial devices (fix by
"Luiz Fernando N. Capitulino" <[email protected]>)

The connect_retries parameter is added because if a pocketpc device is
connected while it is rebooting, it can take a long time after the USB
connect (sometimes several minutes) before it starts accepting the
control packet that starts the serial connection. Since this is not the
normal usecase, it is probably better to leave the default number of
retries as-is.

Signed-off-by: Frank Gevaerts <[email protected]>

diff -pur linux-2.6.17-rc4/drivers/usb/serial/ipaq.c linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c
--- linux-2.6.17-rc4/drivers/usb/serial/ipaq.c 2006-03-20 06:53:29.000000000 +0100
+++ linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c 2006-05-30 20:46:23.000000000 +0200
@@ -71,6 +71,7 @@

static __u16 product, vendor;
static int debug;
+static int connect_retries;

/* Function prototypes for an ipaq */
static int ipaq_open (struct usb_serial_port *port, struct file *filp);
@@ -583,7 +584,7 @@ static int ipaq_open(struct usb_serial_p
struct ipaq_private *priv;
struct ipaq_packet *pkt;
int i, result = 0;
- int retries = KP_RETRIES;
+ int retries = connect_retries;

dbg("%s - port %d", __FUNCTION__, port->number);

@@ -681,6 +682,7 @@ enomem:
result = -ENOMEM;
err("%s - Out of memory", __FUNCTION__);
error:
+ usb_kill_urb(port->read_urb);
ipaq_destroy_lists(port);
kfree(priv);
return result;
@@ -855,6 +857,7 @@ static void ipaq_write_bulk_callback(str

if (urb->status) {
dbg("%s - nonzero write bulk status received: %d", __FUNCTION__, urb->status);
+ return;
}

spin_lock_irqsave(&write_list_lock, flags);
@@ -967,3 +970,6 @@ MODULE_PARM_DESC(vendor, "User specified

module_param(product, ushort, 0);
MODULE_PARM_DESC(product, "User specified USB idProduct");
+
+module_param(connect_retries, int, KP_RETRIES);
+MODULE_PARM_DESC(product, "Maximum number of connect retries (100ms each)");

--
Frank Gevaerts [email protected]
fks bvba - Formal and Knowledge Systems http://www.fks.be/
Stationsstraat 108 Tel: ++32-(0)11-21 49 11
B-3570 ALKEN Fax: ++32-(0)11-22 04 19

2006-05-31 22:48:51

by Greg KH

[permalink] [raw]
Subject: Re: [PATCH] ipaq.c bugfixes

On Thu, Jun 01, 2006 at 12:42:45AM +0200, Frank Gevaerts wrote:
> This patch fixes several problems in the ipaq.c driver with connecting
> and disconnecting pocketpc devices:
> * The read urb stayed active if the connect failed, causing nullpointer
> dereferences later on.
> * If a write failed, the driver continued as if nothing happened. Now it
> handles that case the same way as other usb serial devices (fix by
> "Luiz Fernando N. Capitulino" <[email protected]>)
>
> The connect_retries parameter is added because if a pocketpc device is
> connected while it is rebooting, it can take a long time after the USB
> connect (sometimes several minutes) before it starts accepting the
> control packet that starts the serial connection. Since this is not the
> normal usecase, it is probably better to leave the default number of
> retries as-is.
>
> Signed-off-by: Frank Gevaerts <[email protected]>
>
> diff -pur linux-2.6.17-rc4/drivers/usb/serial/ipaq.c linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c
> --- linux-2.6.17-rc4/drivers/usb/serial/ipaq.c 2006-03-20 06:53:29.000000000 +0100
> +++ linux-2.6.17-rc4.test/drivers/usb/serial/ipaq.c 2006-05-30 20:46:23.000000000 +0200
> @@ -71,6 +71,7 @@
>
> static __u16 product, vendor;
> static int debug;
> +static int connect_retries;
>
> /* Function prototypes for an ipaq */
> static int ipaq_open (struct usb_serial_port *port, struct file *filp);
> @@ -583,7 +584,7 @@ static int ipaq_open(struct usb_serial_p
> struct ipaq_private *priv;
> struct ipaq_packet *pkt;
> int i, result = 0;
> - int retries = KP_RETRIES;
> + int retries = connect_retries;
>
> dbg("%s - port %d", __FUNCTION__, port->number);
>
> @@ -681,6 +682,7 @@ enomem:
> result = -ENOMEM;
> err("%s - Out of memory", __FUNCTION__);
> error:
> + usb_kill_urb(port->read_urb);
> ipaq_destroy_lists(port);
> kfree(priv);
> return result;
> @@ -855,6 +857,7 @@ static void ipaq_write_bulk_callback(str
>
> if (urb->status) {
> dbg("%s - nonzero write bulk status received: %d", __FUNCTION__, urb->status);
> + return;
> }
>
> spin_lock_irqsave(&write_list_lock, flags);
> @@ -967,3 +970,6 @@ MODULE_PARM_DESC(vendor, "User specified
>
> module_param(product, ushort, 0);
> MODULE_PARM_DESC(product, "User specified USB idProduct");
> +
> +module_param(connect_retries, int, KP_RETRIES);

I really do not think that you want KP_RETRIES as a mode value in sysfs
:)

This is not how you pre-initialize a module parameter...

thanks,

greg k-h